Code Comments


Newbie needs a little kick start
I love GNU/Linux and I don't own a Windows box.  I do, however, work in a
Windows environment and have no choice but to use M$ stuff.  I found out
the other day that you can get all the GNU stuff like grep, cat and
gawk for Windows, and I was excited to try them out.  Then I discovered that
I have been playing with Linux too much and not learning it.  At home I never
had to mine data out of a 10MB tab-delimited text file.  I can't seem to
get gawk to do what I need.  I looked all over Google and everything I
found was either too simple or way over my head.  I need help.

Here is what I need to do -->

Text1.txt = a plain old list of machine names 7 characters long, one on
each line

Text2.txt = a giant tab-delimited text file with 4 fields:

1. Machine Name, 2. SMS Site, 3. Last logged-on user and 4. yes/no

Please show me how to use awk to take each machine name from text1.txt
and output all the lines in text2.txt that have a matching machine name.

Also, I was thinking of buying the O'Reilly book "sed & awk, 2nd edition";
if this book sucks or if there is a much better one, please let me know.

Thanks in advance for putting up with yet another helpless newbie.


James

c0rN_g0aT
11-25-04 01:55 AM


Re: Newbie needs a little kick start

c0rN_g0aT wrote:
<snip>
> Here is what I need to do -->
>
> Text1.txt = a plain old list of machine names 7 characters long, one on
> each line
>
> Text2.txt = a giant tab delimited text file with multiple fields.  There
> are 4 fields which are
>
> 1.  Machine Name, 2.  SMS Site,  3. Last logged on user and 4. yes/no
>
> Please show me how to use awk to take each machine name from text1.txt
> and output all the lines in text2.txt that have a matching machine name.

I'm assuming the "1.", etc above were added by you for illustration and
that in the real Text2.txt, the machine name is just the first field. If
so, this'll do it in gawk:

gawk -F'\t' 'NR==FNR{names[$1]="";next}$1 in names' Text1.txt Text2.txt
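For anyone who wants to try the one-liner before pointing it at the real 10MB file, here is a minimal sketch that builds two tiny sample files and runs the same program. It assumes a Unix-style shell (sh/bash, e.g. from Cygwin or MSYS) and uses plain awk so it also runs where gawk is not installed; the machine names, sites and users are made up, and the function name match_machines is hypothetical. Note that cmd.exe does not treat single quotes as quoting characters, so from a plain Windows prompt the program text is usually put in a file instead (with FS="\t" set in a BEGIN block) and run with gawk -f.

```sh
#!/bin/sh
# Minimal sketch: Ed's one-liner run on made-up sample data.
# Assumes a POSIX shell; plain "awk" is used, gawk behaves the same here.

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Text1.txt: hypothetical machine names, one per line
printf 'PC00001\nPC00003\n' > "$dir/Text1.txt"

# Text2.txt: name <TAB> SMS site <TAB> last user <TAB> yes/no (all made up)
printf 'PC00001\tS01\tjsmith\tyes\n'  > "$dir/Text2.txt"
printf 'PC00002\tS01\tbjones\tno\n'  >> "$dir/Text2.txt"
printf 'PC00003\tS02\tadoe\tyes\n'   >> "$dir/Text2.txt"

match_machines() {
    # File 1 (NR==FNR): store each name as an array index, skip the rest.
    # File 2: print every line whose first field is one of those names.
    awk -F'\t' 'NR==FNR{names[$1]="";next} $1 in names' "$1" "$2"
}

match_machines "$dir/Text1.txt" "$dir/Text2.txt"
```

This should print the PC00001 and PC00003 lines, in the order they appear in Text2.txt.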

> Also I was thinking of buying the O'Reilly book "sed & awk 2nd edition"
> if this book sucks or if there is a much better one please let me know.

I'd just print off the GNU awk documentation from:

http://www.gnu.org/software/gawk/manual

Ed.


Ed Morton
11-25-04 01:55 AM


Re: Newbie needs a little kick start
On Wed, 24 Nov 2004 18:41:28 -0500, c0rN_g0aT
<seasoned_crouton@yahoo.com> wrote:

<snip>
>Here is what I need to do -->
>
>Text1.txt = a plain old list of machine names 7 characters long, one on
>each line
>
>Text2.txt = a giant tab delimited text file with multiple fields.  There
>are 4 fields which are
>
>1.  Machine Name, 2.  SMS Site,  3. Last logged on user and 4. yes/no
>
>Please show me how to use awk to take each machine name from text1.txt
>and output all the lines in text2.txt that have a matching machine name.

If you are working in NT/W2K/XP, the shell can handle this directly:

for /f %A in (text1.txt) do (find "%A" < text2.txt > target.file)

Just as in Linux/Unix, shell commands or scripts (use %% instead of %
in FOR commands in batch files) and GNU (and other) utilities make a
powerful team, but sometimes single-language solutions are easier than
mixed-language ones.

BTW, I have just more or less finished a mixed batch, gawk and utilities
program that could take your list of machine names and produce
extensive reports on each by probing them remotely
(http://gearbox.maem.umr.edu/batch/probe_pc.html) - it illustrates
many (heavily commented) batch and gawk techniques.


--
T.E.D. (tdavis@gearbox.maem.umr.edu)

Ted Davis
11-25-04 08:55 AM


Re: Newbie needs a little kick start
In article <8bydne9lB6xBgjjcRVn-iA@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:

> c0rN_g0aT wrote:
> <snip> 
>
> I'm assuming the "1.", etc above were added by you for illustration and
> that in the real Text2.txt, the machine name is just the first field. If
> so, this'll do it in gawk:
>
> gawk -F'\t' 'NR==FNR{names[$1]="";next}$1 in names' Text1.txt Text2.txt

And because you said you were a newbie, here is what it all means:

-F'\t' - use Tab as the field separator.
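A quick way to see what -F'\t' buys you is to split one made-up record both ways. This sketch assumes a POSIX shell and awk; the record deliberately has a space inside its SMS-site field, and the function names are hypothetical.

```sh
#!/bin/sh
# Sketch: field splitting with and without -F'\t' (sample record is made up).
line=$(printf 'PC00001\tSite One\tjsmith\tyes')

tab_fields() {
    # Only tabs separate fields, so "Site One" stays one field.
    printf '%s\n' "$line" | awk -F'\t' '{ print NF ": " $2 }'
}

default_fields() {
    # Default FS splits on runs of spaces and tabs, so "Site One" becomes two.
    printf '%s\n' "$line" | awk '{ print NF ": " $2 }'
}

tab_fields       # expected: 4: Site One
default_fields   # expected: 5: Site
```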

NR==FNR - true when NR (the total record number so far) is equal to FNR
(the current file's record number).  NR keeps increasing as each record
is processed.  FNR is reset to 1 when the 2nd file is started, so NR==FNR
is true only while reading the first file (neat trick :-)
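To watch the trick happen, this sketch prints NR, FNR and the NR==FNR test for every record of two tiny made-up files. It assumes a POSIX shell and awk; the file and function names are hypothetical.

```sh
#!/bin/sh
# Sketch: watch NR and FNR while awk reads two tiny made-up files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a\nb\n'    > "$dir/first.txt"
printf 'c\nd\ne\n' > "$dir/second.txt"

show_counters() {
    # Columns: record, NR, FNR, whether NR==FNR holds.
    awk '{ print $0, NR, FNR, (NR==FNR ? "yes" : "no") }' \
        "$dir/first.txt" "$dir/second.txt"
}

show_counters
```

The output should be "a 1 1 yes", "b 2 2 yes", then "c 3 1 no" and so on. One caveat that follows from this: if the first file is empty, NR==FNR stays true while the second file is read, so the second file would be treated as the name list.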

So while processing the first file, save the machine names in the
associative (content-addressable) array names[$1].  The important thing
is not the value in each entry, but rather the index.  You can look at
all the indexes and values of an array with something like this (inside
an action, typically an END block):
for( idx in names ) print idx, names[idx]
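Here is that loop in context, as a sketch assuming a POSIX shell and awk: the array is filled from a made-up list in which PC00003 appears twice, and dumped from an END block. Since for (idx in names) visits the indexes in no particular order, the output is piped through sort; dump_names is a hypothetical name.

```sh
#!/bin/sh
# Sketch: fill names[] from a made-up list (PC00003 appears twice) and dump
# it from an END block. for (idx in names) visits indexes in no particular
# order, so the output is sorted to make it repeatable.
dump_names() {
    printf 'PC00003\nPC00001\nPC00003\n' |
        awk '{ names[$1] = "" }
             END { for (idx in names) print idx, "[" names[idx] "]" }' |
        sort
}

dump_names
```

The duplicate name produces only one index, and every value is the empty string shown between the brackets.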

next - skips the rest of the script for the current record, since while
processing the first file we only want to capture the machine names.
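To see why next matters, this sketch runs the program with and without it on made-up data (POSIX shell and awk assumed; function names hypothetical). Without next, every line of the name file also reaches the $1 in names test, and because its name was just stored, that line is printed too.

```sh
#!/bin/sh
# Sketch: the same program with and without "next" (sample data made up).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'PC00001\n' > "$dir/names.txt"
printf 'PC00001\tS01\tjsmith\tyes\nPC00002\tS01\tbjones\tno\n' > "$dir/data.txt"

with_next() {
    awk -F'\t' 'NR==FNR{names[$1]="";next} $1 in names' \
        "$dir/names.txt" "$dir/data.txt"
}

without_next() {
    # No "next": each line of names.txt also reaches "$1 in names", and since
    # its name was just stored, the name line itself gets printed as well.
    awk -F'\t' 'NR==FNR{names[$1]=""} $1 in names' \
        "$dir/names.txt" "$dir/data.txt"
}

with_next      # only the PC00001 data line
without_next   # "PC00001" from names.txt, then the PC00001 data line
```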

$1 in names - the 'in' operator tests whether an index (in this case $1,
which should be your machine name) exists in the array names.  The
result is true if it is an index, and false otherwise.
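A related detail worth knowing: the in test never changes the array, but merely referencing names[something] creates that element with an empty value. That is one reason to prefer $1 in names over looking at names[$1] when testing lines from the second file. A sketch, assuming a POSIX shell and awk, with made-up names and a hypothetical function name:

```sh
#!/bin/sh
# Sketch: "in" only tests for an index, but merely referencing
# names[...] creates the element. Names are made up.
count_after_checks() {
    awk 'BEGIN {
        names["PC00001"] = ""
        if ("PC00002" in names) print "found PC00002"   # never printed
        n = 0; for (k in names) n++
        print "after in:", n
        v = names["PC00003"]   # a plain reference adds an empty element
        n = 0; for (k in names) n++
        print "after reference:", n
    }'
}

count_after_checks
```

This should print "after in: 1" and then "after reference: 2".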

If "$1 in names" is true, then the current line is printed.  This
happens because the general form of an awk rule is

pattern { action }

If the pattern is missing, it is assumed to be true.  If { action }
is missing, the default action is 'print', and the default operand of
print is $0.
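Those defaults mean the last rule of the one-liner can be spelled out in three equivalent ways. This sketch (POSIX shell and awk assumed, data made up, run_rule hypothetical) runs each form against the same files; all three should print the same single line.

```sh
#!/bin/sh
# Sketch: the one-liner's last rule written three ways (sample data made up).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'PC00001\n' > "$dir/names.txt"
printf 'PC00001\tS01\tjsmith\tyes\nPC00002\tS01\tbjones\tno\n' > "$dir/data.txt"

run_rule() {
    # $1 is the rule applied to data.txt; the first rule loads names.txt.
    awk -F'\t' 'NR==FNR{names[$1]="";next} '"$1" "$dir/names.txt" "$dir/data.txt"
}

run_rule '$1 in names'                # pattern only: default action
run_rule '$1 in names { print }'      # print with no operand
run_rule '$1 in names { print $0 }'   # everything spelled out
```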
 
>
> I'd just print off the GNU awk documentation from:
>
> http://www.gnu.org/software/gawk/manual
>
> 	Ed.
> 

I would also take a look at the following web site.  It is a short intro
to awk that covers a lot of awk features without being too long.

http://www.tru64unix.compaq.com/doc...N/V51B_HTML/ARH9WBTE/WKXXXXXX.HTM

Bob Harris

Bob Harris
11-25-04 08:55 AM


Re: Newbie needs a little kick start
On Wed, 24 Nov 2004 19:34:05 -0600, Ted Davis
<tdavis@gearbox.maem.umr.edu> wrote:

> On Wed, 24 Nov 2004 18:41:28 -0500, c0rN_g0aT
> <seasoned_crouton@yahoo.com> wrote:
<snip> 
<snip>
> If you are working in NT/W2K/XP, the shell can handle this directly
>
>   for /f %A in (text1.txt) do (find "%A" < text2.txt > target.file)
>
In addition to being off-topic, of course ...

That needs >> on the output file, at least on XP 5.1.2600(.0) CMD.EXE
2002-08-29 07:00; with a single > each pass of the loop overwrites what
the previous pass wrote. You don't actually need to parenthesise a
single command as the body of a for, although it doesn't hurt, is a
good habit in general and is arguably clearer.
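The > versus >> problem is not specific to CMD.EXE. This sketch mimics Ted's loop in a POSIX shell rather than in batch (an assumption, not his actual code), with grep -F standing in for find and made-up data. Redirecting with > inside the loop keeps only the last name's matches; with >> the file has to be emptied once before the loop, or results from a previous run accumulate.

```sh
#!/bin/sh
# Sketch: per-name loop with > (truncate) versus >> (append); data made up.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'PC00001\nPC00003\n' > "$dir/text1.txt"
printf 'PC00001\tS01\tjsmith\tyes\nPC00002\tS01\tbjones\tno\nPC00003\tS02\tadoe\tyes\n' \
    > "$dir/text2.txt"

loop_overwrite() {
    while IFS= read -r name; do
        # ">" truncates target.file on every pass
        grep -F -- "$name" "$dir/text2.txt" > "$dir/target.file"
    done < "$dir/text1.txt"
    cat "$dir/target.file"
}

loop_append() {
    : > "$dir/target.file"   # start empty, then append on each pass
    while IFS= read -r name; do
        grep -F -- "$name" "$dir/text2.txt" >> "$dir/target.file"
    done < "$dir/text1.txt"
    cat "$dir/target.file"
}

loop_overwrite   # only the PC00003 line survives
loop_append      # PC00001 and PC00003 lines
```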

And it matches each pattern (machine name) anywhere on a line, not just
in field 1, including partial matches, e.g. foo matches anyfooishnode.
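Using Dave's foo/anyfooishnode example, this sketch contrasts substring matching (grep -F -f, which behaves like find here) with the exact field-1 match of the awk one-liner. It assumes a POSIX shell with grep and awk; the data lines and function names are made up.

```sh
#!/bin/sh
# Sketch: substring matching (grep -F, like find) versus an exact match on
# field 1 (the awk one-liner). Uses Dave's foo/anyfooishnode example.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'foo\n' > "$dir/names.txt"
printf 'foo\tS01\tjsmith\tyes\nanyfooishnode\tS02\tadoe\tno\n' > "$dir/data.txt"

substring_match() {
    # -F: fixed strings, -f: read the patterns from a file
    grep -F -f "$dir/names.txt" "$dir/data.txt"
}

field1_match() {
    awk -F'\t' 'NR==FNR{names[$1]="";next} $1 in names' \
        "$dir/names.txt" "$dir/data.txt"
}

substring_match   # both lines
field1_match      # only the foo line
```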

If those are close enough, and the order of machine names in the
output also doesn't matter (or is already correct in file2, or can be
post-sorted), you can do it in a single execution of Unixoid grep
(which the OP already mentioned) with -f, preferably as fgrep or grep -F.
(Although in the MinGW version I have to hand, MSYS 1.0.9 on I think
3.1, -f file only works if the file has LF line endings, not CRLF as
Windows files typically do. -f - <file does work. Bleah.)
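CRLF line endings can bite the awk one-liner as well: on a Unix-like system the stored names keep their trailing CR, so nothing matches. This sketch (POSIX shell, awk, tr and grep assumed; data and function names made up) shows the failure and two ways around it: stripping the CR inside awk, or converting the pattern file before handing it to grep -F -f.

```sh
#!/bin/sh
# Sketch: a names file with Windows CRLF line endings, and two ways to cope.
# Assumes a Unix-like system where tools do not strip the CR themselves.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'PC00001\r\nPC00003\r\n' > "$dir/text1.txt"
printf 'PC00001\tS01\tjsmith\tyes\nPC00002\tS01\tbjones\tno\nPC00003\tS02\tadoe\tyes\n' \
    > "$dir/text2.txt"

raw_awk() {
    # Stored names end in a CR, so nothing in text2.txt matches.
    awk -F'\t' 'NR==FNR{names[$1]="";next} $1 in names' \
        "$dir/text1.txt" "$dir/text2.txt"
}

clean_awk() {
    # Strip a trailing CR from every record before the other rules run.
    awk -F'\t' '{ sub(/\r$/, "") } NR==FNR{names[$1]="";next} $1 in names' \
        "$dir/text1.txt" "$dir/text2.txt"
}

clean_grep() {
    # Convert the pattern file to LF endings first, then use grep -F -f.
    tr -d '\r' < "$dir/text1.txt" > "$dir/names_lf.txt"
    grep -F -f "$dir/names_lf.txt" "$dir/text2.txt"
}

raw_awk      # prints nothing
clean_awk    # PC00001 and PC00003 lines
clean_grep   # the same two lines
```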

> Just as in Linux/Unix, the combination of shell commands or scripts
> (use %% instead of % in FOR commands in batch files) and GNU (and
> other) utilities are a powerful team, but sometimes single language
> solutions are easier than mixed language ones.
>
I would say *almost* as in Unix; the various slightly incompatible
Windows command interpreters have IME&HO more gotchas to learn and
remember. But they can do *some* of the good things Unix shells do,
and it doesn't hurt to be reminded of that every now and again.

- David.Thompson1 at worldnet.att.net

Dave Thompson
12-01-04 08:55 AM


AWK archive

All times are GMT.

 

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.