Newbie needs a little kick start
c0rN_g0aT

2004-11-24, 8:55 pm

I love GNU/Linux and I don't own a Windows box. I do however work in a
Windows environment and have no choice but to use M$ stuff. I found out
the other day that you can get all the GNU stuff like grep, cat and
gawk for Windows and I was excited to try them out. Then I discovered that
I have been playing with Linux too much and not learning it. At home I never
had to mine data out of a 10MB tab delimited text file. I can't seem to
get gawk to do what I need. I looked all over Google and everything I
found was either too simple or way over my head. I need help.

Here is what I need to do -->

Text1.txt = a plain old list of machine names 7 characters long, one on
each line

Text2.txt = a giant tab delimited text file with multiple fields. There
are 4 fields which are

1. Machine Name, 2. SMS Site, 3. Last logged on user and 4. yes/no

Please show me how to use awk to take each machine name from text1.txt
and output all the lines in text2.txt that have a matching machine name.

Also I was thinking of buying the O'Reilly book "sed & awk 2nd edition".
If this book sucks or if there is a much better one please let me know.

Thanks in advance for putting up with yet another helpless newbie.


James
Ed Morton

2004-11-24, 8:55 pm



c0rN_g0aT wrote:
<snip>
> Here is what I need to do -->
>
> Text1.txt = a plain old list of machine names 7 characters long, one on
> each line
>
> Text2.txt = a giant tab delimited text file with multiple fields. There
> are 4 fields which are
>
> 1. Machine Name, 2. SMS Site, 3. Last logged on user and 4. yes/no
>
> Please show me how to use awk to take each machine name from text1.txt
> and output all the lines in text2.txt that have a matching machine name.


I'm assuming the "1.", etc above were added by you for illustration and
that in the real Text2.txt, the machine name is just the first field. If
so, this'll do it in gawk:

gawk -F'\t' 'NR==FNR{names[$1]="";next}$1 in names' Text1.txt Text2.txt

> Also I was thinking of buying the O'Reilly book "sed & awk 2nd edition"
> if this book sucks or if there is a much better one please let me know.


I'd just print off the GNU awk documentation from:

http://www.gnu.org/software/gawk/manual

Ed.

> Thanks in advance for putting up with yet another helpless newbie.
>
>
> James

Ted Davis

2004-11-25, 3:55 am

On Wed, 24 Nov 2004 18:41:28 -0500, c0rN_g0aT
<seasoned_crouton@yahoo.com> wrote:

>I love GNU/Linux and I don't own a Windows box. I do however work in a
>Windows environment and have no choice but to use M$ stuff. I found out
>the other day that you can get all the GNU stuff like grep, cat and
>gawk for Windows and I was excited to try them out. Then I discovered that
>I have been playing with Linux too much and not learning it. At home I never
>had to mine data out of a 10MB tab delimited text file. I can't seem to
>get gawk to do what I need. I looked all over google and everything I
>found was either too simple or way over my head. I need help.
>
>Here is what I need to do -->
>
>Text1.txt = a plain old list of machine names 7 characters long, one on
>each line
>
>Text2.txt = a giant tab delimited text file with multiple fields. There
>are 4 fields which are
>
>1. Machine Name, 2. SMS Site, 3. Last logged on user and 4. yes/no
>
>Please show me how to use awk to take each machine name from text1.txt
>and output all the lines in text2.txt that have a matching machine name.
>
>Also I was thinking of buying the O'Reilly book "sed & awk 2nd edition"
>if this book sucks or if there is a much better one please let me know.
>
>Thanks in advance for putting up with yet another helpless newbie.


If you are working in NT/W2K/XP, the shell can handle this directly:

for /f %A in (text1.txt) do (find "%A" < text2.txt > target.file)

Just as in Linux/Unix, the combination of shell commands or scripts
(use %% instead of % in FOR commands in batch files) and GNU (and
other) utilities are a powerful team, but sometimes single language
solutions are easier than mixed language ones.

BTW, I just more or less finished a mixed batch, gawk, and utilities
program that could take your list of machine names and produce
extensive reports on each by probing them remotely
(<http://gearbox.maem.umr.edu/batch/probe_pc.html>) - it illustrates
many (heavily commented) batch and gawk techniques.


--
T.E.D. (tdavis@gearbox.maem.umr.edu)
Bob Harris

2004-11-25, 3:55 am

In article <8bydne9lB6xBgjjcRVn-iA@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:

> c0rN_g0aT wrote:
> <snip>
>
> I'm assuming the "1.", etc above were added by you for illustration and
> that in the real Text2.txt, the machine name is just the first field. If
> so, this'll do it in gawk:
>
> gawk -F'\t' 'NR==FNR{names[$1]="";next}$1 in names' Text1.txt Text2.txt


And because you said you were a 'newbie', this translates into:

-F'\t' - use Tab as the field separator.

NR==FNR - tests whether NR (the total record number) is equal to FNR (the
current file's record number). NR keeps increasing as each record is
processed. FNR resets when the 2nd file is started, so NR==FNR is true only
while reading the first file (neat trick :-).

So while processing the first file, save the list of machine names in the
content addressable (associative) array names[$1]. The important thing is
not the value in each entry, but rather the index. You can look at all the
indexes and values of an array with something like
for( idx in names ) print idx, names[idx]

next - says skip the rest of the script, since while processing the first
file we only want to capture the machine names.

$1 in names - the 'in' operator tests an index (in this case $1, which
should be your machine name) to see if it is an index in the array
names. The result will be true if it is an index, and false otherwise.

If $1 in names is true, then the current line is printed. This happens
because the standard awk line is

pattern { action }

If pattern is missing, the result is assumed to be true. If { action }
is missing, the default action is 'print' and the default operand to
print is $0.
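
To see those pieces working together, here is a small sketch that builds
two throwaway sample files and runs the same program, spread over several
lines with comments. The machine names, sites and users are invented for
illustration, and plain awk is used on the assumption that it behaves like
gawk for this script.

```shell
#!/bin/sh
# Hypothetical sample data; every name and value below is made up.
dir=$(mktemp -d)
printf 'PC00001\nPC00003\n' > "$dir/Text1.txt"
printf 'PC00001\tSITE-A\talice\tyes\n' >  "$dir/Text2.txt"
printf 'PC00002\tSITE-B\tbob\tno\n'    >> "$dir/Text2.txt"
printf 'PC00003\tSITE-A\tcarol\tyes\n' >> "$dir/Text2.txt"

# Same logic as the one-liner, one step per line.
result=$(awk -F'\t' '
    NR == FNR {          # true only while reading the first file
        names[$1] = ""   # remember the machine name as an array index
        next             # skip the remaining rule for this line
    }
    $1 in names          # second file: no action given, so the line is printed
' "$dir/Text1.txt" "$dir/Text2.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

With this sample data only the PC00001 and PC00003 lines are printed.
Under CMD.EXE, where single quotes do not quote, one option is to save
the program text in a file and run it with gawk -f.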
>
> I'd just print off the GNU awk documentation from:
>
> http://www.gnu.org/software/gawk/manual
>
> Ed.
>

I would also take a look at the following web site. It is a short intro
to awk that covers a lot of awk features without being too long.

http://www.tru64unix.compaq.com/doc...N/V51B_HTML/ARH9WBTE/WKXXXXXX.HTM

Bob Harris
Dave Thompson

2004-12-01, 3:55 am

On Wed, 24 Nov 2004 19:34:05 -0600, Ted Davis
<tdavis@gearbox.maem.umr.edu> wrote:

> On Wed, 24 Nov 2004 18:41:28 -0500, c0rN_g0aT
> <seasoned_crouton@yahoo.com> wrote:

<snip>
> If you are working in NT/W2K/XP, the shell can handle this directly
>
> for /f %A in (text1.txt) do (find "%A" < text2.txt > target.file)
>

In addition to being offtopic, of course ...

That needs a >> on the output file. At least on XP 5.1.2600(.0)
CMD.EXE 2002-08-29 07:00. You don't actually need to parenthesise a
single command as the body of a for, although it doesn't hurt and is a
good habit in general and arguably clearer.

And it matches each pattern (machine name) anywhere on a line, not just
in field 1, including partial matches, e.g. foo matches anyfooishnode.

If those are close enough, and also order of machine names in the
output doesn't matter or is already correct in file2 or can be
post-sorted so, you can do it in a single execution of Unixoid grep
(which OP already mentioned) with -f, and preferably fgrep or -F.
(Although in the mingw version I have to hand, MSYS 1.0.9 on I think
3.1, -f file only works if file has LF line endings, not CRLF as
Windows files typically do. -f - <file does work. Bleah.)
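
The single-grep approach described above can be sketched as follows. This
is only an illustration under assumptions: the sample data is invented,
the name list is given CRLF line endings on purpose, and tr is used to
strip the carriage returns before the patterns are fed to grep on standard
input (the "-f -" form mentioned above, assumed to be supported as it is
in GNU grep).

```shell
#!/bin/sh
# Hypothetical sample data, invented for illustration.
dir=$(mktemp -d)
printf 'PC00001\r\nPC00003\r\n' > "$dir/Text1.txt"          # CRLF, as on Windows
printf 'PC00001\tSITE-A\talice\tyes\n'        >  "$dir/Text2.txt"
printf 'PC00002\tSITE-B\tPC00001-admin\tno\n' >> "$dir/Text2.txt"
printf 'PC00003\tSITE-A\tcarol\tyes\n'        >> "$dir/Text2.txt"
printf 'PC00004\tSITE-C\tdave\tno\n'          >> "$dir/Text2.txt"

# Strip carriage returns, then use every name as a fixed-string pattern.
result=$(tr -d '\r' < "$dir/Text1.txt" | grep -F -f - "$dir/Text2.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

Note that the PC00002 line is printed too, because PC00001 appears in its
third field. That is the "anywhere on a line" caveat; the awk version
compares field 1 only.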

> Just as in Linux/Unix, the combination of shell commands or scripts
> (use %% instead of % in FOR commands in batch files) and GNU (and
> other) utilities are a powerful team, but sometimes single language
> solutions are easier than mixed language ones.
>

I would say *almost* as in Unix; the various slightly incompatible
Windows command interpreters have IME&HO more gotchas to learn and
remember. But they can do *some* of the good things Unix shells do,
and it doesn't hurt to be reminded of that every now and again.

- David.Thompson1 at worldnet.att.net

Copyright 2008 codecomments.com