Author newbie help with first program
Fred

2006-02-24, 6:56 pm

Hi,

What I would like to do is : read a file and put it in an array.
Use the array and depending on condition adding lines and such. I'm
able to read the file into the array but once I try to put simple code
in the action block it keeps looping on itself (once I hit ENTER) and I
don't understand why. I know I should not use getline but I found no
other way of putting the file into the array. I'm using my script with
gawk and the -f option.

Regards,

BEGIN\
{
list_file=ARGV[1]

for(x=1; (getline var[x] < list_file) > 0; x++)
{print ""
}
close(ARGV[1])
ARGV[1] = ""
}
{
i==1
for(i=1; i<=x;i++)
{print var[i]
}
}

Jürgen Kahrs

2006-02-24, 6:56 pm

Fred wrote:

> BEGIN\
> {
> list_file=ARGV[1]
>
> for(x=1; (getline var[x] < list_file) > 0; x++)
> {print ""
> }
> close(ARGV[1])
> ARGV[1] = ""
> }
> {
> i==1
> for(i=1; i<=x;i++)
> {print var[i]
> }
> }
>


Almost everything that you do will be done by
AWK as a default action.

{var[NR] = $0; print var[NR] }

You already know other imperative programming languages.
Don't try to impose the C way of proceeding upon the AWK
interpreter. Let AWK do the processing in a way that
AWK was made for: line-by-line reading and applying
actions if the patterns are true.
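
To make the pattern/action idea concrete, here is a minimal sketch. The sample input and the threshold of 10 are made up for illustration. The first rule has no pattern, so it runs for every line; the second runs only for lines where its pattern is true; END runs once after the last line has been read.

```shell
# Hypothetical three-line input, fed on stdin.
out=$(printf '%s\n' 'alpha 1' 'beta 22' 'gamma 3' | awk '
  { line[NR] = $0 }                 # no pattern: store every line
  $2 > 10 { print "big:", $1 }      # pattern: runs only when true
  END { print NR, "lines stored; first was", line[1] }
')
printf '%s\n' "$out"
```

No explicit read loop appears anywhere: awk does the reading, and the rules only say what to do with each line.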

Ted Davis

2006-02-24, 6:56 pm

On 24 Feb 2006 12:43:27 -0800, "Fred" <brydhenn@yahoo.com> wrote:

>Hi,
>
> What I would like to do is : read a file and put it in an array.
>Use the array and depending on condition adding lines and such. I'm
>able to read the file into the array but once I try to put simple code
>in the action block it keeps looping on itself (once I hit ENTER) and I
>don't understand why. I know I should not use getline but I found no
>other way of putting the file into the array. I'm using my script with
>gawk and the -f option.
>
>Regards,
>
>BEGIN\
>{
>list_file=ARGV[1]
>
> for(x=1; (getline var[x] < list_file) > 0; x++)
> {print ""
> }
> close(ARGV[1])
> ARGV[1] = ""
>}
>{
> i==1
> for(i=1; i<=x;i++)
> {print var[i]
> }
>}



Whew! I don't have time to go into what all is wrong with that. I
will say that when you delete a command line argument, you have to
decrement ARGC. Actually, I'm surprised you get anything besides error
messages with that.

{
Array[ NR ] = $0
}
END{
for( x = 1; x <= NR; x++) print Array[ x ]
}
--
T.E.D. (tdavis@gearbox.maem.umr.edu)
SPAM filter: Messages to this address *must* contain "T.E.D."
somewhere in the body or they will be automatically rejected.

William James

2006-02-24, 6:56 pm


Fred wrote:
> Hi,
>
> What I would like to do is : read a file and put it in an array.
> Use the array and depending on condition adding lines and such. I'm
> able to read the file into the array but once I try to put simple code
> in the action block it keeps looping on itself (once I hit ENTER) and I
> don't understand why. I know I should not use getline but I found no
> other way of putting the file into the array. I'm using my script with
> gawk and the -f option.
>
> Regards,
>
> BEGIN\
> {
> list_file=ARGV[1]
>
> for(x=1; (getline var[x] < list_file) > 0; x++)
> {print ""
> }
> close(ARGV[1])
> ARGV[1] = ""
> }
> {
> i==1
> for(i=1; i<=x;i++)
> {print var[i]
> }
> }


{ var[NR] = $0 }
END \
{ for (i=1; i in var; i++)
print var[i]
}

Grant

2006-02-24, 6:56 pm

On 24 Feb 2006 12:43:27 -0800, "Fred" <brydhenn@yahoo.com> wrote:

> What I would like to do is : read a file and put it in an array.


Here's a working example of loading a database table:

function read_bogon_data( k)
{
if (!quiet) printf "%s reading %s", strftime("%F-%T"), bogonfil
if ((getline < bogonfil) > 0) {
if ($2 ~ /List/ && $5 ~ /bogons/) {
bgsize = 0
while ((getline < bogonfil) > 0) {
if (/^$/) continue
if (/^#.*$/) continue
# format: 0.0.0.0/7
split($0, k, "/")
bogondat[++bgsize] = sprintf("%10d %d", \
dotquad2bin(k[1]), k[2])
}
if (!quiet) printf ", %d records.\n", bgsize
}
else {
if (!quiet) print " datafile not recognised"; exit 2
}
}
else {
if (!quiet) print " datafile not found"; exit 3
}
close(bogonfil)
if (dataload == 2) exit 1 # for timing database load
}
context: http://bugsplatter.mine.nu/junk/ip2country-server

The other answers I saw assume the primary purpose of program
is to load the array. But then, I dislike playing games with
ARG? Also new to awk ;) Feel free to correct errors.

Grant.
--
.... The computer scientist, who had listened to all of this said,
"Yes, but where do you think the chaos came from?"

William James

2006-02-24, 6:56 pm

Grant wrote:
> On 24 Feb 2006 12:43:27 -0800, "Fred" <brydhenn@yahoo.com> wrote:
>
>
> Here's a working example of loading a database table:
>
> function read_bogon_data( k)
> {
> if (!quiet) printf "%s reading %s", strftime("%F-%T"), bogonfil
> if ((getline < bogonfil) > 0) {
> if ($2 ~ /List/ && $5 ~ /bogons/) {
> bgsize = 0
> while ((getline < bogonfil) > 0) {
> if (/^$/) continue
> if (/^#.*$/) continue
> # format: 0.0.0.0/7
> split($0, k, "/")
> bogondat[++bgsize] = sprintf("%10d %d", \
> dotquad2bin(k[1]), k[2])
> }
> if (!quiet) printf ", %d records.\n", bgsize
> }
> else {
> if (!quiet) print " datafile not recognised"; exit 2
> }
> }
> else {
> if (!quiet) print " datafile not found"; exit 3
> }
> close(bogonfil)
> if (dataload == 2) exit 1 # for timing database load
> }


For the following reasons, I believe this is a bad example
for a beginner.

1. It uses getline.
2. It uses extreme indentation that makes the code harder to read.
Two spaces would be sufficient.
3. It uses too many global variables.
4. It uses printf too often. Example:
printf ", %d records.\n", bgsize
should be
print ",", bgsize, "records."
5. Variables are poorly named. Example:
bogonfil
should be something like
bogon_file

Grant

2006-02-24, 6:56 pm

On 24 Feb 2006 14:46:22 -0800, "William James" <w_a_x_man@yahoo.com> wrote:

>Grant wrote:

....
>
>For the following reasons, I believe this is a bad example
>for a beginner.
>
>1. It uses getline.

So what's wrong with getline? I think stuffing around with awk's
CLI parameter filenames for secondary data input is silly ;) six
vs half a dozen?

>2. It uses extreme indentation that makes the code harder to read.
> Two spaces would be sufficient.

I'm used to linux CodingStyle -- personal taste (8x tabs got converted
to spaces 'cos I copied from terminal) So these days I use CodingStyle
in awk, bash, and C -- I like the discipline large indents enforce,
and agree with Torvalds' arguments supporting large indents.

But then a few years ago I wrote an online info-system in perl with
2 space indents, attitudes change.

>3. It uses too many global variables.


Could have used a couple call parameters (filename and test_level)
then only expose datastore name and size as globals -- agree.

>4. It uses printf too often. Example:
> printf ", %d records.\n", bgsize
> should be
> print ",", bgsize, "records."

Disagree, your way cannot do format control, consistent coding means
to use similar constructs for similar tasks, so I choose printf for
mixed constant + var output.
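
A small sketch of the difference being argued about (the value is arbitrary): print joins its arguments with OFS and formats numbers with the default conversion, while printf takes an explicit width and precision.

```shell
out=$(awk 'BEGIN {
  n = 3.14159
  print "value:", n              # joined by OFS, default number format
  printf "value: %8.2f\n", n     # explicit width and precision
}')
printf '%s\n' "$out"
```

Which one reads better for plain output is a matter of taste; only printf gives control over column width and decimals.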

You're advocating printing one way with variables, and another way
if you need to control output formatting here?

printf (sprintf) is a common well known method that transfers between
awk, bash and C (what I write in these days), I like it for that reason,
consistency in coding style -- my old eyes got used to it.

>5. Variables are poorly named. Example:
> bogonfil
> should be something like
> bogon_file


Sure, it could also have been passed as "f" call parameter ;)

Thanks for your comments.

Grant.
--
.... The computer scientist, who had listened to all of this said,
"Yes, but where do you think the chaos came from?"

Fred

2006-02-25, 3:55 am

Thanks for your help and prompt replies. But what should I do if I
need to insert at a later time than the current line, i.e.

{
{FIELDWIDTHS = "8 64"}
{ if ($1 == "PBAR* ")
# need to insert a line but not now should be at "current line + 4"

else print $1,$2
}
}

thanks

Joel Reicher

2006-02-25, 3:55 am

"Fred" <brydhenn@yahoo.com> writes:

> Thanks for your help and prompt replies. But what should I do if I
> need to insert a a later time than the current line i.e.
>
> {
> {FIELDWIDTHS = "8 64"}


What is that supposed to do?

> { if ($1 == "PBAR* ")
> # need to insert a line but not now should be at "current line + 4"
>
> else print $1,$2
> }
> }


You're not editing the file, so in no sense at all are you doing an
"insert". If what you mean is that you want to print out the file, but
that at some point you want to print out a line before the current
line in the file, then do *exactly* that:

(condition for line following extra line) {
print extra line
}

{
print normal line
}

Reverse the blocks if you want the extra line coming after the line
that satisfies the condition.

If what you want is to print the extra line *instead* of the normal
line, then do an if-else in a single action block.
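
If "current line + 4" means the extra line should come out after the fourth line following the matching one, a countdown variable is one way to sketch it. That reading of the question is an assumption, and the `countdown` name, the "EXTRA" text and the sample input are made up for illustration:

```shell
# Hypothetical input: a PBAR* line followed by five ordinary lines.
out=$(printf '%s\n' 'PBAR* 1' a b c d e | awk '
  { print }                                         # normal line
  countdown && --countdown == 0 { print "EXTRA" }   # extra line is due
  /^PBAR\*/ { countdown = 4 }                       # schedule it
')
printf '%s\n' "$out"
```

Here EXTRA is printed between d and e. The order of the rules matters: the countdown is set last so that the matching line itself is not counted.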

Hope that helps, but I have to say I really don't understand what
you're trying to do.

Cheers,

- Joel

Ed Morton

2006-02-25, 6:55 pm

Grant wrote:
> On 24 Feb 2006 14:46:22 -0800, "William James" <w_a_x_man@yahoo.com> wrote:
>
>
>
> ...
>
>
> So what's wrong with getline?


Glad you asked ;-). Now where to begin....

You don't need it except in very rare cases. Using it the way you did
above to bypass having awk just read each line as it naturally does is a
bit like pushing your car down the street instead of starting the
engine. Pushing your car is only a reasonable option if your battery's
flat or you have some other unusual situation that makes starting your
car impossible.

Now, look at this list of different side-effects of using getline:

Variant                  Effect
getline                  Sets $0, NF, FNR, and NR
getline var              Sets var, FNR, and NR
getline < file           Sets $0 and NF
getline var < file       Sets var
command | getline        Sets $0 and NF
command | getline var    Sets var

(taken from the gawk user's guide). Throw in the fact that getline also:

a) sets FILENAME in the BEGIN section in some cases,
b) causes a different input stream to be opened when it's reading input
redirected from FILENAME so it reads from the start of the file again
but updates $0 and NF, and
c) causes the main awk body to be skipped for any lines read using it.

There's just way too much to think about to make getline solutions
generally attractive and the alternatives are usually simple anyway.
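
As a quick check of one row of that table, this sketch reads a throwaway file with `getline var < file` from BEGIN and then prints the line count next to NR. The file name and its contents are made up for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' one two three > "$dir/data"

out=$(awk -v f="$dir/data" 'BEGIN {
  while ((getline line < f) > 0) n++   # sets only "line"
  close(f)
  print n, NR                          # NR was never touched
}')
printf '%s\n' "$out"
rm -rf "$dir"
```

Three lines are read, yet NR stays at 0, which is the kind of detail that is easy to forget when mixing getline with the normal input loop.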

Regards,

Ed.

Harlan Grove

2006-02-25, 9:55 pm

Joel Reicher wrote...
>"Fred" <brydhenn@yahoo.com> writes:

....
>
>What is that supposed to do?

....

FIELDWIDTHS is a gawk extension. Aside from that, the braces around the
statement are unnecessary.

Joel Reicher

2006-02-25, 9:55 pm

"Harlan Grove" <hrlngrv@aol.com> writes:

> Joel Reicher wrote...
> ...
> ...
>
> FIELDWIDTHS is a gawk extension. Aside from that, the braces around the
> statement are unnecessary.


Ahh, ta. I should have guessed; I don't use gawk extensions, although
I occasionally use gawk.

Cheers,

- Joel

Joel Reicher

2006-02-25, 9:55 pm

Ed Morton <morton@lsupcaemnt.com> writes:

> You don't need it except in very rare cases. Using it the way you did
> above to bypass having awk just read each line as it naturally does is
> a bit like pushing your car down the street instead of starting the
> engine. Pushing your car is only a reasonable option if your battery's
> flat or you have some other unusual situation that makes starting your
> car impossible.


I agree with all that, but I don't think what follows is so difficult.

> Now, look at this list of different side-effects of using getline:
>
> Variant                  Effect
> getline                  Sets $0, NF, FNR, and NR
> getline var              Sets var, FNR, and NR
> getline < file           Sets $0 and NF
> getline var < file       Sets var
> command | getline        Sets $0 and NF
> command | getline var    Sets var
>
> (taken from the gawk users guide), throw in the fact that getline also:
>
> a) sets FILENAME in the BEGIN section in some cases,
> b) causes a different input stream to be opened when it's reading input
> redirected from FILENAME so it reads from the start of the file again
> but updates $0 and NF, and
> c) causes the main awk body to be skipped for any lines read using it.
>
> There's just way too much to think about to make getline solutions
> generally attractive and the alternatives are usually simple anyway.


It depends how you think about it, perhaps...

                         Assigns to        Takes input
                         standard input    specified by
                         variables         commandline

getline                  yes               yes
getline var              no                yes
getline < file           yes               no
getline var < file       no                no
command | getline        yes               no
command | getline var    no                no

vars changed for "yes"   NF                NR
                         $0..$NF           FNR
                                           FILENAME

b) and c) are, as far as I can tell, taken care of by the fact that
getline input is independent of standard awk input.

Cheers,

- Joel

Grant

2006-02-25, 9:55 pm

On Sun, 26 Feb 2006 02:16:48 GMT, Joel Reicher <joel@panacea.null.org> wrote:

>Ed Morton <morton@lsupcaemnt.com> writes:


No context -- I seem to have missed Ed's post, so this mostly
in response to Ed as I snipped Joel's responses I agree with

[getline]

I don't see this point, the gawk manual clearly outlines use of getline
to open secondary datafiles, or to read from co process. I use getline
for these purposes in several places, mostly they worked as expected,
the one unreliable use was /inet/, but I learned here to use FIFOs
instead, they work as expected.

I fail to see how battling the command line to read setup files in
order to process the main data files makes sense. In one application
I read one sometimes two data files for processing, obviously I need
to read database files required for that processing out-of-band to
avoid nasty CLI interaction with the natural process-these-files-list
that comes after CLI options.

This one I disagree with, I've seen many example in c.l.s. of people
doing odd things to play with command line arguments -- using getline
seems far simpler, cleaner to me. Keeps the out-of-band library
loading separate from the main data processing loop.
>It depends how you think about it, perhaps...


That's the truth ;) Only issue I have now is with awk's lack of trap
handling, so I'll write a supervisor in bash to break lockups, no big
deal.

Grant.
--
.... The computer scientist, who had listened to all of this said,
"Yes, but where do you think the chaos came from?"

Ed Morton

2006-02-26, 6:55 pm

Grant wrote:
> On Sun, 26 Feb 2006 02:16:48 GMT, Joel Reicher <joel@panacea.null.org> wrote:
>
>
>
>
> No context -- I seem to have missed Ed's post, so this mostly
> in response to Ed as I snipped Joel's responses I agree with
>
> [getline]
>
>
>
> I don't see this point, the gawk manual clearly outlines use of getline
> to open secondary datafiles,


It's just showing how to use getline, not recommending use of getline
for that purpose.

> or to read from co process.

Yes, that's one of those cases where you need it.

<snip>
>
> I fail to see how battling the command line to read setup files in
> order to process the main data files makes sense.


There's no battling involved. If you're battling with it, then you're
approaching it wrong.

<snip>
>
>
> This one I disagree with, I've seen many example in c.l.s. of people
> doing odd things to play with command line arguments -- using getline
> seems far simpler, cleaner to me. Keeps the out-of-band library
> loading separate from the main data processing loop.


You mentioned in another post that you're new to awk. I work with many C
programmers. When they first have to use C++ they frequently initially
complain about the difficulty of decomposing a problem into classes,
etc. The reason for that is that:

a) they don't understand the object oriented paradigm, and
b) they don't know what constructs the C++ language provides to support
that paradigm

Since C++ is a multi-paradigm language, it allows you to write
procedural programs and so they end up just doing that. After a year or
two's experience, typically the brighter programmers have learned the new
paradigm and the language and are embarrassed by their original programs.
I'm not saying that OO is better or worse than procedural programming in
general but I am saying that in C++, you have more success with OO
designs than with procedural programs.

Similarly in awk, there's a paradigm shift and there are language
constructs, both of which take time to learn. Maybe in a couple of years
you'll still be perfectly happy using getline, but you may want to start
writing some programs without it just so you can really learn what the
differences are between the 2 approaches and what awk can do to support
you. Don't think of this as "avoiding getline" but rather "not avoiding
awk's natural input processing".

Anyway, good luck with whatever you decide.

Ed.

Ed Morton

2006-02-26, 6:55 pm

Joel Reicher wrote:
> Ed Morton <morton@lsupcaemnt.com> writes:
>
>
>
>
> I agree with all that, but I don't think what follows is so difficult.


It's not "so" difficult. It's just not something you need to worry about
on a daily basis since the necessity of using getline is very rare ONCE
YOU KNOW AWK. A beginner, however, sees getline and thinks of it like
"read" or "fgets" or "cin" or whatever they're used to using to read
input and thinks that's what you normally use in awk to read input
without realising that:

a) it isn't, and
b) it has several caveats to take into consideration.

Regards,

Ed.

Joel Reicher

2006-02-26, 6:55 pm

Ed Morton <morton@lsupcaemnt.com> writes:

> It's not "so" difficult. It's just not something you need to worry
> about on a daily basis since the necessity of using getline is very
> rare ONCE YOU KNOW AWK.


I agree it's rare, but I'm less sure about it being "very" rare. :) If
you had some input that *could* be done without getline, does that
always mean you would? Sometimes I think it's worth using getline for
the extra clarity it provides in the program, such as saving the main
input loop for the on-the-fly data crunching and putting some
prerequisite load in BEGIN, even though there is an alternative in
using the main input loop with some state to do it.

> A beginner, however, sees getline and thinks
> of it like "read" or "fgets" or "cin" or whatever they're used to
> using to read input and thinks that's what you normally use in awk


Yes, I could well believe that. Beginners, and especially those who do
not already know a few languages, often see a new language as a new
syntax to learn, completely ignoring any idioms or, worse still,
paradigm differences.

It's a shame, to say the least.

Cheers,

- Joel

Ed Morton

2006-02-27, 3:57 am

Joel Reicher wrote:
> Ed Morton <morton@lsupcaemnt.com> writes:
>
>
>
>
> I agree it's rare, but I'm less sure about it being "very" rare. :) If
> you had some input that *could* be done without getline, does that
> always mean you would?


If you had some input that *could* be done *with* getline, does that
always mean you would? The answer in both cases is, hopefully, no.

FWIW, I have precisely 2 programs that use getline without co-processes
and they both expand referenced files within a specified input file (one
using an "include" directive, the other not). Every other time I've been
tempted to use it, I've learned something very valuable about awk and
been happier with the result of not using it.

Ed.

Joel Reicher

2006-02-27, 3:57 am

Ed Morton <morton@lsupcaemnt.com> writes:

>
> If you had some input that *could* be done *with* getline, does that
> always mean you would? The answer in both cases is, hopefully, no.


Err, umm. Not sure you answered my question. :) I understand what
you've said, and of course you shouldn't use getline just because you
can. You can, in fact, trivially use getline for everything.

> FWIW, I have precisely 2 programs that use getline without
> co-processes and they both expand referenced files within a specified
> input file (one using an "include" directive, the other not). Every
> other time I've been tempted to use it, I've learned something very
> valuable about awk and been happier with the result of not using it.


Perhaps it's time for a concrete example. Let's say you were doing a
database query, and the query comes in on stdin because it's
inappropriate (for whatever reason) for it to be supplied as command
line arguments. Would you use getline for the database load?

Similarity with Grant's little project is purely coincidental. :)

Cheers,

- Joel

Harlan Grove

2006-02-27, 6:57 pm

Joel Reicher wrote...
....
>Perhaps it's time for a concrete example. Let's say you were doing a
>database query, and the query comes in on stdin because it's
>inappropriate (for whatever reason) for it to be supplied as command
>line arguments. Would you use getline for the database load?
>
>Similarity with Grant's little project is purely coincidental. :)


How large would the database file be? A few hundred KB, then what's
wrong with

awk -f script dbfile -

where script would look something like

FNR == NR { for (i = 1; i <= NF; ++i) dbtbl[NR,i] = $i; next }
# query processing against dbfile/dbtbl code here

Ed Morton

2006-02-27, 6:57 pm

Joel Reicher wrote:
> Ed Morton <morton@lsupcaemnt.com> writes:
>
>
>
>
> Err, umm. Not sure you answered my question. :)


Yes, I did. Please re-read my response.

> I understand what
> you've said, and of course you shouldn't use getline just because you
> can. You can, in fact, trivially use getline for everything.


Precisely. Horses for courses...

>
>
>
> Perhaps it's time for a concrete example. Let's say you were doing a
> database query, and the query comes in on stdin because it's
> inappropriate (for whatever reason) for it to be supplied as command
> line arguments. Would you use getline for the database load?


I'd need to see the example, but it wouldn't be the first thing that
sprang to mind. If I assume "database load" means you're reading some
file into an array, then it's not obvious why you'd want to do that, but
if we assume you do need to do it for some reason, then given this
"database":

$ cat file
1 a b c
2 d e f
3 g h i
4 j k l
5 m n o

and we say it's keyed on the third field, then if you want to find the
record with that key field being "h" and that, again for whatever
reason, has to come from stdin, the obvious approach would be to just do
this:

$ echo "h" | awk 'NR==FNR{a[$3]=$0;next}$0 in a{print a[$0]}' file -
3 g h i

If that's not what you had in mind, post a SMALL example if you want to
discuss it as I've spent just about enough time on this particular
thread and have no interest in studying and rewriting a large example.

Regards,

Ed.

Patrick TJ McPhee

2006-02-28, 3:55 am

In article <e5tuv1lrtltpb5p51nqis6p5d04t9uv3ci@4ax.com>,
Ted Davis <tdavis@gearbox.maem.umr.edu> wrote:

% will say that when you delete a command line argument, you have to
% decrement ARGC. Actually, I'm surprised you get anything besides error
% messages with that.

It's because you don't have to do anything of the sort. You delete
ARGV[1], say, then try and retrieve it, and you get an empty string.
awk's default ARGV processing silently skips over empty arguments.
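
A minimal sketch of that behaviour, using throwaway file names that are made up for illustration. In the first run the blanked operand is skipped and the second file is read; in the second run no file operands are left, so awk falls back to standard input. That fallback would also be consistent with the original script appearing to hang until ENTER was pressed.

```shell
dir=$(mktemp -d)
printf 'from file\n' > "$dir/kept"

# The first operand is blanked in BEGIN, so only "kept" is read.
out1=$(awk 'BEGIN { ARGV[1] = "" } { print }' "$dir/missing" "$dir/kept")

# With the only operand blanked, awk reads standard input instead.
out2=$(printf 'from stdin\n' | awk 'BEGIN { ARGV[1] = "" } { print }' "$dir/missing")

printf '%s\n%s\n' "$out1" "$out2"
rm -rf "$dir"
```

Note that "missing" never has to exist: awk opens its file operands lazily, after BEGIN has run.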
--

Patrick TJ McPhee
North York Canada
ptjm@interlog.com

Ted Davis

2006-02-28, 7:55 am

On 28 Feb 2006 04:21:10 GMT, ptjm@interlog.com (Patrick TJ McPhee)
wrote:

>In article <e5tuv1lrtltpb5p51nqis6p5d04t9uv3ci@4ax.com>,
>Ted Davis <tdavis@gearbox.maem.umr.edu> wrote:
>
>% will say that when you delete a command line argument, you have to
>% decrement ARGC. Actually, I'm surprised you get anything besides error
>% messages with that.
>
>It's because you don't have to do anything of the sort. You delete
>ARGV[1], say, then try and retrieve it, and you get an empty string.
>awk's default ARGV processing silently skips over empty arguments.


I recall that one version some years ago - even if there were no
automatic input block - would try to run a null program on STDIN if
ARGV was cleared but ARGC not reset.

--
T.E.D. (tdavis@gearbox.maem.umr.edu)
SPAM filter: Messages to this address *must* contain "T.E.D."
somewhere in the body or they will be automatically rejected.

Joel Reicher

2006-03-02, 3:55 am

"Harlan Grove" <hrlngrv@aol.com> writes:

> How large would the database file be? A few hundred KB, then what's
> wrong with
>
> awk -f script dbfile -
>
> where script would look something like
>
> FNR == NR { for (i = 1; i <= NF; ++i) dbtbl[NR,i] = $i; next }
> # query processing against dbfile/dbtbl code here


There's not much wrong with it, I guess, but the FNR==NR is really a
hack to make it work. Having a getline loop in BEGIN is far more
natural, IMHO, and more clearly separates the database loading from
the input processing.

I guess it depends on whether you consider the database part of "main
input" or not.
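
For comparison, the getline-in-BEGIN version of the same lookup might look like the sketch below. It reuses the small sample database posted earlier in the thread; the `dbfile` variable name and passing it with -v are assumptions made for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' '1 a b c' '2 d e f' '3 g h i' > "$dir/db"

out=$(printf 'h\n' | awk -v dbfile="$dir/db" '
  BEGIN {
    # "> 0" matters: getline returns -1 on error, which a bare
    # while (getline ...) would treat as true and loop forever.
    while ((getline line < dbfile) > 0) {
      split(line, f)
      db[f[3]] = line        # keyed on the third field
    }
    close(dbfile)
  }
  $0 in db { print db[$0] }  # main loop sees only the queries
')
printf '%s\n' "$out"
rm -rf "$dir"
```

The query "h" prints the record "3 g h i", the same result as the NR==FNR one-liner, so the choice between the two is mostly about where the loading code should live.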

Cheers,

- Joel

Joel Reicher

2006-03-02, 3:55 am

Ed Morton <morton@lsupcaemnt.com> writes:

> $ echo "h" | awk 'NR==FNR{a[$3]=$0;next}$0 in a{print a[$0]}' file -
> 3 g h i
>
> If that's not what you had in mind, post a SMALL example if you want to
> discuss it as I've spent just about enough time on this particular
> thread and have no interest in studying and rewriting a large example.


You've answered my questions, thanks. You might like to read my
response to Harlan.

Cheers,

- Joel

Harlan Grove

2006-03-02, 9:55 pm

Joel Reicher wrote...
>"Harlan Grove" <hrlngrv@aol.com> writes:
>
>There's not much wrong with it, I guess, but the FNR==NR is really a
>hack to make it work. Having a getline loop in BEGIN is far more
>natural, IMHO, and more clearly separates the database loading from
>the input processing.

....

'hack'! Almost everything in awk that doesn't resemble C would seem to
be a hack as you'd define it. Having the getline loop in the BEGIN
block would require either hardcoding the filename there, using
environment variables, processing ARGV, relying on nonstandard -v
command line switches, or using stdin to feed getline. Using
environment variables to pass filenames would be OK. Hardcoding may not
be so bad for canned scripts. But the other alternatives would be worse
hacks.

And there are ways to generalize the FNR == NR pattern, such as

FILENAME == ARGV[1] { . . . process 1st file . . . }
FILENAME == ARGV[2] { . . . process 2nd file . . . }
FILENAME == ARGV[3] { . . . process 3rd file . . . }

And this allows for common pre- and post-processing pattern-actions
bracketing these file-specific pattern-actions.
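
A runnable sketch of the FILENAME form, using the sample database from earlier in the thread and a made-up queries file. One practical difference from FNR == NR is that it still behaves if the first file happens to be empty, since the test does not depend on record counts.

```shell
dir=$(mktemp -d)
printf '%s\n' '1 a b c' '2 d e f' '3 g h i' > "$dir/db"
printf '%s\n' h x b > "$dir/queries"

out=$(awk '
  FILENAME == ARGV[1] { db[$3] = $0; next }   # first file: load
  $0 in db { print db[$0] }                   # second file: look up
' "$dir/db" "$dir/queries")
printf '%s\n' "$out"
rm -rf "$dir"
```

The obvious limitation is that the same file name must not appear twice on the command line, because the rules would then be unable to tell the two occurrences apart.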

Grant

2006-03-02, 9:55 pm

On 2 Mar 2006 10:09:55 -0800, "Harlan Grove" <hrlngrv@aol.com> wrote:

>Joel Reicher wrote...
>...
>
>'hack'! Almost everything in awk that doesn't resemble C would seem to
>be a hack as you'd define it. Having the getline loop in the BEGIN
>block would require either hardcoding the filename there, using
>environment variables, processing ARGV, relying on nonstandard -v
>command line switches, or using stdin to feed getline. Using
>environment variables to pass filenames would be OK. Hardcoding may not
>be so bad for canned scripts. But the other alternatives would be worse
>hacks.


Alright, I have awk script to process some files, it also reads some
control files (database), if I want to use the script as

zcat file.gz | awk_program [options], or
awk_program [options] file

There's no consistent way to read a database specified on CLI, therefore
I read database in BEGIN or END block with getline, how would you do
this? I also plan to read option from a .conf file, again a hard-coded
setup file, same story.

How does one do this in awk?

Grant.
--
Living in a land down under / Where women glow and men plunder / Can't you
hear, can't you hear the thunder? / You better run, you better take cover!
--Men At Work

Harlan Grove

2006-03-02, 9:55 pm

Grant wrote...
....
....
>Alright, I have awk script to process some files, it also reads some
>control files (database), if I want to use the script as
>
>zcat file.gz | awk_program [options], or
>awk_program [options] file
>
>There's no consistent way to read a database specified on CLI, therefore
>I read database in BEGIN or END block with getline, how would you do
>this? I also plan to read option from a .conf file, again a hard-coded
>setup file, same story.
>
>How does one do this in awk?


You seem to want to use a file with awk code and #!/usr/bin/awk -f in
the top line. At the risk of being accused of yet more hacks, nothing
prevents you from using state markers on the command line. Consider the
following.

$ cat a
a
aa
aaa
aaaa
aaaaa
aaaaaa
$ cat b
b
bb
bbb
bbbb
$ cat c
c
cc
ccc
cccc
ccccc
cccccc
ccccccc
cccccccc
$ cat d
d
dd
d
dd
d
dd
d
$ cat script
#!/usr/bin/awk -f
FNR == 1 { fn[++s] = FILENAME }
s == 1 { a[FNR] = $0; next }
s == 2 { b[FNR] = $0; next }
s == 3 { c[FNR] = $0; next }
{ x[NR] = $0 }
END {
if (1 in a) {
print 1, fn[1]
for (j = 1; j in a; ++j) printf("\t%4d\t%s\n", j, a[j])
}
if (1 in b) {
print 2, fn[2]
for (j = 1; j in b; ++j) printf("\t%4d\t%s\n", j, b[j])
}
if (1 in c) {
print 3, fn[3]
for (j = 1; j in c; ++j) printf("\t%4d\t%s\n", j, c[j])
}
printf "\n\n"
for (j = 1; j <= NR; ++j) if (j in x) printf("-> %4d: %s\n", j, x[j])
}
$ script a - c d < b
1 a
1 a
2 aa
3 aaa
4 aaaa
5 aaaaa
6 aaaaaa
2 -
1 b
2 bb
3 bbb
4 bbbb
3 c
1 c
2 cc
3 ccc
4 cccc
5 ccccc
6 cccccc
7 ccccccc
8 cccccccc


-> 19: d
-> 20: dd
-> 21: d
-> 22: dd
-> 23: d
-> 24: dd
-> 25: d
$ script s=2 a d s=0 - b < c
1 -
1 c
2 cc
3 ccc
4 cccc
5 ccccc
6 cccccc
7 ccccccc
8 cccccccc
2 b
1 b
2 bb
3 bbb
4 bbbb
3 a
1 a
2 aa
3 aaa
4 aaaa
5 aaaaa
6 aaaaaa


-> 7: d
-> 8: dd
-> 9: d
-> 10: dd
-> 11: d
-> 12: dd
-> 13: d
$

Notice any getlines in script?
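
The same state-marker idea in a much smaller sketch: a var=value operand is assigned at the moment awk reaches it in the argument list, so it can label the files that follow it. The `mode` variable and the two sample files are made up for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' > "$dir/db"
printf '%s\n' b z > "$dir/data"

out=$(awk '
  mode == "db" { val[$1] = $2; next }              # files after mode=db
  { print $1, ($1 in val ? val[$1] : "unknown") }  # everything else
' mode=db "$dir/db" mode=data "$dir/data")
printf '%s\n' "$out"
rm -rf "$dir"
```

Because the assignments are ordinary operands, a shell wrapper can build the command line in whatever order the files need to be processed.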

As for reading a .conf file, with the environment variable myconf set
to ~/.conf and marked for export,

$ export myconf=~/.conf
$ cat $myconf
this
is
a
test
$ cat script2
BEGIN {
for (i = ARGC; i > 1; --i) ARGV[i] = ARGV[i - 1]
++ARGC
ARGV[1] = ENVIRON["myconf"]
}
FNR == NR { conf[FNR] = $0; next }
FNR in conf {
if (length <= length(conf[FNR]))
++shorter[FNR]
else
++longer[FNR]
next
}
{ if (length > maxlen[FNR]) { maxlen[FNR] = length; longest[FNR] = $0 }
}
END {
for (i = 1; i in conf; ++i) printf("%4d\t%4d\t%4d\n", i, shorter[i],
longer[i])
printf "--\n"
for ( ; i in maxlen; ++i) printf("%4d\t%4d\t%s\n", i, maxlen[i],
longest[i])
}
$ script2 a b c
1 3 0
2 3 0
3 0 3
4 3 0
--
5 5 aaaaa
6 6 aaaaaa
7 7 ccccccc
8 8 cccccccc
$

This was just to show that it's *possible* to read configuration files
not given on the command line using awk's main input loop/processing by
altering ARGV/ARGC in the BEGIN block. Myself, I might use getline to
read configuration files, but in practice I've always used shell
scripts to run my awk scripts, so I've always been able to set my awk
command lines to include all the arguments I need in processing order.

Grant

2006-03-02, 9:55 pm

On 2 Mar 2006 14:49:13 -0800, "Harlan Grove" <hrlngrv@aol.com> wrote:

>Grant wrote...


[how to avoid getline and]

....
>Notice any getlines in script?

Nope ;)

$ cat b | ./script c -
1 c
1 c
2 cc
3 ccc
4 cccc
5 ccccc
6 cccccc
7 ccccccc
8 cccccccc
9
2 -
1 b
2 bb
3 bbb
4 bbbb
5


$ ./script c b
1 c
1 c
2 cc
3 ccc
4 cccc
5 ccccc
6 cccccc
7 ccccccc
8 cccccccc
9
2 b
1 b
2 bb
3 bbb
4 bbbb
5

Okay, makes sense, thanks for illuminating example, have saved it for
reference.

[...]
>This was just to show that it's *possible* to read configuration files
>not given on the command line using awk's main input loop/processing by
>altering ARGV/ARGC in the BEGIN block.


Hmm, not this w :o) getline looks easier here (for me).

>Myself, I might use getline to
>read configuration files, but in practice I've always used shell
>scripts to run my awk scripts, so I've always been able to set my awk
>command lines to include all the arguments I need in processing order.


I've not yet settled on reading a config file in awk, exploring the
options. Using a bash wrapper to feed an awk program its arguments
seems easiest, also what I'm doing now with one awk program run from
cron job.

Thanks,
Grant.
--
Living in a land down under / Where women glow and men plunder / Can't you
hear, can't you hear the thunder? / You better run, you better take cover!
--Men At Work

Copyright 2008 codecomments.com