Code Comments
On Mar 27, 8:05 pm, stan <smo...@exis.net> wrote:
> Ed Morton wrote:
>
> <snip>
>
> Personally, I found that I needed a firm fixed rule that every time I
> type getline I have to get up and go get a cup of coffee. Without any
> exceptions that I can remember I always find that I was trying to write
> a C program in awk and with a little reflection I can find a solution
> that uses awk instead of fighting against it.
I'm the reverse - I LIKE getline.
I have a utility subroutine for checking if a file exists:
function exists(file,    line)
{
    if ( (getline line < file) > 0 )
    {
        close(file);
        return 1;
    }
    else
    {
        return 0;
    }
}
I tend to use gawk as a general purpose programming language, so the
fact that it is C-like is a plus to me.
When the pattern-matching paradigm is appropriate, I use it; when it
isn't, I use getline to read from any number of files.
imho, the contortions needed to not use getline when reading multiple
files are less clear than using getline for all but one.
Please don't vote me off the island.
martin cohen
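One property of the exists() function above is worth spelling out: getline returns 1 when it reads a record, 0 at end of file, and -1 on error, so testing for "> 0" reports an existing but empty file as missing. The following is only a sketch of a variant that treats "opened, but empty" as existing; the shell wrapper, the helper name file_exists, and the temporary file names are assumptions made for illustration and are not from the thread.

```shell
# Hypothetical test files; names are illustrative only.
tmpdir=$(mktemp -d)
: > "$tmpdir/empty.txt"                  # exists, but has no records
printf 'one line\n' > "$tmpdir/full.txt" # exists and has content

# file_exists is a hypothetical wrapper around the awk function.
file_exists() {
    awk -v f="$1" '
    # getline returns 1 for a record, 0 at end of file, and -1 on
    # error (for example, when the file cannot be opened).
    function exists(file,    line, status) {
        status = (getline line < file)
        close(file)               # harmless if the open failed
        return (status >= 0)      # 0 means "opened, but empty"
    }
    BEGIN { print (exists(f) ? "yes" : "no") }'
}

file_exists "$tmpdir/full.txt"       # file with content
file_exists "$tmpdir/empty.txt"      # empty file
file_exists "$tmpdir/missing.txt"    # no such file
```

The three calls print yes, yes and no. Whether an unreadable file should count as "existing" is a design choice that this sketch does not settle.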
On 3/29/2008 3:12 PM, mjc wrote:
> On Mar 27, 8:05 pm, stan <smo...@exis.net> wrote:
>
>
>
> I'm the reverse - I LIKE getline.
>
> I have a utility subroutine for checking if a file exists:
>
> function exists(file,    line)
> {
>     if ( (getline line < file) > 0 )
>     {
>         close(file);
>         return 1;
>     }
>     else
>     {
>         return 0;
>     }
> }
>
> I tend to use gawk as a general purpose programming language, so the
> fact that it is C-like is a plus to me.
Deciding to explicitly write
    while read line {
        split line into field1 field2 field3....
    }
when that's already provided by the tool by default doesn't make that tool any
more general purpose or C-like.
> When the pattern-matching paradigm is appropriate, I use it; when it
> isn't, I use getline to read from any number of files.
I don't see the inverse relationship between pattern-matching and explicitly
reading input.
> imho, the contortions needed to not use getline when reading multiple
> files are less clear than using getline for all but one.
What contortions? Could you give a small example of the problem?
> Please don't vote me off the island.
getline has its uses, see http://tinyurl.com/yn9ka9 for a list.
Ed.
On Mar 29, 2:18 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> ....
> What contortions? Could you give a small example of the problem?
> .....
> Ed.

I wrote a program in gawk to compare the results of a computation
(written in assembly language with debugging info) with a simulation
of the computation. In addition, the program also read the assembly
listing so it would know what debugging info could be written and read
its own source so it would know what debugging info was being looked
for.

So, there were four files being read - the first three (especially the
first) needed to be completely read before the last was started.

I read the first three in the BEGIN block using getline in three
separate loops. Since there was no overlap, I used the standard
splitting of $0 (i.e., getline < file). Patterns were matched using
combinations of "if ( $x == ..." and "if ( match($x, ...)".

The fourth file was read in the standard pattern-matching from stdin
paradigm. These patterns were what the program looked for when it read
itself so it could find out which assembly language debugging
statements were being looked for.

At each pattern-match, the assembly-language debug output was compared
with the corresponding simulation results (read from the first file)
and statistics about that particular part of the computation gathered
(min, max, and mean error). If the results were too bad, an error
message was written and also saved to be written at the end where it
could be readily noticed.

At the end, the program compared the assembly listing info with its
own listing info to tell which debugging statement had not been
reached (in 3rd and not in 2nd file) and which debugging statements
had not been looked for (in 2nd and not in 3rd file).

Finally, the statistics about the final errors were output.

This was the first time I had a gawk program that read its own source
- I found that somewhat amusing.

btw, I always use the "--lint" option, and ignore the "variable
shadows" and "nonstandard" messages. The other messages I often find
very helpful.

I suppose I could check when the record number resets to 1 to see when
a new file starts and look at FILENAME to see what the file is, but I
find using getline in this case much more straightforward. In
particular, I would have to have every pattern check for which file
the pattern applied to.

That's my story, a trifle gory, but I don't worry because it's not an
allegory.

martin cohen
In article <853cada4-f51a-47b8-b39e-ea056c00d1d4@c26g2000prf.googlegroups.com>,
mjc <mjcohen@acm.org> wrote:
>On Mar 29, 2:18 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
>....
>.....
>
>I wrote a program in gawk to compare the results of a computation
>(written in assembly language with debugging info) with a simulation
>of the computation. In addition, the program also read the assembly
>listing so it would know what debugging info could be written and read
>its own source so it would know what debugging info was being looked
>for.
Basically, getline is only _needed_ when you are reading more than one
file at a time (i.e., "in parallel"). Now, having said that, some
people find its use (in those cases where it is not necessary) to be
aesthetically appealing - and others don't. Obviously, there's no
accounting for taste. I agree with Ed's basic position on the matter,
which is that using getline appeals to people who don't quite get "awk
qua awk".
Your example certainly fits the classic "people think they need getline,
but they don't" archetype. That is, your program should (in the Ed/Kenny
sense of the word "should") be written:
ARGIND == 1 {
    # Do stuff for file 1
    next
}
ARGIND == 2 {
    # Do stuff for file 2
    next
}
ARGIND == 3 {
    # Do stuff for file 3
    next
}
{
    # else do stuff for file 4
    # Note that for this, the last file, you can also use all the usual
    # AWK pattern/action stuff
}
Note: I hope I got the ARGIND stuff right (ARGIND is, AFAIK,
gawk-specific). I normally use TAWK, which has a variable called ARGI,
which is the same as gawk's ARGIND, except that it is higher by one (so
the first file is: ARGI == 2 {})
Notes:
1) Yes, it is unfortunate that you can only use the "automatic patterns"
in the last file. But this is (obviously) the same as if using getline.
2) The real point of using the above style in place of getline is that
you specify the files to be read on the command line, rather than
hard-coding them into the script. This is a Good Thing, although many
see it as a minus at first sight.
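For contrast, here is a rough sketch of the one case described above where getline is actually needed: reading two files in parallel, line by line. The main loop reads the first file and getline pulls the matching line from the second. The wrapper name pair_lines, the variable names other and partner, and the sample files are all hypothetical; the second file name is passed in with -v rather than hard-coded, in keeping with note 2.

```shell
# Hypothetical sample inputs; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/left.txt"
printf '1\n2\n3\n' > "$tmpdir/right.txt"

# pair_lines FILE1 FILE2: FILE1 is read by the main loop, FILE2 is
# read in step with it through getline.
pair_lines() {
    awk -v other="$2" '
    {
        # Read the next line of the second file into a variable, which
        # leaves $0, NF and NR for the main input untouched.
        if ((getline partner < other) <= 0)
            partner = ""          # second file is shorter or unreadable
        print $0 "\t" partner
    }
    END { close(other) }' "$1"
}

pair_lines "$tmpdir/left.txt" "$tmpdir/right.txt"
```

Each output line holds a line of the first file, a tab, and the corresponding line of the second file.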
On 3/30/2008 10:22 AM, mjc wrote:
> On Mar 29, 2:18 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
> ....
> .....
>
> I wrote a program in gawk to compare the results of a computation
> (written in assembly language with debugging info) with a simulation
> of the computation. In addition, the program also read the assembly
> listing so it would know what debugging info could be written and read
> its own source so it would know what debugging info was being looked
> for.
>
> So, there were four files being read - the first three (especially the
> first) needed to be completely read before the last was started.
>
> I read the first three in the BEGIN block using getline in three
> separate loops. Since there was no overlap, I used the standard
> splitting of $0 (i.e., getline < file). Patterns were matched using
> combinations of "if ( $x == ..." and "if ( match($x, ...)".
>
> The fourth file was read in the standard pattern-matching from stdin
> paradigm. These patterns were what the program looked for when it read
> itself so it could find out which assembly language debugging
> statements were being looked for.
>
> At each pattern-match, the assembly-language debug output was compared
> with the corresponding simulation results (read from the first file)
> and statistics about that particular part of the computation gathered
> (min, max, and mean error). If the results were too bad, an error
> message was written and also saved to be written at the end where it
> could be readily noticed.
>
> At the end, the program compared the assembly listing info with its
> own listing info to tell which debugging statement had not been
> reached (in 3rd and not in 2nd file) and which debugging statements
> had not been looked for (in 2nd and not in 3rd file).
>
> Finally, the statistics about the final errors were output.
>
> This was the first time I had a gawk program that read its own source
> - I found that somewhat amusing.
>
> btw, I always use the "--lint" option, and ignore the "variable
> shadows" and "nonstandard" messages. The other messages I often find
> very helpful.
>
> I suppose I could check when the record number resets to 1 to see when
> a new file starts and look at FILENAME to see what the file is, but I
> find using getline in this case much more straightforward. In
> particular, I would have to have every pattern check for which file
> the pattern applied to.
>
> That's my story, a trifle gory, but I don't worry because it's not an
> allegory.
>
> martin cohen
So, you had something like this:
BEGIN {
    while ((getline < ARGV[1]) > 0) {
        do first file stuff
    }
    close(ARGV[1])
    while ((getline < ARGV[2]) > 0) {
        do second file stuff
    }
    close(ARGV[2])
    while ((getline < ARGV[3]) > 0) {
        do third file stuff
    }
    close(ARGV[3])
    ARGV[1]=ARGV[2]=ARGV[3]=""
}
/pattern/ { pattern match in fourth file }
END { do the end stuff }
when all you really needed was:
ARGIND == 1 { do first file stuff; next }
ARGIND == 2 { do second file stuff; next }
ARGIND == 3 { do third file stuff; next }
/pattern/ { pattern match in fourth file }
END { do the end stuff }
Replace "ARGIND == N" with "FILENAME == ARGV[N]" if you want a solution that
isn't gawk-specific.
Ed.
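A third portable way to tell the files apart, along the lines of the "record number resets to 1" idea raised earlier in the thread, is to count files with FNR == 1. This is a minimal sketch under stated assumptions: the wrapper name by_file, the counter name fileno, the counting actions and the sample files are hypothetical stand-ins for the real per-file work. One caveat is that an empty input file never produces a record with FNR == 1, so the counter would slip by one. The FILENAME == ARGV[N] test has a caveat of its own: it cannot distinguish a file that is named twice on the command line.

```shell
# Hypothetical sample inputs; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'sim 1\nsim 2\n'   > "$tmpdir/first.txt"
printf 'asm 1\n'          > "$tmpdir/second.txt"
printf 'data 1\ndata 2\n' > "$tmpdir/last.txt"

# by_file FILE...: the first two files get their own rules, the rest
# fall through to ordinary pattern/action code.
by_file() {
    awk '
    FNR == 1    { fileno++ }               # a new input file has started
    fileno == 1 { nfirst++;  next }        # stand-in for first file stuff
    fileno == 2 { nsecond++; next }        # stand-in for second file stuff
                { print fileno ": " $0 }   # ordinary rules for the rest
    END         { print "first=" nfirst " second=" nsecond }' "$@"
}

by_file "$tmpdir/first.txt" "$tmpdir/second.txt" "$tmpdir/last.txt"
```

With these inputs the script prints the two lines of the last file prefixed with "3: ", followed by "first=2 second=1".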
In article <47EFDC32.8020103@lsupcaemnt.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
...
>So, you had something like this:
>
>BEGIN {
>    while ((getline < ARGV[1]) > 0) {
>        do first file stuff
>    }
>    close(ARGV[1])
As I pointed out in my previous response, when people use getline, they
usually hard-code the name of the file they are reading from in the
script. This is generally a Bad Thing, but has superficial appeal.
Ed Morton wrote:
>
> So, you had something like this:
>
> BEGIN {
>     while ((getline < ARGV[1]) > 0) {
>         do first file stuff
>     }
>     close(ARGV[1])
>     while ((getline < ARGV[2]) > 0) {
>         do second file stuff
>     }
>     close(ARGV[2])
>     while ((getline < ARGV[3]) > 0) {
>         do third file stuff
>     }
>     close(ARGV[3])
>     ARGV[1]=ARGV[2]=ARGV[3]=""
> }
> /pattern/ { pattern match in fourth file }
> END { do the end stuff }
>
> when all you really needed was:
>
> ARGIND == 1 { do first file stuff; next }
> ARGIND == 2 { do second file stuff; next }
> ARGIND == 3 { do third file stuff; next }
> /pattern/ { pattern match in fourth file }
> END { do the end stuff }
>
> Replace "ARGIND == N" with "FILENAME == ARGV[N]" if you want a solution that
> isn't gawk-specific.
Once, in a similar case, I used something like
    awk -f prog.awk phase=1 file1 phase=2 file2 phase=3 fileX fileY fileZ
where prog.awk was something like
    phase == 1 { do first file stuff ; next }
    phase == 2 { do second file stuff ; next }
    /whatever/ { do rest of the files }
I did that to avoid the filename comparison and GNU specifics.
Janis
>
> Ed.
>
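The phase idea can be tried out with a small self-contained sketch. A command-line assignment takes effect when awk reaches it in the argument list and stays in effect until the next one, so this sketch sets phase again before the remaining files; otherwise they would still be handled by the phase == 2 rule. The wrapper name run_phases, the counters n1 and n2, the variable rest and the sample files are hypothetical.

```shell
# Hypothetical sample inputs; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'x\ny\n' > "$tmpdir/file1"
printf 'z\n'    > "$tmpdir/file2"
printf 'p\nq\n' > "$tmpdir/fileX"

run_phases() {
    awk '
    phase == 1 { n1++; next }       # stand-in for first file stuff
    phase == 2 { n2++; next }       # stand-in for second file stuff
               { rest = rest $0 }   # ordinary rules for the other files
    END        { print n1, n2, rest }' \
        phase=1 "$tmpdir/file1" \
        phase=2 "$tmpdir/file2" \
        phase=3 "$tmpdir/fileX"
}

run_phases
```

With these inputs the output is "2 1 pq": two records seen in phase 1, one in phase 2, and the two lines of the last file concatenated.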