Code Comments

Re: How can I check if a file exists in gawk?
On Mar 27, 8:05 pm, stan <smo...@exis.net> wrote:
> Ed Morton wrote:
> 
> 
>
> <snip>
> 
> 
>
> Personally, I found that I needed a firm fixed rule that every time I
> type getline I have to get up and go get a cup of coffee. Without any
> exceptions that I can remember I always find that I was trying to write
> a C program in awk and with a little reflection I can find a solution
> that uses awk instead of fighting against it.

I'm the reverse - I LIKE getline.

I have a utility subroutine for checking if a file exists:

function exists(file,    line)
{
    if ( (getline line < file) > 0 )
    {
        close(file);
        return 1;
    }
    else
    {
        return 0;
    }
}
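A minimal sketch of one edge case in the function above, written as a POSIX shell script that runs awk. The scratch file names `empty`, `full` and `missing` and the helper name `readable` are hypothetical. Because `getline` returns 0 at end of file and -1 when the file cannot be opened, testing `> 0` reports an existing but empty file as missing (and leaves it open, since `close()` is skipped). Testing `>= 0` accepts empty files too. Either way, what is really being checked is whether the file can be opened for reading, not strictly whether it exists.

```shell
tmp=$(mktemp -d)
: > "$tmp/empty"            # exists, but has no lines
printf 'x\n' > "$tmp/full"  # exists, one line
# "$tmp/missing" is deliberately never created

out=$(awk -v dir="$tmp" '
# The test from the post: true only if at least one line can be read.
function exists(file,    line) {
    if ((getline line < file) > 0) { close(file); return 1 }
    return 0
}
# Hypothetical variant: getline returns 0 at end of file and -1 when the
# file cannot be opened, so ">= 0" also accepts empty files.
function readable(file,    line) {
    if ((getline line < file) >= 0) { close(file); return 1 }
    return 0
}
BEGIN {
    n = split("full empty missing", names, " ")
    for (i = 1; i <= n; i++) {
        f = dir "/" names[i]
        print names[i], exists(f), readable(f)
    }
}')

printf '%s\n' "$out"
rm -rf "$tmp"
```

Each output line is the name, then the result of the original test, then the result of the variant; only the `empty` line should differ between the two columns.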

I tend to use gawk as a general purpose programming language, so the
fact that it is C-like is a plus to me.

When the pattern-matching paradigm is appropriate, I use it; when it
isn't, I use getline to read from any number of files.

imho, the contortions needed to not use getline when reading multiple
files are less clear than using getline for all but one.

Please don't vote me off the island.

martin cohen

mjc
03-30-08 12:09 AM


Re: How can I check if a file exists in gawk?

On 3/29/2008 3:12 PM, mjc wrote:
> On Mar 27, 8:05 pm, stan <smo...@exis.net> wrote:
> 
>
>
> I'm the reverse - I LIKE getline.
>
> I have a utility subroutine for checking if a file exists:
>
> function exists(file      , line)
> {
>    if ( (getline line < file) > 0 )
>     {
>       close(file);
>       return 1;
>     }
>    else
>     {
>       return 0;
>     }
> }
>
> I tend to use gawk as a general purpose programming language, so the
> fact that it is C-like is a plus to me.

Deciding to explicitly write

    while read line {
        split line into field1 field2 field3....
    }

when that's already provided by the tool by default doesn't make that tool
any more general purpose or C-like.
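To make that point concrete, here is a small sketch with made-up sample data. An explicit getline-and-split loop in `BEGIN` and awk's implicit read-and-split main loop should print exactly the same thing; the second version simply lets awk do the reading and field splitting it already does by default.

```shell
# Hypothetical sample input: three whitespace-separated fields per line.
sample() { printf 'alpha 1 x\nbeta 2 y\n'; }

# Explicit style: read each line with getline and split it by hand.
explicit=$(sample | awk 'BEGIN {
    while ((getline line) > 0) {
        split(line, f)
        print f[2], f[1]
    }
}')

# Awk style: the main loop already reads each record and splits it into $1..$NF.
implicit=$(sample | awk '{ print $2, $1 }')

printf '%s\n' "$explicit" "$implicit"
```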

> When the pattern-matching paradigm is appropriate, I use it; when it
> isn't, I use getline to read from any number of files.

I don't see the inverse relationship between pattern-matching and explicitly
reading input.

> imho, the contortions needed to not use getline when reading multiple
> files are less clear than using getline for all but one.

What contortions? Could you give a small example of the problem?

> Please don't vote me off the island.

getline has its uses, see http://tinyurl.com/yn9ka9 for a list.

Ed.


Ed Morton
03-30-08 12:09 AM


Re: How can I check if a file exists in gawk?
On Mar 29, 2:18 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
....
> What contortions? Could you give a small example of the problem?
>
.....
>         Ed.

I wrote a program in gawk to compare the results of a computation
(written in assembly language with debugging info) with a simulation
of the computation. In addition, the program also read the assembly
listing so it would know what debugging info could be written and read
its own source so it would know what debugging info was being looked
for.

So, there were four files being read - the first three (especially the
first) needed to be completely read before the last was started.

I read the first three in the BEGIN block using getline in three
separate loops. Since there was no overlap, I used the standard
splitting of $0 (i.e., getline < file). Patterns were matched using
combinations of "if ( $x == ..." and "if ( match($x, ...)".
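For reference, a sketch of the two redirected getline forms mentioned here; the scratch file `data` and its contents are made up. `getline < file` sets `$0` and re-splits it into fields (so `NF`, `$1` and so on are updated) but does not increment `NR` or `FNR`. `getline var < file` sets only `var`, so any field splitting has to be done by hand. The file must be closed before it can be read again from the top.

```shell
tmp=$(mktemp -d)
printf 'one two three\nfour five\n' > "$tmp/data"

out=$(awk -v file="$tmp/data" 'BEGIN {
    # Form 1, "getline < file": sets $0 and NF; NR and FNR are not counted.
    while ((getline < file) > 0)
        print "record:", NF, "fields, first:", $1, "NR:", NR
    close(file)    # close so the next loop starts from the top again

    # Form 2, "getline var < file": sets only var, so split it manually.
    while ((getline line < file) > 0) {
        n = split(line, f)
        print "line:", n, "fields, first:", f[1]
    }
    close(file)
}')

printf '%s\n' "$out"
rm -rf "$tmp"
```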

The fourth file was read in the standard pattern-matching from stdin
paradigm. These patterns were what the program looked for when it read
itself so it could find out which assembly language debugging
statements were being looked for.

At each pattern-match, the assembly-language debug output was compared
with the corresponding simulation results (read from the first file)
and statistics about that particular part of the computation gathered
(min, max, and mean error). If the results were too bad, an error
message was written and also saved to be written at the end where it
could be readily noticed.

At the end, the program compared the assembly listing info with its
own listing info to tell which debugging statement had not been
reached (in 3rd and not in 2nd file) and which debugging statements
had not been looked for (in 2nd and not in 3rd file).

Finally, the statistics about the final errors were output.

This was the first time I had a gawk program that read its own source
- I found that somewhat amusing.

btw, I always use the "--lint" option, and ignore the "variable
shadows" and "nonstandard" messages. The other messages I often find
very helpful.

I suppose I could check when the record number resets to 1 to see when
a new file starts and look at FILENAME to see what the file is, but I
find using getline in this case much more straightforward. In
particular, I would have to have every pattern check for which file
the pattern applied to.
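A sketch of the FNR-based alternative described above, using hypothetical sample files and a hypothetical counter named `fileno`. A single rule bumps the counter whenever `FNR == 1`, so each later rule only has to test that one variable rather than `FILENAME`. One caveat: an empty input file never produces an `FNR == 1` record, so the counter silently skips it and every later file ends up numbered one too low.

```shell
tmp=$(mktemp -d)
# Hypothetical sample files.
printf 'a\nb\n' > "$tmp/one"
printf 'c\n'    > "$tmp/two"
printf 'd\ne\n' > "$tmp/three"

out=$(awk '
FNR == 1    { fileno++ }            # FNR restarts at 1 in each new file
fileno == 1 { n1++; next }          # rules for the first file only
fileno == 2 { n2++; next }          # rules for the second file only
/e/         { print "third file match:", $0 }
END         { print "first:", n1, "second:", n2 }
' "$tmp/one" "$tmp/two" "$tmp/three")

printf '%s\n' "$out"
rm -rf "$tmp"
```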

That's my story, a trifle gory, but I don't worry because it's not an
allegory.

martin cohen

mjc
03-31-08 12:18 AM


Re: How can I check if a file exists in gawk?
In article <853cada4-f51a-47b8-b39e-ea056c00d1d4@c26g2000prf.googlegroups.com>,
mjc  <mjcohen@acm.org> wrote:
>On Mar 29, 2:18 pm, Ed Morton <mor...@lsupcaemnt.com> wrote: 
>.... 
>..... 
>
>I wrote a program in gawk to compare the results of a computation
>(written in assembly language with debugging info) with a simulation
>of the computation. In addition, the program also read the assembly
>listing so it would know what debugging info could be written and read
>its own source so it would know what debugging info was being looked
>for.

Basically, getline is only _needed_ when you are reading more than one
file at a time (i.e., "in parallel").  Now, having said that, some
people find its use (in those cases where it is not necessary) to be
aesthetically appealing - and others don't.  Obviously, there's no
accounting for taste.  I agree with Ed's basic position on the matter,
which is that using getline appeals to people who don't quite get "awk
qua awk".

Your example certainly fits the classic "people think they need getline,
but they don't" archetype.  That is, your program should (in the Ed/Kenny
sense of the word "should") be written:

ARGIND == 1 {
    # Do stuff for file 1
    next
}
ARGIND == 2 {
    # Do stuff for file 2
    next
}
ARGIND == 3 {
    # Do stuff for file 3
    next
}
{
    # else do stuff for file 4
    # Note that for this, the last file, you can also use all the usual
    # AWK pattern/action stuff
}

Note: I hope I got the ARGIND stuff right (ARGIND is, AFAIK,
gawk-specific).  I normally use TAWK, which has a variable called ARGI,
which is the same as gawk's ARGIND, except that it is higher by one (so
the first file is: ARGI == 2 {})

Notes:
1) Yes, it is unfortunate that you can only use the "automatic patterns"
in the last file.  But this is (obviously) the same as if using getline.
2) The real point of using the above style in place of getline is that
you specify the files to be read on the command line, rather than
hard-coding them into the script.  This is a Good Thing, although many
see it as a minus at first sight.


Kenny McCormack
03-31-08 12:18 AM


Re: How can I check if a file exists in gawk?

On 3/30/2008 10:22 AM, mjc wrote:
> On Mar 29, 2:18 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
> ....
> 
>
> .....
> 
>
>
> I wrote a program in gawk to compare the results of a computation
> (written in assembly language with debugging info) with a simulation
> of the computation. In addition, the program also read the assembly
> listing so it would know what debugging info could be written and read
> its own source so it would know what debugging info was being looked
> for.
>
> So, there were four files being read - the first three (especially the
> first) needed to be completely read before the last was started.
>
> I read the first three in the BEGIN block using getline in three
> separate loops. Since there was no overlap, I used the standard
> splitting of $0 (i.e., getline < file). Patterns were matched using
> combinations of "if ( $x == ..." and "if ( match($x, ...)".
>
> The fourth file was read in the standard pattern-matching from stdin
> paradigm. These patterns were what the program looked for when it read
> itself so it could find out which assembly language debugging
> statements were being looked for.
>
> At each pattern-match, the assembly-language debug output was compared
> with the corresponding simulation results (read from the first file)
> and statistics about that particular part of the computation gathered
> (min, max, and mean error). If the results were too bad, an error
> message was written and also saved to be written at the end where it
> could be readily noticed.
>
> At the end, the program compared the assembly listing info with its
> own listing info to tell which debugging statement had not been
> reached (in 3rd and not in 2nd file) and which debugging statements
> had not been looked for (in 2nd and not in 3rd file).
>
> Finally, the statistics about the final errors were output.
>
> This was the first time I had a gawk program that read its own source
> - I found that somewhat amusing.
>
> btw, I always use the "--lint" option, and ignore the "variable
> shadows" and "nonstandard" messages. The other messages I often find
> very helpful.
>
> I suppose I could check when the record number resets to 1 to see when
> a new file starts and look at FILENAME to see what the file is, but I
> find using getline in this case much more straightforward. In
> particular, I would have to have every pattern check for which file
> the pattern applied to.
>
> That's my story, a trifle gory, but I don't worry because it's not an
> allegory.
>
> martin cohen

So, you had something like this:

BEGIN {
    while ((getline < ARGV[1]) > 0) {
        do first file stuff
    }
    close(ARGV[1])
    while ((getline < ARGV[2]) > 0) {
        do second file stuff
    }
    close(ARGV[2])
    while ((getline < ARGV[3]) > 0) {
        do third file stuff
    }
    close(ARGV[3])
    ARGV[1]=ARGV[2]=ARGV[3]=""
}
/pattern/ { pattern match in fourth file }
END { do the end stuff }

when all you really needed was:

ARGIND == 1 { do first file stuff; next }
ARGIND == 2 { do second file stuff; next }
ARGIND == 3 { do third file stuff; next }
/pattern/ { pattern match in fourth file }
END { do the end stuff }

Replace "ARGIND == N" with "FILENAME == ARGV[N]" if you want a solution that
isn't gawk-specific.
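A runnable sketch comparing the two layouts above in portable awk, with `FILENAME == ARGV[N]` standing in for gawk's `ARGIND`. The per-file "stuff" is a made-up stand-in (summing the first column of each of the first three files), and the fourth file is scanned with an ordinary pattern. Both programs should print the same lines. Keep in mind that `FILENAME == ARGV[N]` can misfire if the same file name is given twice, and that command-line `var=value` operands also occupy ARGV slots, which shifts the indices.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs: three numeric files and one text file.
printf '1\n2\n' > "$tmp/a"
printf '10\n'   > "$tmp/b"
printf '100\n'  > "$tmp/c"
printf 'x 1\ny 2\nz 3\n' > "$tmp/d"

# Layout 1: read the first three files with getline in BEGIN.
prog_getline='
BEGIN {
    for (i = 1; i <= 3; i++) {
        while ((getline < ARGV[i]) > 0)
            sum[i] += $1
        close(ARGV[i])
        ARGV[i] = ""        # empty entries are skipped by the main loop
    }
}
/y/ { print "match:", $0 }
END { print "sums:", sum[1], sum[2], sum[3] }'

# Layout 2: let awk read every file; pick rules by comparing FILENAME.
prog_args='
FILENAME == ARGV[1] { sum[1] += $1; next }
FILENAME == ARGV[2] { sum[2] += $1; next }
FILENAME == ARGV[3] { sum[3] += $1; next }
/y/ { print "match:", $0 }
END { print "sums:", sum[1], sum[2], sum[3] }'

out1=$(awk "$prog_getline" "$tmp/a" "$tmp/b" "$tmp/c" "$tmp/d")
out2=$(awk "$prog_args"    "$tmp/a" "$tmp/b" "$tmp/c" "$tmp/d")
printf '%s\n' "$out1" "$out2"
rm -rf "$tmp"
```

In the second layout the file names come only from the command line, and nothing has to be blanked out of ARGV by hand.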

Ed.


Ed Morton
03-31-08 12:18 AM


Re: How can I check if a file exists in gawk?
In article <47EFDC32.8020103@lsupcaemnt.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
...
>So, you had something like this:
>
>BEGIN {
>	while ((getline < ARGV[1]) > 0) {
>		do first file stuff
>	}
>	close(ARGV[1])

As I pointed out in my previous response, when people use getline, they
usually hard-code the name of the file they are reading from in the
script.  This is generally a Bad Thing, but has superficial appeal.


Kenny McCormack
03-31-08 12:18 AM


Re: How can I check if a file exists in gawk?
Ed Morton wrote:
>
> So, you had something like this:
>
> BEGIN {
> 	while ((getline < ARGV[1]) > 0) {
> 		do first file stuff
> 	}
> 	close(ARGV[1])
> 	while ((getline < ARGV[2]) > 0) {
> 		do second file stuff
> 	}
> 	close(ARGV[2])
> 	while ((getline < ARGV[3]) > 0) {
> 		do third file stuff
> 	}
> 	close(ARGV[3])
> 	ARGV[1]=ARGV[2]=ARGV[3]=""
> }
> /pattern/ { pattern match in fourth file }
> END { do the end stuff }
>
> when all you really needed was:
>
> ARGIND == 1 { do first file stuff; next }
> ARGIND == 2 { do second file stuff; next }
> ARGIND == 3 { do third file stuff; next }
> /pattern/ { pattern match in fourth file }
> END { do the end stuff }
>
> Replace "ARGIND == N" with "FILENAME == ARGV[N]" if you want a solution that
> isn't gawk-specific.

Once, in a similar case, I used something like

awk -f prog.awk  phase=1 file1  phase=2 file2  phase=3 fileX fileY fileZ

where prog.awk was something like

phase == 1 { do first file stuff ; next }
phase == 2 { do second file stuff ; next }
/whatever/ { do rest of the files }

I did that to avoid the filename comparison and GNU specifics.
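A runnable sketch of the `phase=` approach with made-up sample files (`colors`, `sizes`, `keys`). Command-line `var=value` operands are processed when awk reaches them in the argument list, so `phase` changes just before the next file is read. Note the final `phase=3`: without it the last file would still see `phase == 2`, and that rule's `next` would swallow every one of its records.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs: two key/value tables and a list of keys to look up.
printf 'k1 red\nk2 blue\n'    > "$tmp/colors"
printf 'k1 small\nk2 large\n' > "$tmp/sizes"
printf 'k2\nk1\nk3\n'         > "$tmp/keys"

out=$(awk '
phase == 1 { color[$1] = $2; next }     # first file: load colors
phase == 2 { size[$1] = $2; next }      # second file: load sizes
($1 in color) && ($1 in size) { print $1, color[$1], size[$1] }
' phase=1 "$tmp/colors" phase=2 "$tmp/sizes" phase=3 "$tmp/keys")

printf '%s\n' "$out"
rm -rf "$tmp"
```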

Janis

>
> 	Ed.
>

Janis Papanagnou
03-31-08 12:18 AM


AWK archive

All times are GMT.

 

Copyrights CodeComments.com 2004 - 2006
