nextfile syntax and BEGINFILE/ENDFILE patch
| Peter V. Saveliev 2006-10-04, 6:56 pm |
| Hello!
Please, review the patch:
http://xgawk.radlinux.org/Articles/patch-fileworks/show
It was created for the xgawk branch, but may be of interest to other
branches too. Perhaps this functionality is already implemented in
other awk versions? If so, it would make sense to make this patch
compatible with their existing syntax.
Thanks for comments.
| |
| Jürgen Kahrs 2006-10-04, 6:56 pm |
| Peter V. Saveliev wrote:
> It was created for the xgawk branch, but may be of interest to other
> branches too. Perhaps this functionality is already implemented in
> other awk versions? If so, it would make sense to make this patch
> compatible with their existing syntax.
Is any user of TAWK reading this newsgroup ?
I vaguely remember that TAWK had some features
like the ones Peter has implemented.
| |
| Kenny McCormack 2006-10-04, 6:56 pm |
| In article <4oigelFd8iuoU1@individual.net>,
Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
>Peter V. Saveliev wrote:
>
>
>Is any user of TAWK reading this newsgroup ?
>I vaguely remember that TAWK had some features
>like the ones Peter has implemented.
1) I thought GAWK already had nextfile.
2) Yes, Tawk has BEGINFILE/ENDFILE - and very useful they are.
It has "nextfile" in the form of: close(FILENAME)
| |
| Peter V. Saveliev 2006-10-04, 6:56 pm |
|
<skip />
> 1) I thought GAWK already had nextfile.
It did have nextfile. But the patch also allows it to be used as
nextfile "FILENAME", to open an arbitrary file and not only the next one
from ARGV.
> 2) Yes, Tawk has BEGINFILE/ENDFILE - and very useful they are.
OK, I see... So here we'll be compatible. Good.
<skip />
| |
| Jürgen Kahrs 2006-10-04, 6:56 pm |
| Peter V. Saveliev wrote:
>
> OK, I see... So here we'll be compatible. Good.
Kenny, do you have a pointer to some documentation
about this feature of TAWK ? Compatibility means
more than just the name of a variable.
| |
| Kenny McCormack 2006-10-04, 6:56 pm |
| In article <4oin24Feof9jU1@individual.net>,
Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
>Peter V. Saveliev wrote:
>
>
>Kenny, do you have a pointer to some documentation
>about this feature of TAWK ? Compatibility means
>more than just the name of a variable.
Alas, no. You can try www.tasoft.com - but I couldn't find any mention
of ENDFILE there.
This is what the manual says:
BEGINFILE { statements }
The statements are executed each time TAWK starts processing a
new file.
ENDFILE { statements }
The statements are executed each time TAWK finishes processing a
file.
Later, it explains that the BEGINFILE block is executed after TAWK's
Automatic Input Loop opens a file, but before the first record is read
in. And that the ENDFILE block is executed after all records have been
read from the file, but before the file is closed.
And this agrees with my experience. I have used BEGINFILE frequently to
change FS when processing multiple files with different field delimiters.
I.e., something like:
BEGINFILE {
    if (ARGI == 2)
        FS = ","    # first input file is comma-separated
    else if (ARGI == 3)
        FS = "|"    # second input file is bar-separated
    # etc.
}
I often use ENDFILE to produce a report on each input file (like END, but
allows you to report on each individual file)
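For readers without TAWK, roughly the same effect can be sketched in plain awk by testing FNR == 1. This is only an approximation under stated assumptions, not how TAWK or the patch implements it: the helper names beginfile() and endfile(), the sample file names and the rule for choosing FS are all made up for illustration.

```shell
#!/bin/sh
# Sketch: emulate BEGINFILE/ENDFILE in plain awk (no TAWK, no patch).
tmp=$(mktemp -d)
printf 'a,b\nc,d\n' > "$tmp/comma.txt"
printf 'e|f\n'      > "$tmp/bar.txt"

out=$(cd "$tmp" && awk '
# Hypothetical helpers standing in for BEGINFILE / ENDFILE blocks.
function beginfile(name) {
    FS = (name ~ /bar/) ? "|" : ","      # choose the separator per file
}
function endfile(name, nrec) {
    printf "%s: records=%d\n", name, nrec
}
FNR == 1 {
    if (NR > 1) endfile(prev, prevfnr)   # previous file just ended
    beginfile(FILENAME)
    $0 = $0                              # re-split record 1 with the new FS
}
{ prev = FILENAME; prevfnr = FNR; print $2 }
END { if (NR > 0) endfile(prev, prevfnr) }
' comma.txt bar.txt)

printf '%s\n' "$out"
rm -rf "$tmp"
```

Unlike a real BEGINFILE, the FNR == 1 test fires only after the first record has already been read, which is why the record has to be re-split by hand. An empty input file triggers neither helper.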
| |
| Anton Treuenfels 2006-10-05, 3:57 am |
|
"Peter V. Saveliev" <peet@peet.spb.ru> wrote in message
news:1159989971.608251.34700@b28g2000cwb.googlegroups.com...
> Hello!
>
> Please, review the patch:
>
> http://xgawk.radlinux.org/Articles/patch-fileworks/show
I'm having a little trouble understanding exactly what this code sample is
good for:
/nextfile/ { if ( $1 ) nextfile $1 }
I may be mis-understanding it, but it seems to say "when $0 contains the
pattern /nextfile/ then if $1 exists then set $1 as the next file to read"
First, if $0 contains the pattern /nextfile/ at all doesn't that imply that
$1 is non-null? Assuming default field separators, of course. So why test
for that again? This may be just a bit of throwaway code but it should at
least make some kind of sense, if only to show that the feature is actually
useful :-)
Second it's not entirely clear just when the next file becomes the current
file. Immediately? At the end of the current file? And in either case, what
happens to any other files specified on the command line?
Something like this:
for ( i = ++ARGC; i > ARGI+1; i-- )
ARGV[i] = ARGV[i-1]
ARGV[ARGI+1] = nextfile
or something like this:
close( ARGV[ARGI] )
ARGV[ARGI] = nextfile
open( ARGV[ARGI] )
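The first variant (splicing the name into ARGV) can be tried in plain awk, which lets a program change ARGV and ARGC while it runs. The sketch below is not the patch's behaviour. The variable argi is a hand-maintained stand-in for TAWK's ARGI, and the file names are invented for the example.

```shell
#!/bin/sh
# Sketch: queue a file right after the current one by editing ARGV.
tmp=$(mktemp -d)
printf 'first\nnextfile extra.txt\nlast of main\n' > "$tmp/main.txt"
printf 'from extra\n' > "$tmp/extra.txt"
printf 'from tail\n'  > "$tmp/tail.txt"

out=$(cd "$tmp" && awk '
# argi is a hand-maintained stand-in for the ARGI variable of TAWK.
FNR == 1 {
    argi++
    while (argi < ARGC && ARGV[argi] != FILENAME)
        argi++
}
$1 == "nextfile" && $2 != "" {
    for (i = ARGC++; i > argi + 1; i--)   # shift the remaining arguments up
        ARGV[i] = ARGV[i - 1]
    ARGV[argi + 1] = $2                   # new file is read after the current one
    next
}
{ print FILENAME ": " $0 }
' main.txt tail.txt)

printf '%s\n' "$out"
rm -rf "$tmp"
```

Here the current file is read to its end before extra.txt starts, and tail.txt still follows afterwards, so this answers "at the end of the current file" rather than "immediately".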
I like TAWK's BEGINFILE (after file is opened, but before first record) and
ENDFILE (after last record, but before file is closed) because then I know
that all the relevant file variables (FILENAME, FNR, etc) are valid.
I've used them for status messages regarding progress, for example, as well
as initializing/finalizing file-specific information.
- Anton Treuenfels
| |
| Peter V. Saveliev 2006-10-05, 3:57 am |
|
Jürgen Kahrs wrote:
> Peter V. Saveliev wrote:
>
>
> Kenny, do you have a pointer to some documentation
> about this feature of TAWK ? Compatibility means
> more than just the name of a variable.
I found something:
http://www.sm.luth.se/~alapaa/file_...awk/ch11_03.htm
| |
| Peter V. Saveliev 2006-10-05, 3:57 am |
|
Anton Treuenfels wrote:
<skip />
> I'm having a little trouble understanding exactly what this code sample is
> good for:
>
> /nextfile/ { if ( $1 ) nextfile $1 }
>
> I may be mis-understanding it, but it seems to say "when $0 contains the
> pattern /nextfile/ then if $1 exists then set $1 as the next file to read"
<skip />
It was a typo, sorry :))) What I meant to handle is this case: if $2 is
not null, open $2. So a line like "nextfile bala.txt" will cause bala.txt
to be opened for input.
/nextfile/ { if ( $2 ) nextfile $2 }
> Second it's not entirely clear just when the next file becomes the current
> file. Immediately? At the end of the current file? And in either case, what
> happens to any other files specified on the command line?
Immediately, as if plain nextfile had been used. The rest of ARGV is
processed after the end of the file opened this way, just as if that file
had been listed in ARGV. But ARGV itself is not modified, so there can be
no ARGV overflow.
>
> Something like this:
>
> for ( i = ++ARGC; i > ARGI+1; i-- )
> ARGV[i] = ARGV[i-1]
> ARGV[ARGI+1] = nextfile
>
> or something like this:
>
> close( ARGV[ARGI] )
> ARGV[ARGI] = nextfile
> open( ARGV[ARGI] )
close(ARGV[ARGI])
open("bala.txt")
ARGV will not be modified.
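Without the patch, the nearest thing in plain awk is probably to read the named file on demand with getline. The sketch below rests on assumptions and differs from the patch in two ways: the current file is resumed afterwards instead of being abandoned, and the extra records bypass the main pattern/action loop, so they are passed to a made-up helper handle(). As with the patch, ARGV is left untouched.

```shell
#!/bin/sh
# Sketch: read an arbitrary file right away, leaving ARGV alone.
tmp=$(mktemp -d)
printf 'one\nnextfile bala.txt\ntwo\n' > "$tmp/main.txt"
printf 'inside bala\n' > "$tmp/bala.txt"

out=$(cd "$tmp" && awk '
# Hypothetical helper: every record, from either source, ends up here.
function handle(src, line) { print src ": " line }

$1 == "nextfile" && $2 != "" {
    file = $2
    while ((getline line < file) > 0)   # read the named file right now
        handle(file, line)
    close(file)                         # allow it to be re-read later
    next
}
{ handle(FILENAME, $0) }
' main.txt)

printf '%s\n' "$out"
rm -rf "$tmp"
```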
>
> I like TAWK's BEGINFILE (after file is opened, but before first record) and
> ENDFILE (after last record, but before file is closed) because then I know
> that all the relevant file variables (FILENAME, FNR, etc) are valid.
It is important :| The patch interprets BEGINFILE _before_ FILENAME is
opened, so there is no guarantee that the file exists, that it will
actually be opened, or that FILENAME is valid. It is done this way to let
the developer control how FILENAME will be opened. OK, we'll discuss this
topic...
<skip />
| |
| Anton Treuenfels 2006-10-06, 3:56 am |
|
"Peter V. Saveliev" <peet@peet.spb.ru> wrote in message
news:1160035013.108334.83520@k70g2000cwa.googlegroups.com...
>
> It is important :| The patch interprets BEGINFILE _before_ FILENAME is
> opened, so there is no guarantee that the file exists, that it will
> actually be opened, or that FILENAME is valid. It is done this way to let
> the developer control how FILENAME will be opened. OK, we'll discuss this
> topic...
Perhaps you might consider re-naming the pattern as BEFOREFILE (and
AFTERFILE?), reserving BEGINFILE and ENDFILE for TAWK-like behavior?
Although again I'm having some trouble understanding. If I don't know what
FILENAME is, how can I hope to control how it gets opened?
The only method I can immediately imagine is for some global flag to be set
or cleared that I can then check within a BEFOREFILE action block, and then
conditionally do something based on its state. I.e., "The flag is set, so
NEXTFILE was used, and I know what FILENAME will be..."
And who set or cleared that flag in the first place? Well presumably I did
in some other action block. So what if there's more than one NEXTFILE in my
program? How do I know which one triggered the execution of BEFOREFILE?
Maybe it's not a flag at all but a variable holding the argument of
NEXTFILE.
Or maybe NEXTFILE *is* the variable? Hmm, now that might be a bit closer to
the spirit of AWK:
/somepat/ { NEXTFILE = "somefile" }
/otherpat/ { NEXTFILE = "otherfile" }
BEFOREFILE {
if ( NEXTFILE == "somefile" )
do_open( "thisway" )
else if ( NEXTFILE == "otherfile" )
do_open( "thatway" )
else
do_open( "normalway" )
}
So if we assume NEXTFILE holds the name of the next file to be opened, and
automatically it is ARGV[ARGI+1] (which may be null), but it can be read and
written, what should this do:
{ FILENAME = NEXTFILE }
Or this:
{ NEXTFILE = FILENAME }
And another thought: presumably if I do set NEXTFILE it is no longer
ARGV[ARGI+1] but will become so again once the new current file is
completely read. But what if I interrupt the new NEXTFILE with another
NEXTFILE?
Maybe NEXTFILE has no default value, is null, and if referenced when its
value is null returns ARGV[ARGI+1] (which may itself be null). NEXTFILE
becomes non-null if I set it, and I can read and write it as often as I want
without disturbing ARGV. When at last I let one of the NEXTFILEs read to
completion, NEXTFILE is set to null and again returns ARGV[ARGI+1] if
referenced.
So what should this do:
{ NEXTFILE = "" }
Just thinking out loud here. Dunno how practical any of this is.
- Anton Treuenfels
| |
| Peter V. Saveliev 2006-10-06, 3:56 am |
|
<skip />
> Perhaps you might consider re-naming the pattern as BEFOREFILE (and
> AFTERFILE?), reserving BEGINFILE and ENDFILE for TAWK-like behavior?
maybe.
>
> Although again I'm having some trouble understanding. If I don't know what
> FILENAME is, how can I hope to control how it gets opened?
In our case, FILENAME _is_ defined (by nextfile -- implicitly or
explicitly), but is _not_ opened yet. Strictly speaking,
interpret(beginfile_block) is called from iop_open() just before it
opens a file. So here we can set, e.g., XMLMODE to control how the XML
core will interpret FILENAME.
<skip />
> /somepat/ { NEXTFILE = "somefile" }
>
> /otherpat/ { NEXTFILE = "otherfile" }
>
> BEFOREFILE {
>
> if ( NEXTFILE == "somefile" )
> do_open( "thisway" )
> else if ( NEXTFILE == "otherfile" )
> do_open( "thatway" )
> else
> do_open( "normalway" )
> }
a little bit easier:
BEGINFILE {
    if (FILENAME ~ /somepat/)
    {
        blah_blah...
    }
    else
    {
        ....
    }
}
<skip />
> Just thinking out loud here. Dunno how practical any of this is.
<skip />
As for NEXTFILE: maybe. We tried this approach and rejected it a few days ago.
| |
| Jürgen Kahrs 2006-10-06, 6:56 pm |
| Peter V. Saveliev wrote:
>
> maybe.
I also like the BEFOREFILE and AFTERFILE keywords.
These words are very precise descriptions of what's intended.
Manuel already explained to you on the SourceForge mailing
list the counter-intuitive effects of the TAWK-like keywords:
> The most amazing find is that BEGINFILE is executed after the first
> getline (in user code) from a given file, and ENDFILE is executed
> after END (apparently once for each file opened with getline).
You see, Manuel is good at exploring hidden implications
of a new concept. The principle of least surprise requires
us to avoid semantic details contradictory to user
expectations.