


 

Author "including" in a stream....
Antonio Dell'elce

2005-03-17, 8:55 pm

Dear all,

I am processing with awk a "command file"
where I have commands similar to this:

#samplefile
command_a
command_b
command_c
include filename
command_d
command_e
etc...
#EOF

the "include" command should read each line from "filename" and "push" it
to be processed by the set of rules...
of course I am trying to use getline and/or changing FILENAME variable
value,
but my attempts until now have failed... I am sure this could be done
with a loop
but then I would have to move all rules to functions and this would be a
bit bad!!

any suggestions?


Antonio


PS Sorry if I was unclear or this was a FAQ .....


--
Antonio Dell'elce
http://www.dellelce.com/MyHome/
Ph: (IT) +39 347 6761377 (UK) +44 7816 216 963
"Timendi causa est nescire"
Ian Stirling

2005-03-17, 8:55 pm

Antonio Dell'elce <antonio@dellelce.com> wrote:
> Dear all,
>
> I am processing with awk a "command file"
> where I have commands similar to this:
>
> #samplefile
> command_a
> command_b
> command_c
> include filename
> command_d
> command_e
> etc...
> #EOF
>
> the "include" command should read each line from "filename" and "push" it
> to be processed by the set of rules...
> of course I am trying to use getline and/or changing FILENAME variable
> value,
> but my attempts until now have failed... I am sure this could be done
> with a loop
> but then I would have to move all rules to functions and this would be a
> bit bad!!


To switch to reading the file "input" after the first 10 lines of the first
file, you can't do
NR==10{FILENAME="input";nextfile }
You can do
NR==10{ARGV[ARGIND+1]="input";ARGC++;nextfile}

This adds another filename after the current filename.

The problem is that if you need to include files, you lose your place in the
existing one, so you're going to have to have a line like
FNR<=line[FILENAME]{next}
before the 'real' code (with line[FILENAME] holding the last line number
already processed in that file), so that when it gets back to a file it's
seen before, it'll skip the part it has already processed.

This will be ok, if the files are small.
If they are large, then rereading them may be a problem.


Ed Morton

2005-03-17, 8:55 pm



Antonio Dell'elce wrote:

> Dear all,
>
> I am processing with awk a "command file"
> where I have commands similar to this:
>
> #samplefile
> command_a
> command_b
> command_c
> include filename
> command_d
> command_e
> etc...
> #EOF
>
> the "include" command should read each line from "filename" and "push" it
> to be processed by the set of rules...
> of course I am trying to use getline and/or changing FILENAME variable
> value,
> but my attempts until now have failed... I am sure this could be done
> with a loop
> but then I would have to move all rules to functions and this would be a
> bit bad!!
>
> any suggestions?
>
>
> Antonio
>
>
> PS Sorry if I was unclear or this was a FAQ .....
>


I'll take a stab at what I think you might mean if I ignore the fact
that the text in the input file is to be treated as "commands", since I
can't imagine what you mean by that.

If you have 2 files, a.txt and b.txt as follows:

a.txt:
1
2
include b.txt
5
6

b.txt:
3
4

you'd like to be able to write an awk script that just takes a.txt as an
argument, e.g.:

awk '...' a.txt

and outputs:

1
2
3
4
5
6

If that's correct, then I don't think you can do it in a way that
naturally lets you just piggyback on awk's normal text processing loop.
You can do it by parsing both files in the BEGIN section, e.g.:

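awk 'function read(file) {
    # getline returns 1 per line read, 0 at end of file, -1 on error
    while ( (getline < file) > 0) {
        if ($1 == "include") {
            read($2)        # recurse into the included file
        } else {
            i0[++nr]=$0     # save an ordinary line in input order
        }
    }
}
BEGIN{
    read(ARGV[1])
    for (i=1;i<=nr;i++) {
        print i0[i]
    }
}' a.txt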

An alternative would be to create a tmp file in the BEGIN section
instead of creating an array, then you can parse that in the main body,
e.g.:

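awk 'function read(file) {
    while ( (getline < file) > 0) {
        if ($1 == "include") {
            read($2)            # recurse into the included file
        } else {
            print $0 > ARGV[2]  # write the line to the tmp file
        }
    }
}
BEGIN{
    read(ARGV[1])
    ARGV[1]=""      # skip a.txt in the main loop; only tmp gets read
    close(ARGV[2])  # flush tmp before the main loop opens it
}
{print $0}' a.txt tmp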

I made the "{print $0}" explicit just so it was clear where the output's
coming from.
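
A variation on the tmp-file alternative, offered only as a sketch: if a pipe is acceptable, a first awk can expand the includes and a second awk can hold the normal rules, so no rule has to move into a function and no tmp file is needed. The scratch directory, the local names "line" and "f", and the close(file) call are assumptions added for this illustration; the a.txt/b.txt contents are the ones shown above.

```shell
# Work in a scratch directory so the sample files clobber nothing.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 1 2 'include b.txt' 5 6 > a.txt
printf '%s\n' 3 4 > b.txt

# Stage 1 expands "include" lines; stage 2 is where the real rules would go.
expanded=$(
  awk '
    function read(file,    line, f) {
      while ((getline line < file) > 0) {
        split(line, f, " ")          # f[1] is the keyword, f[2] the file name
        if (f[1] == "include")
          read(f[2])                 # recurse into the included file
        else
          print line
      }
      close(file)
    }
    BEGIN { read(ARGV[1]) }
  ' a.txt | awk '{ print $0 }'
)
printf '%s\n' "$expanded"
```

Because stage 1 reads into a local variable rather than $0, it uses only getline, split and close, none of which is a gawk extension.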

Regards,

Ed.
Antonio Dell'elce

2005-03-17, 8:55 pm

Ian Stirling wrote:
> Antonio Dell'elce <antonio@dellelce.com> wrote:
>
>
>
> To switch to reading the file "input" after the first 10 lines of the first
> file, you can't do
> NR==10{FILENAME="input";nextfile }
> You can do
> NR==10{ARGV[ARGIND+1]="input";ARGC++;nextfile}
>
> This adds another filename after the current filename.
>
> The problem is that if you need to include files, you lose your place in the
> existing one, so you're going to have to have a line like
> FNR<=line[FILENAME]{next}
> before the 'real' code (with line[FILENAME] holding the last line number
> already processed in that file), so that when it gets back to a file it's
> seen before, it'll skip the part it has already processed.
>
> This will be ok, if the files are small.
> If they are large, then rereading them may be a problem.
>


Thanks, that seems very close to what I need. However, I hoped for
something "POSIX-compliant", so I would need to avoid nextfile
and ARGIND, which are gawk extensions... however, from reading the gawk
info pages, it appears this could be done...

Antonio



--
Antonio Dell'elce
http://www.dellelce.com/MyHome/
Ph: (IT) +39 347 6761377 (UK) +44 7816 216 963
"Timendi causa est nescire"
Ed Morton

2005-03-23, 3:55 am



Antonio Dell'elce wrote:
> Ed Morton wrote:
>
<snip>
>
>
> Thanks Ed,
>
> Let me include an example input file which may clarify things:
>
> ###SAMPLE FILE
> parser "SampleParser"
> owner "Antonio Dell'elce"
> version "1"
> include "standard_tokens.pdl"


It just wasn't clear to me what you meant by a "command", but it looks
like it's irrelevant to the script so modify my 2 alternative scripts to
strip the double quotes from the filename (gsub("\"","",$2)) and they
should work even for nested includes.
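
The change described above might look like the following sketch, applied to the first script's read() function but printing directly instead of filling an array. The file names main.pdl, tokens.pdl and more.pdl are hypothetical, made up to exercise a nested include, and the close(file) call is an addition. It assumes file names contain no spaces, since $2 is taken after default field splitting.

```shell
# Work in a scratch directory so the sample files clobber nothing.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'parser "SampleParser"' 'include "tokens.pdl"' 'version "1"' > main.pdl
printf '%s\n' 'token "A"' 'include "more.pdl"' > tokens.pdl
printf '%s\n' 'token "B"' > more.pdl

flattened=$(
  awk '
    function read(file) {
      while ((getline < file) > 0) {
        if ($1 == "include") {
          gsub("\"", "", $2)   # include "x.pdl" becomes x.pdl
          read($2)             # nested includes recurse naturally
        } else {
          print $0
        }
      }
      close(file)
    }
    BEGIN { read(ARGV[1]) }
  ' main.pdl
)
printf '%s\n' "$flattened"
```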

Ed.
glen herrmannsfeldt

2005-04-22, 3:56 pm

Ed Morton wrote:
(snip)

> while ( (getline < file) > 0) {


Note that AWK knows what to do with:

while(getline < file > 0)

even though it looks funny.

-- glen

Ed Morton

2005-04-22, 8:55 pm



glen herrmannsfeldt wrote:
> Ed Morton wrote:
> (snip)
>
>
>
> Note that AWK knows what to do with:
>
> while(getline < file > 0)
>
> even though it looks funny.


There are situations where it doesn't. Don't ask me what they are
because I don't remember - just use the parens to be safe or google for
it (or, of course, ignore this post and do it however you like).

All I could find on it at a brief glance was this from the POSIX
standard
(http://www.opengroup.org/onlinepubs...lities/awk.html) which
may or may not be applicable:

-----
The getline operator can form ambiguous constructs when there are
unparenthesized binary operators (including concatenate) to the right of
the '<' (up to the end of the expression containing the getline). The
result of evaluating such a construct is unspecified, and conforming
applications shall parenthesize properly all such usages.
-----

which I think means that this:

getline < file > 0

could be interpreted on some awks as:

getline < (file > 0)

instead of:

(getline < file) > 0
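
A small sketch of the fully parenthesized form, which also shows why the "> 0" test matters: getline returns -1 when the file cannot be opened, so a loop that only tested for a nonzero result would never end. The path below is hypothetical and assumed not to exist.

```shell
result=$(awk 'BEGIN {
  file = "/nonexistent/hypothetical-file"   # assumed not to exist
  rc = (getline line < file)                # -1: the file cannot be opened
  n = 0
  while ((getline line < file) > 0)         # explicit parens, as POSIX asks
    n++
  print rc, n
}')
printf '%s\n' "$result"
```

With a readable file, rc would be 1 for the first line and n would count the remaining lines.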

Regards,

Ed.
Kenny McCormack

2005-04-22, 8:55 pm

In article <8PSdnVG-Udco-_TfRVn-ug@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>
>There are situations where it doesn't. Don't ask me what they are
>because I don't remember - just use the parens to be safe or google for
>it (or, of course, ignore this post and do it however you like).


You are right to be concerned. I find the use of parens with
the redirected I/O commands to be an "always good idea". And I am
speaking as one who usually disdains superfluity (unlike some who argue
the other side - e.g., always use "kill -9" because it always works... (*))

(*) Unixy thing - not directly related to AWK - for which I do apologize.

Compare the results of:

print 5 > 7

print (5 > 7)

print 5 > 7+2
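
For anyone who wants to run the first two of these, here is a sketch in a scratch directory (the directory and the shell variable names are assumptions of this example). The third statement is left out because which file it writes to depends on how a given awk parses the expression after the ">".

```shell
# Work in a scratch directory, because the first command creates a file.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Unparenthesized: ">" is an output redirection, so nothing reaches stdout
# and the text 5 is written to a file literally named 7.
out_plain=$(awk 'BEGIN { print 5 > 7 }')
file_contents=$(cat 7)

# Parenthesized: ">" is a comparison; 5 > 7 is false, so awk prints 0.
out_paren=$(awk 'BEGIN { print (5 > 7) }')

printf 'stdout=[%s] file7=[%s] paren=[%s]\n' "$out_plain" "$file_contents" "$out_paren"
```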
