Awk multi record at a time......is it possible..!!!??
Hesham Elhadad 2004-05-17, 10:30 am
Hi there,
Is there a way to read more than one record at a time with AWK?
I need to process the previous record depending on a condition on the next
one.
What I have done so far is:
I read the whole file first and use arrays to process all the
records once they have been read.
I know that sed can handle multiple lines at a time, but I do not want to
embed it within my awk script.
Any ideas other than mine?
Regards,
Hesham Elhadad
Ed Morton 2004-05-17, 10:30 am
Hesham Elhadad wrote:
> Hi there,
> Is there a way to read more than one record at a time with AWK?
No. You can redefine what separates records by explicitly defining RS,
but I suspect you can't do that in your case.
> I need to process the previous record depending on a condition on the next
> one.
> What I have done so far is:
> I read the whole file first and use arrays to process all the
> records once they have been read.
You don't need to do that and you can run out of space trying to. Just
do something like this:
awk '{ this = $0 }
/condition/ {
$0 = prev
# do stuff with $0, which is now the previous line
}
{ prev = this }'
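A runnable sketch of the same pattern, assuming (purely for illustration) that the condition is "the current line starts with TYPE" and that the "stuff" is simply printing the previous line. The function name print_before_type is hypothetical:

```shell
# print_before_type is a hypothetical helper name used only for this sketch.
print_before_type() {
  awk '{ this = $0 }          # save the current line before $0 is overwritten
       /^TYPE/ {              # assumed condition, tested on the current line
           $0 = prev          # $0, $1, $2 ... now refer to the previous line
           print "previous:", $0
       }
       { prev = this }'       # remember the current line for the next record
}

printf 'NAME = a\nNAME = b\nTYPE = CIN\nEND\n' | print_before_type
```

Assigning to $0 re-splits the fields, so inside that action $1, $2 and so on describe the previous line, while `this` still holds the current one for the final rule.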
Regards,
Ed.
Tapani Tarvainen 2004-05-17, 10:30 am
helhadad@soficom.com.eg (Hesham Elhadad) writes:
> Is there a way to read more than one record at a time with AWK?
Of course. Several ways, in fact.
> I need to process the previous record depending on a condition on the next
> one.
NR>1 && condition { process(previous_line); }
{ previous_line=$0; }
where condition and process are whatever you want.
E.g., to print a line only if the following line contains "string":
NR>1 && /string/ { print previous_line; }
{ previous_line=$0; }
Watch out for the last line, you may need an END block for it.
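To illustrate why the END block can matter, here is a sketch of a variant task (an assumption, not from the post): drop any line whose following line contains "string". Each line is printed only once its successor has been seen, so the final line must be flushed in END. The helper name is hypothetical:

```shell
# drop_line_before_string is a hypothetical helper name for this sketch.
drop_line_before_string() {
  awk 'NR > 1 && !/string/ { print previous_line }  # current line lacks string, so keep the previous one
       { previous_line = $0 }
       END { if (NR > 0) print previous_line }'      # the last line has no successor
}

printf 'a\nb\nstring here\nc\n' | drop_line_before_string
```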
Another approach is using getline, but from your description
you don't need it here.
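The getline approach is not shown in the post; a minimal sketch of it for the same "print a line if the following line contains string" task might look like this (helper name hypothetical). Because the action for the first record reads all remaining input itself, this is for illustration rather than a recommendation:

```shell
# print_if_next_matches is a hypothetical helper name for this sketch.
print_if_next_matches() {
  awk '{
         cur = $0
         # (getline nxt) reads the next record into nxt and updates NR,
         # leaving $0 untouched; it returns 1 on success, 0 at end of input
         # and -1 on error.
         while ((getline nxt) > 0) {
             if (nxt ~ /string/) print cur
             cur = nxt
         }
       }'
}

printf 'a\nb\nstring\nc\nstring2\n' | print_if_next_matches
```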
--
Tapani Tarvainen
Hesham Elhadad 2004-05-18, 6:30 am
Dear All,
I have tried what you suggested; unfortunately, it did not work as I
wanted.
To give a complete idea of what I want to do, it is as
follows:
I have a file in a format like this:
NAME = 3A5_CHEM_INJ:30PXS01
NAME = 3A5_CHEM_INJ:30PXV01
NAME = 3A5_CHEM_INJ:30PXA01
TYPE = CIN
PERIOD = 3
PHASE = 1
IOMOPT = 0
END
NAME = 3A5_CHEM_INJ:30PXA02
TYPE = AIN
PERIOD = 3
PHASE = 1
IOMOPT = 0
END
NAME = 3A5_CHEM_INJ:30PXAB1
NAME = 4A5_CHEM_INJ:30PXAC1
NAME = 6A5_CHEM_INJ:30PXAD1
NAME = 2A5_CHEM_INJ:30PXAE1
TYPE = CIN
PERIOD = 2
PHASE = 5
IOMOPT = 1
END
Now what I need to do with this file is to discard all the repeated
NAME lines, keeping only the one that is not followed by another NAME line,
to get a file like this:
NAME = 3A5_CHEM_INJ:30PXA01
TYPE = CIN
PERIOD = 3
PHASE = 1
IOMOPT = 0
END
NAME = 3A5_CHEM_INJ:30PXA02
TYPE = AIN
PERIOD = 3
PHASE = 1
IOMOPT = 0
END
NAME = 2A5_CHEM_INJ:30PXAE1
TYPE = CIN
PERIOD = 2
PHASE = 5
IOMOPT = 1
END
Actually, I have done this by using arrays and reading the whole file
first; however, I may have very long files like this (some of them are
about 20,000 lines), which take a long time to process, not to mention
the lack of memory I may face.
Now that the complete process is clear, do you have an idea for completing
this without arrays?
Regards,
Hesham Elhadad
Hesham Elhadad 2004-05-18, 9:30 am
Hi again,
I have done the following but still have some problems I don't understand.
My script is now like this:
NR > 1 {this = $0}
/^NAME/ {
$0 = prev
if ($1 ~ /^NAME/)
;
else
print prev
}
!/^NAME/ {print prev}
{prev = this }
The problems I got are as follows:
1- I got 4 blank lines at the head of the file (when I redirect the output)
2- I got a duplicate of the END line as follows:
If the input file is like this:
NAME = 3A5_CHEM_INJ:30PXA01
NAME = 3A5_CHEM_INJ:30PXA05
TYPE = CIN
PERIOD = 3
PHASE = 1
IOMOPT = 0
END
NAME = 3A5_CHEM_INJ:30PXA02
TYPE = AIN
PERIOD = 3
PHASE = 1
IOMOPT = 0
END
NAME = 3A5_CHEM_INJ:30PXAB1
NAME = 4A5_CHEM_INJ:30PXAC1
NAME = 6A5_CHEM_INJ:30PXAD1
NAME = 2A5_CHEM_INJ:30PXAE1
TYPE = CIN
PERIOD = 2
PHASE = 5
IOMOPT = 1
END
The output looks like this:
Blank
Blank
Blank
Blank
NAME = 3A5_CHEM_INJ:30PXA05
TYPE = CIN
PERIOD = 3
PHASE = 1
IOMOPT = 0
END
END
NAME = 3A5_CHEM_INJ:30PXA02
TYPE = AIN
PERIOD = 3
PHASE = 1
IOMOPT = 0
END
END
NAME = 2A5_CHEM_INJ:30PXAE1
TYPE = CIN
PERIOD = 2
PHASE = 5
IOMOPT = 1
And of course there is no END at the end of the file (that is not a problem).
Could you tell me where the fault in my script is?
Thanks a lot
Hesham Elhadad
Ed Morton 2004-05-18, 10:30 am
Hesham Elhadad wrote:
> Dear All,
<snip>
> Now that the complete process is clear, do you have an idea for completing
> this without arrays?
Sure.
awk '$1 == "NAME" { name = $0 "\n" ; next }
{ printf "%s",name; name=""; print }'
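How this works: each NAME line overwrites the saved name (with its newline already attached) and skips printing; the first non-NAME line prints and clears it, so only the NAME directly before TYPE survives. A sketch wrapping the one-liner in a hypothetical shell function and running it on a shortened version of the sample:

```shell
# keep_last_name wraps the one-liner above; the function name is hypothetical.
keep_last_name() {
  awk '$1 == "NAME" { name = $0 "\n"; next }      # remember only the latest NAME line
       { printf "%s", name; name = ""; print }'   # flush it before the next non-NAME line
}

printf 'NAME = a\nNAME = b\nTYPE = CIN\nEND\nNAME = c\nTYPE = AIN\nEND\n' | keep_last_name
```

This assumes every group of NAME lines is followed by at least one non-NAME line; NAME lines at the very end of the input would be silently dropped.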
Regards,
Ed.
Ed Morton 2004-05-18, 11:30 am
Hesham Elhadad wrote:
> Hi again,
> I have done the following but still have some problems I don't understand.
>
> My script is now like this:
This will always skip saving the first line:
> NR > 1 {this = $0}
> /^NAME/ {
> $0 = prev
You could've written this as:
if ($1 !~ /^NAME/)
print prev
or even:
if ($1 != "NAME")
print prev
and avoided the null statement and the else:
> if ($1 ~ /^NAME/)
> ;
> else
> print prev
> }
Above, you tested for $1 not being NAME and in that situation printed
"prev". Below, you again test for the record not starting with NAME and again
print prev. That's why you get duplication. Also note that at lines 1 and 2,
prev is null, so the combination of these 2 prints is causing your leading 4
blank lines:
> !/^NAME/ {print prev}
> {prev = this }
>
> The problems I got are as follows:
> 1- I got 4 blank lines at the head of the file (when I redirect the output)
> 2- I got a duplicate of the END line as follows:
>
> If the input file is like this:
> NAME = 3A5_CHEM_INJ:30PXA01
> NAME = 3A5_CHEM_INJ:30PXA05
> TYPE = CIN
> PERIOD = 3
> PHASE = 1
> IOMOPT = 0
> END
<snip>
> NAME = 2A5_CHEM_INJ:30PXAE1
> TYPE = CIN
> PERIOD = 2
> PHASE = 5
> IOMOPT = 1
> END
>
> The output comes like this:
>
> Blank
> Blank
> Blank
> Blank
> NAME = 3A5_CHEM_INJ:30PXA05
> TYPE = CIN
> PERIOD = 3
> PHASE = 1
> IOMOPT = 0
> END
> END
<snip>
> NAME = 2A5_CHEM_INJ:30PXAE1
> TYPE = CIN
> PERIOD = 2
> PHASE = 5
> IOMOPT = 1
> And of course there is no END at the end of the file (that is not a problem).
Right, you're always printing preceding lines; if you want to handle
the last line using this style of script, you'll need an END section in
your script.
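Following the advice above, a sketch of the previous-line approach with the duplicate print removed and an END section added (the helper name is hypothetical). It prints the previous line unless both it and the current line are NAME lines, then flushes the last line in END:

```shell
# keep_last_name_prev is a hypothetical helper name for this sketch.
keep_last_name_prev() {
  awk 'NR > 1 && !(prev ~ /^NAME/ && /^NAME/) { print prev }  # skip a NAME followed by NAME
       { prev = $0 }                                           # save the line for the next record
       END { if (NR > 0) print prev }'                         # flush the final line, e.g. END
}

printf 'NAME = a\nNAME = b\nTYPE = CIN\nEND\n' | keep_last_name_prev
```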
> Could you tell me where the fault in my script is?
Take another look at the pattern I originally suggested following and
you'll see that what you posted here doesn't follow that pattern. Given
the additional information you provided in your earlier post today
though, you don't need to get this complicated anyway.
Ed.
Charles Demas 2004-05-18, 2:30 pm
In article <DYydnYO1w606jTfdRVn-jg@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>Hesham Elhadad wrote:
>
>Sure.
>awk '$1 == "NAME" { name = $0 "\n" ; next }
> { printf "%s",name; name=""; print }'
>
awk '$1 == "NAME" { name = $0 ; next }
{ print name; name=""; print }'
Is more straightforward, IMO.
Chuck Demas
--
Eat Healthy | _ _ | Nothing would be done at all,
Stay Fit | @ @ | If a man waited to do it so well,
Die Anyway | v | That no one could find fault with it.
demas@theworld.com | \___/ | http://world.std.com/~cpd
Ed Morton 2004-05-18, 3:30 pm
Charles Demas wrote:
> In article <DYydnYO1w606jTfdRVn-jg@comcast.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
<snip>
>
>
> awk '$1 == "NAME" { name = $0 ; next }
> { print name; name=""; print }'
>
> Is more straightforward, IMO.
Yes, but it doesn't quite work: it'll print an unwanted blank line
before every non-NAME line except the one immediately following a NAME line.
Ed.
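A sketch (function name hypothetical) that keeps the print-based style of the shorter version but prints name only when one has actually been saved, which avoids the stray blank lines:

```shell
# keep_last_name_print is a hypothetical helper name for this sketch.
keep_last_name_print() {
  awk '$1 == "NAME" { name = $0; next }
       {
           if (name != "") print name   # print the saved NAME only once
           name = ""
           print
       }'
}

printf 'NAME = a\nNAME = b\nTYPE = CIN\nEND\n' | keep_last_name_print
```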
Donald 'Paddy' McCarthy 2004-05-18, 4:30 pm
Hesham Elhadad wrote:
<snip>
The following works:
gawk '/^NAME /{n=$0;next}/^ TYPE /{print n} {print}'
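Note that /^ TYPE / only matches a TYPE line that begins with a single space, so it presumably relies on indentation in the real file that the archived text does not show; on the unindented sample as displayed here it would match nothing, and every NAME line would be dropped. Assuming unindented input, a sketch of the same idea (helper name hypothetical):

```shell
# keep_name_before_type is a hypothetical helper name; it assumes TYPE and
# NAME lines start in column 1.
keep_name_before_type() {
  awk '/^NAME / { n = $0; next }   # remember the latest NAME line
       /^TYPE / { print n }        # emit it just before the TYPE line
       { print }'                  # print every non-NAME line
}

printf 'NAME = a\nNAME = b\nTYPE = CIN\nEND\n' | keep_name_before_type
```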
Cheers, Paddy.
Hesham Elhadad 2004-05-19, 5:30 am
Dear All,
Thank you so much; it really works very well and very fast too.
It is a little bit complicated, but it is better than using arrays.
Regards,
Hesham Elhadad
Martin Neitzel 2004-05-20, 2:30 pm
Donald 'Paddy' McCarthy <paddy3118@netscape.net> wrote:
>The following works:
> gawk '/^NAME /{n=$0;next}/^ TYPE /{print n} {print}'
Once again, redefining RS to deal with multiline records is
highly under-appreciated.
awk -v RS=NAME /TYPE/
does the essential job. Rather than just repairing the initial NAME
string, one would typically do further post-processing on the selected
items.
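With RS=NAME, each record is the text between two occurrences of NAME, so the records containing TYPE are exactly the entries to keep. As a sketch of the kind of post-processing mentioned above, the following prints only the kept names (helper name hypothetical). Two assumptions: a multi-character RS is an extension (gawk and mawk treat it as a regular expression, while POSIX leaves the result unspecified), and the string NAME must not appear anywhere else in the data:

```shell
# kept_names is a hypothetical helper name. Assumes an awk that accepts a
# multi-character RS (gawk, mawk); POSIX leaves that behaviour unspecified.
kept_names() {
  # Each record starts just after a NAME, so $1 is "=" and $2 is the name.
  # The default FS also splits on newlines, so the TYPE line does not
  # interfere with $2.
  awk -v RS=NAME '/TYPE/ { print $2 }'
}

printf 'NAME = a\nNAME = b\nTYPE = CIN\nEND\nNAME = c\nTYPE = AIN\nEND\n' | kept_names
```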
Martin Neitzel
Donald 'Paddy' McCarthy 2004-05-20, 4:30 pm
Martin Neitzel wrote:
> Donald 'Paddy' McCarthy <paddy3118@netscape.net> wrote:
> Once again, redefining RS to deal with multiline records is
> highly under-appreciated.
>
> awk -v RS=NAME /TYPE/
>
> does the essential job. Rather than just repairing the initial NAME
> string, one would typically do further post-processing on the selected
> items.
>
> Martin Neitzel
Hi,
Your solution is incomplete, so I fleshed it out and found I needed the
following to make it work:
awk -v RS=NAME '/TYPE/{printf"%s",RS $0}'
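A sketch exercising this version (helper name hypothetical). Each record still ends with the newline that preceded the next NAME, so printf "%s" is used instead of print, which would add a second newline; prepending RS restores the NAME consumed by the split. This only works because RS is a literal string here; with a regular-expression RS, the variable would not hold the text actually matched:

```shell
# restore_name_records is a hypothetical helper name; needs an awk with
# multi-character RS support (for example gawk or mawk).
restore_name_records() {
  awk -v RS=NAME '/TYPE/ { printf "%s", RS $0 }'   # put the consumed NAME back
}

printf 'NAME = a\nNAME = b\nTYPE = CIN\nEND\nNAME = c\nTYPE = AIN\nEND\n' | restore_name_records
```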
Thanks for the reminder about changing RS (it's one of the last
things that I think of).
I was going to add something about how there is very little difference
between the solutions but, then again, I like your solution of defining the
record and only printing the records that have the guts you want.
The small negative is that you need the record separator in the output
so it must be added in again.
I guess I am used to having record separators vanish.
Thanks for your reminder on RS,
Paddy.
Martin Neitzel 2004-05-21, 1:30 pm
Donald 'Paddy' McCarthy <paddy3118@netscape.net> wrote:
>
>The small negative is that you need the record separator in the output
>so it must be added in again.
I don't need it. You don't need it. The issue here is whether Hesham
Elhadad needs it. If so, he can patch RS back just like you wrote, true.
If not, awk's default action to print allows a solution which is much
more elegant than the Pascaleeze solution we saw elsewhere in this thread.
Also worthy of a reminder whenever the subject asks "multiline?" is
the setting of RS to an empty string (as in "awk -v RS= ....")
to turn paragraphs into records.
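A minimal sketch of paragraph mode, assuming hypothetical input in which entries are separated by one or more blank lines (the file in this thread is not laid out that way). With RS set to the empty string, each blank-line-separated block is one record and newline always acts as a field separator, so $3 is the value on the NAME line (helper name hypothetical):

```shell
# cin_names is a hypothetical helper; its input is assumed to use blank
# lines between entries.
cin_names() {
  awk -v RS= '/TYPE = CIN/ { print $3 }'   # $1 is NAME, $2 is "=", $3 the name
}

printf 'NAME = a\nTYPE = CIN\n\nNAME = b\nTYPE = AIN\n\n\nNAME = c\nTYPE = CIN\n' | cin_names
```

Runs of several blank lines count as a single separator in this mode, which is why the double blank line before the third entry causes no empty record.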
Martin
Wayne Brissette 2004-06-09, 8:55 am
On Thu, 20 May 2004 12:01:58 -0500, Martin Neitzel wrote
(in article <Hy0vBA.BpF@gaertner.de> ):
> Donald 'Paddy' McCarthy <paddy3118@netscape.net> wrote:
>
> Once again, redefining RS to deal with multiline records is
> highly under-appreciated.
>
> awk -v RS=NAME /TYPE/
>
> does the essential job. Rather than just repairing the initial NAME
> string, one would typically do further post-processing on the selected
> items.
>
> Martin Neitzel
Can RS be changed within a script? In other words, if I'm looking for a chunk
of text that starts with <blah> and I parse some data out, can I then change
the record separator to find <myblah> which may also be embedded within a
record? (record within a record)...
I'm working on a script that contains threaded messages and that's how they
are done. I've been trying the record separator option, but it looks to only
handle one at a time within a script.
Wayne
Donald 'Paddy' McCarthy 2004-06-11, 3:55 am
Wayne Brissette wrote:
> On Thu, 20 May 2004 12:01:58 -0500, Martin Neitzel wrote
> (in article <Hy0vBA.BpF@gaertner.de> ):
> Can RS be changed within a script? In other words, if I'm looking for a chunk
> of text that starts with <blah> and I parse some data out, can I then change
> the record separator to find <myblah> which may also be embedded within a
> record? (record within a record)...
>
> I'm working on a script that contains threaded messages and that's how they
> are done. I've been trying the record separator option, but it looks to only
> handle one at a time within a script.
Can you give a (small) sample input and what you expect the
corresponding output to be?
- Pad.