running gawk
I have an ASCII file with the following format:
start: record 1
head1: fjoijefowijfwoijf
head2: fiwjowiefojwf
head3: fwofjwfoiwfoj
headx: woifjowjwioef
end
name: blb abl bla blb fjie j
address: fwoijwe fwlkjwefj
phone: wfjowejf wfjw ofi
cell: ifejw foiw jfowi jeoi
value: fi woiw fowiej owefj
start: record 2 etc...
here's my gawk code:
BEGIN { RS="(start.*end)*" }
{
print "---\n"$0"\n===";
}
for those not familiar w/ gawk, you can use a full regex for the
record separator.
I have a long RS because 1) I don't care about that data, & 2) there
is a variable amount of data there.
the problem I'm having is this:
the regex, as I'm using it, matches the "start" at the beginning of the
FILE, and the "end" at the END of the FILE.
I therefore only get 2 records printed.
I want to see all the records - I need my regex to match EVERY
occurrence of the start...end "string".
any ideas?
tia - Bob
["Followup-To:" header set to comp.lang.awk.]
On Tue, 02 Mar 2004 16:34:45 -0600, Bob
<nospam_nsh@starnetwx.net> wrote:
>
> here's my gawk code:
> BEGIN { RS="(start.*end)*" }
> {
> print "---\n"$0"\n===";
> }
>
>
> for those not familiar w/ gawk, you can use a full regex for the
> record separator.
>
> I have a long RS because 1) I don't care about that data, & 2) there
> is a variable amount of data there.
>
> the problem I'm having is this:
> the regex, as I'm using it, matches the "start" at the beginning of the
> FILE, and the "end" at the END of the FILE.
>
> I therefore only get 2 records printed.
>
> I want to see all the records - I need my regex to match EVERY
> occurrence of the start...end "string".
>
RS="start[^e]*end"
--
Incrsease your earoning poswer and gaerner profwessional resspect.
Get the Un1iversity Dewgree you have already earned.
[from the prestigious, non-accredited University of Spam!]
On Wed, 3 Mar 2004 02:53:30 -0500, Bill Marcum
<bmarcum@iglou.com.urgent> wrote:
> ["Followup-To:" header set to comp.lang.awk.]
> On Tue, 02 Mar 2004 16:34:45 -0600, Bob
> <nospam_nsh@starnetwx.net> wrote:
> RS="start[^e]*end"
Bill - Tera-thanks!
that did the trick. Another question though: as I was playing around
with other permutations of your RE, trying to understand why your RE
worked and mine didn't, I discovered another strange thing.
I THOUGHT that:
"start.*end" == "start[.]*end"
I found, in fact, that each of these RS regexes produced vastly different
results. I suppose that to understand why my original RE didn't work,
and yours did, I should re-read the order of precedence for gawk; but
in my last example, I can't imagine why the 2 REs shouldn't be the
same.
can you shed any light?
tx again ia!!!
Bob
On Wed, 03 Mar 2004 06:26:33 -0600, Bob
<nospam_nsh@starnetwx.net> wrote:
>
> Bill - Tera-thanks!
>
> that did the trick. Another question though; as I was playing around
> with other permutations of your RE, trying to gain understanding as to
> why your RE worked, and mine didn't; I discovered another strange
> thing.
>
> I THOUGHT that:
> "start.*end" == "start[.]*end"
OH MY GOD - what the hell was I thinking!!!
sorry to bother - I just released my brain fart..... ;-)
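For anyone reading along who hits the same mix-up: outside brackets "." matches any character, while "[.]" is a bracket expression that matches only a literal dot. A minimal sketch of the difference follows; the test strings are made up for illustration, and it assumes an awk is on the PATH.

```shell
# Compare "." with "[.]" on two made-up strings.
# Each print emits 1 when the regex matches and 0 when it does not.
dots=$(awk 'BEGIN {
  # "." outside brackets matches any character
  print ("start: abc end" ~ /start.*end/)
  # "[.]" holds only a literal dot, so ": abc " cannot be matched
  print ("start: abc end" ~ /start[.]*end/)
  # a run of literal dots is the only filler "[.]*" accepts
  print ("start...end" ~ /start[.]*end/)
}')

printf '%s\n' "$dots"
```

So as a record separator, "start[.]*end" only matches when nothing but dots sits between the two words, which is why the two regexes give such different results.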
On Wed, 03 Mar 2004 06:26:33 -0600, Bob
<nospam_nsh@starnetwx.net> wrote:
>
> Bill - Tera-thanks!
>
> that did the trick. Another question though; as I was playing around
> with other permutations of your RE, trying to gain understanding as to
> why your RE worked, and mine didn't; I discovered another strange
> thing.
>
> I THOUGHT that:
> "start.*end" == "start[.]*end"
>
> I found, in fact, that each of these RS regexes produced vastly different
> results. I suppose that to understand why my original RE didn't work,
> and yours did, I should re-read the order of precedence for gawk; but
> in my last example, I can't imagine why the 2 REs shouldn't be the
> same.
>
> can you shed any light?
>
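Regular expressions like "a.*b" are greedy; as the expression is
evaluated from left to right, each "*" matches the longest possible
string. In awk, "." also matches a newline, so "start.*end" runs from
the first "start" to the last "end" in the file.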
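The greedy behaviour can be reproduced with a small sketch. The sample data is made up for illustration, and it assumes the awk on the PATH is gawk or another awk that accepts a regex as RS.

```shell
# Two start...end blocks, each followed by a "name:" line we want to keep.
# The sample data is made up for illustration.
input=$(printf '%s\n' \
  'start: record 1' 'head1: aaa' 'end' 'name: alice' \
  'start: record 2' 'head1: bbb' 'end' 'name: bob')

# Greedy separator: a single match swallows everything from the first
# "start" to the last "end", so only the text after the final "end"
# is left as a record with any data in it.
greedy=$(printf '%s\n' "$input" | awk '
  BEGIN { RS = "start.*end" }
  match($0, /name: [a-z]+/) { print substr($0, RSTART, RLENGTH) }
')

printf 'names that survived: %s\n' "$greedy"
```

With this sample, "name: alice" should be lost inside the one long separator match and only "name: bob" should survive, which is the same symptom as in the original post.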
Actually, my "start[^e]*end" might not work if the letter "e" appears
between "start" and "end". A better solution might be
BEGIN{RS="end"}
{sub(/start.*/,"")}
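Filled out into a complete run, the RS="end" approach might look like the sketch below. The sample data, the trimming of blank edges and the skipping of empty records are additions for illustration and not part of the suggestion above; the print statement is the one from the original post. It assumes an awk that accepts a multi-character RS, such as gawk.

```shell
# Same made-up layout as the original post: a header block between
# "start" and "end", followed by the data we actually want.
input=$(printf '%s\n' \
  'start: record 1' 'head1: aaa' 'end' 'name: alice' \
  'start: record 2' 'head1: bbb' 'end' 'name: bob')

out=$(printf '%s\n' "$input" | awk '
  BEGIN { RS = "end" }                 # split at each "end" marker
  {
    sub(/start.*/, "")                 # drop the next header block from the tail
    sub(/^\n+/, ""); sub(/\n+$/, "")   # trim leftover newlines at both edges
    if ($0 != "") print "---\n" $0 "\n==="   # skip records that are now empty
  }
')

printf '%s\n' "$out"
```

One thing to watch for: RS="end" also splits on the letters "end" inside a longer word, so if the real data can contain such words, a stricter separator like RS="\nend\n" may be safer, assuming "end" always sits on a line of its own.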