regex troubles with awk
Eric Belhomme

2006-03-02, 9:55 pm

Hi,

I'm trying to use mawk on Debian Sarge (Linux). The goal of my awk script
is to extract certain addresses from my /etc/hosts file. To do this, I
labelled the lines I want to extract in the hosts file like this:

#awk#
123.123.123.123 host.somewhere.net host

So I tried a regex like this: '/^#awk#.*\d+3'
The idea is to detect a line that begins with the string "#awk#" where the
next line begins with a number composed of 1 to 3 digits.

But my regex doesn't match anything!
I'm not a regex guru, and I'm almost a total newbie at awk, so I suspect
I missed something, but I can't work out what :-/

Thanks for your help ;)

--
Rico
ddaglas@gmail.com

2006-03-02, 9:55 pm

Eric,

You'll need to combine the regexp with "getline":

/^#awk$/ {
    getline
    if (/^\d\d?\d?/) {
        #ACTION#
    }
}

Awk matches a regexp against one record at a time, and the record
separator is not part of the record, so a single pattern cannot span two
lines. By default the record separator (RS) is the newline character (\n).

getline reads the next line of input into $0 (resetting $1, $2, etc.)
and returns 1 if successful, 0 at end of input, and -1 on error.
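
For reference, here is a minimal sketch of the getline approach as it could be run from a shell. It assumes the marker line is exactly "#awk#" as in the question, uses [0-9] rather than \d, and checks getline's return value before testing the new $0. The sample input and the choice to print the first field are illustrative only, not part of the original question.

```sh
# Sample input modeled on the question; the last line is deliberately unlabelled.
hosts='#awk#
123.123.123.123 host.somewhere.net host
10.0.0.1 other.somewhere.net other'

out=$(printf '%s\n' "$hosts" | awk '
/^#awk#$/ {
    # getline returns 1 on success, 0 at end of input, -1 on error
    if ((getline) > 0 && /^[0-9][0-9]?[0-9]?[^0-9]/)
        print $1
}')

echo "$out"
```

Only the address on the line directly after the marker should be printed.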

Dan



Harlan Grove

2006-03-02, 9:55 pm

ddaglas@gmail.com wrote...
....
>You'll need to combine the regexp with "getline":
>
>/^#awk$/ {
> getline
> if (/^\d\d?\d?/) {
> #ACTION#
> }
>}

....

*could*, not *need*. Also, no version of awk I know of supports the
perl-like character class \d. Gotta use [0-9] or [[:digit:]]. Also, may
need to check that the 4th char of the second line isn't also a decimal
numeral.
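
A quick way to see what such a pattern accepts is to feed it a few test lines. This is only a sketch: the four input lines are made up for illustration, and the pattern is "one to three digits followed by something that is not a digit".

```sh
out=$(printf '%s\n' \
    '123.123.123.123 a' \
    '1.2.3.4 b' \
    '1234.5.6.7 c' \
    'localhost' | awk '
# one to three digits, then a character that is not a digit
/^[0-9][0-9]?[0-9]?[^0-9]/ { print $1 }')

echo "$out"
```

The first two lines should pass; the one starting with four digits and the one starting with a letter should not.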

One alternative would be

/^#awk#[ \t]*$/ { s = 1; next }
s && /^[0-9][0-9]?[0-9]?[^0-9]/ { . . . do your stuff here . . . }

Ed Morton

2006-03-03, 3:55 am



ddaglas@gmail.com wrote:
> Eric,

Please don't top-post.

> You'll need to combine the regexp with "getline":


No you won't. getline is usually the wrong solution. Search the archives
for why...

> /^#awk$/ {
> getline
> if (/^\d\d?\d?/) {
> #ACTION#
> }
> }


All you need to do is set a flag when you hit the #awk record then check
for that with your other RE on the subsequent record, e.g. (retaining as
much of the above code as possible for comparison):

f && /^\d\d?\d?/ { #ACTION# }
{ f = ($0 ~ /^#awk$/) }
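
A runnable sketch of the same flag idiom follows. It assumes the marker is the full string "#awk#" from the original question and swaps \d for [0-9]; the sample hosts data and the print action are placeholders for illustration.

```sh
hosts='127.0.0.1 localhost
#awk#
123.123.123.123 host.somewhere.net host
10.0.0.1 other.somewhere.net other
#awk#
# a comment, not an address
192.168.1.1 gw.somewhere.net gw'

out=$(printf '%s\n' "$hosts" | awk '
# f is true only when the previous record was the marker
f && /^[0-9][0-9]?[0-9]?[^0-9]/ { print $1 }
# set or clear the flag on every record
{ f = ($0 ~ /^#awk#$/) }')

echo "$out"
```

Because f is recomputed on every record, a marker followed by something other than an address simply prints nothing.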

Regards,

Ed.

> Awk matches a regexp against one record at a time, and the record
> separator is not part of the record, so a single pattern cannot span two
> lines. By default the record separator (RS) is the newline character (\n).
>
> getline reads the next line of input into $0 (resetting $1, $2, etc.)
> and returns 1 if successful, 0 at end of input, and -1 on error.
>
> Dan

Ed Morton

2006-03-03, 3:55 am

Harlan Grove wrote:

> ddaglas@gmail.com wrote...
> ...
>
> *could*, not *need*. Also, no version of awk I know of supports the
> perl-like character class \d. Gotta use [0-9] or [[:digit:]]. Also, may
> need to check that the 4th char of the second line isn't also a decimal
> numeral.
>
> One alternative would be
>
> /^#awk#[ \t]*$/ { s = 1; next }
> s && /^[0-9][0-9]?[0-9]?[^0-9]/ { . . . do your stuff here . . . }
>


Just remember to reset s afterwards...
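
To illustrate, here is a sketch comparing the two-rule version with and without a reset. The extra rule "{ s = 0 }" is one possible way to do the reset, not necessarily what anyone in the thread had in mind, and the sample data is made up.

```sh
hosts='127.0.0.1 localhost
#awk#
123.123.123.123 host.somewhere.net host
10.0.0.1 other.somewhere.net other'

# Without a reset, s stays set after the first marker.
without_reset=$(printf '%s\n' "$hosts" | awk '
/^#awk#[ \t]*$/ { s = 1; next }
s && /^[0-9][0-9]?[0-9]?[^0-9]/ { print $1 }')

# With a reset, s is cleared after any record that is not a marker.
with_reset=$(printf '%s\n' "$hosts" | awk '
/^#awk#[ \t]*$/ { s = 1; next }
s && /^[0-9][0-9]?[0-9]?[^0-9]/ { print $1 }
{ s = 0 }')

echo "without reset:"; echo "$without_reset"
echo "with reset:";    echo "$with_reset"
```

Without the reset, every address after the first marker is printed; with it, only the line directly after a marker is.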

Ed.

martin cohen

2006-03-03, 6:56 pm

Ed Morton wrote:

> ddaglas@gmail.com wrote:
>
> Please don't top-post.
>
>
> No you won't. getline is usually the wrong solution. Search the archives
> for why...
>
>
> All you need to do is set a flag when you hit the #awk record then check
> for that with your other RE on the subsequent record, e.g. (retaining as
> much of the above code as possible for comparison):
>
> f && /^\d\d?\d?/ { #ACTION# }
> { f = ($0 ~ /^#awk$/) }
>
> Regards,
>
> Ed.
>

For this case, I prefer the getline solution. It seems simpler (and less
error-prone) to just read the line and process it. The only problem
would be the case where the next line is not what is expected.
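
A small sketch of that unexpected-next-line case, using made-up input where the marker appears twice in a row. Both scripts assume the "#awk#" marker from the question and use [0-9] in place of \d; whether the difference matters depends on the real data.

```sh
# Hypothetical input: two markers in a row, then the address.
input='#awk#
#awk#
1.2.3.4 host.somewhere.net host'

# getline version: the second marker is consumed by getline and is never
# tested against the marker pattern, so the address line is skipped.
out_getline=$(printf '%s\n' "$input" | awk '
/^#awk#$/ {
    if ((getline) > 0 && /^[0-9][0-9]?[0-9]?[^0-9]/)
        print $1
}')

# flag version: every record passes through the same rules.
out_flag=$(printf '%s\n' "$input" | awk '
f && /^[0-9][0-9]?[0-9]?[^0-9]/ { print $1 }
{ f = ($0 ~ /^#awk#$/) }')

echo "getline: [$out_getline]"
echo "flag:    [$out_flag]"
```

Here the getline version prints nothing, while the flag version still finds the address.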

Martin Cohen

Ed Morton

2006-03-03, 6:56 pm

martin cohen wrote:

> Ed Morton wrote:
>
> For this case, I prefer the getline solution. It seems simpler (and less
> error-prone) to just read the line and process it. The only problem
> would be the case where the next line is not what is expected.


That wouldn't be the only problem, but I've been posting way too much on
this topic in the past week or so, so I'm out of enthusiasm. It's in the
archives...

Ed.

Grant

2006-03-03, 6:56 pm

On Fri, 03 Mar 2006 14:07:14 -0600, Ed Morton <morton@lsupcaemnt.com> wrote:

[getline]
> I've been posting way too much on
> this topic in the past week or so, so I'm out of enthusiasm.


Thank you for your time spent on getline last week. I've learnt much
about awk by asking and doing. There's that part of the learning
curve on a new language where some answers don't make sense for a time.

Persistence in working through suggestions with one's own data, and the
penny drops.

Grant.
--
Cats are smarter than dogs. You can't make eight cats pull
a sled through the snow.

Copyright 2008 codecomments.com