regex troubles with awk
| Eric Belhomme 2006-03-02, 9:55 pm |
| Hi,
I'm trying to use mawk on Debian Sarge (Linux). The goal of my awk script
is to extract certain addresses from my /etc/hosts file. To do this, I
labelled the lines I want to extract in the hosts file like this:
#awk#
123.123.123.123 host.somewhere.net host
So I tried a regex like this: '/^#awk#.*\d+3'
The idea is to detect a line that begins with the string "#awk#" where the
next line begins with a number composed of 1 to 3 digits.
But my regex doesn't match anything!
I'm not a regex guru, and I'm almost a total newbie at using awk, so I suspect
I missed something, but can't work out what :-/
Thanks for your help ;)
--
Rico
| |
| ddaglas@gmail.com 2006-03-02, 9:55 pm |
| Eric,
You'll need to combine the regexp with "getline":
/^#awk$/ {
getline
if (/^\d\d?\d?/) {
#ACTION#
}
}
Awk regexps cannot match across record separators. In
this case, the default case, the record separator (RS) is
the newline character (\n), so a regexp only ever sees one line at a time.
getline reads the next line of input, assigns it to $0, $1, $2,
etc., and returns 1 if successful, 0 at end of input, and -1 on error.
Dan
| |
| Harlan Grove 2006-03-02, 9:55 pm |
| ddaglas@gmail.com wrote...
....
>You'll need to combine the regexp with "getline":
>
>/^#awk$/ {
> getline
> if (/^\d\d?\d?/) {
> #ACTION#
> }
>}
....
*could*, not *need*. Also, no version of awk I know of supports the
perl-like character class \d. Gotta use [0-9] or [[:digit:]]. Also, may
need to check that the 4th char of the second line isn't also a decimal
numeral.
One alternative would be
/^#awk#[ \t]*$/ { s = 1; next }
s && /^[0-9][0-9]?[0-9]?[^0-9]/ { . . . do your stuff here . . . }
| |
| Ed Morton 2006-03-03, 3:55 am |
|
ddaglas@gmail.com wrote:
> Eric,
Please don't top-post.
> You'll need to combine the regexp with "getline":
No you won't. getline is usually the wrong solution. Search the archives
for why...
> /^#awk$/ {
> getline
> if (/^\d\d?\d?/) {
> #ACTION#
> }
> }
All you need to do is set a flag when you hit the #awk record then check
for that with your other RE on the subsequent record, e.g. (retaining as
much of the above code as possible for comparison):
f && /^\d\d?\d?/ { #ACTION# }
{ f = ($0 ~ /^#awk$/) }
Regards,
Ed.
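A runnable sketch of the flag idiom above, for anyone who wants to try it. It swaps \d for [0-9] (awk has no \d, as Harlan Grove notes) and assumes the marker line is exactly "#awk#", as in the original question. The function name extract_flagged and the sample input are made up for illustration.

```sh
#!/bin/sh
# Print every record that directly follows a "#awk#" marker line and
# starts with a digit. extract_flagged is a hypothetical helper name.
extract_flagged() {
    awk '
        # f is true only when the previous record was the marker.
        f && /^[0-9][0-9]?[0-9]?/ { print }
        # Recompute the flag on every record, so it clears itself.
        { f = ($0 ~ /^#awk#$/) }
    ' "$@"
}

# Sample input standing in for /etc/hosts (assumed data).
printf '%s\n' \
    '127.0.0.1 localhost' \
    '#awk#' \
    '123.123.123.123 host.somewhere.net host' \
    '10.0.0.1 other.somewhere.net other' |
extract_flagged
```

Because f is recomputed on every record, only the line right after a marker can match, and no separate reset is needed.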
> Awk regexps cannot match across record separators. In
> this case, the default case, the record separator (RS) is
> the newline character (\n), so a regexp only ever sees one line at a time.
>
> getline reads the next line of input, assigns it to $0, $1, $2,
> etc., and returns 1 if successful, 0 at end of input, and -1 on error.
>
> Dan
| |
| Ed Morton 2006-03-03, 3:55 am |
| Harlan Grove wrote:
> ddaglas@gmail.com wrote...
> ...
>
>
> ...
>
> *could*, not *need*. Also, no version of awk I know of supports the
> perl-like character class \d. Gotta use [0-9] or [[:digit:]]. Also, may
> need to check that the 4th char of the second line isn't also a decimal
> numeral.
>
> One alternative would be
>
> /^#awk#[ \t]*$/ { s = 1; next }
> s && /^[0-9][0-9]?[0-9]?[^0-9]/ { . . . do your stuff here . . . }
>
Just remember to reset s afterwards...
Ed.
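One possible way to do that reset, as a sketch only: a final rule clears s on every record that is not a marker. Here print $1 stands in for "do your stuff here", and the function name extract_marked and the sample lines are assumptions, not part of the thread.

```sh
#!/bin/sh
# Print the address (first field) of each record that directly follows
# a "#awk#" marker. extract_marked is a hypothetical helper name.
extract_marked() {
    awk '
        # Marker line (trailing blanks allowed): arm the flag and move on.
        /^#awk#[ \t]*$/ { s = 1; next }
        # Armed, and the record starts with 1 to 3 digits then a non-digit.
        s && /^[0-9][0-9]?[0-9]?[^0-9]/ { print $1 }
        # Every non-marker record disarms the flag, matched or not.
        { s = 0 }
    ' "$@"
}

# Sample input (assumed data). The second marked line starts with four
# digits, so the [^0-9] check rejects it.
printf '%s\n' \
    '#awk# ' \
    '123.123.123.123 host.somewhere.net host' \
    '#awk#' \
    '1234.5.6.7 bogus' \
    '10.0.0.1 unmarked' |
extract_marked
```

Without the last rule, s would stay set after the first marker and every later line starting with digits would be treated as marked.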
| |
| martin cohen 2006-03-03, 6:56 pm |
| Ed Morton wrote:
>
>
> ddaglas@gmail.com wrote:
>
> Please don't top-post.
>
>
> No you won't. getline is usually the wrong solution. Search the archives
> for why...
>
>
> All you need to do is set a flag when you hit the #awk record then check
> for that with your other RE on the subsequent record, e.g. (retaining as
> much of the above code as possible for comparison):
>
> f && /^\d\d?\d?/ { #ACTION# }
> { f = ($0 ~ /^#awk$/) }
>
> Regards,
>
> Ed.
>
For this case, I prefer the getline solution. It seems simpler (and less
error-prone) to just read the line and process it. The only problem
would be the case where the next line is not what is expected.
Martin Cohen
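To make the "next line is not what is expected" case concrete, here is a sketch of the getline version with its return value checked. It assumes the marker is exactly "#awk#" and uses [0-9] in place of \d. The function name extract_getline and the sample input are invented for illustration.

```sh
#!/bin/sh
# getline-based variant: on a marker line, read the next record and
# print it if it looks like an address. extract_getline is a
# hypothetical helper name.
extract_getline() {
    awk '
        /^#awk#$/ {
            # getline returns 1 on success, 0 at end of input, -1 on error.
            if ((getline) > 0 && /^[0-9][0-9]?[0-9]?[^0-9]/)
                print
        }
    ' "$@"
}

# Sample input (assumed data) with two marker lines in a row.
printf '%s\n' \
    '#awk#' \
    '#awk#' \
    '10.0.0.1 a' \
    '#awk#' \
    '10.0.0.2 b' |
extract_getline
```

With this input only "10.0.0.2 b" is printed. The second marker is consumed by getline inside the rule, so it is never tested against the marker pattern itself and "10.0.0.1 a" is skipped.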
| |
| Ed Morton 2006-03-03, 6:56 pm |
| martin cohen wrote:
> Ed Morton wrote:
>
> For this case, I prefer the getline solution. It seems simpler (and less
> error-prone) to just read the line and process it. The only problem
> would be the case where the next line is not what is expected.
That wouldn't be the only problem, but I've been posting way too much on
this topic in the past week or so, so I'm out of enthusiasm. It's in the
archives...
Ed.
| |
|
| On Fri, 03 Mar 2006 14:07:14 -0600, Ed Morton <morton@lsupcaemnt.com> wrote:
[getline]
> I've been posting way too much on
>this topic in the past week or so, so I'm out of enthusiasm.
Thank you for the time you spent on getline last week. I've learnt much
about awk -- by asking and doing. There's that part of the learning
curve on a new language where some answers don't make sense for a time.
Persistence in working through suggestions with one's own data, and the
penny drops.
Grant.
--
Cats are smarter than dogs. You can't make eight cats pull
a sled through the snow.