Repeated regex doesn't work?
IMnew

2007-08-18, 9:58 pm

To extract MAC addresses from lines (assuming they are
xx:xx:xx:xx:xx:xx), I figured a regex like:
/([0-9A-Fa-f]{2}[:]){5}[0-9A-Fa-f]{2}/
would find them and match() could tell its position and length, so a
line like:

echo "$a" | awk --posix '{z = match( $0 , /([0-9A-Fa-f]{2}[:]){5,5}[0-9A-Fa-f]{2}/ ); print z, RSTART, RLENGTH }'

should print the position and length of the MAC address on each line of $a.

Well... it doesn't.

While trying to find what was wrong, I cut the line down to this (note the
different regex):

echo "$a" | awk --posix '{z = match( $0 , /([0-9A-Fa-f]{2}[:]){1,5}/ ); print z, RSTART, RLENGTH }'

prints: 19 19 15 on $a = '92.103.26.1 ether 00:90:1A:33:12:41 C eth1',
so it is working.
As soon as {1,5} is changed to {2,5}, {3,5}, etc., it fails. What I
actually need is {5,5}.

Am I misinterpreting {n,m}, or is awk failing?

P.S.: Sure, there are other programs and systems to extract MAC addresses. I
am already using a Perl one.
It's just that I can't understand why the awk version fails.
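
A minimal sketch (not from the thread) of the full-address match described above. It assumes any POSIX-style awk and avoids interval expressions by building the pattern through string concatenation, so it needs neither --posix nor --re-interval. The function name find_mac and the awk variable names are hypothetical.

```shell
# Hypothetical helper: print the start position and length of the first
# MAC address (xx:xx:xx:xx:xx:xx) found in the text passed as $1.
find_mac() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        hex   = "[0-9A-Fa-f]"
        octet = hex hex
        # One octet followed by five ":octet" groups, spelled out by
        # concatenation instead of an interval such as {5}.
        re = octet
        for (i = 1; i <= 5; i++) re = re ":" octet
    }
    {
        if (match($0, re)) print RSTART, RLENGTH
        else print "no match"
    }'
}

a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'
find_mac "$a"    # prints: 19 17
```

The length of 17 is six two-digit octets plus five colons; the shortened {1,5} test in the post reports 15 because it stops after the fifth "xx:" group.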

Ed Morton

2007-08-18, 9:58 pm


Looks like it's either something wrong with your awk or something you
don't understand about your locale. Try using character classes (e.g.
"[[:alnum:]]") instead of explicit ranges (e.g. "[0-9A-Fa-f]") to try to
rule out locale issues. Also, it's best to use --re-interval instead of
--posix so you don't lose the other useful gawk extensions, e.g.
gensub(). So, try this:

$ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){2,5}/ ); print z, RSTART, RLENGTH }'
19 19 15

Regards,

Ed.

IMnew

2007-08-18, 9:58 pm

Thanks, Ed, for your input.
I already tried that before posting. Same failure.

It looks like it worked on your setup, so it seems that my awk is
failing.

Would you agree?

Ed Morton

2007-08-19, 3:57 am

IMnew wrote:

[please provide enough context for your response to stand alone - this
is usenet, not a web forum. Fixed below]

> Thanks Ed for your input.
> I already tried that before posting. Same failure.
>
> It looks like it worked on your setup, so it seems that my awk is
> failing.
>
> Would you agree?
>


Yes, but I'd like to see a screen copy/paste of you running the command
to be sure. Also, try "awk --version" to see what version of awk you're
using:

$ a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'
$ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){2,5}/ ); print z, RSTART, RLENGTH }'
19 19 15
$ awk --version | head -1
GNU Awk 3.1.5

Regards,

Ed.

IMnew

2007-08-19, 6:57 pm


Sorry for erasing the context in my previous post.
Things checked:
Locale: all characters are in the ASCII range (<128), so I doubt the locale
would have any effect.
[[:alnum:]] is not what is intended; I tested [[:xdigit:]], with the same
result.
--re-interval was already tested previously, with the same result.
Screen run:

$ awk --version| head -1
GNU Awk 3.1.5
$ a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'
$ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){1,5}/ ); print z, RSTART, RLENGTH }'
19 19 15
$ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){2,5}/ ); print z, RSTART, RLENGTH }'
0 0 -1
$ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){5}/ ); print z, RSTART, RLENGTH }'
0 0 -1


Hope it helps,

IM.
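
To illustrate the point about [[:alnum:]] being looser than intended, here is a small sketch comparing it with [[:xdigit:]]. It assumes an awk that supports POSIX character classes; the function name classify and the sample strings are hypothetical. The octet is written as two bracket expressions so that no interval expression is involved.

```shell
# Hypothetical helper: report whether $1 starts with "xx:" when an octet
# is defined with [[:alnum:]] versus [[:xdigit:]].
classify() {
    printf '%s\n' "$1" | awk '{
        alnum  = ($0 ~ /^[[:alnum:]][[:alnum:]]:/)
        xdigit = ($0 ~ /^[[:xdigit:]][[:xdigit:]]:/)
        print $0, "alnum=" alnum, "xdigit=" xdigit
    }'
}

classify '1A:33'    # prints: 1A:33 alnum=1 xdigit=1
classify 'zz:33'    # prints: zz:33 alnum=1 xdigit=0
```

Only the [[:xdigit:]] form rejects "zz", which is not a valid hex pair.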

IMnew

2007-08-20, 8:00 am


You are right. LANG=C solves the problem.
Exactly how or why is still not clear. :-)

thanks
IM.
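
A sketch of applying the C locale to a single awk invocation rather than to the whole session. It uses LC_ALL instead of LANG on the assumption that other LC_* variables might be set in the environment; LC_ALL takes precedence over both LANG and the individual LC_* categories. The function name first_pair is hypothetical, and the regex is shortened to a single "xx:" group without interval expressions so that it also runs on awks that lack them.

```shell
a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'

# Hypothetical helper: find the first "xx:" hex pair in $1, with the C
# locale in effect for awk only (the calling shell is left unchanged).
first_pair() {
    printf '%s\n' "$1" |
        LC_ALL=C awk '{ z = match($0, /[0-9A-Fa-f][0-9A-Fa-f]:/); print z, RSTART, RLENGTH }'
}

first_pair "$a"    # prints: 19 19 3
```

On gawk, the same LC_ALL=C prefix can be put in front of the thread's original command with the {5} interval.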

Ed Morton

2007-08-20, 6:58 pm

IMnew wrote:
> On Aug 19, 12:59 pm, IMnew <IsaacMarcos100...@gmail.com> wrote:
>
<snip>
>
>
> You are right. LANG=C solves the problem......
> exactly how or why, still not clear. :-)
>
> thanks
> IM.
>


The definition of an "alphabetic character" (etc.) can vary between
countries so I guess that must be your problem. See:

http://www.gnu.org/software/gawk/ma...wk.html#Locales
http://www.gnu.org/software/gawk/ma...Character-Lists

for more details.

Ed.