Repeated regex doesn't work?
| To extract MAC addresses from lines (assuming they look like
xx:xx:xx:xx:xx:xx), I figured a regex like:
/([0-9A-Fa-f]{2}[:]){5}[0-9A-Fa-f]/
would find them, and match() could report the position and length, so a
line like:
echo "$a" | awk --posix '{z = match( $0 , /([0-9A-Fa-f]{2}[:]){5,5}[0-9A-Fa-f]/ ); print z, RSTART, RLENGTH }'
should print the position and length of the MAC address on each line of $a.
Well..., it doesn't.
While trying to find out what was wrong, I got as far as this line (note the
difference in the regex):
echo "$a" | awk --posix '{z = match( $0 , /([0-9A-Fa-f]{2}[:]){1,5}/ ); print z, RSTART, RLENGTH }'
which prints: 19 19 15 on $a = '92.103.26.1 ether 00:90:1A:33:12:41 C eth1',
so it is working.
As soon as {1,5} is changed to {2,5}, {3,5}, etc., it fails. It should
be {5,5}.
Is there a misinterpretation of {n,m} on my part, or is awk
failing?
P.S.: Sure, there are other programs and systems to extract MAC addresses. I
am already using a Perl one.
It is just that I can't understand why the one above fails.
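One detail in the first regex above: the trailing [0-9A-Fa-f] has no {2}, so it can cover only the first digit of the last group. Independently of the interval problem, here is a minimal sketch that avoids {n,m} altogether by building the pattern through string concatenation. The variable names (pair, re, mac_match) are illustrative, and LC_ALL=C is an assumption made to keep the bracket ranges byte-based.

```shell
# Sample line from the thread.
a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'

# Build "hh:hh:hh:hh:hh:hh" by concatenation, so no {n,m} interval is needed.
# LC_ALL=C is an assumption of this sketch, not something the post requires.
mac_match=$(printf '%s\n' "$a" | LC_ALL=C awk '
BEGIN {
    pair = "[0-9A-Fa-f][0-9A-Fa-f]"   # one two-digit hex group
    re = pair
    for (i = 1; i <= 5; i++)          # append five more ":hh" groups
        re = re ":" pair
}
{
    # A string used as the second argument of match() is a dynamic regex.
    if (match($0, re))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
}')

printf '%s\n' "$mac_match"
```

Because the pattern contains no braces, this should behave the same in awk implementations that lack interval support.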
| |
| Ed Morton 2007-08-18, 9:58 pm |
| IMnew wrote:
> To extract MAC addresses from lines (assuming they look like
> xx:xx:xx:xx:xx:xx), I figured a regex like:
> /([0-9A-Fa-f]{2}[:]){5}[0-9A-Fa-f]/
> would find them, and match() could report the position and length, so a
> line like:
>
> echo "$a" | awk --posix '{z = match( $0 , /([0-9A-Fa-f]{2}[:]){5,5}[0-9A-Fa-f]/ ); print z, RSTART, RLENGTH }'
>
> should print the position and length of the MAC address on each line of $a.
>
> Well..., it doesn't.
>
> While trying to find out what was wrong, I got as far as this line (note the
> difference in the regex):
>
> echo "$a" | awk --posix '{z = match( $0 , /([0-9A-Fa-f]{2}[:]){1,5}/ ); print z, RSTART, RLENGTH }'
>
> which prints: 19 19 15 on $a = '92.103.26.1 ether 00:90:1A:33:12:41 C eth1',
> so it is working.
> As soon as {1,5} is changed to {2,5}, {3,5}, etc., it fails. It should
> be {5,5}.
>
> Is there a misinterpretation of {n,m} on my part, or is awk
> failing?
>
> P.S.: Sure, there are other programs and systems to extract MAC addresses. I
> am already using a Perl one.
> It is just that I can't understand why the one above fails.
>
Looks like it's either something wrong with your awk or something you
don't understand about your locale. Try using character classes (e.g.
"[[:alnum:]]") instead of explicit ranges (e.g. "[0-9A-Fa-f]") to try to
rule out locale issues. Also, it's best to use --re-interval instead of
--posix so you don't lose the other useful gawk extensions, e.g.
gensub(). So, try this:
$ echo "$a" | awk --re-interval '{z = match( $0 ,
/([[:alnum:]]{2}[:]){2,5}/ ); print z, RSTART, RLENGTH }'
19 19 15
Regards,
Ed.
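Before looking at locales, it can help to confirm that the awk on the PATH honors interval expressions at all. gawk releases before 4.0 treat braces as literal characters unless --re-interval or --posix is given. The probe below is a small sketch; the variable name interval_ok is illustrative.

```shell
# Prints 1 when {2} is read as "exactly two", and 0 when the braces are
# taken literally (the pattern would then only match the text "a{2}").
interval_ok=$(printf 'aa\n' | awk '{ print match($0, /^a{2}$/) }')
printf 'interval support: %s\n' "$interval_ok"
```

If this reports 0 with gawk 3.x, repeating the probe with --re-interval added should report 1.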
| |
|
| Thanks, Ed, for your input.
I already tried that before posting. Same failure.
It looks like it worked on your setup. So it seems that my awk is
failing...
Would you agree?
| |
| Ed Morton 2007-08-19, 3:57 am |
| IMnew wrote:
[please provide enough context for your response to stand alone - this
is usenet, not a web forum. Fixed below]
> Thanks, Ed, for your input.
> I already tried that before posting. Same failure.
>
> It looks like it worked on your setup. So it seems that my awk is
> failing...
>
> Would you agree?
>
Yes, but I'd like to see a screen copy/paste of you running the command
to be sure. Also, try "awk --version" to see what version of awk you're
using:
$ a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'
$ echo "$a" | awk --re-interval '{z = match( $0 ,
/([[:alnum:]]{2}[:]){2,5}/ ); print z, RSTART, RLENGTH }'
19 19 15
$ awk --version | head -1
GNU Awk 3.1.5
Regards,
Ed.
| |
|
| On Aug 19, 1:04 am, Ed Morton <mor...@lsupcaemnt.com> wrote:
> IMnew wrote:
>
> [please provide enough context for your response to stand alone - this
> is usenet, not a web forum. Fixed below]
>
> Yes, but I'd like to see a screen copy/paste of you running the command
> to be sure. Also, try "awk --version" to see what version of awk you're
> using:
>
> $ a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'
> $ echo "$a" | awk --re-interval '{z = match( $0 ,
> /([[:alnum:]]{2}[:]){2,5}/ ); print z, RSTART, RLENGTH }'
> 19 19 15
> $ awk --version | head -1
> GNU Awk 3.1.5
>
> Regards,
>
> Ed
Sorry for erasing the context in my previous post.
Things checked:
Locale: all characters are in the ASCII range (<128), so I doubt the locale
would have any effect.
[[:alnum:]] is not what is intended; [[:xdigit:]] tested, same
result.
--re-interval already tested previously, same result.
Screen run:
$ awk --version| head -1
GNU Awk 3.1.5
$ a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'
$ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){1,5}/ ); print z, RSTART, RLENGTH }'
19 19 15
$ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){2,5}/ ); print z, RSTART, RLENGTH }'
0 0 -1
$ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){5}/ ); print z, RSTART, RLENGTH }'
0 0 -1
Hope it helps,
IM.
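The three runs above can be repeated under both the current locale and the C locale in one go, which makes a locale-dependent difference easy to spot. This is only a sketch: run_case is a hypothetical helper name, the regex is passed as a string so one awk program serves every case, and gawk releases before 4.0 would additionally need --re-interval on the awk command line.

```shell
a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'

# run_case LOCALE REGEX: print "locale regex -> match RSTART RLENGTH".
# run_case is a hypothetical helper, not something from the thread.
run_case() {
    printf '%s\n' "$a" |
        LC_ALL=$1 awk -v re="$2" -v loc="$1" '{
            z = match($0, re)          # re is used as a dynamic regex
            print loc, re, "->", z, RSTART, RLENGTH
        }'
}

# First the locale currently in effect via LANG (C if unset), then plain C.
report=$(
    for loc in "${LANG:-C}" C; do
        for n in '{1,5}' '{2,5}' '{5}'; do
            run_case "$loc" "([[:xdigit:]]{2}:)$n"
        done
    done
)
printf '%s\n' "$report"
```

If the rows for the two locales differ, the locale is involved; if they agree, the problem lies elsewhere.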
| |
|
| On Aug 19, 12:59 pm, IMnew <IsaacMarcos100...@gmail.com> wrote:
> On Aug 19, 1:04 am, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
> Sorry for erasing the context in my previous post.
> Things checked:
> Locale: all characters are in the ASCII range (<128), so I doubt the locale
> would have any effect.
> [[:alnum:]] is not what is intended; [[:xdigit:]] tested, same
> result.
> --re-interval already tested previously, same result.
> Screen run:
>
> $ awk --version| head -1
> GNU Awk 3.1.5
> $ a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'
> $ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){1,5}/ ); print z, RSTART, RLENGTH }'
> 19 19 15
> $ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){2,5}/ ); print z, RSTART, RLENGTH }'
> 0 0 -1
> $ echo "$a" | awk --re-interval '{z = match( $0 , /([[:alnum:]]{2}[:]){5}/ ); print z, RSTART, RLENGTH }'
> 0 0 -1
>
> Hope it helps,
>
> IM.
You are right. LANG=C solves the problem.
Exactly how or why is still not clear. :-)
Thanks,
IM.
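The thread leaves the cause open, so the following is only a sketch of how the workaround might be applied to a single command without changing the rest of the session. It uses LC_ALL rather than LANG because LC_ALL, when set, takes precedence over LANG and the other LC_* variables. The regex here includes the {2} on the last group, and gawk releases before 4.0 would also need --re-interval.

```shell
a='92.103.26.1 ether 00:90:1A:33:12:41 C eth1'

# Override the locale for this one awk process only.
fixed=$(printf '%s\n' "$a" | LC_ALL=C awk '{
    z = match($0, /([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}/)
    print z, RSTART, RLENGTH
}')
printf '%s\n' "$fixed"
```

With an awk that supports interval expressions, a successful match on this sample line starts at column 19 and spans the 17 characters of the address.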