Author Re: puzzling '.{1,4}'
Paul Lalli

2006-06-26, 6:58 pm

Tom Arnall wrote:
> do you have any idea why:
>
> $_ = " x11x  x22x a ";
>
> $re1 = qr/x.*?\d\dx|a/;
> $re2 = qr/($re1\s)?$re1/;
> ($_) = /($re2)/;
> print $_;
>
> doesn't produce 'x11x' ?


Because you're telling the first part of $re2 to be greedy. Your $re2
is looking for "optionally, an instance of $re1 followed by a single
whitespace character, and then another instance of $re1". The greedy ?
tries to match the group once if it possibly can. This causes the first
$re1 to match the x, the two ones, the x, the two spaces, the x, the two
twos, and the x. The \s then matches the space, and the second $re1
matches the a. In order for this to produce just 'x11x', the first part
of $re2 would have to not match at all, meaning it would match ($re1\s)
zero times, thus letting $re1 match just 'x11x'. If that's the behavior
you want, then change ($re1\s)? to ($re1\s)??
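
To see the difference side by side, here is a minimal sketch. It assumes
the target string has two spaces between 'x11x' and 'x22x', as described
above; the variable names $str, $greedy and $lazy are only for
illustration and are not from the original post.

```perl
use strict;
use warnings;

# Assumption: two spaces between 'x11x' and 'x22x'.
my $str = " x11x  x22x a ";

my $re1    = qr/x.*?\d\dx|a/;
my $greedy = qr/($re1\s)?$re1/;    # original: prefers matching the group once
my $lazy   = qr/($re1\s)??$re1/;   # suggested: prefers matching the group zero times

# In list context the match returns all captures; we keep only the first.
my ($g) = $str =~ /($greedy)/;
my ($l) = $str =~ /($lazy)/;

print "greedy: '$g'\n";   # greedy: 'x11x  x22x a'
print "lazy:   '$l'\n";   # lazy:   'x11x'
```

With ??, the engine tries the zero-times option first, so the trailing
$re1 gets to match 'x11x' straight away.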

> (note btw that if you insert '\n' between the first
> two tokens of the target string, the result does become 'x11x'.


I don't know what two tokens you're referring to, or where you're
putting this \n in relation to the two spaces, so I'm not going to
bother theorizing about this.

> note also that if you drop '|a' from $re1 you also get 'x11x'.)


When you take the |a out of the pattern, you prevent the second $re1
from matching the 'a'. So now the pattern first tries to set the first
$re1 to 'x11x'. This doesn't work, because there are two spaces
between that and the next $re1, and you can only match one of them. So
it next tries to set the first $re1 to 'x11x  x22x'. This doesn't
work, because what follows the space after this substring doesn't match
$re1 anymore. So the pattern has no choice but to allow ($re1\s)? to
match 0 times, thus letting the second $re1 match 'x11x'.
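
A quick way to check this, again assuming two spaces between 'x11x' and
'x22x' in the target string ($str and $m are illustrative names):

```perl
use strict;
use warnings;

# Assumption: two spaces between 'x11x' and 'x22x'.
my $str = " x11x  x22x a ";

my $re1 = qr/x.*?\d\dx/;         # same as before, but without '|a'
my $re2 = qr/($re1\s)?$re1/;

my ($m) = $str =~ /($re2)/;

# Both attempts to match the optional group once fail, so the group
# matches zero times and the trailing $re1 takes 'x11x'.
print "'$m'\n";   # 'x11x'
```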

>
> i read this example as follows:
>
> $re1 = qr/
> x #find an 'x'
> .*? #find whatever of whatever length


No. Find anything (not including newline) of any length, but *only as
much as is necessary to let the pattern succeed*.
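
A small sketch of what "only as much as is necessary" means in practice.
The test string and the third pattern (with a trailing ' a') are made up
for illustration:

```perl
use strict;
use warnings;

my $str = "x11x  x22x a";

# Lazy: stops at the first place where \d\dx can follow.
my ($lazy)   = $str =~ /(x.*?\d\dx)/;

# Greedy: grabs as much as it can, then backs off until \d\dx fits.
my ($greedy) = $str =~ /(x.*\d\dx)/;

# Lazy, but the rest of the pattern (' a') forces .*? to take more.
my ($forced) = $str =~ /(x.*?\d\dx) a/;

print "lazy:   '$lazy'\n";     # lazy:   'x11x'
print "greedy: '$greedy'\n";   # greedy: 'x11x  x22x'
print "forced: '$forced'\n";   # forced: 'x11x  x22x'
```

The third case is the one that matters here: .*? is happy to match more
than the minimum if that is the only way the whole pattern can succeed.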


> \d\d #find two digits
> x #find an 'x'
> | #or, instead of all the foregoing,
> a #find an 'a'
> /x;
>
> $re2 = qr/
> (
> $re1 #find $re1
> \s #and whitespace
> )? #or maybe none of the foregoing


It's not just "or" in the general sense of "it doesn't matter which".
? is very specifically a greedy quantifier. If it can match the
subpattern once, it will. Only if that matching would prevent the
pattern from succeeding will ? instead match 0 times. If you want the
non-greedy version, use the ?? quantifier instead.
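
Stripped down to single characters, the two quantifiers behave like this
(the strings and variable names are just for illustration):

```perl
use strict;
use warnings;

# Greedy ?: matches the optional 'a' once because it can.
my ($greedy)    = "aab" =~ /(a?a)/;

# Non-greedy ??: matches the optional 'a' zero times because it can.
my ($lazy)      = "aab" =~ /(a??a)/;

# Greedy ? falls back to zero times when matching once would make
# the overall pattern fail.
my ($gave_back) = "ab"  =~ /(a?a)/;

# Non-greedy ?? falls back to matching once when zero times would
# make the overall pattern fail.
my ($took_one)  = "aab" =~ /(a??ab)/;

print "greedy:    '$greedy'\n";      # greedy:    'aa'
print "lazy:      '$lazy'\n";        # lazy:      'a'
print "gave back: '$gave_back'\n";   # gave back: 'a'
print "took one:  '$took_one'\n";    # took one:  'aab'
```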


> $re1 #find for sure $re1
> #in sum, find $re1 possibly preceded by $re1+whitespace
> /x;
>
> does this sound right?


Hope this helps,
Paul Lalli
