/(foo|)/ vs /(foo)?/
While looking at someone else's code I came across a regular
expression that included a construct like /(foo|)/. As far as I
can tell, it should produce the same result as /(foo)?/. But the
author of the code knows a heck of a lot more Perl than I do, so
I'm wondering why she would have picked the former over the latter.
Any ideas?
Thanks!
kj
P.S. I'm aware of the fact that /(|foo)/ would produce very different
results from /(foo|)/ or /(foo)?/, but that's neither here nor
there.
--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.
| |
| Ingo Menger 2005-11-17, 7:57 am |
|
kj wrote:
> While looking at someone else's code I came across a regular
> expression that included a construct like /(foo|)/. As far as I
> can tell, it should produce the same result as /(foo)?/. But the
> author of the code knows a heck of a lot more Perl than I do, so
> I'm wondering why she would have picked the former over the latter.
> Any ideas?
Hmmm....
A year ago I had to throw away my laptop, since the keyboard had given
up completely. But before it stopped working altogether, there were
some weeks where only certain keys did not work anymore. Perhaps the
author had a similar problem with his keyboard. After all, on some
keyboards, the '?' key is in the upper right corner and the '|' key is
in the lower left, so it may well be that one of them works while the
other does not.
| |
| Anno Siegel 2005-11-17, 7:57 am |
| kj <socyl@987jk.com.invalid> wrote in comp.lang.perl.misc:
> While looking at someone else's code I came across a regular
> expression that included a construct like /(foo|)/. As far as I
> can tell, it should produce the same result as /(foo)?/. But the
> author of the code knows a heck of a lot more Perl than I do, so
> I'm wondering why she would have picked the former over the latter.
> Any ideas?
The difference is in what is captured when no "foo" is found in the
string. /(foo|)/ matches an empty string, so $1 is an empty string
after the match. /(foo)?/ skips the match entirely, so $1 is undefined.
A subtle but relevant difference.
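
To see that difference directly, here is a minimal sketch. The helper name describe_capture is made up for illustration; it only reports what ended up in $1 after a successful match.

```perl
use strict;
use warnings;

# Returns a printable description of what group 1 captured.
sub describe_capture {
    my ($string, $regex) = @_;
    return 'no match' unless $string =~ $regex;
    return defined $1 ? qq{"$1"} : 'undef';
}

for my $string ('bar', 'foobar') {
    printf "%-7s (foo|): %-6s (foo)?: %s\n",
        $string,
        describe_capture($string, qr/(foo|)/),
        describe_capture($string, qr/(foo)?/);
}
```

For "bar" the first pattern reports an empty string and the second reports undef; for "foobar" both capture "foo". Under "use warnings", printing the undefined $1 is what triggers an "uninitialized value" warning.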
Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
| |
| Dr.Ruud 2005-11-17, 7:57 am |
| kj:
> While looking at someone else's code I came across a regular
> expression that included a construct like /(foo|)/. As far as I
> can tell, it should produce the same result as /(foo)?/.
(foo|) is short for ((?:foo)?)
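
A quick way to check that equivalence on a few sample inputs (the inputs and variable names are chosen here for illustration only):

```perl
use strict;
use warnings;

# Compare what the two spellings capture for a few sample inputs.
my %captures;
for my $input ('foo', 'bar', 'xfoo', '') {
    my ($from_alternation) = $input =~ /(foo|)/;
    my ($from_optional)    = $input =~ /((?:foo)?)/;
    $captures{$input} = [$from_alternation, $from_optional];
    printf "%-6s (foo|) => \"%s\"   ((?:foo)?) => \"%s\"\n",
        "'$input'", $from_alternation, $from_optional;
}
```

Both forms always match and always leave $1 defined. Note that for "xfoo" both capture the empty string, because an empty match at offset 0 is found before the engine ever gets to the "foo" at offset 1.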
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Eric J. Roode 2005-11-17, 9:56 pm |
| Abigail <abigail@abigail.nl> wrote in
news:slrndnpme2.qiv.abigail@alexandra.abigail.nl:
> $_ = "bar";
> print /(foo|) (?(1)|(?!))/x ? "match\n" : "no match\n";
> print /(foo)? (?(1)|(?!))/x ? "match\n" : "no match\n";
> __END__
> match
> no match
Abigail, the master weaver of perl regexes, has once again confounded
me. It ain't the first time. :-) I had never seen (?(...)). I
looked in "perldoc perlre", and it wasn't much help:
"(?(condition)yes-pattern|no-pattern)"
"(?(condition)yes-pattern)"
Conditional expression. "(condition)" should be either an
integer in parentheses (which is valid if the corresponding
pair of parentheses matched), or look-ahead/look-behind/eval-
uate zero-width assertion.
For example:
m{ ( \( )?
[^()]+
(?(1) \) )
}x
matches a chunk of non-parentheses, possibly included in
parentheses themselves.
This.... is vague at best. What is "no-pattern"? What means
"valid"? ("matches", I assume, but perhaps one should not use it if
there's a chance that the numbered parentheses don't match?) Must
the look-ahead/look-behind/evaluate match at that point? If so, how
is it any different than having the assertion at that point *not*
within (?(...))?
If anyone can explain, or point me to a better explanation than
perlre, I would be grateful.
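
As a partial answer, here is a small sketch of how the two branches behave. The first pattern is a made-up example (not from perlre): the yes-pattern is used when group 1 took part in the match, the no-pattern when it did not. The second part reruns Abigail's pair, where the yes-pattern is empty and the no-pattern is (?!), which can never match.

```perl
use strict;
use warnings;

# Hypothetical pattern: an optional "<", a word, then ">" if the "<" was
# there (yes-pattern) or "!" if it was not (no-pattern).
my $tag_or_shout = qr/\A (<)? \w+ (?(1) > | ! ) \z/x;

my %result;
for my $input ('<abc>', 'abc!', '<abc!', 'abc>') {
    $result{$input} = $input =~ $tag_or_shout ? 'match' : 'no match';
    print "$input: $result{$input}\n";
}

# Abigail's pair: (foo|) always sets group 1, (foo)? may leave it unset.
my $target = 'bar';
my $with_alternation = $target =~ /(foo|) (?(1)|(?!))/x ? 'match' : 'no match';
my $with_optional    = $target =~ /(foo)? (?(1)|(?!))/x ? 'match' : 'no match';
print "$with_alternation\n$with_optional\n";
```

The difference from a plain assertion at that point is that the condition does not have to hold for the match to go on; it only selects which of the two sub-patterns must match next.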
--
Eric
`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`
| |
| Brad Baxter 2005-11-17, 9:56 pm |
| Ingo Menger wrote:
> kj wrote:
>
> Hmmm....
>
> A year ago I had to throw away my laptop, since the keyboard had given
> up completely. But before it stopped working altogether, there were
> some weeks where only certain keys did not work anymore. Perhaps the
> author had a similar problem with his keyboard. After all, on some
> keyboards, the '?' key is in the upper right corner and the '|' key is
> in the lower left, so it may well be that one of them works while the
> other does not.
LOL
--
Brad
| |
| Dr.Ruud 2005-11-17, 9:56 pm |
| Eric J. Roode:
> "(?(condition)yes-pattern|no-pattern)"
> "(?(condition)yes-pattern)"
>
> Conditional expression. "(condition)" should be either an
> integer in parentheses (which is valid if the corresponding
> pair of parentheses matched), or look-ahead/look-behind/eval-
> uate zero-width assertion.
>
> For example:
>
> m{ ( \( )?
> [^()]+
> (?(1) \) )
> }x
>
> matches a chunk of non-parentheses, possibly included in
> parentheses themselves.
>
> This.... is vague at best. What is "no-pattern"?
The pattern that will be used if the test returned false.
> What means
> "valid"? ("matches", I assume, but perhaps one should not use it if
> there's a chance that the numbered parentheses don't match?)
The (1) is only valid if there exists a corresponding (capturing?)
group.
m{ ( \( )? # optional opening paren
[^()]+ # 1 or more non-parens
(?(1) \) ) # if the 1st group matched, so there was
# an opening paren, then require a
# closing one
}x
I guess that if the "corresponding pair of parens" didn't match+capture,
$1 keeps its old value, because the ? comes after the 1st group.
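
Both points can be tried out with a short sketch. The \A and \z anchors are an addition made here (the perlre example has none), so that an unbalanced paren cannot simply be skipped over. The second half checks the guess about $1: after a successful match in which the optional group did not take part, $1 is undef; it only keeps an old value when the whole match fails.

```perl
use strict;
use warnings;

# The pattern from the post, anchored with \A and \z (an addition here).
my $chunk = qr{
    \A
    ( \( )?        # optional opening paren
    [^()]+         # 1 or more non-parens
    (?(1) \) )     # closing paren required only if group 1 matched
    \z
}x;

my %balanced;
for my $input ('(abc)', 'abc', '(abc', 'abc)') {
    $balanced{$input} = $input =~ $chunk ? 1 : 0;
    print "$input: ", ($balanced{$input} ? 'match' : 'no match'), "\n";
}

# What happens to $1 when the optional group does not take part?
'foo' =~ /(foo)?/;
my $after_participating = $1;    # "foo"
'bar' =~ /(foo)?/;               # succeeds with an empty match
my $after_skipping = $1;         # undef, not the old "foo"
print defined $after_skipping ? "kept old value\n" : "undef\n";
```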
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Wade Whitaker 2005-11-18, 7:00 pm |
| kj wrote:
> While looking at someone else's code I came across a regular
> expression that included a construct like /(foo|)/. As far as I
> can tell, it should produce the same result as /(foo)?/. But the
> author of the code knows a heck of a lot more Perl than I do, so
> I'm wondering why she would have picked the former over the latter.
> Any ideas?
>
> Thanks!
>
> kj
>
> P.S. I'm aware of the fact that /(|foo)/ would produce very different
> results from /(foo|)/ or /(foo)?/, but that's neither here nor
> there.
I ran into a place where this caused me a problem with a regular expression, so
I came to understand it.
(foo)? says give me 1 or 0 occurrences of foo.
(|foo) says give me 0 or 1 occurrences of foo.
(foo|) says give me 1 or 0 occurrences of foo, which means that (foo)? is
redundant syntax in Perl that should always be replaceable, i.e.
search for )? and replace it with a | at the beginning or the end of the group.
*? and +? are OK because they are saying don't be greedy.
Anyone want to argue against that? Show a case where it is not true?
I found this out while trying to write a regular expression to find the
matching quote while parsing files. If I found a '"' I wanted to find the
matching '"' without matching on '\"'. It took me a year to finally learn
enough to do this.
The answer is: m/(["'])(|.*?[^\\])(\1|^Z)/gs works where
m/(["'])(.*?[^\\])?(\1|^Z)/gs did not.
Previous attempts that did not work are:
# $$fptr =~ m/\G((?:.*?[^\\])?)($q|^Z)/gs; # find ",'
# $$fptr =~ m/\G(.*?(?!\\))($q|^Z)/gs; # find ",'
# $$fptr =~ m/\G((?:.*?(?!\\))?)($q|^Z)/gs; # find ",'
# $$fptr =~ m/\G((?:.*?(?!\\$q))?($q|^Z))/gs; # find ",'
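
For comparison, a commonly used core-only way to match a quoted string with backslash escapes is sketched below. This is not the pattern from this post and it leaves out the ^Z end-of-file fallback; the sample text and variable names are assumptions made for illustration. The body is built from "backslash plus any character" or "any other character that is not the opening quote", so an empty string needs no special case.

```perl
use strict;
use warnings;

# Group 1: the opening quote. Group 2: the body, made of either a backslash
# plus any character, or any non-backslash character that is not the
# opening quote. Then the same quote again.
my $quoted = qr/(["'])((?:\\.|(?!\1)[^\\])*)\1/s;

my $text = q{x = "a \"quoted\" word"; y = 'it\'s'; z = "";};

my @found;
while ($text =~ /$quoted/g) {
    push @found, $2;
    print "delimiter $1, body: $2\n";
}
```

The three bodies found are the escaped-quote string, the single-quoted string, and the empty string from z = "".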
Wade
| |
| Ingo Menger 2005-11-18, 7:00 pm |
|
Wade Whitaker wrote:
> (foo)? says give me 1 or 0 occurrences of foo.
> (|foo) says give me 0 or 1 occurrences of foo.
> (foo|) says give me 1 or 0 occurrences of foo, which means that (foo)? is
> redundant syntax in Perl that should always be replaceable, i.e.
> search for )? and replace it with a | at the beginning or the end of the group.
> *? and +? are OK because they are saying don't be greedy.
> Anyone want to argue against that? Show a case where it is not true?
Did you read the thread?
"bar" =~ m/(foo|)/; print $1;
"bar" =~ m/(foo)?/; print $1;
It's the difference between undefined and ""
| |
| Paul Lalli 2005-11-18, 7:00 pm |
| Wade Whitaker wrote:
> I ran into a place where this caused me a problem with a regular expression, so
> I came to understand it.
>
> (foo)? says give me 1 or 0 occurrences of foo.
> (|foo) says give me 0 or 1 occurrences of foo.
> (foo|) says give me 1 or 0 occurrences of foo, which means that (foo)? is
> redundant syntax in Perl that should always be replaceable, i.e.
> search for )? and replace it with a | at the beginning or the end of the group.
> *? and +? are OK because they are saying don't be greedy.
> Anyone want to argue against that? Show a case where it is not true?
Er. See previous responses in this thread...
> I found this out while trying to write a regular expression to find the
> matching quote while parsing files. If I found a '"' I wanted to find the
> matching '"' without matching on '\"'. It took me a year to finally learn
> enough to do this.
That's unfortunate, because you seem to have spent a year reinventing a
wheel.
http://search.cpan.org/dist/Regexp-...on/delimited.pm
> The answer is: m/(["'])(|.*?[^\\])(\1|^Z)/gs works where
> m/(["'])(.*?[^\\])?(\1|^Z)/gs did not.
The answer is m/$RE{delimited}{-delim=>q{'"}}/
Paul Lalli
| |
| Wade Whitaker 2005-11-18, 7:00 pm |
| Ingo Menger wrote:
> Wade Whitaker wrote:
>
> Did you read the thread?
>
> "bar" =~ m/(foo|)/; print $1;
> "bar" =~ m/(foo)?/; print $1;
>
> It's the difference between undefined and ""
>
Agreed. It does do that. But is the difference you state essential to your
programming needs, or a side effect?
Unlike your example, most of these conditional matches are used in the context
of other "things" that need to be matched as well, so that the whole regular
expression is true or not. And the conditional match is there to expand the
ability of the whole to match.
There is a difference between 0 or 1 occurrences of foo and 1 or 0 occurrences
of foo. 0 or 1 occurrences means try 0 first and then try every other
combination afterward before coming back and trying 1 occurrence of foo.
(foo)? always tries to match foo first.
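
The order in which each form tries its options can be observed with a small sketch (the pattern list and variable names are picked here for illustration). It also includes the non-greedy (foo)??, which tries zero occurrences first like (|foo) but leaves $1 undefined instead of empty.

```perl
use strict;
use warnings;

my %first_choice;
for my $pattern ('(foo)?', '(foo|)', '(|foo)', '(foo)??') {
    'foobar' =~ /\A$pattern/ or die "unexpected: $pattern did not match";
    $first_choice{$pattern} = defined $1 ? qq{"$1"} : 'undef';
    printf "%-8s captures %s\n", $pattern, $first_choice{$pattern};
}

# When more pattern follows, the engine backtracks into the group and
# takes the second alternative if the first one leads nowhere.
my ($forced) = 'foobar' =~ /\A(|foo)bar\z/;
print "forced: \"$forced\"\n";
```

With nothing after the group, (foo)? and (foo|) capture "foo", (|foo) captures "", and (foo)?? leaves $1 undef. In the last match the empty alternative fails to reach "bar", so the engine comes back and captures "foo".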
Could you write a regular expression where you want 0 or 1 occurrences of foo
and that returns undef if there are 0 occurrences? In terms of the whole? This is
the side effect you are defending.
My question is: is the syntax needed, preferable, better than the other? Not
whether it has different side effects.
This is a style question and I think (|foo) and (foo|) are greatly better than
(foo)?.
Regards,
Wade
| |
| Ingo Menger 2005-11-25, 6:59 pm |
|
Wade Whitaker wrote:
> Ingo Menger wrote:
> Agreed. It does do that, But is the difference you state essential to your
> programming needs or a side effect?
Both.
The difference between undefined and "" is fundamental.
And, since perl programs usually are imperative and rely heavily on
side effects (this is as it should be in imperative languages), yes,
side effects such as assigning a string or undef to $1 may be very
essential.
> My question is: is the syntax needed, preferable, better than the other? Not
> whether it has different side effects.
Very silly question, IMHO.
Consider the following question: is the syntax
if (expr) { expr }
better than
while (expr) { expr }
or not?
> This is a style question and I think (|foo) and (foo|) are greatly better than
> (foo)?.
No, as you pointed out before, it is a question of achieving a desired
side effect, namely a side effect on variable $1.
| |
| robic0 2005-11-26, 7:56 am |
| On 25 Nov 2005 06:22:56 -0800, "Ingo Menger"
<quetzalcotl@consultant.com> wrote:
[snip]
>
>Very silly question, IMHO.
>Consider the following question: is the syntax
> if (expr) { expr }
>better than
> while (expr) { expr }
>or not?
$expr = 7;
if ($expr) { $expr }
while ($expr) { $expr }
I would say not! Maybe you need an if-while here...
>
>
>No, as you pointed out before, it is a question of achieving a desired
>side effect, namely a side effect on variable $1.
| |
| robic0 2005-11-26, 7:56 am |
| On Fri, 18 Nov 2005 03:11:38 +0100, "Dr.Ruud"
<rvtol+news@isolution.nl> wrote:
>Eric J. Roode:
>
>
>The pattern that will be used if the test returned false.
>
>
>
>The (1) is only valid if there exists a corresponding (capturing?)
>group.
>
>
> m{ ( \( )? # optional opening paren
> [^()]+ # 1 or more non-parens
> (?(1) \) ) # if the 1st group matched, so there was
> # an opening paren, then require a
> # closing one
> }x
This is just amazing: singular regex concepts aggregated in
a nested theoretical proof. A random number generator plus
character code/stream will break this type of nesting proof
every time. You can't depend on the regex module to unwind
its stack like this. The reason you think this is valid
is because your sole input matches an expected output.
Try the random generator for a few months, capture the
(failures if possible) successes. You may be in for a
shock!
>
>
>I guess that if the "corresponding pair of parens" didn't match+capture,
>$1 keeps its old value, because the ? comes after the 1st group.