Code Comments
Am I wrong here, or is this a bug in Tcl?
The non-greedy quantifier ".*?" does not work as expected when it is
preceded by \d+ or \S+.
expect1.8> info tclversion
8.4
expect1.9> set str {1 2 3}
1 2 3
expect1.10> regexp {\d+.*?\d+} $str match
1
expect1.11> puts $match
1 2 3
----->>> regexp should match {1 2} in the above statement since the
non-greedy operator ".*?" is used.
expect1.12> regexp {\d.*?\d} $str match
1
expect1.13> puts $match
1 2
----->>>> Works fine when using "\d" instead of "\d+" in the above
statement
Expert advice needed ...
Regards
Sharad
Sharad wrote:
> Am I wrong here or is this a bug in TCL ?
>
> The non-greedy operator ".*?" does not work as expected when used with
> \d+ or \S+ option preceding it.
Documented behavior. Mixing greedy and non-greedy quantifiers is
tricky at best; reread:
http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm#M95
(Matching, first three paragraphs dealing with preference).
Using \d+ before .*? in your first example switches the preference to
'greedy', while using \d (no preference) then .*? (non-greedy) sets
preference to non-greedy for the whole expression.
Michael
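To make the preference rule concrete, here is a minimal sketch that reruns the two expressions from the original post side by side. The variable names (longest, shortest, probe) are only illustrative. The third expression is an assumption of mine, not something tested in this thread: by the same rule, making the first quantifier non-greedy should tip the whole RE toward the shortest match, so its result is printed without a claimed value.

```tcl
set str "1 2 3"

# \d+ is the first quantified atom and it is greedy, so the whole RE
# prefers the longest match; the later .*? does not change that.
set longest [regexp -inline {\d+.*?\d+} $str]
puts $longest    ;# {1 2 3}, as reported in the original post

# A bare \d has no quantifier and therefore no preference. The first
# quantified atom is .*?, so the whole RE prefers the shortest match.
set shortest [regexp -inline {\d.*?\d} $str]
puts $shortest   ;# {1 2}, as reported in the original post

# Assumption (not from the thread): with \d+? as the first quantified
# atom, the RE as a whole should prefer the shortest match.
set probe [regexp -inline {\d+?.*?\d+} $str]
puts $probe
```

The point of the comparison is that the preference is decided once for the whole expression, not per quantifier.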
Thanks Michael !!!
On Apr 1, 5:45 pm, schlenk <schl...@uni-oldenburg.de> wrote:
> Sharad wrote:
>
> Documented behavior. Mixing greedy and non-greedy quantifiers is
> tricky at best; reread:
> http://www.tcl.tk/man/tcl8.5/TclCmd...htm# M95
> (Matching, first three paragraphs dealing with preference).
>
> Using \d+ before .*? in your first example switches the preference to
> 'greedy', while using \d (no preference) then .*? (non-greedy) sets
> preference to non-greedy for the whole expression.
>
> Michael
Sharad wrote:
> The non-greedy operator ".*?" does not work as expected when used with
> \d+ or \S+ option preceding it.
> expect1.10> regexp -inline {\d+.*?\d+} "1 2 3"
> {1 2 3}
> Expert advice needed ...
That's the way our RE engine is documented to work, and is a significant
difference from the Perl-derived RE engines. The complication has to do
with the difference between recursive engines and automata-based engines
(the former handle this case "better", but the latter are better at
other types of match). Alas, the conflict can't be resolved easily;
there is a deep theoretic trade-off between the two (CS is good for some
things at least!) so the best advice is "don't mix greediness in a
single RE". It's not that you can't, but it's a recipe for confusion.
Better to use a different RE:
% regexp -inline {\d+\D*\d+} "1 2 3"
{1 2}
Donal.
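As a rough illustration of the "don't mix greediness" advice, the sketch below wraps the all-greedy RE in a small helper. The proc name first_two_numbers and the capture groups are hypothetical additions for illustration; only the \d+\D*\d+ pattern itself comes from the post above.

```tcl
# Hypothetical helper: return the first two runs of digits in a string,
# using only greedy quantifiers and a negated class (\D) for the gap.
proc first_two_numbers {str} {
    if {[regexp {(\d+)\D*(\d+)} $str -> a b]} {
        return [list $a $b]
    }
    # No pair of numbers found.
    return {}
}

puts [first_two_numbers "1 2 3"]        ;# 1 2
puts [first_two_numbers "10, 20, 30"]   ;# 10 20
```

Because \D* cannot run across a digit, the match stops at the second number without needing any non-greedy quantifier.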
Donal ... I agree. It's intentionally calling for trouble. To make sure that the code doesn't break (which is important), it's better to avoid such tricky things. I just wanted to clarify my doubts ... thanks to all those who shared their views. Appreciate your help.