I am trying to create a regular expression that will "optionally" match each
expression. In the example below, I would like the regexp to attempt to match
all three expressions, but I am fine if it can only match one or two of the
expressions. So in the case below, I expected it to match the "quick " and
"fox" expressions, but it failed to match any expression. What's wrong here?
set regString "(quick )+.*?(brown )+.*?(fox)"
set story "The quick fox jumped"
set results [regexp -nocase -indices -inline $regString $story]
set numMatches 0
set matchNum 0
foreach pair $results {
    set indexA [lindex $pair 0]
    set indexB [lindex $pair 1]
    puts "$matchNum: [string range $story $indexA $indexB]"
    if { (0 < $matchNum) && ($indexA < $indexB) } {
        incr numMatches
    }
    incr matchNum
}
# This should be 2.
puts "Number of expressions matched = $numMatches"
O.B. <funkjunk@bellsouth.net> wrote:
> I am trying to create a regular expression that will "optionally" match each
> expression. In the example below, I would like the regexp to attempt to match
> all three expressions, but I am fine if it can only match one or two of the
> expressions. So in the case below, I expected it to match the "quick " and
> "fox" expressions, but it failed to match any expression. What's wrong here?
>
> set regString "(quick )+.*?(brown )+.*?(fox)"

The RE you supplied surely does not do what you wrote it should do.

The RE says:
match ONE OR MORE instances of "quick ",
then match as few arbitrary characters as necessary,
then match ONE OR MORE instances of "brown ",
then match as few arbitrary characters as necessary,
then finally match ONE instance of "fox",
so, clearly it cannot possibly match if the word "brown" is missing.

You forgot some enclosing parentheses to collect the optional parts:
set regString "(?:(quick )+.*)?(?:(brown )+.*)?(fox)"

(The "?:" inside each of the new parentheses makes them
"non-capturing"; see the re_syntax man page for details.)
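A minimal sketch to check the diagnosis above, assuming Tcl 8.4 or later (the variable names `original`, `fixed` and `parts` are only for illustration). The original pattern does not match a story without "brown " at all, while the corrected pattern matches and puts "quick " in group 1 and "fox" in group 3.

```tcl
set story "The quick fox jumped"

# Original pattern: "brown " is mandatory, so nothing matches.
set original {(quick )+.*?(brown )+.*?(fox)}
puts [regexp -nocase $original $story]   ;# 0

# Corrected pattern: the quick- and brown-parts are optional groups.
set fixed {(?:(quick )+.*)?(?:(brown )+.*)?(fox)}

# -inline returns the whole match followed by each capture group;
# a group that did not participate comes back as an empty string.
set parts [regexp -nocase -inline $fixed $story]
foreach label {whole quick brown fox} value $parts {
    puts "$label: '$value'"
}
```

Note that a group which did not take part (here the "brown " group) is indistinguishable from an empty capture when using `-inline` alone; adding `-indices` reports it as `-1 -1` instead.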
Andreas Leitgeb wrote:
> O.B. <funkjunk@bellsouth.net> wrote:
>
> The RE you supplied surely does not do what you wrote it should do.
>
> The RE says:
> match ONE OR MORE instances of "quick ",
> then match as few arbitrary characters as necessary,
> then match ONE OR MORE instances of "brown ",
> then match as few arbitrary characters as necessary,
> then finally match ONE instance of "fox",
> so, clearly it cannot possibly match if the word "brown" was missing.
>
> You forgot some enclosing parentheses to collect the optional parts:
> set regString "(?:(quick )+.*)?(?:(brown )+.*)?(fox)"
>
> (The "?:" inside each of the new parentheses makes them
> "non-capturing"; see the re_syntax man page for details.)
Good catch. I've tried to expand the example to further explain what I'm trying
to do. Of all the expressions, I'd prefer for the program to attempt to match
as many as possible. Am I asking too much of regular expressions?
# Test 1
set regString "(?:(quick )+.*)?(?:(brown )+.*)?(?:(fox ).*)?"
set story "The quick fox jumped over another brown fox "
set results [regexp -all -nocase -indices -inline $regString $story]
For this test, I get 5 sets of data. Looping through the data, it appears that
there were no complete matches. I was expecting the regular expression to match
the 2nd, 7th, and 8th words of "story".
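A minimal sketch, assuming Tcl 8.4 or later, that prints only the whole-match span of each `-all` match. It suggests where the 5 sets come from: every part of the Test 1 pattern is optional, so the RE can match the empty string. At positions 0 to 3 ("The ") no non-empty match can start, so `regexp -all` records an empty match and steps forward one character each time; at position 4 the greedy `.*` inside the first group runs to the end of the story. That gives 4 empty matches plus 1 long one.

```tcl
set regString {(?:(quick )+.*)?(?:(brown )+.*)?(?:(fox ).*)?}
set story "The quick fox jumped over another brown fox "
set results [regexp -all -nocase -indices -inline $regString $story]

# Each match contributes 4 index pairs: the whole match plus 3 groups.
# Keep only the whole-match span of each one.
set spans {}
foreach {whole g1 g2 g3} $results {
    lappend spans $whole
}

# An empty match at position p is reported as {p p-1}.
puts "[llength $spans] matches: $spans"
```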
Using the "same" regString: in the event that the story contains only "The quick
fox jumped ", I was expecting the regular expression to capture the 2nd and 3rd
words in the 1st and 3rd groups.
# Test 2
set story "The quick fox jumped "
set results [regexp -all -nocase -indices -inline $regString $story]
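A sketch of why "fox " does not land in the third group in Test 2, assuming Tcl 8.5 or later (for `lassign` and `{*}`). Once the overall match is fixed (the longest one, from "quick" to the end of the story), Tcl's re_syntax rules give earlier subexpressions priority, so the greedy `.*` in the first group appears to take everything after "quick ", and the second and third groups are left unmatched (`-1 -1`). The snippet inspects the last of the 5 matches.

```tcl
set regString {(?:(quick )+.*)?(?:(brown )+.*)?(?:(fox ).*)?}
set story "The quick fox jumped "
set results [regexp -all -nocase -indices -inline $regString $story]

# The last 4 index pairs belong to the last (5th) match.
lassign [lrange $results end-3 end] whole g1 g2 g3
puts "whole:   '[string range $story {*}$whole]'"
puts "group 1: '[string range $story {*}$g1]'"
# -1 -1 means the group did not take part in the match.
puts "group 3 indices: $g3"
```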
FYI, the following code is used for debugging the returned results:
set numMatches [expr {[llength $results] / 4}]
puts "Number of matched sets = $numMatches"
puts "Results = $results"
set counter 0
set setNum 1
for {set i 0} {$i < [llength $results]} {incr i} {
    if { $counter == 0 } {
        puts "Set $setNum:"
    }
    set pair [lindex $results $i]
    set indexA [lindex $pair 0]
    set indexB [lindex $pair 1]
    puts " $counter: [string range $story $indexA $indexB]"
    if { $counter == 3 } {
        set counter 0
        incr setNum
    } else {
        incr counter
    }
}
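The same debugging output can be produced by stepping through the flat `-indices` list four pairs at a time. This is a sketch, assuming Tcl 8.4 or later and a pattern with exactly 3 capture groups; `dumpMatches` is a hypothetical helper name, not part of Tcl.

```tcl
# Hypothetical helper: prints each match set and returns how many there were.
# Assumes 3 capture groups, i.e. 4 index pairs per match.
proc dumpMatches {story results} {
    set setNum 0
    foreach {whole g1 g2 g3} $results {
        incr setNum
        puts "Set $setNum:"
        set n 0
        foreach pair [list $whole $g1 $g2 $g3] {
            # Unmatched groups ({-1 -1}) and empty matches ({p p-1})
            # both make string range return "".
            set text [string range $story [lindex $pair 0] [lindex $pair 1]]
            puts "  $n: '$text'"
            incr n
        }
    }
    return $setNum
}

set regString {(?:(quick )+.*)?(?:(brown )+.*)?(?:(fox ).*)?}
set story "The quick fox jumped "
set numSets [dumpMatches $story \
    [regexp -all -nocase -indices -inline $regString $story]]
puts "Number of matched sets = $numSets"
```

Using `foreach` with four loop variables removes the manual `counter`/`setNum` bookkeeping, at the cost of hard-coding the group count.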
O.B. <funkjunk@bellsouth.net> wrote:
>
> Good catch. I've tried to expand the example to further explain what
> I'm trying to do. Of all the expressions, I'd prefer for the program
> to attempt to match as many as possible.

Oh, here starts the lousy part :-/

At first glance,
set regString "(?:(quick )?.*?)?(?:(brown )+.*?)?(fox)?"
should do it by making the .* non-greedy, but it seems REs that
utilize non-greedy matching are somewhat stranger than one might think.

The problem with the naive approach is that a non-match of any word can
happen at any place, and if a non-match is OK (through use of a ? or *
quantifier), the RE engine will not necessarily search for longer matches
if those would start at a later position.

I can't say it's impossible, but I can't think of a solution either.

PS: if it's just about finding most of the words, without caring about the
order in which they occur, then
regexp -all -inline -indices "quick|brown|fox" $story
may do it for you.
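Building on the PS, a sketch that counts how many distinct target words occur, in any order, assuming Tcl 8.4 or later. `countDistinctWords` is a hypothetical helper name. The `\m` and `\M` escapes (Tcl ARE start-of-word and end-of-word constraints) are an addition not in the post, so that e.g. "brownie" would not count as "brown".

```tcl
# Hypothetical helper: number of distinct target words found, ignoring order.
proc countDistinctWords {story} {
    # No capture groups, so -inline returns one element per match.
    set found [regexp -all -nocase -inline {\m(?:quick|brown|fox)\M} $story]
    # Lower-case so that "Fox" and "fox" are counted once.
    set words {}
    foreach w $found {
        lappend words [string tolower $w]
    }
    return [llength [lsort -unique $words]]
}

puts [countDistinctWords "The quick fox jumped"]                          ;# 2
puts [countDistinctWords "The quick fox jumped over another brown fox "]  ;# 3
```

As noted in the post, this gives up on word order entirely; it only answers "which of the words appear".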