Code Comments

Regular expression with 0 or 1 matches option
I am trying to create a regular expression that will "optionally" match each
expression.  In the example below, I would like the regexp to attempt to match
all three expressions, but I am fine if it can only match one or two of the
expressions.  So in the case below, I expected it to match the "quick " and
"fox" expressions, but it failed to match any expression.  What's wrong here?

set regString "(quick )+.*?(brown )+.*?(fox)"

set story "The quick fox jumped"

set results [regexp -nocase -indices -inline $regString $story]

set numMatches 0
set matchNum   0
foreach pair $results {
    set indexA [lindex $pair 0]
    set indexB [lindex $pair 1]
    puts "$matchNum:  [string range $story $indexA $indexB]"
    # Element 0 is the whole match; a group that did not take part
    # in the match is reported as {-1 -1}.
    if { (0 < $matchNum) && ($indexA >= 0) } {
        incr numMatches
    }
    incr matchNum
}

# This should be 2.
puts "Number of expressions matched = $numMatches"


O.B.
08-23-04 08:58 AM


Re: Regular expression with 0 or 1 matches option
O.B. <funkjunk@bellsouth.net> wrote:
> I am trying to create a regular expression that will "optionally" match each
> expression.  In the example below, I would like the regexp to attempt to match
> all three expressions, but I am fine if it can only match one or two of the
> expressions.  So in the case below, I expected it to match the "quick " and
> "fox" expressions, but it failed to match any expression.  What's wrong here?
>
> set regString "(quick )+.*?(brown )+.*?(fox)"

The RE you supplied surely does not do what you say it should do.

The RE says:
match ONE OR MORE instances of "quick ",
then match as few arbitrary characters as necessary,
then match ONE OR MORE instances of "brown ",
then match as few arbitrary characters as necessary,
then finally match ONE instance of "fox",
so it clearly cannot match if the word "brown" is missing.
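A quick way to check this reading: run the question's pattern against the question's story, and against a made-up story that does contain "brown". This is only a small sketch; the second story is an assumption added for comparison.

```tcl
# Pattern from the question: "brown " is required, not optional.
set regString {(quick )+.*?(brown )+.*?(fox)}

# Story from the question, which has no "brown": no match is possible.
set withoutBrown [regexp -nocase $regString "The quick fox jumped"]

# Hypothetical story containing all three words, for comparison.
set withBrown [regexp -nocase $regString "The quick brown fox jumped"]

puts "without brown: $withoutBrown"
puts "with brown:    $withBrown"
```

The first call should return 0 and the second 1, since only the second story supplies the mandatory "brown ".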

You forgot some enclosing parentheses to collect the optional parts:
set regString "(?:(quick )+.*)?(?:(brown )+.*)?(fox)"

(The "?:" inside each of the new parentheses makes them
"non-capturing"; see the re_syntax manual page for details.)

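A short sketch of the corrected pattern on the question's story, assuming Tcl 8.4 or later for `regexp -inline`. The labels in the loop are illustrative only; `-inline` returns the whole match followed by one element per capturing group.

```tcl
# Corrected pattern: each "word + filler" part sits in an optional
# non-capturing group (?:...)?, so only (quick ), (brown ) and (fox)
# are reported as submatches.
set regString {(?:(quick )+.*)?(?:(brown )+.*)?(fox)}
set story "The quick fox jumped"

# A group that did not take part in the match comes back as "".
set result [regexp -nocase -inline $regString $story]
foreach label {whole quick brown fox} value $result {
    puts [format "%-6s {%s}" $label $value]
}
```

This should report "quick fox" as the whole match, "quick " and "fox" for the first and third groups, and an empty brown group. With `-indices` instead, the unused group would come back as {-1 -1} rather than as an empty string.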
Andreas Leitgeb
08-23-04 01:58 PM


Re: Regular expression with 0 or 1 matches option
Andreas Leitgeb wrote:
> you forgot some enclosing parentheses to collect the optional parts:
> set regString "(?:(quick )+.*)?(?:(brown )+.*)?(fox)"

Good catch.  I've tried to expand the example to further explain what I'm trying
to do.  Of all the expressions, I'd prefer the program to attempt to match
as many as possible.  Am I asking too much of regular expressions?

# Test 1
set regString "(?:(quick )+.*)?(?:(brown )+.*)?(?:(fox ).*)?"
set story     "The quick fox jumped over another brown fox "
set results   [regexp -all -nocase -indices -inline $regString $story]

For this test, I get 5 sets of data.  Looping through the data, it appears that
there were no complete matches.  I was expecting the regular expression to match
the 2nd, 7th, and 8th words of "story".

Using the "same" regString: in the event that the story contains only "The quick
fox jumped ", I was expecting the regular expression to match the 2nd and 3rd
words into the 1st and 3rd expressions.

# Test 2
set story     "The quick fox jumped "
set results   [regexp -all -nocase -indices -inline $regString $story]


FYI, the following code is used for debugging the returned results:

set numMatches [expr {[llength $results] / 4}]

puts "Number of matched sets = $numMatches"
puts "Results = $results"

set counter 0
set setNum 1
for {set i 0} {$i < [llength $results]} {incr i} {
    if { $counter == 0 } {
        puts "Set $setNum:"
    }

    set pair   [lindex $results $i]
    set indexA [lindex $pair 0]
    set indexB [lindex $pair 1]
    puts "  $counter:  [string range $story $indexA $indexB]"

    if { $counter == 3 } {
        set counter 0
        incr setNum
    } else {
        incr counter
    }
}
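The counter-based loop above can also be written with `foreach` consuming four index pairs at a time, since with three capturing groups every set has exactly four entries. In this sketch, `showSets` is a hypothetical helper name; it runs both test stories through the all-optional pattern and returns the number of sets it printed.

```tcl
# All-optional pattern from Test 1 and Test 2.
set regString {(?:(quick )+.*)?(?:(brown )+.*)?(?:(fox ).*)?}

# Hypothetical helper: print each set returned by regexp -all and return
# how many sets there were.  Each set is four index pairs: the whole
# match followed by the three capturing groups.
proc showSets {regString story} {
    set results [regexp -all -nocase -indices -inline $regString $story]
    set setNum 0
    foreach {whole quick brown fox} $results {
        incr setNum
        puts "Set $setNum:"
        foreach label {whole quick brown fox} pair [list $whole $quick $brown $fox] {
            set text [string range $story [lindex $pair 0] [lindex $pair 1]]
            puts "  $label: '$text'"
        }
    }
    return $setNum
}

set sets1 [showSets $regString "The quick fox jumped over another brown fox "]
set sets2 [showSets $regString "The quick fox jumped "]
puts "Test 1: $sets1 sets, Test 2: $sets2 sets"
```

The five sets reported for Test 1 are consistent with four empty matches at indices 0 through 3 (every part of the pattern may be skipped) followed by one long match starting at "quick", where the greedy `.*` inside the first group swallows the rest of the story and leaves the brown and fox groups unused.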

O.B.
08-23-04 09:01 PM


Re: Regular expression with 0 or 1 matches option
O.B. <funkjunk@bellsouth.net> wrote: 
>
> Good catch.  I've tried to expand the example to further explain what
> I'm trying to do.  Of all the expressions, I'd prefer for the program
> to attempt to match as many as possible.

Oh, here starts the lousy part :-/

At first glance,
set regString "(?:(quick )?.*?)?(?:(brown )+.*?)?(fox)?"
should do it by making the .* non-greedy, but it seems that
REs using non-greedy matching behave more strangely than
one might think.

The problem with the naive approach is that a non-match
of any word can happen at any place, and if a non-match is
OK (through use of a ? or * quantifier), the RE engine will not
look for longer matches that would start at a later position:
the match that starts earliest always wins.
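A small illustration of that point, using the all-optional pattern from the follow-up post: without `-all`, `regexp` reports only the leftmost match, and since every part of the pattern may be skipped, the leftmost match is the empty string at index 0, with none of the groups taking part.

```tcl
# All-optional pattern and Test 2 story from the follow-up post.
set regString {(?:(quick )+.*)?(?:(brown )+.*)?(?:(fox ).*)?}
set story "The quick fox jumped "

# Only the leftmost match is reported.  An empty match at index 0 is
# shown as {0 -1}; groups that did not take part are shown as {-1 -1}.
regexp -nocase -indices $regString $story whole quick brown fox
puts "whole: $whole  quick: $quick  brown: $brown  fox: $fox"
```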

I can't say it's impossible, but I can't think of a solution
either.
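One workaround outside a single RE (not something proposed in this thread): look for each word in turn with `regexp -start`, beginning every search just past the previous hit, and skip words that are not found. `matchInOrder` is a hypothetical helper name, the words are treated as REs, and this greedy strategy does not always find the largest possible number of words: a word found late in the story can hide words that appear before it.

```tcl
# Hypothetical helper: find each word in order, starting every search just
# after the previous hit.  Words that are not found are skipped.  Returns a
# flat list of word / index-pair entries.
proc matchInOrder {words story} {
    set start 0
    set found {}
    foreach word $words {
        # -start makes the search begin at $start; with -indices the
        # reported indices are still counted from the start of $story.
        if {[regexp -nocase -indices -start $start -- $word $story span]} {
            lappend found $word $span
            set start [expr {[lindex $span 1] + 1}]
        }
    }
    return $found
}

set r1 [matchInOrder {quick brown fox} "The quick fox jumped over another brown fox "]
set r2 [matchInOrder {quick brown fox} "The quick fox jumped "]
puts $r1
puts $r2
```

For the two stories from the follow-up post this gives quick, brown and fox (the 2nd, 7th and 8th words) for Test 1, and quick and fox for Test 2, which is what the asker expected.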

PS: if it's just about finding most of the words, not caring
whether they occur in any particular order, then
regexp -all -inline -indices "quick|brown|fox" $story
may do it for you.
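A sketch of that suggestion applied to the Test 1 story, with `-nocase` added to match the earlier posts and the index pairs turned back into words:

```tcl
set story "The quick fox jumped over another brown fox "

# Every occurrence of any of the three words, in the order they appear.
# With no capturing groups, each match contributes one index pair.
set hits [regexp -all -inline -indices -nocase {quick|brown|fox} $story]

# Turn the index pairs back into the matched words.
set words {}
foreach pair $hits {
    lappend words [string range $story [lindex $pair 0] [lindex $pair 1]]
}
puts $words
```

Unlike the in-order patterns, this reports both occurrences of "fox": it answers "which words occur", not "which words occur in this order".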


Andreas Leitgeb
08-23-04 09:01 PM


Tcl archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
All times are GMT.

Copyrights CodeComments.com 2004 - 2006
