Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

non-terminating regex match
Has to be something really stupid, but the following never finish
(running Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
[GCC 4.2.1 (SUSE Linux)] on linux2).

The intention is to match C++ identifiers, with or without namespace
qualification, with or without arguments (e.g. variables, functions and
macros).
The following should be accepted:
main
main(int,char**)
::main
std::cout
::std::cout
NDEBUG

Thanks for any help.
And yes, I'm a total beginner when it comes to Python, but it seems
very strange to me that a regex match on a finite length string
doesn't terminate
Regards,

Maurizio

#!/usr/bin/env python
# -*- Python -*-

import re

if __name__ == '__main__':
r = re.compile (
r'(?:(?P<scope>(?:(?:::)?\w+)*)::)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)
match = r.search ('WITH_ALOHA_EXCEPTION_HANDLERS')



Report this thread to moderator Post Follow-up to this message
Old Post
Maurizio Vitale
04-03-08 03:59 AM


Re: non-terminating regex match
On Wed, 02 Apr 2008 16:01:59 +0000, Maurizio Vitale wrote:

> And yes, I'm a total beginner when it comes to Python, but it seems
> very strange to me that a regex match on a finite length string
> doesn't terminate

It does terminate, you just don't wait long enough.  Try it with fewer
characters and then increase the identifier character by character and
watch the time of the runs grow exponentially.

Ciao,
Marc 'BlackJack' Rintsch

Report this thread to moderator Post Follow-up to this message
Old Post
Marc 'BlackJack' Rintsch
04-03-08 03:59 AM


Re: non-terminating regex match
Marc 'BlackJack' Rintsch <bj_666@gmx.net> writes:

> On Wed, 02 Apr 2008 16:01:59 +0000, Maurizio Vitale wrote:
> 
>
> It does terminate, you just don't wait long enough.  Try it with fewer
> characters and then increase the identifier character by character and
> watch the time of the runs grow exponentially.
>

Ok. Now, my assumption was that re.compile would produce a DFA (or NFA)
and the match would then being linear in the size of the string.

Apparently this is not the case, so is there anything that can be done
to the regex itself to make sure that whatever re.compile produces is
more efficient?

Thanks,

Maurizio

Report this thread to moderator Post Follow-up to this message
Old Post
Maurizio Vitale
04-03-08 03:59 AM


Re: non-terminating regex match
On Apr 2, 5:01 pm, Maurizio Vitale <m...@cuma.polymath-solutions.lan>
wrote:
> Has to be something really stupid, but the following never finish
> (running Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
> [GCC 4.2.1 (SUSE Linux)] on linux2).
>
> The intention is to match C++ identifiers, with or without namespace
> qualification, with or without arguments (e.g. variables, functions and
> macros).
> The following should be accepted:
>     main
>     main(int,char**)
>     ::main
>     std::cout
>     ::std::cout
>     NDEBUG
>
> Thanks for any help.
> And yes, I'm a total beginner when it comes to Python, but it seems
> very strange to me that a regex match on a finite length string
> doesn't terminate
> Regards,
>
>         Maurizio
>
> #!/usr/bin/env python
> # -*- Python -*-
>
> import re
>
> if __name__ == '__main__':
>         r = re.compile (
>             r'(?:(?P<scope>(?:(?:::)?\w+)*)::)?'
>             r'(?P<name>\w+)'
>             r'(?:\((?P<arguments>[^\)]*)\))?'
>             )
>         match = r.search ('WITH_ALOHA_EXCEPTION_HANDLERS')

I think the problem is with this bit: '(?:(?:::)?\w+)*'. The '::' is
optional and also in a repeated group, so if it tries to match, say,
'abc' it can try and then backtrack all of these possibilities: abc,
ab c, a bc, a b c. The longer the string, the more possibilities to
try. Try this instead:

r = re.compile (
r'(?P<scope>(?:::)?(?:\w+::)*)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)

Report this thread to moderator Post Follow-up to this message
Old Post
MRAB
04-03-08 03:59 AM


Re: non-terminating regex match
MRAB <google@mrabarnett.plus.com> writes:

>
> I think the problem is with this bit: '(?:(?:::)?\w+)*'. The '::' is
> optional and also in a repeated group, so if it tries to match, say,
> 'abc' it can try and then backtrack all of these possibilities: abc,
> ab c, a bc, a b c. The longer the string, the more possibilities to
> try. Try this instead:
>
>     r = re.compile (
>         r'(?P<scope>(?:::)?(?:\w+::)*)?'
>         r'(?P<name>\w+)'
>         r'(?:\((?P<arguments>[^\)]*)\))?'
>         )

That was indeed the problem. Your version is ok w/ the minor difference
that the named group <scope> includes the '::' at the end. This can be
easily stripped off. Or maybe the regexp can be massaged further.
Thanks a lot,

Maurizio

Report this thread to moderator Post Follow-up to this message
Old Post
Maurizio Vitale
04-03-08 03:59 AM


Re: non-terminating regex match
En Wed, 02 Apr 2008 13:01:59 -0300, Maurizio Vitale
<mav@cuma.polymath-solutions.lan> escribió:

> The intention is to match C++ identifiers, with or without namespace
> qualification, with or without arguments (e.g. variables, functions and
> macros).
> The following should be accepted:
>     main
>     main(int,char**)
>     ::main
>     std::cout
>     ::std::cout
>     NDEBUG

What about foo(int(*f)(int)) ?
You can't match a function definition with a regular expression alone, due
to the nested (). The r.e. can recognize an identifier; you can later see
whether there is a ( and scan up to the matching ).

--
Gabriel Genellina


Report this thread to moderator Post Follow-up to this message
Old Post
Gabriel Genellina
04-03-08 01:44 PM


Sponsored Links




Last Thread Next Thread Next
Search this forum -> 
Post New Thread

Python archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 08:45 AM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.