Code Comments
Programming Forum and web based access to our favorite programming groups.
Has to be something really stupid, but the following never finish
(running Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
[GCC 4.2.1 (SUSE Linux)] on linux2).
The intention is to match C++ identifiers, with or without namespace
qualification, with or without arguments (e.g. variables, functions and
macros).
The following should be accepted:
main
main(int,char**)
::main
std::cout
::std::cout
NDEBUG
Thanks for any help.
And yes, I'm a total beginner when it comes to Python, but it seems
very strange to me that a regex match on a finite length string
doesn't terminate
Regards,
Maurizio
#!/usr/bin/env python
# -*- Python -*-
import re
if __name__ == '__main__':
r = re.compile (
r'(?:(?P<scope>(?:(?:::)?\w+)*)::)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)
match = r.search ('WITH_ALOHA_EXCEPTION_HANDLERS')
Post Follow-up to this messageOn Wed, 02 Apr 2008 16:01:59 +0000, Maurizio Vitale wrote: > And yes, I'm a total beginner when it comes to Python, but it seems > very strange to me that a regex match on a finite length string > doesn't terminate It does terminate, you just don't wait long enough. Try it with fewer characters and then increase the identifier character by character and watch the time of the runs grow exponentially. Ciao, Marc 'BlackJack' Rintsch
Post Follow-up to this messageMarc 'BlackJack' Rintsch <bj_666@gmx.net> writes: > On Wed, 02 Apr 2008 16:01:59 +0000, Maurizio Vitale wrote: > > > It does terminate, you just don't wait long enough. Try it with fewer > characters and then increase the identifier character by character and > watch the time of the runs grow exponentially. > Ok. Now, my assumption was that re.compile would produce a DFA (or NFA) and the match would then being linear in the size of the string. Apparently this is not the case, so is there anything that can be done to the regex itself to make sure that whatever re.compile produces is more efficient? Thanks, Maurizio
Post Follow-up to this messageOn Apr 2, 5:01 pm, Maurizio Vitale <m...@cuma.polymath-solutions.lan>
wrote:
> Has to be something really stupid, but the following never finish
> (running Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
> [GCC 4.2.1 (SUSE Linux)] on linux2).
>
> The intention is to match C++ identifiers, with or without namespace
> qualification, with or without arguments (e.g. variables, functions and
> macros).
> The following should be accepted:
> main
> main(int,char**)
> ::main
> std::cout
> ::std::cout
> NDEBUG
>
> Thanks for any help.
> And yes, I'm a total beginner when it comes to Python, but it seems
> very strange to me that a regex match on a finite length string
> doesn't terminate
> Regards,
>
> Maurizio
>
> #!/usr/bin/env python
> # -*- Python -*-
>
> import re
>
> if __name__ == '__main__':
> r = re.compile (
> r'(?:(?P<scope>(?:(?:::)?\w+)*)::)?'
> r'(?P<name>\w+)'
> r'(?:\((?P<arguments>[^\)]*)\))?'
> )
> match = r.search ('WITH_ALOHA_EXCEPTION_HANDLERS')
I think the problem is with this bit: '(?:(?:::)?\w+)*'. The '::' is
optional and also in a repeated group, so if it tries to match, say,
'abc' it can try and then backtrack all of these possibilities: abc,
ab c, a bc, a b c. The longer the string, the more possibilities to
try. Try this instead:
r = re.compile (
r'(?P<scope>(?:::)?(?:\w+::)*)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)
Post Follow-up to this messageMRAB <google@mrabarnett.plus.com> writes: > > I think the problem is with this bit: '(?:(?:::)?\w+)*'. The '::' is > optional and also in a repeated group, so if it tries to match, say, > 'abc' it can try and then backtrack all of these possibilities: abc, > ab c, a bc, a b c. The longer the string, the more possibilities to > try. Try this instead: > > r = re.compile ( > r'(?P<scope>(?:::)?(?:\w+::)*)?' > r'(?P<name>\w+)' > r'(?:\((?P<arguments>[^\)]*)\))?' > ) That was indeed the problem. Your version is ok w/ the minor difference that the named group <scope> includes the '::' at the end. This can be easily stripped off. Or maybe the regexp can be massaged further. Thanks a lot, Maurizio
Post Follow-up to this messageEn Wed, 02 Apr 2008 13:01:59 -0300, Maurizio Vitale <mav@cuma.polymath-solutions.lan> escribió: > The intention is to match C++ identifiers, with or without namespace > qualification, with or without arguments (e.g. variables, functions and > macros). > The following should be accepted: > main > main(int,char**) > ::main > std::cout > ::std::cout > NDEBUG What about foo(int(*f)(int)) ? You can't match a function definition with a regular expression alone, due to the nested (). The r.e. can recognize an identifier; you can later see whether there is a ( and scan up to the matching ). -- Gabriel Genellina
Post Follow-up to this messagePowered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.