Code Comments
Programming Forum and web based access to our favorite programming groups.Can I get sequence of all strings that can match a given regular expression? For example, for expression '(a|b)|(x|y)' it would be ['ax', 'ay', 'bx', 'by'] It would be useful for example to pass these strings to a search engine not supporting RegExp (therefore adding such support to it). A program can also let user specify sequence of strings using RegExp (filenames to process, etc.). If there are other types of expressions for these purposes, please let me know. I know that for some expressions there would be infinite amount of matching strings, but these aren't the cases I'm considering. It'd still be possible if string length is limited (there might be large but finite number of matching strings). Thanks
Post Follow-up to this messageI don't think there is any built in way. Regular expressions are compiled into an expanded pattern internally, but I don't think that it is anything that would be useful for you to directly access. If you are interested in a lot of work, you could do something with PLY and write an re parser that would expand it into a series of possible textual matches :)
Post Follow-up to this messageAlex9968 <noname9968@gmail.com> wrote:
> Can I get sequence of all strings that can match a given regular
> expression? For example, for expression '(a|b)|(x|y)' it would be ['ax',
> 'ay', 'bx', 'by']
>
> It would be useful for example to pass these strings to a search engine
> not supporting RegExp (therefore adding such support to it). A program
> can also let user specify sequence of strings using RegExp (filenames to
> process, etc.). If there are other types of expressions for these
> purposes, please let me know.
>
> I know that for some expressions there would be infinite amount of
> matching strings, but these aren't the cases I'm considering. It'd still
> be possible if string length is limited (there might be large but finite
> number of matching strings).
This will give you all (byte-)strings upto a given length which match a
given regular expression. But beware, it can be slow ;)
import re
all_chars = [chr(i) for i in xrange(256)]
def gen_strings(length, alphabet=all_chars):
if length == 1:
for c in alphabet:
yield c
else:
for i in alphabet:
yield c
for s in gen_strings(length - 1, alphabet):
yield c + s
def regex_matches(regex, max_length, alphabet=all_chars):
r = re.compile('(' + regex + r')\Z')
return (s for s in gen_strings(max_length, alphabet) if r.match(s))
Marc
Post Follow-up to this messagePowered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.