Author Re: regex puzzle
Roberto Ruiz

2005-07-29, 5:01 pm


On Fri, Jul 29, 2005 at 01:48:15PM +0800, bingfeng zhao wrote:
> See following sample code:
>
> my @address = ("http://test", "http://", "www", "", "ftp:/foo" );
>
> for (@address)
> {
> print "\"$_\" passed! \n" if /^((http|ftp):\/\/)?.+$/;

# ^ ^
# these parentheses make a single atom of the
# enclosed part of the regex
> }
> Why can "http://" and "ftp:/foo" pass the check?


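Because the () group the first part of your regex into a single atom, and
the ? then asks for 0 or 1 of that atom. When the prefix is missing or
incomplete, the group simply matches zero times and .+ consumes the whole
string.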
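
To see the optional group being skipped, here is a minimal sketch. The extra capture group around .+ and the names @samples and %prefix_of are added only for illustration and are not part of the original code.

```perl
use strict;
use warnings;

my @samples = ("http://test", "http://", "ftp:/foo");
my %prefix_of;    # hypothetical helper: what the optional group captured

for my $s (@samples) {
    # Same pattern as in the question, with .+ wrapped in a capture group
    if ($s =~ /^((http|ftp):\/\/)?(.+)$/) {
        # $1 is undef when the optional group matched zero times
        $prefix_of{$s} = defined $1 ? $1 : "(none)";
        print "\"$s\": prefix=$prefix_of{$s} rest=$3\n";
    }
}
```

For "http://" the engine first tries the prefix, finds nothing left for .+, backtracks, skips the group, and lets .+ take the entire string.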

Dropping the ()?: /^(http|ftp):\/\/.+$/

Or, more readable: m!^(http|ftp)://.+$!

And a little shorter: m!^(ht|f)tp://.+$!
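
A minimal sketch for comparing the original pattern with the stricter one on the sample list from the question. The variable names $loose, $strict, @loose_ok and @strict_ok are made up for this example.

```perl
use strict;
use warnings;

my @address = ("http://test", "http://", "www", "", "ftp:/foo");

# Original pattern: the scheme prefix is optional
my $loose  = qr/^((http|ftp):\/\/)?.+$/;
# Suggested fix: the scheme prefix is required
my $strict = qr!^(http|ftp)://.+$!;

# Collect the addresses each pattern accepts
my @loose_ok  = grep { $_ =~ $loose }  @address;
my @strict_ok = grep { $_ =~ $strict } @address;

print "loose:  ", join(", ", map { "\"$_\"" } @loose_ok),  "\n";
print "strict: ", join(", ", map { "\"$_\"" } @strict_ok), "\n";
```

Note that the stricter pattern also rejects "www", so it only fits if a scheme is really mandatory.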

HTH,
Roberto Ruiz

Pedro Henrique Calais

2005-07-29, 5:01 pm


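/^((http|ftp):\/\/)?.+$/
In my opinion the problem is that '.+' means 'one or more of anything',
so once the optional prefix is skipped, the whole of "http://" or
"ftp:/foo" is matched by this part of the regex (only the empty string ""
is rejected, because + needs at least one character).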

regards
Pedro

Tom Allison

2005-07-29, 10:00 pm

Nice shot!!

And yes, the ? is the culprit.
Wouldn't it be simpler if you just replaced .+$ with \w+ to make sure
that there was something that matched? Who cares if you match something
to the end of the line? You are also accepting strings like:

http://`rm -rf /.`

m!(ht|f)tp://\w+!

After all, you would really be expecting something similar to 'www' to follow.
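
A rough sketch of how the \w+ variant behaves, written with ! delimiters because a | inside a |-delimited pattern would end the pattern early. The sample strings and the names $re, @samples and %ok are assumptions for illustration.

```perl
use strict;
use warnings;

# Unanchored on purpose, as in the suggestion above
my $re = qr!(ht|f)tp://\w+!;

my @samples = (
    'http://www',
    'http://',
    'ftp:/foo',
    'http://`rm -rf /.`',
    'http://www`rm -rf /.`',
);

# Record which samples the pattern accepts
my %ok;
$ok{$_} = ($_ =~ $re) ? 1 : 0 for @samples;

printf "%-24s %s\n", $_, ($ok{$_} ? "passed" : "rejected") for @samples;
```

Because the pattern is not anchored at the end, the last sample still passes: \w+ only guarantees that at least one word character follows the scheme, not that the rest of the string is harmless.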