Re: regex puzzle (PERL Beginners archive, July 2005)
| Roberto Ruiz 2005-07-29, 5:01 pm |
On Fri, Jul 29, 2005 at 01:48:15PM +0800, bingfeng zhao wrote:
> See following sample code:
>
> my @address = ("http://test", "http://", "www", "", "ftp:/foo" );
>
> for (@address)
> {
> print "\"$_\" passed! \n" if /^((http|ftp):\/\/)?.+$/;
# The outer parentheses, the pair wrapping (http|ftp):\/\/, make a
# single atom of the enclosed part of the regex.
> }
> why "http://" and "ftp:/foo" can pass the check?
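Because the outer () turn the first part of your regex into a single atom,
and the ? then asks for 0 or 1 of that atom. When the prefix is skipped
(0 occurrences), .+ alone has to match, and it accepts any non-empty string,
including "http://" and "ftp:/foo".
Dropping the ()?: /^(http|ftp):\/\/.+$/
Or, more readable: m!^(http|ftp)://.+$!
And a little shorter: m!^(ht|f)tp://.+$!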
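To compare the two behaviours side by side, here is a minimal sketch that runs the original pattern and the shortest suggested fix over the sample addresses from the question. The sub names passes_original and passes_fixed are made up for this illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Sample addresses from the original question.
my @address = ("http://test", "http://", "www", "", "ftp:/foo");

# Original pattern: the whole scheme prefix is optional.
# (passes_original is a hypothetical helper name.)
sub passes_original { return $_[0] =~ /^((http|ftp):\/\/)?.+$/ ? 1 : 0 }

# Suggested fix: the scheme prefix is mandatory.
# (passes_fixed is a hypothetical helper name.)
sub passes_fixed { return $_[0] =~ m!^(ht|f)tp://.+$! ? 1 : 0 }

for my $addr (@address) {
    printf "%-14s original=%d fixed=%d\n",
        "\"$addr\"", passes_original($addr), passes_fixed($addr);
}
```

With the fix, only "http://test" should pass, since it is the only sample that has both a complete scheme prefix and at least one character after it.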
HTH,
Roberto Ruiz
| |
| Pedro Henrique Calais 2005-07-29, 5:01 pm |
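/^((http|ftp):\/\/)?.+$/
In my opinion the problem is that '.+' means 'one or more of any character',
so once the optional prefix is skipped, "http://" and "ftp:/foo" are matched
entirely by this part of the regex (only the empty string "" is rejected).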
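One way to see which half of the pattern consumed the string is to capture what .+ matched. The sketch below is an illustration only: it adds a third capture group around .+ (not present in the original regex), and the sub name explain is hypothetical.

```perl
use strict;
use warnings;

# Report which part of the string each half of the pattern consumed.
# (explain is a hypothetical helper name.)
sub explain {
    my ($addr) = @_;
    # Group 1 is the optional prefix; group 3 is added here, as an
    # assumption for illustration, to capture what .+ matched.
    if ($addr =~ /^((http|ftp):\/\/)?(.+)$/) {
        my $prefix = defined $1 ? $1 : '(skipped)';
        return "prefix=$prefix rest=$3";
    }
    return 'no match';
}

print "\"$_\": ", explain($_), "\n"
    for ("http://test", "http://", "ftp:/foo", "");
```

For "http://" the engine first tries the prefix, finds nothing left for .+, backtracks, skips the prefix, and lets .+ take the whole string.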
regards
Pedro
| |
| Tom Allison 2005-07-29, 10:00 pm |
Roberto Ruiz wrote:
> And a little shorter: m!^(ht|f)tp://.+$!
Nice shot!!
And yes, the ? is the culprit.
Wouldn't it be simpler to replace .+$ with \w+, to make sure that something
word-like follows the scheme? Who cares whether you match all the way to the
end of the line? With .+ you are also accepting strings like:
http://`rm -rf /.`
m!(ht|f)tp://\w+!
After all, you would really expect something similar to 'www' to follow.
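As a rough check of this suggestion, the sketch below tries the \w+ pattern on the shell-injection string and on two other inputs. The sub names and the anchored variant strict_match are assumptions added for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# The \w+ suggestion, written with ! as the delimiter because | is
# already used for alternation inside the pattern.
# (loose_match is a hypothetical helper name.)
sub loose_match { return $_[0] =~ m!(ht|f)tp://\w+! ? 1 : 0 }

# Hypothetical stricter variant (an assumption, not from the thread):
# anchored at both ends so nothing may trail the word characters.
sub strict_match { return $_[0] =~ m!\A(ht|f)tp://\w+\z! ? 1 : 0 }

for my $addr ('http://`rm -rf /.`', 'http://www', 'http://www`rm -rf /.`') {
    printf "%-24s loose=%d strict=%d\n",
        $addr, loose_match($addr), strict_match($addr);
}
```

The unanchored pattern rejects the first string but still accepts the third, because it only requires a word character somewhere right after the scheme. Note also that \w does not match dots or hyphens, so the anchored variant would reject a full host name such as one containing '.'; it is a sketch of the idea, not a URL validator.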