preg_match works, preg_replace does not - why?
| bwooster47@gmail.com 2006-10-30, 7:03 pm |
| PHP 5.1.4 (cli) (built: May 8 2006 08:41:41)
Ran from command line, here's the php file - need to replace zero or
more / characters at end of a string, this works fine in perl, but not
in php - any suggestions?
<?php
$s = "aa//";
preg_match('/\/*$/', $s, $a);
echo "matches :" . $a[0] . ":\n"; // output: matches ://:
$t = preg_replace('/\/*$/', 'A', $s);
echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaAA:
// should be aaA and not aaAA ??
$t = preg_replace("/\/+$/", 'A', $s); // works
echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaA:
?>
| |
| Koncept 2006-10-30, 7:03 pm |
| In article <1162142987.595717.254360@h48g2000cwc.googlegroups.com>,
<bwooster47@gmail.com> wrote:
> <?php
> $s = "aa//";
>
> preg_match('/\/*$/', $s, $a);
>
> echo "matches :" . $a[0] . ":\n"; // output: matches ://:
>
> $t = preg_replace('/\/*$/', 'A', $s);
>
> echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaAA:
> // should be aaA and not aaAA ??
>
> $t = preg_replace("/\/+$/", 'A', $s); // works
>
> echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaA:
> ?>
* is greedy and will match as much as possible (including the first
slash). If you change your instances of * to +, you will get the desired
result.
--
Koncept <<
"The snake that cannot shed its skin perishes. So do the spirits who are
prevented from changing their opinions; they cease to be a spirit." -Nietzsche
| |
| bwooster47@gmail.com 2006-10-30, 7:03 pm |
| Exactly - * is supposed to be greedy, but in preg_replace, it does not
act greedy at all.
That is what I am trying to understand.
For example, the following perl code:
# change zero or more final / characters to A
$s = 'aa//';
$s =~ s/\/*$/A/;
print $s; # this prints aaA, as expected
Works as expected - get aaA printed.
But the equivalent PHP code, using the supposedly Perl-like
preg_replace function, results in aaAA: instead of a single capital
A at the end, it prints two, which suggests it did not do a greedy match.
And I don't want to use + instead of *, because + would not add an
A if there were 0 slashes. I definitely need *, to "replace
zero or more / at end with a single A character".
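To make the requirement concrete, here is a minimal sketch with two
illustrative inputs (my own examples, not real data): one string with
trailing slashes and one without. It only shows why + is not a substitute
for * in the zero-slash case.

```php
<?php
// Illustrative inputs: one with trailing slashes, one without.
$withSlashes = 'aa//';
$noSlashes   = 'aa';

// + needs at least one slash, so it only helps when slashes are present.
$plusWith = preg_replace('/\/+$/', 'A', $withSlashes);
$plusNone = preg_replace('/\/+$/', 'A', $noSlashes);

// * also matches the empty string at the end, so an A is still appended.
$starNone = preg_replace('/\/*$/', 'A', $noSlashes);

echo "+ on $withSlashes gives $plusWith\n"; // aaA
echo "+ on $noSlashes gives $plusNone\n";   // aa (unchanged)
echo "* on $noSlashes gives $starNone\n";   // aaA
```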
Koncept wrote:
> In article <1162142987.595717.254360@h48g2000cwc.googlegroups.com>,
> * is greedy and will match as much as possible (including the first
> slash). If you change your instances of * to +, you will get the desired
> result.
| |
| Andy Hassall 2006-10-30, 7:03 pm |
| On 29 Oct 2006 09:29:47 -0800, bwooster47@gmail.com wrote:
>PHP 5.1.4 (cli) (built: May 8 2006 08:41:41)
>
>Ran from command line, here's the php file - need to replace zero or
>more / characters at end of a string, this works fine in perl, but not
>in php - any suggestions?
>
><?php
>$s = "aa//";
>
>preg_match('/\/*$/', $s, $a);
>
>echo "matches :" . $a[0] . ":\n"; // output: matches ://:
>
>$t = preg_replace('/\/*$/', 'A', $s);
>
>echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaAA:
>// should be aaA and not aaAA ??
>
>$t = preg_replace("/\/+$/", 'A', $s); // works
>
>echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaA:
>?>
Consider:
$t = preg_replace('/(\/*)$/', '($1->A)', $s);
From this, I get the output:
after replace :aa(//->A)(->A):, expecting aaA
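The output is not pretty, but it shows two separate replacements. Bear in mind
that preg_replace implicitly behaves like the /g flag from Perl: it replaces
every match, not just the first one.
\/* will happily match an empty string, so there are two matches: a greedy
match of the two slashes, and then an empty match at the very end of the
string, after the slashes, where $ still matches.
If you change it to \/+, the empty match is no longer possible, so you force a
single match only, which has a non-zero number of slashes.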
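One way to watch this happen is to log each match as it is replaced. The
following is only a sketch: it uses preg_replace_callback with a closure,
which assumes PHP 5.3 or later (newer than the 5.1.4 mentioned at the start
of the thread), and the $seen array is a name introduced just for this
illustration.

```php
<?php
$s = 'aa//';
$seen = array();

// Record the text of every match that preg_replace_callback hands over.
$t = preg_replace_callback(
    '/\/*$/',
    function ($m) use (&$seen) {
        $seen[] = $m[0];
        return 'A';
    },
    $s
);

var_dump($seen); // two entries: "//" and ""
echo $t, "\n";   // aaAA
```

The callback runs twice: once for the two slashes and once for the empty
string at the end, which is where the second A comes from.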
To complete the picture, consider the following in Perl:
andyh@excession ~
$ cat test.pl
#!/usr/bin/env perl
my $s = 'aa//';
$s =~ s/\/*$/A/g;
print $s . "\n";
andyh@excession ~
$ ./test.pl
aaAA
Once you realise preg_match implies /g, it is actually consistent.
--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
| |
| Koncept 2006-10-30, 7:03 pm |
| In article <1162238914.967613.57450@m73g2000cwd.googlegroups.com>,
<bwooster47@gmail.com> wrote:
> Exactly - * is supposed to be greedy, but in preg_replace, it does not
> act greedy at all.
> That is what I am trying to understand.
Gotcha. Sorry. I totally missed the fact that you were testing this
with preg_match as well. Interesting. Going to take another look.
| |
| Koncept 2006-10-30, 7:03 pm |
| In article <1162238914.967613.57450@m73g2000cwd.googlegroups.com>,
<bwooster47@gmail.com> wrote:
> Exactly - * is supposed to be greedy, but in preg_replace, it does not
> act greedy at all.
> That is what I am trying to understand.
>
> For example, the following perl code:
> # change zero or more final / characters to A
> $s = 'aa//';
> $s =~ s/\/*$/A/;
> print $s; # this prints aaA, as expected
>
> Works as expected - get aaA printed.
>
> But the equivalent php code, using the supposedly perl-like
> preg_replace function, results in aaAA -- instead of a single capital
> A at end, it prints two, so this means it did not do a greedy match.
Very interesting. If I set the limit argument to 1, a single greedy match
is replaced, whereas with the default limit of -1 the function seems to
match the initial "/" and then continue to match the remaining
characters, hence a result like "aaAA".
Here are my results:
Test with preg_match():
Array
(
[0] => ////////
)
Test with preg_replace() with limit:
Made 1 replacement(s) resulting in "aaA"
Test with preg_replace() with no limit:
Made 2 replacement(s) resulting in => "aaAA"
------------------------------------------------
<?php
header("Content-type:text/plain");
$string = "aa////////";
$pattern = '@/*$@'; // single quotes: no variable interpolation near $
$replace = "A";
$limit = 1;    // replace the first match only
$nolimit = -1; // default: replace every match
preg_match( $pattern, $string, $matches );
echo "Test with preg_match():\n\n";
print_r( $matches );
echo "\nTest with preg_replace() with limit:\n";
$r = preg_replace( $pattern, $replace, $string, $limit, $count );
echo " Made $count replacement(s) resulting in \"$r\"\n\n";
echo "Test with preg_replace() with no limit:\n";
$r = preg_replace( $pattern, $replace, $string, $nolimit, $count );
echo " Made $count replacement(s) resulting in => \"$r\"\n\n";
?>
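A small follow-up sketch, assuming the same pattern, that varies the number
of trailing slashes and records the replacement count each time. The loop
bounds and the $results array are arbitrary choices for illustration.

```php
<?php
$pattern = '@/*$@';
$results = array();

// Try 0 to 4 trailing slashes with the default limit of -1.
for ($n = 0; $n <= 4; $n++) {
    $input = 'aa' . str_repeat('/', $n);
    $output = preg_replace($pattern, 'A', $input, -1, $count);
    $results[$n] = array($output, $count);
    echo "$n slash(es): $count replacement(s), result \"$output\"\n";
}
```

With no slashes there is one replacement, and with one or more slashes there
are always exactly two. The count does not grow with the number of slashes,
which would be hard to explain if the slashes themselves were being split
between the two matches.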
| |
| Andy Hassall 2006-10-30, 7:03 pm |
| On Mon, 30 Oct 2006 20:23:13 +0000, Andy Hassall <andy@andyh.co.uk> wrote:
> Once you realise preg_match implies /g, it is actually consistent.
Sorry, preg_replace implies /g.
preg_match_all would be the equivalent match function, consider this:
$ cat test.php
<?php
$s = "aa//";
preg_match_all('/\/*$/', $s, $a);
var_dump($a);
?>
$ php test.php
array(1) {
[0]=>
array(2) {
[0]=>
string(2) "//"
[1]=>
string(0) ""
}
}
Note the second empty match.
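To see where each match sits in the string, the same call can be made with
the PREG_OFFSET_CAPTURE flag. This is just a sketch built on the test above;
the loop and variable names are mine.

```php
<?php
$s = "aa//";

// PREG_OFFSET_CAPTURE makes each match an array of [text, byte offset].
preg_match_all('/\/*$/', $s, $a, PREG_OFFSET_CAPTURE);

foreach ($a[0] as $match) {
    list($text, $offset) = $match;
    echo "offset $offset: \"$text\"\n";
}
```

The first match starts at offset 2 and covers both slashes; the second is the
empty string at offset 4, the end of the subject.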
| |
| Koncept 2006-10-30, 7:03 pm |
| In article <ueqck21qc17f9bo7dig36lo93nnup8pin3@4ax.com>, Andy Hassall
<andy@andyh.co.uk> wrote:
> On Mon, 30 Oct 2006 20:23:13 +0000, Andy Hassall <andy@andyh.co.uk> wrote:
>
>
> Sorry, preg_replace implies /g.
>
> preg_match_all would be the equivalent match function, consider this:
>
> $ cat test.php
> <?php
> $s = "aa//";
> preg_match_all('/\/*$/', $s, $a);
> var_dump($a);
> ?>
>
> $ php test.php
> array(1) {
> [0]=>
> array(2) {
> [0]=>
> string(2) "//"
> [1]=>
> string(0) ""
> }
> }
>
> Note the second empty match.
Not sure if you noticed my other reply, but I observed that if you set
the limit argument to 1 in preg_replace() your code will work as expected.
Otherwise, with the default of -1, the first / will be matched and then
the remainder ( resulting in 2 replacements - hence "aaAA" ). I agree
with you that the -1 *should* imply a global match and be greedy -
returning a default of "aaA" using a pattern like "@/*$@".
| |
| bwooster47@gmail.com 2006-10-31, 7:56 am |
| Koncept wrote:
> Otherwise, with the default of -1, the first / will be matched and then
> the remainder ( resulting in 2 replacements - hence "aaAA" ). I agree
> with you that the -1 *should* imply a global match and be greedy -
> returning a default of "aaA" using a pattern like "@/*$@".
Yes, this is the most surprising part: it always ends up with two A
characters. As you said, there seems to be a split in the way it
matches; it always matches twice, for however many / characters exist
in the input:
aa//
aa///
aa////
aa/////
etc
all get replaced by
aaAA
Very strange, indeed.
Everyone - thanks for your input. At least I have a workaround: I will use
a limit of 1 to fix this problem.
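For reference, a minimal sketch of that workaround next to a regex-free
alternative based on rtrim(). The function names are hypothetical, and the
sketch assumes the input does not end with a newline: $ can also match just
before a final newline, so the two versions would differ on such input.

```php
<?php
// Hypothetical helper names, used only for this illustration.

// Workaround from the thread: stop after the first (greedy) match.
function replace_trailing_slashes_limit($s)
{
    return preg_replace('/\/*$/', 'A', $s, 1);
}

// Alternative without a regex: strip trailing slashes, then append.
function replace_trailing_slashes_rtrim($s)
{
    return rtrim($s, '/') . 'A';
}

foreach (array('aa', 'aa/', 'aa//', 'aa/////') as $input) {
    $viaLimit = replace_trailing_slashes_limit($input);
    $viaRtrim = replace_trailing_slashes_rtrim($input);
    echo "$input -> $viaLimit / $viaRtrim\n";
}
```

Both helpers turn each of the sample inputs into aaA.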