preg_match works, preg_replace does not - why?
bwooster47@gmail.com

2006-10-30, 7:03 pm

PHP 5.1.4 (cli) (built: May 8 2006 08:41:41)

Ran from command line, here's the php file - need to replace zero or
more / characters at end of a string, this works fine in perl, but not
in php - any suggestions?

<?php
$s = "aa//";

preg_match('/\/*$/', $s, $a);

echo "matches :" . $a[0] . ":\n"; // output: matches ://:

$t = preg_replace('/\/*$/', 'A', $s);

echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaAA:
// should be aaA and not aaAA ??

$t = preg_replace("/\/+$/", 'A', $s); // works

echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaA:
?>

Koncept

2006-10-30, 7:03 pm

In article <1162142987.595717.254360@h48g2000cwc.googlegroups.com>,
<bwooster47@gmail.com> wrote:

> <?php
> $s = "aa//";
>
> preg_match('/\/*$/', $s, $a);
>
> echo "matches :" . $a[0] . ":\n"; // output: matches ://:
>
> $t = preg_replace('/\/*$/', 'A', $s);
>
> echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaAA:
> // should be aaA and not aaAA ??
>
> $t = preg_replace("/\/+$/", 'A', $s); // works
>
> echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaA:
> ?>



* is greedy and will match as much as possible (including the first
slash). If you change your instances of * to +, you will get the desired
result.

--
Koncept <<
"The snake that cannot shed its skin perishes. So do the spirits who are
prevented from changing their opinions; they cease to be a spirit." -Nietzsche

bwooster47@gmail.com

2006-10-30, 7:03 pm

Exactly - * is supposed to be greedy, but in preg_replace, it does not
act greedy at all.
That is what I am trying to understand.

For example, the following perl code:
# change zero or more final / characters to A
$s = 'aa//';
$s =~ s/\/*$/A/;
print $s; # this prints aaA, as expected

Works as expected - get aaA printed.

But the equivalent php code, using the supposedly perl-like
preg_replace function, results in aaAA -- instead of a single capital
A at end, it prints two, so this means it did not do a greedy match.

And, I don't want to use + instead of star - because + would not add a
A if there were 0 slashes - I definitely need *, to use for "replace
zero or more / at end, with a single A character".
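
The limitation of + described above can be checked directly. This is a minimal sketch, assuming two illustrative inputs (one with trailing slashes, one without); the variable names are not from the original post:

```php
<?php
// Illustrative inputs: one with trailing slashes, one without.
$inputs = array('aa//', 'aa');
$plusResults = array();

foreach ($inputs as $s) {
    // \/+ requires at least one slash before the end of the string,
    // so a string with no trailing slash is returned unchanged.
    $plusResults[$s] = preg_replace('/\/+$/', 'A', $s);
    echo "$s => {$plusResults[$s]}\n";
}
// Prints:
// aa// => aaA
// aa => aa
```

With + the second input never gains its A, which is why the pattern has to allow zero slashes.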

Koncept wrote:
> In article <1162142987.595717.254360@h48g2000cwc.googlegroups.com>,
> * is greedy and will match as much as possible (including the first
> slash). If you change your instances of * to +, you will get the desired
> result.
>
> --
> Koncept <<
> "The snake that cannot shed its skin perishes. So do the spirits who are
> prevented from changing their opinions; they cease to be a spirit." -Nietzsche


Andy Hassall

2006-10-30, 7:03 pm

On 29 Oct 2006 09:29:47 -0800, bwooster47@gmail.com wrote:

>PHP 5.1.4 (cli) (built: May 8 2006 08:41:41)
>
>Ran from command line, here's the php file - need to replace zero or
>more / characters at end of a string, this works fine in perl, but not
>in php - any suggestions?
>
><?php
>$s = "aa//";
>
>preg_match('/\/*$/', $s, $a);
>
>echo "matches :" . $a[0] . ":\n"; // output: matches ://:
>
>$t = preg_replace('/\/*$/', 'A', $s);
>
>echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaAA:
>// should be aaA and not aaAA ??
>
>$t = preg_replace("/\/+$/", 'A', $s); // works
>
>echo "after replace :$t:, expecting aaA\n"; // output: after replace :aaA:
>?>


Consider:

$t = preg_replace('/(\/*)$/', '($1->A)', $s);

From this, I get the output:

after replace :aa(//->A)(->A):, expecting aaA

This is not very clear, but bear in mind that preg_replace implicitly uses the
/g flag from Perl.

\/* will happily match a null string, so there is a greedy match of the two
slashes, and then a second, empty match at the position between the last
slash and the end of the string.

If you change it to \/+, then you force a single match only, which has a
non-zero number of slashes.
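
One way to see where the two matches fall is to ask for offsets. This is a minimal sketch, assuming the same pattern and input as above; it uses preg_match_all with the PREG_OFFSET_CAPTURE flag, and the variable names are illustrative:

```php
<?php
$s = 'aa//';

// With PREG_OFFSET_CAPTURE, each entry of $m[0] is array(matched text, byte offset).
preg_match_all('/\/*$/', $s, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as $hit) {
    list($text, $offset) = $hit;
    printf("offset %d, length %d, text '%s'\n", $offset, strlen($text), $text);
}
// Prints:
// offset 2, length 2, text '//'
// offset 4, length 0, text ''
```

The first match consumes both slashes; the second is the empty match at offset 4, the end of the string, and each match is replaced once.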

To complete the picture, consider the following in Perl:

andyh@excession ~
$ cat test.pl
#!/usr/bin/env perl

my $s = 'aa//';
$s =~ s/\/*$/A/g;
print $s . "\n";

andyh@excession ~
$ ./test.pl
aaAA

Once you realise preg_match implies /g, it is actually consistent.

--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool

Koncept

2006-10-30, 7:03 pm

In article <1162238914.967613.57450@m73g2000cwd.googlegroups.com>,
<bwooster47@gmail.com> wrote:

> Exactly - * is supposed to be greedy, but in preg_replace, it does not
> act greedy at all.
> That is what I am trying to understand.


Gotcha. Sorry. I totally missed the fact that you were testing this
with preg_match as well. Interesting. Going to take another look.

--
Koncept <<
"The snake that cannot shed its skin perishes. So do the spirits who are
prevented from changing their opinions; they cease to be a spirit." -Nietzsche

Koncept

2006-10-30, 7:03 pm

In article <1162238914.967613.57450@m73g2000cwd.googlegroups.com>,
<bwooster47@gmail.com> wrote:

> Exactly - * is supposed to be greedy, but in preg_replace, it does not
> act greedy at all.
> That is what I am trying to understand.
>
> For example, the following perl code:
> # change zero or more final / characters to A
> $s = 'aa//';
> $s =~ s/\/*$/A/;
> print $s; # this prints aaA, as expected
>
> Works as expected - get aaA printed.
>
> But the equivalent php code, using the supposedly perl-like
> preg_replace function, results in aaAA -- instead of a single capital
> A at end, it prints two, so this means it did not do a greedy match.


Very interesting.. If I set the limit modifier to 1, it seems that a
greedy match is made with the regexp whereas the initial limit modifier
of -1 (default) seems to match the initial "/" and then continue to
match the remaining characters - hence providing a result like "aaAA".

Here are my results:

Test with preg_match():

Array
(
[0] => ////////
)

Test with preg_replace() with limit:
Made 1 replacement(s) resulting in "aaA"

Test with preg_replace() with no limit:
Made 2 replacement(s) resulting in => "aaAA"

------------------------------------------------
<?php
header("Content-type:text/plain");
$string = "aa////////";
$pattern = "@/*$@";
$replace = "A";
$limit = 1;
$nolimit = -1;
preg_match( $pattern, $string, $matches );
echo "Test with preg_match():\n\n";
print_r( $matches );
echo "\nTest with preg_replace() with limit:\n";
$r = preg_replace( $pattern, $replace, $string, $limit, $count );
echo " Made $count replacement(s) resulting in \"$r\"\n\n";
echo "Test with preg_replace() with no limit:\n";
$r = preg_replace( $pattern, $replace, $string, $nolimit, $count );
echo " Made $count replacement(s) resulting in => \"$r\"\n\n";
?>

--
Koncept <<
"The snake that cannot shed its skin perishes. So do the spirits who are
prevented from changing their opinions; they cease to be a spirit." -Nietzsche

Andy Hassall

2006-10-30, 7:03 pm

On Mon, 30 Oct 2006 20:23:13 +0000, Andy Hassall <andy@andyh.co.uk> wrote:

> Once you realise preg_match implies /g, it is actually consistent.


Sorry, preg_replace implies /g.

preg_match_all would be the equivalent match function, consider this:

$ cat test.php
<?php
$s = "aa//";
preg_match_all('/\/*$/', $s, $a);
var_dump($a);
?>

$ php test.php
array(1) {
[0]=>
array(2) {
[0]=>
string(2) "//"
[1]=>
string(0) ""
}
}

Note the second empty match.

--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool

Koncept

2006-10-30, 7:03 pm

In article <ueqck21qc17f9bo7dig36lo93nnup8pin3@4ax.com>, Andy Hassall
<andy@andyh.co.uk> wrote:

> On Mon, 30 Oct 2006 20:23:13 +0000, Andy Hassall <andy@andyh.co.uk> wrote:
>
>
> Sorry, preg_replace implies /g.
>
> preg_match_all would be the equivalent match function, consider this:
>
> $ cat test.php
> <?php
> $s = "aa//";
> preg_match_all('/\/*$/', $s, $a);
> var_dump($a);
> ?>
>
> $ php test.php
> array(1) {
> [0]=>
> array(2) {
> [0]=>
> string(2) "//"
> [1]=>
> string(0) ""
> }
> }
>
> Note the second empty match.


Not sure if you noticed my other reply, but I observed that if you set
the limiter to 1 in preg_replace() your code will work as expected.
Otherwise, with the default of -1, the first / will be matched and then
the remainder ( resulting in 2 replacements - hence "aaAA" ). I agree
with you that the -1 *should* imply a global match and be greedy -
returning a default of "aaA" using a pattern like "@/*$@".

Noted Again:
--------

Test with preg_match():

Array
(
[0] => ////////
)

Test with preg_replace() with limit = 1:
Made 1 replacement(s) resulting in "aaA"

Test with preg_replace() with no limit = -1:
Made 2 replacement(s) resulting in => "aaAA"

------------------------------------------------
<?php
header("Content-type:text/plain");
$string = "aa////////";
$pattern = "@/*$@";
$replace = "A";
$limit = 1;
$nolimit = -1;
preg_match( $pattern, $string, $matches );
echo "Test with preg_match():\n\n";
print_r( $matches );
echo "\nTest with preg_replace() with limit:\n";
$r = preg_replace( $pattern, $replace, $string, $limit, $count );
echo " Made $count replacement(s) resulting in \"$r\"\n\n";
echo "Test with preg_replace() with no limit:\n";
$r = preg_replace( $pattern, $replace, $string, $nolimit, $count );
echo " Made $count replacement(s) resulting in => \"$r\"\n\n";
?>

--
Koncept <<
"The snake that cannot shed its skin perishes. So do the spirits who are
prevented from changing their opinions; they cease to be a spirit." -Nietzsche

bwooster47@gmail.com

2006-10-31, 7:56 am

Koncept wrote:

> Otherwise, with the default of -1, the first / will be matched and then
> the remainder ( resulting in 2 replacements - hence "aaAA" ). I agree
> with you that the -1 *should* imply a global match and be greedy -
> returning a default of "aaA" using a pattern like "@/*$@".


Yes, this is the most surprising part - always ends up with two AA
characters, as you said, there seems to be a split in the way it
matches - it always matches twice - for however many / characters exist
in the input:
aa//
aa///
aa////
aa/////
etc
all get replaced by
aaAA

Very strange, indeed.

Everyone - thanks for your input, at least I've a workaround, will use
limit of 1 to fix this problem.
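
Following Andy Hassall's explanation earlier in the thread, the count is always two because the first match takes every trailing slash and the second is the empty match at the end of the string. The sketch below compares the limit-of-1 workaround with two alternatives that were not proposed in the thread: a negative lookbehind that rejects the empty match after a slash, and plain rtrim. The function names are hypothetical, and the inputs are assumed to have no trailing newline (without the D modifier, $ also matches just before a final newline, so the three variants would then differ).

```php
<?php
// Hypothetical helper names; only the limit-of-1 variant comes from the thread.

// Workaround from the thread: stop after the first (greedy) match.
function replaceWithLimit($s) {
    return preg_replace('/\/*$/', 'A', $s, 1);
}

// Alternative: the lookbehind forbids a match that starts right after a slash,
// which rules out the second, empty match at the end of the string.
function replaceWithLookbehind($s) {
    return preg_replace('/(?<!\/)\/*$/', 'A', $s);
}

// Alternative without a regex: strip trailing slashes, then append A.
function replaceWithRtrim($s) {
    return rtrim($s, '/') . 'A';
}

foreach (array('aa', 'aa/', 'aa//', 'aa/////') as $s) {
    echo $s, ' => ', replaceWithLimit($s), ' ',
         replaceWithLookbehind($s), ' ', replaceWithRtrim($s), "\n";
}
// Each line prints "aaA" three times, e.g.:
// aa// => aaA aaA aaA
```

The rtrim version is probably the easiest to read when the replacement is a fixed string; the regex variants are mainly useful if the pattern is part of something larger.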

> --
> Koncept <<
> "The snake that cannot shed its skin perishes. So do the spirits who are
> prevented from changing their opinions; they cease to be a spirit." -Nietzsche



Copyright 2008 codecomments.com