Why is my substitution doubling up on brackets?
Marc Bissonnette

2006-03-27, 9:59 pm

Hi all;

I'm at a bit of a loss to figure out why the following code, when
substituting the matches, doubles up the brackets:

(I've commented out the loop that fixes it, but I am curious as to what
I've missed) - Any pointers would be most appreciated.
#####################################################################
#!/usr/bin/perl
$text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs
(48ct), Food 4 Less, Walmart';
$text2=$text;
local @matches;
@matches = $text=~ m/(\(.*?\))/g;
$count=0;
foreach $match ( @matches ) {
if ($match =~ /\,/) {
$newmatch=$match;
$newmatch=~s/\,/\|/g;
$matches{$count}=$newmatch;
$originals{$count}=$match;
++$count;
} else {
$originals{$count}=$match;
$matches{$count}=$match;
++$count;
}
}
for (0..$count) {
next if ($originals{$_} eq undef);
$text=~ s/$originals{$_}/$matches{$_}/g;
}
#while ($text =~ /\(\(/ or $text =~ /\)\)/) {
# $text =~ s/\(\(/\(/g;
# $text =~ s/\)\)/\)/g;
#}
print "The original string: $text2\nThe modified string: $text\n";
print "\n\nDONE";
#####################################################################

--
Marc Bissonnette
Looking for a new ISP? http://www.canadianisp.com
Largest ISP comparison site across Canada.

it_says_BALLS_on_your_forehead

2006-03-27, 9:59 pm


Marc Bissonnette wrote:
> Hi all;
>
> I'm at a bit of a loss to figure out why the following code, when
> substituting the matches, doubles up the brackets:
>
> (I've commented out the loop that fixes it, but I am curious as to what
> I've missed) - Any pointers would be most appreciated.
> #####################################################################
> #!/usr/bin/perl
> $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs
> (48ct), Food 4 Less, Walmart';
> $text2=$text;
> local @matches;
> @matches = $text=~ m/(\(.*?\))/g;
> $count=0;
> foreach $match ( @matches ) {
> if ($match =~ /\,/) {
> $newmatch=$match;
> $newmatch=~s/\,/\|/g;
> $matches{$count}=$newmatch;
> $originals{$count}=$match;
> ++$count;
> } else {
> $originals{$count}=$match;
> $matches{$count}=$match;
> ++$count;
> }
> }
> for (0..$count) {
> next if ($originals{$_} eq undef);
> $text=~ s/$originals{$_}/$matches{$_}/g;


the above line is the problem. when the pattern is interpolated, the
parentheses become capturing and grouping parentheses, so you are
actually telling the regex engine this:

$text =~ s/(40,100ct)/(40|100ct)/g;

so the original text (the beginning) has:
Albertsons (40,100ct),...

and you are replacing the '40,100ct' part with '(40|100ct)', that's why
you are getting double parens.

> }
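
A minimal sketch of the effect described above, assuming a shortened version of the sample string. The variable names ($original, $replacement, $doubled) are illustrative and not taken from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text        = 'Albertsons (40,100ct), Vons(40ct)';
my $original    = '(40,100ct)';   # one captured match, parentheses included
my $replacement = '(40|100ct)';   # same text with the comma turned into a bar

# Interpolated unescaped, the parentheses in $original form a capture group,
# so the pattern matches only the inner text '40,100ct'. The parentheses that
# were already in $text stay put, and the replacement brings its own pair.
(my $doubled = $text) =~ s/$original/$replacement/g;

print "$doubled\n";   # Albertsons ((40|100ct)), Vons(40ct)
```

The literal parentheses in the string are never consumed by the match, which is where the extra pair comes from.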


it_says_BALLS_on_your_forehead

2006-03-27, 9:59 pm


Marc Bissonnette wrote:
> Hi all;
>
> I'm at a bit of a loss to figure out why the following code, when
> substituting the matches, doubles up the brackets:
>
> (I've commented out the loop that fixes it, but I am curious as to what
> I've missed) - Any pointers would be most appreciated.
> #####################################################################
> #!/usr/bin/perl
> $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs
> (48ct), Food 4 Less, Walmart';
> $text2=$text;
> local @matches;
> @matches = $text=~ m/(\(.*?\))/g;
> $count=0;
> foreach $match ( @matches ) {
> if ($match =~ /\,/) {
> $newmatch=$match;
> $newmatch=~s/\,/\|/g;
> $matches{$count}=$newmatch;
> $originals{$count}=$match;
> ++$count;
> } else {
> $originals{$count}=$match;
> $matches{$count}=$match;
> ++$count;
> }
> }
> for (0..$count) {
> next if ($originals{$_} eq undef);
> $text=~ s/$originals{$_}/$matches{$_}/g;
> }
> #while ($text =~ /\(\(/ or $text =~ /\)\)/) {
> # $text =~ s/\(\(/\(/g;
> # $text =~ s/\)\)/\)/g;
> #}
> print "The original string: $text2\nThe modified string: $text\n";
> print "\n\nDONE";
> #####################################################################


i'm not sure what the possible values of your string are (i.e. are
nested parens possible? or unbalanced parens?).

if you only have 1 level deep parens, and they are always balanced,
this should work:

use strict;

my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct),
Ralphs (48ct), Food 4 Less, Walmart';

$text =~ s/(\(.*?),(.*\))/$1\|$2/g;

print $text, "\n";

John W. Krahn

2006-03-27, 9:59 pm

Marc Bissonnette wrote:
>
> I'm at a bit of a loss to figure out why the following code, when
> substituting the matches, doubles up the brackets:
>
> (I've commented out the loop that fixes it, but I am curious as to what
> I've missed) - Any pointers would be most appreciated.
> #####################################################################
> #!/usr/bin/perl


use warnings;
use strict;

> $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs
> (48ct), Food 4 Less, Walmart';
> $text2=$text;
> local @matches;
> @matches = $text=~ m/(\(.*?\))/g;
> $count=0;
> foreach $match ( @matches ) {
> if ($match =~ /\,/) {
> $newmatch=$match;
> $newmatch=~s/\,/\|/g;
> $matches{$count}=$newmatch;
> $originals{$count}=$match;
> ++$count;
> } else {
> $originals{$count}=$match;
> $matches{$count}=$match;
> ++$count;
> }
> }
> for (0..$count) {


You have an off-by-one error because $count will always be one greater than
the last index used. Why not just use arrays instead of hashes if all the
indexes are contiguous integers?

> next if ($originals{$_} eq undef);


You cannot use undef in that way ('eq undef' just compares against the empty
string); you have to use the defined() function.
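
A small sketch of both points, assuming an array in place of the integer-keyed hashes. The names @originals, @visited and $empty_eq_undef are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample data; an array replaces the integer-keyed hashes.
my @originals = ('(40,100ct)', '(40ct)', '(48ct)');

my @visited;
# $#originals is the last valid index, so the range stops at the final element
# (unlike 0..$count, which runs one past it).
for my $i (0 .. $#originals) {
    next unless defined $originals[$i];   # the supported test for undef
    push @visited, $originals[$i];
}

# 'eq undef' treats undef as the empty string, so a defined but empty value
# compares equal. Warnings are silenced here only to keep the demo quiet.
my $empty = '';
my $empty_eq_undef = do { no warnings 'uninitialized'; $empty eq undef() };

print scalar(@visited), " elements visited\n";
print "'' eq undef is ", ($empty_eq_undef ? 'true' : 'false'), "\n";
```

With warnings enabled, the original 'eq undef' comparison would also emit a "Use of uninitialized value" warning on every pass.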

> $text=~ s/$originals{$_}/$matches{$_}/g;


The parentheses in $originals{$_} are interpreted by the regular expression
engine as capturing parentheses, and thus do not match the literal parentheses
in the string.

> }
> #while ($text =~ /\(\(/ or $text =~ /\)\)/) {
> # $text =~ s/\(\(/\(/g;
> # $text =~ s/\)\)/\)/g;
> #}
> print "The original string: $text2\nThe modified string: $text\n";
> print "\n\nDONE";
> #####################################################################


Your program could more simply be written as:

#!/usr/bin/perl
use warnings;
use strict;

my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs
(48ct), Food 4 Less, Walmart';
my $text2 = $text;

$text =~ s{ ( \( [^()]+ \) ) }{ ( my $x = $1 ) =~ tr!,!|!; $x }exg;

print "The original string: $text2\nThe modified string: $text\n";
print "\n\nDONE";
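
The one-line substitution above packs several features together (/x for whitespace in the pattern, /e to evaluate the replacement as code, tr to swap characters). A rough, more spread-out equivalent might look like this; the helper commas_to_bars and the extra 'Acme (1,2,3)' entry are assumptions added for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'Albertsons (40,100ct), Vons(40ct), Acme (1,2,3)';

# Hypothetical helper: return a copy of its argument with commas as bars.
sub commas_to_bars {
    my ($group) = @_;
    $group =~ tr/,/|/;     # tr changes every comma in one pass
    return $group;
}

$text =~ s{
    ( \( [^()]+ \) )       # capture one parenthesised group, no nesting
}{
    commas_to_bars($1)     # with /e this side is Perl code, not a string
}gex;

print "$text\n";   # Albertsons (40|100ct), Vons(40ct), Acme (1|2|3)
```

Because tr handles every comma inside the captured group, a single pass over the string is enough even when a group holds several commas.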




John
--
use Perl;
program
fulfillment

Dr.Ruud

2006-03-27, 9:59 pm

it_says_BALLS_on_your_forehead schreef:
> Marc Bissonnette wrote:


>
> the above line is the problem. when the pattern is interpolated, the
> parentheses become capturing and grouping parentheses


See also \Q in perlre.
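
As a sketch of how \Q could be applied to the original approach (assuming a shortened sample string and an array instead of the two hashes; the variable names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct)';

# Each match keeps its surrounding parentheses, as in the original script.
my @originals = $text =~ m/(\(.*?\))/g;

for my $original (@originals) {
    (my $replacement = $original) =~ tr/,/|/;
    # \Q...\E escapes regex metacharacters in the interpolated value, so the
    # parentheses are matched literally instead of forming a capture group.
    $text =~ s/\Q$original\E/$replacement/g;
}

print "$text\n";   # Albertsons (40|100ct), Vons(40ct), Stater Brothers(40ct)
```

The replacement side needs no escaping, since an interpolated value there is inserted as plain text.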

--
Affijn, Ruud

"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'

Dr.Ruud

2006-03-27, 9:59 pm

Marc Bissonnette schreef:

> I'm at a bit of a loss to figure out why the following code, when
> substituting the matches, doubles up the brackets:


> @matches = $text=~ m/(\(.*?\))/g;


@matches = $text=~ m/\((.*?)\)/g;

But you also need to put
use strict;
use warnings;
at the start, and properly declare your variables.

I assume that you want to change each embedded comma to a bar.

#!/usr/bin/perl
use strict;
use warnings;

my $old = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct),
Ralphs (48ct), Food 4 Less, Walmart';

my $new = $old;
1 while $new =~ s/([(][^(,)]*),([^)]*[)])/$1|$2/;

print "old: $old\n";
print "new: $new\n";

--
Affijn, Ruud

"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'

Marc Bissonnette

2006-03-27, 9:59 pm

"John W. Krahn" <someone@example.com> wrote in
news:Q30Wf.11158$B_1.8488@edtnps89:

> Marc Bissonnette wrote:
>
> use warnings;
> use strict;
>
>
> You have an off-by-one error because $count will always be one greater
> than the last element. Why not just use arrays instead of hashes if
> all the indexes are contiguous integers?
>
>
> You cannot use undef in that way, you have to use the defined()
> function.
>
>
> The parentheses in $originals{$_} are interpreted by the regular
> expression engine as capturing parentheses thus do not match the
> literal parentheses in the string.
>
>
> Your program could more simply be written as:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct),
> Ralphs (48ct), Food 4 Less, Walmart';
> my $text2 = $text;
>
> $text =~ s{ ( \( [^()]+ \) ) }{ ( my $x = $1 ) =~ tr!,!|!; $x }exg;
>
> print "The original string: $text2\nThe modified string: $text\n";
> print "\n\nDONE";


Many thanks to everyone (Simon, Ruud, John) who replied; I didn't know
(though obviously I should have) that the parentheses in the hash values
would be interpreted as capturing once they were interpolated into the
pattern. The regexes you've graciously shown as examples are much more
elegant than my own :)

Thanks again!


--
Marc Bissonnette
Looking for a new ISP? http://www.canadianisp.com
Largest ISP comparison site across Canada.

Dr.Ruud

2006-03-28, 4:00 am

it_says_BALLS_on_your_forehead schreef:

> if you only have 1 level deep parens, and they are always balanced,
> this should work:
>
> use strict;
>
> my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct),
> Ralphs (48ct), Food 4 Less, Walmart';
>
> $text =~ s/(\(.*?),(.*\))/$1\|$2/g;
>
> print $text, "\n";


That fails for multiple embedded commas.
There is no need to escape the vbar in the replacement part.
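
A short sketch of the failure case, using a made-up input with two commas in one group ('Acme (1,2,3)' is not from the original data). The second statement is one possible repair, repeating a single-comma substitution until nothing changes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = 'Acme (1,2,3), Vons(40ct)';

# Single pass: only the first comma in the group is replaced, and the greedy
# .* runs on to the last closing parenthesis on the line.
(my $single_pass = $input) =~ s/(\(.*?),(.*\))/$1|$2/g;

# Repeated pass: each iteration converts one comma inside a parenthesised
# group; the loop ends when no such comma is left.
my $looped = $input;
1 while $looped =~ s/(\([^()]*),([^()]*\))/$1|$2/;

print "single pass: $single_pass\n";   # Acme (1|2,3), Vons(40ct)
print "looped:      $looped\n";        # Acme (1|2|3), Vons(40ct)
```

Restricting the character classes to [^()] also keeps a match from spilling over into the next parenthesised group.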

--
Affijn, Ruud

"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'

it_says_BALLS_on_your_forehead

2006-03-28, 7:00 pm


Dr.Ruud wrote:
> it_says_BALLS_on_your_forehead schreef:
>
>
> That fails for multiple embedded commas.


you are correct, i should have listed that in the constraints. that's
why i wanted to know the exact specs of what the string format should
be.

> There is no need to escape the vbar in the replacement part.


yeah, force of habit :-).

>
> --
> Affijn, Ruud
>
> "Gewoon is een tijger."
> echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'


Dr.Ruud

2006-03-28, 7:00 pm

it_says_BALLS_on_your_forehead schreef:
> Dr.Ruud:


>
> you are correct, i should have listed that in the constraints. that's
> why i wanted to know the exact specs of what the string format should
> be.


So I shouldn't have called it failing: maybe multiple embedded commas can
never occur, or your result was as wanted.

--
Affijn, Ruud

"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'

Marc Bissonnette

2006-03-28, 7:00 pm

"it_says_BALLS_on_your_forehead" <simon.chao@gmail.com> wrote in
news:1143548768.151738.80200@g10g2000cwb.googlegroups.com:

>
> Dr.Ruud wrote:
>
> you are correct, i should have listed that in the constraints. that's
> why i wanted to know the exact specs of what the string format should
> be.
>
>
> yeah, force of habit :-).


LOL, so I'm not the only one :)


--
Marc Bissonnette
Looking for a new ISP? http://www.canadianisp.com
Largest ISP comparison site across Canada.