Why is my substitution doubling up on brackets?
| Marc Bissonnette 2006-03-27, 9:59 pm |
| Hi all;
I'm at a bit of a loss to figure out why the following code, when
substituting the matches, doubles up the brackets:
(I've commented out the loop that fixes it, but I am curious as to what
I've missed) - Any pointers would be most appreciated.
#####################################################################
#!/usr/bin/perl
$text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs (48ct), Food 4 Less, Walmart';
$text2=$text;
local @matches;
@matches = $text=~ m/(\(.*?\))/g;
$count=0;
foreach $match ( @matches ) {
    if ($match =~ /\,/) {
        $newmatch=$match;
        $newmatch=~s/\,/\|/g;
        $matches{$count}=$newmatch;
        $originals{$count}=$match;
        ++$count;
    } else {
        $originals{$count}=$match;
        $matches{$count}=$match;
        ++$count;
    }
}
for (0..$count) {
    next if ($originals{$_} eq undef);
    $text=~ s/$originals{$_}/$matches{$_}/g;
}
#while ($text =~ /\(\(/ or $text =~ /\)\)/) {
#    $text =~ s/\(\(/\(/g;
#    $text =~ s/\)\)/\)/g;
#}
print "The original string: $text2\nThe modified string: $text\n";
print "\n\nDONE";
#####################################################################
--
Marc Bissonnette
Looking for a new ISP? http://www.canadianisp.com
Largest ISP comparison site across Canada.
| |
| it_says_BALLS_on_your_forehead 2006-03-27, 9:59 pm |
|
Marc Bissonnette wrote:
> Hi all;
>
> I'm at a bit of a loss to figure out why the following code, when
> substituting the matches, doubles up the brackets:
>
> (I've commented out the loop that fixes it, but I am curious as to what
> I've missed) - Any pointers would be most appreciated.
> #####################################################################
> #!/usr/bin/perl
> $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs
> (48ct), Food 4 Less, Walmart';
> $text2=$text;
> local @matches;
> @matches = $text=~ m/(\(.*?\))/g;
> $count=0;
> foreach $match ( @matches ) {
> if ($match =~ /\,/) {
> $newmatch=$match;
> $newmatch=~s/\,/\|/g;
> $matches{$count}=$newmatch;
> $originals{$count}=$match;
> ++$count;
> } else {
> $originals{$count}=$match;
> $matches{$count}=$match;
> ++$count;
> }
> }
> for (0..$count) {
> next if ($originals{$_} eq undef);
> $text=~ s/$originals{$_}/$matches{$_}/g;
the above line is the problem. when the pattern is interpolated, the
parentheses become capturing and grouping parentheses, so you are
actually telling the regex engine this:
$text =~ s/(40,100ct)/(40|100ct)/;
so the original text (the beginning) has:
Albertsons (40,100ct),...
and you are replacing the '40,100ct' part with '(40|100ct)', that's why
you are getting double parens.
> }
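A minimal sketch of the effect described above, using a shortened version of the sample string. The variable names ($original, $replace, $doubled) are illustrative and not taken from the original program:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text     = 'Albertsons (40,100ct), Vons(40ct)';
my $original = '(40,100ct)';   # contains the regex metacharacters ( and )
my $replace  = '(40|100ct)';

# Interpolated into a pattern, the parentheses group instead of matching
# literally, so only the inner "40,100ct" is matched and replaced.
(my $doubled = $text) =~ s/$original/$replace/g;

print "$doubled\n";   # Albertsons ((40|100ct)), Vons(40ct)
```

The literal parentheses already in the string are left in place, and the replacement adds a second pair around them.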
| |
| it_says_BALLS_on_your_forehead 2006-03-27, 9:59 pm |
|
i'm not sure what the possible values of your string are (i.e. are
nested parens possible? or unbalanced parens?).
if you only have 1 level deep parens, and they are always balanced,
this should work:
use strict;
my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs (48ct), Food 4 Less, Walmart';
$text =~ s/(\(.*?),(.*\))/$1\|$2/g;
print $text, "\n";
| |
| John W. Krahn 2006-03-27, 9:59 pm |
| Marc Bissonnette wrote:
>
> I'm at a bit of a loss to figure out why the following code, when
> substituting the matches, doubles up the brackets:
>
> (I've commented out the loop that fixes it, but I am curious as to what
> I've missed) - Any pointers would be most appreciated.
> #####################################################################
> #!/usr/bin/perl
use warnings;
use strict;
> $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs
> (48ct), Food 4 Less, Walmart';
> $text2=$text;
> local @matches;
> @matches = $text=~ m/(\(.*?\))/g;
> $count=0;
> foreach $match ( @matches ) {
> if ($match =~ /\,/) {
> $newmatch=$match;
> $newmatch=~s/\,/\|/g;
> $matches{$count}=$newmatch;
> $originals{$count}=$match;
> ++$count;
> } else {
> $originals{$count}=$match;
> $matches{$count}=$match;
> ++$count;
> }
> }
> for (0..$count) {
You have an off-by-one error because $count will always be one greater than
the index of the last element. Why not just use arrays instead of hashes if
all the indexes are contiguous integers?
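As a rough illustration of the array approach (a sketch only, with a shortened sample string and the hypothetical names @originals and @changed), looping to $#originals avoids running one past the end:

```perl
use strict;
use warnings;

my $text = 'Albertsons (40,100ct), Vons(40ct)';

# Parallel arrays: index $i of @originals pairs with index $i of @changed.
my @originals = $text =~ m/(\(.*?\))/g;
my @changed   = map { my $copy = $_; $copy =~ tr/,/|/; $copy } @originals;

# $#originals is the last valid index, so the loop never runs past the end.
for my $i (0 .. $#originals) {
    print "$originals[$i] -> $changed[$i]\n";
}
```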
> next if ($originals{$_} eq undef);
You cannot use undef in that way, you have to use the defined() function.
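For instance, something along these lines (a sketch with made-up data; %originals here is a small stand-in for the hash in the original program). Note that `$x eq undef` actually compares against the empty string and warns under `use warnings`, whereas defined() tests for undef directly:

```perl
use strict;
use warnings;

my %originals = (0 => '(40,100ct)', 1 => '(40ct)');

my @seen;
for my $i (0 .. 2) {                      # index 2 deliberately has no entry
    next unless defined $originals{$i};   # skip missing entries safely
    push @seen, $originals{$i};
}

print "@seen\n";
```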
> $text=~ s/$originals{$_}/$matches{$_}/g;
The parentheses in $originals{$_} are interpreted by the regular expression
engine as capturing parentheses, so they do not match the literal parentheses
in the string.
> }
> #while ($text =~ /\(\(/ or $text =~ /\)\)/) {
> # $text =~ s/\(\(/\(/g;
> # $text =~ s/\)\)/\)/g;
> #}
> print "The original string: $text2\nThe modified string: $text\n";
> print "\n\nDONE";
> #####################################################################
Your program could more simply be written as:
#!/usr/bin/perl
use warnings;
use strict;
my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs (48ct), Food 4 Less, Walmart';
my $text2 = $text;
$text =~ s{ ( \( [^()]+ \) ) }{ ( my $x = $1 ) =~ tr!,!|!; $x }exg;
print "The original string: $text2\nThe modified string: $text\n";
print "\n\nDONE";
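On a newer perl the same idea can probably be written a little more compactly. This is only a sketch, assuming Perl 5.14 or later (which added the /r flag to tr///); the sample string is shortened and $result is an illustrative name:

```perl
use strict;
use warnings;
use 5.014;   # the /r flag on tr/// needs Perl 5.14 or later

my $text = 'Albertsons (40,100ct), Vons(40ct), Ralphs (48ct), Food 4 Less';

# /e evaluates the replacement as code; tr///r returns the changed copy
# instead of trying to modify the read-only $1 in place.
(my $result = $text) =~ s{ ( \( [^()]+ \) ) }{ $1 =~ tr/,/|/r }gex;

print "$result\n";
```

Commas outside the parentheses are untouched because the pattern only ever matches a parenthesised group.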
John
--
use Perl;
program
fulfillment
| |
| Dr.Ruud 2006-03-27, 9:59 pm |
| it_says_BALLS_on_your_forehead schreef:
> Marc Bissonnette wrote:
>
> the above line is the problem. when the pattern is interpolated, the
> parentheses become capturing and grouping parentheses
See also \Q in perlre.
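A short sketch of how \Q might be applied to the substitution in question (shortened sample string; the variable names are illustrative):

```perl
use strict;
use warnings;

my $text     = 'Albertsons (40,100ct), Vons(40ct)';
my $original = '(40,100ct)';
my $replace  = '(40|100ct)';

# \Q...\E backslash-escapes regex metacharacters in the interpolated value,
# so the parentheses match literally instead of grouping.
(my $fixed = $text) =~ s/\Q$original\E/$replace/g;

print "$fixed\n";   # Albertsons (40|100ct), Vons(40ct)
```

The replacement side is an ordinary interpolated string, so nothing needs escaping there.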
--
Affijn, Ruud
"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'
| |
| Dr.Ruud 2006-03-27, 9:59 pm |
| Marc Bissonnette schreef:
> I'm at a bit of a loss to figure out why the following code, when
> substituting the matches, doubles up the brackets:
> @matches = $text=~ m/(\(.*?\))/g;
@matches = $text=~ m/\((.*?)\)/g;
But you also need to put
use strict;
use warnings;
at the start, and properly declare your variables.
I assume that you want to change each embedded comma to a bar.
#!/usr/bin/perl
use strict;
use warnings;
my $old = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct), Ralphs (48ct), Food 4 Less, Walmart';
my $new = $old;
1 while $new =~ s/([(][^(,)]*),([^)]*[)])/$1|$2/;
print "old: $old\n";
print "new: $new\n";
| |
| Marc Bissonnette 2006-03-27, 9:59 pm |
| "John W. Krahn" <someone@example.com> wrote in
news:Q30Wf.11158$B_1.8488@edtnps89:
> Marc Bissonnette wrote:
>
> use warnings;
> use strict;
>
>
> You have an off-by-one error because $count will always be one greater
> than the last element. Why not just use arrays instead of hashes if
> all the indexes are contiguous integers?
>
>
> You cannot use undef in that way, you have to use the defined()
> function.
>
>
> The parentheses in $originals{$_} are interpreted by the regular
> expression engine as capturing parentheses thus do not match the
> literal parentheses in the string.
>
>
> Your program could more simply be written as:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct),
> Ralphs (48ct), Food 4 Less, Walmart';
> my $text2 = $text;
>
> $text =~ s{ ( \( [^()]+ \) ) }{ ( my $x = $1 ) =~ tr!,!|!; $x }exg;
>
> print "The original string: $text2\nThe modified string: $text\n";
> print "\n\nDONE";
Many thanks to everyone (Simon, Ruud, John) who replied; I didn't know
(though obviously I should have) that the parentheses in the hash values
would be interpreted as capturing. The regexes you've graciously shown as
examples are much more elegant than my own :)
Thanks again!
| |
| Dr.Ruud 2006-03-28, 4:00 am |
| it_says_BALLS_on_your_forehead schreef:
> if you only have 1 level deep parens, and they are always balanced,
> this should work:
>
> use strict;
>
> my $text = 'Albertsons (40,100ct), Vons(40ct), Stater Brothers(40ct),
> Ralphs (48ct), Food 4 Less, Walmart';
>
> $text =~ s/(\(.*?),(.*\))/$1\|$2/g;
>
> print $text, "\n";
That fails for multiple embedded commas.
There is no need to escape the vbar in the replacement part.
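To make the difference concrete, here is a small comparison on a made-up input with two embedded commas (the input string and the names $once and $all are assumptions for illustration only):

```perl
use strict;
use warnings;

my $text = 'Albertsons (40,100ct,200ct), Vons(40ct)';

# Single-pass version quoted above: only the first embedded comma changes.
(my $once = $text) =~ s/(\(.*?),(.*\))/$1|$2/g;

# Repeating a non-/g substitution until it no longer matches handles
# every comma inside the parentheses.
my $all = $text;
1 while $all =~ s/([(][^(,)]*),([^)]*[)])/$1|$2/;

print "$once\n";   # Albertsons (40|100ct,200ct), Vons(40ct)
print "$all\n";    # Albertsons (40|100ct|200ct), Vons(40ct)
```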
| |
| it_says_BALLS_on_your_forehead 2006-03-28, 7:00 pm |
|
Dr.Ruud wrote:
> it_says_BALLS_on_your_forehead schreef:
>
>
> That fails for multiple embedded commas.
you are correct, i should have listed that in the constraints. that's
why i wanted to know the exact specs of what the string format should
be.
> There is no need to escape the vbar in the replacement part.
yeah, force of habit :-).
| |
| Dr.Ruud 2006-03-28, 7:00 pm |
| it_says_BALLS_on_your_forehead schreef:
> Dr.Ruud:
>
> you are correct, i should have listed that in the constraints. that's
> why i wanted to know the exact specs of what the string format should
> be.
So I shouldn't have called it failing: maybe multiple embedded commas can
never occur, or your result was as wanted.
| |
| Marc Bissonnette 2006-03-28, 7:00 pm |
| "it_says_BALLS_on_your_forehead" <simon.chao@gmail.com> wrote in
news:1143548768.151738.80200@g10g2000cwb.googlegroups.com:
>
> Dr.Ruud wrote:
>
> you are correct, i should have listed that in the constraints. that's
> why i wanted to know the exact specs of what the string format should
> be.
>
>
> yeah, force of habit :-).
LOL, so I'm not the only one :)