Bug in &= (bitwise or)
| Anno Siegel 2005-10-31, 7:00 pm |
| I am observing this strange behavior:
# prepare a string
my $str = 'aa';
$str &= 'a'; # shorten it
print "str: $str\n"; # a single "a" as expected
# $str = "$str"; # this heals the defect (if any)
# something is wrong, though
die "Ha!\n" unless $str =~ /a+$/; # this dies!
The pattern should, of course, match. Similar patterns, like /a$/ and
/a+/ do match, but /a+$/ isn't recognized. Copying the string into itself
normalizes the behavior. "use bytes" makes no difference.
Whether the bug (or am I missing something?) is in &= or the regex
engine (gasp) is anyone's guess. My money is on string-truncation
by &=. It would be rarely-exercised code, other bitwise operations
don't shorten.
Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
| |
| Tassilo v. Parseval 2005-10-31, 7:00 pm |
| Also sprach Anno Siegel:
> I am observing this strange behavior:
>
> # prepare a string
> my $str = 'aa';
> $str &= 'a'; # shorten it
> print "str: $str\n"; # a single "a" as expected
>
> # $str = "$str"; # this heals the defect (if any)
>
> # something is wrong, though
> die "Ha!\n" unless $str =~ /a+$/; # this dies!
>
> The pattern should, of course, match. Similar patterns, like /a$/ and
> /a+/ do match, but /a+$/ isn't recognized. Copying the string into itself
> normalizes the behavior. "use bytes" makes no difference.
>
> Whether the bug (or am I missing something?) is in &= or the regex
> engine (gasp) is anyone's guess. My money is on string-truncation
> by &=. It would be rarely-exercised code, other bitwise operations
> don't shorten.
After the bitwise-and, the string appears not to be NULL-terminated any
longer, at least not at the offset where perl usually finds the NULL
termination. That might be confusing the regex engine.
For testing what the raw string looks like after the bitwise-and, you
can use:
use Inline C => Config => BUILD_NOISY => 1;
use Inline C => <<'EOC';
void test (SV *sv) {
    int i = 0;
    char *c = SvPVX(sv);
    /* print every byte of the allocated buffer (SvLEN), not just SvCUR */
    while (i++ < SvLEN(sv))
        printf("%i,", *c++);
    sv_dump(sv);
}
EOC
my $a = 'aa';
$a &= 'a';
test($a);
Then I am not sure myself what the result of
$s = 'aa' & 'a'
should be.
Tassilo
--
use bigint;
$n=71423350343770280161397026330337371139054411854220053437565440;
$m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
| |
| attn.steven.kuo@gmail.com 2005-10-31, 7:00 pm |
| Anno Siegel wrote:
> I am observing this strange behavior:
>
> # prepare a string
> my $str = 'aa';
> $str &= 'a'; # shorten it
> print "str: $str\n"; # a single "a" as expected
>
> # $str = "$str"; # this heals the defect (if any)
>
> # something is wrong, though
> die "Ha!\n" unless $str =~ /a+$/; # this dies!
>
> The pattern should, of course, match. Similar patterns, like /a$/ and
> /a+/ do match, but /a+$/ isn't recognized. Copying the string into itself
> normalizes the behavior. "use bytes" makes no difference.
>
> Whether the bug (or am I missing something?) is in &= or the regex
> engine (gasp) is anyone's guess. My money is on string-truncation
> by &=. It would be rarely-exercised code, other bitwise operations
> don't shorten.
According to Devel::Peek, string truncation results
in a non-NUL terminated Perl string? Not sure
if this narrows down the problem...
use Devel::Peek;
my $string = 'aa';
Dump($string);
$string &= 'a';
Dump($string);
$string = "$string";
Dump($string);
__END__
SV = PV(0x11ac80) at 0x129f90
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x11a4f8 "aa"\0
CUR = 2
LEN = 3
SV = PV(0x11ac80) at 0x129f90
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x11a4f8 "a"
CUR = 1
LEN = 3
SV = PV(0x11ac80) at 0x129f90
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x11a4f8 "a"\0
CUR = 1
LEN = 3
--
Regards,
Steven
| |
| Abigail 2005-10-31, 7:00 pm |
| Anno Siegel (anno4000@lublin.zrz.tu-berlin.de) wrote on MMMMCDXLIV
September MCMXCIII in <URL:news:dk5rri$anu$1@mamenchi.zrz.TU-Berlin.DE>:
$$ I am observing this strange behavior:
$$
$$ # prepare a string
$$ my $str = 'aa';
$$ $str &= 'a'; # shorten it
$$ print "str: $str\n"; # a single "a" as expected
$$
$$ # $str = "$str"; # this heals the defect (if any)
$$
$$ # something is wrong, though
$$ die "Ha!\n" unless $str =~ /a+$/; # this dies!
$$
$$ The pattern should, of course, match. Similar patterns, like /a$/ and
$$ /a+/ do match, but /a+$/ isn't recognized. Copying the string into itself
$$ normalizes the behavior. "use bytes" makes no difference.
$$
$$ Whether the bug (or am I missing something?) is in &= or the regex
$$ engine (gasp) is anyone's guess. My money is on string-truncation
$$ by &=. It would be rarely-exercised code, other bitwise operations
$$ don't shorten.
I think &= is broken:
use Devel::Peek;
my $str1 = "aa"; $str1 &= "a";
my $str2 = "a";
my $str3 = "aa" & "a";
Dump $str1;
Dump $str2;
Dump $str3;
__END__
SV = PV(0x8183010) at 0x8182ca8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x818a690 "a"
CUR = 1
LEN = 3
SV = PV(0x8183154) at 0x8182cf0
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x818c0f8 "a"\0
CUR = 1
LEN = 2
SV = PV(0x8182ff8) at 0x81821e0
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x818c0f8 "a"\0
CUR = 1
LEN = 2
--
my $qr = qr/^.+?(;).+?\1|;Just another Perl Hacker;|;.+$/;
$qr =~ s/$qr//g;
print $qr, "\n";
| |
| Big and Blue 2005-10-31, 7:00 pm |
| Anno Siegel wrote:
> I am observing this strange behavior:
I'm observing one in the subject - given that &= is a bitwise AND.
>
> # prepare a string
> my $str = 'aa';
> $str &= 'a'; # shorten it
Hmmmm...surely you've changed the last character to a NUL byte?
> The pattern should, of course, match. Similar patterns, like /a$/ and
> /a+/ do match, but /a+$/ isn't recognized.
Which would fit with the string actually being "a\000".
> Whether the bug (or am I missing something?) is in &= or the regex
> engine (gasp) is anyone's guess. My money is on string-truncation
> by &=. It would be rarely-exercised code, other bitwise operations
> don't shorten.
If you "use re qw( debug );" and change the &= line to:
$str &= "\000a";
you'll find that this leaves you with "\000a", so I'm guessing that the
string you have created does end with a NUL, but Perl is confused as to
whether it is there?
--
Just because I've written it doesn't mean that
either you or I have to believe it.
| |
| Ilya Zakharevich 2005-10-31, 7:00 pm |
| [A complimentary Cc of this posting was sent to
Tassilo v. Parseval
<tassilo.von.parseval@rwth-aachen.de>], who wrote in article <3snf0uFotcprU1@news.dfncis.de>:
> After the bitwise-and, the string appears not to be NULL-terminated any
> longer, at least not at the offset where perl usually finds the NULL
> termination. That might be confusing the regex engine.
>
> For testing what the raw string looks like after the bitwise-and, you
> can use:
perl -MDevel::Peek -wle "my $a = q(aa); $a &= q(a); print Dump $a"
SV = PV(0x40c64) at 0x40a24
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x42020 "a"
CUR = 1
LEN = 3
As you can see, PV is not null-terminated. Here is how
null-terminated stuff is output:
perl -MDevel::Peek -wle "my $a = q(a); print Dump $a"
SV = PV(0x40c64) at 0x40a24
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x42020 "a"\0
CUR = 1
LEN = 2
Hope this helps,
Ilya
| |
| Ilya Zakharevich 2005-10-31, 7:00 pm |
| [A complimentary Cc of this posting was sent to
Anno Siegel
<anno4000@lublin.zrz.tu-berlin.de>], who wrote in article <dk5rri$anu$1@mamenchi.zrz.TU-Berlin.DE>:
> my $str = 'aa';
> $str &= 'a'; # shorten it
> die "Ha!\n" unless $str =~ /a+$/; # this dies!
> Whether the bug (or am I missing something?) is in &= or the regex
> engine (gasp) is anyone's guess.
Both.
&= should (as any Perl operation) produce \0-terminated string.
REx engine (as any Perl operation) should work on non-\0-terminated
strings too.
The only reason to have \0-termination is to allow the string to be
passed to system calls (like open()) AS IS.
Hope this helps,
Ilya
| |
| Tassilo v. Parseval 2005-11-01, 3:57 am |
| Also sprach Ilya Zakharevich:
> [A complimentary Cc of this posting was sent to
> Tassilo v. Parseval
><tassilo.von.parseval@rwth-aachen.de>], who wrote in article <3snf0uFotcprU1@news.dfncis.de>:
>
> perl -MDevel::Peek -wle "my $a = q(aa); $a &= q(a); print Dump $a"
>
> SV = PV(0x40c64) at 0x40a24
> REFCNT = 1
> FLAGS = (PADBUSY,PADMY,POK,pPOK)
> PV = 0x42020 "a"
> CUR = 1
> LEN = 3
>
> As you can see, PV is not null-terminated. Here is how
> null-terminated stuff is output:
Yes, well, I am aware of Dump() and how a NULL-termination is rendered.
It was after I saw the above output that I became curious and wanted to
see what characters actually were in the PV slot.
To me it seems that the perl core isn't quite sure whether it should
adhere to SvCUR or instead rather believe what is in PV. In any case,
perl obviously gets confused when
SvPVX(sv)[SvCUR(sv)] != '\0'
Did someone already file a bugreport?
Tassilo
| |
| Anno Siegel 2005-11-01, 8:01 am |
| Tassilo v. Parseval <tassilo.von.parseval@rwth-aachen.de> wrote in comp.lang.perl.misc:
> Did someone already file a bugreport?
I will. Want to check against bleadperl first. I'll also at least go
through the motions of seeing if it has been reported before.
Anno
| |
| Anno Siegel 2005-11-01, 6:58 pm |
| Anno Siegel <anno4000@lublin.zrz.tu-berlin.de> wrote in comp.lang.perl.misc:
> Tassilo v. Parseval <tassilo.von.parseval@rwth-aachen.de> wrote in
> comp.lang.perl.misc:
>
>
> I will. Want to check against bleadperl first. I'll also at least go
> through the motions of seeing if it has been reported before.
[Anno again]
The bug is still in perl-5.9.2, I've sent a report. Fun with perlbug, as
usual.
BTW, the combination of bitwise operations and regex matching that tickles
the bug isn't as exotic as it may seem. When you work with vec(), trailing
zero bytes in a string are essentially invisible -- strings behave as if
padded with infinitely many zeroes. Therefore trailing zeroes can make
strings look different (to eq) that are really the same as far as vec()
is concerned. To get rid of trailing zeroes, s/\0+$// offers itself,
particularly after &= which may have created them even if the operands
didn't have any.
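
A minimal sketch of the point about vec() and trailing zero bytes, assuming nothing beyond core Perl. The variable names are illustrative only, and it anchors with \z rather than $ so that a trailing newline can never interfere:

```perl
use strict;
use warnings;

# Two strings that differ only by a trailing NUL byte.
my $x = "a";
my $y = "a\0";

# vec() reads elements past the end of a string as 0,
# so both strings look identical through vec().
my @bytes_x = map { vec($x, $_, 8) } 0 .. 3;
my @bytes_y = map { vec($y, $_, 8) } 0 .. 3;
print "vec view of x: @bytes_x\n";   # 97 0 0 0
print "vec view of y: @bytes_y\n";   # 97 0 0 0

# eq compares the actual bytes, so the two strings differ.
print "eq before stripping: ", ($x eq $y ? "same" : "different"), "\n";

# Strip trailing NUL bytes to get a canonical form for comparison.
(my $nx = $x) =~ s/\0+\z//;
(my $ny = $y) =~ s/\0+\z//;
print "eq after stripping: ", ($nx eq $ny ? "same" : "different"), "\n";
```

This is exactly the kind of code that pairs a bitwise string operation with an end-anchored regex.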
Anno
| |
| Ilya Zakharevich 2005-11-01, 6:58 pm |
| [A complimentary Cc of this posting was sent to
Tassilo v. Parseval
<tassilo.von.parseval@rwth-aachen.de>], who wrote in article <3snf0uFotcprU1@news.dfncis.de>:
> For testing what the raw string looks like after the bitwise-and, you
> can use:
Is not it much easier to parse the output of Devel::Peek, and read the
PV by unpack()?
> my $a = 'aa';
> $a &= 'a';
> test($a);
For those who are too lazy to run this, the result is
97,97,0
> Then I am not sure myself what the result of
>
> $s = 'aa' & 'a'
>
> should be.
I think the current result is both correct and intuitive enough
(modulo two bugs which comprise this problem). It is compatible with
both
a) junk-in-junk-out ("what is after end of 'a' is junk")
b) strings behave as if followed by infinitely many \0s.
By (b), the output string should also be considered as having
infinitely many \0s; the question is where to stop this flow. And (a)
looks as a reasonable argument to choose this cut-off point.
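
The length rule under discussion can be checked directly. A small sketch, assuming both operands are plain strings and that the "bitwise" feature of newer perls (which makes & a purely numeric operator) is not enabled; the helper and variable names are illustrative:

```perl
use strict;
use warnings;

# Show the bytes of a string as a comma-separated list of ordinals.
sub bytes_of { return join ',', unpack 'C*', $_[0] }

my ($long, $short) = ('aa', 'a');

my $and = $long & $short;   # acts as if the longer operand were cut to the shorter
my $or  = $long | $short;   # acts as if the shorter operand were padded with \0
my $xor = $long ^ $short;   # same padding rule as |

printf "&: length %d, bytes %s\n", length($and), bytes_of($and);   # 1, 97
printf "|: length %d, bytes %s\n", length($or),  bytes_of($or);    # 2, 97,97
printf "^: length %d, bytes %s\n", length($xor), bytes_of($xor);   # 2, 0,97
```

Only & can return something shorter than its longer operand, which is why it is the one operator that has to truncate.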
[My opinion may be a little bit skewed, since I do not remember
whether it was me who decided on this behaviour. ;-]
Hope this helps,
Ilya
| |
| Tassilo v. Parseval 2005-11-02, 3:57 am |
| Also sprach Ilya Zakharevich:
><tassilo.von.parseval@rwth-aachen.de>], who wrote in article <3snf0uFotcprU1@news.dfncis.de>:
>
> Is not it much easier to parse the output of Devel::Peek, and read the
> PV by unpack()?
No, it wasn't for me. :-)
Can you give an example how to do it with unpack? I feel the 'P'
template is needed but I never know how to use that one.
>
> For those who are too lazy to run this, the result is
>
> 97,97,0
>
>
> I think the current result is both correct and intuitive enough
> (modulo two bugs which comprise this problem). It is compatible with
> both
>
> a) junk-in-junk-out ("what is after end of 'a' is junk")
> b) strings behave as if followed by infinitely many \0s.
>
> By (b), the output string should also be considered as having
> infinitely many \0s; the question is where to stop this flow. And (a)
> looks as a reasonable argument to choose this cut-off point.
What are those two bugs you mentioned? For me the real bug is that an
'impossible' string value can be constructed thus. I would expect:
('aa' & 'a') eq "a\0"
Taking (b) into account, the smaller string should be padded with '\0'
which, on bit-wise ANDing, should yield '\0'.
There's another oddity:
$ perl -MDevel::Peek -e 'my $a = 'aa'; $a &= 'a'; Dump($a)'
SV = PV(0x814ce90) at 0x814cc6c
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x815d628 "a"
CUR = 1
LEN = 3
$ perl -MDevel::Peek -e 'my $a = 'aa' & 'a'; Dump($a)'
SV = PV(0x814cf20) at 0x814cc6c
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x815c0e8 "a"\0
CUR = 1
LEN = 2
Why are those two not equivalent?
> [My opinion may be a little bit skewed, since I do not remember
> whether it was me who decided on this behaviour. ;-]
I am sure it is. ;-)
Tassilo
| |
| Ilya Zakharevich 2005-11-02, 3:57 am |
| [A complimentary Cc of this posting was sent to
Tassilo v. Parseval
<tassilo.von.parseval@rwth-aachen.de>], who wrote in article <3sr8rnFppq7qU1@news.dfncis.de>:
> Also sprach Ilya Zakharevich:
>
>
> No, it wasn't for me. :-)
>
> Can you give an example how to do it with unpack? I feel the 'P'
> template is needed but I never know how to use that one.
You are right: I thought that one can easily get the result of Dump
into a variable. Probably not easy... So to do it without fork()
would not be easy:
#!/usr/bin/perl -wl
use strict;
use Devel::Peek;
# Prepare what to inspect
my $a = 'aa';
$a &= 'a';
defined (my $pid = open my $p, '-|') or die "Can't fork() to self-pipe: $!";
if ($pid) { # parent
    my $out;
    {
        local $/;
        $out = <$p>;
        close $p or die;
    }
    # Parse output of Dump using the expected format below:
    my ($addr, $len) = ($out =~ m/
        ^ \s+ PV \s* = \s* (0x[[:xdigit:]]+) \b
        .*?
        ^ \s+ LEN \s* = \s* (\d+) \b
    /xsm);
    die "unexpected format of output of Dump" unless $addr and $len;
    my $buff = unpack "P$len", pack 'J', hex $addr;
    print ord for split //, $buff;
} else { # kid
    open STDERR, '>&', \*STDOUT or die;
    Dump $a;
    ###SV = PV(0x40c64) at 0x40a24
    ### REFCNT = 1
    ### FLAGS = (PADBUSY,PADMY,POK,pPOK)
    ### PV = 0x42020 "a"
    ### CUR = 1
    ### LEN = 3
}
__END__
> What are those two bugs you mentioned? For me the real bug is that an
> 'impossible' string value can be constructed thus.
Well, the REx engine operates in terms of start-of-string and
end-of-string. It should not read behind.
Moreover, IMO, it is important to support variables which are not
\0-terminated as wide as possible. E.g., this way one could do
substr() with copy-on-modify semantic.
> I would expect:
>
> ('aa' & 'a') eq "a\0"
>
> Taking (b) into account, the smaller string should be padded with '\0'
> which, on bit-wise ANDing, should yield '\0'.
... And, since this \0 comes from "extrapolated" values, it should be
"deextrapolated"; in other words, stripped.
> There's another oddity:
> $ perl -MDevel::Peek -e 'my $a = 'aa'; $a &= 'a'; Dump($a)'
> SV = PV(0x814ce90) at 0x814cc6c
> REFCNT = 1
> FLAGS = (PADBUSY,PADMY,POK,pPOK)
> PV = 0x815d628 "a"
> CUR = 1
> LEN = 3
We know this already...
> $ perl -MDevel::Peek -e 'my $a = 'aa' & 'a'; Dump($a)'
> SV = PV(0x814cf20) at 0x814cc6c
> REFCNT = 1
> FLAGS = (PADBUSY,PADMY,POK,pPOK)
> PV = 0x815c0e8 "a"\0
> CUR = 1
> LEN = 2
Here 'aa' & 'a' is a temporary; most probably not \0-terminated. Now
the assignment operator fills $a from the values in the temporary; as
any well-behaved Perl operator, it does not care whether there is a
trailing \0. So it does not know that the temporary is "buggy".
Hope this helps,
Ilya
| |
| Tassilo v. Parseval 2005-11-02, 7:57 am |
| Also sprach Ilya Zakharevich:
> [A complimentary Cc of this posting was sent to
> Tassilo v. Parseval
><tassilo.von.parseval@rwth-aachen.de>], who wrote in article <3sr8rnFppq7qU1@news.dfncis.de>:
>
> You are right: I thought that one can easily get the result of Dump
> into a variable. Probably not easy... So to do it without fork()
> would not be easy:
[...]
Ah, thank you. I have to make a mental note that the p/P templates work
on memory addresses (I don't like the term 'pointer' which is used in
`perldoc -f pack`).
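
For reference, a minimal sketch of the p/P round trip inside a single process, assuming the string stays alive while the packed address is in use. The variable names are illustrative, and reading one byte past the logical end relies on perl having allocated room for the terminator, which a well-behaved string is expected to have:

```perl
use strict;
use warnings;

my $s = 'hello';

# pack 'P' stores the memory address of $s's string buffer
# in native pointer form.
my $ptr = pack 'P', $s;

# unpack 'P<n>' reads n bytes starting at that address.
my $copy = unpack 'P5', $ptr;
print "copy: $copy\n";

# Reading one extra byte exposes whatever follows the logical end
# of the string; normally that is the terminating \0.
my $raw = unpack 'P6', $ptr;
print 'bytes: ', join(',', unpack 'C*', $raw), "\n";
```

The same idea, applied to the address and LEN reported by Dump, is what the fork()-based script does.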
>
> Well, the REx engine operates in terms of start-of-string and
> end-of-string. It should not read behind.
Agreed.
> Moreover, IMO, it is important to support variables which are not
> \0-terminated as wide as possible. E.g., this way one could do
> substr() with copy-on-modify semantic.
Is that the current state of the affairs or rather an item on the
wishlist?
>
> ... And, since this \0 comes from "extrapolated" values, it should be
> "deextrapolated"; in other words, stripped.
I have to admit that I never really read what perlop has to say on the
bit-wise AND for strings of differing length. Now that Abigail spelled
it out for me in that parallel posting I see it a little more clearly.
>
>
> We know this already...
>
>
> Here 'aa' & 'a' is a temporary; most probably not \0-terminated. Now
> the assignment operator fills $a from the values in the temporary; as
> any well-behaved Perl operator, it does not care whether there is a
> trailing \0. So it does not know that the temporary is "buggy".
That can't be the explanation, because:
$ perl -MDevel::Peek -e 'my ($b, $c) = qw/aa a/; my $a = $b & $c; Dump($a)'
SV = PV(0x814ce78) at 0x8160d28
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x8166520 "a"\0
CUR = 1
LEN = 2
and:
$ perl -MDevel::Peek -e 'my $b = q/aa/; my $a = $b & 'a'; Dump($a)'
SV = PV(0x814cf38) at 0x8160cd8
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x8163d48 "a"\0
CUR = 1
LEN = 2
Tassilo
| |
| Ilya Zakharevich 2005-11-02, 6:57 pm |
| [A complimentary Cc of this posting was sent to
Tassilo v. Parseval
<tassilo.von.parseval@rwth-aachen.de>], who wrote in article <3srj9jFpot6oU1@news.dfncis.de>:
> Is that the current state of the affairs or rather an item on the
> wishlist?
It is one of those things perl *must* have to be considered a serious
string-manipulation language. Without efficient and flexible "string
type" many operations which would be easy to do in many other
languages would take centuries in Perl (linear algorithms become
quadratic in Perl).
I do not expect that 5.9 has it (although this particular part would
be easy to implement). Please surprise me. ;-)
> That can't be the explanation
However, it is. ;-)
> because:
I do not see why you think your examples contradict my argument. All
of them inspect results of assignment operator. In all of them the
result is fine (as my explanation implies).
Hope this helps,
Ilya
| |
| Anno Siegel 2005-11-15, 7:00 pm |
| Anno Siegel <anno4000@lublin.zrz.tu-berlin.de> wrote in comp.lang.perl.misc:
> Anno Siegel <anno4000@lublin.zrz.tu-berlin.de> wrote in comp.lang.perl.misc:
>
> [Anno again]
>
> The bug is still in perl-5.9.2, I've sent a report. Fun with perlbug, as
> usual.
...and fixed, at least the bug in &= is. The one in m// (relying on a
trailing zero) seems to be still there, but now it will be harder to
produce such strings in Perl.
The bug tracking ticket is #37616, if anyone cares.
Anno