
p5p summary: Improving threads::shared ?
Dean Arnold

2008-05-03, 8:24 pm

(Apologies for not tracking p5p closely, I just
don't have time these days)

I've been poring over the p5p archives trying to find
what the subject p5p summary item is about, but wo/ any luck.
Can someone point out the relevant thread title, or maybe
summarize what "threads::shared could share aggregates
properly with only Perl level changes to shared.pm"
means ?

TIA,
Dean Arnold
Jerry D. Hedden

2008-05-04, 9:01 am

Dean Arnold wrote:
> I've been poring over the p5p archives trying to find
> what the subject p5p summary item is about, but wo/ any luck.
> Can someone point out the relevant thread title, or maybe
> summarize what "threads::shared could share aggregates
> properly with only Perl level changes to shared.pm"
> means ?


It means that something like this would DWIM:

my $x : shared;
$x = [ { 'complex' => 'aggregate' }, [ qw/ currently not sharable / ] ];

To do that now you'd have to do:

my $x : shared;
$x = &share([]);
push(@$x, &share({}), &share([]));
$$x[0]{'complex'} = 'aggregate';
push(@{$$x[1]}, qw/ currently not sharable /);
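
For contrast, a minimal sketch of what happens when the nested structure is assigned directly, assuming a perl built with ithreads and a threads::shared without the proposed change. The assignment is refused at run time, which is why the element-by-element construction is needed. The eval wrapper and the $ok and $err variables are only there for illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $x : shared;

# Storing a reference to unshared data in a shared scalar is refused,
# so trap the error instead of letting the script die.
my $ok = eval {
    $x = [ { 'complex' => 'aggregate' }, [ qw/ currently not sharable / ] ];
    1;
};
my $err = $ok ? '' : $@;

print($ok ? "assignment accepted\n" : "assignment refused: $err");
```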

The problem comes with copying complex structures:

my $x = [ { 'complex' => 'aggregate' }, [ qw/ currently not sharable / ] ];

my $y : shared;
$y = $x;

How do you handle the assignment? With regular scalars, $x
and $y would refer to the same structure, such that changes
via $x are visible via $y. However, with the above, you'd
have to clone the structure pointed to by $x and assign that
clone to $y such that they would be independent.
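
The difference between the two behaviours can be seen without threads at all. In this small sketch, Storable's dclone() is used purely as a stand-in for whatever cloning threads::shared would do; the names $alias and $copy are illustrative.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $x = [ { 'complex' => 'aggregate' }, [ qw/ currently not sharable / ] ];

my $alias = $x;            # plain assignment copies the reference only
my $copy  = dclone($x);    # deep copy: an independent structure

$x->[0]{'complex'} = 'changed';

print("via alias: $alias->[0]{'complex'}\n");   # follows the change
print("via copy:  $copy->[0]{'complex'}\n");    # keeps the old value
```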

And it gets worse: Suppose the hash inside $x is already
shared. Technically, you don't need to clone that portion.

And then you'd have to distinguish (in code) how to
differentiate between the above, and say $z = $x when both
are already shared (in which case no cloning is needed).

The rules would need to be defined in detail and then
documented clearly so that programmers don't get befuddled
trying to figure them out.

Thread::Queue has the Perl code needed for making 'complete'
shared clones of data structures (i.e., all parts whether
shared or not are cloned). Tweaking it to not clone already
shared portions is trivial.

Then we just need a tie-in with the assignment operator. Or
perhaps we could just provide a function:

$y = shared_clone($x);
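
A rough sketch of what such a function might look like, assuming a perl built with ithreads. This is not the Thread::Queue code: naive_shared_clone() is a hypothetical helper that handles only unblessed arrays and hashes, reuses anything that is already shared, and makes no attempt to cope with circular references.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical helper for illustration only.
sub naive_shared_clone {
    my ($item) = @_;

    return $item if !ref($item);        # plain values are copied by value
    return $item if is_shared($item);   # already shared: reuse as is

    if (ref($item) eq 'ARRAY') {
        my $copy = &share([]);
        push(@$copy, map { naive_shared_clone($_) } @$item);
        return $copy;
    }
    if (ref($item) eq 'HASH') {
        my $copy = &share({});
        $copy->{$_} = naive_shared_clone($item->{$_}) for keys(%$item);
        return $copy;
    }
    die("Unsupported reference type: " . ref($item) . "\n");
}

my $x = [ { 'complex' => 'aggregate' }, [ qw/ currently not sharable / ] ];

my $y : shared;
$y = naive_shared_clone($x);

my $first = $y->[0];
print("nested hash is shared\n") if is_shared($first);
print("value: $y->[0]{'complex'}\n");
```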
Dean Arnold

2008-05-04, 7:55 pm

Jerry D. Hedden wrote:
> Dean Arnold wrote:
>
> It means that something like this would DWIM:
>
> my $x : shared;
> $x = [ { 'complex' => 'aggregate' }, [ qw/ currently not sharable / ] ];
>


<snip/>

>
> Thread::Queue has the Perl code needed for making 'complete'
> shared clones of data structures (i.e., all parts whether
> shared or not are cloned). Tweaking it to not clone already
> shared portions is trivial.
>


I see someone's been busy ;^)

Alas, there's one catch that the current T::Q implementation
doesn't cover: recursive structures:
(Using AS Perl 5.8.8, WinXP, T::Q 2.06)

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use Data::Dumper;

my $q = Thread::Queue->new();

my $x = [ { 'complex' => 'aggregate' },
[ qw/ currently not sharable / ] ];

my $thrd = threads->create(\&receiver, $q);

$q->enqueue($x);

$thrd->join();

print "*** non-recursive OK\n";

#
# Make it a recursive structure
# (!!! this will choke)
#
push @$x, $x;

$thrd = threads->create(\&receiver, $q);

$q->enqueue($x);

$thrd->join();

print "*** recursive OK\n";


sub receiver {
    my $q = shift;

    my $data = $q->dequeue();
    print "** in child:\n", Dumper($data), "\n";
}


The 2nd part emits a bunch of

"Deep recursion on anonymous subroutine at C:/Perl/lib/Thread/Queue.pm line 181"
warnings and then hangs.

While I agree that recursive structures should be discouraged, and don't
expect T::Q (or threads::shared, ftm) to try to trap them, the T::Q POD
probably needs to include a warning about it. And in the case of queueing
blessed objects, I'd expect instances of circular refs between objects
not to be uncommon.

FWIW: I had looked into this issue for Thread::Queue::Duplex and
Thread::Apartment, and punted the deepcopy clone to Storable instead
because of the circular ref issue.
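
As an illustration of why Storable sidesteps the problem: it keeps track of what it has already serialized, so a circular structure survives a freeze/thaw round trip. The sketch below shows only that property; how Thread::Queue::Duplex actually calls Storable is not shown in this thread.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Scalar::Util qw(refaddr);

my $x = [ { 'complex' => 'aggregate' }, [ qw/ currently not sharable / ] ];
push(@$x, $x);                 # make it circular

my $frozen = freeze($x);       # a plain string, which can be passed around
my $copy   = thaw($frozen);    # rebuilt structure, independent of $x

print("copy is a separate structure\n") if refaddr($copy) != refaddr($x);
print("cycle preserved in the copy\n")  if refaddr($copy->[2]) == refaddr($copy);
```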

OTOH, if a bitflag could be grabbed
somewhere to do a mark/sweep, this might be solvable.
Or maybe using a fieldhash to key the original ref that's
being cloned, do a quick lookup, and if it exists, then skip it
ala testing for shared-ness ? kinda slow, and maybe a memory pig
for deep structures, but should solve the issue.
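
A minimal sketch of the lookup idea, using an ordinary hash keyed by Scalar::Util::refaddr() rather than a fieldhash, and making plain (unshared) copies so it runs on any perl. clone_with_seen() is a hypothetical name. The important detail is that the new copy is recorded before recursing into its contents.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper: deep copy of unblessed arrays/hashes that
# survives circular references by remembering what was already cloned.
sub clone_with_seen {
    my ($item, $seen) = @_;
    $seen ||= {};

    return $item if !ref($item);

    my $addr = refaddr($item);
    return $seen->{$addr} if exists($seen->{$addr});   # already cloned

    if (ref($item) eq 'ARRAY') {
        my $copy = $seen->{$addr} = [];     # record before recursing
        push(@$copy, clone_with_seen($_, $seen)) for @$item;
        return $copy;
    }
    if (ref($item) eq 'HASH') {
        my $copy = $seen->{$addr} = {};     # record before recursing
        $copy->{$_} = clone_with_seen($item->{$_}, $seen) for keys(%$item);
        return $copy;
    }
    die("Unsupported reference type: " . ref($item) . "\n");
}

my $x = [ { 'complex' => 'aggregate' }, [ qw/ currently not sharable / ] ];
push(@$x, $x);                              # recursive structure

my $c = clone_with_seen($x);
print("clone finished without deep recursion\n");
print("cycle points at the clone\n") if refaddr($c->[2]) == refaddr($c);
```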

> Then we just need a tie-in with the assignment operator. Or
> perhaps we could just provide a function:
>
> $y = shared_clone($x);
>


Would an assignment op overload work ?
I.e., if the LHS was already shared(), then the = overload
would do the deepcopy ? Or would that break the
XS tie/magic side of the code ?

I guess it wouldn't break *if* the clone operation gets pushed
down to XS; then the STORE operation just detects the
value as an unshared ref (as it already does), and performs
the deepcopy. At which point some add'l optimization (e.g.,
grabbing the global lock once for the entire copy operation)
could be applied.

At which point, turning an existing populated hash/array
into a shared variable could also be updated to preserve the contents.

- Dean
Dean Arnold

2008-05-04, 7:55 pm

Dean Arnold wrote:
> Jerry D. Hedden wrote:
>
> <snip/>
>
>
> I see someone's been busy ;^)
>
> Alas, there's one catch that the current T::Q implementation
> doesn't cover: recursive structures:
> (Using AS Perl 5.8.8, WinXP, T::Q 2.06)
>


<snip/>

> OTOH, if a bitflag could be grabbed
> somewhere to do a mark/sweep, this might be solvable.
> Or maybe using a fieldhash to key the original ref that's
> being cloned, do a quick lookup, and if it exists, then skip it


<snip/>

Momentary insanity:
Why not reuse the existing perl_clone code ?
Assuming an appropriate entry point could be
found *and* perl_clone can be fooled into
using the shared global interpreter instead
of creating a new interpreter, it should be
possible to do a complete clone wo/ circular issues.
And it would presumably support creation of
shared filehandles.

OTOH, the last time I tried to decipher perl_clone(),
I ended up w/ a 3 day headache, so it may be
intractable.

- Dean
Jerry D. Hedden

2008-05-05, 8:17 pm

Dean Arnold wrote:
> Alas, there's one catch that the current T::Q
> implementation doesn't cover: recursive structures:


Oh, crud. Thanks for catching that.

> Or maybe using a fieldhash to key the original ref that's
> being cloned, do a quick lookup, and if it exists, then
> skip it


That was my thought, too. I'll work on it.

> Momentary insanity: Why not reuse the existing perl_clone
> code ?


Egads! I've no clue about this.

> Would an assignment op overload work? I.e., if the LHS
> was already shared(), then the = overload would do the
> deepcopy? Or would that break the XS tie/magic side of the
> code?


I had the same idea. I can't think of why it might conflict
with anything the XS code does (but that thought doesn't
have much to back it up).

What are your thoughts regarding structures that mix shared
and non-shared elements? Should a whole separate clone be
made, or should the shared portions (i.e., refs) be used in
the copy? (I'm thinking the latter, while messier, makes
more sense.)
Dean Arnold

2008-05-05, 8:17 pm

Jerry D. Hedden wrote:
>
> What are your thoughts regarding structures that mix shared
> and non-shared elements? Should a whole separate clone be
> made, or should the shared portions (i.e., refs) be used in
> the copy? (I'm thinking the latter, while messier, makes
> more sense.)
>


I agree. That most closely matches perl_clone behavior,
i.e., preserving shared variables between threads.
I don't know that it's messier, but it will certainly need to
be documented.

BTW: have you looked into reusing or integrating with any of the
Clone modules ? At present, I don't think they handle
shared variables, but they're the usual mechanism
for cloning structures, and a couple of them use XS for
performance.

- Dean
Jerry D. Hedden

2008-05-05, 8:17 pm

Dean Arnold wrote:
> BTW: have you looked into reusing or integrating with any
> of the Clone modules ? At present, I don't think they
> handle shared variables, but they're the usual mechanism
> for cloning structures, and a couple of them use XS for
> performance.


I'm not familiar with any of them. However, if they don't
handle shared variables, I'm not sure they'd be useful
unless they were modified to make shared copies at each
step.

As a first step, I'll work on fixing the circular references
issue in Thread::Queue. (I think I have it done, but need
to test it.) Then work on moving that code to
threads::shared in conjunction with overloading the '='
operator. After that, I could look into other modules
and/or further optimizations.

I've attached my reworked version of T::Q that I think takes
care of circular references. Would you mind giving it a
going over? (It passes all the current tests in its test
suite, so I know I didn't break anything.) Thanks.

Dean Arnold

2008-05-05, 8:17 pm

Jerry D. Hedden wrote:
>
> I've attached my reworked version of T::Q that I think takes
> care of circular references. Would you mind giving it a
> going over? (It passes all the current tests in its test
> suite, so I know I didn't break anything.) Thanks.


My sample worked OK *after* I removed the Data::Dumper
print; apparently, it has some issues w/ circular
references as well (I tried setting Deepcopy and
Purity, to no avail); I suspect it's due to the fact
that references to shared elements don't have the same
stringified value, so it doesn't actually look like
a circular ref.

Anyway, my updated version of the test follows.

- Dean

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use Data::Dumper;

my $q = Thread::Queue->new();

my $x = [ { 'complex' => 'aggregate' },
[ qw/ currently not sharable / ] ];

my $thrd = threads->create(\&receiver, $q);

$q->enqueue($x);

$thrd->join();

print "*** non-recursive OK\n";

#
# Make it a recursive structure
# (!!! this will choke)
#
push @$x, $x;

$thrd = threads->create(\&receiver, $q);

$q->enqueue($x);

print "*** enq'd\n";

$thrd->join();

print "*** recursive OK\n";


sub receiver {
    my $q = shift;
    print "*** starting child\n";
    my $data = $q->dequeue();
    print "*** deq'd\n";

    if ($#$data == 1) {
        print "** in child:\n", Dumper($data), "\n";
    }
    elsif (($#$data != 2) || (ref $data->[2] ne 'ARRAY')) {
        print "NOT OK: $#$data $data $data->[2]\n";
    }
    else {
        print "** in child:\n", Dumper($data->[0], $data->[1]), "\n";
    }
}


Jerry D. Hedden

2008-05-05, 8:17 pm

> My sample worked OK *after* I removed the Data::Dumper
> print; apparently, it has some issues w/ circular
> references as well (I tried setting Deepcopy and
> Purity, to no avail); I suspect it's due to the fact
> that references to shared elements don't have the same
> stringified value, so it doesn't actually look like
> a circular ref.


Oh, yes. That's part of what I reported here:

http://rt.perl.org/rt3/Public/Bug/Display.html?id=37946

The following highlights the problem:

use strict;
use warnings;

use threads;
use threads::shared;

my $x;
$x = \$x;
share($x);

print("Look at \$x:\n");
print($x, "\n");
print($$x, "\n");
print($$$x, "\n");
print($$$$x, "\n");
print($$$$$x, "\n");
print($$$$$$x, "\n");

my @q :shared = ($x);

my $y = $q[0];

print("\nFirst look at \$y:\n");
print($y, "\n");
print($$y, "\n");
print($$$y, "\n");
print($$$$y, "\n");
print($$$$$y, "\n");
print($$$$$$y, "\n");

print("\nSecond look at \$y:\n");
print($y, "\n");
print($$y, "\n");
print($$$y, "\n");
print($$$$y, "\n");
print($$$$$y, "\n");
print($$$$$$y, "\n");

This outputs:

Look at $x:
REF(0x144f8f0)
REF(0x144f8f0)
REF(0x144f8f0)
REF(0x144f8f0)
REF(0x144f8f0)
REF(0x144f8f0)

First look at $y:
SCALAR(0x1423c70)
SCALAR(0x1423ad8)
SCALAR(0x14bd968)
SCALAR(0x14bd980)
SCALAR(0x14bd998)
SCALAR(0x14bd9b0)

Second look at $y:
REF(0x1423c70)
REF(0x1423ad8)
REF(0x14bd968)
REF(0x14bd980)
REF(0x14bd998)
SCALAR(0x14bd9b0)

Seems to me that this is a bug. It shows that
threads::shared isn't detecting that it's dealing with a
circular reference. With each level that the app traverses,
threads::shared "unrolls" another version of the shared
variable.
Jerry D. Hedden

2008-05-05, 8:17 pm

Jerry D. Hedden wrote:
> Seems to me that this is a bug. It shows that
> threads::shared isn't detecting that it's dealing with a
> circular reference. With each level that the app traverses,
> threads::shared "unrolls" another version of the shared
> variable.


The more I think about it, the more I'm convinced this needs
to be fixed. The original variable $x is a circular
reference, and its finiteness is detectable. The copy made
to $y has become an infinitely deep reference with lazy
evaluation.
Jerry D. Hedden

2008-05-05, 8:17 pm

Jerry D. Hedden wrote:
> Seems to me that this is a bug. It shows that
> threads::shared isn't detecting that it's dealing with a
> circular reference. With each level that the app traverses,
> threads::shared "unrolls" another version of the shared
> variable.


> The more I think about it, the more I'm convinced this needs
> to be fixed. The original variable $x is a circular
> reference, and its finiteness is detectable. The copy made
> to $y has become an infinitely deep reference with lazy
> evaluation.


I've been looking at how to fix this, but my understanding
of the internals of threads::shared is weak.

Conceptually, I think it would require keeping a weak
reference of each private SV associated with a shared SV.
This would need to be done on a per-thread basis. Then when
a thread tries to reference a shared SV, a lookup would be
made to see if a private SV already exists (and still
exists) for that thread. If so, that is returned (and the
ref count of the weak ref is incremented?). If not, a new
private SV is created, and a weak ref to it is appropriately
stored.

Is this a logical approach? If so, is it doable?
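
The bookkeeping described above can be modelled in pure Perl. This is only a toy model of the idea, not the XS internals: %proxy_for stands in for a per-thread table, get_proxy() is a hypothetical name, and the "proxy" is just a small hash. Note that copying a weak reference out of the table yields a strong one, which is what keeps the proxy alive while it is in use.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken refaddr);

# Hypothetical per-thread table: shared-SV id => weak ref to private proxy.
my %proxy_for;

sub get_proxy {
    my ($shared_id) = @_;

    # Reuse the proxy if one was created earlier and is still alive.
    my $proxy = $proxy_for{$shared_id};
    return $proxy if defined($proxy);

    # Otherwise create one and remember it weakly, so that the table
    # itself does not keep the proxy alive.
    $proxy = { 'id' => $shared_id };
    $proxy_for{$shared_id} = $proxy;
    weaken($proxy_for{$shared_id});
    return $proxy;
}

my $p1 = get_proxy(0x144f8f0);
my $p2 = get_proxy(0x144f8f0);
print("same proxy reused\n") if refaddr($p1) == refaddr($p2);

undef($p1);
undef($p2);
print("proxy released once unreferenced\n")
    if !defined($proxy_for{0x144f8f0});
```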
Jerry D. Hedden

2008-05-05, 8:17 pm

> Is this a logical approach? If so, is it doable?

If circular references can't be fully supported in
threads::shared, then I need to document this in its POD and
in Thread::Queue's POD, too. Do you agree?
Dean Arnold

2008-05-05, 8:17 pm

Jerry D. Hedden wrote:
> Jerry D. Hedden wrote:
>
>
> I've been looking at how to fix this, but my understanding
> of the internals of threads::shared is weak.
>
> Conceptually, I think it would require keeping a weak
> reference of each private SV associated with a shared SV.
> This would need to be done on a per-thread basis. Then when
> a thread tries to reference a shared SV, a lookup would be
> made to see if a private SV already exists (and still
> exists) for that thread. If so, that is returned (and the
> ref count of the weak ref is incremented?). If not, a new
> private SV is created, and a weak ref to it is appropriately
> stored.
>
> Is this a logical approach? If so, is it doable?
>


By coincidence, I've been doing related
work on Thread::Sociable's STM implementation.
It has to keep a "shadow" proxy of each variable
in order to avoid the same referencing issue
(otherwise, it could create multiple different
transactional versions of the same referenced variable)

For threads::shared, the only solution I can think of is
adding a fieldhash to the thread-private my_ctx structure
keyed by the address of the referent variable's
shared version. Then each attempt to create a new proxy
would lookup any existing persistent proxy for the shared
SV, and return it if found (currently, each new reference
to a shared variable creates a new proxy, which is what causes
this mess).

It may create refcounting issues; the private proxy
would need to be refcounted every time the shared
variable was refcounted in the same thread. (Alternately,
the thread could refcount only the proxy, and then decrement
the shared version's refcount only when the private proxy's
refcount dropped to zero.)

I also don't know how it would affect taking refs of
shared array/hash elements.

And this will probably slow things down even further.

- Dean
Dean Arnold

2008-05-05, 8:17 pm

Jerry D. Hedden wrote:
>
> If circular references can't be fully supported in
> threads::shared, then I need to document this in its POD and
> in Thread::Queue's POD, too. Do you agree?
>


I don't know that they can't be "supported"; but they do
need to be explained. I also don't see how it affects
T::Q: since you always skip over anything that's already
shared, existing circular refs aren't an issue. It's only
detecting and creating new shared circular refs that
causes a problem, and that's fixed w/ the fieldhash lookup
(which is keyed on the private address, not the shared
version).

For any existing apps (e.g., Data::Dumper) that want to
deal with it, they could always fall back to detecting something
as shared and saving its id (i.e., the shared interpreter version's
address) to detect cycles. Not pretty, but effective.
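
A sketch of that fallback, assuming a perl built with ithreads. has_cycle() is a hypothetical helper, not anything in Data::Dumper. It keys shared data on the id returned by is_shared() and private data on refaddr(), and for brevity it only descends into arrays and hashes.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Scalar::Util qw(refaddr reftype);

# Hypothetical helper: true if following references from $item leads
# back to something that is already on the current path.
sub has_cycle {
    my ($item, $on_path) = @_;
    $on_path ||= {};
    return 0 if !ref($item);

    # Shared data: key on the shared id. Private data: key on the address.
    my $id  = is_shared($item);
    my $key = defined($id) ? "shared:$id" : 'private:' . refaddr($item);
    return 1 if $on_path->{$key};

    my $type     = reftype($item);
    my @children = $type eq 'ARRAY' ? @$item
                 : $type eq 'HASH'  ? values(%$item)
                 :                    ();

    $on_path->{$key} = 1;
    my $found = 0;
    for my $child (@children) {
        if (has_cycle($child, $on_path)) {
            $found = 1;
            last;
        }
    }
    delete($on_path->{$key});
    return $found;
}

my $ring = &share([]);
push(@$ring, 'payload', $ring);          # shared and circular

my $flat = &share([]);
push(@$flat, 'payload', &share([]));     # shared, no cycle

print("ring: ", (has_cycle($ring) ? "cycle" : "no cycle"), "\n");
print("flat: ", (has_cycle($flat) ? "cycle" : "no cycle"), "\n");
```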

- Dean
Dean Arnold

2008-05-06, 7:59 pm

Dean Arnold wrote:
> Jerry D. Hedden wrote:
>
> For threads::shared, the only solution I can think of is
> adding a fieldhash to the thread-private my_ctx structure
> keyed by the address of the referent variable's
> shared version. Then each attempt to create a new proxy
> would lookup any existing persistent proxy for the shared
> SV, and return it if found (currently, each new reference
> to a shared variable creates a new proxy, which is what causes
> this mess).
>


I forgot one not-so-minor detail: clone processing
would need to be updated to
detect and replace the fieldhash'd proxies, and
update each SV that invoked the magic dup() method

- Dean
Jerry D. Hedden

2008-05-07, 7:47 pm

On Mon, May 5, 2008 at 1:53 PM, Jerry D. Hedden <jdhedden@cpan.org> wrote:
>
> Oh, yes. That's part of what I reported here:
>
> http://rt.perl.org/rt3/Public/Bug/Display.html?id=37946
>
> Seems to me that this is a bug. It shows that
> threads::shared isn't detecting that it's dealing with a
> circular reference. With each level that the app traverses,
> threads::shared "unrolls" another version of the shared
> variable.


I just posted a patch to blead that fixes this. If the patch passes
muster, I'll release an update for threads::shared to CPAN.
Jerry D. Hedden

2008-05-08, 7:53 pm

> Would an assignment op overload work ?
> I.e., if the LHS was already shared(), then the = overload
> would do the deepcopy ? Or would that break the
> XS tie/magic side of the code ?


Oops. Shared variables aren't objects. So I don't think we
can use 'overload' on '='. Is that correct?
Jerry D. Hedden

2008-05-08, 7:53 pm

> For any existing apps (eg, Data::Dumper) that want to
> deal with it, they could always fallback to detecting
> something as shared and saving its id (ie, the shared
> interpreter version's address) to detect cycles. Not
> pretty, but effective.


I looked into fixing Data::Dumper for this. The circular
reference checking is done both in Perl (for the pure-Perl
version) and in XS (the usual version). It'll take me
a while to figure out how to do it in XS.
Jerry D. Hedden

2008-05-08, 7:53 pm

> perhaps we could just provide a function:
>
> $y = shared_clone($x);


What name should this function have?

shared_clone()
clone()
shared_copy()
copy()
make_shared()

Or something else?
Dean Arnold

2008-05-08, 7:53 pm

Jerry D. Hedden wrote:
>
> What name should this function have?
>
> shared_clone()
> clone()
> shared_copy()
> copy()
> make_shared()
>
> Or something else?
>


I'd vote against clone() or copy(), as they're too
general. Otherwise, I personally have no opinion,
tho clone has come to mean this sort of deepcopy
(see the various Clone modules), so I suppose
shared_clone() may make sense.

- Dean
Jerry D. Hedden

2008-05-17, 8:51 am

Jerry D. Hedden wrote:
>
> perhaps we could just provide a function:
>
> $y = shared_clone($x);


I uploaded threads::shared 1.21 which now has this functionality.
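
A short usage sketch, assuming a perl built with ithreads and threads::shared 1.21 or later. It mixes an already-shared hash with an unshared array to show the behaviour settled on earlier in the thread: shared portions are used as they are, everything else is deep-copied. The variable names are illustrative.

```perl
use strict;
use warnings;
use threads;
use threads::shared 1.21;    # shared_clone() is available from this release

my %settings : shared = ('complex' => 'aggregate');          # already shared
my $x = [ \%settings, [ qw/ currently not sharable / ] ];    # mixed structure

my $y = shared_clone($x);

$settings{'complex'} = 'changed';    # the shared hash is reused, not copied
push(@{$x->[1]}, 'extra');           # the private array was copied

print("hash via clone:  $y->[0]{'complex'}\n");
print("array via clone: @{$y->[1]}\n");
```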