forks: shared variables between different applications or hosts
| Alvar Freude 2007-07-01, 7:19 pm |
| Hi,
as I remember, an older version of the forks.pm documentation mentioned
that it would be possible to use shared variables between different
machines or completely different applications on one machine.
Is this possible?
I'm in the planning stage of a (forking) application which gets some
commands from another application, and it might be very easy to use such a
feature instead of manually freezing/thawing the data and sending it via
manually opened sockets.
Or is this a bad idea?
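A minimal sketch of the manual freeze/thaw-over-a-socket approach, assuming both processes run on the same machine and using only core modules (Storable, Socket). The helper names send_msg, read_exact, recv_msg and demo, and the layout of the command hash, are hypothetical and only serve the illustration:

```perl
use strict;
use warnings;
use Socket;                     # AF_UNIX, SOCK_STREAM, PF_UNSPEC
use Storable qw(freeze thaw);
use IO::Handle;                 # autoflush() on lexical handles
use POSIX ();

# Send one frozen structure, prefixed by its length in bytes.
sub send_msg {
    my ($fh, $data) = @_;
    my $frozen = freeze($data);
    print {$fh} pack('N', length $frozen), $frozen;
}

# Read exactly $len bytes from $fh; undef on EOF or error.
sub read_exact {
    my ($fh, $len) = @_;
    my $buf = '';
    while (length($buf) < $len) {
        my $got = read($fh, $buf, $len - length($buf), length $buf);
        return undef unless $got;
    }
    return $buf;
}

# Receive one length-prefixed frozen structure and thaw it.
sub recv_msg {
    my ($fh) = @_;
    my $header = read_exact($fh, 4);
    return undef unless defined $header;
    my $frozen = read_exact($fh, unpack('N', $header));
    return defined $frozen ? thaw($frozen) : undef;
}

# Fork a worker, send it one (hypothetical) command, return its reply.
sub demo {
    socketpair(my $parent, my $child, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    for my $fh ($parent, $child) {
        binmode $fh;
        $fh->autoflush(1);
    }

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {            # child: plays the worker daemon
        close $parent;
        my $cmd = recv_msg($child);
        send_msg($child, { echo  => $cmd->{op},
                           count => scalar @{ $cmd->{keys} } });
        close $child;
        POSIX::_exit(0);        # skip END blocks inherited from the parent
    }

    close $child;               # parent: plays the mod_perl side
    send_msg($parent, { op => 'lookup', keys => [qw(a b c)] });
    my $reply = recv_msg($parent);
    close $parent;
    waitpid($pid, 0);
    return $reply;
}

my $reply = demo();
print "worker answered: $reply->{echo} for $reply->{count} keys\n";
```

Between different hosts, Storable's nfreeze (network byte order) would be the portable choice instead of freeze, and thaw should only be fed data from a trusted peer.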
Ciao
Alvar
--
** Alvar C.H. Freude, http://alvar.a-blast.org/
** http://www.assoziations-blaster.de/
** http://www.wen-waehlen.de/
** http://odem.org/
| |
|
|
| Eric Rybski 2007-07-04, 7:21 pm |
|
--- Alvar Freude <alvar@a-blast.org> wrote:
>
> My problem is that I need about 100 MB of data persistent in RAM for
> fast hash access. At the moment I do this by building the hash at Apache
> startup, and it is all shared between the children. But sometimes there
> are changes ... and then copy-on-write un-shares the data and Apache
> eats more and more memory ;-)
>
> My idea is to write a daemon (using forks.pm) which the mod_perl part
> can connect to; on updates, I shut down all children of the daemon and
> restart them (as needed).
>
> So the next idea is that the mod_perl app could talk directly to the
> worker children. But the more I think about this, the more it seems to
> be not a good idea and more complicated than sending some frozen
> commands over a socket ...
>
If you are using mod_perl 1.0, or mod_perl 2.0 with the 'prefork' MPM,
I have successfully worked with a few companies to integrate forks into
Apache httpd instances. Additionally, if you use the forks::BerkeleyDB
add-on, you'll get excellent shared variable access performance (which
sounds like a requirement for your 100MB of data). This data doesn't
depend on fork COW to share between apps, so it's safe to update it
during runtime without incurring memory penalties (and updates are also
immediately visible to all Apache handlers and any extra threads you
may start in the mod_perl environment).
I've almost finalized an Apache::forks module to seamlessly
integrate forks into an Apache httpd environment. So far, results are
very promising: solid memory and application stability, and performance
is quite satisfactory.
I haven't yet posted Apache::forks to CPAN, but if you're interested in
evaluating it, I'll be happy to e-mail you a stable pre-release that
works with forks 0.23 and forks::BerkeleyDB 0.05.
Regards,
Eric Rybski
| |
| Daniel Rychlik 2007-07-04, 7:21 pm |
| > The best would be *real*
> threads with really shared variables, but this is another topic ;-)
No kidding...
--------------------------
Respectfully,
Dan Rychlik
IT Projects Engineer
-----Original Message-----
From: Alvar Freude <alvar@a-blast.org>
To: Eric Rybski <rybskej@yahoo.com>; perl-ithreads@perl.org <perl-ithreads@perl.org>
Sent: Wed Jul 04 12:11:56 2007
Subject: Re: forks: shared variables between different applications or hosts
Hi,
-- Eric Rybski <rybskej@yahoo.com> wrote:
> If you are using mod_perl 1.0, or mod_perl 2.0 with the 'prefork' MPM,
> I have successfully worked with a few companies to integrate forks into
> apache httpd instances. Additionally, if you use the forks::BerkeleyDB
> add-on, you'll get excellent shared variable access performance (which
> sounds like a requirement for your 100MB of data).
Hmm, for a typical page I need between 50 and 1000 hash lookups,
sometimes more and sometimes fewer. But I'll run some tests: if this is
fast enough, it seems to be a very good solution. The best would be *real*
threads with really shared variables, but this is another topic ;-)
> I haven't yet posted Apache::forks to CPAN, but if you're interested in
> evaluating it, I'll be happy to e-mail you a stable pre-release that
> works with forks 0.23 and forks::BerkeleyDB 0.05.
OK -- first I'll do some benchmarks to test if it might work.
I think it would be the easiest and best way to do it, if fast enough!
Thanks && Ciao
Alvar
--
** Alvar C.H. Freude, http://alvar.a-blast.org/
** http://www.assoziations-blaster.de/
** http://www.wen-waehlen.de/
** http://odem.org/
| |
| Eric Rybski 2007-07-06, 4:27 am |
| Daniel,
Unfortunately, *real* perl thread shared data isn't shared globally
(among all processes) in a forking environment like that of mod_perl in
Apache. The single-process nature of native ithreads makes perl shared
variables rather impractical to use in a mod_perl environment.
Given the described design issue, only an IPC model would work, such as
SysV shared memory, BerkeleyDB, or an in-memory TCP cache like
memcached. forks::BerkeleyDB is an attempt to abstract IPC into an
ithreads environment, allowing a preforking mod_perl environment to behave
as a single thread group using BerkeleyDB as the underlying IPC model.
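A small sketch of why plain in-process data cannot serve as shared state in a preforking server, using only core Perl. The sub name parent_view_after_child_update and the %cache contents are hypothetical; the forked child stands in for an Apache child process:

```perl
use strict;
use warnings;
use POSIX ();

# Fork a child that changes its copy of a hash, then report what
# the parent still sees and what the child saw.
sub parent_view_after_child_update {
    my %cache = (answer => 'old');

    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                    # child process
        close $reader;
        $cache{answer} = 'new';         # modifies only the child's copy
        print {$writer} $cache{answer};
        close $writer;                  # flushes the pipe
        POSIX::_exit(0);
    }

    close $writer;
    my $child_value = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);

    return ($cache{answer}, $child_value);
}

my ($parent_value, $child_value) = parent_view_after_child_update();
print "parent sees '$parent_value', child saw '$child_value'\n";
```

The child's update never reaches the parent, which is the reason an IPC layer (SysV shared memory, BerkeleyDB, a socket daemon) is needed once several processes must see the same changes.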
-Eric
| |
| Eric Rybski 2007-07-06, 4:27 am |
| The intent of forks::BerkeleyDB is to abstract away the usual grunt
effort of writing an application to use SysV, BerkeleyDB, or some
socket-communication framework, and instead to add this capability to
an interface that Perl developers are likely already familiar with: the
ithreads API. Yet unlike standard perl ithreads, this model works when
the perl library is embedded in a forking application, like Apache
httpd.
With Apache2, it is possible to have shared variables between threads
within a multi-threaded, single process, but this defeats the ability
for all handlers (processes) to share the same shared memory. This
same "data orphaned in a process" issue applies to native perl hashes.
Thus, you shouldn't expect anywhere near the performance of native
hashes from an IPC model like forks::BerkeleyDB: it's an apples to
oranges comparison.
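A rough way to get a feeling for that gap on your own hardware, sketched with core modules only. SDBM_File is used here purely as a stand-in because it ships with Perl; it is not BerkeleyDB, it involves no IPC, and its numbers say nothing about forks::BerkeleyDB itself. The hash contents and iteration count are arbitrary assumptions:

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use Fcntl;                          # O_RDWR, O_CREAT
use File::Temp qw(tempdir);
use SDBM_File;

my $dir    = tempdir(CLEANUP => 1);
my %native = map { $_ => $_ * 2 } 1 .. 1000;

# A file-backed tied hash: every access goes through method calls.
tie my %tied, 'SDBM_File', "$dir/lookup", O_RDWR | O_CREAT, 0666
    or die "cannot tie SDBM file: $!";
%tied = %native;

my ($i, $j, $x) = (0, 0);
my $results = timethese(200_000, {
    native => sub { $x = $native{ 1 + $i++ % 1000 } },
    tied   => sub { $x = $tied{ 1 + $j++ % 1000 } },
}, 'none');

cmpthese($results);                 # prints a rate comparison table
untie %tied;
```

Replacing the tied hash with a forks::BerkeleyDB shared hash in the same harness would give the comparison that actually matters for this thread.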
You could try an existing TCP-oriented cache mechanism like memcached
(via Cache::Memcached, or Cache::Memcached::XS for better performance),
but in past experience I've found TCP daemon models (no matter how well
tuned) for frequent, small data access to be consistently slower than
most in-memory IPC models, such as SysV shmem or BerkeleyDB (with
shared memory caching enabled). Some TCP models do excel in handling
extremely large hashes of data (like memcached), but are inherently
limited in raw performance by their socket interface.
With regard to IPC models, my choice of BerkeleyDB over SysV shared
memory (the end result being forks::BerkeleyDB) was based on BerkeleyDB's
native database types that efficiently support _very_ large arrays and
hashes. It has outstanding tablespace optimizations to both prevent
long-term data memory fragmentation as elements are added & deleted and
reclaim deleted space. Additionally, BerkeleyDB has an optimized,
transparent shared memory interface that keeps most actively accessed
data in physical memory, eliminating unnecessary physical disk
overhead.
A comparison of BerkeleyDB to many other in-memory Perl cache modules
may be seen here:
http://cpan.robm.fastmail.fm/cache_perf.html
Here are results from an UltraSPARC III 1.4 GHz Solaris 9 system with a
15K RPM drive. This example performs 100K lookups against a shared hash
built from the list 1..100000:
perl -Mforks::BerkeleyDB -Mforks::BerkeleyDB::shared -MBenchmark=:all \
  -e 'my %h:shared=(1..100000); my $a; timethis(100000, sub { $a=$h{$i++} });'
timethis 100000: 9 wallclock secs ( 8.79 usr + 0.01 sys = 8.80 CPU) @ 11363.64/s (n=100000)
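The reported rate can be turned into a per-page CPU budget with simple arithmetic. A short sketch, where the rate is taken from the timethis output above, the 50 and 1000 lookup counts come from the range mentioned earlier in the thread, and the sub name cpu_seconds is hypothetical:

```perl
use strict;
use warnings;

my $rate = 11363.64;    # lookups per second, from the timethis output

# CPU seconds needed for a given number of lookups at a given rate.
sub cpu_seconds {
    my ($lookups, $per_second) = @_;
    return $lookups / $per_second;
}

for my $n (50, 1000) {
    printf "%4d lookups: about %.3f s CPU\n", $n, cpu_seconds($n, $rate);
}
```

On different hardware, rerunning the benchmark and plugging in the new rate gives the matching estimate.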
Given similar hardware, this does appear to meet your maximum of 1000 hash
lookups per request, at about 0.09 seconds of CPU time (1000 lookups at
11363.64/s). To make forks::BerkeleyDB really shine on Linux, you can
further reduce CPU overhead and eliminate any disk I/O wait time by
allocating a ramdisk and re-mapping the location of all BerkeleyDB files
to that location:
http://www.vanemery.com/Linux/Ramdisk/ramdisk.html
Then do:
export TMPDIR=/path/to/ramdisk
perl -Mforks::BerkeleyDB -Mforks::BerkeleyDB::shared yourscript.pl
As always, the perl ithreads model isn't a solution for every
multi-process perl problem. My hope is that forks::BerkeleyDB allows
problems such as yours to be both solved elegantly and simply, without
overly sacrificing performance.
If these modules still don't live up to your performance requirements,
then it's fairly unlikely that any IPC model will be a good fit for your
problem: you may need to re-evaluate the architectural design of your
large in-memory hash.
I hope this helps.
-Eric
--- Alvar Freude <alvar@a-blast.org> wrote:
> Hi,
>
> so, the results of a first test with small hash lookups are:
> With forks::BerkeleyDB it is about 10 to 20 times slower than with an
> unshared hash and about 10 times faster than with the usual
> forks::shared.
>
> That's pretty fast, but for my problem it seems that the solution
> with an extra daemon is better.
>
>
> Ciao
> Alvar
>
>
> --
> ** Alvar C.H. Freude, http://alvar.a-blast.org/
> ** http://www.assoziations-blaster.de/
> ** http://www.wen-waehlen.de/
> ** http://odem.org/
>
|