For Programmers: Free Programming Magazines  


Home > Archive > IPC DirQueue > January 2006 > Re: [PATCH] speed up ordered dequeueing in IPC-DirQueue 0.05









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: [PATCH] speed up ordered dequeueing in IPC-DirQueue 0.05

2006-01-10, 12:22 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

(cc'd mailing list)

Anton Berezin writes:
> Justin,
>
> On Thu, Dec 29, 2005 at 02:18:06PM +0100, Anton Berezin wrote:
>
>
> On the further thought, even the existing ordered method does not
> provide us with an absolute guarantee that the things are going to be
> processed in order, since there is a race between readdir and newer
> enqueue operations. So it is possible to somewhat relax it even further
> and achieve a really good speed up for some, but not all, cases.
>
> The idea is to keep a last_enqueued file in the queue dir. We are only
> interested in its mtime. Then, on a dequeue operation, we cache the
> result of readdir and only re-read the file list if the cached file list
> is empty or if the mtime of last_enqueued file has changed.
>
> This obviously won't help much if there are a lot of concurrent enqueue
> operations, but for the case "first enqueue a bunch of jobs, then launch
> several processors, then repeat after some time" it helps absolutely
> tremendously.


Hi Anton --

many thanks for this work! The patches are welcome.

I've applied the ordered-dequeueing speedup patch from the previous mail
- -- with a few changes to match latest SVN (see below). It's in as r2353.
I never thought the interpreter-level code path would have such a large
effect on execution speed; I was assuming it'd all be system calls, which
*should* be a lot slower. Looks like I was wrong. ;)

However, regarding the patch from *this* mail, it's important to note that
the underlying code changed in SVN at the start of December; r2255 added a
"last-enqueue" flag file, initially at least. I then realised that the
queue directory *itself* can be used for this purpose, in a
backwards-compatible way, so now it just adds similar semantics to the
directory inode itself.

I found that this change was necessitated to support Reiserfs and
XFS-hosted queue directories, since those filesystems didn't modify the
mtime of a directory when changes were made within it, as ext3 did. (I
was assuming that this was a POSIX requirement; again, looks like I was
wrong. ;)

There were then several more changes since then to SVN trunk, made to fix
important bugs (including another race condition) and add new,
experimental functionality.

This patch looks very useful -- I'd be happy to add it, as a patch against
current SVN, if that'd be possible. Also, it may be worthwhile making it
the ordered default, assuming I can't think of a good reason why not ;)
Right now though, it doesn't apply cleanly to SVN.

by the way there is a mailing list; <ipc-dirqueue-subscribe@perl.org> to
subscribe. it might be worth sending future patches there so other users can
comment on them too.

- --j.

> I did not think it was a good idea to modify the default "ordered"
> behavior, so my patch makes "ordered" options to be a tri-value, with
> "2" denoting this "approximately ordered" way of doing things.
>
> The result of the same tests as yesterday, with ordered => 2 is
>
> tobez@heechee ~/_> time perl eq 10000
> perl eq 10000 1.73s user 1.90s system 44% cpu 8.180 total
> tobez@heechee ~/_> time perl deq 10000
> perl deq 10000 1.39s user 2.12s system 94% cpu 3.726 total
>
> This represents a 30-fold improvement for the dequeueing case in
> comparison to the yesterdays "optimized" version.
>
> Below is the patch which also incorporates my previous changes.
>
> --- DirQueue.pm.orig Thu Dec 29 14:14:10 2005
> +++ DirQueue.pm Fri Dec 30 12:53:41 2005
> @@ -101,10 +101,12 @@ The C<chmod>-style file mode for queue c
> specified as a string with a leading 0. It will be affected by the
> current process C<umask>.
>
> -=item ordered => { 0 | 1 } (default: 1)
> +=item ordered => { 0 | 1 | 2 } (default: 1)
>
> Whether the jobs should be processed in order of submission, or
> -in no particular order.
> +in no particular order. The special value of 2 denotes "approximately
> +ordered", which does not make as strong guarantees about processing
> +order as simply ordered queue, but is much faster in a number of cases.
>
> =item buf_size => $number (default: 65536)
>
> @@ -344,6 +346,7 @@ sub enqueue_backend {
> goto failure;
> }
>
> + $self->touch_last_enqueued() if $self->{ordered} == 2;
> # my $pathqueue_created = 1; # not required, we're done!
> return 1;
>
> @@ -360,6 +363,17 @@ failure:
> return;
> }
>
> +sub touch_last_enqueued
> +{
> + my ($self) = @_;
> + my $last_enqueued_file = $self->q_dir().SLASH."last_enqueued";
> + if (sysopen(LAST, $last_enqueued_file, O_WRONLY|O_CREAT, 0666)) {
> + my $s = "X";
> + syswrite(LAST, $s, 1);
> + close LAST;
> + }
> +}
> +
> ########################################
###################################
>
> =item $job = $dq->pickup_queued_job();
> @@ -384,12 +398,12 @@ sub pickup_queued_job {
> $self->ensure_dir_exists ($pathactivedir);
>
> my $ordered = $self->{ordered};
> - my @files;
> + my $files;
> my $dirfh;
>
> if ($ordered) {
> - @files = $self->get_dir_filelist_sorted($pathqueuedir);
> - if (scalar @files <= 0) {
> + $files = $self->get_dir_filelist_sorted($pathqueuedir);
> + if (@$files <= 0) {
> return if $self->queuedir_is_bad($pathqueuedir);
> }
> } else {
> @@ -406,7 +420,7 @@ sub pickup_queued_job {
> my $pathtmpactive;
>
> if ($ordered) {
> - $nextfile = shift @files;
> + $nextfile = shift @$files;
> } else {
> $nextfile = readdir($dirfh);
> }
> @@ -638,12 +652,12 @@ sub visit_all_jobs {
> my $pathactivedir = $self->q_subdir('active');
>
> my $ordered = $self->{ordered};
> - my @files;
> + my $files;
> my $dirfh;
>
> if ($ordered) {
> - @files = $self->get_dir_filelist_sorted($pathqueuedir);
> - if (scalar @files <= 0) {
> + $files = $self->get_dir_filelist_sorted($pathqueuedir);
> + if (@$files <= 0) {
> return if $self->queuedir_is_bad($pathqueuedir);
> }
> } else {
> @@ -658,7 +672,7 @@ sub visit_all_jobs {
> my $nextfile;
> while (1) {
> if ($ordered) {
> - $nextfile = shift @files;
> + $nextfile = shift @$files;
> } else {
> $nextfile = readdir($dirfh);
> }
> @@ -728,14 +742,30 @@ sub finish_job {
>
> sub get_dir_filelist_sorted {
> my ($self, $dir) = @_;
> + my $last_enqueued_file = $self->q_dir().SLASH."last_enqueued";
> +
> + if ($self->{ordered} == 2 && $self->{filelist_sorted}{$dir}) {
> + my $mtime = (stat($last_enqueued_file))[9];
> + return $self->{filelist_sorted}{$dir}
> + if defined $mtime && defined $self->{filelist_sorted_mtime}{$dir} &&
> + $mtime <= $self->{filelist_sorted_mtime}{$dir} &&
> + @{$self->{filelist_sorted}{$dir}};
> + }
>
> if (!opendir (DIR, $dir)) {
> return (); # no dir? nothing queued
> }
> # have to read the lot, to sort them.
> - my @files = sort { $a cmp $b } grep { /^\d/ } readdir(DIR);
> + my @files = sort grep { /^\d/ } readdir(DIR);
> closedir DIR;
> - return @files;
> +
> + if ($self->{ordered} == 2) {
> + $self->{filelist_sorted_mtime}{$dir} = (stat($last_enqueued_file))[9];
> + $self->{filelist_sorted}{$dir} = \@files;
> + return $self->{filelist_sorted}{$dir};
> + } else {
> + return \@files;
> + }
> }
>
> ########################################
###################################
> @@ -998,7 +1028,7 @@ sub new_q_filename {
> $job->{pri},
> $gmt[5]+1900, $gmt[4]+1, $gmt[3], $gmt[2], $gmt[1], $gmt[0],
> $job->{time_submitted_msecs},
> - hash_string_to_filename ($self->gethostname().$$));
> + hash_string_to_filename ($self->gethostname().$$.$self->get_random_int()));
>
> # normally, this isn't used. but if there's a collision,
> # all retries after that will do this; in this case, the
>
> \Anton.
> --
> An undefined problem has an infinite number of solutions.
> -- Robert A. Humphrey

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFDvdyhMJF5cimLx9ARAgM4AJ0bCfFNHsx9
SjdGYXLtzOjhPW9MeQCgguxc
Ng8JAoBN7/thHp5CBgpSIIU=
=l6su
-----END PGP SIGNATURE-----

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com