Argument list too long (keep reading, it's not the FAQ)
Hugues de Mazancourt

2005-07-25, 9:30 am

Hi,

I have a Perl daemon that keeps running, forking other (Perl) processes,
which in turn issue (among other things) system() calls.
After some time (...), the scripts report "Argument list too long" for every
system() call that is made, as if there were a global buffer that had run
out of space.
I don't have any wildcards in my system() calls, and no <*> either.

My workaround was to make the daemon loop a certain number of times and then
exit, with a shell script that relaunches the daemon every time it
finishes. This works, but it is not very satisfactory, and I keep
decreasing that "certain number" as new installations raise the problem
again and again.

Platform is Linux, FC-[1-3] or RH-9.

Any ideas?


Best,

Hugues

--
Hugues de Mazancourt
CTO
Lingway
33-35, rue Ledru-Rollin
94200 Ivry/Seine
FRANCE
http://www.lingway.com
Tel: +33-1 56 20 28 33 - Mob: +33-6 72 78 70 33


Anno Siegel

2005-07-25, 9:30 am

Hugues de Mazancourt <pasdepub.Hugues.de-Mazancourt@lingway.com> wrote in comp.lang.perl.misc:
> Hi,
>
> I have a Perl daemon that keeps running, forking other (Perl) processes,
> which in turn issue (among other things) system() calls.
> After some time (...), the scripts report "Argument list too long" for every
> system() call that is made, as if there were a global buffer that had run
> out of space.
> I don't have any wildcards in my system() calls, and no <*> either.
>
> My workaround was to make the daemon loop a certain number of times and then
> exit, with a shell script that relaunches the daemon every time it
> finishes. This works, but it is not very satisfactory, and I keep
> decreasing that "certain number" as new installations raise the problem
> again and again.


ARG_MAX is a fixed kernel parameter, and each system() call starts a new
process. There is nothing that could "wear out" this way, or you'd see
"Out of memory".

Are you sure your commands aren't collecting some garbage that adds to
their length without changing their function?

Show a minimal code example that exhibits the problem.
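
One thing worth ruling out at this point: the limit behind "Argument list too long" (E2BIG from execve()) covers the arguments and the environment together, so a short command can still fail if %ENV has grown in the long-running daemon. The following is only a rough sketch for measuring that; the helper names env_bytes and argv_bytes are made up for illustration, and the byte counts are approximations of what the kernel accounts for.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_ARG_MAX);

# Approximate bytes the environment occupies in an execve() call:
# each entry is passed as "NAME=value" plus a trailing NUL byte.
sub env_bytes {
    my ($env) = @_;
    my $total = 0;
    for my $name (keys %$env) {
        my $value = defined $env->{$name} ? $env->{$name} : '';
        $total += length($name) + length($value) + 2;   # '=' and NUL
    }
    return $total;
}

# Same accounting for an argument list: each argument plus its NUL.
sub argv_bytes {
    my @args  = @_;
    my $total = 0;
    $total += length($_) + 1 for @args;
    return $total;
}

my $limit = sysconf(_SC_ARG_MAX);   # may be undef if no fixed limit is reported
printf "ARG_MAX: %s\n", defined $limit ? "$limit bytes" : 'unknown';
printf "environment: %d entries, about %d bytes\n",
    scalar(keys %ENV), env_bytes(\%ENV);
```

Logging env_bytes(\%ENV) next to each "Starting WGET" message would show whether the environment grows between the first run and the first failure.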

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
Hugues de Mazancourt

2005-07-25, 5:33 pm

> Are you sure your commands aren't collecting some garbage that adds to
> their length without changing their function?


Since I log the commands (see below), I can check that no garbage
accumulates over time.
>
> Show a minimal code example that exhibits the problem.


Here is - roughly - how my code behaves. The "else" part is taken verbatim
from my source (with some repetitive code removed). I fork, then prepare
wget's arguments and start crawling. It runs every night, with the same data
in $source (which is loaded from a record in MySQL).

$pid=fork();
if($pid)
{
    # Parent: remember the child so it can be waited for later.
    push(@pids_to_wait4, $pid);
}
else
{
    # Child (also reached if fork() fails and returns undef):
    # assemble the wget command line from the source record.
    my $wget=$self->{wgetpath}?$self->{wgetpath}."/":"";
    $wget.="wget -r --timestamping";
    $wget.=" --level=".$source->intval('level') if($source->param('level'));
    $wget.=" --span-hosts" if($source->true('spanhosts'));
    $wget.=" --domains=".$source->param('domains')
        if($source->true('spanhosts') && $source->param('domains'));
    $wget.=" --quota=".$source->param('quota') if($source->param('quota'));
    $wget.=" --wait=1 --random-wait" if($source->true('randomwait'));
    $wget.=" --no-parent" if($source->true('noparent'));
    # ... several lines removed, all looking like
    $wget.=" --option=".$source->param('option') if($source->param('option'));
    my $wgetdir=$self->{srcdir};
    $wget.=" --directory-prefix=$wgetdir";
    my $wgetlog=$self->my_tmpfile($source->name(), '.wgetlog');
    $wget.=" -nv --html-extension --output-file=$wgetlog";
    $wget.=" --cookies=on --load-cookies=".$source->param('cookiefile')
        if($source->param('cookiefile'));
    $wget.=" \"".$source->param('url')."\"";
    $self->logger()->message("Crawler", "Starting WGET: $wget");
    system($wget); # And here, sometimes (...) I get "Argument list too long"
}
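
A side note on the fork handling in the fragment above: fork() returns undef on failure, which a plain if($pid)/else cannot tell apart from the child's 0, and nothing shown makes the child exit once system() returns. Since the code is only "roughly" what runs, this may already be handled elsewhere. A minimal sketch of a stricter pattern, assuming a hypothetical helper named spawn and using exec in the child instead of system():

```perl
use strict;
use warnings;
use POSIX ();

# Fork a child that runs @cmd directly (no shell) and return its pid.
sub spawn {
    my @cmd = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;   # undef: no child was created
    if ($pid == 0) {
        # Child: replace this process with the command.
        # exec only returns if it could not start the program.
        exec { $cmd[0] } @cmd;
        warn "exec $cmd[0] failed: $!";
        POSIX::_exit(127);   # never fall back into the daemon's own loop
    }
    return $pid;             # parent
}

# Demo: three children running perl itself, each with a different exit value.
my @pids = map { spawn($^X, '-e', "exit $_") } 0 .. 2;
for my $pid (@pids) {
    waitpid($pid, 0);
    printf "child %d exited with value %d\n", $pid, $? >> 8;
}
```

The parent reaps every child with waitpid, so each exit status is available and no zombies accumulate.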

Best,

Hugues


Hugues de Mazancourt

2005-07-26, 4:06 am


>
> You can check, but do you actually check?
>
>
> add
>
> $self->logger()->message("Crawler", "Length for WGET: ". length $wget);


As it is in a database, I can check it right now:

mysql> select max(length(message)), min(length(message)) from lkm_log where
message like "Starting WGET:%";
+----------------------+----------------------+
| max(length(message)) | min(length(message)) |
+----------------------+----------------------+
|                  327 |                  283 |
+----------------------+----------------------+

(variations come from the fact that there are different sources being
crawled)

>
> What do you mean you "get" that? You don't seem to be checking either
> the return value of system or $?. Where do you get it?


I did check $?, but only to issue an error message about a configuration
error (as if wget had not been found). I have disabled this, so the script
now fails silently until I find a reasonable solution. Overall, either wget
starts and I get all the information in its output file, or it doesn't and I
can suspect a configuration problem.
To answer your question, the error message is printed to STDERR, which I
collect in a log file.
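
For reference, the return value of system() and $? together distinguish "could not be started" from "started and failed", following the conventions documented for system() in perlfunc. A small sketch, where run_and_report is a hypothetical helper name:

```perl
use strict;
use warnings;

# Run a command given as a list and describe how it ended.
# With two or more list elements, system() does not involve a shell.
sub run_and_report {
    my @cmd = @_;
    my $rc  = system(@cmd);
    if ($rc == -1) {
        return "failed to execute: $!";          # program never started
    }
    elsif ($? & 127) {
        return sprintf "died with signal %d", $? & 127;
    }
    else {
        return sprintf "exited with value %d", $? >> 8;
    }
}

# Demo with perl itself as the command, so no external tool is assumed.
print run_and_report($^X, '-e', 'exit 3'), "\n";   # prints "exited with value 3"
```

When a single command string goes through /bin/sh instead, a failure to start wget is reported by the shell on STDERR and through its exit value, which matches the message turning up in the log file rather than in $!.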

Hugues


Anno Siegel

2005-07-26, 9:02 am

Hugues de Mazancourt <pasdepub.Hugues.de-Mazancourt@lingway.com> wrote in comp.lang.perl.misc:
>
> Since I log the commands (see below), I can check that no garbage
> accumulates over time.
>
> Here is - roughly - how my code behaves. The "else" part is taken verbatim
> from my source (with some repetitive code removed). I fork, then prepare
> wget's arguments and start crawling. It runs every night, with the same data
> in $source (which is loaded from a record in MySQL).


It would be interesting to see samples of the actual wget commands.
Since "system($wget)" hands the command to a shell for expansion, shell
metacharacters ("*", "{...}", and others) may make the actual command
longer than it appears to be.
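
The shell can be taken out of the picture altogether: when system() is given a list of two or more elements, Perl runs the program directly and nothing is expanded. A minimal sketch, assuming a plain hash stands in for the $source object (the hash, its keys, the example URL and the helper name build_wget_args are hypothetical; the wget options are the ones from the posted code):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the $source record.
my %source = (
    level    => 20,
    noparent => 1,
    url      => 'http://www.example.com/index.html',
);

# Build the command as a list: one element per argument, no quoting needed.
sub build_wget_args {
    my (%s) = @_;
    my @args = ('wget', '-r', '--timestamping');
    push @args, "--level=$s{level}" if $s{level};
    push @args, '--no-parent'       if $s{noparent};
    push @args, $s{url};
    return @args;
}

my @cmd = build_wget_args(%source);
print join(' ', @cmd), "\n";   # what would be logged

# The call itself is left commented out so the sketch runs without wget:
# system(@cmd) == 0 or warn "wget failed, \$? = $?";
```

Because each argument is its own list element, the URL needs no surrounding quotes and cannot be reinterpreted by a shell.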

Anno
Hugues de Mazancourt

2005-07-26, 9:02 am

> It would be interesting to see samples of the actual wget commands.
> Since "system($wget)" hands the command to a shell for expansion, shell
> metacharacters ("*", "{...}", and others) may make the actual command
> longer than it appears to be.


Here you are:
Starting WGET:
/usr/bin/wget -r --timestamping --level=20 --no-parent
  --reject=.gif,.jpg,.zip
  --directory-prefix=/usr/local/lingway/lkm/UI/Data/cg06/src -nv
  --html-extension
  --output-file=/usr/local/lingway/lkm/UI/Data/cg06/work/Agriculture30760k2zZHI.wgetlog
  "http://www.cg06.fr/agriculture/agriculture.html"

Starting WGET:
/usr/bin/wget -r --timestamping --level=50 --no-parent
  --reject=.gif,.jpg,.zip
  --directory-prefix=/usr/local/lingway/lkm/UI/Data/cg06/src -nv
  --html-extension
  --output-file=/usr/local/lingway/lkm/UI/Data/cg06/work/Economie316806iPtae.wgetlog
  "http://www.cg06.fr/economie/economie.html"

Starting WGET:
/usr/bin/wget -r --timestamping --level=50 --no-parent
  --reject=.gif,.jpg,.zip
  --directory-prefix=/usr/local/lingway/lkm/UI/Data/cg06/src -nv
  --html-extension
  --output-file=/usr/local/lingway/lkm/UI/Data/cg06/work/Environnement316986iPtae.wgetlog
  "http://www.cg06.fr/environnement/environnement.html"

Starting WGET:
/usr/bin/wget -r --timestamping --level=20 --no-parent
  --reject=.gif,.jpg,.zip
  --directory-prefix=/usr/local/lingway/lkm/UI/Data/cg06/src -nv
  --html-extension
  --output-file=/usr/local/lingway/lkm/UI/Data/cg06/work/Agriculture20358jof3Ug.wgetlog
  "http://www.cg06.fr/agriculture/agriculture.html"

Starting WGET:
/usr/bin/wget -r --timestamping --level=20 --no-parent
  --reject=.gif,.jpg,.zip
  --directory-prefix=/usr/local/lingway/lkm/UI/Data/cg06/src -nv
  --html-extension
  --output-file=/usr/local/lingway/lkm/UI/Data/cg06/work/Agriculture24042JlLJ2Y.wgetlog
  "http://www.cg06.fr/agriculture/agriculture.html"

The several lines with "Agriculture" show (to me, at least!) that the system
retried the download several times. The successful one (the last one) came
after restarting the script.

I'm quite sure that the problem has nothing to do with the command line
itself. What I suspect is the multiplication of fork/system calls in the
script, or interference with DBD::mysql (connections left open? I don't know).
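
If something inherited across fork/system is indeed what accumulates, the environment is one candidate that is cheap to test, because every child receives a copy of the daemon's %ENV. The thread does not establish that this is the cause, so the following is only a sketch of an experiment: run the command in a child with a small, explicit environment. The helper name run_with_clean_env, the CRAWLER_JUNK variable and the whitelist of variables are assumptions for illustration.

```perl
use strict;
use warnings;
use POSIX ();

# Run @cmd in a child process whose environment is exactly %$env,
# then return the child's wait status ($?).
sub run_with_clean_env {
    my ($env, @cmd) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        %ENV = %$env;                  # only this child is affected
        exec { $cmd[0] } @cmd;
        warn "exec $cmd[0] failed: $!";
        POSIX::_exit(127);
    }
    waitpid($pid, 0);
    return $?;
}

# Simulate a daemon whose environment has grown over time.
$ENV{CRAWLER_JUNK} = 'x' x 1000;

# Keep only a few variables (an assumed whitelist; adjust as needed).
my %clean = map { $_ => $ENV{$_} } grep { exists $ENV{$_} } qw(PATH HOME LANG);

# The child exits with 1 if it still sees the junk variable, 0 otherwise.
my $status = run_with_clean_env(\%clean, $^X, '-e',
    'exit(exists $ENV{CRAWLER_JUNK} ? 1 : 0)');
printf "child exit value: %d\n", $status >> 8;
```

If the failures stop when wget is started this way, the growth is in the environment; if they continue, that theory can be dropped as well.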

Hugues


Anno Siegel

2005-07-26, 9:02 am

Hugues de Mazancourt <pasdepub.Hugues.de-Mazancourt@lingway.com> wrote in comp.lang.perl.misc:
>
> Here you are:
> Starting WGET:
> /usr/bin/wget -r --timestamping --level=20 --no-parent
>   --reject=.gif,.jpg,.zip
>   --directory-prefix=/usr/local/lingway/lkm/UI/Data/cg06/src -nv
>   --html-extension
>   --output-file=/usr/local/lingway/lkm/UI/Data/cg06/work/Agriculture30760k2zZHI.wgetlog
>   "http://www.cg06.fr/agriculture/agriculture.html"
>
> Starting WGET:
> /usr/bin/wget -r --timestamping --level=50 --no-parent
>   --reject=.gif,.jpg,.zip
>   --directory-prefix=/usr/local/lingway/lkm/UI/Data/cg06/src -nv
>   --html-extension
>   --output-file=/usr/local/lingway/lkm/UI/Data/cg06/work/Economie316806iPtae.wgetlog
>   "http://www.cg06.fr/economie/economie.html"


[more like that]

No metacharacters (except ") that I could see -- so much for that
theory.

None of these commands should raise "Argument list too long". Standard
limits allow several kB at least, usually much more. It looks more
like an OS problem than a Perl problem to me.

I don't see how we can help more without being able to reproduce the
problem. Pare it down until it's a self-contained script (one that doesn't
depend on /usr/local being configured in a particular way). If that
process doesn't lead to a solution (it often does), post the script and
we shall see.

Anno