| Author |
Optimize my script
|
|
| JeeBee 2007-07-18, 7:59 am |
| Dear Perl experts,
Some time ago I wrote a Perl script that combines the output values of a
lot of simulation runs. Because there are so many, I could save quite some
time if the script ran a bit faster. (I have the feeling it is
quite slow as it is.)
Perhaps somebody has a tip to make it a bit faster without complicated
changes? Thanks in advance.
Here's the script:
http://traplas.svn.sourceforge.net/viewvc/traplas/trunk/analyze.pl?revision=807&view=markup
Here's an example output file of my simulator (I have thousands of
these): http://pastebin.ca/624474
And here's how I run it:
$ time ./analyze.pl traplas-example output
....<snip>
Directory /tmp: 1 good and 0 bad simulation runs. at ./analyze.pl line 546.
real 0m1.106s
user 0m1.012s
sys 0m0.011s
For such a small file (86 lines), over 1 second is a bit slow, isn't it?
Thanks again,
JeeBee.
| |
| JeeBee 2007-07-18, 7:59 am |
| Because I'm trying to match this regular expression a lot of times, that
could very well be the point to improve. I can imagine this is the
slowest part of the script:
my $exp = "nan|-?inf|[0-9\.e\+\-]+";
my $stat_pat =
"(?:\\s+($exp))(?:\\s+($exp))" . # $1 number of samples, $2 minimum
"(?:\\s+($exp))(?:\\s+($exp))" . # $3 maximum, $4 sum
"(?:\\s+($exp))(?:\\s+($exp))" . # $5 average, $6 variance
"(?:\\s+($exp))(?:\\s+($exp))"; # $7 skewness, $8 kurtosis
Which I'm using like this many times:
foreach my $stat_key (@stat_var_keys) {
if($line =~ m/^\s*$stat_key$stat_pat/) {
# $6 == "variance"
$map{"var_$stat_key"} = $6;
}
}
Can this regular expression be changed to do the same and be a lot faster?
Thanks in advance,
JeeBee.
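
One low-effort thing to try with a loop like the one above: because $stat_key changes on every iteration, the interpolated pattern may be recompiled each time it is used. A minimal sketch, assuming the key names and the sample line below (both made up for illustration, not taken from analyze.pl), compiles one qr// object per key up front:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same number pattern as in the post, compiled once with qr//.
my $exp      = qr/nan|-?inf|[0-9.e+\-]+/;
my $stat_pat = join '', ('\s+(' . $exp . ')') x 8;    # eight captured fields

# Hypothetical key names; the real script has its own lists.
my @stat_var_keys = qw(travel_time waiting_time);

# Compile one regex per key before the per-line loop starts.
my %var_re = map { $_ => qr/^\s*\Q$_\E$stat_pat/ } @stat_var_keys;

my %map;
my $line = "waiting_time 10 0.5 9.5 42 4.2 1.7 0.1 -0.3";    # made-up input
for my $stat_key (@stat_var_keys) {
    if ($line =~ $var_re{$stat_key}) {
        $map{"var_$stat_key"} = $6;    # $6 is the variance column
    }
}
print "$_ = $map{$_}\n" for sort keys %map;
```

This keeps the structure of the script unchanged; whether it helps noticeably depends on how many keys and lines there are, so it is worth measuring.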
| |
| Paul Johnson 2007-07-18, 7:59 am |
| On Wed, Jul 18, 2007 at 09:49:24AM +0000, JeeBee wrote:
> Because I'm trying to match this regular expression a lot of times, that
> could very well be the point to improve. I can imagine this is the
> slowest part of the script:
Don't imagine. Profile. Then you'll know which parts of your program
are taking the time and you won't spend your time optimising something
that won't make much difference.
--
Paul Johnson - paul@pjcj.net
http://www.pjcj.net
| |
| JeeBee 2007-07-18, 7:59 am |
| Thanks Paul, I should profile of course, should have thought about that.
I tried the below, but it's not quite clear to me what this means.
When I run with "./analyze.pl foo bar" the execution time is fast, so the
problem is not starting up/exiting of Perl, and I conclude it must be
inside the script somewhere.
DirHandle::BEGIN takes only 0.01 seconds, so that's not it either.
I don't see the regular expressions anywhere, unfortunately.
Is there a way to change that?
Thanks
$ perl -d:DProf ./analyze.pl traplas-example output
....
$ dprofpp
Total Elapsed Time = 1.109888 Seconds
User+System Time = 1.039888 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
0.96 0.010 0.010 2 0.0050 0.0050 DirHandle::BEGIN
0.00 - -0.000 1 - - DirHandle::DESTROY
0.00 - -0.000 1 - - strict::bits
0.00 - -0.000 1 - - strict::import
0.00 - -0.000 1 - - Symbol::BEGIN
0.00 - -0.000 1 - - Symbol::gensym
0.00 - -0.000 1 - - DirHandle::read
0.00 - -0.000 1 - - DirHandle::open
0.00 - -0.000 2 - - Exporter::import
0.00 - -0.000 1 - - DirHandle::new
0.00 - 0.010 2 - 0.0050 main::BEGIN
| |
| Paul Johnson 2007-07-18, 6:59 pm |
| On Wed, Jul 18, 2007 at 03:29:51PM +0200, Paul Johnson wrote:
> - post your code so that other people can guess too
Sorry, I notice that you have already done this. When I first tried to
follow the link I had no success, but it works fine for me now.
I see I was right about no subroutines, so the rest of my advice is
probably just as valid as when I hadn't seen the code. (You get to
decide how valid that is.)
--
Paul Johnson - paul@pjcj.net
http://www.pjcj.net
| |
| Paul Johnson 2007-07-18, 6:59 pm |
| On Wed, Jul 18, 2007 at 10:17:00AM +0000, JeeBee wrote:
> Thanks Paul, I should profile of course, should have thought about that.
>
> I tried the below, but it's not quite clear to me what this means.
> When I run with "./analyze.pl foo bar" the execution time is fast, so the
> problem is not starting up/exiting of Perl, and I conclude it must be
> inside the script somewhere.
>
> DirHandle::BEGIN takes only 0.01 seconds, so that's not it either.
> I don't see the regular expressions anywhere, unfortunately.
> Is there a way to change that?
>
> Thanks
>
> $ perl -d:DProf ./analyze.pl traplas-example output
> ...
> $ dprofpp
> Total Elapsed Time = 1.109888 Seconds
> User+System Time = 1.039888 Seconds
> Exclusive Times
> %Time ExclSec CumulS #Calls sec/call Csec/c Name
> 0.96 0.010 0.010 2 0.0050 0.0050 DirHandle::BEGIN
> 0.00 - -0.000 1 - - DirHandle::DESTROY
> 0.00 - -0.000 1 - - strict::bits
> 0.00 - -0.000 1 - - strict::import
> 0.00 - -0.000 1 - - Symbol::BEGIN
> 0.00 - -0.000 1 - - Symbol::gensym
> 0.00 - -0.000 1 - - DirHandle::read
> 0.00 - -0.000 1 - - DirHandle::open
> 0.00 - -0.000 2 - - Exporter::import
> 0.00 - -0.000 1 - - DirHandle::new
> 0.00 - 0.010 2 - 0.0050 main::BEGIN
I'm not very familiar with the profiler, but at a guess, I would say
that the profiler only counts time in subroutines, and that the majority
(all?) of your program is not in a subroutine, and over 99% of the
execution time of your program is in that code.
I see the following options:

- split your code into subroutines to gain a better understanding of
  what is taking how much time

- install Devel::Cover and run
    $ perl -MDevel::Cover ./analyze.pl traplas-example output
    $ cover
  to gain a finer grained but less accurate profiling report

- go back to guessing - you were probably right ;-)

- post your code so that other people can guess too

- buy more hardware
--
Paul Johnson - paul@pjcj.net
http://www.pjcj.net
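
If splitting the script into subroutines is too much work at first, the phases can also be timed by hand. The following is only a sketch of that idea using the core Time::HiRes module; the timed() helper, the labels, and the two stand-in phases are hypothetical and not part of analyze.pl:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my %elapsed;    # label => accumulated wall-clock seconds

# Run a code block and add the time it took to %elapsed under $label.
sub timed {
    my ($label, $code) = @_;
    my $t0     = [gettimeofday];
    my @result = $code->();
    $elapsed{$label} += tv_interval($t0);
    return @result;
}

# Hypothetical stand-ins for two phases of a script like analyze.pl.
my @lines = timed(read_input  => sub { map  { "key$_ 1 2 3" } 1 .. 1000 });
my @hits  = timed(match_lines => sub { grep { /^key\d+\s+1\b/ } @lines });

printf "%-12s %.6f s\n", $_, $elapsed{$_} for sort keys %elapsed;
```

Wrapping each suspect section this way gives a rough per-phase breakdown even when the profiler shows nothing useful.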
| |
| yaron@kahanovitch.com 2007-07-18, 6:59 pm |
| Hi,
Perhaps you can use the /\G.../gc idiom and develop a lex-like engine.
See: http://search.cpan.org/~nwclark/per...pod#What_good_is_\G_in_a_regular_expression?_
and:
http://search.cpan.org/~nwclark/per...exp_Quote-Like_Operators
In my experience, breaking a complicated regular expression into several small
ones with the /\G.../gc idiom improves performance.
Yaron Kahanovitch
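
A rough sketch of what the /\G.../gc idiom could look like for one line of statistics output. The sample line is made up, and this is only one way to apply the idiom, not code from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "  waiting_time 10 0.5 nan -inf";    # made-up input line
my ($key, @values);

for ($line) {
    # \G anchors each match where the previous one stopped; /c keeps
    # that position even when a match fails.
    /\G\s*([A-Za-z_]\w*)/gc and $key = $1;         # leading identifier
    while (/\G\s+(nan|-?inf|[0-9.e+\-]+)/gc) {     # then the numeric fields
        push @values, $1;
    }
    /\G\s*\z/gc or warn "unparsed text at offset ", pos(), "\n";
}
print "$key: @values\n";
```

Each small pattern only looks at the text directly after the previous token, so the line is scanned once from left to right.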
----- Original Message -----
From: "JeeBee" <JeeBee@zutt.org>
To: beginners@perl.org
Sent: 11:49:24 (GMT+0200) Africa/Harare, Wednesday, 18 July 2007
Subject: Re: Optimize my script
Because I'm trying to match this regular expression a lot of times, that
could very well be the point to improve. I can imagine this is the
slowest part of the script:
my $exp = "nan|-?inf|[0-9\.e\+\-]+";
my $stat_pat =
"(?:\\s+($exp))(?:\\s+($exp))" . # $1 number of samples, $2 minimum
"(?:\\s+($exp))(?:\\s+($exp))" . # $3 maximum, $4 sum
"(?:\\s+($exp))(?:\\s+($exp))" . # $5 average, $6 variance
"(?:\\s+($exp))(?:\\s+($exp))"; # $7 skewness, $8 kurtosis
Which I'm using like this many times:
foreach my $stat_key (@stat_var_keys) {
if($line =~ m/^\s*$stat_key$stat_pat/) {
	# $6 == "variance"
	$map{"var_$stat_key"} = $6;
}
}
Can this regular expression be changed to do the same and be a lot faster?
Thanks in advance,
JeeBee.
| |
| Chas Owens 2007-07-18, 6:59 pm |
| On 7/18/07, Paul Johnson <paul@pjcj.net> wrote:
snip
> I see the following options:
>
> - split your code into subroutines to gain a better understanding of
> what is taking how much time
>
> - install Devel::Cover and run
> $ perl -MDevel::Cover ./analyze.pl traplas-example output
> $ cover
> to gain a finer grained but less accurate profiling report
>
> - go back to guessing - you were probably right ;-)
>
> - post your code so that other people can guess too
>
> - buy more hardware
snip
I second most of these recommendations, but looking at your code I
spot a pattern of inefficient code: you keep looping over the same
string.
foreach my $stat_key (@stat_avg_keys) {
    if($line =~ m/^\s*$stat_key$stat_pat/) {
snip
foreach my $stat_key (@stat_var_keys) {
    if($line =~ m/^\s*$stat_key$stat_pat/) {
snip
foreach my $stat_key (@stat_sum_keys) {
    if($line =~ m/^\s*$stat_key$stat_pat/) {
snip
foreach my $stat_key (@stat_min_keys) {
    if($line =~ m/^\s*$stat_key$stat_pat/) {
snip
foreach my $stat_key (@stat_max_keys) {
    if($line =~ m/^\s*$stat_key$stat_pat/) {
snip
foreach my $stat_key (@stat_cnt_keys) {
    if($line =~ m/^\s*$stat_key$stat_pat/) {
You can fix this by changing the arrays above to hashes whose keys are
the values in the array and the values are all 1. This allows the
hash to function as a quick lookup. Then change the code to read the
line into a hash of keys and values and loop over the keys checking to
see if the key is a certain type. Here is some example code:
#!/usr/bin/perl
use strict;
use warnings;

my %cnt_keys = (
    foo => 1
);
my %min_keys = (
    foo => 1,
    baz => 1
);

$_ = "foo 100 1e99 nan nan bar 1e99 -inf inf baz 1 2 3 4";

my %rec;
my $identifier = qr/[A-Za-z_]\w*/;
my $expression = qr/nan|-?inf|[0-9\.e\+\-]+/;
while (/($identifier) ((?: \s+ $expression)+) \s*/gx) {
    $rec{$1} = [ split ' ', $2 ];
}

for my $key (keys %rec) {
    print "$key is\n";
    if ($cnt_keys{$key}) {
        print "\ta cnt_key with value $rec{$key}[0]\n";
    }
    if ($min_keys{$key}) {
        print "\ta min_key with a value of $rec{$key}[1]\n";
    }
}
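
To check whether the hash lookup pays off for a given set of keys, the two styles can be compared with the core Benchmark module. This is a sketch under made-up assumptions (50 hypothetical key names and a single sample line); it also checks that both styles extract the same variance value:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $exp      = qr/nan|-?inf|[0-9.e+\-]+/;
my $stat_pat = join '', ('\s+(' . $exp . ')') x 8;    # eight captured fields

# Hypothetical key names and input line, made up for this comparison.
my @var_keys = map { "stat$_" } 1 .. 50;
my %is_var   = map { $_ => 1 } @var_keys;
my $line     = "stat42 10 0.5 9.5 42 4.2 1.7 0.1 -0.3";

# Original style: try every key against the line.
sub by_loop {
    my %map;
    for my $k (@var_keys) {
        $map{"var_$k"} = $6 if $line =~ m/^\s*$k$stat_pat/;
    }
    return \%map;
}

# Hash style: parse the line once, then look the key up.
# The key is $1 here, so the variance column moves to $7.
sub by_hash {
    my %map;
    if ($line =~ m/^\s*(\w+)$stat_pat/ && $is_var{$1}) {
        $map{"var_$1"} = $7;
    }
    return \%map;
}

my ($loop, $hash) = (by_loop(), by_hash());
print "loop: $loop->{var_stat42}  hash: $hash->{var_stat42}\n";

# A negative count means: run each sub for at least 1 CPU second.
cmpthese(-1, { loop => \&by_loop, hash => \&by_hash });
```

The actual ratio depends on the number of keys and on the Perl version, so the table printed by cmpthese is the thing to look at rather than any fixed figure.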
| |
| kapil.V 2007-07-18, 6:59 pm |
| JeeBee wrote:
> Because I'm trying to match this regular expression a lot of times, that
> could very well be the point to improve. I can imagine this is the
> slowest part of the script:
>
> my $exp = "nan|-?inf|[0-9\.e\+\-]+";
> my $stat_pat =
> "(?:\\s+($exp))(?:\\s+($exp))" . # $1 number of samples, $2 minimum
> "(?:\\s+($exp))(?:\\s+($exp))" . # $3 maximum, $4 sum
> "(?:\\s+($exp))(?:\\s+($exp))" . # $5 average, $6 variance
> "(?:\\s+($exp))(?:\\s+($exp))"; # $7 skewness, $8 kurtosis
Why do you use ?: for all fields?
Are all fields optional?
If so, what would be present in the file if a field is blank?
>
> Which I'm using like this many times:
>
> foreach my $stat_key (@stat_var_keys) {
> if($line =~ m/^\s*$stat_key$stat_pat/) {
> # $6 == "variance"
> $map{"var_$stat_key"} = $6;
> }
> }
>
> Can this regular expression be changed to do the same and be a lot faster?
>
> Thanks in advance,
> JeeBee.
>
| |
| JeeBee 2007-07-19, 7:59 am |
|
> Why do you use ?: for all fields?
> Are all fields optional?
> If so, what would be present in the file if a field is blank?
>
Hi Kapil,
No, it doesn't mean optional. (?: ... ) groups an expression, just
like ( ... ) does, but it does not create a capture (backreference), which
makes it faster.
JeeBee.
ps > thanks for the answers you've all been providing :)
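
The difference between the two kinds of group is easy to see by counting what a match returns. A small illustration with a made-up input string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "count 10 20";    # made-up input

# Capturing groups: every ( ... ) fills one of $1, $2, ...
my @capturing = $line =~ /^(\w+)(\s+(\d+))(\s+(\d+))/;

# Non-capturing groups: (?: ... ) only groups, so just the inner ( ... ) capture.
my @grouping  = $line =~ /^(\w+)(?:\s+(\d+))(?:\s+(\d+))/;

print scalar(@capturing), " captures: ", join('|', @capturing), "\n";
print scalar(@grouping),  " captures: ", join('|', @grouping),  "\n";
```

With (?: ... ) the numbered variables hold only the fields of interest, and the regex engine has fewer captures to record.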
| |
| JeeBee 2007-07-19, 7:59 am |
|
Thanks very much. It runs a LOT faster now.
Like 20 times or maybe even a bit more.
I'm now trying to verify that the result is still the same,
but I believe so.
So, thanks very much :)
JeeBee.