increase performance
| Rodrick Brown 2005-06-09, 3:57 am |
| Hello all, I have a script that processes the following data. How could I
possibly speed it up?
0107
0205
0304
0405
0105
0805
The script just converts the input to
jan07
feb05
mar04
apr05
etc...
Here is a sample of how I'm doing this
#!/usr/bin/perl
use warnings;
my %months = ( 1=>"jan", 2=>"feb", 3=>"mar", 4=>"apr", 5=>"may", 6=>"june",
               7=>"jul", 8=>"aug", 9=>"sep", 10=>"oct", 11=>"nov",
               12=>"dec" );
my $date = "./date.txt";
open LOG, $date or die("unable to open file: $!\n");
while (<LOG>)
{
    foreach my $m (keys(%months))
    {
        if ( $m eq substr($_,1,1) )
        {
            my $days = substr($_,2,2);
            print "$months{$m}$days\n";
        }
    }
}
--
Rodrick R. Brown
rodrick.brown[@]gmail.com
| |
| Mike Heins 2005-06-09, 3:57 am |
| On 2005-06-09, Rodrick Brown <rodrick.brown@gmail.com> wrote:
> Hello all I have a script that processes the following data I could possibly
> speed it up
>
> 0107
> 0205
> 0304
> 0405
> 0105
> 0805
>
> the script just converts the output to
>
> Jan07
> feb05
> mar03
> apr05
>
> etc...
>
> Here is a sample of how i'm doing this
>
>
> #!/usr/bin/perl
>
> use warnings;
>
> my %months = ( 1=>"jan", 2=>"feb", 3=>"mar", 4=>"apr", 5=>"may", 6=>"june",
> 7=>"jul", 8=>"aug", 9=>"sep", 10=>"oct", 11=>"nov",
> 12=>"dec" );
>
> my $date = "./date.txt";
>
> open LOG, $date or die("unable to open file: $!\n");
>
> while(<LOG> )
> {
> foreach my $m (keys(%months))
> {
> if( $m eq substr($_,1,1))
> {
> my $days = substr($_,2,2);
> print "$months{$m}$days\n";
> }
> }
> }
There's no point in using a hash for this type of thing if you
don't do a hash key lookup.
my %months = qw(
    01 jan
    02 feb
    03 mar
    04 apr
    05 may
    06 june
    07 jul
    08 aug
    09 sep
    10 oct
    11 nov
    12 dec
);
while (<LOG>) {
    chomp;
    m{(\d\d)(\d\d)} and $months{$1}
        or do {
            warn "bad log line: $_\n";
            next;
        };
    print "$months{$1}$2\n";
}
--
Mike Heins
Perusion -- Expert Interchange Consulting http://www.perusion.com/
Be patient. God isn't finished with me yet. -- unknown
| |
| John W. Krahn 2005-06-09, 3:57 am |
| Rodrick Brown wrote:
> Hello all I have a script that processes the following data I could possibly
> speed it up
>
> 0107
> 0205
> 0304
> 0405
> 0105
> 0805
>
> the script just converts the output to
>
> Jan07
> feb05
> mar03
> apr05
>
> etc...
>
> Here is a sample of how i'm doing this
>
>
> #!/usr/bin/perl
>
> use warnings;
>
> my %months = ( 1=>"jan", 2=>"feb", 3=>"mar", 4=>"apr", 5=>"may", 6=>"june",
> 7=>"jul", 8=>"aug", 9=>"sep", 10=>"oct", 11=>"nov",
> 12=>"dec" );
>
> my $date = "./date.txt";
>
> open LOG, $date or die("unable to open file: $!\n");
>
> while(<LOG> )
> {
> foreach my $m (keys(%months))
> {
> if( $m eq substr($_,1,1))
> {
> my $days = substr($_,2,2);
> print "$months{$m}$days\n";
> }
> }
> }
>
my %months = qw( 01 jan 02 feb 03 mar 04 apr 05 may 06 june
                 07 jul 08 aug 09 sep 10 oct 11 nov 12 dec );
my $date = './date.txt';
open LOG, $date or die "unable to open file: $!\n";
while ( <LOG> ) {
    my $mon = substr $_, 0, 2;
    substr $_, 0, 2, $months{ $mon } || $mon;
}
__END__
John
--
use Perl;
program
fulfillment
| |
| John W. Krahn 2005-06-09, 3:57 am |
| Rodrick Brown wrote:
> Hello all I have a script that processes the following data I could possibly
> speed it up
>
> 0107
> 0205
> 0304
> 0405
> 0105
> 0805
>
> the script just converts the output to
>
> Jan07
> feb05
> mar03
> apr05
>
> etc...
>
> Here is a sample of how i'm doing this
>
>
> #!/usr/bin/perl
>
> use warnings;
>
> my %months = ( 1=>"jan", 2=>"feb", 3=>"mar", 4=>"apr", 5=>"may", 6=>"june",
> 7=>"jul", 8=>"aug", 9=>"sep", 10=>"oct", 11=>"nov",
> 12=>"dec" );
>
> my $date = "./date.txt";
>
> open LOG, $date or die("unable to open file: $!\n");
>
> while(<LOG> )
> {
> foreach my $m (keys(%months))
> {
> if( $m eq substr($_,1,1))
> {
> my $days = substr($_,2,2);
> print "$months{$m}$days\n";
> }
> }
> }
>
my %months = qw( 01 jan 02 feb 03 mar 04 apr 05 may 06 june
                 07 jul 08 aug 09 sep 10 oct 11 nov 12 dec );
my $date = './date.txt';
open LOG, $date or die "unable to open file: $!\n";
while ( <LOG> ) {
    my $mon = substr $_, 0, 2;
    substr $_, 0, 2, $months{ $mon } || $mon;
    print;
}
__END__
John
--
use Perl;
program
fulfillment
| |
| A. Sinan Unur 2005-06-09, 3:57 am |
| Mike Heins <mikeh@perusion.net> wrote in
news:slrndafcq9.n99.mikeh@bill.heins.net:
> On 2005-06-09, Rodrick Brown <rodrick.brown@gmail.com> wrote:
....
> There's no point in using a hash for this type of thing if you
> don't do a hash key lookup.
Agreed.
However, the easiest way to speed this task up by an order of magnitude
is to avoid printing. As (I think) Anno says: Print rarely, print late.
But to decide how rarely, and how late, one would have to know more.
As a simple experiment, take the following script:
#! /usr/bin/perl
use strict;
use warnings;
my @months = qw(invalid
    jan feb mar apr may jun
    jul aug sep oct nov dec
);
while (<DATA>) {
    next unless /^(\d\d)(\d\d)$/;
    print "$months[0 + $1]$2\n";
}
__END__
In the version I will use to illustrate, I have 10,000 lines of data
following __END__.
I am on Windows XP Pro, perl v.5.8.6.811 (ActiveState), Acer AMD64
Laptop with 1 GB RAM:
TimeThis : Command Line : perl ttt.pl
TimeThis : Start Time : Wed Jun 08 23:46:14 2005
TimeThis : End Time : Wed Jun 08 23:46:16 2005
TimeThis : Elapsed Time : 00:00:01.578
Now, replace the script with the following:
#! /usr/bin/perl
use strict;
use warnings;
my @months = qw(invalid
    jan feb mar apr may jun
    jul aug sep oct nov dec
);
my $result;
while (<DATA>) {
    next unless /^(\d\d)(\d\d)$/;
    $result .= "$months[0 + $1]$2\n";
}
__END__
On the exact same data set, we get:
TimeThis : Command Line : perl ttt.pl
TimeThis : Start Time : Thu Jun 09 00:02:31 2005
TimeThis : End Time : Thu Jun 09 00:02:31 2005
TimeThis : Elapsed Time : 00:00:00.187
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)
comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/c...guidelines.html
| |
| David K. Wall 2005-06-09, 3:58 pm |
| A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
> However, the easiest way to speed this task up by an order of
> magnitude is to avoid printing. As (I think) Anno says: Print
> rarely, print late.
Not that it matters, but I think it's Uri Guttman who says that.
Anyone know what's up with Uri? I haven't seen any posts from him here
for at least a month or two.
| |
| A. Sinan Unur 2005-06-09, 3:58 pm |
| "David K. Wall" <darkon.tdo@gmail.com> wrote in
news:Xns967065E8CD199dkwwashere@216.168.3.30:
> A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
>
>
> Not that it matters, but I think it's Uri Guttman who says that.
You are right. I should have just searched Google for the phrase. Thank
you for the correction.
> Anyone know what's up with Uri? I haven't seen any posts from him
> here for at least a month or two.
I had hoped he was enjoying a well deserved vacation, but he seems to be
active elsewhere.
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)
comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/c...guidelines.html
| |
| Joe Smith 2005-06-09, 8:57 pm |
| John W. Krahn wrote:
> my $date = './date.txt';
Why did you write that instead of
my $date = 'date.txt';
?
-Joe
| |
| John W. Krahn 2005-06-09, 8:57 pm |
| Joe Smith wrote:
> John W. Krahn wrote:
>
>
> Why did you write that instead of
> my $date = 'date.txt';
> ?
That is the way the OP wrote it and they both refer to the same file so I
didn't change it.
John
--
use Perl;
program
fulfillment
| |
| Abigail 2005-06-10, 8:57 pm |
| A. Sinan Unur (1usa@llenroc.ude.invalid) wrote on MMMMCCC September
MCMXCIII in <URL:news:Xns967090CEDDDasu1cornelledu@127.0.0.1>:
@@
@@ However, the easiest way to speed this task up by an order of magnitude
@@ is to avoid printing. As (I think) Anno says: Print rarely, print late.
@@
@@ But to decide how rarely, and how late, one would have to know more.
@@
@@ As a simple experiment, take the following script:
@@
@@ #! /usr/bin/perl
@@ use strict;
@@ use warnings;
@@
@@ my @months = qw(invalid
@@ jan feb mar apr may jun
@@ jul aug sep oct nov dec
@@ );
@@
@@ while(<DATA> ) {
@@ next unless /^(\d\d)(\d\d)$/;
@@ print "$months[0 + $1]$2\n";
@@ }
@@
@@ __END__
@@
@@ In the version I will use illustrate, I have 10,000 lines of data
@@ following __END__.
@@
@@ I am on Windows XP Pro, perl v.5.8.6.811 (ActiveState), Acer AMD64
@@ Laptop with 1 GB RAM:
@@
@@ TimeThis : Command Line : perl ttt.pl
@@ TimeThis : Start Time : Wed Jun 08 23:46:14 2005
@@ TimeThis : End Time : Wed Jun 08 23:46:16 2005
@@ TimeThis : Elapsed Time : 00:00:01.578
@@
@@ Now, replace the script with the following:
@@
@@ #! /usr/bin/perl
@@ use strict;
@@ use warnings;
@@
@@ my @months = qw(invalid
@@ jan feb mar apr may jun
@@ jul aug sep oct nov dec
@@ );
@@
@@ my $result;
@@
@@ while(<DATA> ) {
@@ next unless /^(\d\d)(\d\d)$/;
@@ $result .= "$months[0 + $1]$2\n";
@@ }
@@
@@ __END__
@@
@@ On the exact same data set, we get:
@@
@@ TimeThis : Command Line : perl ttt.pl
@@ TimeThis : Start Time : Thu Jun 09 00:02:31 2005
@@ TimeThis : End Time : Thu Jun 09 00:02:31 2005
@@ TimeThis : Elapsed Time : 00:00:00.187
I ran the same programs on my machine (Linux), and I got the
following results:
For the first program:
$ time perl prog1
...
real 0m0.589s
user 0m0.110s
sys 0m0.040s
For the second program:
$ time perl prog2
real 0m0.085s
user 0m0.060s
sys 0m0.010s
So, similar figures. But why is the second program faster? Well, the
second program _does not print anything at all_. So, they are not
equivalent. So, I made a third program, the same as the second, but with
a 'print $result' added before __END__. Running that gives as result:
$ time perl prog3
real 0m0.378s
user 0m0.080s
sys 0m0.010s
Still faster, but not by much. But why is it faster? Is it just because
of the single print? It's not that simple. A print is actually pretty
fast - except when it's actually flushing the data. And that's what is
happening if you are printing a line to the terminal. It all changes if
you print to a file, then prints are buffered, and only larger blocks
are committed:
$ time perl prog1 > out1
real 0m0.082s
user 0m0.060s
sys 0m0.010s
$ time perl prog3 > out3
real 0m0.085s
user 0m0.070s
sys 0m0.000s
Not much difference anymore! There is a difference when the amount
of data gets larger. With 10_000_000 entries instead of 10_000:
$ time perl prog1 > out1
real 1m31.096s
user 0m57.000s
sys 0m8.400s
$ time perl prog3 > out3
real 2m52.186s
user 1m2.530s
sys 0m18.100s
The large string that gets constructed is the killer.
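One possible middle ground between printing every line and building a single huge string is to accumulate output in a buffer and write it out whenever it passes a size limit. What follows is only a minimal sketch of that idea, not something measured in this thread: the helper name convert_chunked, the $limit parameter, and the 64 KB figure are all assumptions made for illustration. Since perl already block-buffers output that goes to a file, it may well gain nothing over the plain per-line print.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @months = qw(invalid
    jan feb mar apr may jun
    jul aug sep oct nov dec
);

# Hypothetical helper (not from the thread): convert lines read from $in
# and write them to $out in chunks of at least $limit bytes, so the
# pending output never grows much beyond $limit.
sub convert_chunked {
    my ($in, $out, $limit) = @_;
    my $buffer = '';
    while (my $line = <$in>) {
        next unless $line =~ /^(\d\d)(\d\d)$/;
        $buffer .= "$months[0 + $1]$2\n";
        if (length($buffer) >= $limit) {
            print {$out} $buffer;
            $buffer = '';
        }
    }
    print {$out} $buffer if length $buffer;    # write whatever is left
    return;
}

# Demonstration on an in-memory filehandle; the 64 KB limit is an
# arbitrary, untuned choice.
my $input = "0107\n0205\nbogus\n0805\n";
open my $in, '<', \$input or die "cannot open in-memory input: $!\n";
convert_chunked($in, \*STDOUT, 64 * 1024);
close $in;
```

The regex and the array lookup are the same as in the scripts quoted above; only the output strategy differs.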
Abigail
--
A perl rose: perl -e '@}-`-,-`-%-'
| |
| Ilmari Karonen 2005-06-11, 8:56 am |
| Rodrick Brown <rodrick.brown@gmail.com> kirjoitti 09.06.2005:
> Hello all I have a script that processes the following data I could possibly
> speed it up
>
> 0107
> 0205
> 0304
> 0405
> 0105
> 0805
>
> the script just converts the output to
>
> Jan07
> feb05
> mar03
> apr05
I'd do that with a one-liner:
perl -pe 's/\d\d/(qw(jan feb mar apr may jun jul aug sep oct nov dec))[$&-1]/e' date.txt
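Spelled out as a script, the one-liner above amounts to roughly the following. This is a sketch rather than an exact equivalent: the helper name convert_line and the sample lines are made up for illustration, and the range check is an addition. The one-liner itself would turn month 00 into "dec" (index -1) and month 13 into an empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month abbreviations, indexed 0 (jan) through 11 (dec).
my @MONTHS = qw(jan feb mar apr may jun jul aug sep oct nov dec);

# Hypothetical helper: replace the first two-digit run in a line with the
# matching month name. The /e flag makes the replacement side an
# expression that is evaluated for each match.
sub convert_line {
    my ($line) = @_;
    # Added guard (not in the one-liner): leave out-of-range months alone.
    $line =~ s{(\d\d)}{ ($1 >= 1 && $1 <= 12) ? $MONTHS[$1 - 1] : $1 }e;
    return $line;
}

# Sample lines are assumptions; the one-liner reads date.txt instead.
for my $line ("0107\n", "0205\n", "1305\n") {
    print convert_line($line);    # prints jan07, feb05, 1305
}
```

The -p switch in the one-liner supplies the read-and-print loop, so only the substitution has to be written out.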
--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.