











 

Author map/array performance
Frank Bax

2005-10-23, 6:56 pm

Rather than create/store/sort many billion entities, my script creates
these entities dynamically and maintains a hash of the "top 100". As each
entity is created, I search my hash for the entity with "lowest" value,
based on a number of elements in the hash; then "low" element gets replaced
with "new" element. Code looks like:
my $low = 0;
for( my $new=1; $new<=$iSuit; ++$new ) {
    my $snew = sprintf("%4d%4d", $aSuit{$new}{'rescap'}, $aSuit{$new}{'resval'});
    my $slow = sprintf("%4d%4d", $aSuit{$low}{'rescap'}, $aSuit{$low}{'resval'});
    if( $snew lt $slow ) { $low = $new; }
}
I needed to change this code so that 'rescap' and 'resval' are runtime
options and there could be any number of them, so I created an array @f_seq
and rewrote my simple loop as:
my $low = 0;
for( my $new=1; $new<=$iSuit; ++$new ) {
$a=$new; $b=$low;
if( cmpSuits() > 0 ) { $low = $new; }
}
sub cmpSuits {
my $aval=''; map { $aval=$aval.sprintf("%4d",$aSuit{$a}{$_}); } @f_seq;
my $bval=''; map { $bval=$bval.sprintf("%4d",$aSuit{$b}{$_}); } @f_seq;
$bval cmp $aval; # a<b=1 a=b=0 a>b=-1 ... sorts descending
}

I use $a and $b because at the end of my script, aSuit hash is sorted for
output - also using function "cmpSuits". The problem is that my script is
now horribly slower than the original!! Is this because I used "map"? If
so, what should I have used instead?

Running the script on a small test sample from our database, the original
code runs in 85-90 seconds. The modified code using "map", takes 160-165
seconds. Processing of my "real" database took 69 hours on the original
code, but at 80 hours, my modified script is not even half way! I must
find something a bit faster than "map", but more flexible than my original
code.

BTW: archives for this list appear to be broken since Oct22.

John W. Krahn

2005-10-23, 6:56 pm

Frank Bax wrote:
> Rather than create/store/sort many billion entities, my script creates
> these entities dynamically and maintains a hash of the "top 100". As
> each entity is created, I search my hash for the entity with "lowest"
> value, based on a number of elements in the hash; then "low" element
> gets replaced with "new" element. Code looks like:
> my $low = 0;
> for( my $new=1; $new<=$iSuit; ++$new ) {
> my $snew = sprintf("%4d%4d", $aSuit{$new}{'rescap'}, $aSuit{$new}{'resval'});
> my $slow = sprintf("%4d%4d", $aSuit{$low}{'rescap'}, $aSuit{$low}{'resval'});


Using sprintf() to concatenate numbers is (AFAIK) going to be slower than
concatenation:

my $snew = $aSuit{ $new }{ rescap } . $aSuit{ $new }{ resval };
my $slow = $aSuit{ $low }{ rescap } . $aSuit{ $low }{ resval };


> if( $snew lt $slow ) { $low = $new; }


You are comparing numbers so:

$low = $new if $snew < $slow;


> }
> I needed to change this code so that 'rescap' and 'resval' are runtime
> options and there could be any number of them, so I created an array
> @f_seq and rewrote my simple loop as:
> my $low = 0;
> for( my $new=1; $new<=$iSuit; ++$new ) {
> $a=$new; $b=$low;
> if( cmpSuits() > 0 ) { $low = $new; }
> }
> sub cmpSuits {
> my $aval=''; map { $aval=$aval.sprintf("%4d",$aSuit{$a}{$_}); } @f_seq;
> my $bval=''; map { $bval=$bval.sprintf("%4d",$aSuit{$b}{$_}); } @f_seq;


You shouldn't use map in void context, you should use a foreach loop instead,
but you don't even need a loop there:

my $aval = join '', @{ $aSuit{ $a } }{ @f_seq };
my $bval = join '', @{ $aSuit{ $b } }{ @f_seq };


> $bval cmp $aval; # a<b=1 a=b=0 a>b=-1 ... sorts descending


Or just:

join( '', @{ $aSuit{ $a } }{ @f_seq } ) cmp join( '', @{ $aSuit{ $b } }{
@f_seq } );


> }
>
> I use $a and $b because at the end of my script, aSuit hash is sorted
> for output - also using function "cmpSuits". The problem is that my
> script is now horribly slower than the original!! Is this because I
> used "map"? If so, what should I have used instead?
>
> Running the script on a small test sample from our database, the
> original code runs in 85-90 seconds. The modified code using "map",
> takes 160-165 seconds. Processing of my "real" database took 69 hours
> on the original code, but at 80 hours, my modified script is not even
> half way! I must find something a bit faster than "map", but more
> flexible than my original code.


You might be able to use a Schwartzian Transform or a Guttman-Rosler Transform
to speed up the sort.
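
A minimal sketch of both transforms, assuming sample data shaped like the thread's %aSuit and @f_seq (the values here are made up), non-negative field values of at most four digits, and unique ids. Both apply only to the final sort for output; the "find the lowest" loop is a linear scan, which neither transform addresses.

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like the thread's %aSuit and @f_seq.
my %aSuit = (
    1 => { rescap => 284, resval => 9   },
    2 => { rescap => 284, resval => 10  },
    3 => { rescap => 17,  resval => 500 },
);
my @f_seq  = qw(rescap resval);
my $format = '%4d' x @f_seq;

# Schwartzian Transform: build each sort key once, sort on the cached key,
# then strip the key off again.
my @st = map  { $_->[1] }
         sort { $b->[0] cmp $a->[0] }    # descending, as in cmpSuits
         map  { [ sprintf($format, @{ $aSuit{$_} }{@f_seq}), $_ ] }
         keys %aSuit;

# Guttman-Rosler Transform: pack key and id into one string so the default
# string sort can be used with no comparison block at all.
my $keylen = 4 * @f_seq;                 # each field is 4 characters wide
my @grt = map  { substr($_, $keylen) }
          reverse sort
          map  { sprintf($format, @{ $aSuit{$_} }{@f_seq}) . $_ }
          keys %aSuit;

print "@st\n";     # 2 1 3
print "@grt\n";    # 2 1 3
```

In both versions the sprintf runs once per element rather than twice per comparison, which is where any saving would come from.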



John
--
use Perl;
program
fulfillment
John W. Krahn

2005-10-23, 6:56 pm

John W. Krahn wrote:
> Frank Bax wrote:
>
> Using sprintf() to concatenate numbers is (AFAIK) going to be slower than
> concatenation:
>
> my $snew = $aSuit{ $new }{ rescap } . $aSuit{ $new }{ resval };
> my $slow = $aSuit{ $low }{ rescap } . $aSuit{ $low }{ resval };


Sorry, that won't work because of the '%4d' format but this should:

my $snew = $aSuit{ $new }{ rescap } * 10000 + $aSuit{ $new }{ resval };
my $slow = $aSuit{ $low }{ rescap } * 10000 + $aSuit{ $low }{ resval };



John
Frank Bax

2005-10-23, 6:56 pm

At 02:11 PM 10/23/05, John W. Krahn wrote:

>Frank Bax wrote:
>
>Using sprintf() to concatenate numbers is (AFAIK) going to be slower than
>concatenation:
>
> my $snew = $aSuit{ $new }{ rescap } . $aSuit{ $new }{ resval };
> my $slow = $aSuit{ $low }{ rescap } . $aSuit{ $low }{ resval };
>
>
>You shouldn't use map in void context, you should use a foreach loop instead,
>but you don't even need a loop there:
>
> my $aval = join '', @{ $aSuit{ $a } }{ @f_seq };
> my $bval = join '', @{ $aSuit{ $b } }{ @f_seq };



Your suggested code changes don't work when the numbers on each side of the
comparison have different numbers of digits - that's why I initially
introduced sprintf - so all numbers would use 4 digits/characters.
Concatenate 284 and 9 to get 2849, 284 and 10 to get 28410, which comes
before 2849 in a string compare. Using sprintf("%4d",...), " 284   9" is
compared to " 284  10" and works properly in this context.
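
The difference can be checked with a short script. This is only an illustration using the two value pairs from the paragraph above; @x and @y are names made up for the example.

```perl
use strict;
use warnings;

# Two records that share the first field and differ in the second.
my @x = (284, 9);
my @y = (284, 10);

# Plain concatenation: "2849" vs "28410". As strings, "28410" sorts first
# because its fourth character "1" is less than "9".
my $plain = join('', @x) cmp join('', @y);

# Fixed-width fields: " 284   9" vs " 284  10". The padding space sorts
# before any digit, so the string order matches the numeric order.
my $padded = sprintf('%4d%4d', @x) cmp sprintf('%4d%4d', @y);

print "plain:  $plain\n";     # 1   (x sorts after y, which is wrong)
print "padded: $padded\n";    # -1  (x sorts before y, as intended)
```

The padded form only holds up while every value is non-negative and fits in four digits.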

What is "void context"?

Jeff 'japhy' Pinyan

2005-10-23, 6:56 pm

On Oct 23, Frank Bax said:

> At 02:11 PM 10/23/05, John W. Krahn wrote:
>
>
> What is "void context"?


"Void context" means an expression whose return value is being discarded.

$x = foo(); # scalar context
@y = foo(); # list context
foo(); # void context

map() builds a list to be returned, and by not USING its return value
(that is, calling map() in void context), you're wasting resources. If
you do

map BLOCK LIST

and don't intend on saving the return value of map(), just use a for loop.

for (LIST) BLOCK
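
A small sketch that makes the three contexts visible, using the built-in wantarray; foo() and @seen are names made up for the example. As far as I know, since Perl 5.8.1 map no longer constructs its result list when called in void context, so on newer perls the saving from switching to a for loop may be small.

```perl
use strict;
use warnings;

my @seen;

# wantarray is undef in void context, false in scalar context and true in
# list context, so foo() can record how it was called.
sub foo {
    my $w = wantarray;
    push @seen, !defined $w ? 'void' : $w ? 'list' : 'scalar';
    return 1;
}

my $x = foo();    # scalar context
my @y = foo();    # list context
foo();            # void context

print "@seen\n";  # scalar list void
```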

--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://www.perlmonks.org/ % have long ago been overpaid?
http://princeton.pm.org/ % -- Meister Eckhart
Frank Bax

2005-10-23, 9:55 pm

At 04:35 PM 10/23/05, Frank Bax wrote:

>At 02:11 PM 10/23/05, John W. Krahn wrote:
>
>
>
>Your suggested code changes don't work when the list of numbers on each
>side of comparison have different number of digits - that's why I
>initially introduced sprintf - so all numbers would use 4
>digits/characters. Concatenate 284 and 9 to get 2849, 284 and 10 to get
>28410, which comes before 2849 in string compare. Using sprintf("%4d",...)
>- " 284   9" is compared to " 284  10" and works properly in this context.



I just changed and tested:
my $aval=''; map { $aval=$aval.sprintf("%4d",$aSuit{$a}{$_}); }
@f_seq;
my $bval=''; map { $bval=$bval.sprintf("%4d",$aSuit{$b}{$_}); }
@f_seq;
to:
my $aval=''; foreach $f (@f_seq) {
$aval=$aval.sprintf("%4d",$aSuit{$a}{$f}); }
my $bval=''; foreach $f (@f_seq) {
$bval=$bval.sprintf("%4d",$aSuit{$b}{$f}); }

For my script "foreach" took 195 seconds, compared to 160 seconds using
"map". We're headed in the wrong direction here!

FYI: I added a counter to the cmpSuits function - it gets called 14.2 million
times on our small test database.

John W. Krahn

2005-10-23, 9:55 pm

Frank Bax wrote:
> At 02:11 PM 10/23/05, John W. Krahn wrote:
>
>
>
> Your suggested code changes don't work when the list of numbers on each
> side of comparison have different number of digits - that's why I
> initially introduced sprintf - so all numbers would use 4
> digits/characters.


my $format = '%4d' x @f_seq;
sprintf( $format, @{ $aSuit{ $a } }{ @f_seq } ) cmp sprintf( $format, @{
$aSuit{ $b } }{ @f_seq } );
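
The two pieces of syntax in that expression can be taken apart as follows. This is a sketch with a made-up record (%suit, $ref and the "other" field are assumptions for illustration); only @f_seq and $format come from the thread.

```perl
use strict;
use warnings;

# Hypothetical record and field list, shaped like the thread's data.
my %suit  = ( rescap => 284, resval => 9, other => 42 );
my $ref   = \%suit;
my @f_seq = qw(rescap resval);

# The x operator repeats a string; @f_seq on its right-hand side is
# evaluated as a number, i.e. the number of fields.
my $format = '%4d' x @f_seq;              # '%4d%4d'

# A hash slice through a reference: @{ $ref }{ LIST } returns the values
# for every key in LIST, in that order.
my @values = @{ $ref }{ @f_seq };         # (284, 9)

# One sprintf call formats all fields at once.
my $key = sprintf( $format, @values );    # ' 284   9'

print "[$format] [@values] [$key]\n";
```

Note that the expression as posted compares $a to $b, whereas the original cmpSuits compared $bval to $aval; whether the operands need swapping depends on how the caller tests the result.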



John
Jeff 'japhy' Pinyan

2005-10-23, 9:55 pm

On Oct 23, Frank Bax said:

> my $aval=''; map { $aval=$aval.sprintf("%4d",$aSuit{$a}{$_}); }
> @f_seq;


> my $aval=''; foreach $f (@f_seq) {
> $aval=$aval.sprintf("%4d",$aSuit{$a}{$f}); }


You should be using $aval .= here, instead of $aval = $aval . .... And as
John has shown, join() is even better.

Frank Bax

2005-10-24, 6:56 pm

At 10:07 PM 10/23/05, Jeff 'japhy' Pinyan wrote:

>On Oct 23, Frank Bax said:
>
>
>
>You should be using $aval .= here, instead of $aval = $aval . .... And as
>John has shown, join() is even better.



Right, but I wasn't able to integrate sprintf and join together (I've
figured it out since). This code (from John) runs in 115 seconds (map was
160, foreach was 195)!

my $format = '%4d' x @f_seq;
sprintf( $format, @{ $aSuit{ $a } }{ @f_seq } ) cmp sprintf( $format,
@{ $aSuit{ $b } }{ @f_seq } );

THANKS - It has syntax elements that are new to me - I would never have
come up with this on my own.
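
For anyone wanting to repeat the comparison on their own machine, here is a rough sketch using the core Benchmark module. The data is random and made up, and key_map, key_foreach and key_slice are hypothetical names for the three variants discussed in this thread; timings will differ by perl version and hardware, so no results are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up data: 100 records with two numeric fields each.
my %aSuit;
$aSuit{$_} = { rescap => int(rand(1000)), resval => int(rand(1000)) } for 0 .. 99;

my @f_seq  = qw(rescap resval);
my $format = '%4d' x @f_seq;

# Variant 1: map in void context with $k = $k . ...
sub key_map {
    my ($id) = @_;
    my $k = '';
    map { $k = $k . sprintf('%4d', $aSuit{$id}{$_}) } @f_seq;
    return $k;
}

# Variant 2: foreach-style loop with .=
sub key_foreach {
    my ($id) = @_;
    my $k = '';
    $k .= sprintf('%4d', $aSuit{$id}{$_}) for @f_seq;
    return $k;
}

# Variant 3: a single sprintf over a hash slice
sub key_slice {
    my ($id) = @_;
    return sprintf($format, @{ $aSuit{$id} }{@f_seq});
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'map'     => sub { key_map($_)     for 0 .. 99 },
    'foreach' => sub { key_foreach($_) for 0 .. 99 },
    'slice'   => sub { key_slice($_)   for 0 .. 99 },
});
```

All three subs build the same key, so any difference in the table comes from how the key is assembled rather than from what is compared.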
