Split line into an array vs multiple strings
| scottmf 2005-07-27, 5:05 pm |
| Can anyone explain why when I am reading in a file and saving the data
to a 2-d array it is faster if I split each line into an array rather
than a group of strings? Also why with each subsequent line I read in
does it take longer to process with the strings, whereas with the array
it takes the same amount of time for each line?
Thanks,
Scott
#!/usr/local/bin/perl
#
use Benchmark;
use strict;
# Create Sample File (sample.txt) and Array (@sample)
open(SAMPLE,'>sample.txt');
for (my $i=0;$i<20000;$i++) {
my $line = "abc"." "."def"." ".rand()." ".rand()." ".rand()." ".rand()." ".rand()."\n";
print SAMPLE $line;
}
close(SAMPLE);
# Count how long it takes to run each version
my $count = 10;
timethese $count, {
'string_test' => \&string_test,
'array_test' => \&array_test
};
sub string_test{
my @array;
my $i;
open(SAMPLE, "sample.txt");
while(my $line = <SAMPLE> ){
chomp($line);
my($el1, $el2, $el3, $el4, $el5, $el6, $el7) = split/\s+/,$line;
$array[$i][0] = $el1;
$array[$i][1] = $el2;
$array[$i][2] = $el3;
$array[$i][3] = $el4;
$array[$i][4] = $el5;
$array[$i][5] = $el6;
$array[$i][6] = $el7;
$i++;
}
close(SAMPLE);
}
sub array_test{
my @array;
my $i;
open(SAMPLE, "sample.txt");
while(my $line = <SAMPLE> ){
chomp($line);
my @line_data = split/\s+/, $line;
$array[$i][0] = $line_data[0];
$array[$i][1] = $line_data[1];
$array[$i][2] = $line_data[2];
$array[$i][3] = $line_data[3];
$array[$i][4] = $line_data[4];
$array[$i][5] = $line_data[5];
$array[$i][6] = $line_data[6];
$i++;
}
close(SAMPLE);
}
returns:
Benchmark: timing 10 iterations of array_test, string_test...
array_test: 4 wallclock secs ( 4.30 usr + 0.00 sys = 4.30 CPU) @ 2.33/s (n=10)
string_test: 18 wallclock secs (18.00 usr + 0.00 sys = 18.00 CPU) @ 0.56/s (n=10)
| |
| John W. Krahn 2005-07-27, 10:01 pm |
| scottmf wrote:
> Can anyone explain why when I am reading in a file and saving the data
> to a 2-d array it is faster if I split each line into an array rather
> than a group of strings?
I can't explain it because on my computer the "string" version runs faster.
> Also why with each subsequent line I read in
> does it take longer to process with the strings, whereas with the array
> it takes the same amount of time for each line?
>
>
> #!/usr/local/bin/perl
> #
> use Benchmark;
> use strict;
>
> # Create Sample File (sample.txt) and Array (@sample)
> open(SAMPLE,'>sample.txt');
> for (my $i=0;$i<20000;$i++) {
> my $line = "abc"." "."def"." ".rand()." ".rand()." ".rand()." ".rand()." ".rand()."\n";
> print SAMPLE $line;
> }
> close(SAMPLE);
>
> # Count how long it takes to run each version
> my $count = 10;
> timethese $count, {
> 'string_test' => \&string_test,
> 'array_test' => \&array_test
> };
>
> sub string_test{
> my @array;
> my $i;
> open(SAMPLE, "sample.txt");
> while(my $line = <SAMPLE> ){
> chomp($line);
> my($el1, $el2, $el3, $el4, $el5, $el6, $el7) = split/\s+/,$line;
> $array[$i][0] = $el1;
> $array[$i][1] = $el2;
> $array[$i][2] = $el3;
> $array[$i][3] = $el4;
> $array[$i][4] = $el5;
> $array[$i][5] = $el6;
> $array[$i][6] = $el7;
> $i++;
> }
> close(SAMPLE);
> }
The usual way to do something like that in perl is:
sub some_test {
my @array;
open SAMPLE, '<', 'sample.txt' or die "Cannot open 'sample.txt' $!";
while ( <SAMPLE> ) {
push @array, [ split ];
}
close SAMPLE;
}
Which is a bit faster than your two examples.
And if you need to limit it to only the first seven fields:
sub some_test {
my @array;
open SAMPLE, '<', 'sample.txt' or die "Cannot open 'sample.txt' $!";
while ( <SAMPLE> ) {
push @array, [ ( split )[ 0 .. 6 ] ];
}
close SAMPLE;
}
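To make the shape of the result concrete, here is a minimal sketch of the same two idioms run against in-memory lines instead of a file. The sample lines and the names @lines, @all_fields and @first_seven are made up for illustration and are not from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines in the same shape as sample.txt:
# two words followed by numeric fields. The second line has an
# eighth field so the effect of the slice is visible.
my @lines = (
    "abc def 0.11 0.22 0.33 0.44 0.55\n",
    "abc def 0.66 0.77 0.88 0.99 0.10 extra\n",
);

my ( @all_fields, @first_seven );
for (@lines) {
    # split with no arguments splits $_ on whitespace, so the
    # trailing newline is dropped without an explicit chomp.
    push @all_fields, [ split ];

    # A list slice keeps only the first seven fields of the line.
    push @first_seven, [ ( split )[ 0 .. 6 ] ];
}

print scalar @{ $all_fields[1] },  "\n";    # field count, second row, unsliced
print scalar @{ $first_seven[1] }, "\n";    # field count, second row, sliced
print "$first_seven[0][2]\n";               # row 0, column 2
```

Each row ends up as an anonymous array, so individual fields are addressed as $array[$row][$col], the same way as in the original subs.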
John
--
use Perl;
program
fulfillment
| |
| scottmf 2005-07-27, 10:01 pm |
| I ran some more tests starting with an input file of 10000 lines, and
increasing the filesize by 10000 lines for each benchmark, and I get
the following.
At this rate if my input file had 80000 lines it would take the string
method almost 30 times longer than the array method to just grab the
data. Also does anyone know why in the benchmark comparison the first
column changes from iterations per second to seconds per iteration?
Benchmark: timing 10 iterations of array_test, string_test...
array_test: 2 wallclock secs ( 2.09 usr + 0.02 sys = 2.11 CPU) @ 4.74/s (n=10)
string_test: 6 wallclock secs ( 5.17 usr + 0.01 sys = 5.19 CPU) @ 1.93/s (n=10)
              Rate string_test  array_test
string_test 1.93/s          --        -59%
array_test  4.74/s        146%          --
Benchmark: timing 10 iterations of array_test, string_test...
array_test: 4 wallclock secs ( 4.20 usr + 0.03 sys = 4.23 CPU) @ 2.36/s (n=10)
string_test: 17 wallclock secs (16.52 usr + 0.02 sys = 16.53 CPU) @ 0.60/s (n=10)
            s/iter string_test  array_test
string_test   1.65          --        -74%
array_test   0.423        290%          --
Benchmark: timing 10 iterations of array_test, string_test...
array_test: 6 wallclock secs ( 6.31 usr + 0.02 sys = 6.33 CPU) @ 1.58/s (n=10)
string_test: 39 wallclock secs (39.33 usr + 0.11 sys = 39.44 CPU) @ 0.25/s (n=10)
            s/iter string_test  array_test
string_test   3.94          --        -84%
array_test   0.633        523%          --
Benchmark: timing 10 iterations of array_test, string_test...
array_test: 8 wallclock secs ( 8.39 usr + 0.03 sys = 8.42 CPU) @ 1.19/s (n=10)
string_test: 84 wallclock secs (83.25 usr + 0.05 sys = 83.30 CPU) @ 0.12/s (n=10)
            s/iter string_test  array_test
string_test   8.33          --        -90%
array_test   0.842        889%          --
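The driver used for these runs was not posted. A rough sketch of one that would produce output in this shape is given below, assuming Benchmark's cmpthese was used for the comparison charts and that the file was regenerated at 10000, 20000, 30000 and 40000 lines. The helper write_sample and the @sizes list are hypothetical; the two test subs follow the originals, with lexical filehandles, error checks and a row-count return value added.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $file  = 'sample.txt';                      # same file name as the original script
my $count = 10;                                # iterations per benchmark, as in the thread
my @sizes = ( 10000, 20000, 30000, 40000 );    # assumed line counts for the four runs

# Hypothetical helper: write $n lines of two words plus five random numbers.
sub write_sample {
    my ($n) = @_;
    open my $out, '>', $file or die "Cannot write '$file': $!";
    for ( 1 .. $n ) {
        print {$out} join( ' ', 'abc', 'def', map { rand() } 1 .. 5 ), "\n";
    }
    close $out or die "Cannot close '$file': $!";
}

# Split each line into seven named scalars, then copy them into the 2-d array.
sub string_test {
    my @array;
    my $i = 0;
    open my $in, '<', $file or die "Cannot open '$file': $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my ( $el1, $el2, $el3, $el4, $el5, $el6, $el7 ) = split /\s+/, $line;
        $array[$i][0] = $el1;
        $array[$i][1] = $el2;
        $array[$i][2] = $el3;
        $array[$i][3] = $el4;
        $array[$i][4] = $el5;
        $array[$i][5] = $el6;
        $array[$i][6] = $el7;
        $i++;
    }
    close $in;
    return scalar @array;    # number of rows read
}

# Split each line into a temporary array, then copy its elements across.
sub array_test {
    my @array;
    my $i = 0;
    open my $in, '<', $file or die "Cannot open '$file': $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my @line_data = split /\s+/, $line;
        $array[$i][$_] = $line_data[$_] for 0 .. 6;
        $i++;
    }
    close $in;
    return scalar @array;    # number of rows read
}

# One comparison per file size; cmpthese prints the timing lines and the chart.
for my $n (@sizes) {
    write_sample($n);
    cmpthese( $count, {
        string_test => \&string_test,
        array_test  => \&array_test,
    } );
}
```

Timings from this sketch will depend on the perl version and machine, and need not show the same slowdown. On the column header: in the output above the chart reads Rate when both rates are above one per second and s/iter once they fall below it, which is consistent with cmpthese picking whichever unit is easier to read, though that is an inference from the output rather than something confirmed in the thread.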