Home > Archive > PERL Beginners > June 2006 > reading a line at a time inefficient?










reading a line at a time inefficient?
Bryan Harris

2006-05-16, 7:00 pm



If I'm reading in many-megabyte files, is it considered to be more efficient
to read it into an array, then loop over the array? Or is reading a line at
a time okay?

e.g.

**************************************
while (<> ) {
# do some process with each line
}
**************************************

or...

**************************************
@lines = <>;
foreach (@lines) {
# do some process with each line
}
**************************************

I realize the second will use more memory, but what's a few megabytes in
today's computers? I'm more worried about the OS having to go back to the
disk a couple hundred-thousand times -- seems like it'd be hard on the disk.

TIA.

- Bryan


Wagner, David --- Senior Programmer Analyst --- WG

2006-05-16, 7:00 pm

Bryan Harris wrote:
> If I'm reading in many-megabyte files, is it considered to be more
> efficient to read it into an array, then loop over the array? Or is
> reading a line at a time okay?
>

Depends really on the size and what you're trying to do. Almost all that I do,
I read a line at a time, but others will swallow the whole file. Most of my
regexes deal with single lines, or a single line will start another set of
sequences.

Wags ;)




Tom Phoenix

2006-05-16, 7:00 pm

On 5/16/06, Bryan Harris <lists@harrisfam.net> wrote:

> If I'm reading in many-megabyte files, is it considered to be more efficient
> to read it into an array, then loop over the array? Or is reading a line at
> a time okay?


Processing a file one line at a time is always okay.

> I realize the second will use more memory, but what's a few megabytes in
> today's computers?


A few megabytes here, some more megabytes there, pretty soon you're
talking about a lot of memory. Remember, the file will take up even
more space in memory than it does on disk. If your system runs low,
some memory will be swapped out to disk, and oops, now you're using the
disk anyway.

> I'm more worried about the OS having to go back to the
> disk a couple hundred-thousand times -- seems like it'd be hard on the disk.

Don't worry about being hard on the disk. Put it completely out of
your mind. Perl and the OS take care of low-level crud like that, so
you don't have to.

(If I weren't telling you to forget about this, I'd mention that Perl
and the OS read the file one "block" at a time into a memory buffer,
and thus the disk doesn't do any extra work to allow your program to
see the data one line at a time.)
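To make the buffering idea concrete, here is a minimal sketch that does by hand roughly what Perl's buffered I/O does for you: it asks the OS for fixed-size blocks and hands out lines from a memory buffer. The helper name each_line_by_block, the 4096-byte block size, and the sample file are assumptions made up for this illustration; they are not Perl's actual internals.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

# Hypothetical helper: fetch the file from the OS in fixed-size blocks and
# call $on_line once per line, roughly what Perl's buffered <$fh> does.
sub each_line_by_block {
    my ($path, $block_size, $on_line) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode $fh;
    my ($buffer, $reads) = ('', 0);
    while (1) {
        my $got = sysread($fh, my $chunk, $block_size);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;                  # end of file
        $reads++;
        $buffer .= $chunk;
        # Hand over every complete line now sitting in the buffer.
        while ($buffer =~ s/\A([^\n]*\n)//) {
            $on_line->($1);
        }
    }
    $on_line->($buffer) if length $buffer;  # last line without a newline
    close $fh;
    return $reads;
}

# Demo on a throwaway file of 1000 short lines (sample data, assumed).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "line $_\n" for 1 .. 1000;
close $out or die "Cannot close $path: $!";

my $lines = 0;
my $reads = each_line_by_block($path, 4096, sub { $lines++ });
print "$lines lines delivered using $reads block reads\n";
```

The number of block reads comes out far smaller than the number of lines, which is the point: seeing the data a line at a time does not mean one trip to the disk per line. In real code a plain while (<$fh>) loop gets the same effect with none of this bookkeeping.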

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training

Michael Goldshteyn

2006-05-16, 7:00 pm

In a nutshell, use File::Slurp to read the entire file all at once.



Dr.Ruud

2006-05-16, 7:00 pm

Bryan Harris schreef:

> If I'm reading in many-megabyte files, is it considered to be more
> efficient to read it into an array, then loop over the array?


Line-by-line is fine.

Your Operating System will read from disk in blocks, so stop believing
that each line needs a disk access.
(unless each line is about the size of a block, but blocks can easily be
2 MB)


> I realize the [for-loop] will use more memory, but what's a few megabytes
> in today's computers?


That can really byte, when there are only a few megabytes left to spare.
You don't want a memory fight in your system, because that slows down
all processes considerably, because it will start swapping memory blocks
to disk for a moment, then read them back in, etc.

--
Affijn, Ruud

"Gewoon is een tijger."


Peter Scott

2006-05-17, 6:58 pm

On Tue, 16 May 2006 09:18:14 -0700, Bryan Harris wrote:
> If I'm reading in many-megabyte files, is it considered to be more efficient
> to read it into an array, then loop over the array? Or is reading a line at
> a time okay?


Focus on *your* efficiency first, not the computer's. If you're doing
line-oriented processing, read it a line at a time. If you're doing
pattern matching that spans line boundaries, you'll probably want to read
the whole thing in.
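For the case where a pattern spans line boundaries, a minimal sketch of reading the whole file into one string with core Perl only might look like this. The helper name slurp, the BEGIN/END markers, and the sample data are assumptions for illustration; File::Slurp, mentioned earlier in the thread, is another way to get the same string.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

# Hypothetical helper: return the whole file as a single string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;               # undefined separator: one read returns everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Sample data (assumed): two BEGIN/END blocks with noise in between.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "BEGIN\nalpha\nbeta\nEND\nnoise\nBEGIN\ngamma\nEND\n";
close $out or die "Cannot close $path: $!";

my $text = slurp($path);

# /s lets "." match newlines, so a single match can cover several lines;
# /m makes ^ and $ match at line boundaries inside the string.
my @blocks = $text =~ /^BEGIN\n(.*?)\nEND$/msg;
print scalar(@blocks), " blocks found\n";
```

Using local confines the change to $/ to the helper, so line-at-a-time reads elsewhere in the program keep working as usual.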

> I realize the second will use more memory, but what's a few megabytes in
> today's computers? I'm more worried about the OS having to go back to the
> disk a couple hundred-thousand times -- seems like it'd be hard on the disk.


Don't worry about the implementation - smart people have done that for you
in optimizing the underlying code already. If you don't benchmark
different ways of doing something then your guesses as to which is more
efficient than another are unreliable. And don't bother benchmarking
until you find out that performance improvement is necessary.
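If it does come to measuring, the core Benchmark module can compare the two approaches from the original question. This is only a sketch under assumptions: the 50,000-line sample file, the sub names, and the iteration count are made up, and the results will depend on your machine and your real data.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';
use File::Temp 'tempfile';

# Sample file (assumed): 50,000 short lines.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "some sample text on line $_\n" for 1 .. 50_000;
close $out or die "Cannot close $path: $!";

# Approach 1: read and process one line at a time.
sub count_line_by_line {
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) { $n++ }
    close $fh;
    return $n;
}

# Approach 2: read everything into an array, then loop over the array.
sub count_via_array {
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    my $n = 0;
    $n++ for @lines;
    return $n;
}

# Run each sub 20 times and print a comparison table.
cmpthese(20, {
    line_by_line => \&count_line_by_line,
    via_array    => \&count_via_array,
});
```

With so few iterations Benchmark may warn that the count is too low to be reliable; raising the count, or passing a negative number to run for at least that many CPU seconds, gives steadier numbers.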

--
Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/

M. Kristall

2006-05-17, 6:58 pm

Bryan R Harris wrote:
> I figured the OS would load the file in blocks, but I thought the blocks
> might only be 12k or something like that.

Block sizes are often chosen based on average file sizes. Where there
will be lots of small files, smaller block sizes - perhaps 12KB - are
used, but then it takes longer to read large files. With larger blocks,
it will take less time to read large files, but the amount of space
required to store small files is much greater (who wants a 4KB file
taking up 4MB?).

> I am surprised at the levels of concern over memory reading in <20 MB files.
> Don't most people have 1+ GB now? I've got 2... I'm just surprised that
> using at most 1% of my total ram would be a concern.

The most common amount of RAM on a prebuilt midrange computer is 512MB
today. This is sometimes used partially for video and BIOS caching
(maybe 128MB). The OS itself might easily use another 128MB, so that
leaves about 256MB for programs to share.

The reason people get more RAM is usually because they use it. In the
other room, I have a desktop system with 2GB RAM and it hardly ever has
512MB to spare.


If you don't need to look at the same line multiple times, it
almost always makes sense to read the file line by line. And it may even
be faster because you won't have to wait for the entire file to be
slurped before doing anything.

If you do need to read the file non-sequentially or to look at the same
lines multiple times, it might make sense to slurp the entire file. But
if you 'undef $/' first, your program will be horribly slow - copying
large amounts of memory multiple times can slow things down a lot.

Omega -1911

2006-06-24, 8:01 am

Hello list!

I am attempting to lower the memory load on the server that the following
lines of code create. Is there any way to speed up this process and lower
memory usage? I read through the File::Slurp documentation but am not sure if
that would help, as I need to keep the array @remaining_file_lines.

NOTE: Each file has a different size (ranging from 2kb up to 900mb)

open FILE, "$file.txt"; # $file is untainted by code before we open the file
my ($data,$data1,$data2,$data3,$data4,@remaining_file_lines) = <FILE>;
close FILE;
chomp ($data,$data1,$data2,$data3,$data4,@remaining_file_lines);
return ($data,$data1,$data2,$data3,$data4,@remaining_file_lines);

TIA !!!!
-David

Jeff Peng

2006-06-24, 8:01 am

Hello,
Reading a file as large as 900M into an array will eat up memory very
quickly.
Could you open the file and obtain the file-handle in your subroutine, then
return the file-handle to the caller? For example:

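sub your_sub {
    # ....
    # Lexical handle and three-argument open; the caller reads from $fh.
    open(my $fh, '<', $somefile) or die "Cannot open $somefile: $!";
    return $fh;
}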





Omega -1911

2006-06-24, 8:01 am

On 6/23/06, Jeff Peng <peng@dig-tech.com> wrote:
>
> Hello,
> Reading a file which is large as 900M to the array,should consume memory
> too
> quickly.
> Could you open a file and obtain the file-handle in your subroutine,then
> return the file-handle to the caller?For example:
>
> sub your_sub{
> ....
> open (FH,$somefile) or die $!;
> return \*FH;
> }



Thanks for the reply Jeff, but what I need to do is assign variables based
on the first 5 lines and then push the remaining lines of the file into an
array. Any suggestions?

Maybe I am asking this incorrectly. Here is what I am attempting to do:

-Open one file.
-Read that file.
--Line one will be assigned to $data1
--Line two will be " " to $data2
...
--Line five will be " " to $data5
-- The remaining lines pushed into an array...

Really appreciate any help you can give!

-David

Jeff Peng

2006-06-24, 8:01 am


>Thanks for the reply Jeff, but what I need to do is assign variables based
>on the first 5 lines and then push the remaining lines of the file into an
>array. Any suggestions?
>


Anyway, reading all the contents of a large file into an array will consume
too much physical memory. Could you take a look at the Tie::File module? See
here:
http://search.cpan.org/~mjd/Tie-Fil...lib/Tie/File.pm
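As a rough idea of what that could look like, here is a minimal sketch using Tie::File, which ships with Perl and presents a file as an array without loading the file into memory. The sample file and variable names are assumptions for illustration, and whether this is fast enough for a 900M file is something to measure rather than assume.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;
use File::Temp 'tempfile';

# Sample file (assumed): ten short lines.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "line $_\n" for 1 .. 10;
close $out or die "Cannot close $path: $!";

# Each array element is one record of the file, fetched on demand.
tie my @lines, 'Tie::File', $path, mode => O_RDONLY
    or die "Cannot tie $path: $!";

my @header = @lines[0 .. 4];    # first five lines, newlines already removed
my $count  = scalar @lines;     # number of lines in the file
my $last   = $lines[-1];        # random access works like a normal array

untie @lines;
print "header ends with '$header[4]', $count lines, last is '$last'\n";
```

Opening with O_RDONLY guards against changing the file by accident, since assigning to an element of a tied array in the default mode rewrites the file on disk.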


Omega -1911

2006-06-24, 8:01 am

On 6/23/06, Jeff Peng <peng@dig-tech.com> wrote:
>
> Anyway,reading all the contents of a large file to an array should consume
> too much physical memory.Could you take a look at Tie::File module?see
> here:
> http://search.cpan.org/~mjd/Tie-Fil...lib/Tie/File.pm
>

Thanks Jeff, will give it a try!!!!


Mr. Shawn H. Corey

2006-06-24, 8:01 am

On Fri, 2006-23-06 at 02:39 -0400, Omega -1911 wrote:
> -Open one file.
> -Read that file.
> --Line one will be assigned to $data1
> --Line two will be " " to $data2
> ..
> --Line five will be " " to $data5
> -- The remaining lines pushed into an array...
>
> Really appreciate any help you can give!
>
> -David


Reading the file, non-slurp:

open IN, ...
chomp( my $data1 = <IN> );
chomp( my $data2 = <IN> );
chomp( my $data3 = <IN> );
chomp( my $data4 = <IN> );
chomp( my $data5 = <IN> );
while( <IN> ){
chomp;
...
}
close IN;
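A filled-in version of the same idea, as a sketch only: the helper name read_header_then_stream, the callback argument, and the sample file are assumptions added for illustration. It keeps the first five lines and hands every later line to a callback, so the rest of the file never has to sit in an array.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

# Hypothetical helper: return the first five lines (chomped) and pass each
# later line to $on_line, so the body of the file is never held in memory.
sub read_header_then_stream {
    my ($path, $on_line) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my @header;
    while (@header < 5) {
        my $line = <$in>;
        last unless defined $line;      # file shorter than five lines
        chomp $line;
        push @header, $line;
    }
    while (my $line = <$in>) {
        chomp $line;
        $on_line->($line);
    }
    close $in;
    return @header;
}

# Sample file (assumed): eight short lines.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "row $_\n" for 1 .. 8;
close $out or die "Cannot close $path: $!";

my $rest = 0;
my ($data1, $data2, $data3, $data4, $data5) =
    read_header_then_stream($path, sub { $rest++ });
print "header ends with '$data5', $rest more lines streamed\n";
```

If the remaining lines really must end up in an array, the callback can push onto one, but then memory use grows with the file again.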

I don't know where people got the idea that reading a file one line at a
time is less efficient than slurping it. Both methods use the underlying
OS disk access system calls, where most of the efficiency lies. Unless
you can organize your data on disk to take advantage of more advanced
methods, like ISAM or B-trees, just do it the way that makes the most
sense to the program.

Note that these advanced methods take advantage of the data having a
certain structure. If your data does not have this, these methods are slower
than the general purpose method.


--
__END__

Just my 0.00000002 million dollars worth,
--- Shawn

"For the things we have to learn before we can do them, we learn by doing them."
Aristotle

* Perl tutorials at http://perlmonks.org/?node=Tutorials
* A searchable perldoc is at http://perldoc.perl.org/


Jeff Peng

2006-06-24, 8:01 am

>I don't know where people got the idea that reading a file one line at a
>time is less efficient than slurping it.


Could you take a look at his question carefully? He NEVER said what you
claim he did!


Mr. Shawn H. Corey

2006-06-24, 8:01 am

On Fri, 2006-23-06 at 11:02 +0000, Jeff Peng wrote:
>
> Could you take a look at his questions carefully?He NEVER expressd the
> meanings as you've said!
>
>


Let's see, the subject is: reading a line at a time inefficient?

That's where I got the idea.




Jeff Peng

2006-06-24, 8:01 am


>
>Let's see, the subject is: reading a line at a time inefficient?
>
>That's where I got the idea.
>
>

No, the subject is "Re: reading a line at a time inefficient?"; he posted
his question as a reply in someone else's thread.

