Home > Archive > PERL Beginners > November 2005 > Loading a text file into memory
| Author |
Loading a text file into memory
|
|
|
| Hi,
This is my first time.
I want to load into memory a text table of roughly 250 columns by a number of
rows; it is space separated.
I know I have to get the file in and then split it with split(/ /,???)
and then get it to become an array of lists perhaps.
I have studied "Sams Teach Yourself Perl in 24 Hours" and perldocs but
can't quite put it all together.
Also, how can I usefully read (look up) data from this table after that?
Perhaps there's a mixture of index and vlookup and match like Excel,
which is the only program I seem to be able to figure out anyway.
A reference to areas in perldocs would help, but guidelines, tips and
code fragments etc. would help me a lot.
Thanks for your help.
jlb
| |
| Marcel 2005-11-23, 6:56 pm |
| Hi jlb,
try this:
open(FH, '<', "filename") or die "could not open file: $!";
my @lines = <FH>;
close(FH);
Now you have all the lines in the array @lines.
Marcel
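To get from @lines to the "array of lists" jlb mentioned, each line can be split into an array reference as it is read. This is only a minimal sketch: the sample data, the in-memory filehandle and the name @table are assumptions made for illustration, not part of the original reply.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Sample data standing in for the real file (hypothetical contents).
my $text = "a1 a2 a3\nb1 b2 b3\nc1 c2 c3\n";

# An in-memory filehandle keeps the sketch self-contained; for a real file:
#   open(my $fh, '<', 'filename') or die "could not open file: $!";
open(my $fh, '<', \$text) or die "could not open in-memory file: $!";

my @table;                                # one array reference per row
while (my $line = <$fh>) {
    chomp $line;
    push @table, [ split ' ', $line ];    # split on runs of whitespace
}
close($fh);

# Row 2, column 3 (indexes start at zero):
print $table[1][2], "\n";                 # prints b3

# Columns 1 and 3 of every row:
print join(' ', @{$_}[0, 2]), "\n" for @table;
```

perldoc perllol and perldoc perldsc cover this kind of structure in more detail.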
| |
| usenet@DavidFilmer.com 2005-11-23, 6:56 pm |
| jlb wrote:
> I want to load into memory a text table +- 250 columns by a number of
> rows and is space separated.
Oh goodie, another chance to highlight my favorite module, IO::All. I
highly recommend this module, especially to beginners, because a little
time spent understanding this module will save LOTS of frustration with
many I/O issues (especially when we advance to "find" tasks, etc). You
have one module to learn instead of trying to learn all of the
regular Perl core I/O operations (open, close, readdir, etc) AND
various modules like File::Find and File::Slurp, etc.
Everything is easier with IO::All, the Swiss Army Knife of I/O.
Observe:
#!/usr/bin/perl
use strict; use warnings;
use IO::All;
my @stuff = io('/path/to/my/file.txt') -> slurp;
__END__
Easy, huh? One line of code reads the whole file into an array, and
basic error handling is provided by the IO::All module (try to open a
file that doesn't exist and see what happens!). And the module closes
the file for you.
But there's no reason to read the file into memory. IO::All is also a
(transparent) proxy to Tie::File, so you can do this instead:
my $io = io('path/to/my/file.txt') -> new;
Now $io is an "object" which is "tied" to the file. So you can treat
the whole file as an ordinary array WITHOUT reading it into memory:
for (@$io) {
print;
}
So you can treat HUGE files as arrays without reading them into memory
(which may overwhelm your system).
You can modify the array:
push @$io, "New Line of Text";
and it will modify the file directly (and immediately)!!! You can
change the seventh line:
$io->[6] = "This is now the seventh line";
and the seventh line is changed on-disk (right now, with error handling
provided by the module).
============
OK, moving on... the OP was interested in how to access the columns.
The OP suggested reading the file and then splitting all of it into one
big data structure in memory. I say this is unnecessary. You already
have an "array" (as a tied IO::All object) and you can split as-needed
(no need to split everything up beforehand). For example, to print the
value of column 114 of line 95:
print((split /\s+/, $io->[94])[113]);
(remember arrays begin numbering at zero, not one).
To print columns 8 and 95 of lines 44-55:
print((split /\s+/, $io->[$_])[7,94]) for (43..54);
Notice that the file is never loaded into memory as a whole. Lines are
fetched from disk as they are needed, so it works for small files as well
as really, really huge files (consistent technique for both cases). Any
changes made to the array are written through to the file on disk.
The price you pay for the convenience and power of IO::All is that the
performance is not as fast as using low-level core modules. For most
situations (and especially on modern hardware) this is not an issue.
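The same split-as-needed idea can be tried with the core Tie::File module on its own, which is what the post says IO::All proxies to. This is a rough sketch under stated assumptions: the temporary file, its three sample lines and the names @rows and @cols are made up for illustration.

```perl
#!/usr/bin/perl
use strict; use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a small throwaway file so the sketch can run anywhere
# (hypothetical sample data; a real script would use its own file name).
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "a1 a2 a3\nb1 b2 b3\nc1 c2 c3\n";
close($fh) or die "could not close $filename: $!";

# Tie an array to the file: each element is one line, without its newline.
tie my @rows, 'Tie::File', $filename or die "could not tie $filename: $!";

# Split only the line that is needed: line 2, column 3.
my @cols = split ' ', $rows[1];
print "$cols[2]\n";                                   # prints b3

# Columns 1 and 3 of lines 1-2, joined with a space for readability.
print join(' ', (split ' ', $rows[$_])[0, 2]), "\n" for 0 .. 1;

untie @rows;
```

Joining the selected columns with a space avoids the values running together, which is what a bare print of a list does.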
| |
|
| Thanks guys, I'll try these.
jlb(John Le Brasseur)
| |
|
| Can I ask just one more thing : )).
I have not tested the above replies yet, but having the data available in
some way, and say that column 37 is a key to the data (or even say
columns 34, 35, 36 and 37 to be realistic), how can I find a key entry
("A03A") and then read off columns 39, 40, 53, 55 etc?
What is "OP" above?
Thanks very much for everyone's trouble.
| |
| usenet@DavidFilmer.com 2005-11-27, 3:55 am |
| jlb wrote:
> Can I ask just one more thing : )).
Sure.
> I have not tested the above replies yet, but having the data available in
> some way, and say that column 37 is a key to the data (or even say
> columns 34, 35, 36 and 37 to be realistic), how can I find a key entry
> ("A03A") and then read off columns 39, 40, 53, 55 etc?
That's not one more thing - that's a whole 'nuther thing. Now you
really ought to put your entire data structure into memory (if it's not
too huge). The data structure you use to store your data will be a
hash of hashes. It would look something like this (data has been
simplified for illustration):
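#!/usr/bin/perl
use warnings; use strict;
my $key_col = 2;   # which column is the key (counting from 1)
my %data;          # hash of hashes: key value => { column number => value }
while (my $line = <DATA>) {
    my @fields = split ' ', $line;   # split on whitespace
    my $count = 0;
    # store every field of this row under the row's key, numbered from 1
    $data{ $fields[$key_col - 1] }{ ++$count } = $_ for @fields;
}
print $data{'colc2'}{5}, "\n";   # prints colc5
__DATA__
cola1 cola2 cola3 cola4 cola5
colb1 colb2 colb3 colb4 colb5
colc1 colc2 colc3 colc4 colc5
cold1 cold2 cold3 cold4 cold5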
----------------------------------------------------
Now you will see, for example:
$data{'colb2'}{3} eq 'colb3'
$data{'cold2'}{1} eq 'cold1'
FWIW, the key is repeated as a value, i.e.:
$data{'cola2'}{2} eq 'cola2'
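For the multi-column case jlb described (a key made of several columns, then reading off a few other columns), one possible variation is a hash of arrays keyed on the joined key columns. This is only a sketch: the column numbers, the sample rows and the names @key_cols, %rows and $wanted are assumptions chosen to keep the example small.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical sample rows; in practice these would come from the file.
my @lines = (
    "A01 A03A a3 a4 a5",
    "B02 A03A b3 b4 b5",
    "C03 Z99Z c3 c4 c5",
);

my @key_cols = (1, 2);   # columns that together form the key (counting from 1)
my %rows;                # "key value" => array reference holding the whole row

for my $line (@lines) {
    my @fields = split ' ', $line;
    next unless @fields;                                   # skip blank lines
    my $key = join ' ', @fields[ map { $_ - 1 } @key_cols ];
    $rows{$key} = \@fields;          # a repeated key overwrites the earlier row
}

# Find a key entry and read off columns 4 and 5 of that row.
my $wanted = 'B02 A03A';
if (my $row = $rows{$wanted}) {
    print join(' ', @{$row}[3, 4]), "\n";                  # prints b4 b5
}
else {
    print "no row with key '$wanted'\n";
}
```

Storing each row as an array keeps the usual zero-based column indexes, so an array slice reads off any set of columns in one step. If keys can repeat, each hash value would need to hold a list of rows instead.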
> what is "OP" above?
OP eq "Original Poster". That would be you.
|