Author dealing with large files in perl
Tester

2005-05-15, 3:57 pm

Good day folks,

I'm looking for some help and hoping to get some tips back. Here's the
situation:

I have two large files, each containing a unique value, which is
the id. I have to go through each file and search for the
unique value in file1 and file2. If a value exists in file1 and is also found in
file2, I want to take both lines containing that value and merge them into one
single file. Also, I'd like to try an option to read the files
without using an array, as holding the data in memory may crash my
system due to memory limitations. Please advise.

<snip>
sub processf() {

unless (open(FILER, "$out1")) {
die ("cannot open input file out1\n");
}

unless (open(FILER2, "$out2")) {
die ("cannot open input file out2\n");
}

$newfile="/tmp/results-".getppid.".tmp";
open (FILEW,"+>$newfile"); # --> output file !!

while ($line=<FILER> ) {
chop($line);
($myid,$myup,$mydown) = split(' ',$line);
}
while ($item=<FILER2> ) {
chop($item);
($myid, $myup1, $mydown1) = split(" ",$item);

#this doesn't work..
print "$myid $myup $mydown $myid1 $myup1 $mydown1\n";
}
# this is what i want but only return one single value
print "$myid $myup $mydown $myid1 $myup1 $mydown1\n";


close(FILER);
close(FILER2);
close(FILEW);
}

Dean Arnold

2005-05-15, 8:56 pm

Tester wrote:
> Good day folks,
>
> I'm looking for some help and hoping to get some tips back. Here's the
> situation:
>
> I have two large files, each containing a unique value, which is
> the id. I have to go through each file and search for the
> unique value in file1 and file2. If a value exists in file1 and is also found in
> file2, I want to take both lines containing that value and merge them into one
> single file. Also, I'd like to try an option to read the files
> without using an array, as holding the data in memory may crash my
> system due to memory limitations. Please advise.
>
> <snip>
> sub processf() {
>

<snip>

Unless your script starts with "use Tk;",
you're probably asking the wrong group. Try
comp.lang.perl.misc, or better still www.perlmonks.org;
I think I recently saw a thread there regarding
this same issue.

Dean Arnold
Presicient Corp.
Ala Qumsieh

2005-05-16, 3:57 am

Tester wrote:

> I have two large files, each containing a unique value, which is


How large is "large"? In general, Perl places no limit on the size of
the files it can handle. It is limited only by the OS and the amount of
available memory.

> the id. I have to go through each file and search for the
> unique value in file1 and file2. If a value exists in file1 and is also found in
> file2, I want to take both lines containing that value and merge them into one
> single file. Also, I'd like to try an option to read the files
> without using an array, as holding the data in memory may crash my
> system due to memory limitations. Please advise.


Did you try it? Did it actually crash your system when you did?

> <snip>
> sub processf() {
>
> unless (open(FILER, "$out1")) {
> die ("cannot open input file out1\n");
> }


The more Perlish way to do this (which behaves the same as your
code, apart from printing the actual file name in the message) is:

open FILER, $out1 or die "Cannot open input file $out1\n";
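
As a variation on the line above, here is a minimal sketch using the
three-argument form of open with a lexical filehandle, and $! to report why
the open failed. The helper name open_input and the path are hypothetical,
chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with the OS error.
sub open_input {
    my ($path) = @_;
    # Three-argument open keeps the mode separate from the file name, so a
    # name beginning with '>' or '|' cannot change how the file is opened.
    open(my $fh, '<', $path)
        or die "Cannot open input file $path: $!\n";
    return $fh;
}

# Usage: a path that does not exist triggers the die, caught here by eval.
my $fh = eval { open_input('/no/such/dir/out1.txt') };
print "open failed: $@" unless $fh;
```

Note that "or" matters here: with "||" and no parentheses, the test would
bind to $out1 rather than to the result of open.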

> unless (open(FILER2, "$out2")) {
> die ("cannot open input file out2\n");
> }
>
> $newfile="/tmp/results-".getppid.".tmp";


Aside: You can replace getppid() with the builtin $$ variable.

> open (FILEW,"+>$newfile"); # --> output file !!
>
> while ($line=<FILER> ) {
> chop($line);


Use chomp() instead of chop(). It is safer in your case.
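
To illustrate the difference, a small sketch with made-up sample lines
(the "42 up down" data is an assumption, not taken from the real files):

```perl
use strict;
use warnings;

my $with_newline    = "42 up down\n";
my $without_newline = "42 up down";   # e.g. a last line lacking a newline

# chop() always removes the last character, whatever it is.
chop(my $chopped_a = $with_newline);
chop(my $chopped_b = $without_newline);   # loses the final "n" of the data

# chomp() removes only a trailing newline (strictly, the value of $/).
chomp(my $chomped_a = $with_newline);
chomp(my $chomped_b = $without_newline);  # left unchanged

print "chop:  [$chopped_a] [$chopped_b]\n";
print "chomp: [$chomped_a] [$chomped_b]\n";
```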

> ($myid,$myup,$mydown) = split(' ',$line);
> }


Here, you are done with your while() loop, and your three variables are
set to the first three non-space character sequences of the *LAST*
line in $out1. Every earlier line has been overwritten.
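
One way around the overwriting is to keep the fields of every line of the
first file in a hash keyed by id, then stream the second file one line at a
time. This is only a sketch under assumptions: each line looks like
"id up down", ids are unique within a file, and the data from one file fits
in memory. The name merge_by_id is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: join two whitespace-separated files on their first
# field and write the merged lines to $out_path. Returns the match count.
sub merge_by_id {
    my ($path1, $path2, $out_path) = @_;

    # Pass 1: remember the remaining fields of each file1 line under its id.
    my %rest_of;
    open(my $in1, '<', $path1) or die "Cannot open $path1: $!\n";
    while (my $line = <$in1>) {
        my ($id, @fields) = split ' ', $line;
        next unless defined $id;          # skip blank lines
        $rest_of{$id} = "@fields";
    }
    close($in1);

    # Pass 2: stream file2; print a merged line for every id seen in file1.
    open(my $in2, '<', $path2)    or die "Cannot open $path2: $!\n";
    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!\n";
    my $matches = 0;
    while (my $line = <$in2>) {
        my ($id, @fields) = split ' ', $line;
        next unless defined $id && exists $rest_of{$id};
        print {$out} "$id $rest_of{$id} @fields\n";
        $matches++;
    }
    close($in2);
    close($out) or die "Cannot close $out_path: $!\n";
    return $matches;
}
```

With the variables from the original post this would be called as
merge_by_id($out1, $out2, $newfile). Only the first file is held in memory;
if even that is too much, sorting both files by id and merging them line by
line is the usual alternative.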

> while ($item=<FILER2> ) {
> chop($item);
> ($myid, $myup1, $mydown1) = split(" ",$item);
>
> #this doesn't work..


What doesn't work? The print()? I bet that works :)

> print "$myid $myup $mydown $myid1 $myup1 $mydown1\n";
> }
> # this is what i want but only return one single value
> print "$myid $myup $mydown $myid1 $myup1 $mydown1\n";
>
>
> close(FILER);
> close(FILER2);
> close(FILEW);
> }


You need to give us more info if we are to help you. What are the
contents of the two files? How do you identify the "unique" values in
them? What do those values refer to, and what do they look like?
Show us an example of the two files and what you want the output file
to look like, and we might be better able to help you.

--Ala

John W. Krahn

2005-05-16, 8:57 am

Ala Qumsieh wrote:
> Tester wrote:
>
> Aside: You can replace getppid() with the builtin $$ variable.


getppid() returns the *parent* PID, while $$ contains the PID of the current
process, so the two are not interchangeable. Besides, you should use
File::Temp for temporary file names.
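
A short sketch of both points, assuming a Unix-like system where getppid()
is available. The "results-" prefix and ".tmp" suffix mirror the original
post; the rest of the name is generated by File::Temp.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# $$ is this process's PID; getppid() is the PID of the process that
# started it.
printf "own pid: %d, parent pid: %d\n", $$, getppid();

# tempfile() creates and opens a uniquely named file in one step, so two
# runs cannot clash on the name. The X's in the template are replaced with
# random characters; TMPDIR => 1 places the file in the system temp dir.
my ($fh, $filename) = tempfile(
    'results-XXXXXX',
    TMPDIR => 1,
    SUFFIX => '.tmp',
    UNLINK => 1,      # remove the file when the program exits
);
print {$fh} "merged output would go here\n";
close($fh) or die "Cannot close $filename: $!\n";
print "wrote $filename\n";
```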

>
> Use chomp() instead of chop(). It is safer in your case.
>

If you are using split(' ',$line) then using chomp() as well is redundant, as
split with ' ' treats *all* whitespace, including the trailing newline, as a
separator.
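
A small demonstration with a made-up input line (the sample data is an
assumption), contrasting the special ' ' pattern with a literal
single-space regex:

```perl
use strict;
use warnings;

my $line = "  42   up\tdown\n";   # leading spaces, mixed separators, newline

# The special pattern ' ' skips leading whitespace and splits on any run of
# whitespace; the trailing newline counts as whitespace, so no field keeps it.
my ($myid, $myup, $mydown) = split ' ', $line;
print "[$myid] [$myup] [$mydown]\n";

# A literal single-space regex splits only on single spaces, so the newline
# stays attached to the last field.
my @fields = split / /, "42 up down\n";
print scalar(@fields), " fields, last one ",
      ($fields[-1] =~ /\n\z/ ? "still has" : "has no"), " newline\n";
```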



John
--
use Perl;
program
fulfillment
Copyright 2008 codecomments.com