Code Comments


dealing with large files in perl
Good day folks,

I'm looking for some help and hoping to get some tips back. Here's the
situation:

I have two large files. Each line in both files carries a unique value,
the id. I have to go through the files and, whenever an id from file1 is
also found in file2, take the two lines containing that id and merge
them into one single output file. I'd also like a way to read the files
without holding the data in an array, as keeping it all in memory may
crash my system due to memory limitations. Please advise.

<snip>
sub processf() {

unless (open(FILER, "$out1")) {
die ("cannot open input file out1\n");
}

unless (open(FILER2, "$out2")) {
die ("cannot open input file out2\n");
}

$newfile="/tmp/results-".getppid.".tmp";
open (FILEW,"+>$newfile");  # --> output file !!

while ($line=<FILER> ) {
chop($line);
($myid,$myup,$mydown) = split(' ',$line);
}
while ($item=<FILER2> ) {
chop($item);
($myid, $myup1, $mydown1) = split(" ",$item);

#this doesn't work..
print "$myid $myup $mydown $myid1 $myup1 $mydown1\n";
}
# this is what i want but only return one single value
print "$myid $myup $mydown $myid1 $myup1 $mydown1\n";


close(FILER);
close(FILER2);
close(FILEW);
}


Tester
05-15-05 08:57 PM


Re: dealing with large files in perl
Tester wrote:
> Good day folks,
>
> I'm looking for some help and hoping to get some tips back. here's the
> situation:
>
> i've two large files containing one unique value in each file which is
> the id however, i have to go through each file and search for the
> unique value in file1 and file2. if value exist in file1 and found in
> file2 and take both lines containing that value and merge into one
> single file. also, i'd like to try out option to read file in memory
> not using an array as the holding the data into memory may crash my
> system due to memory limitation. Please advise.
>
> <snip>
> sub processf() {
>
<snip>

Unless your script starts with "use Tk;",
you're probably asking the wrong group. Try
c.l.p.misc, or better still www.perlmonks.org,
I think I recently saw a thread there regarding
this same issue.

Dean Arnold
Presicient Corp.

Dean Arnold
05-16-05 01:56 AM


Re: dealing with large files in perl
Tester wrote:

> i've two large files containing one unique value in each file which is

How large is "large"? In general, Perl has no limitation on the sizes of
the files it can handle. It is only limited by the OS and the amount of
available memory.

> the id however, i have to go through each file and search for the
> unique value in file1 and file2. if value exist in file1 and found in
> file2 and take both lines containing that value and merge into one
> single file. also, i'd like to try out option to read file in memory
> not using an array as the holding the data into memory may crash my
> system due to memory limitation. Please advise.

Did you try it? Did it actually crash your system when you did?

> <snip>
> sub processf() {
>
> unless (open(FILER, "$out1")) {
> die ("cannot open input file out1\n");
> }

The more Perlish way to do this (equivalent to your code, except that
the message now includes the file name) is:

open FILER, $out1 or die "Cannot open input file $out1\n";
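
For readers who also want the reason for the failure in the message, here
is a minimal sketch of the same idiom using the three-argument form of
open, a lexical filehandle, and $!. The sub name open_input is
hypothetical and not part of the posted script.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with the OS error.
sub open_input {
    my ($path) = @_;
    # Three-argument open keeps the mode separate from the file name.
    # "or" binds loosely, so die runs only if open itself fails.
    open(my $fh, '<', $path)
        or die "Cannot open input file $path: $!\n";
    return $fh;
}
```

It would be called as my $fh = open_input($out1); and the handle is closed
automatically when $fh goes out of scope.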

> unless (open(FILER2, "$out2")) {
> die ("cannot open input file out2\n");
> }
>
> $newfile="/tmp/results-".getppid.".tmp";

Aside: You can replace getppid() with the builtin $$ variable.

> open (FILEW,"+>$newfile");  # --> output file !!
>
>         while ($line=<FILER> ) {
>                 chop($line);

Use chomp() instead of chop(). It is safer in your case.
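
A small illustration of why chomp() is the safer choice: chop() always
removes the last character, whatever it is, while chomp() removes only a
trailing newline. The helper names chopped and chomped are made up for
this sketch.

```perl
use strict;
use warnings;

# chop removes the last character unconditionally;
# chomp removes only a trailing newline (the value of $/).
sub chopped { my ($s) = @_; chop $s;  return $s; }
sub chomped { my ($s) = @_; chomp $s; return $s; }

# A final line with no trailing newline shows the difference.
print chopped("42 10 20"),   "\n";   # loses the final "0"
print chomped("42 10 20"),   "\n";   # left unchanged
print chomped("42 10 20\n"), "\n";   # only the newline is removed
```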

>                 ($myid,$myup,$mydown) = split(' ',$line);
>         }

Here, you are done with your while() loop, and your three variables are
set to the first three non-space character sequences of the *LAST*
line in $out1.
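
One possible way around this, sketched under assumptions the post does not
confirm: each line is whitespace-separated with the id in the first field
(as the split calls suggest), ids are unique within a file, and the ids of
one file fit in memory. The first input is indexed in a hash keyed by id
and the second is streamed line by line. The name merge_by_id and its
filehandle arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: join two inputs on their first field.
# $in1, $in2 and $out are already-open filehandles.
sub merge_by_id {
    my ($in1, $in2, $out) = @_;

    # Index the first input by id; only this one is held in memory.
    my %first;
    while (my $line = <$in1>) {
        my @f = split ' ', $line;
        next if @f < 3;                      # skip blank or short lines
        $first{ $f[0] } = [ @f[1, 2] ];
    }

    # Stream the second input one line at a time and look up each id.
    my $matches = 0;
    while (my $line = <$in2>) {
        my @f = split ' ', $line;
        next if @f < 3 or not exists $first{ $f[0] };
        my ($up, $down) = @{ $first{ $f[0] } };
        print {$out} "$f[0] $up $down $f[0] $f[1] $f[2]\n";
        $matches++;
    }
    return $matches;
}
```

Merged lines come out in the order of the second input. If even one file's
ids are too many for a hash, sorting both files by id and walking them in
step, or tying the hash to a DBM file, would be alternatives to consider.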

>                 while ($item=<FILER2> ) {
>                 chop($item);
>                 ($myid, $myup1, $mydown1) = split(" ",$item);
>
>                 #this doesn't work..

What doesn't work? The print()? I bet that works :)

>                 print "$myid $myup $mydown $myid1 $myup1 $mydown1\n";
>         }
>                 # this is what i want but only return one single value
>                 print "$myid $myup $mydown $myid1 $myup1 $mydown1\n";
>
>
> close(FILER);
> close(FILER2);
> close(FILEW);
> }

You need to give us more info if we are to help you. What are the
contents of the two files? How do you identify the "unique" values in
them? What do those values refer to, and what do they look like?
Show us an example of the two files and what you want the output file
to look like, and we might be able to help you better.

--Ala


Ala Qumsieh
05-16-05 08:57 AM


Re: dealing with large files in perl
Ala Qumsieh wrote:
> Tester wrote: 
>
> Aside: You can replace getppid() with the builtin $$ variable.

getppid() returns the *parent* PID, while $$ contains the PID of the
current process. Besides, you should use File::Temp for temporary file
names.
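
A short sketch of both points, assuming a Unix-like system where getppid()
is available. The helper name new_results_file and the results-XXXXXX
template are illustrative choices, not taken from the posted script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# $$ is this process's PID; getppid() is the PID of the process
# that started it, so the two are not interchangeable.
printf "own pid: %d, parent pid: %d\n", $$, getppid();

# Hypothetical helper: create a uniquely named results file in the
# system temp directory instead of building a name from a PID.
sub new_results_file {
    my ($fh, $filename) =
        tempfile('results-XXXXXX', SUFFIX => '.tmp', TMPDIR => 1);
    return ($fh, $filename);
}
```

tempfile() returns an already-open filehandle together with the name, which
avoids the race between choosing a name and opening the file.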

 
>
> Use chomp() instead of chop(). It is safer in your case.
> 

If you are using split(' ',$line) then using chomp() as well is redundant,
as split(' ') discards leading and trailing whitespace, including the
newline.
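
A quick check of that behaviour; the helper name fields is hypothetical and
simply wraps the same split call used in the posted code.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the split call in the posted code.
sub fields {
    my ($line) = @_;
    return split ' ', $line;
}

# No chomp beforehand: leading spaces and the trailing newline are
# both dropped by split ' '.
my @f = fields("  42 10 20\n");
print join('|', @f), "\n";   # prints 42|10|20
```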



John
--
use Perl;
program
fulfillment

John W. Krahn
05-16-05 01:57 PM


PerlTk archive
