Home > Archive > PERL Beginners > January 2006 > Need some help and direction










 

Author Need some help and direction
Daniel Gladstone

2006-01-10, 4:02 am

I thought this would be easy but I cannot get it to work. Can someone
please help me?

Problem: I have a file of 7.5 million records that are pipe-delimited; the
first field is a record number. I want to search for around 10 records with
a specific record number. Records that match should go to a secondary
output, and records that do not match should go to the primary output.

Code so far (please don't laugh, I am a newbie):

#!/usr/bin/perl -w
use strict;
# pullbad filename fieldtocount
die "usage: count <filename> <count fldnum> \n" unless (@ARGV == 2);
my ($filename, $cntfldnum) = @ARGV;
$cntfldnum--; # zero-base fldnum
my $records = 0; # count records processed

open(IN, $filename) or die "could not open $filename $!\n";
my %count = (); # hash the count key and value here
while (<IN>) {
    $records++;
    my @rec = split(/\|/, $_);
    #$count{$rec[$cntfldnum]}++;
    if ($rec[$cntfldnum] =~ m/\D/g) {
        print STDERR $rec[$cntfldnum] . " -> " . $rec[0] . " record # = " . $records . "\n";
    }
}
close(IN);



Daniel Gladstone
Email: dgeehot@hotmail.com


JupiterHost.Net

2006-01-10, 4:02 am



Daniel Gladstone wrote:
> I thought this would be easy but I cannot get it to work. Can someone
> please help me?
>
> Problem: I have a file of 7.5 million records that are pipe-delimited; the
> first field is a record number. I want to search for around 10 records with
> a specific record number. Records that match should go to a secondary
> output, and records that do not match should go to the primary output.
>
> Code so far (please don't laugh, I am a newbie):


You used -w and strict, that's awesome! I might recommend Damian
Conway's "Perl Best Practices" to help you get started on the right foot :)

> #!/usr/bin/perl -w
> use strict;
> # pullbad filename fieldtocount
> die "usage: count <filename> <count fldnum> \n" unless (@ARGV == 2);
> my ($filename, $cntfldnum) = @ARGV;
> $cntfldnum--; # zero-base fldnum
> my $records = 0; # count records processed
>
> open(IN, $filename) or die "could not open $filename $!\n";
> my %count = (); # hash the count key and value here
> while (<IN> ) {
> $records++;
> my @rec=split(/\|/,$_);
> #$count{$rec[$cntfldnum]}++;
> if ($rec[$cntfldnum] =~ m/\D/g) {
> print STDERR $rec[$cntfldnum] . " -> " . $rec[0] . " record # = " .
> $records . "\n";
> }


So what seems to be the trouble?
usenet@DavidFilmer.com

2006-01-10, 4:02 am

Daniel Gladstone wrote:
> if ($rec[$cntfldnum] =~ m/\D/g) {


I think you want this to say something like:
if ($rec[0] == $cntfldnum) {

@rec is an array (obtained from your split) and $rec[0] is the first
element of that array (i.e., your first field, which is your record
number). The regexp you specified (m/\D/g) will match any string which
contains at least one non-digit character (probably NOT what you
intended). Instead, you want to know if the first field is equal (==)
to the value you specified ($cntfldnum). For that to work, the second
argument has to be the record number itself, so the "$cntfldnum--" line
would have to go as well. This assumes that your record numbers are
actually numeric (use 'eq' instead of '==' if they are alphanumeric).
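
A minimal sketch of that comparison, in case it helps. It assumes the second command-line argument is reinterpreted as the record number to pull out; the sub name route_records and the in-memory sample data are made up for illustration, and 'eq' is used so that alphanumeric record numbers also work (the text has to match exactly, so "0100" and "100" would differ).

```perl
use strict;
use warnings;

# Copy each line from $in to $secondary if its first pipe-delimited field
# equals $wanted, otherwise to $primary. Returns the number of matches.
# Lines are written as they are read, so a 7.5-million-line file is never
# held in memory.
sub route_records {
    my ($in, $wanted, $primary, $secondary) = @_;
    my $matched = 0;
    while (my $line = <$in>) {
        chomp(my $rec = $line);               # work on a newline-free copy
        my ($recnum) = split /\|/, $rec, 2;   # only the first field is needed
        if (defined $recnum && $recnum eq $wanted) {
            print {$secondary} $line;
            $matched++;
        }
        else {
            print {$primary} $line;
        }
    }
    return $matched;
}

# Demo on in-memory data (hypothetical records, not from the real file).
my $data = "100|alice|x\n200|bob|y\n100|carol|z\n";
my ($primary_out, $secondary_out) = ('', '');
open my $in,        '<', \$data          or die "cannot read sample data: $!";
open my $primary,   '>', \$primary_out   or die "cannot open primary: $!";
open my $secondary, '>', \$secondary_out or die "cannot open secondary: $!";

my $matched = route_records($in, '100', $primary, $secondary);
close $_ for $in, $primary, $secondary;

print "matched $matched record(s)\n";
print "secondary:\n$secondary_out";
print "primary:\n$primary_out";
```

With a real file you would open $in on the filename and pass \*STDOUT and \*STDERR (or two output files) as the primary and secondary handles.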

You may have other problems as well that I didn't notice as I glanced
at your code, but that jumped out at me.

Your code seems to be looking for "around 10 records" which all have
the SAME record number, BTW. I hope that's what you really want to do.
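
If it turns out you actually have around 10 DIFFERENT record numbers to pull, a hash lookup is one common way to do it. This is only a sketch under that assumption; the sub names make_wanted and is_wanted and the sample record numbers are hypothetical.

```perl
use strict;
use warnings;

# Build a lookup set (hash reference) from a list of wanted record numbers.
sub make_wanted {
    my %wanted = map { $_ => 1 } @_;
    return \%wanted;
}

# Return 1 if the first pipe-delimited field of $line is in the set, else 0.
sub is_wanted {
    my ($wanted, $line) = @_;
    chomp $line;                           # $line is a copy, caller's data is untouched
    my ($recnum) = split /\|/, $line, 2;   # only the first field is needed
    return (defined $recnum && exists $wanted->{$recnum}) ? 1 : 0;
}

# Demo with made-up record numbers.
my $wanted = make_wanted(qw(1001 1005 1042));
for my $line ("1001|a\n", "1002|b\n", "1042|c\n") {
    my $dest = is_wanted($wanted, $line) ? 'secondary' : 'primary';
    print "$dest: $line";
}
```

The hash lookup costs about the same per line whether you want 10 record numbers or 10,000, which matters on a file this size.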

Oh, also, another poster praised you for "-w", but current Perl Best
Practices recommend "use warnings;" instead of "-w" (they aren't quite
the same).
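
To illustrate the difference: "-w" turns warnings on globally (including inside any modules you load), while "use warnings;" is lexically scoped, so you can switch a category off for a single block. A small sketch; the variable names are made up, and the __WARN__ handler is only there so the effect can be seen.

```perl
use strict;
use warnings;    # lexically scoped: applies to this file only

# Collect warnings instead of printing them, so the effect is visible.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $missing;     # deliberately left undefined

{
    # Warnings in this category are switched off for just this block...
    no warnings 'uninitialized';
    my $quiet = "field: " . $missing;
}

# ...and are back in force outside it.
my $loud = "field: " . $missing;

print "caught ", scalar(@caught), " warning(s)\n";
print $caught[0] if @caught;
```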

--
http://DavidFilmer.com

Shawn Corey

2006-01-10, 4:02 am

Daniel Gladstone wrote:
> I thought this would be easy but I cannot get it to work. Can someone
> please help me?
>
> Problem: I have a file of 7.5 million records that are pipe-delimited; the
> first field is a record number. I want to search for around 10 records with
> a specific record number. Records that match should go to a secondary
> output, and records that do not match should go to the primary output.
>
> Code so far (please don't laugh, I am a newbie):
>
> #!/usr/bin/perl -w
> use strict;
> # pullbad filename fieldtocount
> die "usage: count <filename> <count fldnum> \n" unless (@ARGV == 2);
> my ($filename, $cntfldnum) = @ARGV;
> $cntfldnum--; # zero-base fldnum
> my $records = 0; # count records processed
>
> open(IN, $filename) or die "could not open $filename $!\n";
> my %count = (); # hash the count key and value here
> while (<IN> ) {
> $records++;
> my @rec=split(/\|/,$_);
> #$count{$rec[$cntfldnum]}++;
> if ($rec[$cntfldnum] =~ m/\D/g) {
> print STDERR $rec[$cntfldnum] . " -> " . $rec[0] . " record # = " .
> $records . "\n";
> }


Your code doesn't seem to match the problem you stated. What does
$cntfldnum contain? If you are searching, as you stated, for a record
number, then the match pattern should be applied to column zero:

$rec[0] =~ /$pattern/

Could you please clarify your intent?
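
For what it's worth, here is a rough sketch of such a match on column zero. It assumes $pattern is a regular expression supplied by you; the sub name first_field_matches and the sample lines are hypothetical. The pattern is anchored with \A and \z because an unanchored /100/ would also match record number 1100.

```perl
use strict;
use warnings;

# Return 1 if the first pipe-delimited field of $line matches $pattern
# in its entirety, else 0.
sub first_field_matches {
    my ($line, $pattern) = @_;
    chomp $line;                    # $line is a copy, caller's data is untouched
    my @rec = split /\|/, $line;
    return 0 unless @rec;           # blank line: no fields at all
    return $rec[0] =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

# Demo with made-up lines and patterns.
for my $case (["100|x\n", '100'], ["1100|x\n", '100'], ["1005|x\n", '100\d']) {
    my ($line, $pattern) = @$case;
    my $verdict = first_field_matches($line, $pattern) ? 'match' : 'no match';
    print "/$pattern/ against $line  => $verdict\n";
}
```

If the record number is a literal string rather than a pattern, pass it through quotemeta() first, or simply compare with 'eq'.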


--

Just my 0.00000002 million dollars worth,
--- Shawn

"Probability is now one. Any problems that are left are your own."
SS Heart of Gold, _The Hitchhiker's Guide to the Galaxy_

* Perl tutorials at http://perlmonks.org/?node=Tutorials
* A searchable perldoc is available at http://perldoc.perl.org/

Copyright 2009 codecomments.com