

Home > Archive > PERL Beginners > December 2007 > How to search a CDR file for duplicates and delete them










 

Author How to search a CDR file for duplicates and delete them
Henrik Nielsen

2007-12-10, 7:59 am

Hi

I'm almost totally new to Perl! :-)
I need to write a program that searches a CDR file for duplicate lines
and then deletes them.

This is what I have found so far by reading the Perl documentation and
this newsgroup, but I need a little more help.


I found this program in this newsgroup:

---------------
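#!/usr/bin/perl
use strict;
use warnings;

my $in_file  = '/path/my_in_file';
my $out_file = 'scripted_out_file';

open my $in,  '<', $in_file  or die "Cannot open '$in_file': $!";
open my $out, '>', $out_file or die "Cannot open '$out_file': $!";

my %hash;

while ( <$in> ) {
    # Build a key from comma-separated fields 2, 3, 6 and 7 (counted from 0).
    my $key = join ',', ( split /,/ )[ 2, 3, 6, 7 ];
    # Print the line only the first time its key is seen.
    print $out $_ unless $hash{ $key }++;
}

close $out or die "Cannot close '$out_file': $!";
close $in;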
---------------

This program should take out the cells 2, 3, 6 and 7 in a file split by
comma. My CDR file is split by space.

How can I use 'readline EXPR'?
Any help?... it's probably pretty simple :-)

...//Henrik

Jeff Pang

2007-12-10, 7:01 pm

On Dec 10, 2007 6:58 PM, Henrik Nielsen <quercus1974@gmail.com> wrote:
> Hi
>
> I'm almost totally new to Perl! :-)
> I need to write a program that search a CDR file for duplicate lines and
> then delete them.
>
> This is what I have found out by reading in the Perl documentation and
> this newsgroup, but I need a little more help.
>


Starting from the script you provided, you only need to replace two
lines with one to meet your new requirement.

change:

> my $key = join ',', ( split /,/ )[ 2, 3, 6, 7 ];
> print $out $_ unless $hash{ $key }++;


to:

print $out $_ unless $hash{$_}++;
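
To see why that single line is enough, here is a minimal sketch of the same idea run on an in-memory list instead of a file. The sample records, @lines and @unique are made up for illustration and are not from the thread; only the `unless $seen{...}++` idiom is the one shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records standing in for lines read from the CDR file.
my @lines = (
    "2007-12-10 07:59 4511111111 4522222222 60\n",
    "2007-12-10 08:01 4533333333 4544444444 12\n",
    "2007-12-10 07:59 4511111111 4522222222 60\n",   # exact duplicate of the first
);

my %seen;
my @unique;

for my $line (@lines) {
    # $seen{$line}++ returns the old count: 0 (false) the first time a line
    # is seen, so only the first occurrence of each line is kept.
    push @unique, $line unless $seen{$line}++;
}

print @unique;
```

Because the whole line is the hash key, two records count as duplicates only if they match character for character, including trailing whitespace.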


Good luck.

Dr.Ruud

2007-12-10, 7:01 pm

Henrik Nielsen schreef:

> I need to write a program that search a CDR file for duplicate lines
> and then delete them.
> [...]
> while ( <$in> ) {
> my $key = join ',', ( split /,/ )[ 2, 3, 6, 7 ];
> print $out $_ unless $hash{ $key }++;
> }
> [...]
> ---------------
> This program should take out the cells 2, 3, 6 and 7 in a file split
> by comma. My CDR file is split by space.


For space (SP) separators specifically, use:

my $key = join ' ', ( split / +/)[ 2, 3, 6, 7 ];


Often this is better:

my $key = join ' ', ( split ' ')[ 2, 3, 6, 7 ];


(a ' ' with split works almost like /\s+/, except that leading
whitespace is skipped; see `perldoc -f split` for the details)
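
A small sketch of the difference, assuming a made-up line that starts with a space (the variable names are only for illustration). With / +/ the leading space produces an empty first field, which would shift the indices 2, 3, 6, 7 by one; with ' ' it does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line with a leading space and a doubled space between fields.
my $line = " a  b c";

my @by_regex = split / +/, $line;   # leading space yields an empty first field
my @by_space = split ' ', $line;    # leading whitespace is skipped

print scalar(@by_regex), " fields with / +/\n";
print scalar(@by_space), " fields with ' '\n";
```

For this sample line the first split returns four fields ("", "a", "b", "c") and the second returns three ("a", "b", "c"). Note also that / +/ splits on spaces only, while ' ' splits on any whitespace, tabs included.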


> How can I use 'readline EXPR'?


The "<$in>" is already a readline(): it is equivalent to readline($in).
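
As a quick illustration, both forms can be used on the same handle. This sketch reads from an in-memory string (an assumption made so that it runs without a real CDR file); the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory filehandle standing in for the real input file.
my $data = "first\nsecond\n";
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my $via_operator = <$in>;           # angle-bracket form reads one line
my $via_function = readline($in);   # function form reads the next line

close $in;

print $via_operator, $via_function;
```

In scalar context each call returns the next line, so the two variables end up holding the first and second line respectively.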

--
Affijn, Ruud

"Gewoon is een tijger."

Henrik Nielsen

2007-12-10, 10:03 pm

Jeff Pang wrote:
> change:
>
>
> to:
>
> print $out $_ unless $hash{$_}++;
>
>
> Good luck.


How simple! It works :-)
Thx!

The next step is to work with file names and dates.

.../Henrik







