Removing duplicate records (PERL Beginners, August 2007)
Mihir Kamdar

2007-08-01, 7:59 am

Hi,

Need your help with the following:-

I have a csv file having many records.

I want to remove duplicate records. But the record might not be entirely
duplicate. I only have to check if the 2nd, 3rd, 7th and 8th field of a
record is same as the earlier records. If it is same, then remove the
previous or the last entry. I have written something like below to achieve
this.

#!/usr/bin/perl

open(FILE,"</home/user71/RangerDatasource/Customization/TelekomMalaysia/Scripts/Tests/cprogs/files/sample1");

my $line;
my %hash;
my @file;
while ($line=readline(FILE))
{
my @cdr=split (/,/, $line) ;
$hash{$cdr[2],$cdr[3],$cdr[6],$cdr[7]}="@cdr"; #Add some more cdr key fields if u want.
}
close FILE ;
open my $f, '>', 'outputsample1' or
die 'Failed to open outputsample1';
while (($key, $value) = each %hash)
{

print $f $value."\n";

}
close $f;

But I am not getting the desired result.

Please help me out.

Thanks,
Mihir
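
A minimal sketch of how the key fields in the script above are selected. Perl arrays are zero-based, so $cdr[2], $cdr[3], $cdr[6] and $cdr[7] pick the 3rd, 4th, 7th and 8th fields, one position off the "2nd, 3rd" in the description; which of the two is intended is not stated in the post. The helper name key_fields and the sample record are made up for illustration, and the sketch assumes plain comma-separated data with no quoted commas.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the fields at the given zero-based positions.
# Assumes simple CSV with no quoted or embedded commas.
sub key_fields {
    my ($line, @idx) = @_;
    chomp $line;
    my @cdr = split /,/, $line, -1;   # -1 keeps trailing empty fields
    return @cdr[@idx];
}

my $record = "f1,f2,f3,f4,f5,f6,f7,f8,f9\n";   # made-up record, f1 is field 1

# Indices 2,3,6,7 (as in the script) select fields 3, 4, 7 and 8.
print join('|', key_fields($record, 2, 3, 6, 7)), "\n";   # f3|f4|f7|f8

# Indices 1,2,6,7 would select fields 2, 3, 7 and 8.
print join('|', key_fields($record, 1, 2, 6, 7)), "\n";   # f2|f3|f7|f8

# $hash{$x,$y} is shorthand for $hash{ join($;, $x, $y) }.
my %hash;
my ($x, $y) = key_fields($record, 2, 3);
$hash{$x, $y} = 1;
my $same = exists $hash{ join($;, 'f3', 'f4') };
my $msg  = $same ? 'same key' : 'different key';
print "$msg\n";                                            # same key
```

Because the multi-field subscript is just a joined string, two records collide in %hash exactly when all of the selected fields match.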

yaron@kahanovitch.com

2007-08-01, 7:59 am

Hi,


You did not open the output file correctly.

You are trying to store the file handle in a variable.
If you wish to do so, open the file and store a reference to the file handle.

Instead of using
open my $f, '>', 'outputsample1' or
die 'Failed to open outputsample1';
while (($key, $value) = each %hash)
{

print $f $value."\n";

}
close $f;


Try
open my F, '>outputsample1' or
die 'Failed to open outputsample1';
my $f = \*F;
while (($key, $value) = each %hash)
{

print F $value."\n";

}
close $f;



Best regards,

Yaron Kahanovitch

yaron@kahanovitch.com

2007-08-01, 7:59 am

Small correction....

Try
open my F, '>outputsample1' or
die 'Failed to open outputsample1';
my $f = \*F;
while (($key, $value) = each %hash)
{

print $f $value."\n";

}
close $f;



Yaron Kahanovitch

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/



Rob Dixon

2007-08-01, 7:59 am

yaron@kahanovitch.com wrote:
>
> Small correction....
>
> Try
> open my F, '>outputsample1' or
> die 'Failed to open outputsample1';
> my $f = \*F;
> while (($key, $value) = each %hash)
> {
>
> print $f $value."\n";
>
> }
> close $f;


Yaron.

It would help if you tested your code before you published it. This is
nonsense. This part of Mihir's code was fine as it was, while yours doesn't
even compile.

Rob

Mihir Kamdar

2007-08-01, 7:59 am

Hi,

The result is ok, but one big problem. My input file is a csv file. Its
records are like:
2007/02/26 09:38:03,999,+60320840888,+60123773138,,1,5,2007/02/26 09:37:58,,,,2,1,0,7,1,3,1,,0,4,+60320840888,,BRT70607,NOKIA_160_ISDN,,KL,KL
2007/02/26 09:38:05,999,+60320848888,+60326931722,,1,21,2007/02/26 09:37:44,,,,2,18058788,0,7,1,3,1,,0,4,+60320848888,,TR370001,NOKIA_160_ISDN,,KL,KL
2007/02/26 09:37:48,999,+60320937574,+60192805588,,1,1,2007/02/26 09:37:47,,,,2,20272626,0,7,1,3,1,,0,4,+60320937574,,BRT70657,NOKIA_160_ISDN,,KL,KL
2007/02/26 09:37:52,999,+60320923505,+60360753333,,1,20,2007/02/26 09:37:32,,,,2,22904137,0,7,1,3,1,,0,4,+60320923505,,RCT70749,NOKIA_160_ISDN,,KL,KKB

But the output I get is not comma separated but space separated:-
2007/02/26 10:04:36 999 +60320930016 +60122966096 1 4 2007/02/26 10:04:32 2 20275468 0 7 1 3 1 0 4 +60320930016 RCT70544 NOKIA_160_ISDN KL KL
2007/02/26 09:48:28 999 +60320870666 +60371180497 1 250 2007/02/26 09:41:38 2 20275933 0 7 1 3 1 0 4 +60320870666 RCT70803 NOKIA_160_ISDN KL KL
2007/02/26 10:06:49 999 +60320922115 +1800879415 1 113 2007/02/26 10:04:16 5 20275921 0 7 1 3 1 0 2 +60320922115 BRT70630 NOKIA_160_ISDN KL

This is unacceptable. What changes do I have to make to the script to get
comma separated output?

Thanks,
Mihir

Mr. Shawn H. Corey

2007-08-01, 7:59 am

Mihir Kamdar wrote:
> But I am not getting the desired result.


What are the desired results? How is your output different from what you expect?

--
Just my 0.00000002 million dollars worth,
Shawn

"For the things we have to learn before we can do them, we learn by doing them."
Aristotle

Mr. Shawn H. Corey

2007-08-01, 7:59 am

Mihir Kamdar wrote:
> But the output I get is not comma seperated but space seperated:-


Change this line:
$hash{$cdr[2],$cdr[3],$cdr[6],$cdr[7]}="@cdr"; #Add some more cdr key fields if u want.

To:
$hash{$cdr[2],$cdr[3],$cdr[6],$cdr[7]}=$line; #Add some more cdr key fields if u want.
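
A small sketch of why the change above matters: interpolating an array in a double-quoted string joins its elements with the list separator $" (a space by default), whereas the original line, or an explicit join on a comma, keeps the CSV form. The sample record is made up. Note also that a line read with readline still carries its newline, so printing the stored value followed by "\n" may produce blank lines unless the line is chomped first.

```perl
use strict;
use warnings;

my $line = "x,,y,z\n";             # made-up record with one empty field
chomp(my $copy = $line);           # drop the trailing newline from a copy
my @cdr = split /,/, $copy, -1;    # -1 keeps trailing empty fields

# Interpolating an array joins its elements with $" (a space by default).
my $spaced = "@cdr";               # x, empty, y, z joined by spaces

# Joining explicitly with a comma rebuilds the CSV form.
my $joined = join ',', @cdr;

print "interpolated: <$spaced>\n";   # interpolated: <x  y z>
print "joined:       <$joined>\n";   # joined:       <x,,y,z>
print "original:     <$copy>\n";     # original:     <x,,y,z>
```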


--
Just my 0.00000002 million dollars worth,
Shawn

"For the things we have to learn before we can do them, we learn by doing them."
Aristotle

yaron@kahanovitch.com

2007-08-01, 7:59 am

Hi,

The code was tested..... and it compiles for me


Yaron




Mihir Kamdar

2007-08-01, 7:59 am

Thanks Shawn, Yaron, Rob....

Thanks a lot for your prompt response...

The solution works as desired...



Rob Dixon

2007-08-01, 7:00 pm

yaron@kahanovitch.com wrote:
> Hi,
>
> The code was tested..... and it compiles for mee



What version of Perl are you using that accepts this?

open my F, '>outputsample1' or die 'Failed to open outputsample1';

Rob

John W. Krahn

2007-08-01, 7:00 pm

Mihir Kamdar wrote:
> Hi,


Hello,

> But I am not getting the desired result.


You don't need two loops for that, just one:

#!/usr/bin/perl

my $in_file =
'/home/user71/RangerDatasource/Customization/TelekomMalaysia/Scripts/Tests/cprogs/files/sample1';

open my $in, '<', $in_file or die "Cannot open '$in_file' $!";
open my $out, '>', 'outputsample1' or die "Failed to open outputsample1 $!";

my %hash;

while ( <$in> ) {
    my $key = join ',', ( split /,/ )[ 2, 3, 6, 7 ];
    print $out $_ unless $hash{ $key }++;
}

close $out;
close $in;

__END__



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Mr. Shawn H. Corey

2007-08-01, 7:00 pm

John W. Krahn wrote:
> Mihir Kamdar wrote:
>
> You don't need two loops for that, just one:


Two loops are required since the specification is to print the last entry (that's a duplicate) in the file, not the first.
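
To make the two readings concrete, here is a hedged sketch of both: keeping the first record per key needs one pass, while keeping the last record per key (and still writing records in input order) needs a second pass. The helper names key_of, keep_first, keep_last and ids, and the sample records, are hypothetical; the sketch assumes every record has at least eight plain comma-separated fields.

```perl
use strict;
use warnings;

# Hypothetical helper: key on zero-based fields 2, 3, 6 and 7, as in the thread.
sub key_of {
    my ($line) = @_;
    chomp $line;
    my @f = split /,/, $line, -1;
    return join ',', @f[ 2, 3, 6, 7 ];
}

# One pass: keep the first record seen for each key.
sub keep_first {
    my %seen;
    return grep { !$seen{ key_of($_) }++ } @_;
}

# Two passes: record the position of the last record for each key,
# then return only those records, still in input order.
sub keep_last {
    my @lines = @_;
    my %last;
    $last{ key_of( $lines[$_] ) } = $_ for 0 .. $#lines;
    return map { $lines[$_] }
           grep { $last{ key_of( $lines[$_] ) } == $_ } 0 .. $#lines;
}

# Helper for display: first field of each record, space separated.
sub ids {
    return join ' ', map { my @f = split /,/, $_; $f[0] } @_;
}

my @records = (
    "r1,x,A,B,x,x,C,D,x\n",
    "r2,x,E,F,x,x,G,H,x\n",
    "r3,x,A,B,x,x,C,D,x\n",   # same key fields as r1
);

print "keep first: ", ids( keep_first(@records) ), "\n";   # keep first: r1 r2
print "keep last:  ", ids( keep_last(@records) ),  "\n";   # keep last:  r2 r3
```

Storing whole lines in a hash and walking it with each, as in the original script, also keeps the last record per key, but the output order is then whatever order the hash returns.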

--
Just my 0.00000002 million dollars worth,
Shawn

"For the things we have to learn before we can do them, we learn by doing them."
Aristotle

John W. Krahn

2007-08-01, 7:00 pm

Mr. Shawn H. Corey wrote:
> John W. Krahn wrote:
>
> Two loops are required since the specification is to print the last
> entry (that's a duplicate) in the file, not the first.


Are you sure that "remove the previous or the last entry" translates into
"print the last entry (that's a duplicate)"? Did you check with the OP on that?



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Chas Owens

2007-08-02, 7:00 pm

On 8/1/07, yaron@kahanovitch.com <yaron@kahanovitch.com> wrote:
snip
> The code was tested..... and it compiles for mee

snip

I seriously doubt that the code you sent in the email was the code you
tested, then. You cannot, in any version of Perl that I know, declare
an old-style file handle with the my function. In fact, saying
open my F, '>outputsample1' or
is equivalent to saying
open F->my('>outputsample1') or
Which is why you get the error
No such class F at t.pl line 3, near "open my F"
syntax error at t.pl line 3, near "my F,"
Execution of t.pl aborted due to compilation errors.
when you try to run it.
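
For contrast, a short sketch of two forms that do compile: a lexical file handle declared with my (as in Mihir's original script), and an old-style bareword file handle whose glob reference is stored in a scalar. The scratch file comes from the core File::Temp module and is an assumption of this example, not something from the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (removed automatically at exit).
my ($tmp_fh, $file) = tempfile( UNLINK => 1 );
close $tmp_fh;

# 1. Lexical file handle: "my" declares the scalar that holds the handle.
open my $out, '>', $file or die "Cannot open '$file': $!";
print $out "lexical\n";
close $out;

# 2. Bareword file handle (no "my"), then a reference to its glob.
open F, '>>', $file or die "Cannot open '$file': $!";
my $f = \*F;
print $f "glob reference\n";
close $f;

# Read the file back to show that both handles wrote to it.
open my $in, '<', $file or die "Cannot open '$file': $!";
my @written = <$in>;
close $in;
print @written;
```

Both handles can be passed to print and close in the same way; the lexical form is scoped to its block and is closed automatically when it goes out of scope.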