Removing duplicate records
| Mihir Kamdar 2007-08-01, 7:59 am |
| Hi,
Need your help with the following:-
I have a csv file having many records.
I want to remove duplicate records. But the record might not be entirely
duplicate. I only have to check if the 2nd, 3rd, 7th and 8th field of a
record is same as the earlier records. If it is same, then remove the
previous or the last entry. I have written something like below to achieve
this.
#!/usr/bin/perl
open(FILE,"</home/user71/RangerDatasource/Customization/TelekomMalaysia/Scripts/Tests/cprogs/files/sample1");
my $line;
my %hash;
my @file;
while ($line=readline(FILE))
{
my @cdr=split (/,/, $line) ;
$hash{$cdr[2],$cdr[3],$cdr[6],$cdr[7]}="@cdr"; #Add some more cdr key fields if u want.
}
close FILE ;
open my $f, '>', 'outputsample1' or
die 'Failed to open outputsample1';
while (($key, $value) = each %hash)
{
print $f $value."\n";
}
close $f;
But I am not getting the desired result.
Please help me out.
Thanks,
Mihir
| |
| yaron@kahanovitch.com 2007-08-01, 7:59 am |
| Hi,
You did not open the output file correctly.
You are trying to store the filehandle in a variable.
If you wish to do so, open the file and store a reference to the filehandle.
Instead of using
open my $f, '>', 'outputsample1' or
die 'Failed to open outputsample1';
while (($key, $value) = each %hash)
{
print $f $value."\n";
}
close $f;
Try
open my F, '>outputsample1' or
die 'Failed to open outputsample1';
my $f = \*F;
while (($key, $value) = each %hash)
{
print F $value."\n";
}
close $f;
Best regards,
Yaron Kahanovitch
| |
| yaron@kahanovitch.com 2007-08-01, 7:59 am |
| Small correction....
Try
open my F, '>outputsample1' or
die 'Failed to open outputsample1';
my $f = \*F;
while (($key, $value) = each %hash)
{
print $f $value."\n";
}
close $f;
Yaron Kahanovitch
--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/
| |
| Rob Dixon 2007-08-01, 7:59 am |
| yaron@kahanovitch.com wrote:
>
> Small correction....
>
> Try
> open my F, '>outputsample1' or
> die 'Failed to open outputsample1';
> my $f = \*F;
> while (($key, $value) = each %hash)
> {
>
> print $f $value."\n";
>
> }
> close $f;
Yaron.
It would help if you tested your code before you published it. This is
nonsense. This part of Mihir's code was fine as it was, while yours doesn't
even compile.
Rob
| |
| Mihir Kamdar 2007-08-01, 7:59 am |
| Hi,
The result is ok, but one big problem. My input file is a csv file. Its
records are like:
2007/02/26 09:38:03,999,+60320840888,+60123773138,,1,5,2007/02/26 09:37:58,,,,2,1,0,7,1,3,1,,0,4,+60320840888,,BRT70607,NOKIA_160_ISDN,,KL,KL
2007/02/26 09:38:05,999,+60320848888,+60326931722,,1,21,2007/02/26 09:37:44,,,,2,18058788,0,7,1,3,1,,0,4,+60320848888,,TR370001,NOKIA_160_ISDN,,KL,KL
2007/02/26 09:37:48,999,+60320937574,+60192805588,,1,1,2007/02/26 09:37:47,,,,2,20272626,0,7,1,3,1,,0,4,+60320937574,,BRT70657,NOKIA_160_ISDN,,KL,KL
2007/02/26 09:37:52,999,+60320923505,+60360753333,,1,20,2007/02/26 09:37:32,,,,2,22904137,0,7,1,3,1,,0,4,+60320923505,,RCT70749,NOKIA_160_ISDN,,KL,KKB
But the output I get is not comma separated but space separated:-
2007/02/26 10:04:36 999 +60320930016 +60122966096 1 4 2007/02/26 10:04:32 2 20275468 0 7 1 3 1 0 4 +60320930016 RCT70544 NOKIA_160_ISDN KL KL
2007/02/26 09:48:28 999 +60320870666 +60371180497 1 250 2007/02/26 09:41:38 2 20275933 0 7 1 3 1 0 4 +60320870666 RCT70803 NOKIA_160_ISDN KL KL
2007/02/26 10:06:49 999 +60320922115 +1800879415 1 113 2007/02/26 10:04:16 5 20275921 0 7 1 3 1 0 2 +60320922115 BRT70630 NOKIA_160_ISDN KL
This is unacceptable. What changes do I have to make to the script to get a
comma separated output?
Thanks,
Mihir
| |
| Mr. Shawn H. Corey 2007-08-01, 7:59 am |
| Mihir Kamdar wrote:
> But I am not getting the desired result.
What are the desired results? How is your output different from what you expect?
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by doing them."
Aristotle
| |
| Mr. Shawn H. Corey 2007-08-01, 7:59 am |
| Mihir Kamdar wrote:
> But the output I get is not comma separated but space separated:-
Change this line:
$hash{$cdr[2],$cdr[3],$cdr[6],$cdr[7]}="@cdr"; #Add some more cdr key fields if u want.
To:
$hash{$cdr[2],$cdr[3],$cdr[6],$cdr[7]}=$line; #Add some more cdr key fields if u want.
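A minimal sketch of why the original line produced spaces: interpolating an array in a double-quoted string joins its elements with the list separator $", which is a single space by default. The record below is shortened from the sample data and the variable names $interpolated and $rejoined are only illustrative.

```perl
use strict;
use warnings;

# A shortened record in the same shape as the sample data (illustrative only).
my $line = "2007/02/26 09:38:03,999,+60320840888,+60123773138,,1,5\n";

chomp $line;                       # remove the trailing newline read from the file
my @cdr = split /,/, $line;

my $interpolated = "@cdr";         # elements joined with $" (a space by default)
my $rejoined     = join ',', @cdr; # elements joined with commas again

print "$interpolated\n";
print "$rejoined\n";
```

Storing $line itself, as suggested above, is probably the safer choice: split without a negative limit drops trailing empty fields, so join ',' would not restore a record that ends in empty fields. Note also that a line read from the file still carries its newline, and the print in the second loop appends another, so it may be worth calling chomp on the line first.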
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by doing them."
Aristotle
| |
| yaron@kahanovitch.com 2007-08-01, 7:59 am |
| Hi,
The code was tested..... and it compiles for me
Yaron
| |
| Mihir Kamdar 2007-08-01, 7:59 am |
Thanks Shawn, Yaron, Rob.
Thanks a lot for your prompt responses.
The solution works as desired.
| |
| Rob Dixon 2007-08-01, 7:00 pm |
| yaron@kahanovitch.com wrote:
> Hi,
>
> The code was tested..... and it compiles for me
What version of Perl are you using that accepts this?
open my F, '>outputsample1' or die 'Failed to open outputsample1';
Rob
| |
| John W. Krahn 2007-08-01, 7:00 pm |
| Mihir Kamdar wrote:
> Hi,
Hello,
> But I am not getting the desired result.
You don't need two loops for that, just one:
#!/usr/bin/perl
use strict;
use warnings;
my $in_file =
'/home/user71/RangerDatasource/Customization/TelekomMalaysia/Scripts/Tests/cprogs/files/sample1';
open my $in, '<', $in_file or die "Cannot open '$in_file': $!";
open my $out, '>', 'outputsample1' or die "Failed to open outputsample1: $!";
my %hash;
while ( <$in> ) {
    # Build the key from the fields at zero-based indices 2, 3, 6 and 7,
    # the same fields the original script used.
    my $key = join ',', ( split /,/ )[ 2, 3, 6, 7 ];
    # Print a record only the first time its key is seen.
    print $out $_ unless $hash{ $key }++;
}
close $out;
close $in;
__END__
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
| |
| Mr. Shawn H. Corey 2007-08-01, 7:00 pm |
| John W. Krahn wrote:
> Mihir Kamdar wrote:
>
> You don't need two loops for that, just one:
Two loops are required since the specification is to print the last entry (that's a duplicate) in the file, not the first.
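If the requirement really is to keep the last record for each key, one possible sketch is below. It assumes the records fit in memory; the helper name dedupe_keep_last, the sample records and the choice to emit keys in first-seen order are all illustrative assumptions, not something specified in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the LAST record seen for each key and return
# the survivors in the order in which each key first appeared.
sub dedupe_keep_last {
    my ($lines, @idx) = @_;
    my (%latest, @order);
    for my $line (@$lines) {
        # -1 keeps trailing empty fields; "\0" is assumed not to occur in the data.
        my $key = join "\0", (split /,/, $line, -1)[@idx];
        push @order, $key unless exists $latest{$key};
        $latest{$key} = $line;    # a later duplicate overwrites the earlier one
    }
    return map { $latest{$_} } @order;
}

# Made-up records: the first and third share the fields at indices 2, 3, 6, 7.
my @records = (
    "a,999,+601,+602,,1,5,09:37:58,first",
    "b,999,+603,+604,,1,7,09:40:00,only",
    "c,999,+601,+602,,1,5,09:37:58,second",
);

my @kept = dedupe_keep_last(\@records, 2, 3, 6, 7);
print "$_\n" for @kept;
```

The first pass records the newest line per key and the second pass writes the survivors out, which matches the two-loop shape described above while keeping a predictable output order, unlike iterating with each over the hash.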
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by doing them."
Aristotle
| |
| John W. Krahn 2007-08-01, 7:00 pm |
| Mr. Shawn H. Corey wrote:
> John W. Krahn wrote:
>
> Two loops are required since the specification is to print the last
> entry (that's a duplicate) in the file, not the first.
Are you sure that "remove the previous or the last entry" translates into
"print the last entry (that's a duplicate)"? Did you check with the OP on that?
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
| |
| Chas Owens 2007-08-02, 7:00 pm |
| On 8/1/07, yaron@kahanovitch.com <yaron@kahanovitch.com> wrote:
snip
> The code was tested..... and it compiles for me
snip
I seriously doubt that the code you sent in the email was the code you
tested, then. You cannot, in any version of Perl that I know, declare
an old-style filehandle with my. In fact, saying
open my F, '>outputsample1' or
is equivalent to saying
open F->my('>outputsample1') or
Which is why you get the error
No such class F at t.pl line 3, near "open my F"
syntax error at t.pl line 3, near "my F,"
Execution of t.pl aborted due to compilation errors.
when you try to run it.
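For comparison, here is a small sketch of two forms that do compile: the lexical filehandle Mihir's script already used, and a bareword filehandle with a glob reference, which is presumably what the suggested correction was aiming for. The output file name is taken from the thread; the strings written and the variable names are illustrative.

```perl
use strict;
use warnings;

my $file = 'outputsample1';

# Lexical filehandle with three-argument open, as in the original script.
open my $out, '>', $file or die "Cannot open '$file': $!";
print $out "written via lexical handle\n";
close $out or die "Cannot close '$file': $!";

# Bareword filehandle plus a reference to its glob.
# Note that there is no 'my' in front of the bareword F.
open F, '>>', $file or die "Cannot open '$file': $!";
my $fh = \*F;
print $fh "written via glob reference\n";
close $fh or die "Cannot close '$file': $!";
```

perldiag describes "No such class" as the error raised when a my declaration carries a class qualifier for a class that does not exist, so it seems likely that perl reads "my F" as the start of a typed declaration such as "my F $x". That reading is an inference, not something stated in the thread.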