

Home > Archive > PERL Miscellaneous > January 2006 > Merge multiple rows and remove duplicates --based on the first value










 

Author Merge multiple rows and remove duplicates --based on the first value
Susan

2006-01-26, 7:00 pm

There must be a simple solution, but I am stuck on this.

I have a file --

Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
..
..
...

How do I merge these into one based on the first column?

Based on the name "Thomas", I would like to merge the remaining three columns
and get

Thomas Jacob Emily Madison Ethan Samantha
Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
Williams Mathew John Lina

Can someone help?

Thanks.

Xicheng

2006-01-26, 7:00 pm

Susan wrote:
> There must be a simple solution, but I am struck with this.
> How do I merge these into one based on the first column?
> Based the name "Thomas" I would like to merge the rest of the 3 columns
> and get

Use a hash:
==================
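#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %h = ();
while (<DATA>) {
    chomp;
    # Split into the first column and the rest of the line.
    my @tmp = split ' ', $_, 2;
    # Append the remaining names to the entry for the first column.
    $h{$tmp[0]} .= "$tmp[1] ";
}
print Dumper \%h;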
__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
=========
Xicheng

> Thomas Jacob Emily Madison Ethan Samantha
> Corner Joshua Emma Isabella Christopher Daniel
> Matthew Hannah
> Williams Mathew John Lina
>
> Can someone help?
>
> Thanks.


axel@white-eagle.invalid.uk

2006-01-26, 7:00 pm

Susan <JSubadhra@gmail.com> wrote:
> There must be a simple solution, but I am struck with this.


> I have a file --


> Thomas Jacob Emily Madison
> Corner Joshua Emma Isabella
> Thomas Ethan Emily Samantha
> Williams Mathew John Lina
> Corner Christopher Emma Daniel
> Corner Joshua Matthew Hannah
> .
> .
> ..


> How do I merge these into one based on the first column?


> Based the name "Thomas" I would like to merge the rest of the 3 columns
> and get


> Thomas Jacob Emily Madison Ethan Samantha
> Corner Joshua Emma Isabella Christopher Daniel
> Matthew Hannah
> Williams Mathew John Lina



One way would be to create a hash whose keys are the entries in the
first column. The values of each entry in this hash would be
a reference to another hash whose keys are the entries in the
other columns (the values being immaterial as long as they are
defined, e.g. just use '1').

Axel
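
A minimal sketch of the hash-of-hashes idea described above. The sub name merge_rows and the inline list of lines are assumptions made for illustration; the thread itself reads from a file or the DATA handle. Because the names are stored as hash keys, duplicates disappear on their own, but the input order is lost, so this sketch sorts the keys when printing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash of hashes: first column => { other name => 1, ... }.
# merge_rows is a hypothetical helper name, not from the thread.
sub merge_rows {
    my @lines = @_;
    my %merged;
    for my $line (@lines) {
        my ($key, @rest) = split ' ', $line;
        next unless defined $key;           # skip blank lines
        $merged{$key}{$_} = 1 for @rest;    # a repeated name lands on the same key
    }
    return \%merged;
}

my $merged = merge_rows(
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);

# Names are printed in sorted order, since hash keys have no input order.
for my $key (sort keys %$merged) {
    print join(' ', $key, sort keys %{ $merged->{$key} }), "\n";
}
```

If the original order of the names matters, the inner hash alone is not enough; it would have to be paired with an array that records first appearance.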

usenet@DavidFilmer.com

2006-01-26, 9:56 pm

Susan wrote:
> How do I merge these into one based on the first column?


This solution is simple but not terribly efficient (I wouldn't use it
on a huge input list). Exception handling is left as an exercise to the
reader:

#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
use List::MoreUtils qw{uniq};

my %name;
while (<DATA>) {
    my ($col1_name, @other_names) = split;
    # Re-deduplicate the whole list for this key on every input line.
    @{$name{$col1_name}} = uniq( @{$name{$col1_name}}, @other_names );
}
print Dumper \%name;

__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah



--
http://DavidFilmer.com

Xicheng

2006-01-26, 9:56 pm

Xicheng wrote:
> Susan wrote:
> use hash..
> ==================
> #!/usr/bin/perl -w
> use strict;
> use Data::Dumper;
> my %h=();
> while(<DATA> ) {
> chomp;
> my @tmp=split' ',$_,2;
> $h{$tmp[0]} .= "$tmp[1] ";
> }

# Add the following lines to tidy up the output.
for my $k (keys %h) {
    $h{$k} =~ s/\s+/ /g;
    print "$k => $h{$k}\n";
}
> print Dumper \%h;
> __DATA__
> Thomas Jacob Emily Madison
> Corner Joshua Emma Isabella
> Thomas Ethan Emily Samantha
> Williams Mathew John Lina
> Corner Christopher Emma Daniel
> Corner Joshua Matthew Hannah
> =========
> Xicheng
>

Jim Gibson

2006-01-27, 6:59 pm

In article <1138322821.141463.215580@g44g2000cwa.googlegroups.com>,
Xicheng <xicheng@gmail.com> wrote:

> Susan wrote:
> use hash..
> ==================
> #!/usr/bin/perl -w
> use strict;
> use Data::Dumper;
> my %h=();
> while(<DATA> ) {
> chomp;
> my @tmp=split' ',$_,2;
> $h{$tmp[0]} .= "$tmp[1] ";
> }
> print Dumper \%h;
> __DATA__
> Thomas Jacob Emily Madison
> Corner Joshua Emma Isabella
> Thomas Ethan Emily Samantha
> Williams Mathew John Lina
> Corner Christopher Emma Daniel
> Corner Joshua Matthew Hannah


The OP doesn't want duplicate entries in the output. Your program does
not fulfill that requirement. For example, it includes 'Emily' twice in
the entry for 'Thomas'.

Xicheng

2006-01-27, 6:59 pm

Jim Gibson wrote:
> In article <1138322821.141463.215580@g44g2000cwa.googlegroups.com>,
> Xicheng <xicheng@gmail.com> wrote:
>
> The OP doesn't want duplicate entries in the output. Your program does
> not fulfill that requirement. For example, it includes 'Emily' twice in
> the entry for 'Thomas'.

Yup, use a hash again; you can fix it in a minute:
# If you just want to print:
for my $k (keys %h) {
    my %t = ();
    print "$k => @{[ grep { !$t{$_}++ } split ' ', $h{$k} ]}\n";
}

# Or put the list into a scalar:
for my $k (keys %h) {
    my %t = ();
    my $z = join ' ', grep { !$t{$_}++ } split ' ', $h{$k};
    print "$k=>$z\n";
}

Xicheng
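
For reference, the same "seen" hash trick can be applied while reading the input, so that both the first-column keys and the merged names keep the order in which they first appear. This is only a sketch under assumptions: the sub name merge_in_order and the inline list of lines are made up for illustration, and it is not code posted by anyone in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge rows by first column, dropping repeated names while keeping
# keys and names in first-seen order. merge_in_order is a hypothetical name.
sub merge_in_order {
    my @lines = @_;
    my (@order, %names, %seen);
    for my $line (@lines) {
        my ($key, @rest) = split ' ', $line;
        next unless defined $key;                       # skip blank lines
        push @order, $key unless exists $names{$key};   # remember key order
        $names{$key} ||= [];
        # Keep only names not yet seen for this key.
        push @{ $names{$key} }, grep { !$seen{$key}{$_}++ } @rest;
    }
    return map { join ' ', $_, @{ $names{$_} } } @order;
}

print "$_\n" for merge_in_order(
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);
```

With the sample data from the original post, this prints the three merged lines in the order Susan asked for: Thomas, then Corner, then Williams.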
