| Author |
Merge multiple rows and remove duplicates --based on the first value
|
|
|
| There must be a simple solution, but I am stuck on this.
I have a file --
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
..
..
...
How do I merge these into one based on the first column?
Based on the name "Thomas", I would like to merge the remaining 3 columns
and get
Thomas Jacob Emily Madison Ethan Samantha
Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
Williams Mathew John Lina
Can someone help?
Thanks.
| |
| Xicheng 2006-01-26, 7:00 pm |
| Susan wrote:
> There must be a simple solution, but I am stuck on this.
> How do I merge these into one based on the first column?
> Based on the name "Thomas", I would like to merge the remaining 3 columns
> and get
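Use a hash:
==================
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %h = ();
while (<DATA>) {
    chomp;
    # Split into the first column (the key) and the rest of the line.
    my @tmp = split ' ', $_, 2;
    # Append the remaining columns to whatever is already stored for this key.
    $h{$tmp[0]} .= "$tmp[1] ";
}
print Dumper \%h;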
__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
=========
Xicheng
> Thomas Jacob Emily Madison Ethan Samantha
> Corner Joshua Emma Isabella Christopher Daniel
> Matthew Hannah
> Williams Mathew John Lina
>
> Can someone help?
>
> Thanks.
| |
| axel@white-eagle.invalid.uk 2006-01-26, 7:00 pm |
| Susan <JSubadhra@gmail.com> wrote:
> There must be a simple solution, but I am stuck on this.
> I have a file --
> Thomas Jacob Emily Madison
> Corner Joshua Emma Isabella
> Thomas Ethan Emily Samantha
> Williams Mathew John Lina
> Corner Christopher Emma Daniel
> Corner Joshua Matthew Hannah
> .
> .
> ..
> How do I merge these into one based on the first column?
> Based on the name "Thomas", I would like to merge the remaining 3 columns
> and get
> Thomas Jacob Emily Madison Ethan Samantha
> Corner Joshua Emma Isabella Christopher Daniel
> Matthew Hannah
> Williams Mathew John Lina
One way would be to create a hash whose keys are the entries in the
first column. The values of each entry in this hash would be
a reference to another hash whose keys are the entries in the
other columns (the values being immaterial as long as they are
defined, e.g. just use '1').
Axel
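A rough sketch of the hash-of-hashes idea described above; this is an illustration, not code posted in the thread. The names merge_rows, %seen, %merged and @order are made up for the example. The extra @order array and the per-key array in %merged are assumptions added so that the output keeps the first-seen order shown in the original question, since a plain hash does not preserve insertion order:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes input lines and returns one merged line
# per distinct first-column value.
sub merge_rows {
    my @lines = @_;
    my %seen;      # $seen{$first}{$other} is true once $other was recorded
    my %merged;    # $merged{$first} = [ other names, in first-seen order ]
    my @order;     # first-column values, in first-seen order

    for my $line (@lines) {
        my ($first, @others) = split ' ', $line;
        next unless defined $first;    # skip blank lines
        if (!exists $merged{$first}) {
            push @order, $first;
            $merged{$first} = [];
        }
        for my $name (@others) {
            # The inner hash is what filters out the duplicates.
            push @{ $merged{$first} }, $name unless $seen{$first}{$name}++;
        }
    }
    return map { join ' ', $_, @{ $merged{$_} } } @order;
}

# Sample rows copied from the original question.
my @sample = (
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);
print "$_\n" for merge_rows(@sample);
```

If first-seen order does not matter, the @order and %merged bookkeeping could be dropped and the keys of the inner hash printed directly, which is closer to the description as written.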
| |
| usenet@DavidFilmer.com 2006-01-26, 9:56 pm |
| Susan wrote:
> How do I merge these into one based on the first column?
This solution is simple but not terribly efficient (I wouldn't use it
on a huge input list). Exception handling is left as an exercise to the
reader:
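#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils qw{uniq};
my %name;
while (<DATA>) {
    # First field is the key; the remaining fields are the names to merge.
    my ($col1_name, @other_names) = split;
    # Re-deduplicate the accumulated list on every line (simple, but slow on large input).
    @{ $name{$col1_name} } = uniq( @{ $name{$col1_name} }, @other_names );
}
print Dumper \%name;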
__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
--
http://DavidFilmer.com
| |
| Xicheng 2006-01-26, 9:56 pm |
| Xicheng wrote:
> Susan wrote:
> use hash..
> ==================
> #!/usr/bin/perl -w
> use strict;
> use Data::Dumper;
> my %h=();
> while(<DATA> ) {
> chomp;
> my @tmp=split' ',$_,2;
> $h{$tmp[0]} .= "$tmp[1] ";
> }
# Add the following lines to tidy up the output:
for my $k (keys %h) {
    # Collapse runs of whitespace into single spaces.
    $h{$k} =~ s/\s+/ /g;
    print "$k => $h{$k}\n";
}
> print Dumper \%h;
> __DATA__
> Thomas Jacob Emily Madison
> Corner Joshua Emma Isabella
> Thomas Ethan Emily Samantha
> Williams Mathew John Lina
> Corner Christopher Emma Daniel
> Corner Joshua Matthew Hannah
> =========
> Xicheng
>
| |
| Jim Gibson 2006-01-27, 6:59 pm |
| In article <1138322821.141463.215580@g44g2000cwa.googlegroups.com>,
Xicheng <xicheng@gmail.com> wrote:
> Susan wrote:
> use hash..
> ==================
> #!/usr/bin/perl -w
> use strict;
> use Data::Dumper;
> my %h=();
> while(<DATA> ) {
> chomp;
> my @tmp=split' ',$_,2;
> $h{$tmp[0]} .= "$tmp[1] ";
> }
> print Dumper \%h;
> __DATA__
> Thomas Jacob Emily Madison
> Corner Joshua Emma Isabella
> Thomas Ethan Emily Samantha
> Williams Mathew John Lina
> Corner Christopher Emma Daniel
> Corner Joshua Matthew Hannah
The OP doesn't want duplicate entries in the output. Your program does
not fulfill that requirement. For example, it includes 'Emily' twice in
the entry for 'Thomas'.
| |
| Xicheng 2006-01-27, 6:59 pm |
| Jim Gibson wrote:
> In article <1138322821.141463.215580@g44g2000cwa.googlegroups.com>,
> Xicheng <xicheng@gmail.com> wrote:
>
> The OP doesn't want duplicate entries in the output. Your program does
> not fulfill that requirement. For example, it includes 'Emily' twice in
> the entry for 'Thomas'.
Yup, use a hash again; you can fix it in a minute:
# If you just want to print:
for my $k (keys %h) {
    my %t = ();
    # Keep each name only the first time it is seen.
    print "$k => @{[ grep { !$t{$_}++ } split ' ', $h{$k} ]}\n";
}
# Or put the deduplicated list into a scalar first:
for my $k (keys %h) {
    my %t = ();
    my $z = join ' ', grep { !$t{$_}++ } split ' ', $h{$k};
    print "$k=>$z\n";
}
Xicheng
|
|
|
|