Home > Archive > PERL Beginners > January 2006 > Transform column into row
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Transform column into row
|
|
| Andrej Kastrin 2006-01-18, 7:56 am |
| Dear Perl users,
what's the best way to transform column table in row format. I know how
to split each line according to delimiter and than put it separately
into array, but I have more complicated problem (with multiple equal
records in the first column)
id001 text1
id001 text2
id001 text3
id002 text23
id002 text555
id003 text666
and want something like:
id001 text1 text2 text3
id002 text23 text 555
id003 text666
Thank's for any suggestions.
Cheers, Andrej
| |
| Paul Lalli 2006-01-18, 7:56 am |
| Andrej Kastrin wrote:
> Dear Perl users,
>
> what's the best way to transform column table in row format. I know how
> to split each line according to delimiter and than put it separately
> into array, but I have more complicated problem (with multiple equal
> records in the first column)
>
> id001 text1
> id001 text2
> id001 text3
> id002 text23
> id002 text555
> id003 text666
>
> and want something like:
>
> id001 text1 text2 text3
> id002 text23 text 555
> id003 text666
>
> Thank's for any suggestions.
Use a hash of arrays.
#!/usr/bin/perl
use strict;
use warnings;
my %texts_for;
while (<DATA> ) {
my ($id, $text) = split;
push @{$texts_for{$id}}, $text;
}
for my $id (sort keys %texts_for) {
print "$id @{$texts_for{$id}}\n";
}
__DATA__
id001 text1
id001 text2
id001 text3
id002 text23
id002 text555
id003 text666
Read about hashes of arrays, and other multi-dimensional structures,
in:
perldoc perlreftut
perldoc perllol
perldoc perldsc
perldoc perlref
Paul Lalli
| |
| John Doe 2006-01-18, 7:56 am |
| Andrej Kastrin am Mittwoch, 18. Januar 2006 10.49:
> Dear Perl users,
>
> what's the best way to transform column table in row format. I know how
> to split each line according to delimiter and than put it separately
> into array, but I have more complicated problem (with multiple equal
> records in the first column)
>
> id001 text1
> id001 text2
> id001 text3
> id002 text23
> id002 text555
> id003 text666
>
> and want something like:
>
> id001 text1 text2 text3
> id002 text23 text 555
> id003 text666
>
> Thank's for any suggestions.
My suggestion is that you show us what you tried so far, since this list is
not a script service.
joe
| |
| Paul Johnson 2006-01-18, 7:56 am |
| On Wed, Jan 18, 2006 at 01:34:01PM +0100, John Doe wrote:
> Andrej Kastrin am Mittwoch, 18. Januar 2006 10.49:
>
> My suggestion is that you show us what you tried so far, since this list is
> not a script service.
But he didn't ask for a script, he asked for suggestions on the best way
to do something.
[As a side note, and not directed specifically to John Doe, this list
seems to be becoming a little less friendly to beginners than it used to
be. I think that is a shame. Of course, no one wants to be taken
advantage of, but a little slap of the wrist whilst providing some clues
and pointers seems to be a reasonable compromise.]
In most programming problems, I find that if you can design the correct
data structures the code pretty much writes itself. In this case, the
correct data structure seems to be a hash of arrays. See perldoc
perldsc. With this data structure, and the knowledge you already have,
I would expect a solution to present itself. If you still have
problems, do as John Doe suggests, and come back to us with the code you
have already tried.
--
Paul Johnson - paul@pjcj.net
http://www.pjcj.net
| |
| John Doe 2006-01-18, 6:58 pm |
| Paul Johnson am Mittwoch, 18. Januar 2006 13.53:
> On Wed, Jan 18, 2006 at 01:34:01PM +0100, John Doe wrote:
>
> But he didn't ask for a script, he asked for suggestions on the best way
> to do something.
You are right. I apologize for that as well as for my tone.
Partly it is due to my bad english, I miss the subtilities.
> [As a side note, and not directed specifically to John Doe, this list
> seems to be becoming a little less friendly to beginners than it used to
> be. I think that is a shame.
There are some people on the list, especially J.W. Krahn (and others), who are
always very polite with any kind of questions :-)
[...]
> In most programming problems, I find that if you can design the correct
> data structures the code pretty much writes itself. In this case, the
> correct data structure seems to be a hash of arrays. See perldoc
> perldsc. With this data structure, and the knowledge you already have,
> I would expect a solution to present itself. If you still have
> problems, do as John Doe suggests, and come back to us with the code you
> have already tried.
Another way (I don's say a better :-) ) could be not using a data structure,
but doing the transformation on the fly:
- read a line of the input data
- record the first or a new field name while scanning (id001 etc.)
- if the field name is new / has changed, output it to a new file
- if it is not new / has not changed, append the value(s)
- repeat these steps
joe
| |
| Chas Owens 2006-01-18, 6:58 pm |
| snip
t is[color=darkred]
>
> But he didn't ask for a script, he asked for suggestions on the best way
> to do something.
>
> [As a side note, and not directed specifically to John Doe, this list
> seems to be becoming a little less friendly to beginners than it used to
> be. I think that is a shame. Of course, no one wants to be taken
> advantage of, but a little slap of the wrist whilst providing some clues
> and pointers seems to be a reasonable compromise.]
snip
I agree with Paul. I have not been on this list since 2002 and the
level of hostility that has grown in the last three years is
frightening. We are often the first exposure people have to the Perl
community and we want them to come away with a good feeling about Perl
and the community around it. We will have plenty of time later to rip
them new ones.
| |
| JupiterHost.Net 2006-01-18, 6:58 pm |
|
Andrej Kastrin wrote:
> Dear Perl users,
Howdy
> what's the best way to transform column table in row format. I know how
> to split each line according to delimiter and than put it separately
> into array, but I have more complicated problem (with multiple equal
> records in the first column)
>
> id001 text1
> id001 text2
> id001 text3
> id002 text23
> id002 text555
> id003 text666
>
> and want something like:
>
> id001 text1 text2 text3
> id002 text23 text 555
> id003 text666
Use the first column as a key to a hash and make the value an array ref:
my %end_results;
for my $line_arrayref(@records) {
push @{ $end_results{$line_arrayref->[0]} }, $line_arrayref->[1]
}
now %end_results looks like:
'id001' => ['text1', 'text2', 'text3'],
'id002' => ['text23', 'text', '555'],
'id003' => ['text666'],
You could do the same thing with map also...
HTH :)
| |
| Brian Volk 2006-01-18, 6:58 pm |
|
[color=darkred]
> -----Original Message-----
> From: John Doe [mailto:security.department@tele2.ch]
> Sent: Wednesday, January 18, 2006 7:34 AM
> To: beginners@perl.org
> Subject: Re: Transform column into row
>
> Andrej Kastrin am Mittwoch, 18. Januar 2006 10.49:
If you own a copy of Perl Cookbook, look at chapter 5.7 "Hashes with
Multiple Values Per Key"
"c:/brian/text_files/cook_5_7.txt" =
id001 text1
id001 text2
id001 text3
id002 text23
id002 text555
id003 text666
------- hash_mult_values.pl -----
#!/usr/bin/perl
use strict;
use warnings;
my %texts = ();
my $file = "c:/brian/text_files/cook_5_7.txt";
open (FILE, $file) or die "Can't open $file: $!";
while (<FILE> ) {
my ($id, $text) = split;
push( @{$texts{$id}}, $text );
}
foreach my $id (sort keys %texts) {
print "$id: @{$texts{$id}}\n";
}
----- end hash_mult_values.pl ----
Gives the result:
C:\brian\perl>perl hash_mult_values.pl
id001: text1 text2 text3
id002: text23 text555
id003: text666
----
Hope this helps,
Brian Volk
| |
| Dr.Ruud 2006-01-18, 6:58 pm |
| Andrej Kastrin schreef:
> id001 text1
> id001 text2
> id001 text3
> id002 text23
> id002 text555
> id003 text666
>
> and want something like:
>
> id001 text1 text2 text3
> id002 text23 text 555
> id003 text666
$ perl -MData::Dumper -aF -ne \
'push @{$HoA{$F[0]}}, $F[1];} {END{print Dumper %HoA}' \
infile
or:
#!/usr/bin/perl
use strict;
use warnings;
my %HoA;
while (<> ) {
my ($group, $member) = split;
push @{$HoA{$group}}, $member;
}
{ local ($,, $\) = ("\t", "\n");
for my $group (keys %HoA) {
print "$group:", @{$HoA{$group}};
}
}
See `perldoc perldsc`, look for "HASHES OF ARRAYS".
--
Grtz, Ruud
| |
| John Doe 2006-01-18, 6:58 pm |
| Andrej Kastrin am Mittwoch, 18. Januar 2006 10.49:
> Dear Perl users,
>
> what's the best way to transform column table in row format. I know how
> to split each line according to delimiter and than put it separately
> into array, but I have more complicated problem (with multiple equal
> records in the first column)
>
> id001 text1
> id001 text2
> id001 text3
> id002 text23
> id002 text555
> id003 text666
>
> and want something like:
>
> id001 text1 text2 text3
> id002 text23 text 555
> id003 text666
Ok, in addition to my apologies, I present here the script I described in my
last post.
In constrast to the other presented solutions using a a hash slurping all data
from the intput file, the advantage of my solution is that it is
*capable to handle 300GB input files* even with 128MB RAM.
===
#!/usr/bin/perl
use strict;
use warnings;
# open input file here if source is not <DATA>
open (my $outf, '>', './andrey_kastrin.out.txt')
or die "Can't open file: $!";
my $old_name='';
while (<DATA> ) {
chomp;
my ($name, $value)=$_=~/^([\w]+)\s+(.*)$/;
if ($old_name ne $name) {
print $outf "\n" if $old_name;
print $outf $_;
}
else {
print $outf ' ',$value;
}
$old_name=$name;
}
close $outf or die "Can't close file: $!";
# close input file here if source is not <DATA>
__DATA__
id001 text1
id001 text2
id001 text3
id002 text23
id002 text555
id003 text666
===
It outputs:
id001 text1 text2 text3
id002 text23 text555
id003 text666
I'm a nice guy, see?
joe
| |
| Paul Lalli 2006-01-18, 6:58 pm |
| John Doe wrote:
> Andrej Kastrin am Mittwoch, 18. Januar 2006 10.49:
>
> Ok, in addition to my apologies, I present here the script I described in my
> last post.
>
> In constrast to the other presented solutions using a a hash slurping all data
> from the intput file, the advantage of my solution is that it is
> *capable to handle 300GB input files* even with 128MB RAM.
In contrast to your solution, the other solutions posted do not make
any assumptions that the data will always be presented "in order". The
OP never specified this. Just because the one sample data point we
have happens to be listed like that, there is no cause to assume all
data will be as well.
Your solution would fail miserably for, for example:
id001 text1
id002 text555
id001 text2
id003 text666
id001 text3
id002 text23
An inefficiant solution that meets the specifications is always
preferable to an efficient solution that doesn't work.
Now, maybe the OP's data actually *will* conform to your assumptions.
My point is that we simply have no way of knowing that it will.
| |
| Andrej Kastrin 2006-01-19, 3:57 am |
| John Doe wrote:
>Andrej Kastrin am Mittwoch, 18. Januar 2006 10.49:
>
>
>
>Ok, in addition to my apologies, I present here the script I described in my
>last post.
>
>In constrast to the other presented solutions using a a hash slurping all data
>from the intput file, the advantage of my solution is that it is
>*capable to handle 300GB input files* even with 128MB RAM.
>
>
>===
>#!/usr/bin/perl
>
>use strict;
>use warnings;
>
># open input file here if source is not <DATA>
>
>open (my $outf, '>', './andrey_kastrin.out.txt')
> or die "Can't open file: $!";
>
>my $old_name='';
>while (<DATA> ) {
> chomp;
> my ($name, $value)=$_=~/^([\w]+)\s+(.*)$/;
> if ($old_name ne $name) {
> print $outf "\n" if $old_name;
> print $outf $_;
> }
> else {
> print $outf ' ',$value;
> }
> $old_name=$name;
>}
>close $outf or die "Can't close file: $!";
>
># close input file here if source is not <DATA>
>
>
>__DATA__
>id001 text1
>id001 text2
>id001 text3
>id002 text23
>id002 text555
>id003 text666
>===
>
>
>It outputs:
>
>id001 text1 text2 text3
>id002 text23 text555
>id003 text666
>
>I'm a nice guy, see?
>
>joe
>
>
>
many many thanks forall suggestions...
Cheers, Andrej
| |
| Dr.Ruud 2006-01-19, 3:57 am |
| John Doe schreef:
> In constrast to the other presented solutions using a a hash slurping
> all data from the intput file, the advantage of my solution is that
> it is *capable to handle 300GB input files* even with 128MB RAM.
Also in contrast is that your solution assumes sorted input, which
wasn't anounced.
The hash isn't slurping all data: only the unique values of the first
column are stored.
--
Grtz, Ruud
| |
| John Doe 2006-01-19, 7:56 am |
| Dr.Ruud am Donnerstag, 19. Januar 2006 00.49:
> John Doe schreef:
>
> Also in contrast is that your solution assumes sorted input, which
> wasn't anounced.
But it was present in the sample data. If it wasn't, I would have suggested a
preliminary non-inmemory sort technique.
> The hash isn't slurping all data: only the unique values of the first
> column are stored.
I should have said "the hash and its contained arrayrefs", sorry for that and
my bad english in general.
greets joe
| |
| Dr.Ruud 2006-01-19, 6:58 pm |
| John Doe:
> Dr.Ruud:
[color=darkred]
>
> But it was present in the sample data. If it wasn't, I would have
> suggested a preliminary non-inmemory sort technique.
Also wasn't mentioned that the dataset is several orders of magnitude
larger than the sample data. The average of the sample is two values per
key, the highest key would be 'id999', that would add up to 5 KB + 2 * 6
KB is a maximum of about 20 KB, not 300 GB.
;)
I once did this in SQL. It uses a subquery to number the values (sorted)
per key, and a cross-tab query that uses those numbers for column names.
Inserting a column with the maximum index per key is then also easy.
When you know how to do tricks like that, you hardly ever need a
scripting language with even the largest sets of data. But that is just
what I believe.
--
Grtz, Ruud
| |
| Andrej Kastrin 2006-01-20, 3:56 am |
| Andrej Kastrin wrote:
> John Doe wrote:
>
> many many thanks forall suggestions...
>
> Cheers, Andrej
>
Dear helpers,
me, again. But please, I'm just learning and I try not to ask too much,
but there are some problems that are really hard to understand (inspite
of two big books and a lot of tutorial prints on my table).
So, what is the intuition to combine | merge | join 2 tables. E.g., if
we have table;
ID001 W1, W2, ...
ID002 W5, W9, ...
ID003 W3, W2, W10, ...
Then the second table looks like:
W1 text1, text2, text3
W2 text2, text5, text1
W3 text3, text4
The result must be:
ID001 text1,text2,text3,text5 #combine elements from W1 and W2 from
first table
ID002 ...
ID003 ...
I go to study "Perl Programming" now and thank's for all suggestions...
Cheers
| |
| Timothy Johnson 2006-01-20, 3:56 am |
|
I don't have much time to help right now, but this might be helpful:
Sometimes the easiest way to make a list of unique values is to use hash
keys. As an example, the below code will take an array of strings and
assign each as a hash key. Since adding duplicate keys basically does
nothing in this case, you can use the keys of the resulting hash as a
list of unique items. You can also use map() if that is easier for you,
but to me this makes more sense, even though it's a few more lines of
code.
my @input =3D qw(text1 text2 text3 text5 text2 text9 text3);
my %unique;
foreach my $item(@input){
$unique{$item} =3D 1;
}
=09
foreach my $key(sort keys %unique){
print "$key\n";
}
That might help with one part of the problem. For the rest, one way to
go about it would be to use W1, W2, etc, as keys in a hash of arrays.
Example:
$hash{W1} =3D [text1,text2,text3];
That should be enough to get started.
-----Original Message-----
From: Andrej Kastrin [mailto:andrej.kastrin@siol.net]=20
Sent: Thursday, January 19, 2006 10:55 PM
To: beginners@perl.org
Subject: Re: Transform column into row
Andrej Kastrin wrote:
<snip>
So, what is the intuition to combine | merge | join 2 tables. E.g., if=20
we have table;
ID001 W1, W2, ...
ID002 W5, W9, ...
ID003 W3, W2, W10, ...
Then the second table looks like:
W1 text1, text2, text3
W2 text2, text5, text1
W3 text3, text4
The result must be:
ID001 text1,text2,text3,text5 #combine elements from W1 and W2 from=20
first table
ID002 ...
ID003 ...
I go to study "Perl Programming" now and thank's for all suggestions...
Cheers
|
|
|
|
|