Code Comments

convert row-data to column data
I'm quite new to awk scripting, and i haven't been able to solve this
problem:

I have this data file:

http://www.cs.kuleuven.ac.be/~bartv..._1_0p041_12.txt

I would like to use the data from that datafile in gnuplot, which
expects the data as columns.  Now the data in my file is stored in the
even rows.

How do i transform the data from the even rows in to columns?  The best
i could come up with up until now is

http://www.cs.kuleuven.ac.be/~bartv...ads/convert.awk

but this does not give me what i want.  The row-data is indeed changed
to column data, but the columns should be next to each other, not below
each other and separated by an empty line...

Any help appreciated.

Regards,
Bart

--
"Share what you know.  Learn what you don't."

Bart Vandewoestyne
04-29-05 08:55 PM


Re: convert row-data to column data
Bart Vandewoestyne <MyFirstName.MyLastName@telenet.be> wrote:
> I'm quite new to awk scripting, and i haven't been able to solve this
> problem:
>
> I have this data file:
>
> http://www.cs.kuleuven.ac.be/~bartv..._1_0p041_12.txt
>
> I would like to use the data from that datafile in gnuplot, which
> expects the data as columns.  Now the data in my file is stored in the
> even rows.
>
> How do i transform the data from the even rows in to columns?  The best
> i could come up with up until now is
>
> http://www.cs.kuleuven.ac.be/~bartv...ads/convert.awk
>
> but this does not give me what i want.  The row-data is indeed changed
> to column data, but the columns should be next to each other, not below
> each other and separated by an empty line...
>
> Any help appreciated.

Search <comp.lang.awk> and <comp.unix.shell> for 'transpose' keyword in
subject.

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
Slackware Linux -- because it works.

William Park
04-29-05 08:55 PM


Re: convert row-data to column data

Bart Vandewoestyne wrote:
> I'm quite new to awk scripting, and i haven't been able to solve this
> problem:
>
> I have this data file:
>
> http://www.cs.kuleuven.ac.be/~bartv..._1_0p041_12.txt
>
> I would like to use the data from that datafile in gnuplot, which
> expects the data as columns.  Now the data in my file is stored in the
> even rows.
>
> How do i transform the data from the even rows in to columns?  The best
> i could come up with up until now is
>
> http://www.cs.kuleuven.ac.be/~bartv...ads/convert.awk
>
> but this does not give me what i want.  The row-data is indeed changed
> to column data, but the columns should be next to each other, not below
> each other and separated by an empty line...
>

Take a look at this:

--------------
Transposing rows to selected columns and sorting by key.

Given the following input file:
Number of executions               = 437
Number of compilations             = 1
Worst preparation time (ms)        = 1
Best preparation time (ms)         = 1
Rows deleted                       = 0

Number of executions               = 1
Number of compilations             = 1
Worst preparation time (ms)        = 4
Best preparation time (ms)         = 4
Rows deleted                       = 0

Number of executions               = 29
Number of compilations             = 1
Worst preparation time (ms)        = 1
Best preparation time (ms)         = 1
Rows deleted                       = 0

To transpose certain rows into columns and sort by one of the
columns, like the following, which is sorted in ascending order by
"Number of executions":

Number of executions   Number of compilations   Rows deleted
1                      1                        0
29                     1                        0
437                    1                        0

This will do it all in gawk:

gawk -vRS="" -F"\n" 'BEGIN{ fields = "1 2 5"; key = "1"
    numflds = split(fields,flds," ")
}
{
    for (i=1; i<=NF;i++) {
        split($i,f,"=")
        # Get rid of all spaces from the end of the title text
        sub(/[[:blank:]]*$/,"",f[1])
        title[i]=f[1]
        # Get rid of all spaces from the value field
        value[i]=f[2]+0
        # Determine the width for this column based on the width
        # of the title text plus 3 for spacing. Left-justify (%-).
        fmt[i]="%-"(length(title[i])+3)"s"
    }
    # We will want to sort on the key column so we need to create a
    # string at the start of each line to sort on later. Take the key
    # column's value and pad it with zeros up to 20 digits followed by
    # a space to separate it from the first real column. Conversion of
    # "7" to "0007" and "17" to "0017" is necessary because asort()
    # compares these strings alphabetically, not numerically, so all
    # numeric keys must be the same width to compare correctly.
    # %d is used because zero-padding is only defined for numeric formats.
    lines[NR] = sprintf("%020d ",value[key])

    # Now add the real columns, formatted as determined earlier.
    for (i=1; i<=numflds; i++) {
        lines[NR] = lines[NR] sprintf(fmt[flds[i]], value[flds[i]])
    }
}
END {
    # Print the title line
    for (i=1; i<=numflds; i++) {
        printf fmt[flds[i]], title[flds[i]]
    }
    print ""
    # Sort the lines alphabetically, i.e. by the value of the key column
    # added above to the front of each line.
    asort(lines)
    # Print each line
    for (i=1; i<=NR; i++) {
        # strip out the leading zero-padded key value added above
        sub(/^[[:digit:]]+ /,"",lines[i])
        print lines[i]
    }
}'

Setting fields and key at the beginning obviously dictates which fields
are printed and which key to sort on. The only thing it assumes about field
sizes is that the key field's values won't be more than 20 characters.
--------------
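
The zero-padding trick described in the comments above can be tried on its own. This is only a minimal sketch, assuming POSIX awk and sort(1) instead of gawk's asort(), and a hypothetical helper name pad_sort; the keys must be non-negative integers:

```shell
# Plain string sorting compares character by character,
# so "100" sorts before "17" and "17" before "7".
printf '%s\n' 7 17 100 | LC_ALL=C sort

# pad_sort (hypothetical helper): left-pad each key with zeros to a
# fixed width, sort the padded keys as strings, then strip the padding.
pad_sort() {
    awk '{ printf "%020d\n", $1 }' |
        LC_ALL=C sort |
        awk '{ print $1 + 0 }'
}

# With equal-width keys, string order and numeric order agree.
printf '%s\n' 7 17 100 | pad_sort
```

The first command lists 100, 17, 7; the padded version lists 7, 17, 100.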

and come back if you have questions.

Ed.

Ed Morton
04-29-05 08:55 PM


Re: convert row-data to column data
In article <e82fe$42725f4f$d1b71443$7763@PRIMUS.CA>, William Park wrote:
>
> Search <comp.lang.awk> and <comp.unix.shell> for 'transpose' keyword in
> subject.

Thanks.  I was always searching for 'convert row data to column data'
and search terms like that.

The 'transpose' hint was very useful and I was able to write a working
awk script:

http://www.cs.kuleuven.ac.be/~bartv...ads/convert.awk

This does exactly what I want.  Of course, I'm always interested in
reading other shorter/cleaner/more_intelligent solutions :-)

Regards,
Bart

--
"Share what you know.  Learn what you don't."

Bart Vandewoestyne
04-30-05 01:55 AM


Re: convert row-data to column data

Bart Vandewoestyne wrote:
> In article <e82fe$42725f4f$d1b71443$7763@PRIMUS.CA>, William Park wrote:
> 
>
>
> Thanks.  I was always searching for 'convert row data to column data'
> and search terms like that.
>
> The 'transpose' hint was very useful and I was able to write a working
> awk script:
>
> http://www.cs.kuleuven.ac.be/~bartv...ads/convert.awk
>
> This does exactly what I want.  Of course, I'm always interested in
> reading other shorter/cleaner/more_intelligent solutions :-)
>
> Regards,
> Bart
>

I assume since you posted the link that you'd like some feedback so:

> #!/usr/bin/awk -f

On Solaris, the above is "old awk", generally considered broken and to be
avoided. There, use one of gawk (you may need to install it yourself from
http://www.gnu.org/software/gawk), /usr/xpg4/bin/awk, or /usr/bin/nawk,
with gawk being the first choice.

> #
> # Convert George's data files which are row-oriented towards column-oriented
> # data files.
>
> BEGIN { numcols=NF; numrows=0; }

The above line is not useful. NF is not set in the BEGIN section and
numrows will take the numeric value zero anyway.

> # Match a line with data
> /^[0-9].*/ {

It doesn't really matter, but with newer awks you could use the character
class [[:digit:]] here instead of the explicit range [0-9].

More importantly, though, the ".*" means "zero or more of any character",
so it matches even when nothing follows the digit. The above therefore
tests for exactly the same lines as the pattern without it, and the ".*"
can be dropped.

To test for a line that starts with a digit, whether or not there are
subsequent characters, you'd just write:

/^[0-9]/

If you wanted to test for a line that's all digits, you'd write:

/^[0-9]+$/

(With "*" in place of "+", that pattern would also match an empty line.)

etc....
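
A quick way to compare such patterns is to count the lines each one matches. This is a rough sketch, assuming a POSIX awk; count_matches and the sample lines are made up for illustration:

```shell
# count_matches (hypothetical helper): print how many input lines match
# the awk regular expression given as $1 (without the surrounding slashes).
count_matches() {
    awk -v re="$1" '$0 ~ re { n++ } END { print n + 0 }'
}

# Five sample lines: "12 34", "7", "x9", an empty line, and "42abc".
sample='12 34
7
x9

42abc'

printf '%s\n' "$sample" | count_matches '^[0-9].*'   # 3
printf '%s\n' "$sample" | count_matches '^[0-9]'     # 3, same lines as above
printf '%s\n' "$sample" | count_matches '^[0-9]*$'   # 2: "7" and the empty line
printf '%s\n' "$sample" | count_matches '^[0-9]+$'   # 1: only "7"
```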

>
>   # Extract the amount of data points
>   numcols=NF;

You don't need to set this for every line. You could just set it once in
the END section for the final line (for newer awks), but you can just
use NF instead.

>   # There is now one extra row/column of data
>   numrows=numrows+1

That could just be written as numrows++.

>   # Store the data in an array so we can extract it later on
>   for (i=1; i<=numcols; i++) {

Just use NF for numcols. If you REALLY wanted a numcols variable, you
could just increment it in place of "i" here with suitable arithmetic
adjustment.

>     data[i, numrows]=$i;

You don't need a terminating semicolon.

>   }
> }
>
> # Now show all the data that we stored in the array in a column-oriented way.
> END {
>   for (col=1; col<=numcols; col++) {
>     for (row=1; row<=numrows; row++) {
>       printf("%s ", data[col,row]);


You don't need the terminating semicolon. Also, this will put an extra
space at the end of your line. You can avoid that by doing:

printf("%s%s",sep,data[col,row])
sep=" "

>     }
>     printf("\n");

No need for the semicolon, and normally people just use:

print ""

to add that final newline.

>   }
> }

You don't actually need all the "{" and "}"s, but they don't do any harm
and do future-proof so they're not necessarily a bad idea.

So, the above could be written as:

#!/usr/wherever/bin/gawk -f
#
# Convert George's data files which are row-oriented towards column-oriented
# data files.

# Match a line with data
/^[[:digit:]].*/ {
    # There is now one extra row/column of data
    numrows++

    # Store the data in an array so we can extract it later on
    for (i=1; i<=NF; i++)
        data[i, numrows]=$i
}

# Now show all the data that we stored in the array in a column-oriented way.
END {
    for (col=1; col<=NF; col++) {
        for (row=1; row<=numrows; row++) {
            printf("%s%s", sep, data[col,row])
            sep=" "
        }
        print ""
    }
}

Just showing some possibilities - pick anything you want to keep.....

Ed.

Ed Morton
04-30-05 08:56 AM


Re: convert row-data to column data
Bart Vandewoestyne <MyFirstName.MyLastName@telenet.be> wrote:
> In article <e82fe$42725f4f$d1b71443$7763@PRIMUS.CA>, William Park wrote: 
>
> Thanks.  I was always searching for 'convert row data to column data'
> and search terms like that.
>
> The 'transpose' hint was very useful and I was able to write a working
> awk script:
>
> http://www.cs.kuleuven.ac.be/~bartv...ads/convert.awk
>
> This does exactly what I want.  Of course, I'm always interested in
> reading other shorter/cleaner/more_intelligent solutions :-)

That smells like a Fortran.  Try something like

i=0
while read; do
printf '%s\n' $REPLY > file.$((++i))
done < file
paste file.*

If your file has funny characters, use 'set -f' to disable
globbing.  To do this all in memory, you need my patched Bash
shell
http://freshmeat.net/projects/bashdiff/
which does "transpose in-place".
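
One caveat with the temp-file approach: `paste file.*` relies on glob order, and file.10 sorts before file.2, so inputs with ten or more rows come out with their columns shuffled. Below is a minimal sketch of the same idea with zero-padded names, assuming a POSIX sh plus mktemp(1), an input whose last line ends with a newline, and fewer than 10000 rows; transpose_with_paste and the row.NNNN names are made up here:

```shell
# transpose_with_paste FILE (hypothetical helper): write each input row to
# its own temp file, one value per line, then let paste(1) put those files
# side by side. Output columns are tab-separated.
transpose_with_paste() {
    tmpdir=$(mktemp -d) || return 1
    i=0
    set -f                        # no globbing while rows are word-split
    while read -r line; do
        i=$((i + 1))
        # Zero-padded names keep the glob below in row order
        # (row.0002 sorts before row.0010).
        printf '%s\n' $line > "$tmpdir/row.$(printf '%04d' "$i")"
    done < "$1"
    set +f
    paste "$tmpdir"/row.*
    rm -rf "$tmpdir"
}
```

Note that the function turns globbing back on when it finishes, so a caller that had already run 'set -f' would need to set it again.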

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
Slackware Linux -- because it works.

William Park
04-30-05 08:56 AM


Re: convert row-data to column data
On Fri, 29 Apr 2005 19:41:17 -0500, Ed Morton <morton@lsupcaemnt.com>
wrote:
 
>
>No need for the semicolon, and normally people just use:
>
>	print ""
>
>to add that final newline.
>

Hi Ed, oh master of minimal awk. I have always just used "print" on a
line to print an EOL. is 'print ""' better, or why are you wasting 3
typed characters? I am not being snide, I want to know and I
appreciate your tips.

Regards, ~Steve



There is no "x" in my email address.

Steve Calfee
04-30-05 08:56 AM


Re: convert row-data to column data

Steve Calfee wrote:
> On Fri, 29 Apr 2005 19:41:17 -0500, Ed Morton <morton@lsupcaemnt.com>
> wrote:
>
> 
>
>
> Hi Ed, oh master of minimal awk. I have always just used "print" on a
> line to print an EOL. is 'print ""' better, or why are you wasting 3
> typed characters? I am not being snide, I want to know and I
> appreciate your tips.


print on a line prints $0. To ONLY print the newline character, you need
print "".
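
The difference is easy to see from the command line (any awk should behave this way):

```shell
# A bare print writes the current record ($0) followed by a newline.
printf 'a b c\n' | awk '{ print }'       # a b c

# print "" writes an empty string followed by a newline: just the EOL.
printf 'a b c\n' | awk '{ print "" }'    # (an empty line)
```

In an END block like the one in the script above, a bare print would typically repeat the last input line after every output row (POSIX awks keep $0 there), which is why print "" is used instead.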

Ed.

Ed Morton
04-30-05 08:56 AM


Re: convert row-data to column data
In article <cdmdnfW3PueySe_fRVn-rg@comcast.com>, Ed Morton wrote:
>
> I assume since you posted the link that you'd like some feedback so:
>
> <snip feedback>
>
> Just showing some possibilities - pick anything you want to keep.....

Thanks.  I really appreciate this kind of feedback to improve my skills
in 'yet another language' that I'm learning :-)

Regards,
Bart

--
"Share what you know.  Learn what you don't."

Bart Vandewoestyne
04-30-05 08:56 AM


Re: convert row-data to column data
On Fri, 29 Apr 2005 19:41:17 -0500, Ed Morton wrote:

>
>
> Bart Vandewoestyne wrote: 
...
> I assume since you posted the link that you'd like some feedback so:
...
> So, the above could be written as:
>
> #!/usr/wherever/bin/gawk -f
> #
> # Convert George's data files which are row-oriented towards column-oriented
> # data files.
>
> # Match a line with data
> /^[[:digit:]].*/ {
>    # There is now one extra row/column of data
>    numrows++
>
>    # Store the data in an array so we can extract it later on
>    for (i=1; i<=NF; i++)
>      data[i, numrows]=$i
> }
>
> # Now show all the data that we stored in the array in a column-oriented way.
> END {
>    for (col=1; col<=NF; col++) {
>      for (row=1; row<=numrows; row++) {
>        printf("%s%s", sep, data[col,row])
>        sep=" "
>      }
>      print ""
>    }
> }
>
> Just showing some possibilities - pick anything you want to keep.....

May I play too ?-)
Just for the sake of doing it *almost* differently
and add some more feedback ;-)
#!/usr/bin/gawk -f
#
/^[[:digit:]].*/ {
    max=NF
    while(NF){
        data[NR,NF]=$NF;
        NF--
    }
}
END{
    while(max - j++){
        i=1
        while(data[i,j]) printf data[i++,j]FS
        print ""
    }
}

Well, I know it's not foolproof in case the input file is not
symmetric in its col/row matrix, but it shouldn't ...

Worse, it acts funny when values are zero: the loop stops at the first
zero, so it needs another type of test than (data[i,j]) in case zero
values might be present. For instance, the usual for loop:
END{
    for(j=1;j<=max;j++){
        for(i=1; i<=NR; i++)
            printf data[i,j]FS
        print ""
    }
}
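
Another way around the zero problem is to test whether the array element exists rather than whether its value is true. This is a small sketch, assuming a POSIX awk; transpose_in_test is a made-up name, and rows are counted with their own counter so that row indices stay contiguous:

```shell
# transpose_in_test (hypothetical helper): transpose whitespace-separated
# rows read from stdin. Each output row ends when (i, j) is no longer an
# index of data, so values of 0 are kept instead of ending the row early.
transpose_in_test() {
    awk '
    {
        nrows++
        if (NF > max) max = NF
        for (c = 1; c <= NF; c++) data[nrows, c] = $c
    }
    END {
        for (j = 1; j <= max; j++) {
            line = ""
            for (i = 1; (i, j) in data; i++)
                line = line (i > 1 ? " " : "") data[i, j]
            print line
        }
    }'
}

printf '1 0 3\n4 5 0\n' | transpose_in_test
```

For the two sample rows this prints "1 4", "0 5" and "3 0", zeros included.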

And ...
It doesn't cope with *not* printing the first blank, which
Ed's answer didn't either  :D)
(If really needed, we should reset sep to empty between the two `for`s :-)
    for (col=1; col<=NF; col++) {
        sep=""
        for (row=1; row<=numrows; row++) {
)
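
Putting the pieces from this thread together, a complete version might look like the sketch below. It assumes a POSIX awk and that data lines start with a digit (as in the original pattern). transpose_rows is a made-up wrapper name, and numcols is tracked explicitly so the END block does not depend on NF of the last input line:

```shell
# transpose_rows [FILE...] (hypothetical helper): turn each data row
# (a line starting with a digit) into a column of the output.
transpose_rows() {
    awk '
    /^[0-9]/ {
        numrows++
        if (NF > numcols) numcols = NF
        for (i = 1; i <= NF; i++)
            data[i, numrows] = $i
    }
    END {
        for (col = 1; col <= numcols; col++) {
            sep = ""          # reset per output line: no leading blank
            for (row = 1; row <= numrows; row++) {
                printf "%s%s", sep, data[col, row]
                sep = " "
            }
            print ""
        }
    }' "$@"
}

# Header lines are skipped; the two data rows become two columns.
printf '# run 1\n1 2 3\n# run 2\n4 5 6\n' | transpose_rows
```

The sample input yields the lines "1 4", "2 5" and "3 6", with no leading or trailing blanks.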


Loki Harfagr
04-30-05 01:55 PM


All times are GMT.