Code Comments
I'm quite new to awk scripting, and I haven't been able to solve this problem:

I have this data file:

http://www.cs.kuleuven.ac.be/~bartv..._1_0p041_12.txt

I would like to use the data from that data file in gnuplot, which expects the data as columns. Now the data in my file is stored in the even rows.

How do I transform the data from the even rows into columns? The best I could come up with up until now is

http://www.cs.kuleuven.ac.be/~bartv...ads/convert.awk

but this does not give me what I want. The row data is indeed changed to column data, but the columns should be next to each other, not below each other and separated by an empty line...

Any help appreciated.

Regards,
Bart

--
"Share what you know. Learn what you don't."
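A minimal sketch of the transformation Bart describes. Because the linked file is not reproduced here, it assumes that every even-numbered line holds whitespace-separated values and that every odd-numbered line can be skipped; `transpose_even_rows` is a hypothetical name. Each data row becomes one output column, which is the layout gnuplot reads:

```shell
# Hypothetical helper: assumes the data sits on even-numbered lines
# (whitespace-separated) and that odd-numbered lines can be skipped.
transpose_even_rows() {
    awk '
        NR % 2 == 0 {                     # keep even-numbered lines only
            rows++
            if (NF > maxcols) maxcols = NF
            for (i = 1; i <= NF; i++)
                cell[rows, i] = $i        # value i of data row number "rows"
        }
        END {
            # Output line c holds value c from every data row, so each
            # input data row becomes one output column.
            for (c = 1; c <= maxcols; c++) {
                for (r = 1; r <= rows; r++)
                    out = (r == 1) ? cell[r, c] : out " " cell[r, c]
                print out
            }
        }' "$@"
}
```

For example, `transpose_even_rows datafile.txt > columns.dat` (both file names are placeholders) writes one column per original data row, ready to hand to gnuplot.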
Bart Vandewoestyne <MyFirstName.MyLastName@telenet.be> wrote:
> I'm quite new to awk scripting, and I haven't been able to solve this
> problem:
[...]
> How do I transform the data from the even rows into columns?
[...]

Search <comp.lang.awk> and <comp.unix.shell> for the keyword 'transpose' in the subject.

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
Slackware Linux -- because it works.
Bart Vandewoestyne wrote:
> I'm quite new to awk scripting, and I haven't been able to solve this
> problem:
[...]
> The row data is indeed changed to column data, but the columns should
> be next to each other, not below each other and separated by an empty
> line...

Take a look at this:

--------------
Transposing rows to selected columns and sorting by key.

Given the following input file:

Number of executions = 437
Number of compilations = 1
Worst preparation time (ms) = 1
Best preparation time (ms) = 1
Rows deleted = 0

Number of executions = 1
Number of compilations = 1
Worst preparation time (ms) = 4
Best preparation time (ms) = 4
Rows deleted = 0

Number of executions = 29
Number of compilations = 1
Worst preparation time (ms) = 1
Best preparation time (ms) = 1
Rows deleted = 0

To transpose certain rows into columns and sort by one of the columns, like the following, which is sorted by "Number of executions":

Number of executions   Number of compilations   Rows deleted
1                      1                        0
29                     1                        0
437                    1                        0

This will do it all in gawk:

gawk -vRS="" -F"\n" 'BEGIN{
    fields = "1 2 5"; key = "1"
    numflds = split(fields,flds," ")
}
{
    for (i=1; i<=NF;i++) {
        split($i,f,"=")
        # Get rid of all spaces from the end of the title text
        sub(/[[:blank:]]*$/,"",f[1])
        title[i]=f[1]
        # Get rid of all spaces from the value field
        value[i]=f[2]+0
        # Determine the width for this column based on the width
        # of the title text plus 3 for spacing. Left-justify (%-).
        fmt[i]="%-"(length(title[i])+3)"s"
    }
    # We will want to sort on the key column so we need to create a
    # string at the start of each line to sort on later. Take the key
    # column's value and pad it with zeros up to 20 chars followed by
    # a space to separate it from the first real column. Conversion of
    # "7" to "0007" and "17" to "0017" is necessary because asort()
    # is alphabetical not numerical so all numeric fields must be the
    # same width to compare alphabetically.
    lines[NR] = sprintf("%020s ",value[key])
    # Now add the real columns, formatted as determined earlier.
    for (i=1; i<=numflds; i++) {
        lines[NR] = lines[NR] sprintf(fmt[flds[i]], value[flds[i]])
    }
}
END {
    # Print the title line
    for (i=1; i<=numflds; i++) {
        printf fmt[flds[i]], title[flds[i]]
    }
    print ""
    # Sort the lines alphabetically, i.e. by the value of the key column
    # added above to the front of each line.
    asort(lines)
    # Print each line
    for (i=1; i<=NR; i++) {
        # strip out the first numeric value, the key value added above
        sub("[[:digit:]]* ","",lines[i])
        print lines[i]
    }
}'

Setting fields and key at the beginning obviously dictates which fields are printed and which key to sort on. The only assumption it makes about field sizes is that the key field's values won't be more than 20 characters.
--------------

and come back if you have questions.

Ed.
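The script above relies on gawk's asort(). A minimal sketch of the same transpose-and-sort idea for awks without asort(), handing the sorting to sort(1) instead; this is a swapped-in technique, not Ed's script, and `transpose_sorted` is a hypothetical name. It assumes blank-line-separated "Title = value" records and prints fields 1, 2 and 5 tab-separated, keyed on field 1:

```shell
# Hypothetical helper, not Ed's script: portable to POSIX awks because
# sort(1) does the sorting instead of gawk's asort(). Fields 1, 2 and 5
# of each blank-line-separated record are printed; field 1 is the key.
transpose_sorted() {
    tab=$(printf '\t')
    # Tag 0 = title line, tag 1 = data line; sort puts the title first,
    # then orders data lines numerically by the key; cut drops the tag.
    awk -v RS= -v FS='\n' -v OFS='\t' '
        {
            for (i = 1; i <= NF; i++) {
                split($i, f, "=")
                sub(/[ \t]+$/, "", f[1])    # trim blanks after the title
                title[i] = f[1]
                value[i] = f[2] + 0         # " 437" becomes 437
            }
            print 1, value[1], value[2], value[5]
        }
        END { print 0, title[1], title[2], title[5] }
    ' "$@" |
        sort -t "$tab" -k1,1n -k2,2n |
        cut -f2-
}
```

Run on the three sample records above, it prints the title line followed by the rows for 1, 29 and 437 executions. The columns are tab-separated rather than padded to the title widths as in the gawk version.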
In article <e82fe$42725f4f$d1b71443$7763@PRIMUS.CA>, William Park wrote:
>
> Search <comp.lang.awk> and <comp.unix.shell> for 'transpose' keyword in
> subject.

Thanks. I was always searching for 'convert row data to column data' and search terms like that.

The 'transpose' hint was very useful and I was able to write a working awk script:

http://www.cs.kuleuven.ac.be/~bartv...ads/convert.awk

This does exactly what I want. Of course, I'm always interested in reading other shorter/cleaner/more_intelligent solutions :-)

Regards,
Bart

--
"Share what you know. Learn what you don't."
Bart Vandewoestyne wrote:
> In article <e82fe$42725f4f$d1b71443$7763@PRIMUS.CA>, William Park wrote:
[...]
> This does exactly what I want. Of course, I'm always interested in
> reading other shorter/cleaner/more_intelligent solutions :-)

I assume since you posted the link that you'd like some feedback, so:

> #!/usr/bin/awk -f

The above is "old awk", generally considered broken and to be avoided. On Solaris use either gawk (you may need to install it yourself from http://www.gnu.org/software/gawk), /usr/xpg4/bin/awk, or /usr/bin/nawk, with gawk being the first choice.

> #
> # Convert George's data files which are row-oriented towards column-oriented
> # data files.
>
> BEGIN { numcols=NF; numrows=0; }

The above line is not useful. NF is not set in the BEGIN section, and numrows will take the numeric value zero anyway.

> # Match a line with data
> /^[0-9].*/ {

It doesn't really matter, but with newer awks you could use the character class [:digit:] here instead of explicitly listing the digits. More importantly, the ".*" is redundant: it means "any sequence of zero or more characters", so the pattern matches exactly the same lines as

/^[0-9]/

which is all you need to test for a line that starts with a digit, whether or not anything follows it. If you wanted to test for a line that's all digits, you'd write:

/^[0-9]+$/

(with * instead of +, an empty line would match too), etc....

>
> # Extract the amount of data points
> numcols=NF;

You don't need to set this for every line. You could set it once in the END section from the final line (for newer awks), but you can just use NF instead.

> # There is now one extra row/column of data
> numrows=numrows+1

That could just be written as numrows++.

> # Store the data in an array so we can extract it later on
> for (i=1; i<=numcols; i++) {

Just use NF for numcols. If you REALLY wanted a numcols variable, you could increment it in place of "i" here with suitable arithmetic adjustment.

> data[i, numrows]=$i;

You don't need a terminating semicolon.

> }
> }
>
> # Now show all the data that we stored in the array in a column-oriented way.
> END {
> for (col=1; col<=numcols; col++) {
> for (row=1; row<=numrows; row++) {
> printf("%s ", data[col,row]);

You don't need the terminating semicolon. Also, this will put an extra space at the end of your line. You can avoid that by doing:

printf("%s%s",sep,data[col,row])
sep=" "

> }
> printf("\n");

No need for the semicolon, and normally people just use:

print ""

to add that final newline.

> }
> }

You don't actually need all the "{" and "}"s, but they don't do any harm and they future-proof the code, so they're not necessarily a bad idea.

So, the above could be written as:

#!/usr/wherever/bin/gawk -f
#
# Convert George's data files which are row-oriented towards column-oriented
# data files.

# Match a line with data
/^[[:digit:]].*/ {
    # There is now one extra row/column of data
    numrows++

    # Store the data in an array so we can extract it later on
    for (i=1; i<=NF; i++)
        data[i, numrows]=$i
}

# Now show all the data that we stored in the array in a column-oriented way.
END {
    for (col=1; col<=NF; col++) {
        for (row=1; row<=numrows; row++) {
            printf("%s%s", sep, data[col,row])
            sep=" "
        }
        print ""
    }
}

Just showing some possibilities - pick anything you want to keep.....

Ed.
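Several of the review points above can be checked directly. A minimal sketch using made-up sample lines and hypothetical function names, comparing the regular expressions and printing NF in BEGIN and END:

```shell
# Made-up sample input: two data lines, a text line and an empty line.
sample() { printf '12 3.5 7\n4\nx 1 2\n\n'; }

# ".*" can match zero characters, so these two select the same lines.
with_dotstar()    { sample | awk '/^[0-9].*/'; }
without_dotstar() { sample | awk '/^[0-9]/'; }

# "*" allows zero digits, so /^[0-9]*$/ also matches the empty line.
digits_star() { sample | awk '/^[0-9]*$/'; }
digits_plus() { sample | awk '/^[0-9]+$/'; }

# NF is 0 in BEGIN (no record read yet); in END, POSIX awks such as
# gawk keep the field count of the last record read.
nf_in_begin() { echo 'a b c' | awk 'BEGIN { print NF }'; }
nf_in_end()   { echo 'a b c' | awk 'END { print NF }'; }
```

Note that NF in END is the field count of the last line read, whether or not that line matched the data pattern.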
Bart Vandewoestyne <MyFirstName.MyLastName@telenet.be> wrote:
[...]
> This does exactly what I want. Of course, I'm always interested in
> reading other shorter/cleaner/more_intelligent solutions :-)

That smells like Fortran. Try something like

i=0
while read; do
    printf '%s\n' $REPLY > file.$((++i))
done < file
paste file.*

If your file has funny characters, then use 'set -f' to disable globbing.

To do this all in memory, you need my patched Bash shell

http://freshmeat.net/projects/bashdiff/

which does "transpose in-place".

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
Slackware Linux -- because it works.
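One caveat with the paste approach above: the shell sorts the glob file.* as text, so once there are ten or more rows, file.10 is pasted before file.2. Plain `read` also mangles backslashes. A minimal sketch of the same idea with zero-padded names in a scratch directory, assuming a POSIX shell plus the widely available (but non-POSIX) `mktemp -d`; `transpose_with_paste` is a hypothetical name:

```shell
# Hypothetical rework of the paste idea: one scratch file per input line,
# with zero-padded names so that the glob expands in numeric order.
transpose_with_paste() {
    tmp=$(mktemp -d) || return 1      # mktemp -d: common, though not POSIX
    i=0
    set -f                            # keep fields like "*" from globbing
    while IFS= read -r line; do       # -r: keep backslashes as-is
        i=$((i + 1))
        # Unquoted $line is split on whitespace: one field per output line.
        printf '%s\n' $line > "$tmp/$(printf 'row.%06d' "$i")"
    done
    set +f                            # the glob below must expand again
    paste -d ' ' "$tmp"/row.*         # each scratch file becomes a column
    rm -rf "$tmp"
}
```

Called as `transpose_with_paste < file`, it prints the transpose with single spaces between columns and removes its scratch files afterwards. Like the original loop, it transposes every line, so any header lines would need to be filtered out first.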
On Fri, 29 Apr 2005 19:41:17 -0500, Ed Morton <morton@lsupcaemnt.com> wrote:
>
> No need for the semicolon, and normally people just use:
>
>     print ""
>
> to add that final newline.

Hi Ed, oh master of minimal awk. I have always just used "print" on a line by itself to print an EOL. Is 'print ""' better, or why are you wasting 3 typed characters? I am not being snide; I want to know, and I appreciate your tips.

Regards,
~Steve
There is no "x" in my email address.
Steve Calfee wrote:
> On Fri, 29 Apr 2005 19:41:17 -0500, Ed Morton <morton@lsupcaemnt.com>
> wrote:
>
> Hi Ed, oh master of minimal awk. I have always just used "print" on a
> line to print an EOL. Is 'print ""' better, or why are you wasting 3
> typed characters? I am not being snide, I want to know and I
> appreciate your tips.

"print" on its own prints $0. To print ONLY the newline character, you need print "".

Ed.
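Ed's point is easy to check. A minimal sketch (`show_print_difference` is a hypothetical name) in which a bare print repeats the current line, while print "" emits only the output record separator, a newline by default:

```shell
# Hypothetical demo: "print" alone repeats $0, while print "" writes
# only the output record separator (ORS, a newline by default).
show_print_difference() {
    printf 'abc\n' | awk '{ print; print ""; print "end" }'
}
```

Its output is "abc", an empty line, then "end". A bare print in Bart's END loop would therefore have repeated the last input line instead of just ending the output line.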
In article <cdmdnfW3PueySe_fRVn-rg@comcast.com>, Ed Morton wrote:
>
> I assume since you posted the link that you'd like some feedback so:
>
> <snip feedback>
>
> Just showing some possibilities - pick anything you want to keep.....

Thanks. I really appreciate this kind of feedback to improve my skills in 'yet another language' that I'm learning :-)

Regards,
Bart

--
"Share what you know. Learn what you don't."
On Fri, 29 Apr 2005 19:41:17 -0500, Ed Morton wrote:
>
>
> Bart Vandewoestyne wrote:
...
> I assume since you posted the link that you'd like some feedback so:
...
> So, the above could be written as:
>
> #!/usr/wherever/bin/gawk -f
> #
> # Convert George's data files which are row-oriented towards column-oriented
> # data files.
>
> # Match a line with data
> /^[[:digit:]].*/ {
> # There is now one extra row/column of data
> numrows++
>
> # Store the data in an array so we can extract it later on
> for (i=1; i<=NF; i++)
> data[i, numrows]=$i
> }
>
> # Now show all the data that we stored in the array in a column-oriented way.
> END {
> for (col=1; col<=NF; col++) {
> for (row=1; row<=numrows; row++) {
> printf("%s%s", sep, data[col,row])
> sep=" "
> }
> print ""
> }
> }
>
> Just showing some possibilities - pick anything you want to keep.....
May I play too? ;-)
Just for the sake of doing it *almost* differently,
and to add some more feedback ;-)
#!/usr/bin/gawk -f
#
/^[[:digit:]].*/ {
 max=NF
 n++            # count the data lines themselves; NR also counts the other lines
 while(NF){
  data[n,NF]=$NF
  NF--
 }
}
END{
 while(max - j++){
  i=1
  while(data[i,j]) printf "%s%s", data[i++,j], FS
  print ""
 }
}
Well, I know it's not foolproof if the input file is not a regular
col/row matrix (rows of unequal length), but it shouldn't be ...
Worse, it misbehaves when values are zero: you need another kind
of test than (data[i,j]) in case zero values might be present.
For instance, the usual for loop:
END{
 for(j=1;j<=max;j++){
  for(i=1; i<=n; i++)
   printf "%s%s", data[i,j], FS
  print ""
 }
}
And ...
It doesn't avoid printing an extra blank either (here, after the last
value on each line), and neither does Ed.'s answer: his sep is never
reset, so every line after the first starts with a blank :D)
( If really needed, we should reset sep to "" between the two `for's :-)
   for (col=1; col<=NF; col++) {
       sep=""
       for (row=1; row<=numrows; row++) {
)
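Putting that fix into Ed's rewrite gives something like the following minimal sketch. `rows_to_columns` is a hypothetical name, and one change goes beyond the thread: it tracks the widest data line in numcols rather than using NF in END, because NF in END holds the field count of the last line read, even when that line is not a data line.

```shell
# Hypothetical name; the awk body follows Ed's rewrite, with sep reset
# for every output line as suggested above.
rows_to_columns() {
    awk '
        /^[0-9]/ {                          # data lines start with a digit
            numrows++
            if (NF > numcols) numcols = NF  # widest data line seen
            for (i = 1; i <= NF; i++)
                data[i, numrows] = $i
        }
        END {
            for (col = 1; col <= numcols; col++) {
                sep = ""                    # reset for every output line
                for (row = 1; row <= numrows; row++) {
                    printf("%s%s", sep, data[col, row])
                    sep = " "
                }
                print ""
            }
        }' "$@"
}
```

With sep reset inside the outer loop, no output line starts or ends with a blank. A trailing text line such as "text" no longer cuts the output short.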