Yet another very basic sorting question
| Harriet Bazley 2006-11-22, 9:55 pm |
| I'm sure I'm being extremely stupid, but I have searched both Google's
archives of this newsgroup and the text version of the gawk manual, and
can't find a reference to what seems a very simple problem, that I've
been trying to solve for hours. I've tried copying arrays into arrays,
I've tried copying arrays to reverse the keys and values, I've tried
creating new indices from 1 to n, but I can't seem to get the result I
want....
I can use a very simple program to count the number of occurrences of a
given input line and print the number of unique lines and how often each
occurs:
{lines[$0]++}

END{
    for (i in lines)
        print lines[1],i
}
But what I can't see how to do is use asort() or asorti() to print out
these lines in order of frequency, given that the array values are *not*
guaranteed to be unique. So far as I can see it needs at least three
arrays, but I've tried all sorts of permutations and can't find out how
to retain *both* key and index values and output both in the order of
index magnitude.
In the end I had to use
printf("%.3d %s\n",lines[i],i)
and then use my editor's block sort to get the results in order, and its
wildcard search to remove the excess zeroes again, which is not exactly
an elegant solution. :-(
--
Harriet Bazley == Loyaulte me lie ==
If you're feeling good, don't worry. You'll get over it.
| |
|
| On Thu, 23 Nov 2006 02:21:11 GMT, Harriet Bazley <bazley@feathermail.co.uk> wrote:
>I'm sure I'm being extremely stupid, but I have searched both Google's
>archives of this newsgroup and the text version of the gawk manual, and
>can't find a reference to what seems a very simple problem, that I've
>been trying to solve for hours. I've tried copying arrays into arrays,
>I've tried copying arrays to reverse the keys and values, I've tried
>creating new indices from 1 to n, but I can't seem to get the result I
>want....
>
>
>I can use a very simple program to count the number of occurrences of a
>given input line and print the number of unique lines and how often each
>occurs:
>
> {lines[$0]++}
>
>END{
> for (i in lines)
> print lines[1],i
> }
Something like:
for (i in lines)
    sort_lines[++j] = sprintf("%06d %s", lines[i], i)
n = asort(sort_lines)
for (i = 1; i <= n; i++) {
    split(sort_lines[i], k)
    printf "%6d %s\n", k[1], k[2]
}
Grant.
--
http://bugsplatter.mine.nu/
| |
| Brian Inglis 2006-11-23, 3:56 am |
| On Thu, 23 Nov 2006 02:21:11 GMT in comp.lang.awk, Harriet Bazley
<bazley@feathermail.co.uk> wrote:
>I'm sure I'm being extremely stupid, but I have searched both Google's
>archives of this newsgroup and the text version of the gawk manual, and
>can't find a reference to what seems a very simple problem, that I've
>been trying to solve for hours. I've tried copying arrays into arrays,
>I've tried copying arrays to reverse the keys and values, I've tried
>creating new indices from 1 to n, but I can't seem to get the result I
>want....
>
>
>I can use a very simple program to count the number of occurrences of a
>given input line and print the number of unique lines and how often each
>occurs:
>
> {lines[$0]++}
>
>END{
> for (i in lines)
> print lines[1],i
> }
>
>
>But what I can't see how to do is use asort() or asorti() to print out
>these lines in order of frequency, given that the array values are *not*
>guaranteed to be unique. So far as I can see it needs at least three
>arrays, but I've tried all sorts of permutations and can't find out how
>to retain *both* key and index values and output both in the order of
>index magnitude.
>
>In the end I had to use
> printf("%.3d %s\n",lines[i],i)
>and then use my editor's block sort to get the results in order, and its
>wildcard search to remove the excess zeroes again, which is not exactly
>an elegant solution. :-(
Assuming Unix utilities:
sort in | uniq -c | sort -k1,1nr
sorts file in, counts line occurrences and outputs count followed by
line, then sorts the output in reverse numeric order of the count
--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply
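
A small runnable sketch of that pipeline, assuming a POSIX shell with sort and uniq available. The function name freq and the sample input are made up for illustration. It keys the final sort on the first column of the uniq -c output, which is the count.

```shell
# Count duplicate lines with standard utilities, most frequent first.
freq() {
    # sort groups identical lines, uniq -c prefixes each with its count,
    # and the final sort orders numerically (n), reversed (r), on field 1.
    sort "$@" | uniq -c | sort -k1,1nr
}

# Hypothetical sample input: "b" three times, "a" twice, "c" once.
printf '%s\n' b a b c b a | freq
```

The width of the count column that uniq -c prints differs between implementations, so anything parsing the output is safer splitting on whitespace than on fixed columns.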
| |
| Ed Morton 2006-11-23, 6:56 pm |
| Harriet Bazley wrote:
> I'm sure I'm being extremely stupid, but I have searched both Google's
> archives of this newsgroup and the text version of the gawk manual, and
> can't find a reference to what seems a very simple problem, that I've
> been trying to solve for hours. I've tried copying arrays into arrays,
> I've tried copying arrays to reverse the keys and values, I've tried
> creating new indices from 1 to n, but I can't seem to get the result I
> want....
>
>
> I can use a very simple program to count the number of occurrences of a
> given input line and print the number of unique lines and how often each
> occurs:
>
> {lines[$0]++}
>
> END{
> for (i in lines)
> print lines[1],i
I assume you mean lines[i], not lines[1].
> }
>
>
> But what I can't see how to do is use asort() or asorti() to print out
> these lines in order of frequency, given that the array values are *not*
> guaranteed to be unique. So far as I can see it needs at least three
> arrays, but I've tried all sorts of permutations and can't find out how
> to retain *both* key and index values and output both in the order of
> index magnitude.
>
> In the end I had to use
> printf("%.3d %s\n",lines[i],i)
> and then use my editor's block sort to get the results in order, and its
> wildcard search to remove the excess zeroes again, which is not exactly
> an elegant solution. :-(
>
You may not need more arrays or to sort the array to get the output you
want. Try this:
{lines[$0]++}
END{
    for (line in lines) {
        count = lines[line]
        allLines[count] = allLines[count] sep[count] line OFS count
        sep[count] = ORS
        max = (count > max ? count : max)
    }
    for (i=1; i<=max; i++)
        if (i in allLines)
            print allLines[i]
}
Obviously you can split() allLines[] on RS before printing or use a
different separator or... but hopefully you get the idea.
Ed.
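
A runnable variation on this bucket-by-count idea, as a sketch rather than the exact program above. It prints the count first and walks the counts from highest to lowest. The names by_count and bucket and the sample input are made up for illustration. It uses no gawk extensions, so any POSIX awk should do.

```shell
# Group lines by how often they occur, then print most frequent first.
by_count() {
    awk '
    { lines[$0]++ }                 # tally each distinct input line
    END {
        # Bucket the lines by count; lines sharing a count are joined with "\n".
        for (line in lines) {
            count = lines[line]
            bucket[count] = bucket[count] sep[count] count OFS line
            sep[count] = "\n"
            if (count > max) max = count
        }
        # Walk the counts from highest to lowest, skipping unused counts.
        for (i = max; i >= 1; i--)
            if (i in bucket)
                print bucket[i]
    }' "$@"
}

# Hypothetical sample input: "b" three times, "a" twice, "c" once.
printf '%s\n' b a b c b a | by_count
```

The order of lines inside one bucket follows the for (line in lines) traversal, which awk leaves unspecified, so ties may appear in any order.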
| |
| Harriet Bazley 2006-11-23, 9:55 pm |
| On 23 Nov 2006 as I do recall,
Grant wrote:
> On Thu, 23 Nov 2006 02:21:11 GMT, Harriet Bazley
> <bazley@feathermail.co.uk> wrote:
[snip output sort problems]
Typo - sorry!
>
> Something like:
>
> for (i in lines)
> sort_lines[++j] = sprintf("%06d %s", lines[i], i)
>
Ah... beautifully elegant :-D
And the daft thing is that it was more or less what I was doing
manually anyway, and I hadn't thought of it!
Thanks to all.
--
Harriet Bazley == Loyaulte me lie ==
Lies, damned lies and user documentation.