Count number of values in sections of a file
Hi,
I have a file which contains:
SECTION1
value1
value2
SECTION2
value3
value4
value5
value6
SECTION3
value7
value8
value9
and so on.
The section headings cannot be matched with a regular expression, but
all of the values can be matched with a single regular expression.
I would like to produce a tab-separated file which contains the number
of values in each section, sorted in descending order, followed by the
section heading, as in:
4 SECTION2
3 SECTION3
2 SECTION1
I would be grateful if you could help me with this.
Many Thanks,
Jonny
Ed Morton 2005-02-12, 3:55 am
Jonny wrote:
> Hi,
>
> I have a file which contains:
>
> SECTION1
> value1
> value2
> SECTION2
> value3
> value4
> value5
> value6
> SECTION3
> value7
> value8
> value9
>
> and so on.
>
> The section headings cannot be matched with a regular expression, but
> all of the values can be matched with a single regular expression.
Then we'll have to assume that a section heading is whatever does not
match a value RE, right?
> I would like to produce a tab-separated file which contains the number
> of values in each section, sorted in descending order, followed by the
> section heading, as in:
>
> 4 SECTION2
> 3 SECTION3
> 2 SECTION1
>
> I would be grateful if you could help me with this.
To collect and print the unsorted data:
awk 'BEGIN{OFS="\t"} /values_regexp/{cnt[sect]++;next}{sect=$0}
END{for (sect in cnt) print cnt[sect], sect}' file
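A minimal sketch of the collector run against the sample data from the question. It assumes `^value[0-9]+$` as a stand-in for the real values_regexp, and the file name sections.txt and the function name count_sections are hypothetical. OFS is set to a tab so the output is tab-separated as requested. Because `for (sect in cnt)` visits keys in an unspecified order, the three lines can come out in any order until they are sorted.

```sh
#!/bin/sh
# Sample input from the question.
printf '%s\n' SECTION1 value1 value2 SECTION2 value3 value4 value5 value6 \
    SECTION3 value7 value8 value9 > sections.txt

# Hypothetical helper wrapping the one-liner; ^value[0-9]+$ is an assumed
# stand-in for values_regexp. A value line bumps the count of the current
# section; any other line becomes the current section heading.
count_sections() {
    awk 'BEGIN { OFS = "\t" }
         /^value[0-9]+$/ { cnt[sect]++; next }
         { sect = $0 }
         END { for (sect in cnt) print cnt[sect], sect }' "$1"
}

# Prints one "count<TAB>heading" line per section, in no guaranteed order.
count_sections sections.txt
```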
If you're on UNIX, the easiest way to sort the output is to pipe the
above to "sort -rn" (-r for descending order). If not, you need to write the sort routine in awk.
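For a system without a sort command, one possible in-awk sort is sketched below: an insertion sort over the collected counts in the END rule, largest count first. It differs from the one-liner in two deliberate ways: headings are stored by position in input order (so a heading with no values is reported with a count of 0), and value lines that appear before the first heading are ignored. The `^value[0-9]+$` regexp, the file name sections.txt and the function name sorted_counts are assumptions for illustration.

```sh
#!/bin/sh
# Sample input from the question.
printf '%s\n' SECTION1 value1 value2 SECTION2 value3 value4 value5 value6 \
    SECTION3 value7 value8 value9 > sections.txt

# Hypothetical helper: counts values per section and sorts the result inside
# awk itself. ^value[0-9]+$ is an assumed stand-in for the values regexp.
sorted_counts() {
    awk '
    BEGIN { OFS = "\t" }
    /^value[0-9]+$/ { if (n) cnt[n]++; next }   # count against the current heading
    { n++; name[n] = $0; cnt[n] = 0 }           # new heading, kept in input order
    END {
        # Insertion sort of positions 1..n by count, largest first.
        for (i = 2; i <= n; i++) {
            c = cnt[i]; h = name[i]
            for (j = i - 1; j >= 1 && cnt[j] < c; j--) {
                cnt[j + 1] = cnt[j]; name[j + 1] = name[j]
            }
            cnt[j + 1] = c; name[j + 1] = h
        }
        for (i = 1; i <= n; i++) print cnt[i], name[i]
    }' "$1"
}

sorted_counts sections.txt
```

The comparison uses a strict `<`, so sections with equal counts keep their input order.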
Ed.
Ed Morton wrote:
> To collect and print the unsorted data:
>
> awk 'BEGIN{OFS="\t"} /values_regexp/{cnt[sect]++;next}{sect=$0}
> END{for (sect in cnt) print cnt[sect], sect}' file
Thanks Ed. It works perfectly.
> If you're on UNIX, the easiest way to sort the output is to pipe the
> above to "sort -rn" (-r for descending order). If not, you need to write the sort routine in awk.
I'm using Windows 2000, but I have the sort command as part of GNU's
UnxUtils package.
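Under cmd.exe on Windows, single quotes are not treated as quoting characters, so a one-liner written for a UNIX shell usually needs rearranging. One way around that, sketched here, is to put the program in its own file and run it with awk's -f option. The file names count.awk and sections.txt, the wrapper ranked_sections and the `^value[0-9]+$` stand-in are assumptions; the sketch is written for a POSIX shell so it can be tried as-is.

```sh
#!/bin/sh
# Sample input from the question.
printf '%s\n' SECTION1 value1 value2 SECTION2 value3 value4 value5 value6 \
    SECTION3 value7 value8 value9 > sections.txt

# The awk program lives in its own file, so running it needs no shell
# quoting. ^value[0-9]+$ is an assumed stand-in for the values regexp.
cat > count.awk <<'EOF'
BEGIN { OFS = "\t" }
/^value[0-9]+$/ { cnt[sect]++; next }
{ sect = $0 }
END { for (sect in cnt) print cnt[sect], sect }
EOF

# Hypothetical wrapper: collect the counts, then sort numerically, largest first.
ranked_sections() {
    awk -f count.awk "$1" | sort -rn
}

ranked_sections sections.txt
```

From a cmd.exe prompt the last step would be along the lines of `awk -f count.awk sections.txt | sort -rn`. It is worth checking that the sort found first on the PATH is the UnxUtils one, since Windows ships its own sort.exe, which does not take GNU-style options such as -n.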
Regards,
Jonny
Ulrich M. Schwarz 2005-02-12, 8:55 am
Jonny <www.mail@ntlworld.com> writes:
> I would like to produce a tab-separated file which contains the number
> of values in each section, sorted in descending order, followed by the
> section heading, as in:
>
> 4 SECTION2
> 3 SECTION3
> 2 SECTION1
I'd propose a three-pass algorithm:
awk '
/value_re/ {++num; next;}
true {printf("%s\n%s\t", num, $0); num=0;}
END {printf("%s\n", num);}
' \
| awk '1 {print $2, $1;}' \
| sort *mumble*
Untested; probably needs adjustment to the OFS in the second awk.
HTH
Ulrich
Ulrich M. Schwarz wrote:
> awk '
> /value_re/ {++num; next;}
> true {printf("%s\n%s\t", num, $0); num=0;}
> END {printf("%s\n", num);}
> ' \
> | awk '1 {print $2, $1;}' \
> | sort *mumble*
Thanks for your reply Ulrich.
All I got the above to output was the number of lines in the input file.
Regards,
Jonny
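Probably because awk has no `true` keyword: `true` in the second rule is an unset variable whose value is empty, so as a pattern it never matches. No heading line is ever printed, and only the END rule's running total of value lines comes out as a single number. Below is a minimal corrected sketch of the same three stages, with a pattern-less rule in place of `true` and `sort -rn` filling in for *mumble*. The `^value[0-9]+$` stand-in for value_re, the file sections.txt and the wrapper three_pass are assumptions.

```sh
#!/bin/sh
# Sample input from the question.
printf '%s\n' SECTION1 value1 value2 SECTION2 value3 value4 value5 value6 \
    SECTION3 value7 value8 value9 > sections.txt

# Hypothetical wrapper around the three stages:
#   1. Emit "heading<TAB>count" lines. A rule with no pattern runs on every
#      non-value line. The first printf also emits an empty line, because
#      num is still unset when the first heading is seen.
#   2. Swap the two fields, skipping that empty line (it has NF == 0).
#   3. Sort numerically, largest count first.
three_pass() {
    awk '
    /^value[0-9]+$/ { ++num; next }
    { printf("%s\n%s\t", num, $0); num = 0 }
    END { printf("%s\n", num) }
    ' "$1" |
    awk -F'\t' -v OFS='\t' 'NF == 2 { print $2, $1 }' |
    sort -rn
}

three_pass sections.txt
```

Because num is reset to 0 at each heading, a section with no values is reported with a count of 0 rather than dropped.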