Count number of values in sections of a file
Jonny

2005-02-12, 3:55 am

Hi,

I have a file which contains:

SECTION1
value1
value2
SECTION2
value3
value4
value5
value6
SECTION3
value7
value8
value9

and so on.

The section headings cannot be matched with a regular expression, but
all of the values can be matched with a single regular expression.

I would like to produce a tab-separated file which contains the number
of values in each section, sorted in descending order, followed by the
section heading, as in:

4 SECTION2
3 SECTION3
2 SECTION1

I would be grateful if you could help me with this.

Many Thanks,
Jonny

Ed Morton

2005-02-12, 3:55 am



Jonny wrote:

> Hi,
>
> I have a file which contains:
>
> SECTION1
> value1
> value2
> SECTION2
> value3
> value4
> value5
> value6
> SECTION3
> value7
> value8
> value9
>
> and so on.
>
> The section headings cannot be matched with a regular expression, but
> all of the values can be matched with a single regular expression.


Then we'll have to assume that a section heading is whatever does not
match a value RE, right?

> I would like to produce a tab-separated file which contains the number
> of values in each section, sorted in descending order, followed by the
> section heading, as in:
>
> 4 SECTION2
> 3 SECTION3
> 2 SECTION1
>
> I would be grateful if you could help me with this.


To collect and print the unsorted data:

awk '/values_regexp/{cnt[sect]++;next}{sect=$0}
END{for (sect in cnt) print cnt[sect], sect}' file
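As a rough illustration, the one-liner above can be tried on the sample data from the question. This is only a sketch: the pattern ^value[0-9]+$ is an assumption standing in for the real values regexp, the file name sections.txt and the function name count_sections are made up for the example, and OFS is set to a tab because the request was for tab-separated output.

```shell
# Recreate the sample input from the question in a scratch file
# (sections.txt is a hypothetical name).
printf '%s\n' SECTION1 value1 value2 SECTION2 value3 value4 value5 value6 \
              SECTION3 value7 value8 value9 > sections.txt

# count_sections FILE: print "count<TAB>heading" for each section, unsorted.
# ^value[0-9]+$ is an assumed stand-in for the real values regexp.
count_sections() {
    awk -v OFS='\t' '
        /^value[0-9]+$/ { cnt[sect]++; next }   # a value line: count it under the current heading
        { sect = $0 }                           # anything else is taken to be a heading
        END { for (sect in cnt) print cnt[sect], sect }
    ' "$1"
}

count_sections sections.txt
```

The order in which "for (sect in cnt)" visits the headings is unspecified, so the three lines can come out in any order. Note also that a heading with no values under it never gets an entry in cnt and so does not appear in the output.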

If you're on UNIX, the easiest way to sort the output is to pipe the
above to "sort -rn" (numeric, reversed, so the largest count comes
first). If not, you need to write the sort routine in awk.
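Both routes can be sketched roughly as follows. Since the request was for descending order, the first variant passes -r together with -n to sort. The second variant is one possible sort routine written in plain awk (a simple insertion sort), not necessarily what was meant above. As before, ^value[0-9]+$, sections.txt and the function names are assumptions made up for the example.

```shell
# Sample input from the question (sections.txt is a hypothetical name).
printf '%s\n' SECTION1 value1 value2 SECTION2 value3 value4 value5 value6 \
              SECTION3 value7 value8 value9 > sections.txt

# Variant 1: let sort do the ordering. -n compares numerically, -r reverses.
count_sections_sorted() {
    awk -v OFS='\t' '
        /^value[0-9]+$/ { cnt[sect]++; next }
        { sect = $0 }
        END { for (sect in cnt) print cnt[sect], sect }
    ' "$1" | sort -rn
}

# Variant 2: no external sort; an insertion sort inside the END block.
count_sections_awk_sorted() {
    awk -v OFS='\t' '
        /^value[0-9]+$/ { cnt[sect]++; next }
        { sect = $0 }
        END {
            n = 0
            for (sect in cnt)              # copy headings into an indexed array
                name[++n] = sect
            for (i = 2; i <= n; i++) {     # insertion sort, largest count first
                key = name[i]
                for (j = i - 1; j >= 1 && cnt[name[j]] < cnt[key]; j--)
                    name[j + 1] = name[j]
                name[j + 1] = key
            }
            for (i = 1; i <= n; i++) print cnt[name[i]], name[i]
        }
    ' "$1"
}

count_sections_sorted sections.txt
count_sections_awk_sorted sections.txt
```

On the sample data both variants list SECTION2 (4) first, then SECTION3 (3), then SECTION1 (2). Sections with equal counts may come out in a different relative order in the two variants.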

Ed.

Jonny

2005-02-12, 3:55 am

Ed Morton wrote:

> To collect and print the unsorted data:
>
> awk '/values_regexp/{cnt[sect]++;next}{sect=$0}
> END{for (sect in cnt) print cnt[sect], sect}' file


Thanks Ed. It works perfectly.

> If you're on UNIX, the easiest way to sort the output is to pipe the
> above to "sort -rn" (numeric, reversed, so the largest count comes
> first). If not, you need to write the sort routine in awk.


I'm using Windows 2000, but I have the sort command as part of GNU's
UnxUtils package.

Regards,
Jonny

Ulrich M. Schwarz

2005-02-12, 8:55 am

Jonny <www.mail@ntlworld.com> writes:

> I would like to produce a tab-separated file which contains the number
> of values in each section, sorted in descending order, followed by the
> section heading, as in:
>
> 4 SECTION2
> 3 SECTION3
> 2 SECTION1


I'd propose a three-pass algorithm:
awk '
/value_re/ {++num; next;}
true {printf("%s\n%s\t", num, $0); num=0;}
END {printf("%s\n", num);}
' \
| awk '1 {print $2, $1;} \
| sort *mumble*

Untested; probably needs adjustment to the OFS in the second awk.

HTH
Ulrich
--
Tobias, if you don't finally stop with your damn positive attitude
and this daisy stuff, I'll buy a lawnmower and nail it to your
ceiling!!!
-- Ilja, on www.aschgrau.de

Jonny

2005-02-12, 8:55 am

Ulrich M. Schwarz wrote:

> awk '
> /value_re/ {++num; next;}
> true {printf("%s\n%s\t", num, $0); num=0;}
> END {printf("%s\n", num);}
> ' \
> | awk '1 {print $2, $1;} \
> | sort *mumble*



Thanks for your reply Ulrich.

All I got the above to output was the number of lines in the input file.

Regards,
Jonny
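A possible explanation for that result, offered as a guess since the exact command line that was run is not shown: awk has no "true" keyword, so "true" in the pipeline above is just an unset variable, which evaluates as false. The heading rule then never fires, and only the END rule prints the running count of value lines. The second awk is also missing its closing quote, and with the default field separator it would split headings that contain spaces. A reworked sketch of the same three-stage idea might look like this, with ^value[0-9]+$, sections.txt and the function name again being assumptions for the example:

```shell
# Sample input from the question (sections.txt is a hypothetical name).
printf '%s\n' SECTION1 value1 value2 SECTION2 value3 value4 value5 value6 \
              SECTION3 value7 value8 value9 > sections.txt

section_counts_pipeline() {
    # Stage 1: emit "heading<TAB>count", closing each line when the next heading arrives.
    awk '
        /^value[0-9]+$/ { ++num; next }
        # "1" is an always-true pattern; the post used the unset variable "true"
        1 { if (seen) printf("%d\n", num); printf("%s\t", $0); num = 0; seen = 1 }
        END { if (seen) printf("%d\n", num) }
    ' "$1" |
    # Stage 2: swap the two tab-separated fields so the count comes first.
    awk -F'\t' -v OFS='\t' '{ print $2, $1 }' |
    # Stage 3: numeric sort, largest count first.
    sort -rn
}

section_counts_pipeline sections.txt
```

On the sample data this lists SECTION2 (4), SECTION3 (3) and SECTION1 (2), in that order. Unlike the array-based one-liner earlier in the thread, this version also reports a heading that has no values under it, with a count of 0.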
