Extract lines from a log-file with duplicate fields?
(Unix Programming archive, March 2007)
| Duncan Smith 2007-03-19, 10:07 pm |
| How could I extract all of the lines from a log-file which have a
certain field duplicated?
For example, if File1.log contained:
1,1,Fixed
2,2,Geared
3,1,Indexed
4,4,Hub
5,4,Freewheel
6,4,STI
Using any common GNU utilities like sed/awk/uniq/tac/egrep (not Perl
though), let's say I need a command that will return all lines where
the second field is duplicated. Note that the duplicated field may or
may not be in consecutive lines. So I need to end up with:
1,1,Fixed
3,1,Indexed
4,4,Hub
5,4,Freewheel
6,4,STI
For added points I could also do with a count of the number of times
the field was repeated (next to the field itself):
Field 1 repeated 2
Field 4 repeated 3
or even just:
1,2
4,3
Any thoughts?
Many thanks,
Duncan.
| |
| Dave Gibson 2007-03-19, 10:07 pm |
| Duncan Smith <DSmith1974@googlemail.com> wrote:
> [original question snipped]
nawk '
BEGIN { FS = "," }
{ a[$2] = (a[$2]) ? a[$2] "\n" $0 : $0 ; c[$2]++ }
END {
for (n in c) { if (c[n] > 1) print a[n] }
for (n in c) { if (c[n] > 1) print "Field " n " repeated " c[n] }
}' file
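For the count-only part of the question (the "1,2" / "4,3" form), the sort and uniq utilities the question mentions are enough on their own. This is a minimal sketch, not taken from any post here; it assumes plain comma-separated fields with no quoted or embedded commas, and the function name field2_counts is hypothetical.

```shell
# Hypothetical helper (not from any post in this thread): print "value,count"
# for every second-field value that occurs more than once in file "$1".
# Assumes plain comma-separated fields with no quoted or embedded commas.
field2_counts() {
    cut -d, -f2 "$1" |                    # keep only the second field
        sort |                            # uniq merges adjacent lines only
        uniq -c |                         # "  N value" per distinct value
        awk '$1 > 1 { print $2 "," $1 }'  # keep repeats, print value,count
}
```

Run as `field2_counts File1.log`; with the sample data it prints 1,2 and 4,3. The sort step is required because uniq only collapses adjacent duplicates, so the values come out in sort order rather than file order.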
| |
| Duncan Smith 2007-03-19, 10:07 pm |
| On Mar 19, 7:00 pm, dave+news...@gibson-hrd.abelgratis.co.uk.invalid
(Dave Gibson) wrote:
> [nawk script snipped]
Ah, I see. That's very useful: this script, and awk in general.
Thanks,
Duncan.
| |
| Chris F.A. Johnson 2007-03-19, 10:07 pm |
| On 2007-03-19, Dave Gibson wrote:
> Duncan Smith <DSmith1974@googlemail.com> wrote:
>
> [nawk script snipped]
That doesn't preserve the line order. To do that, here is the
solution I posted in comp.unix.shell (OP: please don't multi-post):
awk -F, '{ if ( x[$2]++ == 0 ) { y[$2] = $0 }
else if ( x[$2] == 2 ) print y[$2] "\n" $0
else print
}
END { print ""
for ( n in x )
if ( x[n] > 1 )
printf "Field %s repeated %d\n",n,x[n]
}'
--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence
| |
| Duncan Smith 2007-03-20, 7:09 pm |
|
>
> That doesn't preserve the line order. To do that, here is the
> solution I posted in comp.unix.shell (OP: please don't multi-post):
>
Thanks, that works very well. I wasn't expecting so much good
feedback. I'd originally intended to cross-post to three groups (to
get a wider audience, and because this is my first post on this topic
I wasn't sure which group would be best, short of trawling the history
to gauge past responses), but by mistake I only cross-posted to
two. To make matters worse, I then multi-posted to the third
(apologies, still fine-tuning my netiquette).
Am I right in thinking that cross-posting to a small number of related
groups is acceptable if kept to a minimum, replies are requested to be
made to one group only, and the groups aren't chosen to be diverse so
as to deliberately start a flame war?
If not, which is the best single group for awk/sed/grep-type
queries?
(Sorry for the whole multi-posting thing)
Thanks,
Duncan.
| |
| Dave Gibson 2007-03-20, 7:09 pm |
| Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
> On 2007-03-19, Dave Gibson wrote:
[...]
>
> That doesn't preserve the line order.
Ah. I had assumed that the requirement was for those lines with common
second-field values to be grouped together.
> To do that, here is the
> solution I posted in comp.unix.shell (OP: please don't multi-post):
>
> [awk script snipped]
That doesn't preserve the line order. The line containing the first
occurrence of a value is moved to just before the line containing the
second occurrence of that value. To demonstrate, append the line
"7,2,Wibble" to the OP's sample data and note where the "2,2,Geared"
line is printed.
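The demonstration described above can be reproduced directly. The function below is Chris's awk script from earlier in the thread, reindented, commented and wrapped in a shell function whose name (group_dups_streaming) is hypothetical; it is then fed the OP's sample data with "7,2,Wibble" appended.

```shell
# Chris's awk script from this thread, reindented and wrapped in a shell
# function; the name group_dups_streaming is hypothetical. Reads stdin.
group_dups_streaming() {
    awk -F, '
        {
            if (x[$2]++ == 0) { y[$2] = $0 }          # first sighting: hold line back
            else if (x[$2] == 2) print y[$2] "\n" $0  # second: held line, then this one
            else print                                # third and later: print as-is
        }
        END {
            print ""
            for (n in x)                              # iteration order is unspecified
                if (x[n] > 1)
                    printf "Field %s repeated %d\n", n, x[n]
        }'
}

# The OP's sample data with "7,2,Wibble" appended.
printf '%s\n' 1,1,Fixed 2,2,Geared 3,1,Indexed 4,4,Hub \
    5,4,Freewheel 6,4,STI 7,2,Wibble | group_dups_streaming
```

The first seven output lines are 1,1,Fixed, 3,1,Indexed, 4,4,Hub, 5,4,Freewheel, 6,4,STI, 2,2,Geared and 7,2,Wibble, so "2,2,Geared" appears sixth instead of second: the script holds each value's first line back until the second occurrence is seen. The order of the "Field ... repeated" summary lines is unspecified, because it comes from a for (n in x) loop.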
A two-pass approach as recommended by a poster in comp.unix.shell
(Message-ID: <etmmno$44q$1@online.de> ) seems most suitable.
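A minimal sketch of the two-pass idea, which is not necessarily the code in the cited message: naming the same file twice lets awk count every second-field value on the first read, then print the matching lines in their original order on the second. It also prints the "value,count" summary in order of first appearance. The function name dup_field2_lines is hypothetical, and the sketch assumes a regular file rather than a pipe, since the input has to be read twice.

```shell
# Hypothetical helper: two-pass filter over the file named in "$1".
# The file is passed to awk twice, so it is read twice.
dup_field2_lines() {
    awk -F, '
        # Pass 1 (NR == FNR holds only while reading the first copy):
        # count each second-field value and record first-appearance order.
        NR == FNR {
            if (!($2 in count)) order[++k] = $2
            count[$2]++
            next
        }
        # Pass 2: print, in original order, every line whose value repeats.
        count[$2] > 1
        # Summary in the "value,count" form asked for in the question.
        END {
            for (i = 1; i <= k; i++)
                if (count[order[i]] > 1)
                    printf "%s,%d\n", order[i], count[order[i]]
        }
    ' "$1" "$1"
}
```

`dup_field2_lines File1.log` prints the five expected lines followed by 1,2 and 4,3; with "7,2,Wibble" appended, "2,2,Geared" stays in second place. The trade-off is reading the input twice in exchange for keeping only one counter per distinct value in memory, instead of buffering lines.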
[Note: Followup-To: header set to comp.os.linux.misc]
| |
| CBFalconer 2007-03-21, 5:41 am |
| Duncan Smith wrote:
>
.... snip ...
>
> Am I right in thinking that cross-posting to a small number of
> related groups is acceptable if kept to a minimum, replies are
> requested to be made to one group only, and the groups aren't
> chosen to be diverse so as to deliberately start a flame war?
On your initial post, set follow-up to the single appropriate
newsgroup, as I have done here. Note the follow-up line in the
headers.
--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>