Code Comments


Re: How to combine two awk commands
I wrote:

> I am piping the output from one awk command to another awk command, and
> I was wondering if it is possible to combine them.
>
> The first command is just to print the second field of a file:
>
> awk "BEGIN {FS = \" \"} {print $2}"
>
> and the second command is to remove duplicates from the (unsorted)
> result of the first command:
>
> awk "{if (data[$0]++ == 0) lines[++count] = $0} END {for (i = 1; i
> <=count; i++) print lines[i]}"
>
> Please could you tell me if it is possible to combine them.


Thanks hq00e and Ed for your replies.

I found Ed's answer:

awk "BEGIN{ FS=\" \" } { array[$2]++ } END{ for ( i in  array ) print i
}"

to run slightly faster than hq00e's:

awk "BEGIN{ FS=\" \" } { array[$2]=$2 } END{ for ( i in  array ) print
array[i] }"

and both of them are running about 40% to 45% quicker than my
two-command approach.

Ed wrote:

> but you'd probably be changing the order of the output compared to the
> input by using the "in" operator this way (see
> http://www.gnu.org/software/gawk/ma...anning-an-Array)
> which may not be desirable.

I don't mind, since the input file was not in any particular order
anyway.
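
For readers who do need the input order kept, here is a minimal sketch of a single awk command that does what the original two-command pipeline did. It assumes a POSIX shell (hence the single quotes); the sample data and the names sample_input, seen and ordered are made up for illustration and do not come from the thread.

```shell
# Hypothetical sample input: two space-separated fields per line.
sample_input() {
    printf '%s\n' 'a pear' 'b apple' 'c pear' 'd fig' 'e apple'
}

# One awk command: record each distinct second field in arrival order,
# then print the recorded values in that same order in the END block.
ordered=$(sample_input | awk '
    !($2 in seen) { seen[$2] = 1; lines[++count] = $2 }
    END { for (i = 1; i <= count; i++) print lines[i] }
')

printf '%s\n' "$ordered"
```

The numerically indexed lines array is what preserves the order; a plain "for (i in array)" loop gives no such guarantee.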

Your help is much appreciated.

Regards,
Jonny

Jonny
04-07-05 08:56 PM


Re: How to combine two awk commands

Jonny wrote:
<snip>
> I found Ed's answer:
>
> awk "BEGIN{ FS=\" \" } { array[$2]++ } END{ for ( i in  array ) print i
> }"
>
> to run slightly faster than hq00e's:
>
> awk "BEGIN{ FS=\" \" } { array[$2]=$2 } END{ for ( i in  array ) print
> array[i] }"
>
> and both of them are running about 40% to 45% quicker than my
> two-command approach.

Did you try my original proposal (from 4/6):

awk "BEGIN{FS = \" \"}!date[$2]++"

I'd expect it to be the fastest. The one above was just pointing out
potential improvements to hq00e's proposal, but I wouldn't really do it
that way.

Ed.

Ed Morton
04-08-05 01:55 AM


Re: How to combine two awk commands
Ed Morton wrote:

> Did you try my original proposal (from 4/6):
>
> awk "BEGIN{FS = \" \"}!date[$2]++"
>
> I'd expect it to be the fastest. The one above was just pointing out
> potential improvements to hq00e's proposal, but I wouldn't really do it
> that way.

Hmm.  I don't have that posting in my list, and the postings in my
newsreader are sorted by date.  Strange.

Anyway, I tried the above command.  Perhaps I'm missing something, but
it prints the first field and the second field, but I just wanted the
second field to be printed.

I don't know why, but it was actually slightly slower than your:

awk "BEGIN{ FS=\" \" } { array[$2]++ } END{ for ( i in  array ) print
i}"

Regards,
Jonny

Jonny
04-08-05 01:55 PM


Re: How to combine two awk commands

Jonny wrote:

> Ed Morton wrote:
>
> 
>
>
> Hmm.  I don't have that posting in my list, and the postings in my
> newsreader are sorted by date.  Strange.

Happens to me using Netscape too. Google Groups has a lot of problems,
but at least it seems to catch all the postings! Having said that, I
still use Netscape and don't worry about the few I miss.

> Anyway, I tried the above command.  Perhaps I'm missing something, but
> it prints the first field and the second field, but I just wanted the
> second field to be printed.

Yeah, you're right. It should've been:

awk "BEGIN{FS = \" \"}!date[$2]++{print $2}"

I just noticed that you're setting the FS to its default value, a
single space, so you don't actually need that BEGIN section:

awk "!date[$2]++{print $2}"
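
As a hedged illustration of why the shortened form behaves the same: date[$2]++ yields the count seen so far (0 on the first occurrence) and then increments it, so the negated expression is true only the first time a value appears. The sketch below assumes a POSIX shell, so it uses single quotes; the sample data and variable names are invented for the example.

```shell
# Hypothetical input; some lines use runs of spaces between the fields to
# show that the default FS already handles them.
input=$(printf '%s\n' 'x  red' 'y blue' 'z   red' 'w green' 'v blue')

# Explicit FS = " " (which is the default value anyway).
with_begin=$(printf '%s\n' "$input" | awk 'BEGIN{FS = " "}!date[$2]++{print $2}')

# No BEGIN section at all.
without_begin=$(printf '%s\n' "$input" | awk '!date[$2]++{print $2}')

printf '%s\n' "$without_begin"
```

Both variables end up holding the same three lines, in first-seen order.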

> I don't know why, but it was actually slightly slower than your:
>
> awk "BEGIN{ FS=\" \" } { array[$2]++ } END{ for ( i in  array ) print
> i}"
>

That's very surprising since it's avoiding the loop and array indexing.
When I ran both commands on a file that was 100000 lines long, I got
these results:

PS1> time gawk '!array[$2]++{print $2}'

real    0m1.67s
user    0m1.25s
sys     0m0.14s

PS1> time gawk '{ array[$2]++ } END{ for ( i in  array ) print i}'

real    0m1.28s
user    0m1.09s
sys     0m0.15s

Beats me....
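
For anyone who wants to repeat the comparison, a rough sketch follows. The test file here is generated on the spot and is purely an assumption (100000 lines, 1000 distinct values in the second field); the file used for the timings above is not described in the thread, so the numbers will not be comparable. Prefix either awk command with "time" to measure it.

```shell
# Build a hypothetical test file: 100000 lines, second field cycles
# through 1000 distinct values.
testfile=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100000; i++) print i, "key" (i % 1000) }' > "$testfile"

# Streaming version: prints each value the first time it appears.
stream_sorted=$(awk '!array[$2]++{print $2}' "$testfile" | sort)

# END-loop version: prints every key once the whole input has been read.
endloop_sorted=$(awk '{ array[$2]++ } END{ for ( i in  array ) print i}' "$testfile" | sort)

# Count the distinct values; after sorting, both outputs should be identical.
distinct=$(printf '%s\n' "$stream_sorted" | wc -l | tr -d ' ')
printf 'distinct values: %s\n' "$distinct"

rm -f "$testfile"
```

Sorting both outputs before comparing them matters because the END-loop version may emit the keys in any order.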

Ed.

Ed Morton
04-08-05 08:56 PM


Re: How to combine two awk commands
On Fri, 08 Apr 2005 08:50:47 -0500, Ed Morton wrote:

>
>
> Jonny wrote:
> 
... 
>
> Yeah, you're right. It should've been:
>
> awk "BEGIN{FS = \" \"}!date[$2]++{print $2}"
>
> I just noticed that you're setting the FS to its default value, a
> single space, so you don't actually need that BEGIN section:
>
> awk "!date[$2]++{print $2}"
> 
>
> That's very surprising since it's avoiding the loop and array indexing.
> When I ran both commands on a file that was 100000 lines long, I got
> these results:
>
> PS1> time gawk '!array[$2]++{print $2}'
>
> real    0m1.67s
> user    0m1.25s
> sys     0m0.14s
>
> PS1> time gawk '{ array[$2]++ } END{ for ( i in  array ) print i}'
>
> real    0m1.28s
> user    0m1.09s
> sys     0m0.15s
>
> Beats me....
>
> 	Ed.

Yes, I get the same puzzling order of speeds
on a big file.
$ time awk '!date[$2]++{print $2}' testfileHUGER
real    0m0.606s
user    0m0.433s
sys     0m0.085s
$ wc testfileHUGER
360000  1536000 15024000 testfileHUGER

Another thing I *don't* understand here is
the use of double quotes with the bang (!)
and/or the $.

In bash, csh and ksh here I can't get them to work; all I get are
these errors (as I'd have sworn):

$ awk "!date[$2]++{print $2}" testfile
bash: !date[$2]++{print: event not found

$ csh
% awk "!date[$2]++{print $2}" testfile
date[: Event not found.
%exit

$ ksh
u@h:w$  awk "!date[$2]++{print $2}" testfile
awk: !date[]++{print }
awk:       ^ syntax error
awk: Fatal: sous-expression invalide


Would you explain slowly why it works in your environments ?~D)
(whether I put LC_ALL to C or fr_FR it's still the same
on mine)

Loki Harfagr
04-08-05 08:56 PM


Re: How to combine two awk commands


Loki Harfagr wrote:
<snip>
> Another thing I *don't* understand here is
> the usability of doublequotes and the bang!
> and/or the $

"!" is the "not" operator. "$" dereferences an argument. Use of double
quotes instead of single is, I believe, a Windows thing. I never use it
and wouldn't expect it to work on any UNIX environment.
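
A small sketch of what a POSIX-style shell does to the double-quoted program before awk ever sees it, which matches the ksh error quoted above. It assumes the shell's own positional parameters are empty; the variable names are invented for illustration. The "!" is left out of the first two strings so that interactive history expansion (the bash and csh "event not found" errors) does not get in the way.

```shell
# Clear the shell's own positional parameters so that $2 is empty here.
set --

# Double quotes: the shell substitutes its own (empty) $2 first.
seen_by_awk_double=$(printf '%s' "date[$2]++{print $2}")

# Single quotes: the text would reach awk untouched.
seen_by_awk_single=$(printf '%s' 'date[$2]++{print $2}')

printf 'double: %s\nsingle: %s\n' "$seen_by_awk_double" "$seen_by_awk_single"

# With single quotes the one-liner from the thread runs as intended.
result=$(printf '%s\n' 'a x' 'b y' 'c x' | awk '!date[$2]++{print $2}')
printf '%s\n' "$result"
```

On a UNIX shell, then, single quotes are the safe choice for awk programs that use $ or !.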

Ed.

Ed Morton
04-08-05 08:56 PM


Re: How to combine two awk commands
On Fri, 08 Apr 2005 12:59:47 -0500, Ed Morton wrote:

>
>
> Loki Harfagr wrote:
> <snip> 
>
> "!" is the "not" operator. "$" dereferences an argument.

Well, you know I know this :-)

> Use of double
> quotes instead of single is, I believe, a Windows thing.

All right! I guess you've found the very trick!
I feel really sorry for having posted such a dull question
without even thinking about it :D)

> I never use it

Well, I bet I never even tried to, before these posts ...

> and wouldn't expect it to work on any UNIX environment.

I confirm it *does* **not** work

> 	Ed.

Thanks for the relief, Ed. I even thought I was going blind;
it might be this case of conjunctivitis I've got these days.
Woah, Windows, it never occurred to me that it could be the point;
I should've pointed out that "Noworyta" is a win32 app ...
I guess I'm gonna have a better weekend now :D)
Cheers indeed.
Have a nice weekend too, Ed.

PS: as a complement to the previous posts about speed and
performance, I compiled the scripts to C with awka, and the same
relative differences appear, against all odds.
The next step (when/if I find some time not spent writing
stoopeedeeteez on Usenet) would be to write a sharpened C version of it ...
We'll see :-=)


Loki Harfagr
04-09-05 01:56 AM


