Using an array to update duplicate records
like.a.mango@gmail.com

2005-05-31, 3:57 am

Hello, I'm very new to awk and am unsure whether I have approached this
correctly. I have a logfile that may contain duplicate records, and I
need to keep the latest version of each. I decided to load the records
into an array using the key as the index, so that when a duplicate key
is found the new record simply replaces the old one.

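{
    key = $1               # first field is the record key
    accounts[key] = $0     # a later duplicate overwrites the earlier record
}
END {
    # one record per key is left; the loop order is unspecified
    for (x in accounts) {
        print accounts[x]
    }
}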

Are there any limitations or problems with this? Thanks in advance,

Michael

Axel Sander

2005-05-31, 8:55 am

On 30 May 2005 22:09:22 -0700, like.a.mango@gmail.com wrote:

Looks OK to me.

>END {
> for (x in accounts) {
> print accounts[x]
> }


Hint: a "for (x in accounts)" loop doesn't visit the array elements in
the order they were read! If you want to keep the input order, you can
use a second array that maps a sequential count to each key, and use it
to access the accounts array. If your key is sortable (e.g. something
like a timestamp YYYYMMDDhhmmss), you can pipe the output of awk to
"sort".

>}
>


HTH
Axel
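
A minimal sketch of the second-array idea described above, wrapped in a shell function so it can be tried from the command line. The function name dedupe_keep_order and the names "order" and "n" are made up for illustration and are not from the thread; it assumes the key is the first whitespace-separated field, as in the original script.

```shell
# Keep the latest record per key, printed in the order keys first appeared.
# "order" and "n" are illustrative names, not from the original post.
dedupe_keep_order() {
    awk '
    {
        key = $1
        if (!(key in accounts))    # first time this key is seen
            order[++n] = key       # remember its position in the input
        accounts[key] = $0         # later duplicates overwrite earlier ones
    }
    END {
        for (i = 1; i <= n; i++)   # walk the keys in first-seen order
            print accounts[order[i]]
    }' "$@"
}

# Example: key "a" appears twice, so its second record wins.
printf 'a 1\nb 2\na 3\n' | dedupe_keep_order
```

With that input the output should be "a 3" followed by "b 2": each key keeps its original position but carries its latest record. The extra array costs one more entry per distinct key.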


like.a.mango@gmail.com

2005-05-31, 8:55 am

Thanks Axel

Ed Morton

2005-05-31, 3:55 pm



Axel Sander wrote:

> On 30 May 2005 22:09:22 -0700, like.a.mango@gmail.com wrote:
>
> Looks OK to me.
>
>

For a large file it'll use a lot of memory, since one record per
distinct key is held in the array until END. In older awks there's a
limit of (IIRC) 4096 entries in an array.
>
>
> Hint: a "for (x in accounts)" loop doesn't visit the array elements in
> the order they were read! If you want to keep the input order, you can
> use a second array that maps a sequential count to each key, and use it
> to access the accounts array. If your key is sortable (e.g. something
> like a timestamp YYYYMMDDhhmmss), you can pipe the output of awk to
> "sort".
>


If he had access to "sort" he wouldn't need to do this in awk. If you're
using "gawk", you can sort the result with "asort()" (by value) or
"asorti()" (by index), but just keeping an index seems like the best
approach.

Ed.
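
For gawk users, the asorti() route mentioned above might look like the sketch below. asorti() is a gawk extension, so this will not run on other awks. The function name dedupe_sorted and the array name "keys" are made up for illustration, and the ordering is a plain string comparison on the key.

```shell
# Keep the latest record per key, printed in sorted key order (gawk only).
# "keys" is an illustrative name, not from the original post.
dedupe_sorted() {
    gawk '
    { accounts[$1] = $0 }              # latest record per key wins
    END {
        n = asorti(accounts, keys)     # keys[1..n] = sorted indices of accounts
        for (i = 1; i <= n; i++)
            print accounts[keys[i]]
    }' "$@"
}

# Usage (requires gawk):
#   printf 'b 2\na 1\nb 3\n' | dedupe_sorted
```

This sorts by key rather than restoring input order, so it only helps when the key itself carries the ordering you want, such as a YYYYMMDDhhmmss timestamp.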