| Author |
Using an array to update duplicate records
|
|
| like.a.mango@gmail.com 2005-05-31, 3:57 am |
| Hello, I'm very new to awk, and am unsure if I have approached this
correctly. I have a logfile that may have duplicate records. I need to
take the latest version. I decided to load the records into an array
using the key as the index. This way when a duplicate key is found it
just sets the new record in its place.
{
key = $1
accounts[key] = $0
}
END {
for (x in accounts) {
print accounts[x]
}
}
Are there any limitations or problems with this? Thanks in advance,
Michael
| |
| Axel Sander 2005-05-31, 8:55 am |
| On 30 May 2005 22:09:22 -0700, like.a.mango@gmail.com wrote:
Looks OK to me.
>END {
> for (x in accounts) {
> print accounts[x]
> }
Hint: the "in" operator doesn't print the array elements in the
sequence of input! If you want to keep this order you can use a second
array with the keys and a sequential count. Use this to access the
accounts array. If your key is sortable (i.e. something like a
timestamp YYYYMMDDhhmmss) you can pipe the result of awk to "sort".
>}
>
HTH
Axel
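For reference, the second-array idea could look roughly like the sketch below. It is only an illustration, assuming the key is the first field as in the original script; the names "order", "n" and the wrapper function "dedupe_keep_order" are made up for this example and are not from the original post.

```shell
# dedupe_keep_order: keep the latest record per key ($1) and print the
# records in the order in which each key FIRST appeared in the input.
# "order" and "n" are illustrative names, not part of the original script.
dedupe_keep_order() {
    awk '
    {
        key = $1
        if (!(key in accounts))   # first time this key is seen
            order[++n] = key      # remember its arrival position
        accounts[key] = $0        # a later duplicate overwrites the earlier record
    }
    END {
        # Walk the index array sequentially instead of using "for (x in accounts)".
        for (i = 1; i <= n; i++)
            print accounts[order[i]]
    }' "$@"
}

# Example: key "a" appears twice, so its later record wins.
printf 'a 1\nb 2\na 3\n' | dedupe_keep_order
```

Note that a key keeps the position of its first occurrence. If the position of the last occurrence is wanted instead, the index array would have to be handled differently.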
| |
| like.a.mango@gmail.com 2005-05-31, 8:55 am |
| Thanks Axel
| |
| Ed Morton 2005-05-31, 3:55 pm |
|
Axel Sander wrote:
> On 30 May 2005 22:09:22 -0700, like.a.mango@gmail.com wrote:
>
> Looks OK to me.
>
>
For a large file it'll use a lot of memory. In older awks there's a
limit of (IIRC) 4096 entries in an array.
>
>
> Hint: the "in" operator doesn't print the array elements in the
> sequence of input! If you want to keep this order you can use a second
> array with the keys and a sequential count. Use this to access the
> accounts array. If your key is sortable (i.e. something like a
> timestamp YYYYMMDDhhmmss) you can pipe the result of awk to "sort".
>
If he had access to "sort" he wouldn't need to do this in awk. If you're
using "gawk", you can sort the result with "asort()" or "asorti()" but
just keeping an index seems like the best approach.
Ed.
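A minimal sketch of the gawk-only variant, in case sorted output by key is acceptable. It assumes gawk is installed (asorti() is a gawk extension and is not available in POSIX awk), and the names "keys" and "dedupe_sorted_by_key" are invented for this example.

```shell
# dedupe_sorted_by_key: keep the latest record per key ($1) and print the
# records sorted by key. Requires gawk; "keys" is an illustrative name.
dedupe_sorted_by_key() {
    gawk '
    { accounts[$1] = $0 }            # a later duplicate overwrites the earlier record
    END {
        n = asorti(accounts, keys)   # keys[1..n] holds the sorted indices of accounts
        for (i = 1; i <= n; i++)
            print accounts[keys[i]]
    }' "$@"
}

# Example with timestamp-like keys; the duplicate key keeps its last record.
printf '20050531 c\n20050529 a\n20050531 d\n' | dedupe_sorted_by_key
```

By default asorti() compares the indices as strings, which works for fixed-width keys such as YYYYMMDDhhmmss timestamps but not for numbers of varying length.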
|
|
|
|