Code Comments
Hi,

Thanks for your time.

I have a datafile which looks like this:

1234|xxxx|yyyy|12|1|1|NEW
1234|xxxx|yyyy|12|1|0|NEW

and I have to create a file in which the lines that match on $1, $2, $3, $4 & $7 are merged into a single line like this:

1234|xxxx|yyyy|12|2|1|NEW

$5 & $6 are aggregated.

The datafile is about 500 MB, so the regular IF statements take quite a long time.

The file will have the records sorted by the first four fields.

Is there any way I can do this file processing?

Regards,
Thiru.
In comp.lang.awk thruv <thruv@yahoo.com>:
> Hi,
> Thanks for ur time.....
> I have a datafile which is like this
> 1234|xxxx|yyyy|12|1|1|NEW
> 1234|xxxx|yyyy|12|1|0|NEW
> and I have to create a file with the line that matches $1, $2, $3, $4 & $7
> merged to create a line like this
> 1234|xxxx|yyyy|12|2|1|NEW
> $5 & $6 are aggregated.
Perhaps something along the lines of:
awk 'BEGIN{FS=OFS="|"}{print $1,$2,$3,$4,$5+$6,$6,$7}' infile > outfile
I'm unsure about your desired output.
--
Michael Heiming (GPG-Key ID: 0xEDD27B94)
mail: echo zvpunry@urvzvat.qr | perl -pe 'y/a-z/n-za-m/'
thruv wrote:
> Hi,
>
> Thanks for ur time.....
>
> I have a datafile which is like this
>
> 1234|xxxx|yyyy|12|1|1|NEW
> 1234|xxxx|yyyy|12|1|0|NEW
>
> and I have to create a file with the line that matches $1, $2, $3, $4 & $7
> merged to create a line like this
>
> 1234|xxxx|yyyy|12|2|1|NEW
>
> $5 & $6 are aggregated.
>
> The datafile size is abt 500 MB, so the regular IF stmt's takes quite a
> long time.
>
> The file will have the records sorted by the first four fields.
>
> Is there anyway I can do this file processing?
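Something like this (untested):
awk 'BEGIN { FS = OFS = "|" }
{ cur = $1 FS $2 FS $3 FS $4 FS $7
  if (NR > 1 && cur != prev) {    # key changed: print the finished group
    print head, s5, s6, tail; s5 = s6 = 0
  }
  head = $1 OFS $2 OFS $3 OFS $4; tail = $7
  s5 += $5; s6 += $6; prev = cur }
END { if (NR) print head, s5, s6, tail }' file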
Regards,
Ed.
Thanks Michael,

The output is calculated by adding $5 from the line and $5 from the next line if the first four fields match. It is something like this:

INPUT
1234|xxxx|yyyy|12|1|2|NEW
1234|xxxx|yyyy|12|3|0|NEW

Add $5 from line 1 (1) and $5 from line 2 (3).

The OUTPUT is
1234|xxxx|yyyy|12|4|2|NEW

I appreciate your ideas.

Regards,
Thiru.
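The rule described above can be sketched as a single pass that keeps running totals for $5 and $6 and prints one line each time the key changes, so memory use stays constant regardless of file size. This is only a sketch under stated assumptions: `merge_sorted` is a hypothetical helper name, not something from the thread, and it assumes that records sharing $1-$4 and $7 are adjacent in the input.

```shell
# Sketch only: merge_sorted is a hypothetical helper name, not from the thread.
# Assumes records with the same $1-$4 and $7 are adjacent in the input.
merge_sorted() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    key = $1 FS $2 FS $3 FS $4 FS $7
    if (NR > 1 && key != prev) {        # key changed: emit the finished group
      print head, s5, s6, tail
      s5 = s6 = 0
    }
    head = $1 OFS $2 OFS $3 OFS $4; tail = $7
    s5 += $5; s6 += $6                  # running totals for the current group
    prev = key
  }
  END { if (NR) print head, s5, s6, tail }' "$@"
}

# Sample records taken from the post above.
printf '%s\n' \
  '1234|xxxx|yyyy|12|1|2|NEW' \
  '1234|xxxx|yyyy|12|3|0|NEW' | merge_sorted
# prints 1234|xxxx|yyyy|12|4|2|NEW
```

For a real file the call would look like `merge_sorted infile > outfile`. If lines with different $7 values can interleave within one $1-$4 group, the adjacency assumption fails and per-group totals keyed on $7 would be needed instead.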
"Michael Heiming" <michael+USENET@www.heiming.de> wrote in message news:45bd32-409.ln1@news.heiming.de...
> In comp.lang.awk thruv <thruv@yahoo.com>:
>
> Perhaps something in the lines of:
>
> awk 'BEGIN{FS=OFS="|"}{print $1,$2,$3,$4,$5+$6,$6,$7}' infile > outfile
>
> Unsure, about your desired output?
>
> --
> Michael Heiming (GPG-Key ID: 0xEDD27B94)
> mail: echo zvpunry@urvzvat.qr | perl -pe 'y/a-z/n-za-m/'

I think he wants to combine $5 and $6 in records that match on the other fields, something like (untested):

BEGIN{FS=OFS="|"}
{ s=$1"|"$2"|"$3"|"$4"|"$7; a[s]=a[s]+$5; b[s]=b[s]+$6 }
END{ for (s in a) { split(s,c,"|"); print c[1]"|"c[2]"|"c[3]"|"c[4]"|"a[s]"|"b[s]"|"c[5] } }

This is only the general idea and may be inefficient for a large file, because it holds every key in memory until the end. Since your file is sorted on the first four fields, the logic can be broken down to work only on groups of records that are the same in the first four fields:

BEGIN { FS = OFS = "|" }
{ k = $1 FS $2 FS $3 FS $4 }
NR > 1 && k != t { flush() }              # new group: print the previous one
{ t = k; a[$7] += $5; b[$7] += $6 }       # totals per $7 within the group
END { if (NR) flush() }
function flush(   m) {
  for (m in a) print t, a[m], b[m], m
  split("", a); split("", b)              # portable way to empty both arrays
}

--
pop is Mark
Old age ain't no place for sissies.
Is there any way I can use AWK?

Regards,
Thiru.
Hi,

Thanks a lot to all of you. I finally got the script working fine.

Regards,
Thiru.