Home > Archive > AWK > October 2004 > Merging two lines from a file
Merging two lines from a file
Hi,
Thanks for your time.
I have a data file like this:
1234|xxxx|yyyy|12|1|1|NEW
1234|xxxx|yyyy|12|1|0|NEW
and I have to create a file in which lines that match on $1, $2, $3, $4 and $7
are merged into a single line like this:
1234|xxxx|yyyy|12|2|1|NEW
$5 and $6 are summed.
The data file is about 500 MB, so the usual if statements take quite a
long time.
The records in the file are sorted by the first four fields.
Is there any way I can do this file processing?
Regards,
Thiru.
Michael Heiming 2004-10-07, 3:55 am
In comp.lang.awk thruv <thruv@yahoo.com>:
> Hi,
> Thanks for your time.
> I have a datafile which is like this
> 1234|xxxx|yyyy|12|1|1|NEW
> 1234|xxxx|yyyy|12|1|0|NEW
> and I have to create a file with the line that matches $1, $2, $3, $4 & $7
> merged to create a line like this
> 1234|xxxx|yyyy|12|2|1|NEW
> $5 & $6 are aggregated.
Perhaps something along the lines of:
awk 'BEGIN{FS=OFS="|"}{print $1,$2,$3,$4,$5+$6,$6,$7}' infile > outfile
Unsure about your desired output?
--
Michael Heiming (GPG-Key ID: 0xEDD27B94)
mail: echo zvpunry@urvzvat.qr | perl -pe 'y/a-z/n-za-m/'
Ed Morton 2004-10-07, 3:55 am
thruv wrote:
> Hi,
> Thanks for your time.
> I have a data file like this:
> 1234|xxxx|yyyy|12|1|1|NEW
> 1234|xxxx|yyyy|12|1|0|NEW
> and I have to create a file in which lines that match on $1, $2, $3, $4 and $7
> are merged into a single line like this:
> 1234|xxxx|yyyy|12|2|1|NEW
> $5 and $6 are summed.
> The data file is about 500 MB, so the usual if statements take quite a
> long time.
> The records in the file are sorted by the first four fields.
> Is there any way I can do this file processing?
Something like this (untested):
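awk 'BEGIN { FS = OFS = "|" }                 # fields are pipe-separated
{ cur = $1 FS $2 FS $3 FS $4 FS $7 }         # key: the fields that must match
cur == prev { s5 += $5; s6 += $6; next }     # same key as previous line: accumulate
NR > 1 { print head, s5, s6, tail }          # key changed: print the finished group
{ prev = cur; head = $1 OFS $2 OFS $3 OFS $4; tail = $7; s5 = $5; s6 = $6 }
END { if (NR > 0) print head, s5, s6, tail } # print the last group
' file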
Regards,
Ed.
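Ed's script compares each line's $1, $2, $3, $4 and $7 with the previous line's, so it only merges records whose keys are adjacent. The file is described as sorted on the first four fields only, so lines with the same first four fields but different $7 values could interleave. A minimal sketch, assuming a POSIX sort(1) is available (presort is a hypothetical helper name, not part of any post), that groups such lines before the awk pass:

```shell
#!/bin/sh
# Hypothetical helper: make lines with equal $1-$4 and equal $7 adjacent.
# LC_ALL=C gives plain byte-order comparison, independent of the user's locale.
# -t '|' sets the field separator; -k1,4 compares fields 1..4 as one key,
# then -k7,7 breaks ties on field 7.
presort() {
  LC_ALL=C sort -t '|' -k1,4 -k7,7 "$@"
}

# Demo: the NEW and OLD lines for the same first four fields are interleaved.
printf '%s\n' \
  '1234|xxxx|yyyy|12|1|2|NEW' \
  '1234|xxxx|yyyy|12|5|5|OLD' \
  '1234|xxxx|yyyy|12|3|0|NEW' | presort
```

The sorted output can then be piped into a run-merging awk script such as Ed's. For a 500 MB input, GNU sort spills to temporary files instead of holding the whole file in memory.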
Thanks Michael,
The output is calculated by adding $5 from one line to $5 from the next line
if the first four fields match. It is something like this:
INPUT
1234|xxxx|yyyy|12|1|2|NEW
1234|xxxx|yyyy|12|3|0|NEW
Add $5 from line 1 (1) and $5 from line 2 (3); $6 is added the same way (2 + 0).
The OUTPUT is
1234|xxxx|yyyy|12|4|2|NEW
I'd appreciate your ideas.
Regards,
Thiru.
"Michael Heiming" <michael+USENET@www.heiming.de> wrote in message
news:45bd32-409.ln1@news.heiming.de...
> In comp.lang.awk thruv <thruv@yahoo.com>:
> Perhaps something in the lines of:
> awk 'BEGIN{FS=OFS="|"}{print $1,$2,$3,$4,$5+$6,$6,$7}' infile > outfile
> Unsure, about your desired output?
I think he wants to sum $5 and $6 across records that match on the other fields...
Something like this (untested):
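BEGIN{FS=OFS="|"}
# key: every field except the two that are summed
{ s=$1"|"$2"|"$3"|"$4"|"$7; a[s]+=$5; b[s]+=$6 }
# one output line per key; "for (s in a)" visits keys in an unspecified order
END{ for (s in a) { split(s,c,"|"); print c[1],c[2],c[3],c[4],a[s],b[s],c[5] } }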
This is only the general idea, and it may be inefficient for a large file because
it keeps every distinct key in memory until the end. Since your file is sorted on
the first four fields, the logic can be broken down to work on one group of
records at a time (records that are the same in the first four fields)...
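BEGIN{FS=OFS="|"}
{ k=$1"|"$2"|"$3"|"$4 }              # group key: the first four fields
NR>1 && k!=t { flush() }             # new group: print the previous one
{ t=k; a[$7]+=$5; b[$7]+=$6 }        # sum $5 and $6 per $7 value in this group
END{ if (NR>0) flush() }             # print the last group
function flush(  m) { for (m in a) print t,a[m],b[m],m; split("",a); split("",b) }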
--
pop is Mark
Old age ain't no place for sissies.
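Mark's first version needs no sorting, but `for (s in a)` visits keys in an unspecified order. If the output should follow the input order, one option is to remember the order in which each key first appears. A sketch under that assumption (sum_by_key is a hypothetical helper name; the key is $1-$4 plus $7, as in the question):

```shell
#!/bin/sh
# Hypothetical helper: sum $5 and $6 per key ($1-$4 and $7) and print the keys
# in first-seen order. Input need not be sorted, but memory grows with the
# number of distinct keys.
sum_by_key() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    key = $1 OFS $2 OFS $3 OFS $4
    k = key SUBSEP $7                        # composite array index
    if (!(k in s5)) { order[++n] = k; pre[k] = key; f7[k] = $7 }
    s5[k] += $5; s6[k] += $6
  }
  END {
    for (i = 1; i <= n; i++) {               # numeric loop keeps first-seen order
      k = order[i]
      print pre[k], s5[k], s6[k], f7[k]
    }
  }' "$@"
}

# Demo: the two 1234 lines are not adjacent on input.
printf '%s\n' \
  '1234|xxxx|yyyy|12|1|2|NEW' \
  '5678|aaaa|bbbb|34|7|3|NEW' \
  '1234|xxxx|yyyy|12|3|0|NEW' | sum_by_key
```

Because every key stays in memory until END, the streaming group-at-a-time approaches in this thread remain the better fit for a 500 MB file that is already sorted on the key.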
Is there any way I can use AWK?
Regards,
Thiru.
Hi,
Thanks a lot to all of you.
I finally got the script working.
Regards,
Thiru.