Merging two lines from a file
thruv

2004-10-05, 8:55 pm

Hi,

Thanks for your time.

I have a data file that looks like this:

1234|xxxx|yyyy|12|1|1|NEW
1234|xxxx|yyyy|12|1|0|NEW

I have to create a file in which lines that match on $1, $2, $3, $4 and $7
are merged into a single line like this:

1234|xxxx|yyyy|12|2|1|NEW

$5 and $6 are aggregated.

The data file is about 500 MB, so the regular IF statements take quite a
long time.

The file will have the records sorted by the first four fields.

Is there any way I can do this file processing?

Regards,
Thiru.



Michael Heiming

2004-10-07, 3:55 am

In comp.lang.awk thruv <thruv@yahoo.com>:
> Hi,
>
> Thanks for your time.
>
> I have a data file that looks like this:
>
> 1234|xxxx|yyyy|12|1|1|NEW
> 1234|xxxx|yyyy|12|1|0|NEW
>
> I have to create a file in which lines that match on $1, $2, $3, $4 and $7
> are merged into a single line like this:
>
> 1234|xxxx|yyyy|12|2|1|NEW
>
> $5 and $6 are aggregated.

Perhaps something along the lines of:

awk 'BEGIN{FS=OFS="|"}{print $1,$2,$3,$4,$5+$6,$6,$7}' infile > outfile

I'm unsure about your desired output.

--
Michael Heiming (GPG-Key ID: 0xEDD27B94)
mail: echo zvpunry@urvzvat.qr | perl -pe 'y/a-z/n-za-m/'
Ed Morton

2004-10-07, 3:55 am




Something like this (untested):

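awk 'BEGIN { FS = OFS = "|" }   # the fields are pipe-separated
{
    cur = $1 FS $2 FS $3 FS $4 FS $7
    if (cur != prev) {          # key changed: print the finished group
        if (NR > 1) print head, s5, s6, tail
        head = $1 FS $2 FS $3 FS $4; tail = $7; s5 = s6 = 0
    }
    s5 += $5; s6 += $6          # add this line to the running totals
    prev = cur
}
END { if (NR > 0) print head, s5, s6, tail }' file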

Regards,

Ed.
thruv

2004-10-07, 3:55 am

Thanks Michael,

The output is calculated by adding $5 from a line to $5 from the next line
if the first four fields match.

It is something like this:
INPUT
1234|xxxx|yyyy|12|1|2|NEW
1234|xxxx|yyyy|12|3|0|NEW

Add $5 from line 1 (1) and $5 from line 2 (3).
The OUTPUT is
1234|xxxx|yyyy|12|4|2|NEW

I appreciate your ideas.

Regards,
Thiru.
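
The calculation described above can be sketched as a single pass over the file. This is only a minimal sketch, not necessarily the script the poster ended up with: it assumes that records sharing $1, $2, $3, $4 and $7 sit on adjacent lines, and the function name merge_lines is made up for illustration.

```shell
# Hypothetical helper: sum $5 and $6 over adjacent records that share $1-$4 and $7.
merge_lines() {
    awk 'BEGIN { FS = OFS = "|" }
    {
        key = $1 FS $2 FS $3 FS $4 FS $7
        if (key != prev) {                     # key changed: emit the finished group
            if (NR > 1) print head, s5, s6, tail
            head = $1 FS $2 FS $3 FS $4; tail = $7
            s5 = s6 = 0
        }
        s5 += $5; s6 += $6                     # running totals for the current group
        prev = key
    }
    END { if (NR > 0) print head, s5, s6, tail }' "$@"
}

# The two sample records from the post, fed on standard input.
printf '%s\n' '1234|xxxx|yyyy|12|1|2|NEW' '1234|xxxx|yyyy|12|3|0|NEW' | merge_lines
```

For the two sample records this prints 1234|xxxx|yyyy|12|4|2|NEW. Only the previous key and two totals are held in memory, so the size of the input file should not matter much.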

pop

2004-10-07, 3:55 am

"Michael Heiming" <michael+USENET@www.heiming.de> wrote in message
news:45bd32-409.ln1@news.heiming.de...
> Perhaps something along the lines of:
>
> awk 'BEGIN{FS=OFS="|"}{print $1,$2,$3,$4,$5+$6,$6,$7}' infile > outfile
>
> I'm unsure about your desired output.

I think he wants to sum $5 and $6 across records that match on the other
fields. Something like this (untested):

BEGIN{FS=OFS="|"}
# key on $1-$4 plus $7; keep one running total per key for $5 and for $6
{ s=$1"|"$2"|"$3"|"$4"|"$7; a[s]=a[s]+$5; b[s]=b[s]+$6 }
END{ for (s in a) { split(s,c,"|")
print c[1]"|"c[2]"|"c[3]"|"c[4]"|"a[s]"|"b[s]"|"c[5] } }

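This is only the general idea and may be inefficient for a large file, because
it keeps a total for every distinct key in memory until the END block, and the
for (s in a) loop returns the keys in no particular order. Since your file is
sorted on the first four fields, the logic can be broken down to work only on
groups of records that are the same in the first four fields: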

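BEGIN{FS=OFS="|"}
function flush(   m) {   # print, then clear, the sums for the finished group
  for (m in a) print t, a[m], b[m], m
  split("", a); split("", b)
}
{ k=$1"|"$2"|"$3"|"$4 }
k!=t { flush(); t=k }    # first four fields changed: start a new group
{ a[$7]=a[$7]+$5; b[$7]=b[$7]+$6 }
END{ flush() }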
--
pop is Mark
Old age ain't no place for sissies.
--
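
The thread says the file is sorted by the first four fields, but the match also includes $7. If records with different $7 values can interleave inside one group, a script that only compares each line with the previous one would split them into several output lines. One possible way around that, assuming a POSIX sort is available and the extra sorting time is acceptable, is to sort on $1-$4 and $7 first and then aggregate in a single pass. The function name sort_and_merge and the OLD records are made up for illustration.

```shell
# Hypothetical helper: make equal keys adjacent with sort, then sum $5 and $6 per key.
sort_and_merge() {
    # LC_ALL=C gives plain byte-order comparison, so the result is predictable.
    LC_ALL=C sort -t '|' -k1,4 -k7,7 "$@" |
    awk 'BEGIN { FS = OFS = "|" }
    {
        key = $1 FS $2 FS $3 FS $4 FS $7
        if (key != prev) {                     # key changed: emit the finished group
            if (NR > 1) print head, s5, s6, tail
            head = $1 FS $2 FS $3 FS $4; tail = $7
            s5 = s6 = 0
        }
        s5 += $5; s6 += $6                     # running totals for the current group
        prev = key
    }
    END { if (NR > 0) print head, s5, s6, tail }'
}

# An OLD record sits between two NEW records of the same group.
printf '%s\n' \
    '1234|xxxx|yyyy|12|1|2|NEW' \
    '1234|xxxx|yyyy|12|5|5|OLD' \
    '1234|xxxx|yyyy|12|3|0|NEW' | sort_and_merge
```

With this input the two NEW records end up adjacent after sorting, so the output is one NEW line with $5 = 4 and $6 = 2, followed by the OLD line unchanged. If $7 never varies inside a group, the sort step is unnecessary.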


thruv

2004-10-09, 3:55 am

Is there any way I can use AWK?

Regards,
Thiru.

thruv

2004-10-13, 3:55 pm

Hi,

Thanks a lot to all of you.

Finally I got the script working fine.

Regards,
Thiru.
