
Author better solutions?
moggces

2004-11-16, 6:50 pm

Dear all
I want to merge two files, keyed on the third column of file2.

file1: ( 27600 lines)
tm0 O9:AN3:3.15+O9:A134:OG1+A105:ND2:O7
tm1 O9:AN3:3.14+O9:A134:OG1+A134:OG1:N3+O9:A132:O+A101:N:O7
tm2 O9:AN3:3.15+O9:A134:OG1
tm3 O9:A131:OG
tm4 O9:A131:OG+A131:N:O9+O9:A127:O

file2: (35 lines)
XBX_12291 32.10 21442
XBX_16460 56.51 22536
XBX_16460 56.0 22537
XBX_23526 53.25 23516
XBX_23526 54.49 23510

final:
XBX_23526 53.25 23516
XBX_12291 32.10 21442 O9:A131:OG
XBX_16460 56.51 22536 A131:N:O9
XBX_23526 54.49 23510 O9:A134:OG1+A105:ND2:O7

I have written a script, but it runs very slowly.

awk 'FILENAME=="file1" { name[++i]=substr($1,3); line[++x]=$2} {
num=$3; for ( r=1; r<=i; ++r ){ if ( num==name[r] ) print
$0,line[r]}}' file1 file2

Does anyone have a better solution?
Thank you

Jui-Hua
A Ferenstein

2004-11-16, 6:50 pm

I think your example isn't correct; however, here's my take on what you're
trying to do:

FILENAME=="file2" {
a[$3]=$0
next
}
{ if (substr($1,3) in a) print a[substr($1,3)] $2 }


A Ferenstein

2004-11-16, 6:50 pm

I missed a " " (space) between a[substr($1,3)] and $2

"A Ferenstein" <epaalx@hotmail.com> wrote in message
news:cmvf2i$hc4$1@newstree.wise.edt.ericsson.se...
> I think your example isn't correct, however, here's my take on what you're
> trying to do:
>
> FILENAME=="file2" {
> a[$3]=$0
> next
> }
> { if (substr($1,3) in a) print a[substr($1,3)] $2 }
>
>



Ed Morton

2004-11-16, 6:50 pm



moggces wrote:
> Dear all
> I intended to merge two files according to the third column in file2
>
> file1: ( 27600 lines)
> tm0 O9:AN3:3.15+O9:A134:OG1+A105:ND2:O7
> tm1 O9:AN3:3.14+O9:A134:OG1+A134:OG1:N3+O9:A132:O+A101:N:O7
> tm2 O9:AN3:3.15+O9:A134:OG1
> tm3 O9:A131:OG
> tm4 O9:A131:OG+A131:N:O9+O9:A127:O
>
> file2: (35 lines)
> XBX_12291 32.10 21442
> XBX_16460 56.51 22536
> XBX_16460 56.0 22537
> XBX_23526 53.25 23516
> XBX_23526 54.49 23510
>
> final:
> XBX_23526 53.25 23516
> XBX_12291 32.10 21442 O9:A131:OG
> XBX_16460 56.51 22536 A131:N:O9
> XBX_23526 54.49 23510 O9:A134:OG1+A105:ND2:O7


The above example makes no sense, but I assume the "tm1", etc. at the
start of file1 are supposed to match the numbers at the end of file2.

> I have written one but it run very very slowly.
>
> awk 'FILENAME=="file1" { name[++i]=substr($1,3); line[++x]=$2}


Quick fix: Put a "next" at the end of the above line. Right now you're
running the rest of the script on each of the 27600 lines in file1 when
you really only want to do it on the 35 lines in file2.

{
> num=$3; for ( r=1; r<=i; ++r ){ if ( num==name[r] ) print
> $0,line[r]}}' file1 file2
>
> Does anyone have better solution?
> Thank you
>
> Jui-Hua
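
As a rough sketch of the quick fix described above, here is the original script with "next" added to the file1 block. The sample rows are made up for illustration (the file1 excerpt in the thread only shows tm0 to tm4), and the sketch assumes, as the original script does, that the number after "tm" in file1 is what matches the third column of file2. A single counter indexes both arrays here, which behaves the same as the two counters in the original.

```shell
# Hypothetical sample data: the tm numbers are chosen to match file2's third column.
tmp=$(mktemp -d)
printf '%s\n' 'tm21442 O9:A131:OG' 'tm22536 A131:N:O9' 'tm99999 O9:A127:O' > "$tmp/file1"
printf '%s\n' 'XBX_12291 32.10 21442' 'XBX_16460 56.51 22536' > "$tmp/file2"

# Same lookup loop as the original script. The only functional change is
# "next", which stops file1 lines from falling through to the loop.
quickfix_out=$(cd "$tmp" && awk '
FILENAME == "file1" { name[++i] = substr($1, 3); line[i] = $2; next }
{
    num = $3
    for (r = 1; r <= i; ++r)
        if (num == name[r])
            print $0, line[r]
}' file1 file2)
printf '%s\n' "$quickfix_out"
rm -rf "$tmp"
```

Without "next", every file1 line also runs the loop over all names stored so far, which for 27600 lines adds up to hundreds of millions of comparisons before file2 is even read.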


Without a real example, it's hard to advise what else you could do to
improve your script, but it seems like, from a memory usage standpoint,
you'd be better off storing file2 in the array and then acting on file1 rather
than the other way around. You could also look at the UNIX "join"
command, but if you want to do it in awk, it should probably look more
like this:

awk 'NR==FNR{line[$3]=$0;next}
$1 in line {print line[$1], $2}' file2 file1

Again, the above is just a guess since the example doesn't make sense.

Regards,

Ed.
moggces

2004-11-16, 6:50 pm

juihuahsieh@nhri.org.tw (moggces) wrote in message news:<f4230c15.0411110112.159dc84a@posting.google.com>...
> Dear all
> I intended to merge two files according to the third column in file2
>
> file1: ( 27600 lines)
> tm0 O9:AN3:3.15+O9:A134:OG1+A105:ND2:O7
> tm1 O9:AN3:3.14+O9:A134:OG1+A134:OG1:N3+O9:A132:O+A101:N:O7
> tm2 O9:AN3:3.15+O9:A134:OG1
> tm3 O9:A131:OG
> tm4 O9:A131:OG+A131:N:O9+O9:A127:O
>
> file2: (35 lines)
> XBX_12291 32.10 21442
> XBX_16460 56.51 22536
> XBX_16460 56.0 22537
> XBX_23526 53.25 23516
> XBX_23526 54.49 23510
>
> final:
> XBX_23526 53.25 23516
> XBX_12291 32.10 21442 O9:A131:OG
> XBX_16460 56.51 22536 A131:N:O9
> XBX_23526 54.49 23510 O9:A134:OG1+A105:ND2:O7
>
> I have written one but it run very very slowly.
>
> awk 'FILENAME=="file1" { name[++i]=substr($1,3); line[++x]=$2} {
> num=$3; for ( r=1; r<=i; ++r ){ if ( num==name[r] ) print
> $0,line[r]}}' file1 file2
>
> Does anyone have better solution?
> Thank you
>
> Jui-Hua



Sorry for the wrong final file. I didn't check it carefully, and I didn't
make the key point clear.

final:
XBX_12291 32.10 21442 O9:A134:OG1+A134:OG1:N3+O9:A132:O
XBX_16460 56.51 22536 O9:A132:O
XBX_16460 56.0 22537 O9:A131:OG+A131:N:O9+O9:A127:O
XBX_23526 53.25 23516 O9:A131:OG
^^^^^
XBX_23526 54.49 23510 O9:A134:OG1
^^^^^
I tried a solution like A Ferenstein's before I posted. However, the
output order changes to:
XBX_12291 32.10 21442 O9:A134:OG1+A134:OG1:N3+O9:A132:O
XBX_16460 56.51 22536 O9:A132:O
XBX_16460 56.0 22537 O9:A131:OG+A131:N:O9+O9:A127:O
XBX_23526 54.49 23510 O9:A134:OG1
^^^^^
XBX_23526 53.25 23516 O9:A131:OG
^^^^^
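
The reordering shown above is what one would expect when file2 is stored in the array and file1 drives the output: lines then come out in file1's order. One possible way to keep file2's order while still avoiding the inner loop is to store file1 in an array keyed by the number after "tm" and do a single lookup per file2 line. This is only a sketch: the file1 rows are hypothetical, and the array name "tail" is made up for the example.

```shell
tmp=$(mktemp -d)
# Hypothetical file1 rows in ascending tm order; file2 lists the keys in the other order.
printf '%s\n' 'tm23510 O9:A134:OG1' 'tm23516 O9:A131:OG' > "$tmp/file1"
printf '%s\n' 'XBX_23526 53.25 23516' 'XBX_23526 54.49 23510' > "$tmp/file2"

ordered_out=$(awk '
# First file (file1): remember the second field under the number after "tm".
NR == FNR { tail[substr($1, 3)] = $2; next }
# Second file (file2): one array lookup per line, printed in file2 order.
$3 in tail { print $0, tail[$3] }
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$ordered_out"
rm -rf "$tmp"
```

The trade-off is memory: this holds all of file1 in the array rather than the 35 lines of file2, which should still be modest for 27600 short lines.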

I also looked up the UNIX "join" command. It seems it can only merge two
files with identical lines.


Thanks all.

Jui-Hua
Ed Morton

2004-11-16, 6:50 pm



moggces wrote:
> juihuahsieh@nhri.org.tw (moggces) wrote in message news:<f4230c15.0411110112.159dc84a@posting.google.com>...
>
<snip>
> And I look up "join" in UNIX command. It seems only could merge two
> files with identical lines.


No, it can merge based on a specific field in each file. It must be me -
I just can't see what field is common between file1 and file2 in your
examples. You say it's the third column of file 2, but where does the
number "21442", for example, appear in file1????

Ed.
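
To illustrate the point that "join" can merge on a specific field in each file, here is a minimal sketch. It assumes the key is the number after "tm" in file1 (as the original awk script implies), so the prefix is stripped first; the sample rows and the ".sorted" file names are made up for the example.

```shell
export LC_ALL=C   # keep sort and join on the same collation order
tmp=$(mktemp -d)
printf '%s\n' 'tm23510 O9:A134:OG1' 'tm23516 O9:A131:OG' > "$tmp/file1"
printf '%s\n' 'XBX_23526 53.25 23516' 'XBX_23526 54.49 23510' > "$tmp/file2"

# join needs both inputs sorted on the join field, and the keys must match
# exactly, so the "tm" prefix is removed from file1 before sorting.
sed 's/^tm//' "$tmp/file1" | sort -k1,1 > "$tmp/file1.sorted"
sort -k3,3 "$tmp/file2" > "$tmp/file2.sorted"

# -1 3: the key is field 3 of the first input; -2 1: field 1 of the second.
# -o selects the output columns: file2's three fields, then file1's second field.
join_out=$(join -1 3 -2 1 -o 1.1,1.2,1.3,2.2 "$tmp/file2.sorted" "$tmp/file1.sorted")
printf '%s\n' "$join_out"
rm -rf "$tmp"
```

Because both inputs have to be sorted on the key, the result comes out in key order rather than in file2's original order, so this only fits if that ordering is acceptable.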