merge two files in awk
| amrita.ray@gmail.com 2006-10-30, 7:01 pm |
| Hi,
I have two files (file 1 has one column and file 2 has four columns). I
have to choose the rows of file 2 where columns 2 and 3 both match
entries in column 1 of file 1. Does anybody have any idea?
Thanks.
| |
| mainak.sen@gmail.com 2006-10-30, 7:01 pm |
| To add to this with an example,
file 1 :
1_8
1_9
1_10
1_11
file 2 :
1 1_500 1_600 0.000 1.0 0.0 0.0
1 1_500 1_500 0.000 0.0 0.0 1.0
1 1_9 1_100 0.000 0.50000 0.50000 0.00000
1 1_9 1_200 0.000 0.50000 0.50000 0.00000
1 1_9 1_400 0.000 1.0 0.0 0.0
.....
1 1_8 1_500 2.107 0.59766 0.40234 0.00000
1 1_8 1_9 2.107 0.89431 0.10569 0.00000
1 1_8 1_300 2.107 0.0 1.0 0.0
Merge the two files so that the output is
1 1_8 1_9 2.107 0.89431 0.10569 0.00000
i.e. the rows of file 2 where col. 2 and col. 3 both match
entries in file 1.
amrita.ray@gmail.com wrote:
> Hi,
> I have two files (file 1 has one column and file 2 has four columns). I
> have to choose the rows of file 2 where columns 2 and 3 both match
> entries in column 1 of file 1. Does anybody have any idea?
> Thanks.
| |
| Vassilis 2006-10-30, 7:01 pm |
| Please don't top-post. Corrected below.
mainak.sen@gmail.com wrote:
> Hi,
> I have two files (file 1 has one column and file 2 has four columns). I
> have to choose the rows of file 2 where columns 2 and 3 both match
> entries in column 1 of file 1. Does anybody have any idea?
> Thanks.
awk 'NR == FNR { col[$0]++ }
$2 in col && $3 in col' file1 file2
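A minimal sketch of running this command against the sample data posted earlier in the thread. The scratch directory and the names dir and result are illustrative assumptions, not part of the original post, and the rows elided as "....." are simply left out.

```shell
# Scratch directory for the sample data (path and variable names are illustrative).
dir=$(mktemp -d)

# file 1: one key per line.
printf '%s\n' 1_8 1_9 1_10 1_11 > "$dir/file1"

# file 2: the rows posted in the thread (the elided "....." rows are omitted).
printf '%s\n' \
  '1 1_500 1_600 0.000 1.0 0.0 0.0' \
  '1 1_500 1_500 0.000 0.0 0.0 1.0' \
  '1 1_9 1_100 0.000 0.50000 0.50000 0.00000' \
  '1 1_9 1_200 0.000 0.50000 0.50000 0.00000' \
  '1 1_9 1_400 0.000 1.0 0.0 0.0' \
  '1 1_8 1_500 2.107 0.59766 0.40234 0.00000' \
  '1 1_8 1_9 2.107 0.89431 0.10569 0.00000' \
  '1 1_8 1_300 2.107 0.0 1.0 0.0' > "$dir/file2"

# While reading the first file (NR == FNR), store each line as an array key.
# The second rule has no action, so awk prints every record for which both
# column 2 and column 3 are stored keys.
result=$(awk 'NR == FNR { col[$0]++ }
$2 in col && $3 in col' "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
rm -rf "$dir"
```

On this data only the "1 1_8 1_9 ..." row has both keys in file 1, so that is the row that gets printed. The second rule is also tested against the lines of file 1; it does not match there because those lines have no second or third field and the empty string is not a stored key.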
| |
| William James 2006-10-30, 7:01 pm |
| You already got your answer in comp.unix.shell.
| |
| amrita.ray@gmail.com 2006-10-30, 7:01 pm |
| Yes, the answer:
awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2
Thanks.
William James wrote:
> You already got your answer in comp.unix.shell.
| |
| Ed Morton 2006-10-30, 7:01 pm |
| amrita.ray@gmail.com wrote:
> Yes, the answer:
> awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2
That's the wrong answer. Check the others you got.
Ed.
| |
| Janis Papanagnou 2006-10-30, 7:01 pm |
| Ed Morton wrote:
> amrita.ray@gmail.com wrote:
>
>
> That's the wrong answer. Check the others you got.
What's wrong with it? In c.u.s the OP said it works.
Janis
>
> Ed.
>
| |
| Ed Morton 2006-10-30, 7:01 pm |
| Janis Papanagnou wrote:
> Ed Morton wrote:
>
>
>
> What's wrong with it? In c.u.s the OP said it works.
It has 2 tests instead of one so it's less efficient and more
complicated than it has to be. The right answer is:
awk 'NR==FNR {s[$1]; next} ($2 in s) && ($3 in s)' file1 file2
Ed.
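A minimal sketch comparing the two one-liners on the sample data from earlier in the thread, assuming the files are recreated in a scratch directory. The names dir, guarded and with_next are illustrative only.

```shell
# Recreate the thread's sample data (paths and variable names are illustrative).
dir=$(mktemp -d)
printf '%s\n' 1_8 1_9 1_10 1_11 > "$dir/file1"
printf '%s\n' \
  '1 1_500 1_600 0.000 1.0 0.0 0.0' \
  '1 1_500 1_500 0.000 0.0 0.0 1.0' \
  '1 1_9 1_100 0.000 0.50000 0.50000 0.00000' \
  '1 1_9 1_200 0.000 0.50000 0.50000 0.00000' \
  '1 1_9 1_400 0.000 1.0 0.0 0.0' \
  '1 1_8 1_500 2.107 0.59766 0.40234 0.00000' \
  '1 1_8 1_9 2.107 0.89431 0.10569 0.00000' \
  '1 1_8 1_300 2.107 0.0 1.0 0.0' > "$dir/file2"

# Variant with an explicit NR!=FNR guard on the second rule.
guarded=$(awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' \
  "$dir/file1" "$dir/file2")

# Variant that ends processing of each file 1 record with next,
# so the second rule is only ever tested against file 2 records.
with_next=$(awk 'NR==FNR {s[$1]; next} ($2 in s) && ($3 in s)' \
  "$dir/file1" "$dir/file2")

if [ "$guarded" = "$with_next" ]; then
  echo "same output"
else
  echo "outputs differ"
fi
rm -rf "$dir"
```

Both variants select the same row here; the difference under discussion is how many tests each record goes through, not the result.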
| |
| Janis Papanagnou 2006-10-30, 7:01 pm |
| Ed Morton wrote:
> Janis Papanagnou wrote:
>
> It has 2 tests instead of one so it's less efficient and more
> complicated than it has to be.
I wouldn't call that wrong, just different. Efficiency? Maybe; I
think any difference is of little relevance here (it may even depend
on how much the awk interpreter cares about optimization).
Never mind.
But personally I think that breaking awk's natural parse sequence
by using 'next' is more "complicated" than guarding the conditions
à la Dijkstra's if-guards.
But I wouldn't call either of the two proposed one-liners complicated,
anyway, just as I wouldn't call either of the two solutions "wrong".
Janis
> The right answer is:
>
> awk 'NR==FNR {s[$1]; next} ($2 in s) && ($3 in s)' file1 file2
>
> Ed.
| |
| Ed Morton 2006-10-30, 7:01 pm |
| Janis Papanagnou wrote:
> Ed Morton wrote:
>
>
>
> I wouldn't call that wrong, just different.
I would call it wrong because in addition to the above it's not
extensible. Let's say you want to do other things with the file2
records. Would you then do this:
awk '
NR==FNR {s[$1]}
NR!=FNR && ($2 in s) && ($3 in s) { print }
NR!=FNR && theSkyIsGrey { ... }
NR!=FNR && scotlandWinsWorldCup { ... }
NR!=FNR && endOfWorldArrives { ... }
' file1 file2
Rather than this:
awk '
NR==FNR {s[$1]; next}
($2 in s) && ($3 in s) { print }
theSkyIsGrey { ... }
scotlandWinsWorldCup { ... }
endOfWorldArrives { ... }
' file1 file2
Yes, the first version will work, but I'd be surprised if anyone
advocated doing it that way. Also, as a Scot, I suspect that second from
last condition will unfortunately never be true....
Regards,
Ed.
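A runnable sketch of the same comparison with one concrete extra rule standing in for the placeholder conditions. The rule ($4 > 2, counted in a variable called high) is a made-up example, as are the names dir, guarded and with_next; the data is the sample posted earlier in the thread.

```shell
# Recreate the thread's sample data (paths and variable names are illustrative).
dir=$(mktemp -d)
printf '%s\n' 1_8 1_9 1_10 1_11 > "$dir/file1"
printf '%s\n' \
  '1 1_500 1_600 0.000 1.0 0.0 0.0' \
  '1 1_500 1_500 0.000 0.0 0.0 1.0' \
  '1 1_9 1_100 0.000 0.50000 0.50000 0.00000' \
  '1 1_9 1_200 0.000 0.50000 0.50000 0.00000' \
  '1 1_9 1_400 0.000 1.0 0.0 0.0' \
  '1 1_8 1_500 2.107 0.59766 0.40234 0.00000' \
  '1 1_8 1_9 2.107 0.89431 0.10569 0.00000' \
  '1 1_8 1_300 2.107 0.0 1.0 0.0' > "$dir/file2"

# Guarded style: every file 2 rule repeats the NR!=FNR test.
guarded=$(awk '
NR==FNR { s[$1] }
NR!=FNR && ($2 in s) && ($3 in s) { print }
NR!=FNR && $4 > 2 { high++ }        # hypothetical extra rule
END { print "high:", high + 0 }
' "$dir/file1" "$dir/file2")

# next style: file 1 records never reach the later rules,
# so those rules need no guard.
with_next=$(awk '
NR==FNR { s[$1]; next }
($2 in s) && ($3 in s) { print }
$4 > 2 { high++ }                   # hypothetical extra rule
END { print "high:", high + 0 }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$with_next"
rm -rf "$dir"
```

Both scripts print the matching row followed by "high: 3" on this data; only the amount of repeated guarding differs.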
| |
| Vassilis 2006-10-30, 7:01 pm |
|
<OT>
Ed Morton wrote:
> awk '
> NR==FNR {s[$1]; next}
> ($2 in s) && ($3 in s) { print }
> theSkyIsGrey { ... }
> scotlandWinsWorldCup { ... }
> endOfWorldArrives { ... }
> ' file1 file2
>
> Yes, the first version will work, but I'd be surprised if anyone
> advocated doing it that way. Also, as a Scot, I suspect that second from
> last condition will unfortunately never be true....
>
> Regards,
>
> Ed.
Cheer up, mate. Greece has won Euro2004.
Impossible is nothing.
I hear Scotland has some team these days.
</OT>
| |
| Janis Papanagnou 2006-10-30, 7:01 pm |
| Ed Morton wrote:
> Janis Papanagnou wrote:
>
> I would call it wrong because in addition to the above it's not
> extensible. Let's say you want to do other things with the file2
> records. Would you then do this:
>
> awk '
> NR==FNR {s[$1]}
> NR!=FNR && ($2 in s) && ($3 in s) { print }
> NR!=FNR && theSkyIsGrey { ... }
> NR!=FNR && scotlandWinsWorldCup { ... }
> NR!=FNR && endOfWorldArrives { ... }
> ' file1 file2
>
> Rather than this:
>
> awk '
> NR==FNR {s[$1]; next}
> ($2 in s) && ($3 in s) { print }
> theSkyIsGrey { ... }
> scotlandWinsWorldCup { ... }
> endOfWorldArrives { ... }
> ' file1 file2
I would have done exactly the same as you _in this case_, using 'next'.
But extensibility is a many-sided (and here an academic?) argument.
If you want to extend your program _in a different way_, say...
NR==FNR {s[$1]}
NR!=FNR && ($2 in s) && ($3 in s) { print }
otherConditionForAllFiles1 { ... }
otherConditionForAllFiles2 { ... }
otherConditionForAllFilesN { ... }
{ ... }
...where the action code in the otherConditionForAllFiles<i> rules depends
on state set by either of the first two rules, say s[], 'next' would not
be helpful. (And this extension is just one example of many.)
A 'next' breaks native control flow. It's an optimization command, IMO,
as is a continue, break, or goto in other languages. And sometimes it
makes code even more readable/comprehensible/maintainable. Sometimes.
Sometimes not.
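A small sketch of that situation, using a made-up rule that is meant to run for the records of all files (it just counts them in total). The names dir, guarded, with_next, hits and total are illustrative; the data is the sample posted earlier in the thread.

```shell
# Recreate the thread's sample data (paths and variable names are illustrative).
dir=$(mktemp -d)
printf '%s\n' 1_8 1_9 1_10 1_11 > "$dir/file1"
printf '%s\n' \
  '1 1_500 1_600 0.000 1.0 0.0 0.0' \
  '1 1_500 1_500 0.000 0.0 0.0 1.0' \
  '1 1_9 1_100 0.000 0.50000 0.50000 0.00000' \
  '1 1_9 1_200 0.000 0.50000 0.50000 0.00000' \
  '1 1_9 1_400 0.000 1.0 0.0 0.0' \
  '1 1_8 1_500 2.107 0.59766 0.40234 0.00000' \
  '1 1_8 1_9 2.107 0.89431 0.10569 0.00000' \
  '1 1_8 1_300 2.107 0.0 1.0 0.0' > "$dir/file2"

# Guarded style: the last rule sees the records of both files.
guarded=$(awk '
NR==FNR { s[$1] }
NR!=FNR && ($2 in s) && ($3 in s) { hits++ }
{ total++ }                         # hypothetical rule for all files
END { print hits, total }
' "$dir/file1" "$dir/file2")

# next style: file 1 records are skipped before the last rule is reached.
with_next=$(awk '
NR==FNR { s[$1]; next }
($2 in s) && ($3 in s) { hits++ }
{ total++ }                         # now only counts file 2 records
END { print hits, total }
' "$dir/file1" "$dir/file2")

printf 'guarded:   %s\n' "$guarded"
printf 'with next: %s\n' "$with_next"
rm -rf "$dir"
```

With 4 lines in file 1 and 8 in file 2, the guarded script reports a total of 12 while the next version reports 8, because next stops the file 1 records from reaching the shared rule.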
> Yes, the first version will work, but I'd be surprised if anyone
> advocated doing it that way.
Still advocating it, since the conditions are clearer.
(Though still saying, in a one-liner like these, the difference is of
little relevance.)
> Also, as a Scot, I suspect that second from
> last condition will unfortunately never be true....
There are many ways to reach a goal; in awk as well as in football/soccer.
:-)
Janis
> Regards,
>
> Ed.