Lines common/unique to 2 files
| Hi, can someone give me a way of listing common lines from two files? In
my case, both files are single-column ones. Also, how do I extract
lines unique to each?
Thanks,
Sashi
| |
| Janis Papanagnou 2005-12-10, 6:56 pm |
| Sashi wrote:
> Hi, can someone give me a way of listing common lines from two files? In
> my case, both files are single-column ones. Also, how do I extract
> lines unique to each?
> Thanks,
> Sashi
NR==FNR {f1[$0]=$0}
NR!=FNR {f2[$0]=$0}
END {
print "Common:"
for(i in f1) if(i in f2) print f1[i]
print "Only in f1:"
for(i in f1) if(!(i in f2)) print f1[i]
print "Only in f2:"
for(i in f2) if(!(i in f1)) print f2[i]
}
Call the awk program with the two files as arguments.
Janis
| |
| Janis Papanagnou 2005-12-10, 6:56 pm |
| Janis Papanagnou wrote:
> Sashi wrote:
>
>
>
> NR==FNR {f1[$0]=$0}
> NR!=FNR {f2[$0]=$0}
> END {
> print "Common:"
> for(i in f1) if(i in f2) print f1[i]
> print "Only in f1:"
> for(i in f1) if(!(i in f2)) print f1[i]
> print "Only in f2:"
> for(i in f2) if(!(i in f1)) print f2[i]
> }
>
> Call the awk program with the two files as arguments.
Or simpler...
NR==FNR {f1[$0]}
NR!=FNR {f2[$0]}
END {
print "Common:"
for(i in f1) if(i in f2) print i
print "Only in f1:"
for(i in f1) if(!(i in f2)) print i
print "Only in f2:"
for(i in f2) if(!(i in f1)) print i
}
Janis
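A minimal sketch of how the simpler program above might be run. The file names a.txt, b.txt and compare.awk are assumptions made for illustration, as is the sample data. Since for(i in ...) visits array keys in no defined order, the lines within each section may appear in any order.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs, one entry per line.
printf '%s\n' apple banana cherry > a.txt
printf '%s\n' banana cherry date  > b.txt

# The simpler program from the post above, saved under an assumed name.
cat > compare.awk <<'EOF'
NR==FNR {f1[$0]}   # first file: record each line as an array key
NR!=FNR {f2[$0]}   # later file(s): likewise, in a second array
END {
  print "Common:"
  for(i in f1) if(i in f2) print i
  print "Only in f1:"
  for(i in f1) if(!(i in f2)) print i
  print "Only in f2:"
  for(i in f2) if(!(i in f1)) print i
}
EOF

# Call the awk program with the two files as arguments.
result=$(awk -f compare.awk a.txt b.txt)
printf '%s\n' "$result"
```

Note that NR==FNR is also true for the second file if the first one is empty, so this idiom assumes a non-empty first file.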
| |
| Chris F.A. Johnson 2005-12-10, 6:56 pm |
| On 2005-12-10, Sashi wrote:
> Hi, can someone give me a way of listing common lines from two files? In
> my case, both files are single-column ones. Also, how do I extract
> lines unique to each?
Use comm; read the man page for details:
man comm
--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence
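For reference, a short sketch of the comm approach. comm expects both inputs to be sorted, and its options -1, -2 and -3 suppress the corresponding output column (lines only in the first file, lines only in the second, lines in both). The file names and sample data here are assumptions for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical unsorted single-column inputs.
printf '%s\n' cherry apple banana > a.txt
printf '%s\n' date banana cherry  > b.txt

# comm requires sorted input, so sort each file first.
sort a.txt > a.sorted
sort b.txt > b.sorted

common=$(comm -12 a.sorted b.sorted)   # keep column 3: lines in both files
only_a=$(comm -23 a.sorted b.sorted)   # keep column 1: lines only in a.txt
only_b=$(comm -13 a.sorted b.sorted)   # keep column 2: lines only in b.txt

printf 'Common:\n%s\nOnly in a.txt:\n%s\nOnly in b.txt:\n%s\n' \
  "$common" "$only_a" "$only_b"
```

Unlike the awk programs in this thread, comm emits its output in sorted order, and the inputs can have different numbers of lines.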
| |
| Loki Harfagr 2005-12-10, 6:56 pm |
| On Sat, 10 Dec 2005 16:59:12 +0100, Janis Papanagnou wrote:
> Janis Papanagnou wrote:
>
> Or simpler...
>
> NR==FNR {f1[$0]}
> NR!=FNR {f2[$0]}
> END {
> print "Common:"
> for(i in f1) if(i in f2) print i
> print "Only in f1:"
> for(i in f1) if(!(i in f2)) print i
> print "Only in f2:"
> for(i in f2) if(!(i in f1)) print i
> }
>
>
> Janis
A nice pair.
Here's a small variant. I think it'd be lighter on huge files, as it
doesn't re-scan the arrays (in 'if(i in f2)'), though I guess this
kind of script is not targeting huge files ;D)
Anyway, I'll post it; it may give someone better ideas :-)
==>>
NR==FNR {f[$0]=f[$0]1}
NR!=FNR {f[$0]=f[$0]2}
END {
print "Common:"
for(i in f) if(match(f[i],/1/) && match(f[i],/2/) ) print i
print "Only in f1:"
for(i in f) if(!match(f[i],/2/) ) print i
print "Only in f2:"
for(i in f) if(!match(f[i],/1/) ) print i
}
<<==
As a matter of fact, what I really lack in [G]awk is a running counter of
which ARGV we are on. I know it can be tweaked, but ...
| |
|
|
Chris F.A. Johnson wrote:
> On 2005-12-10, Sashi wrote:
>
> Use comm; read the man page for details:
Et tu, Chris? I never thought I'd see the day when you'd post a non-awk
solution in the awk group. ;)
Anyway, I knew I was missing something. I tried join but it works only on
equi-sized files. My files have different numbers of lines.
Yes, comm is the one I need to use.
Thanks,
Sashi
>
> man comm
>
> --
> Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
> Shell Scripting Recipes: | My code in this post, if any,
> A Problem-Solution Approach | is released under the
> 2005, Apress | GNU General Public Licence
| |
|
|
Sashi wrote:
> Chris F.A. Johnson wrote:
> Et tu, Chris? I never thought I'd see the day when you'd post a non-awk
> solution in the awk group. ;)
> Anyway, I knew I was missing something. I tried join but it works only on
> equi-sized files. My files have different numbers of lines.
> Yes, comm is the one I need to use.
> Thanks,
> Sashi
And I'm sorry about the top-post!
Sashi
| |
| Ed Morton 2005-12-10, 6:56 pm |
| Loki Harfagr wrote:
<snip>
> As a matter of fact, what I really lack in [G]awk is a running counter of
> which ARGV we are on
That's ARGIND in gawk..
Ed.
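ARGIND is a gawk extension, so it is not available in other awk implementations. A common portable substitute, sketched here under assumed names (the counter fileno and the sample files are hypothetical), is to bump a counter whenever FNR resets to 1:

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs.
printf '%s\n' apple banana > a.txt
printf '%s\n' banana date  > b.txt

tags=$(awk '
FNR == 1 { fileno++ }             # FNR restarts at 1 for each new input file
{ f[$0] = f[$0] fileno }          # append the file number to the tag for this line
END { for (i in f) print i, f[i] }
' a.txt b.txt | sort)             # sort only to make the unordered output stable

printf '%s\n' "$tags"
```

One difference from ARGIND: an empty input file produces no records, so the counter is not incremented for it and later files are numbered one lower than their argument position.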
| |
| Ed Morton 2005-12-10, 6:56 pm |
| Sashi wrote:
> Sashi wrote:
>
<snip>
Y'know several people apparently just wasted their time posting awk
solutions for you, presumably under the assumption you needed one. After
all, if any UNIX solution will do, why would you post to comp.lang.awk
rather than comp.unix.shell (I am curious about the answer to that if
you could be bothered - might help some of us understand the mindset in
future)? If Chris hadn't replied, you probably wouldn't have been told
about "comm" even though many of the regulars here also frequent
comp.unix.shell. Just something to consider next time you have a
question that doesn't NEED an awk solution....
Ed.
| |
| Kenny McCormack 2005-12-10, 6:56 pm |
| In article <RrCdndYdnoJL-wbeRVn-pA@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>Y'know several people apparently just wasted their time posting awk
>solutions for you, presumably under the assumption you needed one. After
>all, if any UNIX solution will do, why would you post to comp.lang.awk
>rather than comp.unix.shell (I am curious about the answer to that if
>you could be bothered - might help some of us understand the mindset in
>future)? If Chris hadn't replied, you probably wouldn't have been told
>about "comm" even though many of the regulars here also frequent
>comp.unix.shell. Just something to consider next time you have a
>question that doesn't NEED an awk solution....
Very good post. Agreed on all counts.
| |
| Loki Harfagr 2005-12-11, 7:55 am |
| On Sat, 10 Dec 2005 17:23:32 -0600, Ed Morton wrote:
> Loki Harfagr wrote:
> <snip>
>
> That's ARGIND in gawk..
>
> Ed.
Thanks Ed. I may correct my quote then :
As a matter of fact what I really lack in gawk is to read
the fine manual again and remember what I read :D)
Silly me, and I knew it when I was young... Slags... Ashes...
| |
| Loki Harfagr 2005-12-11, 7:56 am |
| On Sat, 10 Dec 2005 20:53:34 +0100, Loki Harfagr wrote:
> On Sat, 10 Dec 2005 16:59:12 +0100, Janis Papanagnou wrote:
>
>
> A nice pair.
>
> Here's a small variant. I think it'd be lighter on huge files, as it
> doesn't re-scan the arrays (in 'if(i in f2)'), though I guess this
> kind of script is not targeting huge files ;D)
> Anyway, I'll post it; it may give someone better ideas :-)
>
> ==>>
> NR==FNR {f[$0]=f[$0]1}
> NR!=FNR {f[$0]=f[$0]2}
> END {
> print "Common:"
> for(i in f) if(match(f[i],/1/) && match(f[i],/2/) ) print i
> print "Only in f1:"
> for(i in f) if(!match(f[i],/2/) ) print i
> print "Only in f2:"
> for(i in f) if(!match(f[i],/1/) ) print i
> }
> <<==
>
> As a matter of fact, what I really lack in [G]awk is a running counter of
> which ARGV we are on. I know it can be tweaked, but ...
And with Ed's help I can now do it nicely:
{f[$0]=f[$0]ARGIND}
END {
print "Common:"
for(i in f) if(match(f[i],/1/) && match(f[i],/2/) ) print i
print "Only in f1:"
for(i in f) if(!match(f[i],/2/) ) print i
print "Only in f2:"
for(i in f) if(!match(f[i],/1/) ) print i
}
| |
|
| > Y'know several people apparently just wasted their time posting awk
> solutions for you, presumably under the assumption you needed one. After
> all, if any UNIX solution will do, why would you post to comp.lang.awk
> rather than comp.unix.shell (I am curious about the answer to that if
> you could be bothered - might help some of us understand the mindset in
> future)? If Chris hadn't replied, you probably wouldn't have been told
> about "comm" even though many of the regulars here also frequent
> comp.unix.shell. Just something to consider next time you have a
> question that doesn't NEED an awk solution....
>
> Ed.
When I posted this question, I thought awk would be a good tool to find
a solution, though any UNIX solution would've been good enough. I had
this feeling that there exists a tool for my purpose. My experiments
with join proved that it was insufficient for my task at hand. I had
forgotten about comm. Hence I posted here. Also I'm an awk fan, hence
this was my first choice.
Haven't you (or anyone else) ever had a situation wherein you wrote
your own utility for a certain task and later learnt that there already
exists a tool to do what your utility does? If you haven't, I laud your
omniscience. I wanted to write my utility in awk but Chris's reminder
about comm gave me an "Of course!" feeling and I used it instead.
As for people wasting their time, no one forced them to. I think they
do enjoy doing what they do else they wouldn't be doing it. And I
always try to make sure that I do thank people for their help. I do try
to give help myself as well, but I've only been using awk recently and
so infrequently that I don't get much of a chance to help others,
especially since there are so many of you (yourself, Chris, Loki et al.)
who're much more experienced and give better solutions.
Also I like to see your solutions so that I can myself learn.
It's no bother and I hope I've satisfied your curiosity and helped you
understand my mindset.
I have high regard for you folks who hang out here and are good at what
you do. Please don't post in a way that might appear curt or
condescending.
If you're helping someone, make sure that they don't feel it.
Regards,
Sashi
| |
| Ed Morton 2005-12-12, 6:56 pm |
|
Sashi wrote:
<snip>
> I have high regard for you folks who hang out here and are good at what
> you do. Please don't post in a way that might appear curt or
> condescending.
Sorry if it appeared that way, I didn't intend it to be. Thanks for
providing some insight into why you posted here originally.
Ed.
| |
| Janis Papanagnou 2005-12-12, 6:56 pm |
| Sashi wrote:
>
>
> [...]
> As for people wasting their time, no one forced them to. I think they
> do enjoy doing what they do else they wouldn't be doing it.
No. I think most people enjoy helping and don't mind even some effort
if there's some meaning behind the question or task to be done.
I enjoy helping, but I don't enjoy doing _unnecessary_ work; in this
case we apparently wasted our time. I am especially grateful for Ed's
comment that said everything in this respect.
> And I always try to make sure that I do thank people for their help.
I haven't seen that you've done so in this case. You thanked Chris
for the pointer to the standard Unix solution, the reference to 'comm';
but not for the (somewhat) larger effort of the requested awk solution,
and you have not addressed the people who wasted time developing that
for you unnecessarily.
Don't bother to catch up now...
| |
|
|
Janis Papanagnou wrote:
> Sashi wrote:
>
> No. I think most people enjoy helping and don't mind even some effort
> if there's some meaning behind the question or task to be done.
>
> I enjoy helping, but I don't enjoy doing _unnecessary_ work; in this
> case we apparently wasted our time. I am especially grateful for Ed's
> comment that said everything in this respect.
>
>
> I haven't seen that you've done so in this case. You thanked Chris
> for the pointer to the standard Unix solution, the reference to 'comm';
> but not for the (somewhat) larger effort of the requested awk solution,
> and you have not addressed the people who wasted time developing that
> for you unnecessarily.
>
> Don't bother to catch up now...
Yes, I guess it's a little late for that now but not too late for
apologies.
When I found what I wanted, I proceeded with my task and forgot to
thank those who did put in some effort to help me out with my problem.
I'll try not to repeat that.
And I can definitely understand anyone who says he/she doesn't enjoy
_unnecessary_ work.
I can also understand why anyone might feel irritated when they put in
some effort to come up with a solution that's ignored. You'll feel that
your effort was wasted, and what makes it worse is that you actually
enjoyed creating something that was later discarded and not
appreciated.
And from Ed's original quote which drew a response out of me, I can see
that it very plainly says "We're here because we like and enjoy what we
do and we'll be happy to help you out. But do make sure that you
respect our time and effort and don't post flippant questions which we
take seriously and put in effort on your behalf and which you don't
much care about".
That has never been my intention. As I said before, I do hang out here
regularly because I see this is a fine place to watch the pros and
learn.
Thanks for your help, all, and I do appreciate your effort and look at
this as a valuable learning resource.
Regards,
Sashi
| |
|
| > Sorry if it appeared that way, I didn't intend it to be. Thanks for
> providing some insight into why you posted here originally.
>
> Ed.
Ed, fuhgeddaboudit! No apologies needed.
Because I'm less experienced than you and am conscious about it, I
might have been sensitive if I perceived your message as being a little
too tart.
Regards,
Sashi
| |
| aaronb 2005-12-29, 8:54 am |
| Thanks Loki and Janis,
I found this helpful. For my use I didn't want to sort the lines, just find the lines (in actual order) that were in file1
and not file2.
- Aaron
{f[$0]=f[$0]ARGIND}
END {
for(i in f) if(!match(f[i],/2/) ) print i
}
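One caveat on the program above: for(i in f) visits keys in no defined order, so it does not by itself keep the lines in their original order. A sketch of an order-preserving alternative, assuming the hypothetical file names file1 and file2, reads file2 first and then filters file1 as it streams past:

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs; file1 is deliberately unsorted.
printf '%s\n' pear apple fig banana > file1
printf '%s\n' banana apple          > file2

# First pass (file2): remember every line as an array key.
# Second pass (file1): print each line not seen in file2, in file1's own order.
# Assumes file2 is non-empty; otherwise NR==FNR would also hold for file1.
only1=$(awk 'NR == FNR { seen[$0]; next } !($0 in seen)' file2 file1)

printf '%s\n' "$only1"
```

Note the argument order: the file to be filtered comes last, so its lines are printed as they are read rather than from the END block.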