Code Comments

Removing duplicates and excluding lines
Hi,

I am using the following awk command to produce a list of people's
names, removing duplicates, and excluding some names from the resulting
list.

The names are in names.txt, and those names to be excluded are taken
from exclude.txt.

awk "NR==FNR{f[$0]=1;next}!f[$0]++" exclude.txt names.txt

So, for example, with the following names.txt file:

John Smith
Dave Green
Steve West
Steve Smith
James Brown
Tom Smith
Tom Smith

and with an exclude.txt such as:

Dave Green

then the above awk command gives:

John Smith
Steve West
Steve Smith
James Brown
Tom Smith

What I would like to do also is to include only those names ending in
"Smith" in the result.  So ideally, I would like the result:

John Smith
Steve Smith
Tom Smith

But when I use:

awk /Smith/ "NR==FNR{f[$0]=1;next}!f[$0]++" exclude.txt names.txt

the duplicates are no longer removed (in this case "Tom Smith").  That
is, I get the (unwanted) result:

John Smith
Steve Smith
Tom Smith
Tom Smith

Please can you help.

Regards,
Jonny

Jonny
04-19-05 01:55 PM


Re: Removing duplicates and excluding lines
Jonny wrote:
> Hi,
>
> I am using the following awk command to produce a list of people's
> names, removing duplicates, and excluding some names from the resulting
> list.
>
> The names are in names.txt, and those names to be excluded are taken
> from exclude.txt.
>
> awk "NR==FNR{f[$0]=1;next}!f[$0]++" exclude.txt names.txt
>
> So, for example, with the following names.txt file:
>
> John Smith
> Dave Green
> Steve West
> Steve Smith
> James Brown
> Tom Smith
> Tom Smith
>
> and with an exclude.txt such as:
>
> Dave Green
>
> then the above awk command gives:
>
> John Smith
> Steve West
> Steve Smith
> James Brown
> Tom Smith
>
> What I would like to do also is to include only those names ending in
> "Smith" in the result.  So ideally, I would like the result:

Including only Smith automatically excludes everybody else.
You don't need an exclude file.

awk '/Smith/ && !a[$0]++' names.txt

>
> John Smith
> Steve Smith
> Tom Smith
>
> But when I use:
>
> awk /Smith/ "NR==FNR{f[$0]=1;next}!f[$0]++" exclude.txt names.txt
>
> the duplicates are no longer removed (in this case "Tom Smith").  That
> is, I get the (unwanted) result:
>
> John Smith
> Steve Smith
> Tom Smith
> Tom Smith
>

--
Regards,

---Robert

Robert Katz
04-19-05 08:55 PM


Re: Removing duplicates and excluding lines
On Tue, 19 Apr 2005 09:25:27 +0000, Jonny wrote:

> Hi,
>
> I am using the following awk command to produce a list of people's
> names, removing duplicates, and excluding some names from the resulting
> list.
>
> The names are in names.txt, and those names to be excluded are taken
> from exclude.txt.
>
> awk "NR==FNR{f[$0]=1;next}!f[$0]++" exclude.txt names.txt
>
> So, for example, with the following names.txt file:
>
> John Smith
> Dave Green
> Steve West
> Steve Smith
> James Brown
> Tom Smith
> Tom Smith
>
> and with an exclude.txt such as:
>
> Dave Green
>
> then the above awk command gives:
>
> John Smith
> Steve West
> Steve Smith
> James Brown
> Tom Smith
>
> What I would like to do also is to include only those names ending in
> "Smith" in the result.  So ideally, I would like the result:
>
> John Smith
> Steve Smith
> Tom Smith
>
> But when I use:
>
> awk /Smith/ "NR==FNR{f[$0]=1;next}!f[$0]++" exclude.txt names.txt
>
> the duplicates are no longer removed (in this case "Tom Smith").  That
> is, I get the (unwanted) result:
>
> John Smith
> Steve Smith
> Tom Smith
> Tom Smith

Something's odd there: what happens to poor "Steve West"?
He is definitely not a Smith, and he is not in your exclude.txt.

Is that a typo on your part, or something I didn't read correctly?


Loki Harfagr
04-19-05 08:55 PM


Re: Removing duplicates and excluding lines
Robert Katz wrote:

> Jonny wrote: 
>
> Including only Smith automatically excludes everybody else.
> You don't need an exclude file.
>
>     awk '/Smith/ && !a[$0]++' names.txt
> 


Thanks for your reply, Robert.

What if I want to exclude Steve Smith and many other Smiths (too many
to list on the command line)?

Regards,
Jonny


Jonny
04-19-05 08:55 PM


Re: Removing duplicates and excluding lines
On Tue, 19 Apr 2005 15:38:16 +0200, Loki Harfagr wrote:


> Is that a typo on your part, or something I didn't read correctly?

Doh, forget it :D)
I read it again ;-)

Loki Harfagr
04-19-05 08:55 PM


Re: Removing duplicates and excluding lines

Ed Morton wrote:
<snip>
> awk "NR==FNR{f[$0]=1;next}($2=="Smith")&&!f[$0]++" exclude.txt names.txt

Of course, you'll have to do whatever magic your system requires to
handle double quotes around strings ("Smith") since you're using double
quotes around your script.

Ed.
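
On a POSIX shell, one way to sidestep the quoting issue Ed mentions is to wrap the whole awk program in single quotes, so the inner double quotes around "Smith" need no escaping. The sketch below recreates the sample names.txt from the original post in a scratch directory. Adding "Steve Smith" to exclude.txt is an assumption made here to show a Smith being excluded; it is not part of the original data.

```shell
#!/bin/sh
# Recreate the sample data from the original post in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/names.txt" <<'EOF'
John Smith
Dave Green
Steve West
Steve Smith
James Brown
Tom Smith
Tom Smith
EOF
# Assumption: "Steve Smith" is added to the exclude list for illustration.
printf '%s\n' 'Dave Green' 'Steve Smith' > "$dir/exclude.txt"

# Single quotes stop the shell from touching $0, $2 and the inner "Smith".
# First file: mark every excluded name. Second file: print each name whose
# second field is "Smith" the first time it is seen and not already marked.
result=$(awk 'NR==FNR{f[$0]=1;next} ($2=="Smith") && !f[$0]++' \
    "$dir/exclude.txt" "$dir/names.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

With this data the command should print John Smith and Tom Smith once each.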

Ed Morton
04-19-05 08:55 PM


Re: Removing duplicates and excluding lines
Ed Morton wrote:
>
> <snip>
>
> You do if there are some Smiths that you want excluded.

You got me there, Ed.

>
> /Smith/ would find people whose first names are "Smith" and people whose
> first and last names start with "Smith", e.g. "Smithson". You need:
>
>     $2 == "Smith"
>

Then we need to make it more robust and account for middle names
or initials.  How about:

$NF == "Smith"
 
>
>
> I don't know if the above syntax is valid in your environment. In mine
> (Unix), I'd have the /Smith/ inside the quotes along with the rest of
> the program.
>
> Looks like it's taking the /Smith/ as your entire program and just
> finding all the Smiths. It's probably your quoting.
>
> Just do this:
>
> awk "NR==FNR{f[$0]=1;next}($2=="Smith")&&!f[$0]++" exclude.txt names.txt
>
>     Ed.
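
To illustrate the difference between the /Smith/ pattern and the `$NF == "Smith"` test discussed above, here is a small sketch for a POSIX shell. The names are hypothetical, not taken from the thread, and were picked only to exercise the edge cases: a middle initial, "Smith" as a first name, and a surname that merely starts with "Smith".

```shell
#!/bin/sh
# Hypothetical names, not from the thread, chosen to exercise the edge cases.
names='John Q Smith
Smith Jones
Ann Smithson
Tom Smith
Tom Smith'

# /Smith/ matches "Smith" anywhere on the line.
by_regex=$(printf '%s\n' "$names" | awk '/Smith/ && !seen[$0]++')
# $NF is the last field, so middle names or initials do not matter.
by_last_field=$(printf '%s\n' "$names" | awk '$NF=="Smith" && !seen[$0]++')

printf 'regex:\n%s\n\nlast field:\n%s\n' "$by_regex" "$by_last_field"
```

The regex version keeps "Smith Jones" and "Ann Smithson" as well, while the last-field version keeps only "John Q Smith" and "Tom Smith". A test on `$2` would miss "John Q Smith", whose second field is the initial.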

Robert Katz
04-19-05 08:55 PM


Re: Removing duplicates and excluding lines
Ed Morton wrote:

> Ed Morton wrote:
> <snip> 
>
> Of course, you'll have to do whatever magic your system requires to
> handle double quotes around strings ("Smith") since you're using double
> quotes around your script.

Thanks Ed.

I'm using Windows 2000, so I just had to put a backslash before each of
the double quotes around Smith.

Regards,
Jonny
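
Another way to avoid shell-specific escaping altogether is to keep the awk program in a file and run it with awk's standard -f option, since nothing inside the file passes through the command interpreter. The sketch below uses a POSIX shell to build everything in a scratch directory; the file name smiths.awk is hypothetical, and the program uses the `$NF` test suggested earlier in the thread rather than `$2`. On Windows the script file would simply be created in an editor.

```shell
#!/bin/sh
dir=$(mktemp -d)
# smiths.awk is a hypothetical file name; its contents need no shell escaping.
cat > "$dir/smiths.awk" <<'EOF'
# First file (exclude.txt): remember every name to be skipped.
NR == FNR { f[$0] = 1; next }
# Remaining files: print unseen names whose last field is "Smith".
$NF == "Smith" && !f[$0]++
EOF

# Sample data from the original post.
printf '%s\n' 'Dave Green' > "$dir/exclude.txt"
printf '%s\n' 'John Smith' 'Dave Green' 'Steve West' 'Steve Smith' \
    'James Brown' 'Tom Smith' 'Tom Smith' > "$dir/names.txt"

result=$(awk -f "$dir/smiths.awk" "$dir/exclude.txt" "$dir/names.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

With the sample files from the original post this should give the three-line result Jonny asked for: John Smith, Steve Smith, Tom Smith.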


Jonny
04-19-05 08:55 PM

