
 

FAQ 4.41 How can I remove duplicate elements from a list or array?
PerlFAQ Server

2005-03-29, 8:58 am

This message is one of several periodic postings to comp.lang.perl.misc
intended to make it easier for perl programmers to find answers to
common questions. The core of this message represents an excerpt
from the documentation provided with Perl.

--------------------------------------------------------------------

4.41: How can I remove duplicate elements from a list or array?

There are several possible ways, depending on whether the array is
ordered and whether you wish to preserve the ordering.

a) If @in is sorted, and you want @out to be sorted:

    $prev = "not equal to $in[0]";
    @out = grep($_ ne $prev && ($prev = $_, 1), @in);

This is nice in that it doesn't use much extra memory, simulating
uniq(1)'s behavior of removing only adjacent duplicates. The ", 1"
guarantees that the expression is true (so that grep picks it up)
even if the $_ is 0, "", or undef.

b) If you don't know whether @in is sorted:

    undef %saw;
    @out = grep(!$saw{$_}++, @in);

c) Like (b), but @in contains only small non-negative integers:

    @out = grep(!$saw[$_]++, @in);

d) A way to do (b) without any loops or greps:

    undef %saw;
    @saw{@in} = ();
    @out = sort keys %saw; # remove sort if undesired

e) Like (d), but @in contains only small positive integers:

    undef @ary;
    @ary[@in] = @in;
    @out = grep {defined} @ary;

But perhaps you should have been using a hash all along, eh?
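
As a minimal sketch (not part of the FAQ text), approaches (a), (b), (d) and (e) can be run side by side on one sample list. The sample data and the names @out_a, @out_b, @out_d, @out_e and %set are assumptions made for illustration, and freshly declared lexical variables stand in for the `undef %saw;` resets.

```perl
use strict;
use warnings;

# Hypothetical sample data: small non-negative integers, unsorted, with repeats.
my @in = (3, 1, 3, 0, 2, 1, 0);

# (a) Sorted input: drop adjacent duplicates only.
#     The ", 1" keeps the grep test true even when the element is 0.
my @sorted = sort { $a <=> $b } @in;
my $prev   = "not equal to $sorted[0]";
my @out_a  = grep { $_ ne $prev && ($prev = $_, 1) } @sorted;

# (b) Unsorted input: keep the first occurrence of each value, in input order.
my %saw;
my @out_b = grep { !$saw{$_}++ } @in;

# (d) Hash slice: the keys are the unique values; the sort fixes their order.
my %set;
@set{@in} = ();
my @out_d = sort { $a <=> $b } keys %set;

# (e) Array slice: each value is stored at its own index, so this only
#     suits small non-negative integers.
my @ary;
@ary[@in] = @in;
my @out_e = grep { defined } @ary;

print "(a) @out_a\n";   # (a) 0 1 2 3
print "(b) @out_b\n";   # (b) 3 1 0 2
print "(d) @out_d\n";   # (d) 0 1 2 3
print "(e) @out_e\n";   # (e) 0 1 2 3
```

Only (b) preserves the order in which values first appear; (a) needs sorted input, (d) returns the hash keys in whatever order the sort imposes, and (e) returns them in index order.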



--------------------------------------------------------------------

Documents such as this have been called "Answers to Frequently
Asked Questions" or FAQ for short. They represent an important
part of the Usenet tradition. They serve to reduce the volume of
redundant traffic on a news group by providing quality answers to
questions that keep coming up.

If you are somehow irritated by seeing these postings you are free
to ignore them or add the sender to your killfile. If you find
errors or other problems with these postings please send corrections
or comments to the posting email address or to the maintainers as
directed in the perlfaq manual page.

Note that the FAQ text posted by this server may have been modified
from that distributed in the stable Perl release. It may have been
edited to reflect the additions, changes and corrections provided
by respondents, reviewers, and critics to previous postings of
these FAQs. The complete text of these FAQs is available on request.

The perlfaq manual page contains the following copyright notice.

AUTHOR AND COPYRIGHT

Copyright (c) 1997-2002 Tom Christiansen and Nathan
Torkington, and other contributors as noted. All rights
reserved.

This posting is provided in the hope that it will be useful but
does not represent a commitment or contract of any kind on the part
of the contributors, authors or their agents.

Ohm

2005-03-31, 3:58 am

Hi, I did a little Perl scripting once and am now out of touch. I would
appreciate any help understanding this:

I have a sorted text file in which each line has a name, but some entries
got duplicated. I want to remove the duplicates using Perl.

This is my dup.txt file
Aaron
Aaron
Andy
Beth
Lisa
Sam
Sam
Simon
Tim
Zewie

This is what I have done so far to find a solution based on the FAQ.
But I need more help to get it working.

    open(INPUT, "<", "dup.txt") || die "No dup.txt file\n";            # open dup.txt for reading
    open(OUTPUT, ">", "clean.txt") || die "Cannot write clean.txt\n";  # open a clean file for output

    chomp(my @in = <INPUT>);
    my $prev = "not equal to $in[0]";
    my @out = grep($_ ne $prev && ($prev = $_, 1), @in);

    foreach my $line (@out) {
        print OUTPUT "$line\n";
    }
    close(INPUT);    # close the input file
    close(OUTPUT);   # close the output file



Appreciate any help

Sam.



John Bokma

2005-03-31, 3:58 am

Ohm wrote:

> Hi, I did a little Perl scripting once and am now out of touch. I would
> appreciate any help understanding this:
>
> I have a sorted text file in which each line has a name, but some entries
> got duplicated. I want to remove the duplicates using Perl.


read a line
check if it's in the hash, if it is, next
store it in the hash
write it out
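
The steps above might be sketched roughly as follows. The sub name dedupe_lines and the in-memory filehandles are assumptions made so the example runs on its own; in practice the two handles would be opened on dup.txt and clean.txt.

```perl
use strict;
use warnings;

# Copy lines from $in_fh to $out_fh, skipping any line already seen.
# Returns the number of lines written. (dedupe_lines is a hypothetical name.)
sub dedupe_lines {
    my ($in_fh, $out_fh) = @_;
    my %seen;
    my $kept = 0;
    while (my $line = <$in_fh>) {      # read a line
        chomp $line;
        next if $seen{$line};          # it's in the hash already: next
        $seen{$line} = 1;              # store it in the hash
        print {$out_fh} "$line\n";     # write it out
        $kept++;
    }
    return $kept;
}

# Demonstration using in-memory filehandles in place of real files.
my $input = join '', map { "$_\n" }
    qw(Aaron Aaron Andy Beth Lisa Sam Sam Simon Tim Zewie);
my $output = '';
open(my $in_fh,  '<', \$input)  or die "Cannot open input: $!";
open(my $out_fh, '>', \$output) or die "Cannot open output: $!";

my $kept = dedupe_lines($in_fh, $out_fh);
close $in_fh;
close $out_fh;

print $output;
print "kept $kept lines\n";   # kept 8 lines
```

Because the hash remembers every line seen so far, this works whether or not the input is sorted, at the cost of holding one hash key per distinct line.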

Don't top post, the FAQ doesn't like that.

--
John
Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
