Grep uniqueness issue
Jason Normandin

2005-07-29, 5:01 pm

Hey Guys

I am having an odd problem using grep to ensure an array only contains distinct entries.

I have a list similar to the following in a file (short example of a much longer list)

support01-FastEthernet1/0
support01-RH
jnormandin-p370-1691-SH-Cpu-2
jnormandin-p370-1691-SH

These entries may or may not appear multiple times within that list.

I am trying to create an ARRAY containing each of these entries only once (a distinct list). I am using a grep statement:

push @pingErrorsName, $element if (! grep (/\b$element\b/, @pingErrorsName));

* Where $element contains one of the above names in the list and @pingErrorsName is the distinct list of elements.

What I am finding is that the array will contain all of the correct entries except jnormandin-p370-1691-SH. It appears as though the grep is matching jnormandin-p370-1691-SH to the jnormandin-p370-1691-SH-Cpu-2 string (as it is a substring of the second one).

Now, I am using word boundary anchors (\b) in the grep, so I am puzzled as to why this is not working.

Does anyone have any ideas as to why this is occurring and how I can prevent it?



Eric Walker

2005-07-29, 5:01 pm

can you use the uniq command?



--
Eric Walker
EDA/CAD Engineer
Work: 208-368-2573
Jason Normandin

2005-07-29, 5:01 pm

Unfortunately I cannot, as this is a cross-platform implementation. I need to avoid using any OS-dependent commands.


Jeff 'japhy' Pinyan

2005-07-29, 5:01 pm

On Jul 29, jason_normandin@charter.net said:

> I am having an odd problem using grep to ensure an array only contains
> distinct entries.


You should probably use a hash (along with your array, if you need to keep
the order they're in) to ensure unique-ness.

> support01-FastEthernet1/0
> support01-RH
> jnormandin-p370-1691-SH-Cpu-2
> jnormandin-p370-1691-SH
>
> I am trying to create an ARRAY containing each of these entries only
> once (a distinct list). I am using a grep statement:
>
> push @pingErrorsName, $element if (! grep (/\b$element\b/, @pingErrorsName));


There's NO reason to use a regex here at all. You don't want $element to
be treated like a regex (since it's just a string). If you were gonna use
grep(), you should just do

push @pingErrorsName, $element if !grep $_ eq $element, @pingErrorsName;

But I suggest you use a hash:

push @pingErrorsName, $element if !$uniq{$element}++;

Read 'perldoc -q uniq' for more information.
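
A minimal sketch of the hash approach suggested above, kept in first-seen order. The @elements input list is an assumption made up for illustration (it reuses the sample names from the question, with some repeats); only the push line comes from the post itself.

```perl
use strict;
use warnings;

# Hypothetical input: the sample names from the question, with duplicates.
my @elements = (
    'support01-FastEthernet1/0',
    'support01-RH',
    'jnormandin-p370-1691-SH-Cpu-2',
    'jnormandin-p370-1691-SH',
    'support01-RH',
    'jnormandin-p370-1691-SH',
);

my %uniq;              # counts how often each name has been seen
my @pingErrorsName;    # distinct names, in first-seen order

for my $element (@elements) {
    # $uniq{$element}++ returns the previous count (undef the first time),
    # so the push happens only on the first occurrence of each name.
    push @pingErrorsName, $element if !$uniq{$element}++;
}

print "$_\n" for @pingErrorsName;
```

Each lookup is a single hash access, so the cost per element should not grow with the size of the array the way a grep over it does.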

> * Where $element contains one of the above names in the list and
> @pingErrorsName is the distinct list of elements.
>
> What I am finding is that the array will contain all of the correct
> entries except: jnormandin-p370-1691-SH. It appears as though the grep
> is matching jnormandin-p370-1691-SH to the jnormandin-p370-1691-SH-Cpu-2
> string (as it is a substring of the second one).


Well, YES. That's because \b matches word boundaries. Between the 'H'
and the '-' is a word boundary.
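
To make the word-boundary point concrete, here is a small sketch using the two names from the question. The variable names $longer, $boundary_match and $exact_match are made up for illustration.

```perl
use strict;
use warnings;

my $element = 'jnormandin-p370-1691-SH';
my $longer  = 'jnormandin-p370-1691-SH-Cpu-2';

# \b matches between a word character (letter, digit, underscore) and a
# non-word character. '-' is a non-word character, so there is a boundary
# right after "SH" inside the longer name and the regex succeeds.
my $boundary_match = $longer =~ /\b$element\b/ ? 1 : 0;

# A plain string comparison has no such ambiguity.
my $exact_match = $longer eq $element ? 1 : 0;

print "regex with \\b: $boundary_match\n";
print "eq comparison: $exact_match\n";
```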

--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart
Jason Normandin

2005-07-29, 5:01 pm

Thanks.

Just so that I understand:

push @pingErrorsName, $element if !grep $_ eq $element, @pingErrorsName;

is actually taking each value in @pingErrorsName and checking whether it is equal to $element?
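
As a sketch of what that line does when spelled out step by step: grep sets $_ to each array element in turn and, in scalar or boolean context, returns the number of elements for which the expression was true. The sample array contents and the $count variable are assumptions for illustration.

```perl
use strict;
use warnings;

my @pingErrorsName = ('support01-RH', 'jnormandin-p370-1691-SH-Cpu-2');
my $element        = 'jnormandin-p370-1691-SH';

# grep aliases $_ to each array element and evaluates the block.
# Assigned to a scalar, it yields how many elements passed the test.
my $count = grep { $_ eq $element } @pingErrorsName;

# No existing entry is string-equal to $element, so it gets pushed.
push @pingErrorsName, $element if !$count;

print scalar(@pingErrorsName), " entries\n";
```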

Ankur Gupta

2005-07-29, 5:01 pm

jason_normandin@charter.net wrote:
> Hey Guys
>
> I am having an odd problem using grep to ensure an array only
> contains distinct entries.
>
> I have a list similar to the following in a file (short example of a
> much longer list)
>
> support01-FastEthernet1/0
> support01-RH
> jnormandin-p370-1691-SH-Cpu-2
> jnormandin-p370-1691-SH
>
> These entries may or may not appear multiple times within that list.
>
> I am trying to create an ARRAY containing each of these entries only
> once (a distinct list). I am using a grep statement:
>
> push @pingErrorsName, $element if (! grep (/\b$element\b/,
> @pingErrorsName));


push @pingErrorsName, $element if ( ! grep { $_ eq $element } @pingErrorsName );

or

push @pingErrorsName, $element if ( ! grep { /^$element$/ } @pingErrorsName );

should work.. (untested though).
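
One caveat with the anchored-regex variant, sketched under assumptions: if a name contains regex metacharacters, interpolating it unquoted can still produce false matches. Wrapping the variable in \Q...\E escapes them. The names 'host.1' and 'hostx1' below are invented for illustration and do not come from the original list.

```perl
use strict;
use warnings;

my $element = 'host.1';      # hypothetical name containing a metacharacter
my @names   = ('hostx1');    # a different name that should not match

# Unquoted: '.' matches any character, so 'hostx1' is wrongly treated as equal.
my $unquoted = grep { /^$element$/ } @names;

# Quoted: \Q...\E escapes metacharacters, so only the literal text matches.
my $quoted = grep { /^\Q$element\E$/ } @names;

print "unquoted matches: $unquoted\n";
print "quoted matches:   $quoted\n";
```

The plain eq comparison avoids the question entirely, which is one reason to prefer it when no pattern matching is actually needed.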

But this would be inefficient for large arrays.

Use a hash instead:

$pingErrorsName{$element} = undef;

Retrieve the elements with

@elements = (keys %pingErrorsName);



HTH...

--Ankur

Why did kamikaze pilots wear helmets anyways?
Tom Allison

2005-07-29, 10:00 pm

Ankur Gupta wrote:
> jason_normandin@charter.net <mailto:jason_normandin@charter.net> wrote:
>
>
>
> push @pingErrorsName, $element if ( ! grep { $_ eq $element }
> @pingErrorsName );
>
> or
>
> push @pingErrorsName, $element if ( ! grep { /^$element$/ }
> @pingErrorsName );
>
> should work.. (untested though).
>
> But this would be inefficient for large arrays.
>
> Use hash instead.
>
> $pingErrorsName{$element} = undef;
>
> Retrieve the elements by
>
> @elements = (keys %pingErrorsName);
>



This loses the original order of the elements, if you care.


foreach $element....
{
    next if exists $hash{$element};
    # --OR--
    next if grep( /^\Q$element\E$/, @array );
    push @array, $element;
    $hash{$element} = 1;
}
undef %hash; # house cleaning

I suspect that the grep approach slows down with larger arrays, while the
%hash approach is a memory hog...
This is also one of those times where the use of 'o' in your regex
/$element/o would be VERY VERY bad... :)
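
A small sketch of why /o would hurt here: with /o the pattern is compiled only once, so the value of $element from the first iteration is frozen into the regex. The names 'alpha' and 'beta' and the two result arrays are assumptions for illustration.

```perl
use strict;
use warnings;

my @names = ('alpha', 'beta');   # hypothetical names

my (@with_o, @without_o);
for my $element (@names) {
    # With /o the pattern is compiled once, using the first value of
    # $element ('alpha'); later iterations keep testing against that.
    push @with_o, $element if 'beta' =~ /^\Q$element\E$/o;

    # Without /o the pattern follows the current value of $element.
    push @without_o, $element if 'beta' =~ /^\Q$element\E$/;
}

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

In a uniqueness check this would mean every later element is compared against the first name only, so duplicates of other names would slip through.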