Grep uniqueness issue
| Jason Normandin 2005-07-29, 5:01 pm |
| Hey Guys
I am having an odd problem using grep to ensure an array only contains distinct entries.
I have a list similar to the following in a file (short example of a much longer list)
support01-FastEthernet1/0
support01-RH
jnormandin-p370-1691-SH-Cpu-2
jnormandin-p370-1691-SH
These entries may or may not appear multiple times within that list.
I am trying to create an ARRAY containing each of these entries only once (a distinct list). I am using a grep statement:
push @pingErrorsName, $element if (! grep (/\b$element\b/, @pingErrorsName));
* Where $element contains one of the above names in the list and @pingErrorsName is the distinct list of elements.
What I am finding is that the array will contain all of the correct entries except: jnormandin-p370-1691-SH. It appears as though the grep is matching jnormandin-p370-1691-SH to the
jnormandin-p370-1691-SH-Cpu-2 string (as it is a substring of the second one).
Now I am using word boundary anchors (\b) in the grep, so I am confused as to why this is not working.
Does anyone have any ideas as to why this is occurring and how I can prevent it?
| |
| Eric Walker 2005-07-29, 5:01 pm |
| Can you use the uniq command?
--
Eric Walker
EDA/CAD Engineer
Work: 208-368-2573
| |
| Jason Normandin 2005-07-29, 5:01 pm |
| Unfortunately I cannot, as this is a cross-platform implementation. I need to avoid using any OS-dependent commands.
| |
| Jeff 'japhy' Pinyan 2005-07-29, 5:01 pm |
| On Jul 29, jason_normandin@charter.net said:
> I am having an odd problem using grep to ensure an array only contains
> distinct entries.
You should probably use a hash (along with your array, if you need to keep
the order they're in) to ensure uniqueness.
> support01-FastEthernet1/0
> support01-RH
> jnormandin-p370-1691-SH-Cpu-2
> jnormandin-p370-1691-SH
>
> I am trying to create an ARRAY containing each of these entries only
> once (a distinct list). I am using a grep statement:
>
> push @pingErrorsName, $element if (! grep (/\b$element\b/, @pingErrorsName));
There's NO reason to use a regex here at all. You don't want $element to
be treated like a regex (since it's just a string). If you were gonna use
grep(), you should just do
push @pingErrorsName, $element if !grep $_ eq $element, @pingErrorsName;
But I suggest you use a hash:
push @pingErrorsName, $element if !$uniq{$element}++;
Read 'perldoc -q uniq' for more information.
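The hash idiom above can be sketched as a complete script. This is only a minimal sketch: @input is a hypothetical stand-in for the list that the original post reads from a file, while %uniq and @pingErrorsName are the names used in the thread.

```perl
use strict;
use warnings;

# Hypothetical input; in the original post these names come from a file.
my @input = (
    'support01-FastEthernet1/0',
    'support01-RH',
    'jnormandin-p370-1691-SH-Cpu-2',
    'jnormandin-p370-1691-SH',
    'support01-RH',               # duplicate
    'jnormandin-p370-1691-SH',    # duplicate
);

my (%uniq, @pingErrorsName);
for my $element (@input) {
    # Post-increment returns the old count (false the first time),
    # so the push happens only on the first sighting of each name.
    push @pingErrorsName, $element if !$uniq{$element}++;
}

print "$_\n" for @pingErrorsName;
```

The array keeps first-seen order, and each membership check is a single hash lookup rather than a scan of the whole array.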
> * Where $element contains one of the above names in the list and
> @pingErrorsName is the distinct list of elements.
>
> What I am finding is that the array will contain all of the correct
> entries except: jnormandin-p370-1691-SH. It appears as though the grep
> is matching jnormandin-p370-1691-SH to the jnormandin-p370-1691-SH-Cpu-2
> string (as it is a substring of the second one).
Well, YES. That's because \b matches word boundaries. Between the 'H'
and the '-' is a word boundary.
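A small sketch of that point, using the two names from the post. The variable names $longer, $regex_hit and $eq_hit are made up for illustration.

```perl
use strict;
use warnings;

my $element = 'jnormandin-p370-1691-SH';
my $longer  = 'jnormandin-p370-1691-SH-Cpu-2';

# '-' is not a word character, so \b is satisfied between 'H' and '-'
# and the shorter name matches inside the longer one.
my $regex_hit = $longer =~ /\b$element\b/ ? 1 : 0;

# String equality compares the whole value, so there is no partial match.
my $eq_hit = $longer eq $element ? 1 : 0;

print "regex: $regex_hit, eq: $eq_hit\n";
```

Because the regex version reports a hit, the original push is skipped for the shorter name whenever the longer one is already in the array.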
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart
| |
| Jason Normandin 2005-07-29, 5:01 pm |
| Thanks.
Just so that I understand:
push @pingErrorsName, $element if !grep $_ eq $element, @pingErrorsName;
is actually taking each value in @pingErrorsName and checking whether it is equal to $element?
| |
| Ankur Gupta 2005-07-29, 5:01 pm |
| jason_normandin@charter.net wrote:
> push @pingErrorsName, $element if (! grep (/\b$element\b/, @pingErrorsName));
push @pingErrorsName, $element if ( ! grep { $_ eq $element } @pingErrorsName );
or
push @pingErrorsName, $element if ( ! grep { /^$element$/ } @pingErrorsName );
should work (untested though).
But this would be inefficient for large arrays.
Use a hash instead.
$pingErrorsName{$element} = undef;
Retrieve the elements by
@elements = (keys %pingErrorsName);
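One caveat with the regex form: anything in $element that is regex syntax is interpreted as a pattern unless it is escaped with \Q...\E. A minimal sketch, using hypothetical names (router.1 and routerX1 are not from the original list) chosen only to show the effect:

```perl
use strict;
use warnings;

# Hypothetical names: the '.' acts as a wildcard when left unescaped.
my $element = 'router.1';
my @seen    = ('routerX1');

# grep in scalar context returns the number of matching items.
my $unescaped = grep { /^$element$/ }      @seen;   # '.' matches the 'X'
my $escaped   = grep { /^\Q$element\E\z/ } @seen;   # literal comparison
my $plain_eq  = grep { $_ eq $element }    @seen;   # plain string equality

print "unescaped: $unescaped, escaped: $escaped, eq: $plain_eq\n";
```

The eq comparison avoids the question altogether, which is one reason it is the simpler choice when the goal is an exact match.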
HTH...
--Ankur
Why did kamikaze pilots wear helmets anyways?
| |
| Tom Allison 2005-07-29, 10:00 pm |
| Ankur Gupta wrote:
> Use hash instead.
>
> $pingErrorsName{$element} = undef;
>
> Retrieve the elements by
>
> @elements = (keys %pingErrorsName);
>
This loses the sort order, if you care.
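foreach $element....
{
next if exists $hash{$element};
--OR--
next if grep( /^\Q$element\E\z/, @array); # anchored and escaped, so substrings don't count
push @array, $element;
$hash{$element} = 1; # key on the value of $element, not the literal string 'element'
}
undef %hash; # house cleaning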
I suspect that the grep approach slows down with larger arrays while the
%hash approach is a memory hog...
This is also one of those times where the use of 'o' in your regex
/$element/o would be VERY VERY bad... :)
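To illustrate the /o remark: the flag tells Perl to interpolate and compile the pattern only once, so later values of $element are ignored. A hedged sketch follows; the helper names count_with_o and count_without_o and the name other-host are made up for this example.

```perl
use strict;
use warnings;

# Counts exact matches of $element in @list. With /o the pattern is
# frozen after its first interpolation, whatever $element holds later.
sub count_with_o {
    my ($element, @list) = @_;
    return scalar grep { /^\Q$element\E\z/o } @list;
}

# Same check without /o: the pattern follows the current $element.
sub count_without_o {
    my ($element, @list) = @_;
    return scalar grep { /^\Q$element\E\z/ } @list;
}

my @array = ('support01-RH');
my @names = ('support01-RH', 'other-host');

my @with_o    = map { count_with_o($_, @array) }    @names;
my @without_o = map { count_without_o($_, @array) } @names;

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

With /o the second lookup still tests against the first name, so other-host would be reported as already present and never added to the list.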