Code Comments

bulky regex
Hello everyone,
This is the first time I was able to get a complex regex actually working as
I expect, but I wanted someone to comment on it, in case I am overlooking
something very major.
I am trying to strip a certain phrase out of arbitrary text. The
occurrences might be as follows:

Free UPS Ground Shipping <rest of string>
<some of string> free ground shipping !! <rest of string>
<some of string> free UPS ground shipping!!!

and all variations of the above. Here is what I did:

$description =~ s/	#take away free ground shipping text

(?:		#non-capturing block for | inclusion
(^)		#start of string
|		#or
(?<=\S)	#lookbehind non-space character
)

\s*		#maybe some spaces
free		#word 'free'
\s+		#at least one space
(?:ups\s+)?	#non-capturing 'ups' with at least one trailing space
ground		#'ground'
\s+		#spaces
shipping	#'shipping'
\s*		#maybe some spaces
!*		#maybe some exclamation marks
\s*		#maybe some more spaces

(?:		#non-capturing for | inclusion
($)		#end of string
|		#or
(?=\S\s?)	#lookahead non-space character maybe followed by a space (I want to keep the space if I am cutting from inside a string)
)

//ixg;             #replace with nothing

It seems to be working, but I am afraid it will bite me later. I would
appreciate any comments. I used (?: ) everywhere to speed it up at least a
bit; I remember reading somewhere that it matters.

Thanks

Peter

Peter Rabbitson
04-27-05 08:56 PM


Re: bulky regex
On Wed, Apr 27, 2005 at 12:16:05PM -0500, Peter Rabbitson wrote:
> $description =~ s/	#take away free ground shipping text
>
> (?:		#non-capturing block for | inclusion
>    (^)		#start of string
>       |		#or
>    (?<=\S)	#lookbehind non-space character
> )
>
> \s*		#maybe some spaces
> free		#word 'free'
> \s+		#at least one space
> (?:ups\s+)?	#non-capturing 'ups' with at least one trailing space
> ground		#'ground'
> \s+		#spaces
> shipping	#'shipping'
> \s*		#maybe some spaces
> !*		#maybe some exclamation marks
> \s*		#maybe some more spaces
>
> (?:		#non-capturing for | inclusion
>    ($)		#end of string
>       |		#or
>    (?=\S\s?)	#lookahead non-space character maybe followed by a space (I want to keep the space if I am cutting from inside a string)
> )
>
> //ixg;             #replace with nothing

Oops. (?=\S\s?) above should be (?=\s\S): if the match is not at the end of
the string, I am guaranteed at least a single space. Sorry about that.

Peter Rabbitson
04-27-05 08:56 PM
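
For reference, the pattern with the corrected lookahead can be exercised against the three sample strings from the first post. This is only a minimal sketch: the strip_shipping helper name and the filler words around the phrase are assumptions made up for illustration, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regex from the post, with the lookahead
# changed to (?=\s\S) as described in the correction.
sub strip_shipping {
    my ($description) = @_;
    $description =~ s/
        (?: (^) | (?<=\S) )      # start of string, or just after a non-space
        \s* free \s+             # the word free, then at least one space
        (?: ups \s+ )?           # optional ups with trailing space
        ground \s+ shipping      # ground shipping
        \s* !* \s*               # trailing spaces and exclamation marks
        (?: ($) | (?=\s\S) )     # end of string, or one space kept before more text
    //ixg;
    return $description;
}

# Sample inputs modelled on the post; the surrounding words are made up.
for my $text (
    'Free UPS Ground Shipping on all orders',
    'Great deal free ground shipping !! today',
    'Buy now free UPS ground shipping!!!',
) {
    printf "[%s]\n", strip_shipping($text);
}
```

Note that when the phrase sits at the very start of the string, the space that follows it is kept, so the result begins with a space.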


Re: bulky regex
Peter,

Don't make things so complicated.  You want to find some words and
replace them with nothing.  You don't care where in the string the
pattern appears.  Therefore, you don't have to predict where in the
string it might possibly appear.  As long as what you're looking for
is contiguous, it doesn't matter where it occurs in relation to what
you *aren't* looking for.  You don't have to write a regex for the
entire line; that's what makes regex so powerful: it does the looking
for you.  Also, ! signals negation in constructs such as (?!...).  It
works bare here, but get in the habit of using it escaped or in a class.

$description =~ s/free ups ground shipping[!]*//i;  # or /ig if needed

should work just fine.  Write your patterns to find what it is you're
looking for, not to find what it is you're *not* looking for.

HTH,

--jay

Jay Savage
04-27-05 08:56 PM


Re: bulky regex
On Wed, Apr 27, 2005 at 01:31:08PM -0400, Jay Savage wrote:
>
> Don't make things so complicated.  You want to find some words and
> replace them with nothing.  You don't care where in the string the
> pattern appears.  Therefore, you don't have to predict where in the

The word 'ups' is not mandatory - it might be there, might not. Also the
number of spaces in between is not fixed. However, what you said about me not
caring where the string is - you are right. I dropped (?:(^)|(?<=\S)) from
the front, and it produces identical results. Thanks!

Peter Rabbitson
04-27-05 08:56 PM
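
The "identical results" observation can be checked with a small sketch like the one below. It assumes the (?=\s\S) correction from earlier in the thread and drops the capturing parentheses around ^ and $, which do not affect what is matched. The variable names, the strip_with helper, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Body of the pattern shared by both variants (with the (?=\s\S) correction).
my $body = qr/\s*free\s+(?:ups\s+)?ground\s+shipping\s*!*\s*(?:$|(?=\s\S))/i;

# One variant keeps the leading group from the first post, the other drops it.
my $with_prefix    = qr/(?:^|(?<=\S))$body/;
my $without_prefix = qr/$body/;

# Hypothetical helper: remove every match of $re from a copy of $text.
sub strip_with {
    my ($re, $text) = @_;
    $text =~ s/$re//g;
    return $text;
}

my @samples = (
    'Free UPS Ground Shipping on all orders',
    'Great deal free ground shipping !! today',
    'Buy now free UPS ground shipping!!!',
    'Great deal   free   ground   shipping',
);

for my $text (@samples) {
    my $long  = strip_with($with_prefix,    $text);
    my $short = strip_with($without_prefix, $text);
    printf "%-45s same=%s\n", "[$text]", ($long eq $short ? 'yes' : 'no');
}
```

The two agree because the leading \s* already lets a match begin at the first space of a run, which is exactly where the lookbehind would have allowed it to begin.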


Re: bulky regex
On 4/27/05, Peter Rabbitson <rabbit@rabbit.us> wrote:
> The word 'ups' is not mandatory - it might be there, might not. Also the
> number of spaces in between is not fixed. However, what you said about me not
> caring where the string is - you are right. I dropped (?:(^)|(?<=\S)) from
> the front, and it produces identical results. Thanks!

You can drop the stuff from the end, too.  If 'ups' is optional and
the spacing is variable, then of course handle that with *

$description =~
    s/\s*free\s+(?:ups)*\s*ground\s+shipping\s*[!]*\s*//i;  # or /ig if needed

Rearrange to suit.  But the important thing here is to go for what you
need to replace, and not what you don't.

HTH,

--jay

Jay Savage
04-27-05 08:57 PM
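
A quick sketch of what the simplified substitution does when the phrase sits in the middle of a string. It assumes \s+ was intended after 'free' (a literal \+ would only match a plus sign), and the sample text is made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample with the phrase in the middle of the string.
my $text = 'Great deal free ground shipping !! today';

# Simplified pattern from the reply, assuming \s+ was meant after the word free.
(my $stripped = $text) =~ s/\s*free\s+(?:ups)*\s*ground\s+shipping\s*[!]*\s*//i;

print "[$stripped]\n";
```

Because the pattern consumes the whitespace on both sides and replaces it with nothing, the neighbouring words end up joined together.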


Re: bulky regex
On Wed, Apr 27, 2005 at 01:50:57PM -0400, Jay Savage wrote:
> You can drop the stuff from the end, too.  If 'ups' is optional and
> the spacing is variable, then of course handle that with *
>
>     $description =~
>         s/\s*free\s+(?:ups)*\s*ground\s+shipping\s*[!]*\s*//i;  # or /ig if needed
>
> Rearrange to suit.  But the important thing here is to go for what you
> need to replace, and not what you don't.


Mmm... as I wrote in the comments in the very first e-mail:
> blablabla...I want to keep the space if I am cutting from inside a string
Is there a way to do this without the lookahead? Yes I can replace with / /,
but then I am introducing more spaces than there were if I am at the
beginning or at the end...? Excuse my curiosity :)

Peter Rabbitson
04-28-05 01:56 AM
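
One possible way to do this without a lookahead, sketched here as an assumption rather than an answer given in the thread, is to replace the phrase with a single space and then trim the ends in a second pass. The strip_shipping helper name and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: not from the thread, just one possible approach.
sub strip_shipping {
    my ($description) = @_;
    # Replace the phrase and any surrounding whitespace with one space...
    $description =~ s/\s*free\s+(?:ups\s+)?ground\s+shipping\s*!*\s*/ /ig;
    # ...then trim the space this may have left at either end.
    $description =~ s/^\s+|\s+$//g;
    return $description;
}

for my $text (
    'Free UPS Ground Shipping on all orders',
    'Great deal free ground shipping !! today',
    'Buy now free UPS ground shipping!!!',
) {
    printf "[%s]\n", strip_shipping($text);
}
```

The trade-off is that the second pass also removes any leading or trailing whitespace that was already in the text, which may or may not be acceptable for the descriptions being cleaned.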

