Code Comments
Hello everyone,

This is the first time I was able to get a complex regex actually working as I expect, but I wanted someone to comment on it, in case I am overlooking something major. I am trying to throw away a certain string out of random text. The occurrences might be as follows:

    Free UPS Ground Shipping <rest of string>
    <some of string> free ground shipping !! <rest of string>
    <some of string> free UPS ground shipping!!!

and all variations of the above. Here is what I did:

    $description =~ s/      #take away free ground shipping text

        (?:                 #non-capturing block for | inclusion
            (^)             #start of string
            |               #or
            (?<=\S)         #lookbehind non-space character
        )

        \s*                 #maybe some spaces
        free                #word 'free'
        \s+                 #at least one space
        (?:ups\s+)?         #non-capturing 'ups' with at least one trailing space
        ground              #'ground'
        \s+                 #spaces
        shipping            #'shipping'
        \s*                 #maybe some spaces
        !*                  #maybe some exclamation marks
        \s*                 #maybe some more spaces

        (?:                 #non-capturing for | inclusion
            ($)             #end of string
            |               #or
            (?=\S\s?)       #lookahead non-space character maybe followed by a space (I want to keep the space if I am cutting from inside a string)
        )

    //ixg;                  #replace with nothing

It seems to be working, but I am afraid it will bite me later. I would appreciate any comments. The reason I placed all the (?: ) groups is to speed it up at least a bit; I remember reading somewhere that it matters.

Thanks

Peter
On Wed, Apr 27, 2005 at 12:16:05PM -0500, Peter Rabbitson wrote:
> $description =~ s/      #take away free ground shipping text
>
>     (?:                 #non-capturing block for | inclusion
>         (^)             #start of string
>         |               #or
>         (?<=\S)         #lookbehind non-space character
>     )
>
>     \s*                 #maybe some spaces
>     free                #word 'free'
>     \s+                 #at least one space
>     (?:ups\s+)?         #non-capturing 'ups' with at least one trailing space
>     ground              #'ground'
>     \s+                 #spaces
>     shipping            #'shipping'
>     \s*                 #maybe some spaces
>     !*                  #maybe some exclamation marks
>     \s*                 #maybe some more spaces
>
>     (?:                 #non-capturing for | inclusion
>         ($)             #end of string
>         |               #or
>         (?=\S\s?)       #lookahead non-space character maybe followed by a space (I want to keep the space if I am cutting from inside a string)
>     )
>
> //ixg;                  #replace with nothing

Oops. (?=\S\s?) above should be (?=\s\S): if the match is not at the end of the string, I am guaranteed at least a single space after it. Sorry about that.
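For anyone who wants to try the corrected pattern, here is a minimal sketch. The sub name strip_shipping and the sample strings are made up for illustration and are not from the thread; only the pattern itself follows the post above, with the (?=\s\S) correction applied and the capturing parentheses around ^ and $ dropped.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread) around the corrected pattern.
sub strip_shipping {
    my ($text) = @_;
    $text =~ s/
        (?:
            ^                # start of string
          |
            (?<=\S)          # or right after a non-space character
        )
        \s* free \s+ (?:ups\s+)? ground \s+ shipping \s* !* \s*
        (?:
            $                # end of string
          |
            (?=\s\S)         # or stop so one space is left before the next word
        )
    //ixg;
    return $text;
}

# Made-up inputs modelled on the three cases in the original post.
for my $sample (
    'Free UPS Ground Shipping on all orders',
    'Blue widget free ground shipping !! while supplies last',
    'Blue widget free UPS ground shipping!!!',
) {
    print '[', strip_shipping($sample), "]\n";
}
```

Printing the result inside brackets makes any leftover leading or trailing space visible, which is the thing worth checking for the start-of-string case.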
On 4/27/05, Peter Rabbitson <rabbit@rabbit.us> wrote:
> Hello everyone,
> This is the first time I was able to get a complex regex actually working as
> I expect, but I wanted someone to comment on it, in case I am overlooking
> something very major.
> [...]
> Seems to be working, but I am afraid it will bite me later. Appreciate any
> comments.

Peter,

Don't make things so complicated. You want to find some words and replace them with nothing. You don't care where in the string the pattern appears. Therefore, you don't have to predict where in the string it might possibly appear. As long as what you're looking for is contiguous, it doesn't matter where it occurs in relation to what you *aren't* looking for. You don't have to write a regex for the entire line; that's what makes regex so powerful: it does the looking for you.

Also, ! means negation elsewhere in Perl. It works bare here, but get in the habit of using it escaped or in a class.

    $description =~ s/free ups ground shipping[!]*//i;  # or /ig if needed

should work just fine. Write your patterns to find what it is you're looking for, not to find what it is you're *not* looking for.

HTH,

--jay
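A quick way to see what the fixed-phrase substitution does and does not catch is sketched below. The sub name strip_fixed_phrase and the sample strings are hypothetical; the pattern is the one-liner above with the g flag added.

```perl
use strict;
use warnings;

# Hypothetical helper: the fixed-phrase substitution suggested above,
# with the g flag so every occurrence is removed.
sub strip_fixed_phrase {
    my ($text) = @_;
    $text =~ s/free ups ground shipping[!]*//ig;
    return $text;
}

# The phrase is found wherever it sits; no anchors or lookarounds are needed.
print '[', strip_fixed_phrase('Free UPS Ground Shipping on all orders'), "]\n";
print '[', strip_fixed_phrase('Blue widget free UPS ground shipping!!!'), "]\n";

# Left untouched: 'ups' is missing and a space precedes the exclamation marks.
print '[', strip_fixed_phrase('Blue widget free ground shipping !!'), "]\n";
```

The pattern matches only the exact wording with single spaces, so variants without "ups" or with different spacing pass through unchanged, and the whitespace around a removed phrase stays in place.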
On Wed, Apr 27, 2005 at 01:31:08PM -0400, Jay Savage wrote:
> Don't make things so complicated. You want to find some words and
> replace them with nothing. You don't care where in the string the
> pattern appears. Therefore, you don't have to predict where in the

The word 'ups' is not mandatory: it might be there, it might not. Also, the amount of space in between is not fixed. However, you are right that I don't care where the string is. I dropped (?:(^)|(?<=\S)) from the front, and it produces identical results. Thanks!
On 4/27/05, Peter Rabbitson <rabbit@rabbit.us> wrote:
> On Wed, Apr 27, 2005 at 01:31:08PM -0400, Jay Savage wrote:
>
> The word 'ups' is not mandatory - it might be there, might not. Also the
> amount of spaces inbetween is not fixed. However what you said about me not
> caring where the string is - you are right. I dropped (?:(^)|(?<=\S)) from
> the front, and it produces identical results. Thanks!

You can drop the stuff from the end, too. If 'ups' is optional and the spacing is variable, then of course handle that with *

    $description =~ s/\s*free\s+(?:ups)*\s*ground\s+shipping\s*[!]*\s*//i;  # or /ig if needed

Rearrange to suit. But the important thing here is to go for what you need to replace, and not what you don't.

HTH,

--jay
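Here is a minimal sketch of that looser pattern in action, written with \s+ after 'free' (a literal \+ in that position would only match a plus sign). The sub name strip_loose and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the whitespace-tolerant one-liner above,
# written with \s+ after 'free' and the g flag added.
sub strip_loose {
    my ($text) = @_;
    $text =~ s/\s*free\s+(?:ups)*\s*ground\s+shipping\s*[!]*\s*//ig;
    return $text;
}

# Made-up inputs modelled on the three cases in the original post.
for my $sample (
    'Free UPS Ground Shipping on all orders',
    'Blue widget free ground shipping !! while supplies last',
    'Blue widget free UPS ground shipping!!!',
) {
    print '[', strip_loose($sample), "]\n";
}
```

Because the pattern consumes whitespace on both sides of the phrase, a match in the middle of a string leaves the neighbouring words joined together, while matches at the start or end come out clean.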
On Wed, Apr 27, 2005 at 01:50:57PM -0400, Jay Savage wrote:
> You can drop the stuff from the end, too. If 'ups' is optional and
> the spacing is variable, then of course handle that with *
>
> $description =~
> s/\s*free\s+(?:ups)*\s*ground\s+shipping\s*[!]*\s*//i; # or /ig if
> needed
>
> Rearrange to suit. But the important thing here is to go for what you
> need to replace, and not what you don't.

Mmm... as I wrote in the comments in the very first e-mail:

> blablabla...I want to keep the space if I am cutting from inside a string

Is there a way to do this without the lookahead? Yes, I can replace with / /, but then I am introducing more spaces than there were whenever the match is at the beginning or at the end of the string. Excuse my curiosity :)
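One possible lookahead-free approach, offered only as a sketch: replace each occurrence with a single space, then trim whatever whitespace ends up at the edges. The sub name strip_keep_one_space is hypothetical. It assumes that trimming leading and trailing whitespace from the whole string is acceptable and that the phrase is always separated from its neighbours by whitespace (otherwise a space is introduced where there was none).

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: two plain substitutions,
# no lookahead or lookbehind.
sub strip_keep_one_space {
    my ($text) = @_;

    # Step 1: swap the phrase plus surrounding whitespace for one space.
    $text =~ s/\s*free\s+(?:ups\s+)?ground\s+shipping\s*!*\s*/ /ig;

    # Step 2: trim whitespace at the start and end of the string, which
    # removes the extra space when the phrase sat at either edge.
    $text =~ s/^\s+|\s+$//g;

    return $text;
}

# Made-up inputs modelled on the three cases in the original post.
for my $sample (
    'Free UPS Ground Shipping on all orders',
    'Blue widget free ground shipping !! while supplies last',
    'Blue widget free UPS ground shipping!!!',
) {
    print '[', strip_keep_one_space($sample), "]\n";
}
```

The trade-off is a second pass over the string in exchange for a pattern that only describes the text being removed.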