Home > Archive > PERL Beginners > June 2005 > find and replace large blocks

Author find and replace large blocks
Cy Kurtz

2005-06-08, 3:57 am

Is it possible to use s/foo/bar in another way to allow replacement of
large blocks of text with spaces, quotes, and double quotes?

Is there a better way?

Thank you,

Cy Kurtz


Xavier Noria

2005-06-08, 3:57 am

On Jun 7, 2005, at 14:51, Cy Kurtz wrote:

> Is it possible to use s/foo/bar in another way to allow replacement of
> large blocks of text with spaces, quotes, and double quotes?


Would you please send an example of what you need to accomplish?

-- fxn

Offer Kaye

2005-06-08, 3:57 am

On 6/7/05, Cy Kurtz wrote:
> Is it possible to use s/foo/bar in another way to allow replacement of
> large blocks of text with spaces, quotes, and double quotes?
>


Yes.

> Is there a better way?
>


That depends on what exactly you want to do.
HTH,
--
Offer Kaye

Cy Kurtz

2005-06-08, 3:57 am

OK ... Remember you asked for it. I have at least a dozen files that I
want to update. I want to do this:

me@mymachine somedirectory]$ perl -pi~ -e
's/./officers-gasenate.html/http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl/' ./contactus.html

I was hoping to change this code:

State House</a> <a class="item2" href="./officers-gasenate.html">Georgia
State Senate</a></div>

to this:

State House</a> <a class="item2" href="http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl">Georgia
State Senate</a></div>

Of course it isn't working. I think it's because of all of those
forward slashes. I wonder if I'm trying to drive a nail with a coffee
cup.



Chris Devers

2005-06-08, 3:57 am

On Tue, 7 Jun 2005, Cy Kurtz wrote:

> OK ... Remember you asked for it.


Right. Because without sufficient context, it's impossible to give an
adequate answer to a wildly open-ended question. Make sense?

> I have at least a dozen files that I want to update. I want to do
> this:
>
> me@mymachine somedirectory]$ perl -pi~ -e 's/./officers-gasenate.html/http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl/' ./contactus.html


That won't work. This gets reduced to

s/./officers-gasenate.html/

Which matches a dot /./ -- which is a metacharacter meaning "matches
anything at all" -- and replaces it with /officers-gasenate.html/

In other words, it will turn this string --

abc

-- into this

officers-gasenate.htmlofficers-gasenate.htmlofficers-gasenate.html

-- which isn't at all what you meant :-)

Further, everything after that third forward-slash is ignored, and will
probably (read: definitely) produce an error.

I see two things that are worth changing here.

* you shouldn't be using forward-slashes as the regex delimiter
* you should be escaping metacharacters like the dot

Thus, the regex should be something like this:

s|\./officers-gasenate\.html|http://www\.legis\.state\.ga\.us/cg...List=stsenatedl|

That's a bit unwieldy; you can break it up for clarity --

my $old = qr{\./officers-gasenate\.html};
my $new = 'http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl';
s/$old/$new/;

-- but for a command-line one-liner, that's probably overkill.



Note though that it's standard to point out here that HTML is
notoriously difficult to get right with regular expressions. If all
you're doing is changing the href target of known anchor tags in a
limited set of files that you have control over, it's probably fine to
solve it this way, but if the HTML is at all complicated -- that is, if
it has any inconsistencies at all, broken tags, etc -- you're much
better off solving this kind of problem with a parser module from CPAN.

There's a lot of them to choose from, depending on your needs, but
almost any of them are a better choice than doing this kind of thing by
hand with regular expressions: it's easier, faster, and more robust.

Keep it in mind if this problem starts getting more complicated...



--
Chris Devers

Xavier Noria

2005-06-08, 3:57 am

On Jun 7, 2005, at 15:39, Cy Kurtz wrote:

> OK ... Remember you asked for it. I have at least a dozen files that I
> want to update. I want to do this:
>
> me@mymachine somedirectory]$ perl -pi~ -e
> 's/./officers-gasenate.html/http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl/' ./contactus.html
>
> I was hoping to change this code:
>
> State House</a> <a class="item2" href="./officers-gasenate.html">Georgia
> State Senate</a></div>
>
> to this:
>
> State House</a> <a class="item2" href="http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl">Georgia
> State Senate</a></div>
>
> Of course it isn't working. I think it's because of all of those
> forward slashes. I wonder if I'm trying to drive a nail with a coffee
> cup.


Excellent, thank you.

To address that kind of issue, Perl not only allows the traditional
escaping solution, but also allows changing the very delimiters of
s/// so that the slashes need no escaping at all, which improves
readability. The idea is that you choose a delimiter that is not found
in either part of the substitution. That is documented in perlop.

In your case, you could do for instance:

me@mymachine somedirectory]$ perl -pi~ -e 's{[.]/officers-gasenate[.]html}{http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl}' ./contactus.html

where the regexp is surrounded by a pair "{", "}", and so is the
replacement string. Note that, in addition, the dot has been put in a
class because it was being used as a metacharacter whereas a literal
dot was required.
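
A minimal sketch of the same substitution inside a script, applied to the first line of the sample from the original post. The variable names ($line, $updated) are only for illustration.

```perl
use strict;
use warnings;

# First line of the sample HTML from the original post.
my $line = 'State House</a> <a class="item2" href="./officers-gasenate.html">Georgia';

# Braces as delimiters: the slashes in the path and the URL need no escaping.
# [.] is a character class holding only a dot, so it matches a literal dot.
(my $updated = $line) =~ s{[.]/officers-gasenate[.]html}{http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl};

print "$updated\n";
```

The copy-then-substitute form leaves $line untouched, which makes it easy to compare the text before and after.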

-- fxn

PS: Remember that munging HTML with regexps may be fragile unless you
control the HTML and know that a regexp approach is fine.

Cy Kurtz

2005-06-08, 3:57 am

Thanks for your help. I stumbled over a solution in perlop. Any
non-alphanumeric, non-whitespace character can be used in place of the
forward slash in s///. I'm using s%%% (those are percent signs) now
and things are going well. I haven't looked into the parser modules yet.
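
For a dozen files, the one-liner with percent-sign delimiters could also be written as a small script. This is only a sketch: update_links is a hypothetical helper name, and setting $^I to "~" is assumed to mirror the -i~ switch used on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite the link in every named file, in place,
# keeping a backup copy with "~" appended to the file name.
sub update_links {
    my @files = @_;

    # $^I turns on in-place editing for the files listed in @ARGV.
    local ($^I, @ARGV) = ('~', @files);
    while (<>) {
        # Percent signs as delimiters, so the slashes need no escaping.
        s%[.]/officers-gasenate[.]html%http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl%;
        print;    # goes to the file being rewritten, not to the terminal
    }
    return;
}

# Only run when file names were given on the command line.
update_links(@ARGV) if @ARGV;
```

Saved under a name of your choice, it would be run with the HTML files as arguments, for example ./contactus.html.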

Thanks again,

Cy Kurtz



Offer Kaye

2005-06-08, 3:57 am

On 6/7/05, Chris Devers wrote:
>
> Which matches a dot /./ -- which is a metacharacter meaning "matches
> anything at all"


Not quite correct - a dot (".") matches "any single character", not
"anything at all", and even this rule has an exception - a dot will
not match a newline ("\n") unless you use the "s" modifier.

>
> In other words, it will turn this string --
>
> abc
>
> -- into this
>
> officers-gasenate.htmlofficers-gasenate.htmlofficers-gasenate.html
>


No, it will turn "abc" into:
officers-gasenate.htmlbc
Unless you use the "g" modifier.
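
Both points are easy to check with a short script. This is a small sketch that uses "X" as a stand-in replacement; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Without /g only the first match is replaced.
(my $once = 'abc') =~ s/./X/;          # "Xbc"

# With /g every single character is replaced in turn.
(my $all = 'abc') =~ s/./X/g;          # "XXX"

# A dot does not match a newline by default ...
(my $plain = "\n") =~ s/./X/;          # still "\n"

# ... but it does when the /s modifier is given.
(my $with_s = "\n") =~ s/./X/s;        # "X"

print "$once $all $with_s\n";          # prints "Xbc XXX X"
```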

>
> Thus, the regex should be something like this:
> s|\./officers-gasenate\.html|http://www\.legis\.state\.ga\.us/cgi-bin/peo_list\.pl\?List=stsenatedl|
>


There's no need to escape metachars in the replacement part. Without
modifiers (such as "e" or "x") the replacement part is treated as a
simple double-quoted string (delimiter dependent).
So the s/// can be written as:
s|\./officers-gasenate\.html|http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl|
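
A short sketch that compares the two spellings of the replacement part. The variable names are made up for the comparison; the pattern and URL are the ones from the thread.

```perl
use strict;
use warnings;

my $href = './officers-gasenate.html';

# Pattern side: the dots are escaped so they match only literal dots.
# Replacement side: ".", "?" and "/" are ordinary characters here.
(my $plain = $href) =~ s|\./officers-gasenate\.html|http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl|;

# Backslashes in the replacement are redundant rather than wrong:
# in a double-quoted string "\." and "\?" are simply "." and "?".
(my $escaped = $href) =~ s|\./officers-gasenate\.html|http://www\.legis\.state\.ga\.us/cgi-bin/peo_list\.pl\?List=stsenatedl|;

print $plain eq $escaped ? "same result\n" : "different result\n";
```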

Cheers,
--
Offer Kaye

Manav Mathur

2005-06-08, 3:57 am


Yes. As the requirements you stated are not very specific, I'd recommend that
you see the entries for
$/ (in perldoc perlvar)
the /s modifier for the s/// operator.
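
A minimal sketch of how those two features combine when the text to replace spans more than one line. The sample text is the tail of the HTML from the original post, read from an in-memory filehandle so that the example is self-contained.

```perl
use strict;
use warnings;

my $text = "Georgia\nState Senate</a></div>\n";

# With $/ undefined, a single read returns the whole input, newlines included.
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $block = do { local $/; <$fh> };
close $fh;

# With /s the dot also matches "\n", so the pattern can span both lines.
(my $joined = $block) =~ s{Georgia.State Senate}{Georgia State Senate}s;

# Without /s the same pattern matches nothing, because the dot stops at "\n".
(my $unchanged = $block) =~ s{Georgia.State Senate}{Georgia State Senate};

print $joined;
```

On the command line, the -0777 switch together with -p has the same slurping effect as undefining $/.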

Ing. Branislav Gerzo

2005-06-08, 8:57 am

Offer Kaye [OK], on Tuesday, June 7, 2005 at 17:04 (+0300) contributed
this to our collective wisdom:

OK> There's no need to escape metachars in the replacement part. Without
OK> modifiers (such as "e" or "x") the replacement part is treated as a
OK> simple double-quoted string (delimiter dependent).
OK> So the s/// can be written as:
OK> s|\./officers-gasenate\.html|http://www.legis.state.ga.us/cgi-bi...List=stsenatedl|

Why not use \Q in the first part?


--

How do you protect mail on web? I use http://www.2pu.net

[Us * "Not a problem."--Parker Lewis]


Xavier Noria

2005-06-08, 8:57 am

On Jun 8, 2005, at 8:44, Ing. Branislav Gerzo wrote:


> Offer Kaye [OK], on Tuesday, June 7, 2005 at 17:04 (+0300) contributed
> this to our collective wisdom:
>
> OK> There's no need to escape metachars in the replacement part. Without
> OK> modifiers (such as "e" or "x") the replacement part is treated as a
> OK> simple double-quoted string (delimiter dependent).
> OK> So the s/// can be written as:
> OK> s|\./officers-gasenate\.html|http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl|
>
> why not use \Q in first part ?
>


Well, you could.

At the shell prompt (which is where the OP was writing the
substitution) I prefer to avoid backslashes altogether, because
otherwise I need to double-check the backslash rules of that
particular shell. That's why I used "[.]".
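
Inside a script, where shell quoting is not a concern, the \Q suggestion might look like the sketch below. The variables $old, $new and $line are illustrative and hold the strings from the original post.

```perl
use strict;
use warnings;

my $old  = './officers-gasenate.html';
my $new  = 'http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl';
my $line = '<a class="item2" href="./officers-gasenate.html">Georgia';

# \Q...\E backslash-escapes the regex metacharacters in $old,
# so its dots match only literal dots.
(my $updated = $line) =~ s/\Q$old\E/$new/;

# quotemeta() is the function form of \Q; printing its result shows
# the pattern that the substitution actually uses.
print quotemeta($old), "\n";
print "$updated\n";
```

Because the slashes live inside the variables, the plain / delimiter works here without any escaping.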

-- fxn
