Home > Archive > PERL Beginners > June 2005 > find and replace large blocks
find and replace large blocks
| Cy Kurtz 2005-06-08, 3:57 am |
| Is it possible to use s/foo/bar in another way to allow replacement of
large blocks of text with spaces, quotes, and double quotes?
Is there a better way?
Thank you,
Cy Kurtz
| |
| Xavier Noria 2005-06-08, 3:57 am |
| On Jun 7, 2005, at 14:51, Cy Kurtz wrote:
> Is it possible to use s/foo/bar in another way to allow replacement of
> large blocks of text with spaces, quotes, and double quotes?
Would you please send an example of what you need to accomplish?
-- fxn
| |
| Offer Kaye 2005-06-08, 3:57 am |
| On 6/7/05, Cy Kurtz wrote:
> Is it possible to use s/foo/bar in another way to allow replacement of
> large blocks of text with spaces, quotes, and double quotes?
>
Yes.
> Is there a better way?
>
That depends on what exactly you want to do.
HTH,
--
Offer Kaye
| |
| Cy Kurtz 2005-06-08, 3:57 am |
| OK ... Remember you asked for it. I have at least a dozen files that I
want to update. I want to do this:
me@mymachine somedirectory]$ perl -pi~ -e 's/./officers-gasenate.html/http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl/' ./contactus.html
I was hoping to change this code:
State House</a> <a class="item2" href="./officers-gasenate.html">Georgia
State Senate</a></div>
to this:
State House</a> <a class="item2" href="http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl">Georgia
State Senate</a></div>
Of course it isn't working. I think it's because of all of those
forward slashes. I wonder if I'm trying to drive a nail with a coffee
cup.
On Tue, 2005-06-07 at 15:09 +0200, Xavier Noria wrote:
> On Jun 7, 2005, at 14:51, Cy Kurtz wrote:
>
>
> Would you please send an example of what you need to accomplish?
>
> -- fxn
>
>
| |
| Chris Devers 2005-06-08, 3:57 am |
| On Tue, 7 Jun 2005, Cy Kurtz wrote:
> OK ... Remember you asked for it.
Right. Because without sufficient context, it's impossible to give an
adequate answer to a wildly open-ended question. Make sense?
> I have at least a dozen files that I want to update. I want to do
> this:
>
> me@mymachine somedirectory]$ perl -pi~ -e 's/./officers-gasenate.html/http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl/' ./contactus.html
That won't work. This gets reduced to
s/./officers-gasenate.html/
Which matches a dot /./ -- which is a metacharacter meaning "matches
anything at all" -- and replaces it with /officers-gasenate.html/
In other words, it will turn this string --
abc
-- into this
officers-gasenate.htmlofficers-gasenate.htmlofficers-gasenate.html
-- which isn't at all what you meant :-)
Further, everything after that third forward-slash is ignored, and will
probably (read: definitely) produce an error.
I see two things that are worth changing here.
* you shouldn't be using forward-slashes as the regex delimiter
* you should be escaping metacharacters like the dot
Thus, the regex should be something like this:
s|\./officers-gasenate\.html|http://www\.legis\.state\.ga\.us/cg...List=stsenatedl|
That's a bit unwieldy; you can break it up for clarity --
my $old = qr{\./officers-gasenate\.html};
my $new = "http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl";
s/$old/$new/;
-- but for a command-line one-liner, that's probably overkill.
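A minimal sketch of the two-variable form as a standalone script. The sample line is taken from the thread; the variable names `$line` and `$updated` are illustrative. It assumes the pattern is built with `qr{}` rather than a double-quoted string, because inside double quotes `"\."` collapses to a plain `.` before the regex engine ever sees it.

```perl
use strict;
use warnings;

# Sample input line from the thread.
my $line = '<a class="item2" href="./officers-gasenate.html">Georgia';

# qr{} keeps the backslashes, so each dot is matched literally.
my $old = qr{\./officers-gasenate\.html};

# The replacement is an ordinary string; dots and "?" need no escaping.
my $new = 'http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl';

# Copy the line, then substitute in the copy.
(my $updated = $line) =~ s/$old/$new/;
print "$updated\n";
```

The slashes inside `$new` are harmless here because the delimiters of s/// are fixed when the code is parsed, before the variable is interpolated.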
Note though that it's standard to point out here that HTML is
notoriously difficult to get right with regular expressions. If all
you're doing is changing the href target of known anchor tags in a
limited set of files that you have control over, it's probably fine to
solve it this way, but if the HTML is at all complicated -- that is, if
it has any inconsistencies at all, broken tags, etc -- you're much
better off solving this kind of problem with a parser module from CPAN.
There's a lot of them to choose from, depending on your needs, but
almost any of them are a better choice than doing this kind of thing by
hand with regular expressions: it's easier, faster, and more robust.
Keep it in mind if this problem starts getting more complicated...
--
Chris Devers
| |
| Xavier Noria 2005-06-08, 3:57 am |
| On Jun 7, 2005, at 15:39, Cy Kurtz wrote:
> OK ... Remember you asked for it. I have at least a dozen files that I
> want to update. I want to do this:
>
> me@mymachine somedirectory]$ perl -pi~ -e 's/./officers-gasenate.html/http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl/' ./contactus.html
>
> I was hoping to change this code:
>
> State House</a> <a class="item2" href="./officers-gasenate.html">Georgia
> State Senate</a></div>
>
> to this:
>
> State House</a> <a class="item2" href="http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl">Georgia
> State Senate</a></div>
>
> Of course it isn't working. I think it's because of all of those
> forward slashes. I wonder if I'm trying to drive a nail with a coffee
> cup.
Excellent, thank you.
To address that kind of issue, Perl not only allows the traditional
escaping solution, but also allows changing the very delimiters of
s/// to avoid any escaping at all, thus enhancing readability.
The idea is that you choose a delimiter that is not found in either
part of the substitution. That is documented in perlop.
In your case, you could do for instance:
me@mymachine somedirectory]$ perl -pi~ -e 's{[.]/officers-gasenate[.]html}{http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl}' ./contactus.html
where the regexp is surrounded by a pair "{", "}", and so is the
replacement string. Note that, in addition, the dot has been put in a
class because it was being used as a metacharacter whereas a literal
dot was required.
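As a rough illustration of why the character class matters, the sketch below runs the brace-delimited substitution on the href from the thread, then compares a bare dot with `[.]` against a made-up near miss. The `$decoy` string and the variable names are hypothetical, chosen only for the demonstration.

```perl
use strict;
use warnings;

my $href = 'href="./officers-gasenate.html"';
my $url  = 'http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl';

# Brace delimiters: the slashes in the pattern and the URL need no escaping.
(my $fixed = $href) =~ s{[.]/officers-gasenate[.]html}{$url};
print "$fixed\n";

# Hypothetical near miss: "x" sits where the leading dot should be.
my $decoy  = 'href="x/officers-gasenate.html"';
my $loose  = $decoy =~ m{./officers-gasenate.html}     ? 1 : 0;  # bare dot matches "x"
my $strict = $decoy =~ m{[.]/officers-gasenate[.]html} ? 1 : 0;  # literal dot only
print "loose=$loose strict=$strict\n";   # prints "loose=1 strict=0"
```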
-- fxn
PS: Remember that munging HTML with regexps may be fragile unless you
control the HTML and know that a regexp approach is fine.
| |
| Cy Kurtz 2005-06-08, 3:57 am |
| Thanks for your help. I stumbled over a solution in perlop. Any
non-alphanumeric, non-whitespace character can be used in place of the
forward slash in s///. I'm using s%%% (those are percent signs) now
and things are going well. I haven't looked into the parser modules yet.
Thanks again,
Cy Kurtz
On Tue, 2005-06-07 at 09:50 -0400, Chris Devers wrote:
> On Tue, 7 Jun 2005, Cy Kurtz wrote:
>
>
> Right. Because without sufficient context, it's impossible to give an
> adequate answer to a wildly open-ended question. Make sense?
>
>
..
<snip>
..
>
> Note though that it's standard to point out here that HTML is
> notoriously difficult to get right with regular expressions. If all
> you're doing is changing the href target of known anchor tags in a
> limited set of files that you have control over, it's probably fine to
> solve it this way, but if the HTML is at all complicated -- that is, if
> it has any inconsistencies at all, broken tags, etc -- you're much
> better off solving this kind of problem with a parser module from CPAN.
>
> There's a lot of them to choose from, depending on your needs, but
> almost any of them are a better choice than doing this kind of thing by
> hand with regular expressions: it's easier, faster, and more robust.
>
> Keep it in mind if this problem starts getting more complicated...
>
>
>
> --
> Chris Devers
>
| |
| Offer Kaye 2005-06-08, 3:57 am |
| On 6/7/05, Chris Devers wrote:
>
> Which matches a dot /./ -- which is a metacharacter meaning "matches
> anything at all"
Not quite correct - a dot (".") matches "any single character", not
"anything at all", and even this rule has an exception - a dot will
not match a newline ("\n") unless you use the "s" modifier.
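A small sketch of the newline exception, using a made-up two-line string (`$text` and the result variables are illustrative names):

```perl
use strict;
use warnings;

# "a" and "b" separated by a single newline.
my $text = "a\nb";

my $without_s = $text =~ /a.b/  ? 1 : 0;  # dot will not match "\n"
my $with_s    = $text =~ /a.b/s ? 1 : 0;  # with /s, dot matches "\n" too
print "without /s: $without_s, with /s: $with_s\n";   # without /s: 0, with /s: 1
```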
>
> In other words, it will turn this string --
>
> abc
>
> -- into this
>
> officers-gasenate.htmlofficers-gasenate.htmlofficers-gasenate.html
>
No, it will turn "abc" into:
officers-gasenate.htmlbc
Unless you use the "g" modifier.
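The difference is easy to check with a short sketch; the variable names are illustrative and the pattern is the reduced one quoted above.

```perl
use strict;
use warnings;

# Without /g only the first character is replaced.
my $once = 'abc';
$once =~ s/./officers-gasenate.html/;
print "$once\n";    # officers-gasenate.htmlbc

# With /g every character is replaced; s///g returns the number of substitutions.
my $all   = 'abc';
my $count = ($all =~ s/./officers-gasenate.html/g);
print "$count: $all\n";
```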
>
> Thus, the regex should be something like this:
> s|\./officers-gasenate\.html|http://www\.legis\.state\.ga\.us/cgi-bin/peo_list\.pl\?List=stsenatedl|
>
There's no need to escape metachars in the replacement part. Without
modifiers (such as "e" or "x") the replacement part is treated as a
simple double-quoted string (delimiter dependent).
So the s/// can be written as:
s|\./officers-gasenate\.html|http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl|
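As a quick check, the sketch below applies both the escaped and the unescaped replacement to the same sample string and compares the results. The variable names are illustrative; the two substitutions are the ones quoted in this message.

```perl
use strict;
use warnings;

my $src = './officers-gasenate.html';

# Replacement with backslash-escaped dots and question mark.
(my $escaped = $src) =~ s|\./officers-gasenate\.html|http://www\.legis\.state\.ga\.us/cgi-bin/peo_list\.pl\?List=stsenatedl|;

# Replacement written plainly.
(my $plain = $src) =~ s|\./officers-gasenate\.html|http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl|;

# In the replacement, "\." is just "." and "\?" is just "?".
print $escaped eq $plain ? "identical\n" : "different\n";   # identical
```

The backslashes in the replacement do no harm, but they add noise without changing the output.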
Cheers,
--
Offer Kaye
| |
| Manav Mathur 2005-06-08, 3:57 am |
|
Yes. As the requirements you stated are not very specific, I'd recommend
looking at the entries for
$/ (in perldoc perlvar)
the /s modifier for the s/// operator.
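A minimal sketch of how those two pieces might fit together for a block that spans lines. It assumes the whole file fits in memory, and it reads from an in-memory filehandle so that it runs without touching any files. The HTML fragment is the one from this thread; the idea of rewriting the whole anchor onto one line is only an illustration, not something the original poster asked for.

```perl
use strict;
use warnings;

# Stand-in for the contents of contactus.html.
my $html = <<'END';
State House</a> <a class="item2" href="./officers-gasenate.html">Georgia
State Senate</a></div>
END

my $url = 'http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl';

# With $/ undefined, a single read returns the entire file as one string.
open my $fh, '<', \$html or die "cannot open in-memory file: $!";
my $content = do { local $/; <$fh> };
close $fh;

# /s lets .*? cross the newline between "Georgia" and "State Senate".
my $count = ($content =~ s{<a class="item2" href="[.]/officers-gasenate[.]html">.*?</a>}{<a class="item2" href="$url">Georgia State Senate</a>}s);

print "replaced $count block(s)\n";
print $content;
```

On the command line, the `-0777` switch puts `perl -p` into the same whole-file mode, so a one-liner can use the same pattern with /s.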
|-----Original Message-----
|From: Cy Kurtz [mailto:ckurtz11@comcast.net]
|Sent: Tuesday, June 07, 2005 6:21 PM
|To: beginners@perl.org
|Subject: find and replace large blocks
|
|
|Is it possible to use s/foo/bar in another way to allow replacement of
|large blocks of text with spaces, quotes, and double quotes?
|
|Is there a better way?
|
|Thank you,
|
|Cy Kurtz
| |
| Ing. Branislav Gerzo 2005-06-08, 8:57 am |
| Offer Kaye [OK], on Tuesday, June 7, 2005 at 17:04 (+0300) contributed
this to our collective wisdom:
OK> There's no need to escape metachars in the replacement part. Without
OK> modifiers (such as "e" or "x") the replacement part is treated as a
OK> simple double-quoted string (delimiter dependent).
OK> So the s/// can be written as:
OK> s|\./officers-gasenate\.html|http://www.legis.state.ga.us/cgi-bi...List=stsenatedl|
why not use \Q in first part ?
--
How do you protect mail on web? I use http://www.2pu.net
[Us * "Not a problem."--Parker Lewis]
| |
| Xavier Noria 2005-06-08, 8:57 am |
| On Jun 8, 2005, at 8:44, Ing. Branislav Gerzo wrote:
> Offer Kaye [OK], on Tuesday, June 7, 2005 at 17:04 (+0300) contributed
> this to our collective wisdom:
>
> OK> There's no need to escape metachars in the replacement part. Without
> OK> modifiers (such as "e" or "x") the replacement part is treated as a
> OK> simple double-quoted string (delimiter dependent).
> OK> So the s/// can be written as:
> OK> s|\./officers-gasenate\.html|http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl|
>
> why not use \Q in first part ?
>
Well, you could.
At the shell prompt (which is where the OP was writing the
substitution) I prefer to avoid backslashes altogether, because
otherwise I need to double-check the backslash rules of the shell
itself. That's why I used "[.]".
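For comparison, a sketch of the \Q approach inside a script, where shell quoting is not a concern. The variable names are illustrative; `quotemeta` is the built-in function that \Q applies under the hood.

```perl
use strict;
use warnings;

my $literal = './officers-gasenate.html';
my $url     = 'http://www.legis.state.ga.us/cgi-bin/peo_list.pl?List=stsenatedl';
my $line    = 'href="./officers-gasenate.html"';

# \Q...\E escapes every regex metacharacter in the interpolated text.
(my $out = $line) =~ s{\Q$literal\E}{$url};
print "$out\n";

# quotemeta shows what \Q produces for this string.
print quotemeta($literal), "\n";   # \.\/officers\-gasenate\.html
```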
-- fxn
|