PERL Beginners mailing list archive, October 2005

Subject: search and replace in HTML text
Matthias Leopold

2005-10-28, 7:56 am

Hi,

I want to search for a pattern in HTML text and replace only those
occurrences that are not enclosed inside <> (an HTML tag, not the Perl operator).
I was thinking of something like

$string =~ s/$pattern/test for <> or similar; else replace/ge;
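One way to make the s///ge idea work is to let the regex match either a complete tag or the pattern, and to put tags back unchanged. This is a minimal sketch. It assumes a tag is simply everything from `<` to the next `>`, so there are no `>` characters inside attribute values and no comments or `<script>` blocks. `$pattern` and `$replacement` are hypothetical placeholders:

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration only.
my $string      = '<a href="foo.html">foo</a> and more foo';
my $pattern     = qr/foo/;
my $replacement = 'bar';

# At each position the tag alternative is tried first, so an occurrence of
# the pattern inside <...> is swallowed by the tag match and written back
# unchanged ($1 defined); only matches outside tags get the replacement.
$string =~ s{(<[^>]*>)|$pattern}{defined $1 ? $1 : $replacement}ge;

print "$string\n";    # <a href="foo.html">bar</a> and more bar
```

This shortcut breaks on input such as `<img alt="a > b">`, where the tag match stops at the first `>`.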

Another way could be to split $string into an array in which HTML tags
and the other text are separate elements, but I don't know how to do this
either.

It sounds like an easy task, but I haven't managed to accomplish it yet.
Thanks for your help,
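The splitting approach can be sketched with `split` and a capturing group, which keeps the separators (here, the tags) in the returned list. This is a minimal sketch under the same simplifying assumption that a tag runs from `<` to the next `>`. The variable names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration only.
my $string      = '<p class="foo">foo bar</p>';
my $pattern     = qr/foo/;
my $replacement = 'baz';

# A capturing group in the split pattern keeps each matched tag in the list,
# so @parts alternates: text, tag, text, tag, ... (text chunks may be empty).
my @parts = split /(<[^>]*>)/, $string;

# Even indices hold text and odd indices hold tags, so rewrite only the text.
for my $i (grep { $_ % 2 == 0 } 0 .. $#parts) {
    $parts[$i] =~ s/$pattern/$replacement/g;
}

my $result = join '', @parts;
print "$result\n";    # <p class="foo">baz bar</p>
```

With exactly one capturing group, every match contributes one tag to the list, so the even/odd layout holds even when the string starts with a tag.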

matthias

Jeff 'japhy' Pinyan

2005-10-28, 7:56 am

On Oct 28, Matthias Leopold said:

> I want to search for a pattern in HTML text and replace only those occurrences
> that are not enclosed inside <> (an HTML tag, not the Perl operator).


It's easiest to use a real HTML parser. Otherwise, you'll probably get
false positives and what-not.

> $string =~ s/$pattern/test for <> or similar; else replace/ge;


> It sounds like an easy task, but I haven't managed to accomplish it yet.


Parsing HTML (which really is what you're looking to do) always sounds
easy until you see how it's done correctly. Then you feel better off
using a module to do it for you. (Guilt-free, I might add.)

I'd suggest going to search.cpan.org and looking for HTML::, which should
yield several matches. I think HTML::TokeParser (or
HTML::TokeParser::Simple) would be your best bet here.
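Here is a minimal sketch of the HTML::TokeParser route. It assumes the HTML-Parser distribution, which provides HTML::TokeParser, is installed from CPAN. The sample document, `$pattern` and `$replacement` are hypothetical. Only ordinary text tokens are rewritten. Tags, comments, declarations and the contents of `<script>`/`<style>` are copied through using each token's original source text:

```perl
use strict;
use warnings;
use HTML::TokeParser;    # part of the HTML-Parser distribution on CPAN

# Hypothetical inputs for illustration only.
my $html        = '<p title="foo">foo <b>foo</b></p><!-- foo -->';
my $pattern     = qr/foo/;
my $replacement = 'bar';

# Passing a reference to a scalar makes the parser read the document from it.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser: $!";

my $out = '';
while (my $token = $parser->get_token) {
    my $type = $token->[0];
    if ($type eq 'T' && !$token->[2]) {
        # Plain text token: [ 'T', $text, $is_data ]; rewrite a copy.
        (my $text = $token->[1]) =~ s/$pattern/$replacement/g;
        $out .= $text;
    }
    elsif ($type eq 'S')  { $out .= $token->[4] }    # start tag: source text at index 4
    elsif ($type eq 'E')  { $out .= $token->[2] }    # end tag: source text at index 2
    elsif ($type eq 'PI') { $out .= $token->[2] }    # processing instruction: index 2
    else                  { $out .= $token->[1] }    # script/style text, comment, declaration
}

print "$out\n";
```

Because the parser decides where tags and comments begin and end, the `title="foo"` attribute and the `foo` inside the comment are left alone without any hand-written tag regex.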

--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://www.perlmonks.org/ % have long ago been overpaid?
http://princeton.pm.org/ % -- Meister Eckhart

Copyright 2008 codecomments.com