How to clear all html tag in document?
| How can I remove all HTML tags from a line (or a whole document)?
All tags start with "<" and end with ">", e.g. <table>, </table>,
<div align="left">, <td align="left" bgcolor="#000099"> ...
I wrote a program that works character by character and checks whether a
tag starts with "<" and ends with ">".
Please help me. I know that Perl programmers do this in an easier way. How?
Thanks
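For reference, the character-by-character approach described above might look roughly like the sketch below. It is not the poster's actual program: the helper name strip_tags_naive is hypothetical, and the quote tracking is an added assumption so that a ">" inside an attribute value does not end the tag early. It uses core Perl only.

```perl
use strict;
use warnings;

# Naive scanner: drop everything between '<' and the matching '>'.
# strip_tags_naive is a hypothetical helper name, not from any module.
sub strip_tags_naive {
    my ($html) = @_;
    my $text   = '';
    my $in_tag = 0;     # true while we are between '<' and '>'
    my $quote  = '';    # the open quote character inside a tag, if any

    for my $ch ( split //, $html ) {
        if ($in_tag) {
            if ($quote) {
                $quote = '' if $ch eq $quote;    # closing quote ends the value
            }
            elsif ( $ch eq '"' || $ch eq "'" ) {
                $quote = $ch;                    # opening quote inside a tag
            }
            elsif ( $ch eq '>' ) {
                $in_tag = 0;                     # end of tag
            }
        }
        elsif ( $ch eq '<' ) {
            $in_tag = 1;                         # start of tag
        }
        else {
            $text .= $ch;                        # ordinary text is kept
        }
    }
    return $text;
}

print strip_tags_naive('<td align="left" bgcolor="#000099">cell</td>'), "\n";
# prints "cell"
```

A scanner this simple still mishandles HTML comments that contain ">", the contents of script and style elements, character entities such as &amp;, and a bare "<" in ordinary text.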
| |
| Jürgen Exner 2005-05-28, 8:56 am |
| max wrote:
> How can I remove all HTML tags from a line (or a whole document)?
Is there anything wrong with the answer in the FAQ? Run
'perldoc -q "remove HTML"' to see
"How do I remove HTML from a string?"
> Please help me. I know that Perl programmers do this in an easier way.
> How?
Trivial. They just follow the suggestions in the FAQ.
jue
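As an illustration of why a parser is usually preferred for this task, here is a minimal regex-based sketch. It is not a quotation of the FAQ answer; the helper name strip_tags_regex and the exact pattern are assumptions made for this example. It uses core Perl only and treats a quoted attribute value as a unit, so a ">" inside quotes does not end the tag.

```perl
use strict;
use warnings;

# strip_tags_regex is a hypothetical helper name used only for this sketch.
sub strip_tags_regex {
    my ($html) = @_;
    # A tag is '<', then any mix of single non-quote, non-'>' characters
    # or complete quoted strings, then '>'. Negated character classes
    # also match newlines, so tags may span several lines.
    $html =~ s/<(?:[^>'"]|"[^"]*"|'[^']*')*>//g;
    return $html;
}

print strip_tags_regex(qq{<div align="left">a</div>\n<td title='x>y'>b</td>}), "\n";
# prints "a", a newline, then "b"
```

Even this pattern gets HTML comments containing ">" wrong, keeps the contents of script and style elements, and does not decode entities, which is why a dedicated module is the safer route for real documents.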
| |
| Brian McCauley 2005-05-28, 8:56 am |
| max wrote:
> How can I remove all HTML tags from a line (or a whole document)?
See the FAQ.
| |
| Fabian Pilkowski 2005-05-28, 3:57 pm |
| * max wrote:
> How can I remove all HTML tags from a line (or a whole document)?
> All tags start with "<" and end with ">", e.g. <table>, </table>,
> <div align="left">, <td align="left" bgcolor="#000099"> ...
> I wrote a program that works character by character and checks whether a
> tag starts with "<" and ends with ">".
> Please help me. I know that Perl programmers do this in an easier way. How?
I suggest using the module HTML::Strip; it does exactly what you
want. Have a look at
http://search.cpan.org/~kilinrax/HT...p-1.04/Strip.pm
With that, all you have to do is something like:
use strict;
use warnings;
use HTML::Strip;    # CPAN module, not part of core Perl
my $html = "<div><b>foo</b> bar</div> baz";
my $text = HTML::Strip->new->parse( $html );
print $text;
__END__
foo bar baz
regards,
fabian