HTML::TokeParser, get HTML
Ing. Branislav Gerzo 2005-11-22, 7:56 am
Hello all,
I'm using this great module for parsing HTML files, but I've run into
trouble: I need to get the clean, unchanged HTML code. A common example
of using this module is (snippet):
    while (my $tag = $parser->get_tag('b')) {
        my $text = $parser->get_text();
        last if $text =~ /^(this and that|or that and this)/i;
    }
    my $text = $parser->get_text('b', 'b');
In $text I have the text, but according to the docs:
...Any entities will be converted to their corresponding character...
And that's the problem. I need to get the unchanged HTML (no entity
conversion).
Could anyone please help with this?
Thanks a lot, guys.
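To make the problem concrete, here is a minimal sketch of the decoding behaviour described above. The sample HTML string is an assumption made up for illustration; it is not from the original post.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document, used only for illustration.
my $html = '<p><b>Fish &amp; chips</b></p>';

# A scalar reference is parsed as the document itself, not as a file name.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

# Skip ahead to the first <b> start tag, then read the text that follows it.
$parser->get_tag('b');
my $text = $parser->get_text();

# get_text() has already decoded the entity: &amp; has become &.
print "$text\n";    # prints "Fish & chips"
```

The source contained `&amp;`, but `$text` holds a plain `&`, which is exactly the conversion the question wants to avoid.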
Ing. Branislav Gerzo 2005-11-22, 7:56 am
Ing. Branislav Gerzo [IBG], on Tuesday, November 22, 2005 at 13:42
(+0100) thinks about :
IBG> while(my $tag = $parser->get_tag('b')) {
IBG> my $text = $parser->get_text();
IBG> last if $text =~ /^(this and that|or that and this)/i;
IBG> }
IBG> my $text = $parser->get_text('b', 'b');
IBG> in $text I have text, but it is according to docs:
IBG> ...Any entities will be converted to their corresponding character....
After some experiments I came to this solution:
    while (my $token = $parser->get_token) {
        last if $token->[0] eq 'S' and $token->[1] eq 'b';
        print $token->[1] if $token->[0] eq 'T';
    }
I hope it helps someone :)
--
How do you protect mail on web? I use http://www.2pu.net
[Klingon Talkshows include "O'Prah" and "D'nahue"]
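The get_token approach in this thread works because text tokens carry the source text undecoded. As a rough sketch of a variation on it (not the poster's exact code), the loop below also keeps the original source of any nested tags and stops at the closing </b> rather than at the next <b>. The sample HTML and the raw_source helper are assumptions introduced for illustration, and nested <b> elements are not handled.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the original source text stored in a token.
# Token layouts per the HTML::TokeParser docs:
#   ["S", $tag, $attr, $attrseq, $text]   ["E", $tag, $text]
#   ["T", $text, $is_data]  ["C", $text]  ["D", $text]  ["PI", $token0, $text]
sub raw_source {
    my ($token) = @_;
    my $type = $token->[0];
    return $token->[4] if $type eq 'S';
    return $token->[2] if $type eq 'E' or $type eq 'PI';
    return $token->[1];    # T, C and D tokens
}

# Hypothetical sample document, used only for illustration.
my $html = '<p><b>Fish &amp; <i>chips</i></b> today</p>';

my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my $raw = '';
if ($parser->get_tag('b')) {
    # Collect everything up to the matching end tag, exactly as written.
    while (my $token = $parser->get_token) {
        last if $token->[0] eq 'E' and $token->[1] eq 'b';
        $raw .= raw_source($token);
    }
}

print "$raw\n";    # prints "Fish &amp; <i>chips</i>"
```

Here the entity stays as `&amp;` and the inner <i> markup is preserved, so `$raw` holds the unchanged HTML between the two tags.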