











 

HTML::TokeParser, get HTML
Ing. Branislav Gerzo

2005-11-22, 7:56 am

Hello all,

I'm using this great module for parsing HTML files, but I've run into
trouble: I need to get the clean, unchanged HTML code. A common example of
using this module is (snippet):

while (my $tag = $parser->get_tag('b')) {
    my $text = $parser->get_text();
    last if $text =~ /^(this and that|or that and this)/i;
}
my $text = $parser->get_text('b', 'b');

In $text I have the text, but according to the docs:
...Any entities will be converted to their corresponding character...

And that's the problem. I need to get the unchanged HTML (no entity
conversion).
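
To make the problem concrete, here is a minimal sketch of the decoding behaviour described above. The sample markup in $html is made up for illustration; get_tag and get_text are the module's documented methods.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input, not taken from the real pages being parsed.
my $html   = '<b>Tom &amp; Jerry</b>';
my $parser = HTML::TokeParser->new(\$html) or die "Cannot create parser: $!";

$parser->get_tag('b');                 # skip ahead to the <b> start tag
my $text = $parser->get_text('/b');    # read text up to the closing </b>

print "$text\n";                       # prints "Tom & Jerry": &amp; was decoded
```

The source had &amp;, but $text contains a plain "&", so the original markup cannot be recovered from get_text alone.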

Could anyone please help with this?

Thanks a lot guys.



Ing. Branislav Gerzo

2005-11-22, 7:56 am

Ing. Branislav Gerzo [IBG], on Tuesday, November 22, 2005 at 13:42
(+0100) thinks about :

IBG> while(my $tag = $parser->get_tag('b')) {
IBG> my $text = $parser->get_text();
IBG> last if $text =~ /^(this and that|or that and this)/i;
IBG> }
IBG> my $text = $parser->get_text('b', 'b');

IBG> in $text I have text, but it is according to docs:
IBG> ...Any entities will be converted to their corresponding character....

After some experiments I came to this solution:

while (my $token = $parser->get_token) {
    # stop at the next <b> start tag
    last if $token->[0] eq 'S' and $token->[1] eq 'b';
    # text tokens carry the original text, entities left as they were
    print $token->[1] if $token->[0] eq 'T';
}

I hope it helps someone:)
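
The loop above prints only the text tokens. If the tags between the two points are wanted unchanged as well, a possible extension is sketched below. It relies on the token layouts documented for HTML::TokeParser, where each token array also carries the original source text. The helper name raw_until_tag and the sample markup are made up for this example.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: collect the original, undecoded source of every
# token up to (but not including) the next start tag named $stop_tag.
sub raw_until_tag {
    my ($parser, $stop_tag) = @_;
    my $raw = '';
    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        last if $type eq 'S' and $token->[1] eq $stop_tag;
        # Index of the original text per the documented token layouts:
        #   ["S", $tag, $attr, $attrseq, $text]   ["E", $tag, $text]
        #   ["T", $text, $is_data]                ["C", $text]
        #   ["D", $text]                          ["PI", $token0, $text]
        $raw .= $type eq 'S'  ? $token->[4]
              : $type eq 'E'  ? $token->[2]
              : $type eq 'PI' ? $token->[2]
              :                 $token->[1];   # T, C and D tokens
    }
    return $raw;
}

# Sample input invented for illustration.
my $html   = 'x <i>a &lt; b</i> &amp; c<b>stop</b>';
my $parser = HTML::TokeParser->new(\$html) or die "Cannot create parser: $!";
print raw_until_tag($parser, 'b'), "\n";   # prints: x <i>a &lt; b</i> &amp; c
```

Tag names come back lowercased in $token->[1], so $stop_tag should be given in lowercase.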

--

How do you protect mail on web? I use http://www.2pu.net

[Klingon Talkshows include "O'Prah" and "D'nahue"]

