
Author Strange behaviour when parsing a XML file
Francesco Moi

2005-07-26, 5:02 pm

Hi.

I want to parse these XML contents:
http://news.search.yahoo.com/news/rss?va=linux (this is an RSS file)

I tried with:
----------------
use LWP::Simple qw($ua get head);
use HTML::TokeParser;
use LWP::UserAgent;

my $Url = "http://news.search.yahoo.com/news/rss?va=linux";
my $content = get($Url);

my $parser = HTML::TokeParser->new(\$content);
my ($title, $link, $description);

while (my $token = $parser->get_token) {
    my $tag_type = shift @{ $token };
    next unless $tag_type eq 'S';    # only start tags are of interest

    my ($tag, $attr, $attrseq, $rawtxt) = @{ $token };

    if ($tag eq 'title') {
        $title = $parser->get_trimmed_text("/title");
    }
    if ($tag eq 'link') {
        $link = $parser->get_trimmed_text("/link");
    }
    if ($tag eq 'description') {
        $description = $parser->get_trimmed_text("/description");
        print "$title - $link - $description\n\n";
    }
}
------------

But I get this information:
---------
<![CDATA[Foo_Title]]> - Foo_Url -
--------

"<![CDATA" appears (no idea about its meaning) and no data about
description.

However if I substitute
"http://news.search.yahoo.com/news/rss?va=linux" with
"http://www.boingboing.net/index.rdf", it works OK.

What am I doing wrong? Regards.

Tad McClellan

2005-07-26, 5:02 pm

Francesco Moi <franscescomoi@usa.com> wrote:

> "<![CDATA" appears (no idea about its meaning)



http://www.google.com/search?hl=en&...2marked+section


> What am I doing wrong?



Nothing that I can see.

Did you want the code to do something different from what it is doing?


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Francesco Moi

2005-07-27, 5:05 pm

Hi Tad.

Yes, I would like to get 'Foo_Title' instead of
'<![CDATA[Foo_Title]]>', and 'Foo_Description' instead of nothing.

Regards.
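
For the title part of the request, one possible workaround is to unwrap the CDATA marked section after the text has been extracted. The following is only a minimal sketch using core Perl: the helper name strip_cdata is hypothetical (it does not come from the thread), and the input string is the placeholder output shown above. It does not explain or fix the empty description, and an XML-aware parser would likely be a more robust choice than a regex for real feeds.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): unwrap CDATA marked sections,
# keeping only the character data inside them.
sub strip_cdata {
    my ($text) = @_;
    return $text unless defined $text;
    # Non-greedy match, so several CDATA sections in one string
    # are each unwrapped separately; /s lets "." span newlines.
    $text =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gs;
    return $text;
}

# Placeholder value taken from the output quoted in the thread.
my $title = strip_cdata('<![CDATA[Foo_Title]]>');
print "$title\n";
```

In the original loop this could be applied to the value returned by get_trimmed_text, for example $title = strip_cdata($parser->get_trimmed_text("/title")).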
