Strange behaviour when parsing a XML file
| Francesco Moi 2005-07-26, 5:02 pm |
| Hi.
I want to parse this XML content:
http://news.search.yahoo.com/news/rss?va=linux (this is an RSS feed)
I tried the following:
----------------
use LWP::Simple qw($ua get head);
use HTML::TokeParser;
use LWP::UserAgent;

my $Url = "http://news.search.yahoo.com/news/rss?va=linux";
my $content = get($Url);
my $parser = HTML::TokeParser->new(\$content);
my ($title, $link, $description);

while (my $token = $parser->get_token) {
    my $tag_type = shift @{ $token };
    # 'S' marks a start tag
    if ($tag_type eq 'S') {
        my ($tag, $attr, $attrseq, $rawtxt) = @{ $token };
        if ($tag eq 'title') { $title = $parser->get_trimmed_text("/title"); }
        if ($tag eq 'link')  { $link = $parser->get_trimmed_text("/link"); }
        if ($tag eq 'description') {
            $description = $parser->get_trimmed_text("/description");
            print "$title - $link - $description\n\n";
        }
    }
}
------------
But I get this output:
---------
<![CDATA[Foo_Title]]> - Foo_Url -
--------
"<![CDATA" appears (I have no idea what it means), and the description
is empty.
However, if I replace
"http://news.search.yahoo.com/news/rss?va=linux" with
"http://www.boingboing.net/index.rdf", it works fine.
What am I doing wrong? Regards.
| Tad McClellan 2005-07-26, 5:02 pm |
| Francesco Moi <franscescomoi@usa.com> wrote:
> "<![CDATA" appears (no idea about its meaning)
http://www.google.com/search?hl=en&...2marked+section
> What am I doing wrong?
Nothing that I can see.
Did you want the code to do something different from what it is doing?
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
| Francesco Moi 2005-07-27, 5:05 pm |
| Hi Tad.
Yes, I would like to get 'Foo_Title' instead of
'<![CDATA[Foo_Title]]>', and 'Foo_Description' instead of nothing.
Regards.
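
One possible way to get the bare text, offered as a sketch rather than a tested fix: a CDATA section ("<![CDATA[" ... "]]>") is an XML marked section whose content is literal character data, so the wrapper could be removed before the content is handed to HTML::TokeParser. The helper name unwrap_cdata and the sample item are made up for illustration and do not come from the thread. Whether this also brings back the missing description on the real feed is an assumption that has not been checked.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace each CDATA section
# with its literal content, escaping the characters that would otherwise
# be read as markup once the wrapper is gone.
sub unwrap_cdata {
    my ($xml) = @_;
    $xml =~ s{<!\[CDATA\[(.*?)\]\]>}{
        my $text = $1;
        $text =~ s/&/&amp;/g;
        $text =~ s/</&lt;/g;
        $text =~ s/>/&gt;/g;
        $text;
    }gse;
    return $xml;
}

# Made-up sample item standing in for the real feed.
my $sample = '<item><title><![CDATA[Foo_Title]]></title>'
           . '<description><![CDATA[<b>Foo</b> & bar]]></description></item>';

print unwrap_cdata($sample), "\n";
```

In the script above this would be used as "my $content = unwrap_cdata(get($Url));". The escaped characters should come back as plain text, because get_text and get_trimmed_text decode entities. A dedicated XML or RSS parser (for example XML::RSS from CPAN) handles CDATA sections itself and is probably the more robust choice for feeds.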