Re: Parsing large XML file - Revisited
| Mike Blezien 2007-07-21, 7:58 am |
| Rob,
----- Original Message -----
From: "Rob Dixon" <rob.dixon@350.com>
To: "Perl List" <beginners@perl.org>
Cc: "Mike Blezien" <mickalo@frontiernet.net>
Sent: Sunday, July 15, 2007 7:49 PM
Subject: Re: Parsing large XML file
> Mike Blezien wrote:
>
> Hi Mike
>
> Your application of XML::Twig seems exactly right. I'm not sure what it is you
> don't understand, but if you use this as your 'get_products' subroutine I hope
> it answers some questions. All it does is print the title of the product and
> the title of all the tracks in that product. Post again if you have any trouble
> understanding what I've written.
>
> sub get_products {
>
>     my $product = $_;
>
>     my $product_title = $product->first_child('title');
>     print $product_title->trimmed_text, "\n";
>
>     my $tracks = $product->first_child('tracks');
>     return unless $tracks;
>
>     foreach my $track ($tracks->children('track')) {
>         my $track_title = $track->first_child('title');
>         print ' ', $track_title->trimmed_text, "\n";
>     }
>
>     print "\n";
> }
We've run a few tests and everything seems to be working as expected, but there is
one little problem I haven't been able to figure out: we keep getting this error
(code snippet below)
----
Can't call method "first_child_text" on an undefined value
at .. /sample.cgi line 56 which is this line "my $tracknums =
$tracks->first_child_text('number_of_tracks');"
----
A value for $tracknums is returned and all other values are presented
after it parses the XML file. I haven't been able to figure out why I keep getting
this error.
############################################################################
my $twig = new XML::Twig(twig_handlers => { product => \&get_products });
$twig->parsefile("$xmlfile");
$twig->purge();
############################################################################
sub get_products {
    my($t,$elt) = @_;
    my($track_title,$trackno,$setno,$soundtype,$codec,$file);
    # process each product loop.
    my $article_number = $elt->first_child_text('article_number');
    my $dist_number = $elt->first_child_text('distributor_number');
    my $dist_name = $elt->first_child_text('distributor_name');
    my $artist = $elt->first_child_text('artist');
    my $ean_upc = $elt->first_child_text('ean_upc');
    my $set_total = $elt->first_child_text('set_total');
    my $tracks = $elt->first_child('tracks');
    # LINE 56 here
    my $tracknums = $tracks->first_child_text('number_of_tracks');
    return unless $tracks;
    for my $track ($tracks->children('track'))
    {
        $track_title = $track->first_child_text('title');
        $trackno = $track->first_child_text('trackno');
        $setno = $track->first_child_text('setno');
        for my $sound ($track->children('sound'))
        {
            $soundtype = $sound->first_child_text('sound_type');
            $codec = $sound->first_child_text('codec');
            $file = $sound->first_child_text('file');
        }
    } # close for $track loop
    # free up memory
    $t->purge();
}
Mike
| |
| Rob Dixon 2007-07-21, 6:59 pm |
| Mike Blezien wrote:
>
> Rob Dixon wrote:
>
[snip old code]
>
> We've run a few tests and everything seems to be working as expected,
> but there is one little problem I haven't been able to figure out: we keep
> getting this error (code snippet below)
> ----
> Can't call method "first_child_text" on an undefined value
> at .. /sample.cgi line 56 which is this line "my $tracknums =
> $tracks->first_child_text('number_of_tracks');"
> ----
> A value for $tracknums is returned and all other values are
> presented after it parses the XML file. I haven't been able to figure out
> why I keep getting this error.
>
> ############################################################################
>
> my $twig = new XML::Twig(twig_handlers => { product => \&get_products });
> $twig->parsefile("$xmlfile");
> $twig->purge();
> ############################################################################
>
> sub get_products {
> my($t,$elt) = @_;
> my($track_title,$trackno,$setno,$soundtype,$codec,$file);
>
> # process each product loop.
> my $article_number = $elt->first_child_text('article_number');
> my $dist_number = $elt->first_child_text('distributor_number');
> my $dist_name = $elt->first_child_text('distributor_name');
> my $artist = $elt->first_child_text('artist');
> my $ean_upc = $elt->first_child_text('ean_upc');
> my $set_total = $elt->first_child_text('set_total');
>
> my $tracks = $elt->first_child('tracks');
> # LINE 56 here
> my $tracknums = $tracks->first_child_text('number_of_tracks');
>
> return unless $tracks;
>
> for my $track ($tracks->children('track'))
> {
> $track_title = $track->first_child_text('title');
> $trackno = $track->first_child_text('trackno');
> $setno = $track->first_child_text('setno');
>
> for my $sound ($track->children('sound'))
> {
> $soundtype = $sound->first_child_text('sound_type');
> $codec = $sound->first_child_text('codec');
> $file = $sound->first_child_text('file');
> }
>
> } # close for $track loop
> # free up memory
> $t->purge();
> }
Hello Mike

First of all, your call

    $twig->parsefile("$xmlfile");

should properly be

    $twig->parsefile($xmlfile);

as there is no point in forcing Perl to interpolate a string when the result
is simply the string itself.
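
As a small illustration of the point about quoting: putting a variable in double quotes always produces a plain string. For a filename that only costs a needless copy, but for anything that is not a string, such as a reference, it changes the value. This is only a sketch; the file names and variable names are made up.

```perl
use strict;
use warnings;

# Hypothetical file name, for illustration only.
my $xmlfile = 'catalog.xml';

# For a plain string the quoted and unquoted forms compare equal,
# so the quotes achieve nothing.
my $same = ("$xmlfile" eq $xmlfile) ? 1 : 0;

# For a reference the quotes do change the value: the reference is
# stringified and can no longer be used as a reference.
my $list   = [ 'a.xml', 'b.xml' ];
my $quoted = "$list";

print "plain string unchanged by quoting: ", ($same ? 'yes' : 'no'), "\n";
print "quoted copy is still a reference:  ", (ref($quoted) ? 'yes' : 'no'), "\n";
print "original is still a reference:     ", (ref($list) ? 'yes' : 'no'), "\n";
```

Leaving the quotes off is therefore a habit that is never wrong, whereas adding them is sometimes harmful.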

Your code works fine on the sample 000001.xml file that you posted. What must
be happening is that there is a product in your live data that has no <tracks>
element. The line

    return unless $tracks;

is meant to protect against this, but you have used the value of $tracks before
the check. If you change your code to:

    my $tracks = $elt->first_child('tracks');
    return unless $tracks;

    my $numtracks = $tracks->first_child_text('number_of_tracks');

    for my $track ($tracks->children('track')) {
      :
    }

then your error should go away, although you may want to do more than just
ignore any products without a <tracks> tag.
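
A self-contained sketch of the check-before-use ordering, run against made-up sample data in which the second product has no <tracks> element. The XML, the @seen array and the message texts are assumptions for illustration; only the XML::Twig calls come from the code in this thread, plus parse() to read from a string instead of a file.

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up sample data: the second product has no <tracks> element.
my $xml = <<'XML';
<catalog>
  <product>
    <artist>Artist One</artist>
    <tracks>
      <number_of_tracks>2</number_of_tracks>
      <track><title>First</title></track>
      <track><title>Second</title></track>
    </tracks>
  </product>
  <product>
    <artist>Artist Two</artist>
  </product>
</catalog>
XML

my @seen;    # one message per product, in document order

sub get_products {
    my ($t, $elt) = @_;

    my $artist = $elt->first_child_text('artist');
    my $tracks = $elt->first_child('tracks');

    # Check before use: $tracks is undefined when the product
    # has no <tracks> child.
    unless ($tracks) {
        push @seen, "$artist: no <tracks> element";
        $t->purge;
        return;
    }

    my $numtracks = $tracks->first_child_text('number_of_tracks');
    push @seen, "$artist: $numtracks tracks declared";

    $t->purge;    # free the memory used by this product
}

my $twig = XML::Twig->new(twig_handlers => { product => \&get_products });
$twig->parse($xml);

print "$_\n" for @seen;
```

Recording a message for the incomplete product, rather than returning silently, is one way to do "more than just ignore" it.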

An alternative method, which checks the actual number of tracks instead of
relying on the accuracy of the <number_of_tracks> value, is to put the <track>
elements into an array and measure its size before iterating over it:

    my $tracks = $elt->first_child('tracks');
    return unless $tracks;
    my @tracks = $tracks->children('track');

    my $numtracks = @tracks;

    for my $track (@tracks) {
      :
    }

Which of these techniques you choose is up to you.

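
The two techniques can also be combined to spot inconsistent data. The sketch below, on a made-up product that declares three tracks but contains only two, reads the declared value and counts the <track> elements, then compares them. The sample XML and the variable names $declared and $counted are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up sample: declares 3 tracks but contains only 2.
my $xml = <<'XML';
<product>
  <tracks>
    <number_of_tracks>3</number_of_tracks>
    <track><title>First</title></track>
    <track><title>Second</title></track>
  </tracks>
</product>
XML

my ($declared, $counted);

my $twig = XML::Twig->new(
    twig_handlers => {
        product => sub {
            my ($t, $elt) = @_;
            my $tracks = $elt->first_child('tracks') or return;

            # The value the file claims.
            $declared = $tracks->first_child_text('number_of_tracks');

            # The value actually present: an array assigned to a
            # scalar yields its number of elements.
            my @tracks = $tracks->children('track');
            $counted = @tracks;
        },
    },
);
$twig->parse($xml);

print "declared $declared, counted $counted\n";
print "track count mismatch\n" if $declared != $counted;
```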
HTH,
Rob
| |
| Mike Blezien 2007-07-21, 6:59 pm |
| Rob,
----- Original Message -----
From: "Rob Dixon" <rob.dixon@350.com>
To: "Perl List" <beginners@perl.org>
Cc: "Mike Blezien" <mickalo@frontiernet.net>
Sent: Saturday, July 21, 2007 12:12 PM
Subject: Re: Parsing large XML file - Revisited
> Mike Blezien wrote:
[snip]
>
> Hello Mike
>
> First of all, your call
>
> $twig->parsefile("$xmlfile");
>
> should properly be
>
> $twig->parsefile($xmlfile);
>
> as there is no point in forcing Perl to interpolate a string when the result
> is simply the string itself.
>
> Your code works fine on the sample 000001.xml file that you posted. What must
> be happening is that there is a product in your live data that has no <tracks>
> element. The line
>
> return unless $tracks;
>
> is meant to protect against this, but you have used the value of $tracks before
> the check. If you change your code to:
>
> my $tracks = $elt->first_child('tracks');
> return unless $tracks;
>
> my $numtracks = $tracks->first_child_text('number_of_tracks');
>
> for my $track ($tracks->children('track')) {
> :
> }
>
> then your error should go away, although you may want to do more than just
> ignore any products without a <tracks> tag.
>
> An alternative method, which checks the actual number of tracks instead of
> relying on the accuracy of the <number_of_tracks> value is to put the <track>
> elements into an array and measure its size before iterating over it:
>
> my $tracks = $elt->first_child('tracks');
> return unless $tracks;
> my @tracks = $tracks->children('track');
>
> my $numtracks = @tracks;
>
> for my $track (@tracks) {
> :
> }
>
> Which of these techniques you choose is up to you.
That's exactly what the problem was: some of the XML files do not have the
<tracks> element, which I wasn't aware of until just now. After making some code
changes, the error has gone away. :)
Again, thank you for your very useful help and information. It saved me a lot of
head-scratching!
Mike
| |
| Dr.Ruud 2007-07-22, 7:59 am |
| "Mike Blezien" wrote:
> my $article_number = $elt->first_child_text('article_number');
> my $dist_number = $elt->first_child_text('distributor_number');
> my $dist_name = $elt->first_child_text('distributor_name');
> my $artist = $elt->first_child_text('artist');
> my $ean_upc = $elt->first_child_text('ean_upc');
> my $set_total = $elt->first_child_text('set_total');
That looks awful. Isn't there a cleaner way to do it with the module?
Or do it more like:

    my @text_tags = qw(article_number distributor_number etc);
    my %data;
    for my $tag (@text_tags) {
        $data{_text}{$tag} = $elt->first_child_text($tag);
    }

You may also want a %tag2caption, like:

    my %tag2caption = (
        number_of_tracks => 'tracknums',
    );

to be able to print a nicer text (or even a translation) for a tag.
Print the names of any tags that don't have an alias as-is, perhaps after
replacing each underscore with a space.
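
A rough sketch of that caption lookup with its fallback. The helper name caption_for, the captions and the sample values are all made up for illustration; in the real handler the values would come from first_child_text.

```perl
use strict;
use warnings;

# Hypothetical captions; tags not listed here fall back to the tag name.
my %tag2caption = (
    number_of_tracks => 'Tracks',
    ean_upc          => 'EAN/UPC',
);

# Return the caption for a tag, or the tag name with each
# underscore replaced by a space when no caption is defined.
sub caption_for {
    my ($tag) = @_;
    return $tag2caption{$tag} if exists $tag2caption{$tag};
    (my $caption = $tag) =~ tr/_/ /;
    return $caption;
}

# Made-up values standing in for the text read from the XML.
my %data = (
    article_number   => 'A-1001',
    ean_upc          => '0000000000000',
    number_of_tracks => 12,
);

for my $tag (sort keys %data) {
    printf "%-16s %s\n", caption_for($tag), $data{$tag};
}
```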
--
Affijn, Ruud
"Gewoon is een tijger."