











 

Questions about XML::LibXML
Fergus McMenemie

2007-12-16, 7:59 am

I am currently migrating several large scripts from XML::DOM to
XML::LibXML and the transition has highlighted a number of problems.
Both modules have rather poorly documented aspects and the move to
XML::LibXML is proving painful.

A typical use of either module to read XML goes along the lines of

my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($metafile);

XML::LibXML seems to require an extra step of

my $pubmeta = $tree->getDocumentElement();

I think my misunderstandings start at the very first line.
- What does this $parser object do? If I am parsing multiple files
do I need a separate parser instance for each file?

- Having used parse_file() within a subroutine do I need to
keep $parser around, or is it just $tree that needs to be kept?

- Or can I get away with just keeping $pubmeta in scope?

In answering or commenting on the above bear in mind that I am
opening lots of XML files, merging elements from them into a master.

Thanks in advance Fergus.
xhoster@gmail.com

2007-12-16, 6:59 pm

fergus@twig.demon.co.uk (Fergus McMenemie) wrote:
> I am currently migrating several large scripts from XML::DOM to
> XML::LibXML and the transition has highlighted a number of problems.


Out of my own curiosity, what is the driving force behind the change?

> Both modules have rather poorly documented aspects and the move to
> XML::LibXML is proving painful.
>
> A typical use of either module to read XML goes along the lines of
>
> my $parser = XML::LibXML->new();
> my $tree = $parser->parse_file($metafile);
>
> XML::LibXML seems to require an extra step of
>
> my $pubmeta = $tree->getDocumentElement();
>
> I think my misunderstandings start at the very first line.
> - What does this $parser object do?


It lets you set default options for the parser. If you never do
that (and the docs don't even explain how you would go about doing it),
then it is pretty useless. Consider it just another part of the bloat and
rot that seems to follow XML wherever it goes.
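
For reference, a minimal sketch of setting one such default on the parser
object. It assumes the keep_blanks() accessor on the XML::LibXML parser;
the sample XML string is made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; any small indented document will do.
my $xml = "<meta>\n  <title>Example</title>\n</meta>";

my $parser = XML::LibXML->new();
$parser->keep_blanks(0);    # ask the parser to drop whitespace-only text nodes

# The root element keeps its document alive, so no $tree variable is needed.
my $root     = $parser->parse_string($xml)->documentElement;
my @children = $root->childNodes;

printf "%d child node(s) under <%s>\n", scalar @children, $root->nodeName;
```

Every document parsed through this $parser afterwards picks up the same
setting, which is the main reason to hold on to the object.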

> If I am parsing multiple files
> do I need a separate parser instance for each file?


No. But I would use one anyway. Creation of an XML::LibXML parser
is extremely lightweight (unlike XML::DOM).
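
Either way works. A sketch of the shared-parser variant follows; the file
names and contents are invented, and the files are written to a temporary
directory only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::LibXML;

# Write two throwaway files so the sketch is self-contained (names are made up).
my $dir     = tempdir(CLEANUP => 1);
my %content = (
    'a.xml' => '<meta><title>First</title></meta>',
    'b.xml' => '<meta><title>Second</title></meta>',
);
for my $name (keys %content) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$fh} $content{$name};
    close $fh or die "Cannot close $dir/$name: $!";
}

my $parser = XML::LibXML->new();    # one parser, reused for every file
my %title_for;
for my $name (sort keys %content) {
    my $doc = $parser->parse_file("$dir/$name");
    $title_for{$name} = $doc->documentElement->findvalue('title');
}

print "$_: $title_for{$_}\n" for sort keys %title_for;
```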

> - Having used parse_file() within a subroutine do I need to
> keep $parser around, or is it just $tree that needs to be kept?
>
> - Or can I get away with just keeping $pubmeta in scope?


You can do even better, not even explicitly having the intermediaries:

my $pubmeta =
XML::LibXML->new()->parse_file($metafile)->getDocumentElement();
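
The same one-liner wrapped in a subroutine, as a sketch of the case asked
about. load_root() is a hypothetical helper name and the input file is a
temporary one created just for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

# Hypothetical helper: parse a file and hand back only the root element.
sub load_root {
    my ($file) = @_;
    return XML::LibXML->new()->parse_file($file)->getDocumentElement();
}

# Throwaway input file so the sketch runs on its own.
my ($fh, $metafile) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<pubmeta><title>Example</title></pubmeta>';
close $fh or die "Cannot close $metafile: $!";

my $pubmeta = load_root($metafile);

# The parser and the tree variable are gone by now, yet the element is
# still usable and can still reach the document it belongs to.
my $title = $pubmeta->findvalue('title');
my $doc   = $pubmeta->ownerDocument;
print "$title\n";
print $doc->toString;
```

Only $pubmeta has to stay in scope; the document is still reachable
through ownerDocument if it is needed later.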


> In answering or commenting on the above bear in mind that I am
> opening lots of XML files, merging elements from them into a master.


Using the same parser over and over in XML::LibXML might save you around
one second for every few hundred thousand files or so. Not worth worrying
about in my book.
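
For the merging itself, a rough sketch under assumptions: the element names
(meta, item, master) and the sample documents are invented, and importNode()
is used to copy each element into the master document before appending it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up source documents standing in for the many input files.
my @sources = (
    '<meta><item id="1">alpha</item></meta>',
    '<meta><item id="2">beta</item><item id="3">gamma</item></meta>',
);

# Build an empty master document with a <master> root element.
my $master      = XML::LibXML::Document->new('1.0', 'UTF-8');
my $master_root = $master->createElement('master');
$master->setDocumentElement($master_root);

my $parser = XML::LibXML->new();
for my $xml (@sources) {
    my $doc = $parser->parse_string($xml);
    for my $item ($doc->findnodes('/meta/item')) {
        # importNode copies the node into $master; the source is untouched.
        my $copy = $master->importNode($item);
        $master_root->appendChild($copy);
    }
}

my @merged = $master_root->findnodes('item');
print $master->toString(1);
```

Each source document can go out of scope as soon as its elements have been
copied, so only the master needs to be kept for the whole run.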

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
Fergus McMenemie

2007-12-17, 7:01 pm

<xhoster@gmail.com> wrote:

> fergus@twig.demon.co.uk (Fergus McMenemie) wrote:
>
> Out of my own curiosity, what is the driving force behind the change?


I have had to deal with the addition of Dublin Core elements to some of
the documents and XML::DOM does not support namespaces.

Thanks for the help on the other points, I will use it with more
confidence now!
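
A minimal sketch of reading namespaced Dublin Core elements with
XML::LibXML, for anyone making the same move. The sample document is made
up; the "dc" prefix is bound to the Dublin Core element set namespace and
registered on an XPathContext so it can be used in XPath expressions.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $dc_ns = 'http://purl.org/dc/elements/1.1/';    # Dublin Core element set
my $xml   = <<"XML";
<meta xmlns:dc="$dc_ns">
  <dc:title>Example title</dc:title>
  <dc:creator>A. Author</dc:creator>
</meta>
XML

my $doc = XML::LibXML->new()->parse_string($xml);

# XPath route: register a prefix for the namespace, then query with it.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('dc', $dc_ns);
my $title = $xpc->findvalue('/meta/dc:title');

# DOM route: look elements up by namespace URI and local name.
my ($creator) = $doc->documentElement->getElementsByTagNameNS($dc_ns, 'creator');

print "title:   $title\n";
print "creator: ", $creator->textContent, "\n";
```

The prefix registered on the XPathContext does not have to match the one
used in the file; only the namespace URI matters.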