Problem Parsing Huge XML file using XML::Twig
vikrant

2007-04-23, 9:57 pm

Hi,
I am trying to parse a huge XML file using XML::Twig. Part of the XML
file is shown below:
-------------------------------------------------------------------------------------------------------------------------
<?xml version='1.0'?>
<StoreInfo>
<StoreName>AEC</StoreName>
<Products>
<Product>
<ProductID>21CR10.2</ProductID>
<ProductInfo name="abc" category="xyz">HUGE</ProductInfo>
<SupplierID>AEC</SupplierID>
<PurchasePrice>10.99</PurchasePrice>
<links>
<link>http://www.example.com</link>
<link>http://www.example2.com</link>
</links>
</Product>
<Product>
<ProductID>21CR11.2</ProductID>
<ProductInfo name="abcd" category="xyzd">ARROW</ProductInfo>
<SupplierID>AEC</SupplierID>
<PurchasePrice>10.49</PurchasePrice>
<links>
<link>http://www.example.com</link>
<link>http://www.example2.com</link>
</links>
</Product>
</Products>
</StoreInfo>
------------------------------------------------------------------------------------------------------------------------------------
Here, the Product tag repeats 2000 times in the original file.

I am able to get the values of ProductID, SupplierID and
PurchasePrice using the following code. But how do I get the values of
the "link" nodes, and the attribute values and node value of the
ProductInfo node? I know we can use XPath with XML::Twig, but
unfortunately I have not been able to get it to work. So please help me
with any documents, links or references related to it. I searched a lot
but failed to find anything.
-----------------------------------------------------------------------------------------------------------------------------
#!/bin/perl -w
use strict;
use XML::Twig;

# Call product() each time a complete <Product> element has been parsed.
my $t = XML::Twig->new( TwigHandlers => { Product => \&product } );
$t->parsefile('sample.xml');
exit;

sub product {
    my ($t, $product) = @_;
    my %product;
    $product{id}            = $product->field('ProductID');
    $product{SupplierID}    = $product->field('SupplierID');
    $product{PurchasePrice} = $product->field('PurchasePrice');

    print "$product{id}: $product{SupplierID} :$product{PurchasePrice}\n";
    $product->delete;    # drop the element once it has been processed
}
------------------------------------------------------------------------------------------------------------------------------
One strange thing I found accidentally is that when I remove the
"StoreInfo" tag from the above XML, the following error appears on
screen.
Error:-
junk after document element at line 5, column 0, byte 53 at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm line 187

Any comments?

Sorry for bad English :)

Regards,
Vikrant

mirod

2007-04-24, 3:58 am

'field' is not the only method for getting data out of an element.
In your case you would use:

my $name= $product->first_child( 'ProductInfo')->att( 'name');

my $links= $product->first_child( 'links'); # the element links
my @links= map { $_->text } $links->children( 'link');

The tutorial at http://www.xmltwig.com/xmltwig/tutorial/index.html
(referenced in the README and at the top of the doc of the module)
gives more info about those methods.
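
Putting those calls together, a handler that also picks up the
'category' attribute and the text of ProductInfo might look like the
sketch below. This is only an illustration, not the one true way:
extract_products is a hypothetical helper name, the XML is parsed from a
string so the example is self-contained, and the results are collected
in an array purely for demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns one hash ref per <Product> element.
sub extract_products {
    my ($xml) = @_;
    my @products;

    my $twig = XML::Twig->new(
        twig_handlers => {
            Product => sub {
                my ($t, $product) = @_;
                my $info  = $product->first_child('ProductInfo');
                my $links = $product->first_child('links');

                # Text of every <link> under <links>, if <links> exists.
                my @urls;
                @urls = map { $_->text } $links->children('link') if $links;

                push @products, {
                    id       => $product->field('ProductID'),
                    name     => $info ? $info->att('name')     : undef,
                    category => $info ? $info->att('category') : undef,
                    info     => $info ? $info->text            : undef,
                    links    => \@urls,
                };

                # Release everything parsed so far to keep memory use low.
                $t->purge;
            },
        },
    );
    $twig->parse($xml);
    return @products;
}

my $xml = <<'XML';
<StoreInfo>
<Products>
<Product>
<ProductID>21CR10.2</ProductID>
<ProductInfo name="abc" category="xyz">HUGE</ProductInfo>
<links>
<link>http://www.example.com</link>
<link>http://www.example2.com</link>
</links>
</Product>
</Products>
</StoreInfo>
XML

for my $p (extract_products($xml)) {
    print "$p->{id}: $p->{name} / $p->{category} / $p->{info}\n";
    print "  link: $_\n" for @{ $p->{links} };
}
```

For the real 2000-product file you would probably call parsefile
instead of parse, and print or store each product inside the handler
rather than accumulating them all in memory.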

> One strange thing I found accidentally is that when I remove the
> "StoreInfo" tag from the above XML, the following error appears on
> screen.
> Error:-
> junk after document element at line 5, column 0, byte 53 at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm line 187


If you remove the StoreInfo tag then the parser sees
<StoreName>AEC</StoreName> as the entire document, then dies, with an
appropriate error message, when it finds the rest of your original
document, and has no way of dealing with it, as it has already seen a
complete tree.
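
If you want to see this for yourself, the following small sketch parses
the same content with and without a single enclosing element. The
try_parse helper and the two sample strings are made up for the
illustration; the exact line and byte numbers in the error will differ
from the ones in your message.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns '' on success, or the parser's error text.
sub try_parse {
    my ($xml) = @_;
    my $ok = eval { XML::Twig->new->parse($xml); 1 };
    return $ok ? '' : ( $@ || 'unknown error' );
}

# One root element wrapping everything: well-formed.
my $with_root    = '<StoreInfo><StoreName>AEC</StoreName><Products/></StoreInfo>';

# Two top-level elements: the document ends after </StoreName>.
my $without_root = "<StoreName>AEC</StoreName>\n<Products/>\n";

for my $case ( [ 'with root', $with_root ], [ 'without root', $without_root ] ) {
    my ($label, $xml) = @$case;
    my $error = try_parse($xml);
    print "$label: ", ( $error eq '' ? "parsed\n" : "failed: $error" );
    print "\n" if $error ne '' && $error !~ /\n\z/;
}
```

Wrapping the parse in eval is just one way to keep the script alive
after the parser dies; the point is that a document needs exactly one
top-level element.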

--
mirod

vikrant

2007-04-24, 9:57 pm


Thanks for the answer.

Regards,
Vikrant
