PERL Beginners > August 2007 > XML to inMemory Hash
XML to inMemory Hash
| Prabu Ayyappan 2007-08-01, 4:00 am |
| Hi,
I want to load a huge XML file into an in-memory hash.
I tried XML::Simple, but it takes a huge amount of memory and time to convert the file into a hash.
Loading a 300MB XML file uses a lot of memory and takes a long time.
Is there a better way to load this XML into memory?
Are there any benchmarks available?
Please guide me.
Thanks in advance,
Prabu
| |
|
|
"Prabu Ayyappan" <prabu.ayyappan@yahoo.com> wrote in message news:67529.17673.qm@web57107.mail.re3.yahoo.com...
If XML::Simple is very slow, the most likely reason is that you have XML::SAX installed but no additional SAX parser module.
The XML::SAX distribution includes an XML parser written entirely in Perl: very portable, but not very fast.
If you have XML::LibXML or XML::Parser installed, you can set the $XML::Simple::PREFERRED_PARSER variable to tell XML::Simple to use another parser.
Add the following and the time will be reduced drastically:
use XML::Simple;
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';
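A minimal sketch of John's suggestion in context, assuming both XML::Simple and XML::Parser are installed. The inline document and the KeyAttr/ForceArray settings are illustrative choices, not part of the original post; the important detail is that the variable is set before the first XMLin() call.

```perl
use strict;
use warnings;
use XML::Simple;

# Ask XML::Simple to use XML::Parser directly instead of the pure-Perl SAX
# parser (assumes XML::Parser is installed). Set it before calling XMLin().
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

# Hypothetical stand-in for the real 300MB file; XMLin() also accepts a filename.
my $xml = '<nodes><node id="1"><name>foo</name></node>'
        . '<node id="2"><name>bar</name></node></nodes>';

# KeyAttr folds the <node> elements into a hash keyed on their id attribute;
# ForceArray keeps that shape even when only one <node> is present.
my $data = XMLin($xml, KeyAttr => ['id'], ForceArray => ['node']);

for my $id (sort keys %{ $data->{node} }) {
    print "$id => $data->{node}{$id}{name}\n";
}
```

This only changes which parser does the work; the resulting hash still holds the whole document, so memory use stays proportional to the file size.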
Regards
John
| |
| yaron@kahanovitch.com 2007-08-01, 7:59 am |
| Hi,
Have you tried XML::TreePP? It seems to fit your needs.
http://search.cpan.org/~kawasaki/XM...b/XML/TreePP.pm
Yaron Kahanovitch
----- Original Message -----
From: "Prabu Ayyappan" <prabu.ayyappan@yahoo.com>
To: "perl Beginner" <beginners@perl.org>
Sent: Wednesday, August 1, 2007 9:51:49 AM (GMT+0200)
Subject: XML to inMemory Hash
| |
| Rob Dixon 2007-08-01, 7:59 am |
| Prabu Ayyappan wrote:
> I want to load a huge XML file into an in-memory hash.
> I tried XML::Simple, but it takes a huge amount of memory and time to convert the file into a hash.
> Loading a 300MB XML file uses a lot of memory and takes a long time.
> Is there a better way to load this XML into memory?
> Are there any benchmarks available?
If your file is huge and you want it in memory, then you must expect it to take
a while to parse and to occupy a huge amount of memory. Almost anything is better
than XML::Simple, but no module can easily make your data any smaller. You should
reassess your solution to see whether you can work with only part of the data in memory.
I suggest you look at XML::Twig for processing XML in chunks. XML::LibXML is also
very good, and you may find it faster.
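A minimal sketch of the chunked approach Rob describes, using XML::Twig: a handler runs for each `<node>` element as soon as its end tag has been parsed, and `purge` then discards everything parsed so far. The element names (`nodes`, `node`, `name`) and the inline sample are hypothetical; with a real file you would call `parsefile()` instead of `parse()`.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; for a file on disk use $twig->parsefile($filename).
my $xml = '<nodes><node id="1"><name>foo</name></node>'
        . '<node id="2"><name>bar</name></node></nodes>';

my @seen;    # collected here only so the result can be inspected

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per <node>, as soon as that element is complete.
        node => sub {
            my ($t, $node) = @_;
            push @seen, $node->att('id') . ': ' . $node->first_child_text('name');
            # Free everything parsed so far, so memory use is roughly
            # one <node> at a time rather than the whole document.
            $t->purge;
        },
    },
);
$twig->parse($xml);

print "$_\n" for @seen;
```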
HTH,
Rob
| |
| Jenda Krynicky 2007-08-02, 7:59 am |
| From: Rob Dixon <rob.dixon@350.com>
> Prabu Ayyappan wrote:
>
> If your file is huge and you want it in memory then you must expect it to take
> a while to parse and occupy a huge memory space. Almost anything is better
> than XML::Simple, but no module can easily make your data any smaller.
Actually, no. Almost any other module that parses the whole file and
then gives you the data will take up more memory. XML::Simple ignores
a lot of the details it considers unimportant and produces a
fairly minimal structure: no links to siblings, no complex
blessed structures, etc.
You'd need something that gives you more control over the
generated structure, so that you can skip unimportant tags,
attributes, etc. (Yes, a shameless plug: XML::Rules.)
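Jenda's XML::Rules is one way to skip the parts of a document you don't need. As a substitute technique (this is not XML::Rules), the sketch below does something similar with XML::Twig, the module discussed elsewhere in this thread: `ignore_elts` tells the parser not to build the listed elements at all, and the handler copies only the needed fields into a compact hash. The `<notes>` element and the other names are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document in which each <node> carries a bulky <notes> field
# that the program never needs.
my $xml = '<nodes>'
        . '<node id="1"><name>foo</name><notes>long text</notes></node>'
        . '<node id="2"><name>bar</name><notes>more text</notes></node>'
        . '</nodes>';

my %name_by_id;    # the only data kept in memory: id => name

my $twig = XML::Twig->new(
    # <notes> elements are skipped by the parser and never built.
    ignore_elts   => { notes => 1 },
    twig_handlers => {
        node => sub {
            my ($t, $node) = @_;
            $name_by_id{ $node->att('id') } = $node->first_child_text('name');
            $t->purge;    # drop the parsed <node> once its fields are copied
        },
    },
);
$twig->parse($xml);

print "$_ => $name_by_id{$_}\n" for sort keys %name_by_id;
```

The design choice here is to decide up front which fields matter, so the in-memory hash holds only those rather than a full copy of every record.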
> You should
> reassess your solution to see if you can work with only part of the data in memory.
> I suggest you look at XML::Twig for processing XML in chunks.
Agreed. If you can work in chunks you'd definitely need much less
memory.
Jenda
===== Jenda@Krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery
| |
| Prabu Ayyappan 2007-08-03, 10:01 pm |
| Hi All,
Thanks for the valuable input.
I am planning to solve my memory issue by splitting the XML into chunks
using XML::Twig, and using simplify() in XML::Twig to convert each chunk back into a hash data structure.
Is that an advisable way to reduce memory usage? After processing each chunk I will purge it from memory (purge in XML::Twig).
Thanks in advance,
Prabu
| |
| Jenda Krynicky 2007-08-04, 4:01 am |
| From: Prabu Ayyappan <prabu.ayyappan@yahoo.com>
> Thanks for the valuable input.
>
> I am planning to solve my memory issue by splitting the XML into chunks
> using XML::Twig, and using simplify() in XML::Twig to convert each
> chunk back into a hash data structure.
>
> Is that an advisable way to reduce memory usage? After processing
> each chunk I will purge it from memory (purge in XML::Twig).
>
> Thanks in advance,
> Prabu
Yes, this should reduce the memory footprint while keeping the
necessary changes to the script small. I think in your case it's the
best solution.
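A minimal sketch of the plan being endorsed here, assuming the records are `<node>` elements (hypothetical names): each chunk is turned into an XML::Simple-style hash with `simplify()`, handed to `process_record()` (a hypothetical placeholder for the real work), and then purged.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; for a file on disk use $twig->parsefile($filename).
my $xml = '<nodes><node id="1"><name>foo</name></node>'
        . '<node id="2"><name>bar</name></node></nodes>';

my @records;    # filled by process_record() so the result can be inspected

my $twig = XML::Twig->new(
    twig_handlers => {
        node => sub {
            my ($t, $node) = @_;
            # simplify() converts just this element into a hash ref shaped
            # like XML::Simple's output (attributes and child text as keys).
            my $record = $node->simplify;
            process_record($record);
            $t->purge;    # free this chunk before the next one is parsed
        },
    },
);
$twig->parse($xml);

# Hypothetical per-chunk processing; replace with the real work.
sub process_record {
    my ($record) = @_;
    push @records, $record;
}
```

If the per-chunk work keeps every record (as this placeholder does), memory grows again; the savings come from keeping only what each chunk produces.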
Jenda
| |
| Prabu Ayyappan 2007-08-04, 7:01 pm |
| Hi,
Yes, it's saving memory.
But now I've run into a new problem: it takes a lot of time to parse the file.
Is there any way to create chunks so that it won't parse the entire file?
My requirement is to read the chunks based on an attribute.
I tried:
use XML::Twig;

my $chunkedIXMLG;
my $twig = XML::Twig->new(
    twig_roots => { 'node[@id="' . $ID . '"]' => \&chunks },
);
$twig->parse($showtimeString);

sub chunks {
    my ($twig, $elt) = @_;    # handlers receive the twig and the matched element
    $chunkedIXMLG = $elt->sprint;
    $twig->purge;
}

sub myprocess {
    print $chunkedIXMLG;
}
I also tried:
my $twig = XML::Twig->new(
    twig_handlers => { 'node[@id="' . $ID . '"]' => \&chunks },
);
$twig->parse($showtimeString);

sub chunks {
    my ($twig, $elt) = @_;
    $chunkedIXMLG = $elt->sprint;
    $twig->purge;
}
Since it reads the whole file each time to find the node with the given ID, it takes a lot of time: I have to read a 100MB file every time just to get one node.
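One way to stop reading once the wanted node has been found is XML::Twig's `finish_now` method, which stops the parser from inside a handler. A minimal sketch, with a hypothetical inline document and id value. It only saves time when the node appears before the end of the file; if many IDs are needed, it is usually cheaper to parse once and keep the needed nodes in a hash keyed by id than to re-parse the 100MB file for each lookup.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; for a file on disk use $twig->parsefile($filename).
my $xml = '<nodes><node id="1"><name>foo</name></node>'
        . '<node id="2"><name>bar</name></node>'
        . '<node id="3"><name>baz</name></node></nodes>';
my $ID = 2;    # hypothetical id to look up

my $chunk;     # XML of the matching node, once found

my $twig = XML::Twig->new(
    # twig_roots: only the matching <node> subtree is built in memory.
    twig_roots => {
        'node[@id="' . $ID . '"]' => sub {
            my ($t, $node) = @_;
            $chunk = $node->sprint;
            $t->finish_now;    # stop here; nothing after this node is parsed
        },
    },
);
$twig->parse($xml);

print "$chunk\n";
```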
Thanks in advance,
Prabu