
Subject: splitting large xml file
Sean Davis

2004-07-22, 8:56 pm

I have a very large (200 MB) XML file that consists of multiple records. I
would like to split these records up and store the XML for each in a
database for quick retrieval. I simply need to echo all of the XML between
the enclosing record tags into the database. Ideally, I would use SAX to
parse things, but I can't figure out how to echo the data back out exactly
as I got it. Any clues?

Thanks,
Sean



Rob Hanson

2004-07-22, 8:56 pm

> Ideally, I would use SAX to parse things

Optionally you could look at XML::RAX.

Article on the RAX concept:
http://www.xml.com/pub/a/2000/04/26/rax/index.html

RAX allows you to specify a record separator (a tag in the XML file), and
splits the file into chunks at that tag. It is stream based, so it only reads
in as much of the file as it needs to construct the next record. It only
applies to XML files that fit that type of format though (like RSS). At the
very least you might find the code helpful.
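
To make the record-separator idea concrete, here is a minimal sketch in core Perl. It is not XML::RAX and does not use its API; it only imitates the concept by setting Perl's input record separator ($/) to the closing tag, so each read pulls in just enough of the stream to complete one record. The <record> tag name, the id attributes, and the sample data are assumptions made for illustration. It also assumes records are not nested and that the literal string </record> never appears inside a comment or CDATA section. Because the text is never re-serialized, each chunk comes back byte for byte as it was read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; a real run would open the large file instead.
my $xml = <<'XML';
<records>
<record id="1"><name>alpha</name></record>
<record id="2">
  <name>beta</name>
</record>
</records>
XML

# Read from an in-memory filehandle so the sketch is self-contained.
open my $in, '<', \$xml or die "Cannot open in-memory file: $!";

my @records;
{
    # Treat the closing tag as the "end of line" marker, so each read
    # returns everything up to and including one </record>.
    local $/ = '</record>';
    while (my $chunk = <$in>) {
        # Keep only the text from the opening <record ...> tag onward.
        # The final read (the trailing </records>) has no match and is skipped.
        next unless $chunk =~ /(<record\b.*)\z/s;
        push @records, $1;
    }
}
close $in;

print scalar(@records), " records found\n";
print "$_\n---\n" for @records;
```

In a real script the push would become a database insert. For anything less regular than this sample, a real XML parser is the safer choice.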

> but I can't figure out how to echo the data
> back out exactly as I got it.


I'm not sure I completely understand. Anyway I am out of here today, hope
you find an answer.

Rob



Sean Davis

2004-07-23, 8:56 am

Rob,

Thanks for replying. I ended up answering my own question. I used
XML::Twig to find the chunks I was interested in, grabbed indexing
information from the twig, and saved the indices in a database for
later lookup of the entire XML record and... presto, random access to
200 MB of XML!

Sean
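
The thread does not include Sean's code, so the following is only a sketch of the general index-then-seek idea, written in core Perl rather than with XML::Twig. One pass records the byte offset and length of each record, and a later lookup seeks straight to it. The <record> tag, the id attribute, the %index hash (standing in for the database table), and the fetch_record helper are all hypothetical names chosen for illustration. It makes the same simplifying assumptions as any tag-matching approach: records are not nested and the closing tag appears only as markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the large file.
my $xml = <<'XML';
<records>
<record id="a1"><name>alpha</name></record>
<record id="b2"><name>beta</name></record>
</records>
XML

# No encoding layer on the handle, so offsets and lengths are in bytes.
open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";

# Pass 1: build the index. %index stands in for the database table;
# it maps a record id to [byte offset, byte length].
my %index;
{
    local $/ = '</record>';    # read one record's worth per iteration
    while (1) {
        my $before = tell $fh;             # position before this read
        my $chunk  = <$fh>;
        last unless defined $chunk;
        next unless $chunk =~ /<record\b[^>]*\bid="([^"]*)"/;
        # $-[0] is where the opening tag starts within $chunk.
        $index{$1} = [ $before + $-[0], length($chunk) - $-[0] ];
    }
}

# Pass 2: random access. Seek to the stored offset and read exactly
# the stored number of bytes, without parsing the rest of the file.
sub fetch_record {
    my ($id) = @_;
    my $entry = $index{$id} or return;
    my ($offset, $length) = @$entry;
    seek $fh, $offset, 0 or die "seek failed: $!";
    my $buf;
    read($fh, $buf, $length) == $length or die "short read for $id";
    return $buf;
}

print fetch_record('b2'), "\n";
```

Storing offsets instead of the XML itself keeps the database small, at the cost of having to rebuild the index whenever the file changes.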


