xml parser (PERL Beginners, June 2005)
| Dermot Paikkos 2005-06-08, 8:56 pm |
| Hi,
I need to create a CGI program that will parse an XML file for
output. I haven't begun to write anything yet but have had a look at the
XML file I am going to work with. The snippet below contains 3
records.
My question is what would be the best (easiest/fastest) way to parse
the file. Should I try and set the record separator to something like
$/ = "<record>\n";
Or should I use XML::Parser (which looks a bit scary) or
XML::Simple (my first attempt at which failed)?
I think I will eventually need to create a data structure like:
my $hashref = {
    'number' => 'S370/0128',
    'Date' => '03-Jun-05',
    # ....etc
};
push(@myarray, $hashref);
Does anyone have an opinion on what would be the easiest way to parse
the data below? Will the '/' in the data be a problem?
Thanx.
Dermot.
============== DATA ==========
<record>
<CT.NUM>S370/0128</CT.NUM>
<CT.DAT>03-Jun-05</CT.DAT>
<CT.UPD>03-Jun-05</CT.UPD>
<CT.ORI>IN</CT.ORI>
<CT.ALN>0</CT.ALN>
<CT.UKN>0</CT.UKN>
<CT.TI1>Apollo mission </CT.TI1>
<CT.DSC>Apollo mission </CT.DSC>
<CT.C01>CREDIT: NASA</CT.C01>
<CT.C03>In December of 1972, Apollo 17 astronauts Eugene </CT.C03>
<CT.C04>Cernan and Harrison Schmitt spent about 75 hours </CT.C04>
<CT.C05>on the moon, in the Taurus-Littrow valley, while </CT.C05>
<CT.C06>colleague Ronald Evans orbited overhead. THe </CT.C06>
<CT.C07>Apollo 17 crew returned with 110 kilograms of rock </CT.C07>
<CT.C08>and soil samples, more than from any other lunar </CT.C08>
<CT.C09>landing sites. And after thirty plus years, Cernan </CT.C09>
<CT.C10>and Schmitt are still the last to walk on the </CT.C10>
<CT.C11>Moon.</CT.C11>
<CT.MBS>41</CT.MBS>
<CT.COL></CT.COL>
<CT.WPL></CT.WPL>
<CT.SU1>SQUARE</CT.SU1>
<CT.SU2>MOON LANDING</CT.SU2>
<CT.SU3>ASTRONAUTS</CT.SU3>
<CT.SU4>NASA</CT.SU4>
<CT.SU5>LUNAR EXPLORATION</CT.SU5>
<CT.SU6>SPACE PROGRAM</CT.SU6>
<CT.SU7>SPACE</CT.SU7>
<CT.SU8>MOON</CT.SU8>
<CT.SU9>APOLLO 17</CT.SU9>
<CT.SU10>RONALD EVANS</CT.SU10>
<CT.SU11>EUGENE CERNAN</CT.SU11>
<CT.SU12>EXPLORATION</CT.SU12>
<CT.SU13>HARRISON SCHMITT</CT.SU13>
<CT.SU14>MANNED</CT.SU14>
<CT.SU15>SPACEFLIGHT</CT.SU15>
<CT.SU16>SPACE</CT.SU16>
<CT.SU17>APOLLO</CT.SU17>
<CT.SU18>PROGRAM</CT.SU18>
<CT.SU19>PROGRAMME</CT.SU19>
<CT.UPD>03-Jun-05</CT.UPD>
<CT.DUP></CT.DUP>
<CT.PHO>XNS</CT.PHO>
<PH.NAM>NASA</PH.NAM>
<RS.CY2>* UK</RS.CY2>
<RS.CY3>* EIRE</RS.CY3>
<RS.CY4>* BAHRAIN</RS.CY4>
<RS.CY5>* EGYPT</RS.CY5>
<RS.CY6>* HONG KONG</RS.CY6>
<RS.CY7>* ICELAND</RS.CY7>
<RS.CY8>* MALAYSIA</RS.CY8>
<RS.CY9>* SINGAPORE</RS.CY9>
<RS.CY10>* SAUDI ARABIA</RS.CY10>
<RS.CY11>* SOUTH AFRICA</RS.CY11>
<RS.CY12>* TAIWAN</RS.CY12>
<RS.CY13>* THAILAND</RS.CY13>
<RS.CY14>* UNITED ARAB EMIRATES</RS.CY14>
<CT.PSD></CT.PSD>
<CT.PSI></CT.PSI>
<CT.REF>AV120A</CT.REF>
<CT.FOR></CT.FOR>
<CT.DFM></CT.DFM>
</record>
<record>
<CT.NUM>S375/0045</CT.NUM>
<CT.DAT>22-Oct-92</CT.DAT>
<CT.UPD>03-Jun-05</CT.UPD>
<CT.ORI>IN</CT.ORI>
<CT.ALN>10</CT.ALN>
<CT.UKN>4</CT.UKN>
<CT.TI1> </CT.TI1>
<CT.DSC>Night launch of Apollo</CT.DSC>
<CT.C01>CREDIT: NASA</CT.C01>
<CT.C03>Launch of Apollo 17. The Saturn V rocket carrying </CT.C03>
<CT.C04>Apollo 17 blasts into the night sky at Cape </CT.C04>
<CT.C05>Canaveral on 7 December 1972. This was the only </CT.C05>
<CT.C06>night launch of the Apollo Lunar programme. Apollo </CT.C06>
<CT.C07>17 carried astronauts Eugene Cernan and Harrison </CT.C07>
<CT.C08>Schmitt to the Moon, Ronald Evans remaining in the </CT.C08>
<CT.C09>command module in lunar orbit. Cernan and Schmitt </CT.C09>
<CT.C10>landed in the Taurus-Littrow region on 11 December </CT.C10>
<CT.C11>and left on 13 December. Cernan was the last man </CT.C11>
<CT.C12>to stand on the Moon's surface. Apollo 17 returned </CT.C12>
<CT.C13>to Earth on 19 December 1972 - the end of manned </CT.C13>
<CT.C14>Lunar exploration for the time being.</CT.C14>
<CT.MBS>46</CT.MBS>
<CT.COL>C</CT.COL>
<CT.WPL></CT.WPL>
<CT.SU1>APOLLO 17, LAUNCH, NIGHT</CT.SU1>
<CT.SU2>SATURN V, LAUNCH, APOLLO 17</CT.SU2>
<CT.SU3>ROCKET</CT.SU3>
<CT.SU4>MANNED SPACEFLIGHT, SPACE</CT.SU4>
<CT.SU5>APOLLO PROGRAM, PROGRAMME</CT.SU5>
<CT.UPD>03-Jun-05</CT.UPD>
<CT.DUP>13</CT.DUP>
<CT.PHO>NAS</CT.PHO>
<PH.NAM>NASA</PH.NAM>
<CT.PSD></CT.PSD>
<CT.PSI></CT.PSI>
<CT.REF>S72-55070</CT.REF>
<CT.FOR>8x10 print</CT.FOR>
<CT.DFM></CT.DFM>
</record>
<record>
<CT.NUM>S380/0286</CT.NUM>
<CT.DAT>03-Jun-05</CT.DAT>
<CT.UPD>03-Jun-05</CT.UPD>
<CT.ORI>IN</CT.ORI>
<CT.ALN>0</CT.ALN>
<CT.UKN>0</CT.UKN>
<CT.TI1>Apollo 14 lunar central station</CT.TI1>
<CT.DSC>Apollo 14 lunar central station</CT.DSC>
<CT.C01>CREDIT: NASA</CT.C01>
<CT.C03>East and north sides of the Central Station, with </CT.C03>
<CT.C04>good definition of the astronaut switches at the </CT.C04>
<CT.C05>bottom. Apollo 14 was the third mission in which </CT.C05>
<CT.C06>humans walked on the lunar surface and returned to </CT.C06>
<CT.C07>Earth. On 5 February 1971 two astronauts (Apollo </CT.C07>
<CT.C08>14 Commander Alan B. Shepard, Jr. and LM pilot </CT.C08>
<CT.C09>Edgar D. Mitchell) landed near Fra Mauro crater on </CT.C09>
<CT.C10>the Moon in the Lunar Module (LM) while the </CT.C10>
<CT.C11>Command and Service Module (CSM) (with CM pilot </CT.C11>
<CT.C12>Stuart A. Roosa) continued in lunar orbit. During </CT.C12>
<CT.C13>their stay on the Moon, the astronauts set up </CT.C13>
<CT.C14>scientific experiments, took photographs, and </CT.C14>
<CT.C15>collected lunar samples. The LM took off from the </CT.C15>
<CT.C16>Moon on 6 February and the astronauts returned to </CT.C16>
<CT.C17>Earth on 9 February.</CT.C17>
<CT.MBS>48</CT.MBS>
<CT.COL></CT.COL>
<CT.WPL></CT.WPL>
<CT.SU1>PORTRAIT</CT.SU1>
<CT.SU2>SPACE</CT.SU2>
<CT.SU3>LUNAR EXPERIMENTS</CT.SU3>
<CT.SU4>APOLLO 14</CT.SU4>
<CT.SU5>MOON</CT.SU5>
<CT.SU6>LUNAR EXPERIMENT</CT.SU6>
<CT.SU7>NASA</CT.SU7>
<CT.SU8>SPACE PROGRAM</CT.SU8>
<CT.SU9>MANNED</CT.SU9>
<CT.SU10>SPACEFLIGHT</CT.SU10>
<CT.SU11>SPACE</CT.SU11>
<CT.SU12>APOLLO</CT.SU12>
<CT.SU13>PROGRAM</CT.SU13>
<CT.SU14>PROGRAMME</CT.SU14>
<CT.UPD>03-Jun-05</CT.UPD>
<CT.DUP></CT.DUP>
<CT.PHO>XNS</CT.PHO>
<PH.NAM>NASA</PH.NAM>
<RS.CY2>* UK</RS.CY2>
<RS.CY3>* EIRE</RS.CY3>
<RS.CY4>* BAHRAIN</RS.CY4>
<RS.CY5>* EGYPT</RS.CY5>
<RS.CY6>* HONG KONG</RS.CY6>
<RS.CY7>* ICELAND</RS.CY7>
<RS.CY8>* MALAYSIA</RS.CY8>
<RS.CY9>* SINGAPORE</RS.CY9>
<RS.CY10>* SAUDI ARABIA</RS.CY10>
<RS.CY11>* SOUTH AFRICA</RS.CY11>
<RS.CY12>* TAIWAN</RS.CY12>
<RS.CY13>* THAILAND</RS.CY13>
<RS.CY14>* UNITED ARAB EMIRATES</RS.CY14>
<CT.PSD></CT.PSD>
<CT.PSI></CT.PSI>
<CT.REF>AV064A</CT.REF>
<CT.FOR></CT.FOR>
<CT.DFM></CT.DFM>
</record>
| |
| Chris Devers 2005-06-08, 8:56 pm |
| On Wed, 8 Jun 2005, Dermot Paikkos wrote:
> My question is what would be the best (easiest/fastest) way to parse
> the file. Should I try and set the record separator to something like
>
> $/ = "<record>\n";
>
> Or should I use XML::Parser (which looks a bit scary) or
> XML::Simple (my first attempt at which failed)?
Yes. Use one of the modules, probably XML::Simple. It may take some
learning to get started with, but it will be *much* more robust than
anything you hand-roll using regular expressions.
"Easiest" can be a matter of perspective; plunging in with regexes can
seem easiest to get started with, but it'll be much harder to finish
that way, and much, much harder to maintain if the data format ever
changes -- which they all always do, sooner or later. If you learn how
to use a parsing module and get it to do the work for you, it will be
much simpler to write and maintain in the long run.
If you like, you can show [relevant sections from] your first attempt at
using XML::Simple to the list, and we can help get you through using it.
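For what it's worth, a minimal sketch of the XML::Simple route might look
like the following. It uses two trimmed records from your data. The
<records> wrapper element is an assumption on my part (your snippet shows
no single root element, and an XML parser needs exactly one), and the
hash keys number, Date, title and colour are just illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;    # CPAN module; exports XMLin()

# Two trimmed records from the posted data. The <records> root element
# is an assumption: the posted snippet has no enclosing element.
my $xml = <<'END_XML';
<records>
<record>
<CT.NUM>S370/0128</CT.NUM>
<CT.DAT>03-Jun-05</CT.DAT>
<CT.DSC>Apollo mission </CT.DSC>
<CT.COL></CT.COL>
</record>
<record>
<CT.NUM>S375/0045</CT.NUM>
<CT.DAT>22-Oct-92</CT.DAT>
<CT.DSC>Night launch of Apollo</CT.DSC>
<CT.COL>C</CT.COL>
</record>
</records>
END_XML

my $data = XMLin(
    $xml,
    ForceArray    => ['record'],    # array ref even if there is one record
    KeyAttr       => [],            # do not fold records into a hash by key
    SuppressEmpty => '',            # empty elements become '' instead of {}
);

# Build the array of hashrefs described in the original question.
# The key names on the left are illustrative, not taken from the data.
my @myarray;
for my $rec ( @{ $data->{record} } ) {
    push @myarray, {
        number => $rec->{'CT.NUM'},
        Date   => $rec->{'CT.DAT'},
        title  => $rec->{'CT.DSC'},
        colour => $rec->{'CT.COL'},
    };
}

printf "%s  %s  %s\n", @{$_}{qw(number Date title)} for @myarray;
```

One thing to watch for in the full data: <CT.UPD> appears twice in each
record, and XML::Simple returns an array ref for any element that repeats
inside its parent, so that key would need handling as a list.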
--
Chris Devers
| |
| Wiggins d'Anconia 2005-06-08, 8:56 pm |
| Dermot Paikkos wrote:
> Hi,
>
> I need to create a CGI program that will parse an XML file for
> output. I haven't begun to write anything yet but have had a look at the
> XML file I am going to work with. The snippet below contains 3
> records.
>
> My question is what would be the best (easiest/fastest) way to parse
> the file. Should I try and set the record separator to something like
>
> $/ = "<record>\n";
>
> Or should I use XML::Parser (which looks a bit scary) or
> XML::Simple (my first attempt at which failed)?
>
Use a module, hands down. XML::Simple should work well. If you can show us
your code and any failure messages, I bet we can set it straight.
> I think I will eventually need to create a data structure like:
>
> my $hashref = {
>     'number' => 'S370/0128',
>     'Date' => '03-Jun-05',
>     # ....etc
> };
> push(@myarray, $hashref);
>
>
> Does anyone have an opinion on what would be the easiest way to parse
> the data below? Will the '/' in the data be a problem?
A module, specifically an XML parser. The '/' is ordinary character data
inside an element, so it shouldn't be a problem, assuming the XML is
well-formed.
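A quick way to convince yourself, sketched with XML::Simple on a
one-element record made up from your data (the <CT.TI1> content here is
invented purely to show escaping): the '/' comes through untouched, while
'&' and '<' are the characters that have to be escaped in element text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;    # CPAN module; exports XMLin()

# A single <record> is its own root element, so no wrapper is needed here.
# The CT.TI1 text is a made-up example of an escaped ampersand.
my $rec = XMLin(
    '<record>'
  . '<CT.NUM>S370/0128</CT.NUM>'
  . '<CT.TI1>Cernan &amp; Schmitt</CT.TI1>'
  . '</record>'
);

print $rec->{'CT.NUM'}, "\n";    # the slash is plain text to the parser
print $rec->{'CT.TI1'}, "\n";    # the parser turns &amp; back into &
```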
> Thanx.
> Dermot.
>
http://danconia.org
> ============== DATA ==========
[snip]