Home > Archive > PERL Miscellaneous > September 2004 > XML::Twig
|
|
|
| OK. I am now desperate. I have written a subroutine to split up large
(~2-3MB) XML documents into separate documents. When I use
$twig->parsefile I get the following error:
"not well-formed (invalid token) at line 27072, column 1934, byte 878399
at C:/Perl/site/lib/XML/Parser.pm line 187"
When I change to $twig->safe_parsefile I can parse the document, but it
only gets a portion of the document (~38 of 83 elements).
I am the first to admit that I am not a Perl hack by trade, so please
go easy on my code sample. I should also mention that this code
worked great on smaller files ( <300k ).
Any help/suggestions would be greatly appreciated.
Brendan
use XML::Twig;
use Time::HiRes qw(gettimeofday);

# $tmpDir and logMessage() are defined elsewhere in the script.
sub splitFiles {
    my $fPath = $_[0];
    my $twig  = XML::Twig->new;
    logMessage("DEBUG - Build the Twig for " . $fPath);
    # Build the twig. safe_parsefile traps parse errors: it returns false
    # and leaves the message in $@, so the twig may be incomplete.
    $twig->safe_parsefile($fPath)
        or logMessage("DEBUG - Parse error, twig is incomplete: " . $@);
    logMessage("DEBUG - I can parse the file");
    my $root = $twig->root;    # get the root of the twig (vdf_metadata_list)
    logMessage("DEBUG - Videos: " . $root->children_count);
    my @videos = $root->children;    # put the vdf_metadata elements into an array
    if (@videos > 0) {
        logMessage("DEBUG - Number of videos is " . scalar @videos);
        my $i = 0;
        foreach my $video (@videos) {
            $i++;
            my $timeStamp = gettimeofday;
            my $tmpPath   = $tmpDir . $timeStamp . $i;
            open(my $FH, '>', $tmpPath) || die("cannot open file: " . $!);
            $video->print($FH);
            close($FH);    # close the lexical handle, not the bareword FH
        }
    } else {
        logMessage("DEBUG - Skipping file " . $fPath);
    }
}
| |
| Brian McCauley 2004-09-25, 3:56 pm |
|
c0rk wrote:
> OK. I am now desperate. I have written a subroutine to split up large
> (~2-3MB) XML documents into separate documents. When I use
> $twig->parsefile I get the following error:
>
> "not well-formed (invalid token) at line 27072, column 1934, byte 878399
> at C:/Perl/site/lib/XML/Parser.pm line 187"
Well, in the absence of any evidence to the contrary I'd be inclined to
accept that at face value.
Do you have a reason to disbelieve it?
| |
| Tad McClellan 2004-09-25, 3:56 pm |
| c0rk <pam4prezNOSPAM@hotmail.com> wrote:
> When I use $twig->parsefile I get the following error:
>
> "not well-formed (invalid token) at line 27072, column 1934, byte 878399
> at C:/Perl/site/lib/XML/Parser.pm line 187"
This message means that there is something wrong with the _data_
rather than with the code.
Open the data file to the 1934th character on the 27072nd line
and see what it is that makes it invalid XML.
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
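
One way to follow that advice without an editor that can jump to a column is to use the byte offset from the same error message (byte 878399). The following is only a minimal sketch, not something posted in the thread: the helper names bytes_around and visible are hypothetical, and it assumes the reported offset is a zero-based position in the file as stored on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $radius bytes on either side of a
# byte offset, so the offending character can be inspected.
sub bytes_around {
    my ($path, $offset, $radius) = @_;
    my $start = $offset > $radius ? $offset - $radius : 0;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    seek($fh, $start, 0) or die "cannot seek in $path: $!";
    my $len = ($offset - $start) + $radius + 1;
    my $buf = '';
    defined(read($fh, $buf, $len)) or die "cannot read $path: $!";
    close($fh);
    return $buf;
}

# Hypothetical helper: show every byte outside printable ASCII as \xNN,
# so control characters and stray high bytes become visible.
sub visible {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $out;
}

# Example: perl show_bytes.pl data.xml 878399
if (@ARGV >= 2) {
    my ($path, $offset) = @ARGV;
    print visible(bytes_around($path, $offset, 40)), "\n";
}
```

Reading in raw mode keeps the offsets in bytes, which is what the parser reports, rather than in decoded characters.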
| |
|
| Brian McCauley <nobull@mail.com> wrote in
news:cj46h1$v2m$1@slavica.ukpost.com:
>
>
> c0rk wrote:
>
> Well, in the absence of any evidence to the contrary I'd be inclined
> to accept that at face value.
>
> Do you have a reason to disbelieve it?
>
Brian
You know - I have been working on this script since Thursday, trying to
determine _my_ problem. When I saw this error, I took it to mean there was an
error in my processing method (i.e. a memory problem). For whatever reason, I
just didn't read the error message for what it was. Turns out that the XML
has bad characters in it. I replaced those characters and my script
processed a 3MB file in seconds.
Many thanks for your response!
-c
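
The thread does not say which characters were bad. One common cause of "invalid token" is a control character that XML 1.0 forbids (anything below 0x20 other than tab, line feed and carriage return). As a hedged sketch under that assumption only, the hypothetical helpers below locate and replace such bytes; they do not cover other causes such as a wrong encoding or an unescaped & or <.

```perl
use strict;
use warnings;

# Hypothetical helper: return [byte_offset, byte_value] pairs for control
# bytes that XML 1.0 does not allow (below 0x20, except tab, LF and CR).
sub find_illegal_control_bytes {
    my ($data) = @_;
    my @hits;
    while ($data =~ /([\x00-\x08\x0B\x0C\x0E-\x1F])/g) {
        push @hits, [ pos($data) - 1, ord $1 ];
    }
    return @hits;
}

# Hypothetical helper: replace each such byte with a space. Whether a
# space is the right substitute depends on the data; it is an assumption.
sub scrub_control_bytes {
    my ($data) = @_;
    $data =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F/ /;
    return $data;
}

# Example: perl scan_xml.pl data.xml
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "cannot open $ARGV[0]: $!";
    my $data = do { local $/; <$fh> };    # slurp the whole file as bytes
    close($fh);
    printf("byte %d: 0x%02X\n", @$_) for find_illegal_control_bytes($data);
}
```

Reporting the offsets first, before scrubbing anything, makes it possible to check them against the byte position in the parser's error message.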
| |
|
| Tad McClellan <tadmc@augustmail.com> wrote in
news:slrnclb93j.qpk.tadmc@magna.augustmail.com:
> c0rk <pam4prezNOSPAM@hotmail.com> wrote:
>
>
>
> This message means that there is something wrong with the _data_
> rather than with the code.
>
> Open the data file to the 1934th character on the 27072nd line
> and see what it is that makes it invalid XML.
>
>
>
Tad,
Thanks for the response. You are 100% correct. I replaced the bad
characters at the specified location, and life is good!!!
Thanks,
-c