Code Comments
Programming Forum and web-based access to our favorite programming groups.
OK. I am now desperate. I have written a subroutine to split up large
(~2-3 MB) XML documents into separate documents. When I use
$twig->parsefile I get the following error:
"not well-formed (invalid token) at line 27072, column 1934, byte 878399
at C:/Perl/site/lib/XML/Parser.pm line 187"
When I change to $twig->safe_parsefile I can parse the document, but it
only gets a portion of the document (~38 of 83 elements).
I am the first to admit that I am not a Perl hacker by trade, so please
don't flame me for my code sample. I should also mention that this code
worked great on smaller files (< 300 KB).
Any help/suggestions would be greatly appreciated.
Brendan
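use XML::Twig;
use Time::HiRes qw(gettimeofday);   # gettimeofday in scalar context = float seconds

# logMessage() and the global $tmpDir are defined elsewhere in the script.
sub splitFiles {
    my $fPath = $_[0];
    my $twig  = XML::Twig->new;
    logMessage("DEBUG - Build the Twig for " . $fPath);
    # safe_parsefile traps the parse error: it returns false and leaves the
    # message in $@, and the twig keeps only what was parsed before the error.
    unless ($twig->safe_parsefile($fPath)) {   # build the twig
        logMessage("ERROR - cannot parse $fPath: $@");
        return;
    }
    logMessage("DEBUG - I can parse the file");
    my $root = $twig->root;          # get the root of the twig (vdf_metadata_list)
    logMessage("DEBUG - Videos: " . $root->children_count);
    my @videos = $root->children;    # put the vdf_metadata elements into an array
    if (@videos) {
        logMessage("DEBUG - Number of videos is " . scalar @videos);
        my $i = 0;
        foreach my $video (@videos) {
            $i++;
            my $timeStamp = gettimeofday;
            my $tmpPath   = $tmpDir . $timeStamp . $i;
            open(my $FH, '>', $tmpPath) or die("cannot open file $tmpPath: $!");
            $video->print($FH);
            close($FH);              # close the lexical handle, not the bareword FH
        }
    } else {
        logMessage("DEBUG - Skipping file " . $fPath);
    }
}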
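For files much larger than the ~3 MB here, building the whole tree before splitting costs memory in proportion to the file size. XML::Twig's documented way around that is to handle each element as soon as it is parsed, via twig_handlers, and then discard it with purge. The following is a minimal sketch under assumptions: the repeated child element is named vdf_metadata (as the comment in the code above says), and split_streaming and the video_N.xml file naming are hypothetical. It would not have avoided the error in this thread: malformed data still makes parsefile die at the bad byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical streaming variant of splitFiles: each <vdf_metadata> element
# is written to its own file as soon as its end tag is parsed, then purged,
# so memory use stays roughly flat regardless of file size.
sub split_streaming {
    my ($path, $out_dir) = @_;
    my $count = 0;
    my $twig = XML::Twig->new(
        twig_handlers => {
            vdf_metadata => sub {
                my ($t, $elt) = @_;
                $count++;
                my $file = "$out_dir/video_$count.xml";   # hypothetical naming scheme
                open(my $fh, '>', $file) or die "cannot open $file: $!";
                $elt->print($fh);
                close($fh);
                $t->purge;   # free everything parsed so far
                return 1;
            },
        },
    );
    # parsefile dies with expat's message (line, column, byte) on malformed
    # input; elements handled before that point have already been written.
    $twig->parsefile($path);
    return $count;
}

# Command-line use: perl split.pl big.xml /tmp/out
if (@ARGV == 2) {
    my $n = split_streaming(@ARGV);
    print "wrote $n files\n";
}
```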
c0rk wrote:
> OK. I am now desperate. I have written a subroutine to split up large
> (~2-3 MB) XML documents into separate documents. When I use
> $twig->parsefile I get the following error:
>
> "not well-formed (invalid token) at line 27072, column 1934, byte 878399
> at C:/Perl/site/lib/XML/Parser.pm line 187"

Well, in the absence of any evidence to the contrary, I'd be inclined to accept that at face value.

Do you have a reason to disbelieve it?
c0rk <pam4prezNOSPAM@hotmail.com> wrote:
> When I use $twig->parsefile I get the following error:
>
> "not well-formed (invalid token) at line 27072, column 1934, byte 878399
> at C:/Perl/site/lib/XML/Parser.pm line 187"

This message means that there is something wrong with the _data_
rather than with the code.

Open the data file to the 1934th character on the 27072nd line
and see what it is that makes it invalid XML.

-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas
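To follow that advice without paging through 27,000 lines in an editor, a short script can print the raw bytes around the offset expat reports (byte 878399 here). This is a minimal sketch: dump_bytes_around is a hypothetical helper, and it assumes the reported byte is an offset from the start of the file. The window of bytes on either side makes an off-by-one in how that offset is counted harmless, and the hex view shows control characters or stray non-UTF-8 bytes that an editor may hide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the start offset, a hex dump and a printable
# view of the bytes within $radius of $offset in $path.
sub dump_bytes_around {
    my ($path, $offset, $radius) = @_;
    $radius = 16 unless defined $radius;
    my $start = $offset > $radius ? $offset - $radius : 0;

    # Read raw bytes so no encoding layer reinterprets the bad data
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    seek($fh, $start, 0) or die "cannot seek in $path: $!";
    my $got = read($fh, my $buf, 2 * $radius + 1);
    die "cannot read $path: $!" unless defined $got;
    close($fh);

    my $hex = join ' ', map { sprintf '%02X', ord } split //, $buf;
    (my $text = $buf) =~ s/[^\x20-\x7E]/./g;   # non-printable bytes shown as '.'
    return ($start, $hex, $text);
}

# Command-line use: perl dump_bytes.pl file.xml 878399
if (@ARGV == 2) {
    my ($start, $hex, $text) = dump_bytes_around(@ARGV);
    print "bytes from offset $start:\n$hex\n$text\n";
}
```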
Brian McCauley <nobull@mail.com> wrote in news:cj46h1$v2m$1@slavica.ukpost.com:

> c0rk wrote:
>
> Well, in the absence of any evidence to the contrary, I'd be inclined
> to accept that at face value.
>
> Do you have a reason to disbelieve it?
>
> Brian

You know, I have been working on this script since Thursday, trying to track down _my_ problem. When I saw this error, I assumed it pointed to a problem in my processing method (i.e. a memory problem). For whatever reason, I just didn't read the error message for what it was. It turns out that the XML had bad characters in it. I replaced those characters and my script processed a 3 MB file in seconds.

Many thanks for your response!
-c
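The thread does not say which characters were bad. One common cause of "not well-formed (invalid token)" is a C0 control character, which XML 1.0 forbids everywhere except tab, LF and CR; another is bytes in a different encoding from the one declared (e.g. Latin-1 text in a UTF-8 document), which this sketch does not handle. The following is a minimal sketch assuming the file is UTF-8 or another ASCII-compatible encoding, so those control characters are single bytes. strip_bad_controls is a hypothetical helper; note that it deletes the characters, whereas c0rk only says they were replaced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# C0 control characters that XML 1.0 forbids everywhere in a document;
# tab (0x09), LF (0x0A) and CR (0x0D) are the only ones allowed.
my $BAD = qr/[\x00-\x08\x0B\x0C\x0E-\x1F]/;

# Hypothetical helper: list each forbidden character (line, 0-based byte
# column, byte offset, code) and return a copy of $xml with them removed.
sub strip_bad_controls {
    my ($xml) = @_;
    my @found;
    my ($line, $line_start) = (1, 0);
    while ($xml =~ /(\n)|($BAD)/g) {
        if (defined $1) {                 # newline: start counting a new line
            ($line, $line_start) = ($line + 1, pos($xml));
            next;
        }
        my $off = pos($xml) - 1;
        push @found, { line => $line, column => $off - $line_start,
                       byte => $off, code => ord($2) };
    }
    (my $clean = $xml) =~ s/$BAD//g;
    return ($clean, \@found);
}

# Command-line use: perl strip_controls.pl in.xml > out.xml
if (@ARGV == 1) {
    open(my $in, '<:raw', $ARGV[0]) or die "cannot open $ARGV[0]: $!";
    my $xml = do { local $/; <$in> };
    close($in);
    my ($clean, $found) = strip_bad_controls($xml);
    printf STDERR "line %d, column %d, byte %d: 0x%02X\n",
        @{$_}{qw(line column byte code)} for @$found;
    binmode(STDOUT, ':raw');
    print $clean;
}
```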
Tad McClellan <tadmc@augustmail.com> wrote in news:slrnclb93j.qpk.tadmc@magna.augustmail.com:

> c0rk <pam4prezNOSPAM@hotmail.com> wrote:
>
> This message means that there is something wrong with the _data_
> rather than with the code.
>
> Open the data file to the 1934th character on the 27072nd line
> and see what it is that makes it invalid XML.

Tad, thanks for the response. You are 100% correct. I replaced the bad characters at the specified location, and life is good!!!

Thanks,
-c