HTML::FormatText problem

Emmett
2006-05-07, 7:01 pm

Hi,

I have a curious problem with HTML::FormatText and I wonder if anybody
can help me.

I have a bunch of patent documents in a local directory, from which I am
extracting the title, abstract, etc. for each patent to insert into a
MySQL database. The core lines of the script where I am having problems
are:

use HTML::FormatText;
use HTML::Parse;    # exports parse_htmlfile
......
my $plain_page =
    HTML::FormatText->new->format(parse_htmlfile($local_patent_file));
....do regex stuff with $plain_page...

This works fine except, it seems, when the patent document contains
the string "##STR1##", which is used in the patent documents to
represent a complex formula. This seems to kill HTML::FormatText: in
those cases $plain_page is undefined.

Obviously '#' is used in Perl to start a comment, but I'd be surprised
if it affected HTML::FormatText in such a simple way. Maybe ##X## does
something; I honestly don't know.
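
One way to narrow this down might be to run the parse and format steps
separately on a small test string and check each result. The following
is only a minimal sketch: html_to_text and the sample HTML are made up
for illustration, and it assumes HTML::TreeBuilder (from the same
HTML-Tree distribution) is installed. Since '#' only starts a comment in
Perl source, not inside string data, the placeholder should just be
passed through as text.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical helper: convert an HTML string to plain text and die with
# a message naming the step that produced nothing.
sub html_to_text {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);
    die "parse step returned nothing\n" unless defined $tree;

    my $text = HTML::FormatText->new->format($tree);
    $tree->delete;    # release the parse tree once formatting is done

    die "format step returned undef\n" unless defined $text;
    return $text;
}

# Made-up sample input; '#' inside a quoted string is ordinary data.
my $sample = '<html><body><p>A compound of the formula ##STR1## '
           . 'wherein R is alkyl.</p></body></html>';

my $plain = html_to_text($sample);
print $plain =~ /##STR1##/
    ? "placeholder survived formatting\n"
    : "placeholder missing from output\n";
```

If a test like this behaves normally, the next thing to check would
probably be whether the parse of the real file returns anything at all
(wrong path, unreadable file) before blaming the "##STR1##" string.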

If anybody has any suggestions, opinions, or workarounds, I'd be very
grateful.

Thanks

Emmett
