

PERL Beginners, July 2007: Re: PDF to Text










 

Author Re: PDF to Text
Jeff Pang

2007-07-13, 9:58 pm


--- Mike Lesser <exceptions@earthlink.net> wrote:

> Hi all. Like it says, I need to extract the content of a PDF file.
>
> I installed the tool pdftotext, and it works fine for my needs. I
> recall there was a very simple module that used this to extract text,
> but for the life of me, I can't find it on CPAN! Any leads? Using a
> command-line script in my own code makes me feel icky, but I guess
> I'll deal...
>


I found this source. Is it suitable for you?
http://search.cpan.org/search?query...ct+pdf&mode=all



Mr. Shawn H. Corey

2007-07-14, 7:58 am

Jeff Pang wrote:
> --- Mike Lesser <exceptions@earthlink.net> wrote:
>
>
> I found this source. Is it suitable for you?
> http://search.cpan.org/search?query...ct+pdf&mode=all


Well, PDF::API2 is capable of reading and creating PDFs. The problem is
that the contents of a PDF are a description of how to draw a document,
not just text. The contents are like a programming language with the
text as strings inside it. I know of no module that parses this
language so you can extract the text from it.
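
To illustrate the point, here is a minimal Perl sketch. The content
stream is made up for this example and is uncompressed; BT, Tf, Td, Tj
and ET are standard PDF text operators. The regex only catches simple
literal strings shown with Tj, so treat it as a demonstration of the
problem, not a working extractor.

```perl
use strict;
use warnings;

# A made-up, uncompressed content stream (an assumption for illustration).
my $stream = <<'END';
BT
/F1 12 Tf
72 720 Td
(Hello, world) Tj
0 -14 Td
(Second line) Tj
ET
END

# Naive extraction: capture every literal string followed by the Tj operator.
# This ignores escapes, nested parentheses, TJ arrays, hex strings,
# and font encodings.
my @strings = $stream =~ /\(([^()]*)\)\s*Tj/g;

print "$_\n" for @strings;
```

Real PDFs usually compress their streams and often use TJ arrays, hex
strings, and custom font encodings, which is why this naive approach
breaks down quickly.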

WARNING: PDF::API2 is huge.

CPAN: http://search.cpan.org/~areibens/PD...lib/PDF/API2.pm
SourceForge: http://sourceforge.net/projects/pdfapi2
mailing list: http://tech.groups.yahoo.com/group/...xt-pdf-modules/
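
If no module turns up, wrapping pdftotext yourself is only a few lines.
A minimal sketch, assuming pdftotext is on your PATH; pdf_to_text and
$PDFTOTEXT are hypothetical names chosen for this example. Passing '-'
as the output file makes pdftotext write to standard output.

```perl
use strict;
use warnings;

# Command to run; assumes pdftotext is installed and on PATH.
our $PDFTOTEXT = 'pdftotext';

# Hypothetical helper: run pdftotext on one file and return its text.
sub pdf_to_text {
    my ($pdf_path) = @_;

    # The list form of a pipe open bypasses the shell, so filenames with
    # spaces or metacharacters are passed through unchanged.
    open(my $fh, '-|', $PDFTOTEXT, $pdf_path, '-')
        or die "Cannot run $PDFTOTEXT: $!";

    # Slurp the whole output in one read.
    my $text = do { local $/; <$fh> };

    # close() on a pipe returns false if the command exited non-zero.
    close($fh)
        or die "$PDFTOTEXT failed on $pdf_path (status $?)";

    return $text;
}

# Usage: perl pdf2text.pl some.pdf
print pdf_to_text($ARGV[0]) if @ARGV;
```

The list-form open is a deliberate choice over backticks: there is no
shell quoting to get wrong, and a failed run dies instead of silently
returning nothing.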

--
Just my 0.00000002 million dollars worth,
Shawn

"For the things we have to learn before we can do them, we learn by
doing them."
Aristotle