| DJ Stunks 2007-01-06, 7:04 pm |
| Wagner, David --- Senior Programmer Analyst --- WGO wrote:
> I have tried both PDF::API2 and CAM::PDF and I must be
> misunderstanding how to use these modules. Here is the way
> I attempted using CAM::PDF
>
> Source portion:
> …
> use CAM::PDF;
> …………
>
> $MyPDF = CAM::PDF->new($MyFileIn); # a PDF file which has text
>
> $MyPDFPgCnt = $MyPDF->numPages();
>
> my $contentTree = $MyPDF->getPageContentTree(1);
> $contentTree->render("CAM::PDF::Renderer::Text");
>
> I get a lot of blank lines and the characters I do get, look like:
>
> 3 U L Q W ♥ ' D W H ↔ ♥ ¶ § ↕ § § ↕ § ‼ ‼ ↓
>
>
> & K L O G ♥ $ F F R X Q W V
> 7 L P H ↔ ♥ ¶ § ↔ ¶ ∟ 3 0
I think your use of render() isn't right. This seems to work for me:
#!/usr/bin/perl
use strict;
use warnings;

use CAM::PDF;
use CAM::PDF::PageText;

my $filename = shift || die "Supply pdf on command line\n";
my $pdf = CAM::PDF->new($filename);

print text_from_page(1);

# Render the content tree of the given page as plain text
sub text_from_page {
    my $pg_num = shift;

    return CAM::PDF::PageText->render($pdf->getPageContentTree($pg_num));
}
__END__
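
If you want the text of every page rather than just the first, something along these lines might do it. This is an untested sketch, not a definitive recipe: it reuses numPages() from your snippet for the page count, and it assumes that new() returns a false value when the file can't be read and that getPageContentTree() returns nothing for a page it can't parse.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use CAM::PDF;
use CAM::PDF::PageText;

my $filename = shift || die "Supply pdf on command line\n";

# Assumption: new() returns a false value if the file cannot be read
my $pdf = CAM::PDF->new($filename)
    or die "Could not open $filename as a PDF\n";

# Walk the pages in order, starting at page 1 as in the script above
for my $pg_num (1 .. $pdf->numPages()) {
    my $tree = $pdf->getPageContentTree($pg_num);

    # Assumption: a page with no parsable content yields nothing here
    next if !$tree;

    print "--- page $pg_num ---\n";
    print CAM::PDF::PageText->render($tree), "\n";
}
```

Run it the same way as the script above, with the PDF as the only argument. The "--- page N ---" separator lines are only there to make the output easier to read.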
HTH,
-jp
|