Author pdf file to image file.
Ravi

2004-08-10, 3:55 pm

Hi everyone,
I was wondering if there's any way to convert a PDF file to image
files (JPEG, BMP or TIFF). I know there are a number of tools available,
but I was wondering what the logic behind such a conversion is.

Any help would be greatly appreciated.

Thank you,
Ravi.
Ronny Ong

2004-08-10, 3:55 pm

Rendering PDF input is much harder than generating PDF output. When you
generate output as PDF, your code only needs to implement the PDF structures
that your app supports, but if you need to render PDF files created by other
apps, then you have to be prepared to deal with many, if not all, possible
structures. This essentially involves implementing a subset of a PostScript
RIP (Raster Image Processor). Most users are only "aware" of a tiny fraction
of the possible features in Adobe's PDF Specification (current spec is 1.5,
which corresponds to Acrobat 6), but any given file may include dozens of
features that the user was unaware of. Unless you've got thousands of
person-hours to spend, don't try to implement a RIP from scratch.

One solution is to use Adobe's SDK. Obviously, this is the "official"
approach. There is an example of how to do this with VB .NET or C# at:
http://www.codeproject.com/dotnet/pdfthumbnail.asp

You could also use Ghostscript to render the PDF to a GDI+ bitmap which you
can then manipulate with any of the GDI+ functions. Microsoft covers the
C/C++ API for GDI+ in the Platform SDK, and the System.Drawing namespace
wraps GDI+ in the .NET Framework SDK. If you're using a COM language like
VB6, you can find wrappers for GDI+ such as the one at:
http://www.vbaccelerator.com/home/V...per/article.asp

To do the actual call to Ghostscript, here's a C++ sample:
http://www.codeproject.com/vcpp/gdi...hostwrapper.asp

If you're going to resort to including Ghostscript in your app, there's no
point in going through GDI+ unless you need to do some sort of manipulation
before writing the image out. That's because Ghostscript already includes the
ability to export in common image file formats. If you want to use that from
a COM language or script, you can use a simple wrapper such as:
http://community.wow.net/grt/comgs.html
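As a rough illustration of the direct-export route, here is a minimal Python sketch that shells out to the Ghostscript command-line program instead of going through a COM wrapper. The switches (-sDEVICE, -r, -sOutputFile, -dBATCH, -dNOPAUSE, -dSAFER) and the device names are the ones documented for recent Ghostscript releases; the function names, the output naming pattern and the default resolution are assumptions made for this example, and the executable name varies by platform (gs, gswin32c, gswin64c).

```python
import shutil
import subprocess

# Ghostscript output devices for the formats asked about in the thread.
DEVICES = {"jpeg": "jpeg", "bmp": "bmp16m", "tiff": "tiff24nc", "png": "png16m"}
EXTENSIONS = {"jpeg": "jpg", "bmp": "bmp", "tiff": "tif", "png": "png"}


def build_gs_command(pdf_path, out_prefix, fmt="jpeg", dpi=150, gs_exe="gs"):
    """Return the argument list that renders every page of pdf_path.

    build_gs_command is a hypothetical helper for this sketch, not part
    of Ghostscript itself.
    """
    if fmt not in DEVICES:
        raise ValueError(f"unsupported format: {fmt}")
    return [
        gs_exe,
        "-dSAFER", "-dBATCH", "-dNOPAUSE",      # non-interactive batch run
        f"-sDEVICE={DEVICES[fmt]}",             # raster output device
        f"-r{dpi}",                             # resolution in dots per inch
        # %03d is expanded by Ghostscript to the page number (one file per page).
        f"-sOutputFile={out_prefix}-%03d.{EXTENSIONS[fmt]}",
        pdf_path,
    ]


def convert(pdf_path, out_prefix, fmt="jpeg", dpi=150):
    """Run Ghostscript if an executable can be found on PATH."""
    gs_exe = (shutil.which("gs") or shutil.which("gswin64c")
              or shutil.which("gswin32c"))
    if gs_exe is None:
        raise RuntimeError("Ghostscript executable not found on PATH")
    command = build_gs_command(pdf_path, out_prefix, fmt, dpi, gs_exe)
    subprocess.run(command, check=True)
```

Calling convert("input.pdf", "page", fmt="tiff", dpi=300) would then ask Ghostscript to write one TIFF per page, named page-001.tif, page-002.tif and so on. Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.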

If there's an academic reason you want to do all the rendering yourself from
scratch, there are a tiny number of open source projects you could look at.
Here's one in Java:
http://multivalent.sourceforge.net/format/pdf/PDF.html



Ravi

2004-08-11, 3:57 am

Hi Ronny,
That's one of the most comprehensive replies I've ever seen. I guess it
presents almost all possible approaches to this problem. Thanks a lot.

I've studied each of those approaches, and mostly they lead to
Ghostscript version 5.5, especially to a DLL file named gsdll32.dll.
This is free to acquire, but I guess I cannot use it freely for commercial
purposes. I'm not sure if there's anything out there that I could use in my
commercial product.

I still couldn't quite follow the logic behind such conversions, but I
guess I need to delve deeper into the PDF structure to understand this.
Do you know if there's any material that describes this? I couldn't find
the right material in the last link that you've given.


Your reply was great.

Thanks a lot,
Ravi.

Ronny Ong

2004-08-11, 3:57 am

"Ravi" <Ravi@discussions.microsoft.com> wrote in message
news:2B92DB8F-295B-45BC-9CD7-F52C7580E453@microsoft.com...
> I've studied each of those approaches, and mostly they lead to
> Ghostscript version 5.5, especially to a DLL file named gsdll32.dll.
> This is free to acquire, but I guess I cannot use it freely for
> commercial purposes. I'm not sure if there's anything out there that I
> could use in my commercial product.


I'm not sure if you're saying that you need a free solution to use in a
commercial product, or if you think there's no way to use Ghostscript in a
commercial product at all. Ghostscript can be used in a commercial product,
as long as you obtain a commercial license from Artifex for it. It's been a
few years since I spoke to them, but they were fairly flexible in terms of
pricing. For the most part, they based the license price on the potential
revenue you would get from your product.

> I still couldn't quite follow the logic behind such conversions, but I
> guess I need to delve deeper into the PDF structure to understand this.
> Do you know if there's any material that describes this? I couldn't find
> the right material in the last link that you've given.


By "last link" do you mean the one for Multivalent? That was just an example
of open source code that (mostly) implements PDF rendering. The link I gave
was for the doc page describing the PDF aspects of Multivalent (including
the PDF features that it hasn't finished implementing), but you'd need to
jump over to the SourceForge files page to download the Java source code
zip, extract it, and study the relevant portions. That download is at:
http://sourceforge.net/project/show...?group_id=44509

You can also download the PDF file format specification from:
http://partners.adobe.com/asn/tech/...cifications.jsp
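To give a feel for the "logic" being asked about: per the PDF specification, each page's drawing instructions live in a content stream written in postfix form, with operands first and then an operator (for example "10 0 l" means "line to 10,0"). A renderer interprets those operators and paints the resulting shapes into a pixel buffer. The toy Python sketch below handles only four path operators (m, l, h, S) and assumes an already-decompressed, whitespace-separated stream. The function name is made up for this example, and fonts, images, colour spaces, clipping and transforms, which are where most of the real work lies, are left out entirely.

```python
def interpret_paths(content):
    """Collect stroked line segments from a tiny subset of PDF path operators.

    Handles only: m (moveto), l (lineto), h (closepath), S (stroke).
    interpret_paths is a hypothetical name used for this sketch.
    """
    stack = []       # operands waiting for their operator
    current = None   # current point
    start = None     # first point of the current subpath
    pending = []     # segments built since the last painting operator
    stroked = []     # segments that were actually painted

    for token in content.split():
        try:
            stack.append(float(token))
            continue
        except ValueError:
            pass  # not a number, so treat the token as an operator

        if token == "m":
            y, x = stack.pop(), stack.pop()
            current = start = (x, y)
        elif token == "l":
            y, x = stack.pop(), stack.pop()
            pending.append((current, (x, y)))
            current = (x, y)
        elif token == "h":
            # Close the subpath with a segment back to its starting point.
            if current != start:
                pending.append((current, start))
            current = start
        elif token == "S":
            # Painting operator: the path built so far becomes visible.
            stroked.extend(pending)
            pending = []
        else:
            raise NotImplementedError(f"operator {token!r} is outside this sketch")
        stack.clear()
    return stroked
```

For instance, interpret_paths("0 0 m 10 0 l 10 10 l h S") yields the three edges of a triangle, while the same path without the final S yields nothing, because a path that is never painted leaves no marks on the page. A real renderer would then rasterize each segment at the requested resolution and write the pixel buffer out as JPEG, BMP or TIFF.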

