Code Comments
Hi everyone,

I was wondering if there's any way to convert a PDF file to image files (JPEG, BMP or TIFF). I know there are a number of tools available out there, but I was wondering what the logic behind such a conversion is. Any help would be greatly appreciated.

Thank you,
Ravi
Rendering PDF input is much harder than generating PDF output. When you generate output as PDF, your code only needs to implement the PDF structures that your app supports, but if you need to render PDF files created by other apps, then you have to be prepared to deal with many or all of the possible structures. This essentially involves implementing a subset of a PostScript RIP (Raster Image Processor). Most users are only "aware" of a tiny fraction of the features in Adobe's PDF Specification (the current spec is 1.5, which corresponds to Acrobat 6), but any given file may include dozens of features that the user was unaware of. Unless you've got thousands of person-hours to spend, don't try to implement a RIP from scratch.

One solution is to use Adobe's SDK. Obviously, this is the "official" approach. There is an example of how to do this with VB .NET or C# at:
http://www.codeproject.com/dotnet/pdfthumbnail.asp

You could also use Ghostscript to render the PDF to a GDI+ bitmap, which you can then manipulate with any of the GDI+ functions. Microsoft covers the C/C++ API for GDI+ in the Platform SDK, and the System.Drawing namespace wraps GDI+ in the .NET Framework SDK. If you're using a COM language like VB6, you can find wrappers for GDI+ such as the one at:
http://www.vbaccelerator.com/home/V...article.asp

To do the actual call to Ghostscript, here's a C++ sample:
http://www.codeproject.com/vcpp/gdi...hostwrapper.asp

If you're going to resort to including Ghostscript in your app, there's no point in going through GDI+ unless you need to do some sort of manipulation before writing the image out. That's because Ghostscript can already export to common image file formats.
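As a rough illustration of the Ghostscript route, here is a minimal sketch that shells out to the Ghostscript command-line executable instead of calling gsdll32.dll through its API, so no wrapper is needed. The device names (`jpeg`, `bmp16m`, `tiff24nc`, `png16m`) and the `-dSAFER`, `-dBATCH`, `-dNOPAUSE`, `-sDEVICE`, `-r` and `-sOutputFile` switches are standard Ghostscript options, but which devices are compiled in depends on the build, and the executable names (`gs`, `gswin64c`, `gswin32c`) are assumptions about your install. `DEVICES`, `ghostscript_args` and `convert` are hypothetical names used only for this example.

```python
import shutil
import subprocess

# Hypothetical mapping from the formats asked about to Ghostscript output
# devices (24-bit color variants). Availability depends on the Ghostscript build.
DEVICES = {
    "jpeg": "jpeg",      # 24-bit color JPEG
    "bmp": "bmp16m",     # 24-bit color BMP
    "tiff": "tiff24nc",  # 24-bit color TIFF, uncompressed
    "png": "png16m",     # 24-bit color PNG (lossless alternative)
}


def ghostscript_args(pdf_path, fmt="jpeg", dpi=150, out_pattern=None):
    """Build a Ghostscript argument list that renders every page of pdf_path."""
    if fmt not in DEVICES:
        raise ValueError(f"unsupported format: {fmt}")
    if out_pattern is None:
        # Ghostscript replaces %03d with the page number: 001, 002, ...
        out_pattern = f"page-%03d.{fmt}"
    return [
        "-dSAFER",                   # restrict file operations by the document
        "-dBATCH",                   # exit when done instead of waiting for input
        "-dNOPAUSE",                 # do not prompt between pages
        f"-sDEVICE={DEVICES[fmt]}",  # output image format
        f"-r{dpi}",                  # rendering resolution in dots per inch
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def convert(pdf_path, fmt="jpeg", dpi=150, gs_exe=None):
    """Run Ghostscript as an external process; raises if it exits non-zero."""
    # Assumed executable names: "gs" on Unix-like systems, gswin64c/gswin32c on Windows.
    exe = gs_exe or shutil.which("gs") or shutil.which("gswin64c") or shutil.which("gswin32c")
    if exe is None:
        raise FileNotFoundError("Ghostscript executable not found on PATH")
    subprocess.run([exe, *ghostscript_args(pdf_path, fmt, dpi)], check=True)
```

For example, `convert("input.pdf", "tiff", dpi=300)` would write page-001.tiff, page-002.tiff, and so on into the current directory. Keeping the argument list in its own function makes it easy to inspect, or to pass the same switches to a DLL wrapper that accepts an argv-style array.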
If you want to use Ghostscript's image export from a COM language or script, you can use a simple wrapper such as:
http://community.wow.net/grt/comgs.html

If there's an academic reason you want to do all the rendering yourself from scratch, there are only a handful of open source projects you could look at. Here's one in Java:
http://multivalent.sourceforge.net/format/pdf/PDF.html

"Ravi" <Ravi@discussions.microsoft.com> wrote in message news:D0D870AA-8F7F-4984-8939-A14DA374B153@microsoft.com...
> I was wondering if there's any way to convert a pdf file to image files (jpeg, bmp or tiff). I know there are a number of tools available out there, but, was wondering what the logic is behind such a conversion.
Hi Ronny,

That's one of the most comprehensive replies I've ever seen. I guess it covers almost every possible approach to this problem. Thanks a lot.

I've studied each of those approaches, and mostly they lead to GhostScript version 5.5, especially to a DLL file named gsdll32.dll. This is free to acquire, but I guess I cannot use it freely for commercial purposes. I'm not sure if there's anything out there that I could use in my commercial product.

I still couldn't quite follow the logic behind such conversions, but I guess I need to delve deeper into the PDF structure to understand this. Do you know if there's any material that describes this? I couldn't find the right material in the last link that you've given.

Your reply was great. Thanks a lot,
Ravi

"Ronny Ong" wrote:
> [snip]
"Ravi" <Ravi@discussions.microsoft.com> wrote in message news:2B92DB8F-295B-45BC-9CD7-F52C7580E453@microsoft.com...
> I've studied each of those approaches, and mostly they lead to GhostScript version 5.5, especially to a DLL file named gsdll32.dll. This is free to acquire, but I guess I cannot use it freely for commercial purposes. I'm not sure if there's anything out there that I could use in my commercial product.

I'm not sure if you're saying that you need a free solution to use in a commercial product, or if you think there's no way to use Ghostscript in a commercial product at all. Ghostscript can be used in a commercial product, as long as you obtain a commercial license for it from Artifex. It's been a few years since I spoke to them, but they were fairly flexible in terms of pricing. For the most part, they based the license price on the potential revenue you would get from your product.

> I still couldn't quite follow the logic behind such conversions, but I guess I need to delve deeper into the PDF structure to understand this. Do you know if there's any material that describes this? I couldn't find the right material in the last link that you've given.

By "last link", do you mean the one for Multivalent? That was just an example of open source code that (mostly) implements PDF rendering. The link I gave was for the documentation page describing the PDF aspects of Multivalent (including the PDF features that it hasn't finished implementing), but you'd need to go to the SourceForge files page to download the Java source code zip, extract it, and study the relevant portions. That download is at:
http://sourceforge.net/project/show...?group_id=44509

You can also download the PDF file format specification from:
http://partners.adobe.com/asn/tech/...cifications.jsp
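To give a rough feel for the "logic behind such a conversion" Ravi asked about: each PDF page is described by a content stream, a sequence of operands followed by operators, and a renderer interprets those operators to paint pixels into a bitmap, which is then encoded as an image file. The toy sketch below handles only three real operators from the PDF specification (`rg` sets the fill color, `re` appends a rectangle to the path, `f` fills the path) and writes the result as a 24-bit BMP. It assumes one unit equals one pixel and ignores everything that makes real rendering hard: parsing the object and cross-reference structure, decompressing streams, the transformation matrix, curves, clipping, fonts, images and color spaces. `rasterize` and `to_bmp` are hypothetical names; this is not how Ghostscript or Adobe's SDK is implemented.

```python
import struct


def rasterize(content, width, height):
    """Toy interpreter for a tiny subset of PDF content-stream operators.

    Supported: 'r g b rg' (fill color, components 0..1), 'x y w h re'
    (append rectangle) and 'f' (fill). Assumes 1 unit = 1 pixel, origin at
    the bottom-left as in PDF user space; any other token is assumed numeric.
    """
    pixels = [[(255, 255, 255)] * width for _ in range(height)]  # row 0 = bottom
    stack, path, color = [], [], (0, 0, 0)
    for token in content.split():
        if token == "rg":
            r, g, b = stack[-3:]
            color = tuple(round(c * 255) for c in (r, g, b))
            stack.clear()
        elif token == "re":
            path.append(tuple(stack[-4:]))
            stack.clear()
        elif token == "f":
            # Paint every rectangle in the current path, clipped to the page.
            for x, y, w, h in path:
                for py in range(max(0, int(y)), min(height, int(y + h))):
                    for px in range(max(0, int(x)), min(width, int(x + w))):
                        pixels[py][px] = color
            path.clear()
        else:
            stack.append(float(token))  # operands precede their operator
    return pixels


def to_bmp(pixels):
    """Encode rows (bottom row first) as an uncompressed 24-bit BMP."""
    height, width = len(pixels), len(pixels[0])
    row_size = (width * 3 + 3) & ~3  # each row is padded to a multiple of 4 bytes
    image_size = row_size * height
    # 14-byte file header + 40-byte BITMAPINFOHEADER = 54-byte pixel offset.
    header = struct.pack("<2sIHHI", b"BM", 54 + image_size, 0, 0, 54)
    info = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                       image_size, 2835, 2835, 0, 0)  # 2835 px/m is about 72 DPI
    body = bytearray()
    for row in pixels:  # positive height means rows are stored bottom-up
        for r, g, b in row:
            body += bytes((b, g, r))  # BMP pixel byte order is BGR
        body += b"\x00" * (row_size - width * 3)
    return header + info + bytes(body)
```

For example, `to_bmp(rasterize("1 0 0 rg 2 2 4 3 re f", 8, 6))` returns the bytes of an 8x6 BMP containing a red 4x3 block, ready to write to a file opened in `"wb"` mode. Conveniently, PDF user space and a positive-height BMP both put row 0 at the bottom, so no vertical flip is needed.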