Adobe zlib compression aggressiveness
| byaarov@yahoo.com 2007-04-05, 6:56 pm |
| Hi,
In a PDF file, for the objects that use the FlateDecode filter, zlib
compression is used. As far as I know, they use the actual zlib
library itself.
However, they must have some special tuning going on, since I can
never create a deflate stream that is as small as (let alone smaller
than) the Adobe-generated deflate stream.
I am using zlib's maximum compression settings (well, the maximum level,
a memory level of 8 and a window size of 15 bits).
I don't see them using any dictionaries either, so I don't think that's
the issue.
My question is: what other zlib parameters could they have tweaked
in order to get such an efficient stream?
B
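One way to check whether any setting zlib exposes can match a given stream is a brute-force sweep. The sketch below assumes Python's standard `zlib` module, which exposes the level, window bits, memLevel (up to 9, with 8 as the default) and strategy, but not `deflateTune()`. It compresses at level 9 with every combination and keeps the smallest raw deflate stream. The name `smallest_deflate` and the sample data are illustrative only, and nothing here reproduces whatever Adobe actually does.

```python
import itertools
import zlib

STRATEGIES = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "filtered": zlib.Z_FILTERED,
    "huffman_only": zlib.Z_HUFFMAN_ONLY,
    "rle": zlib.Z_RLE,      # needs zlib >= 1.2.0.1
    "fixed": zlib.Z_FIXED,  # needs zlib >= 1.2.2.2
}

def smallest_deflate(data: bytes):
    """Compress data at level 9 with every memLevel (1-9), window size
    (2**9 to 2**15) and strategy, returning the smallest raw deflate stream
    and the settings that produced it. Raw deflate (negative wbits) leaves
    out the zlib header and trailer, so only the deflate data is compared."""
    best_stream, best_settings = None, None
    for mem_level, wbits, (name, strategy) in itertools.product(
            range(1, 10), range(9, 16), STRATEGIES.items()):
        c = zlib.compressobj(9, zlib.DEFLATED, -wbits, mem_level, strategy)
        stream = c.compress(data) + c.flush()
        if best_stream is None or len(stream) < len(best_stream):
            best_stream = stream
            best_settings = {"memLevel": mem_level, "wbits": wbits, "strategy": name}
    return best_stream, best_settings

# Made-up sample resembling a PDF content stream.
data = b"BT /F1 12 Tf 72 712 Td (Hello, world) Tj ET\n" * 100
stream, settings = smallest_deflate(data)
print(len(stream), settings)
```

To compare against an Adobe-generated stream, strip its two-byte zlib header and four-byte Adler-32 trailer first. If the best sweep result is still larger, the gap comes from something these parameters cannot reach.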
| |
| cr88192 2007-04-05, 6:56 pm |
|
<byaarov@yahoo.com> wrote in message
news:1175793467.037145.322610@p77g2000hsh.googlegroups.com...
> My question is, what other parameters could they have tweaked of zlib
> in order to get such an efficient stream?
>
If they are in fact using zlib, it is possible they have customized the string
matcher to be better suited to the kinds of data present.
To take another example, consider PNG:
in my experience, my deflater, when tuned for PNG (speed/ratio), typically performs worse
on typical data (text and binary data);
when tuned for ordinary data, its speed and ratio on PNG leave
a little to be desired.
Now, it is possible that my algorithm just isn't tuned all that well in general,
but at times I have considered having a customized version for PNG encoding
(likely emphasizing speed more than ratio, though, and probably based on a
modified earlier version of my encoder).
Dunno; this is only a simple example. Who knows what Adobe might have done.
| |
| Mark Adler 2007-04-05, 6:56 pm |
| On Apr 5, 10:17 am, byaa...@yahoo.com wrote:
> My question is, what other parameters could they have tweaked of zlib
> in order to get such an efficient stream?
They could be using the deflateTune() function, or equivalently they
may have modified the compression levels table that sets four
parameters (good_length, max_lazy, nice_length, max_chain) for each compression level. Alternatively,
they could have rewritten part of deflate to do string matching
better, or pick better matches out of those found. Or they may simply
be using a heuristic to decide when to flush deflate blocks, which can
also provide better compression.
Mark
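Of the options above, only the block-flushing idea can be tried directly from Python's `zlib` module, which does not expose `deflateTune()`. The sketch below is a crude, assumed stand-in for such a heuristic. The caller picks the boundaries (here, where the data changes character), and `flush(zlib.Z_BLOCK)` ends the current deflate block at that point, so the next block gets its own Huffman codes. Unlike `Z_FULL_FLUSH`, `Z_BLOCK` keeps the match history. The function name is hypothetical.

```python
import zlib

def deflate_with_block_splits(chunks, level=9):
    """Compress chunks into one zlib stream, ending the current deflate block
    at each chunk boundary. Z_BLOCK does not reset the sliding window, so
    later chunks can still match against earlier ones."""
    c = zlib.compressobj(level)
    parts = []
    for chunk in chunks:
        parts.append(c.compress(chunk))
        parts.append(c.flush(zlib.Z_BLOCK))  # close the current block here
    parts.append(c.flush())  # Z_FINISH: final block plus Adler-32 trailer
    return b"".join(parts)

text = b"the quick brown fox jumps over the lazy dog\n" * 200
table = bytes(range(256)) * 16
split = deflate_with_block_splits([text, table, text])
plain = zlib.compress(text + table + text, 9)
print(len(split), len(plain))  # which one is smaller depends on the data
```

zlib already ends a block on its own whenever its internal symbol buffer (sized by memLevel) fills up. Forced boundaries can therefore help or hurt, so measure both sizes rather than assuming the split version wins.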
| |
| byaarov@yahoo.com 2007-04-06, 6:56 pm |
| On Apr 5, 1:43 pm, "Mark Adler" <mad...@alumni.caltech.edu> wrote:
> They could be using the deflateTune() function, or equivalently they
> may have modified the compression levels table that sets four
> parameters to some value for each compression level. Alternatively,
> they could have rewritten part of deflate to do string matching
> better, or pick better matches out of those found. Or they may simply
> be using a heuristic to decide when to flush deflate blocks, which can
> also provide better compression.
I read chapter 3 of the PDF specification, which describes
some predictors they use for images and text. For images, they say
they use PNG predictors, and for text they have some way of seeding
the Huffman tables. I understand the impact of that algorithmically,
but in zlib, how does one provide predictor functions? Is this done
via deflateTune()?
Also, any custom predictor functions are only used during compression,
right? Any decompression routine, even without that prediction logic,
should be able to inflate the stream, I suppose?
B
| |
| cr88192 2007-04-06, 9:56 pm |
|
<byaarov@yahoo.com> wrote in message
news:1175888506.962950.226600@p77g2000hsh.googlegroups.com...
> I read the PDF specification chapter 3, and they describe somewhat
> some predictors they use for images and text. For images, they say
> they use PNG predictors and for text, they have some way of seeding
> the huffman tables. I understand the impact of that algorithmically,
> but in zlib, how does one provide predictor functions? Is this done
> via deflateTune()?
>
What makes you so certain that they used zlib to begin with?
After all, deflate is simple enough, and common enough, that they may well
have just implemented a custom compressor (even if the stream has a 'zlib
header', that says close to nothing...).
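This point can be checked directly. Per RFC 1950, a zlib stream opens with two bytes, CMF and FLG. They record only the compression method, the window size, whether a preset dictionary is required (FDICT), and a 2-bit FLEVEL hint that is not needed for decompression, so any encoder can set it however it likes. The hypothetical helper below decodes these fields. It also offers a quick way to confirm the OP's observation that no dictionary is in use.

```python
import zlib

FLEVEL_NAMES = ("fastest", "fast", "default", "maximum")

def parse_zlib_header(stream: bytes) -> dict:
    """Decode the CMF and FLG bytes that open a zlib stream (RFC 1950)."""
    if len(stream) < 2:
        raise ValueError("a zlib header is two bytes long")
    cmf, flg = stream[0], stream[1]
    if (cmf * 256 + flg) % 31:
        raise ValueError("FCHECK failed: not a zlib header")
    return {
        "method": cmf & 0x0F,                  # 8 = deflate
        "window_bits": (cmf >> 4) + 8,         # CINFO = log2(window size) - 8
        "has_dictionary": bool(flg & 0x20),    # FDICT: preset dictionary needed
        "level_hint": FLEVEL_NAMES[flg >> 6],  # FLEVEL: informational only
    }

print(parse_zlib_header(zlib.compress(b"hello", 9)))
# {'method': 8, 'window_bits': 15, 'has_dictionary': False, 'level_hint': 'maximum'}
```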
> Also, any custom predictor functions are only used during compression
> right? Any decompression routine even with out that prediction logic
> should be able to inflate the stream I suppose?
>
Potentially, but a lot depends on what kind of predictor.
In PNG, the predictors are applied to the image before deflating it and
undone after inflating it (and are thus not part of the mechanics of
deflate, but a separate stage).
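To make the "separate stage" concrete, the sketch below applies the PNG Up predictor (filter type 2, which PDF also allows for FlateDecode streams via Predictor 12) before deflating and reverses it after inflating. It assumes 8-bit samples, a fixed row length, and filter type 2 only. The helper names are hypothetical. A plain inflater hands back the filtered bytes, including one tag byte per row, so a decoder that skips the unfilter step gets the wrong pixels.

```python
import zlib

def png_up_filter(pixels: bytes, row_len: int) -> bytes:
    """Apply the PNG Up predictor (filter type 2) to rows of row_len bytes.
    Each output row is prefixed with its filter-type byte, as PNG requires."""
    if len(pixels) % row_len:
        raise ValueError("pixel data is not a whole number of rows")
    out = bytearray()
    prev = bytes(row_len)  # the row above the first row counts as zeros
    for i in range(0, len(pixels), row_len):
        row = pixels[i:i + row_len]
        out.append(2)
        out.extend((x - up) & 0xFF for x, up in zip(row, prev))
        prev = row
    return bytes(out)

def png_up_unfilter(filtered: bytes, row_len: int) -> bytes:
    """Invert png_up_filter; a decoder must run this after inflating."""
    out = bytearray()
    prev = bytes(row_len)
    for i in range(0, len(filtered), row_len + 1):
        if filtered[i] != 2:
            raise ValueError("this sketch only handles filter type 2 (Up)")
        row = filtered[i + 1:i + 1 + row_len]
        cur = bytes((x + up) & 0xFF for x, up in zip(row, prev))
        out.extend(cur)
        prev = cur
    return bytes(out)

# A smooth synthetic "image": each row is the previous row plus one.
pixels = bytes((x * 7 + y) & 0xFF for y in range(64) for x in range(64))
stream = zlib.compress(png_up_filter(pixels, 64), 9)
assert png_up_unfilter(zlib.decompress(stream), 64) == pixels
```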
As for custom Huffman tables, yes: these will not affect a decoder, but
they can help with encoding.
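The encoder's freedom here is easy to demonstrate. Every deflate block header states whether the block is stored, uses the fixed codes, or carries its own dynamic Huffman code description, so a stock inflater needs no extra information. The sketch below assumes Python's `zlib` module. It compresses the same data with three strategies (`Z_FIXED` forces the predefined codes, and `Z_HUFFMAN_ONLY` skips string matching) and decodes each result with plain `zlib.decompress`.

```python
import zlib

def compress_with(data: bytes, strategy: int) -> bytes:
    """Level-9 zlib compression with a chosen deflate strategy."""
    c = zlib.compressobj(9, zlib.DEFLATED, 15, 9, strategy)
    return c.compress(data) + c.flush()

data = b"the quick brown fox jumps over the lazy dog. " * 50
for name in ("Z_DEFAULT_STRATEGY", "Z_FIXED", "Z_HUFFMAN_ONLY"):
    stream = compress_with(data, getattr(zlib, name))
    # Same stock decoder for all three: the block headers carry what it needs.
    assert zlib.decompress(stream) == data
    print(name, len(stream))
```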
| |
| Mark Adler 2007-04-07, 7:56 am |
| On Apr 6, 12:41 pm, byaa...@yahoo.com wrote:
> I understand the impact of that algorithmically,
> but in zlib, how does one provide predictor functions?
The only ways that come to mind are to a) provide a dictionary, or b)
pre-process the input, e.g. by tokenizing words, to help zlib find
matches at the next level of structure in the data.
> Also, any custom predictor functions are only used during compression
> right? Any decompression routine even with out that prediction logic
> should be able to inflate the stream I suppose?
They would need to be used during decompression as well to either undo
the processing or to provide the same dictionary at the other end.
Mark
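Option (a), and why it binds the decoder, can be sketched with the `zdict` arguments of Python's `zlib.compressobj` and `zlib.decompressobj`. The dictionary contents below are a made-up assumption (a few PDF-like tokens), not anything Adobe or the PDF specification defines. The resulting stream sets the FDICT bit and records the dictionary's Adler-32, so inflating without the identical dictionary fails.

```python
import zlib

# Hypothetical preset dictionary: byte strings the encoder expects to recur.
# zlib's documentation suggests putting the most common strings near the end.
PDF_DICT = b"/Filter /FlateDecode /Length stream endstream endobj /Type /Page obj"

def deflate_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress data in zlib format using a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def inflate_with_dict(stream: bytes, zdict: bytes) -> bytes:
    """Decompress; the decoder must be handed the identical dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(stream) + d.flush()

sample = b"5 0 obj << /Type /Page /Length 44 /Filter /FlateDecode >> stream endstream endobj"
packed = deflate_with_dict(sample, PDF_DICT)
assert inflate_with_dict(packed, PDF_DICT) == sample
try:
    zlib.decompress(packed)  # no dictionary supplied
except zlib.error as exc:
    print("plain inflate fails:", exc)
```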
| |
| Mark Adler 2007-04-07, 7:56 am |
| On Apr 6, 5:13 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
> what makes you so certain that they used zlib to begin with?...
I know they were using zlib a few years ago, but I don't know for sure
whether they still are. However the same comments apply to the
possible differences between zlib's deflate and their possibly home-
grown deflate.
Mark
| |
| cr88192 2007-04-07, 7:56 am |
|
"Mark Adler" <madler@alumni.caltech.edu> wrote in message
news:1175921683.270589.123260@d57g2000hsg.googlegroups.com...
> On Apr 6, 5:13 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
>
> I know they were using zlib a few years ago, but I don't know for sure
> whether they still are. However the same comments apply to the
> possible differences between zlib's deflate and their possibly home-
> grown deflate.
>
Yes, that is a good enough answer...
As far as I could tell, though, the OP didn't know and was just assuming. In my
case, I didn't know one way or the other.