deflate algorithm used in zip
byaarov@yahoo.com

2008-01-07, 9:57 pm

Hi,
Am I correct in understanding that gzip uses zlib for the deflate
implementation, but unzip uses a different implementation and not
zlib?

If so, can zlib be used to inflate the deflate encoded stream in a zip
file?

cr88192

2008-01-08, 3:56 am


<byaarov@yahoo.com> wrote in message
news:2b780b1e-9d5a-4dc8-a7d7-209d7b2e4a1b@s19g2000prg.googlegroups.com...
> Hi,
> Am I correct in understanding that gzip uses zlib for the deflate
> implementation, but unzip uses a different implementation and not
> zlib?
>
> If so, can zlib be used to inflate the deflate encoded stream in a zip
> file?


zlib or non-zlib makes no real difference so long as the stream is
encoded/decoded correctly.



byaarov@yahoo.com

2008-01-08, 3:56 am

On Jan 8, 12:49 am, "cr88192" <cr88...@hotmail.com> wrote:
> <byaa...@yahoo.com> wrote in message
>
> news:2b780b1e-9d5a-4dc8-a7d7-209d7b2e4a1b@s19g2000prg.googlegroups.com...
>
>
>
> zlib or non-zlib makes no real difference so long as the stream is
> encoded/decoded correctly.



But I assume I would need to initialize zlib in a special way if I
want it to inflate the stream encoded in a zip file, correct? That is,
wouldn't zlib be looking for some zlib nuance in the stream unless
instructed otherwise?

Mark Adler

2008-01-08, 6:57 pm

On Jan 7, 7:37 pm, byaa...@yahoo.com wrote:
> If so, can zlib be used to inflate the deflate encoded stream in a zip
> file?


Yes. Look at contrib/minizip in the zlib distribution for an example.

Mark
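
A minimal sketch of the same idea in Python, whose standard `zlib` module wraps the library discussed in this thread. It is not the minizip code: the `zipfile` module is used only to locate the entry, and the helper names `inflate_zip_member` and `demo` are invented for illustration. The key assumption is that a zip member stored with the deflate method holds a raw deflate stream, so the inflater is created with a negative window-bits value.

```python
import io
import struct
import zipfile
import zlib


def inflate_zip_member(zip_bytes: bytes, name: str) -> bytes:
    """Inflate one deflate-compressed zip member using zlib directly."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)  # used only to find offsets and sizes
    if info.compress_type != zipfile.ZIP_DEFLATED:
        raise ValueError("member is not stored with the deflate method")

    # The local file header is 30 fixed bytes followed by the file name and
    # the extra field; both lengths are 16-bit values at offsets 26 and 28.
    header = zip_bytes[info.header_offset:info.header_offset + 30]
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    start = info.header_offset + 30 + name_len + extra_len
    raw = zip_bytes[start:start + info.compress_size]

    # Negative wbits selects raw deflate: no zlib header, no trailing checksum.
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    data = inflater.decompress(raw) + inflater.flush()
    if not inflater.eof:
        raise ValueError("deflate stream did not reach its end marker")
    return data


def demo() -> bytes:
    """Build a small archive in memory and inflate its only member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("hello.txt", b"hello deflate " * 20)
    return inflate_zip_member(buf.getvalue(), "hello.txt")
```

In the C API the equivalent step is calling inflateInit2() with a negative windowBits value instead of plain inflateInit().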

byaarov@yahoo.com

2008-01-08, 6:57 pm

On Jan 8, 6:19 am, Mark Adler <mad...@alumni.caltech.edu> wrote:
> On Jan 7, 7:37 pm, byaa...@yahoo.com wrote:
>
>
> Yes. Look at contrib/minizip in the zlib distribution for an example.
>
> Mark



Thank you! Also, I think I had asked this before but I don't think I
got an answer... I know that zlib params can be controlled/set to
produce different output zlib streams corresponding to the same input
sequence. Am I correct in understanding that the only tweakable
parameters for zlib are actually the compression level (0..9), memory
level (0..9) and possibly a dictionary?

Mark Adler

2008-01-08, 6:57 pm

On Jan 8, 9:14 am, byaa...@yahoo.com wrote:
> Am I correct in understanding that the only tweakable
> parameters for zlib are actually the compression level (0..9), memory
> level (0..9) and possibly a dictionary?


There are others such as the dictionary size and compression strategy,
as well as a finer level of control on the compression level using
deflateTune(). However they are unimportant to decompression. No
matter what parameters are used to generate the deflate stream, a
compliant deflate stream will be generated and inflate will be able to
decode it.

Mark
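
A rough illustration of that point, sketched with Python's `zlib` module. The particular settings are arbitrary choices for the demonstration, and the helpers `compress_raw` and `inflate_raw` are invented names. Python does not expose deflateTune(), so only level, window bits, memory level and strategy are varied. Note that zlib documents the memory level as 1..9.

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog. " * 50

# (level, wbits, memLevel, strategy); negative wbits means raw deflate.
settings = [
    (1, -15, 8, zlib.Z_DEFAULT_STRATEGY),
    (9, -15, 9, zlib.Z_DEFAULT_STRATEGY),
    (6, -9, 1, zlib.Z_FILTERED),          # small window, least memory
    (6, -15, 8, zlib.Z_HUFFMAN_ONLY),     # no string matching at all
    (0, -15, 8, zlib.Z_DEFAULT_STRATEGY)  # level 0: stored blocks only
]


def compress_raw(payload, level, wbits, mem_level, strategy):
    """Deflate `payload` with the given tuning parameters."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits, mem_level, strategy)
    return c.compress(payload) + c.flush()


def inflate_raw(stream):
    """Inflate a raw deflate stream; no compression parameters are needed."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    return d.decompress(stream) + d.flush()


streams = [compress_raw(data, *s) for s in settings]
results = [inflate_raw(s) for s in streams]
```

The compressed streams differ from one setting to the next, but the same inflater, created with no knowledge of the settings, recovers the input from each.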

byaarov@yahoo.com

2008-01-08, 6:57 pm

On Jan 8, 9:21 am, Mark Adler <mad...@alumni.caltech.edu> wrote:
> On Jan 8, 9:14 am, byaa...@yahoo.com wrote:
>
>
> There are others such as the dictionary size and compression strategy,
> as well as a finer level of control on the compression level using
> deflateTune(). However they are unimportant to decompression. No
> matter what parameters are used to generate the deflate stream, a
> compliant deflate stream will be generated and inflate will be able to
> decode it.
>
> Mark


This may be me using zlib incorrectly, but in some corner cases I
pass just one byte to inflate() (it is the last byte in my
stream). Essentially, what I am trying to do is detect deflate streams
in documents...

inflate() on this one byte (for example, 0x82 in hex)
returns Z_OK but there is no inflated output (perhaps because that
wasn't a byte that can lead to any inflated content).
My question is: when I call inflateEnd(), is there any way for me to
know that there were some bytes I had passed into inflate() that did
not result in meaningful output? That way I can shove that byte back
into the main stream and finish.

Is there a minimum length of content to pass to inflate() to be sure?

cr88192

2008-01-08, 6:57 pm


<byaarov@yahoo.com> wrote in message
news:3deee1da-826c-4ebb-938b-365c360ded73@e10g2000prf.googlegroups.com...
> On Jan 8, 12:49 am, "cr88192" <cr88...@hotmail.com> wrote:
>
>
> But I assume I would need to initialize zlib in a special way if I
> want it to inflate the stream encoded in a zip file, correct? That is,
> wouldn't zlib be looking for some zlib nuance in the stream unless
> instructed otherwise?


I am not sure what you are getting at, exactly...

the main difference between zlib and raw deflate is the presence of a
'zlib header' and an ending checksum (Adler-32; CRC-32 is what gzip uses).

I think different init functions are used.
nonetheless, zlib works fine in any case.

I can't really answer for too many of the details though, since I don't use
zlib personally (rather, typically my own custom inflate/deflate code).
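
For readers who want to see the wrappers side by side, here is a small sketch using Python's `zlib` module. The helper names `pack` and `unpack` are invented for illustration, and the 2-byte header size assumes no preset dictionary. The window-bits argument selects the container: 15 for the zlib format, -15 for raw deflate, 31 (16 + 15) for gzip.

```python
import zlib

data = b"deflate is deflate, whatever wraps it. " * 10


def pack(payload, wbits):
    """Compress `payload`; `wbits` chooses zlib, raw or gzip framing."""
    c = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return c.compress(payload) + c.flush()


def unpack(stream, wbits):
    """Decompress `stream`, expecting the framing selected by `wbits`."""
    d = zlib.decompressobj(wbits)
    return d.decompress(stream) + d.flush()


zlib_stream = pack(data, 15)    # 2-byte header, deflate data, Adler-32
raw_stream = pack(data, -15)    # deflate data only, as stored in zip files
gzip_stream = pack(data, 31)    # gzip header, deflate data, CRC-32, length

# The zlib trailer is a big-endian Adler-32 of the uncompressed data.
adler_trailer = int.from_bytes(zlib_stream[-4:], "big")
# The gzip trailer is a little-endian CRC-32 followed by the input length.
crc_trailer = int.from_bytes(gzip_stream[-8:-4], "little")

# Stripping the zlib wrapper by hand leaves data a raw inflater accepts.
inner = zlib_stream[2:-4]
```

In C, the same choice is made through the windowBits argument of inflateInit2() and deflateInit2().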



Mark Adler

2008-01-08, 6:57 pm

On Jan 8, 12:40 pm, byaa...@yahoo.com wrote:
> Is there a minimum length of content to pass to inflate() to be sure?


No, since it's possible for a random set of bytes to be a valid
deflate stream.

I fed one million random streams to inflate. About one in 200 random
sequences were valid deflate streams. In all but one case the valid
streams completed within 100 bytes. The one case was a valid random
~16K byte stream. The bad streams were all detected within 120 bytes
except for one case, which took ~40K bytes to find an error.

So in general if you feed, say, 128 bytes to the inflator and it eats
it all, doesn't complete, and doesn't complain about a data error,
then you probably have a good stream. At least you know that that
starting point is worth some more time to investigate as a candidate
deflate stream.

If you have a very short valid stream, it may not be what you're
looking for, so you will need another way to check.

Mark
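
The 128-byte heuristic can be sketched as follows in Python. This is not the program used for the experiment above: `probe` and `survey` are invented names, the three labels are arbitrary, and the streams are assumed to be raw deflate. No results are claimed for `survey`, which only shows how such a tally might be collected.

```python
import random
import zlib
from collections import Counter


def probe(candidate: bytes, limit: int = 128) -> str:
    """Classify the first `limit` bytes as a possible raw deflate stream."""
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    try:
        inflater.decompress(candidate[:limit])
    except zlib.error:
        return "error"      # inflate reported a data error
    if inflater.eof:
        return "complete"   # end-of-stream marker reached within the window
    return "plausible"      # all input eaten, no error, stream not finished


def survey(trials: int, length: int = 128, seed: int = 0) -> Counter:
    """Tally probe() results over pseudo-random byte strings."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        sample = bytes(rng.getrandbits(8) for _ in range(length))
        tally[probe(sample, length)] += 1
    return tally
```

A "plausible" result only marks the offset as worth more input. A "complete" result on a very short stream may still be a coincidence, so it needs a second check of some kind.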

byaarov@yahoo.com

2008-01-08, 6:57 pm

On Jan 8, 1:56 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:
> On Jan 8, 12:40 pm, byaa...@yahoo.com wrote:
>
>
> No, since it's possible for a random set of bytes to be a valid
> deflate stream.
>
> I fed one million random streams to inflate. About one in 200 random
> sequences were valid deflate streams. In all but one case the valid
> streams completed within 100 bytes. The one case was a valid random
> ~16K byte stream. The bad streams were all detected within 120 bytes
> except for one case, which took ~40K bytes to find an error.
>
> So in general if you feed, say, 128 bytes to the inflator and it eats
> it all, doesn't complete, and doesn't complain about a data error,
> then you probably have a good stream. At least you know that that
> starting point is worth some more time to investigate as a candidate
> deflate stream.
>
> If you have a very short valid stream, it may not be what you're
> looking for, so you will need another way to check.
>
> Mark


That makes sense. In the case where inflate() consumed X number of
bytes because they appeared valid, and my stream ended, what should my
course of action be? Since inflate() produced no output for those X
bytes and since the stream ended, should I just assume those X bytes
were perhaps not valid and copy the input (X bytes) to output as is?
Because otherwise as it stands, I am losing those X bytes.

Here is an example...
yyyyydddddddyyyyxxxx[EOF]

Where yyy are some invalid deflate bytes, which I copy from input to
output as is. I detect that by passing the string starting at the first
y to inflate(); since I get an error, I assume it's an invalid string,
copy the first y to output and left shift the string.

Eventually I get to the first d, which is the start of the first valid
deflate stream. inflate() handles the bytes and produces some output
for me, which I copy to the output stream. Then my file pointer is at
y, which I copy to output as before.

Then comes x, which is a partial deflate stream (xxxx itself is a
valid set of deflate bytes, but the stream xxxx in its entirety is
invalid). Since I feed bytes a few at a time to inflate(), I do not
know the length of xxxx. So those bytes get consumed by inflate(),
but my stream ends.

So my question is, what is the best method to detect that?

B

Mark Adler

2008-01-08, 6:57 pm

On Jan 8, 2:13 pm, byaa...@yahoo.com wrote:
> That makes sense. In the case where inflate() consumed X number of
> bytes because they appeared valid, and my stream ended, what should my
> course of action be?


I don't know anything about the format you are trying to decode, but I
can only guess that the format does not permit incomplete deflate
streams. In that case, the final X bytes that look like deflate data
but don't complete must not, in fact, be deflate data.

If you don't get a Z_STREAM_END out of zlib's inflate, then the stream
is not complete.

Mark
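
In Python's `zlib` wrapper the same test is available as the `eof` attribute of a decompression object, which becomes true only once the underlying inflate() has returned Z_STREAM_END. A minimal sketch, assuming raw deflate input; `inflate_fragment` is an invented helper name:

```python
import zlib


def inflate_fragment(fragment: bytes):
    """Return (output, finished, leftover) for a raw deflate fragment."""
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    output = inflater.decompress(fragment)
    # `eof` is True only after the end-of-stream marker has been seen;
    # `unused_data` holds any bytes that followed that marker.
    return output, inflater.eof, inflater.unused_data


# The lone byte 0x82 opens a fixed-Huffman block but does not carry
# enough bits for even one symbol, so it is consumed without output.
lone_byte = inflate_fragment(b"\x82")
```

Here `lone_byte` comes back with no output and a false `finished` flag, which is the signal that the consumed byte never formed a complete stream.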

byaarov@yahoo.com

2008-01-08, 6:57 pm

On Jan 8, 4:11 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:
> On Jan 8, 2:13 pm, byaa...@yahoo.com wrote:
>
>
> I don't know anything about the format you are trying to decode, but I
> can only guess that the format does not permit incomplete deflate
> streams. In that case, the final X bytes that look like deflate data
> but don't complete must not, in fact, be deflate data.
>
> If you don't get a Z_STREAM_END out of zlib's inflate, then the stream
> is not complete.
>
> Mark


That I think answers my question... so in this case I should probably
just go back and copy those xxxx bytes manually to the output...

Thanks...
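
The scanning approach worked out in this thread might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the poster's code: `unpack_embedded` and its `min_output` parameter are invented, the embedded streams are assumed to be raw deflate, and the whole document is assumed to fit in memory. Retrying at every offset is quadratic in the worst case.

```python
import zlib


def unpack_embedded(data: bytes, min_output: int = 1) -> bytes:
    """Inflate complete raw deflate streams found in `data`; copy the rest."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        inflater = zlib.decompressobj(-zlib.MAX_WBITS)
        try:
            inflated = inflater.decompress(data[pos:])
        except zlib.error:
            inflated = None  # data error: not a deflate stream at this offset
        # Accept the stream only if it ended properly (Z_STREAM_END).
        if inflated is not None and inflater.eof and len(inflated) >= min_output:
            out += inflated
            # unused_data is whatever followed the end-of-stream marker.
            pos = len(data) - len(inflater.unused_data)
        else:
            # Invalid or incomplete: copy one byte through and move on.
            out.append(data[pos])
            pos += 1
    return bytes(out)
```

An incomplete tail such as xxxx is copied through unchanged because it never reaches the end-of-stream marker. As noted earlier in the thread, random bytes occasionally form a short valid stream, so raising `min_output` or adding a format-specific check may be needed in practice.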