For Programmers: Free Programming Magazines  


Home > Archive > Compression > January 2008 > lzw in gif









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author lzw in gif
byaarov@yahoo.com

2008-01-28, 6:57 pm

With deflate, we need to use an algorithm like precomp to figure out
the corresponding deflate stream for the uncompressed bytes.

Do we need to do something similar to that with LZW compressed streams
used in GIF? In reading some of the LZW literature, I did not see
ways to tune the parameters, so my initial suspicion is that it is not
needed and there is always a one-to-one correspondence between a LZW
stream and its uncompressed stream.

Am I correct in that understanding?
byaarov@yahoo.com

2008-01-28, 9:57 pm

On Jan 28, 4:41 pm, byaa...@yahoo.com wrote:
> With deflate, we need to use an algorithm like precomp to figure out
> the corresponding deflate stream for the uncompressed bytes.
>
> Do we need to do something similar to that with LZW compressed streams
> used in GIF? In reading some of the LZW literature, I did not see
> ways to tune the parameters, so my initial suspicion is that it is not
> needed and there is always a one-to-one correspondence between a LZW
> stream and its uncompressed stream.
>
> Am I correct in that understanding?


To answer my own question, I suppose even with lzw you can have
different compressed streams represent the same input.

I tried precomp on some gif images and it did not seem to handle lzw
specifically. I thought precomp would try and crack GIF files?
Mark Adler

2008-01-29, 3:57 am

On Jan 28, 4:41=A0pm, byaa...@yahoo.com wrote:
> so my initial suspicion is that it is not
> needed and there is always a one-to-one correspondence between a LZW
> stream and its uncompressed stream.


Unfortunately for you, that suspicion is not borne out by reality.
There are compression parameters even for the usual greedy LZW
implementation, in particular the number of bits. You can also create
LZW streams that correctly decode, but are not what the usual encoder
would have made.

Mark

schnaader

2008-01-29, 3:57 am

> To answer my own question, I suppose even with lzw you can have
> different compressed streams represent the same input.


That's right.

> I tried precomp on some gif images and it did not seem to handle lzw
> specifically. I thought precomp would try and crack GIF files?


Precomp handles LZW using a standard LZW approach. If recompression
fails because of differences, Precomp will output something like
this (using debug switch -v):

Possible GIF found at position 0
Can be decompressed to 87040 bytes
Best identical bytes only 3848 instead of 36832
Couldn't recompress

Of course, the differences could be saved to succeed in
recompression (much easier for LZW than for deflate). I
have an implementation that tries to do so, but it's not
finished yet.
byaarov@yahoo.com

2008-01-29, 6:58 pm

On Jan 28, 10:19 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:
> On Jan 28, 4:41 pm, byaa...@yahoo.com wrote:
>
>
> Unfortunately for you, that suspicion is not borne out by reality.
> There are compression parameters even for the usual greedy LZW
> implementation, in particular the number of bits. You can also create
> LZW streams that correctly decode, but are not what the usual encoder
> would have made.
>
> Mark


If the code size (I guess this is the number of bits) is constant,
then for a given input string, wont the LZW compress algorithm output
the same identical encoded bytes?

That is, does the code size not ultimately dictate how big the string
table can be and therefore how far back in history the compressor can
go to find matches?
Pasi Ojala

2008-01-29, 6:58 pm

On 2008-01-29, byaarov@yahoo.com <byaarov@yahoo.com> wrote:
> If the code size (I guess this is the number of bits) is constant,
> then for a given input string, wont the LZW compress algorithm output
> the same identical encoded bytes?


You can still decide between encoding a literal and encoding a string.
You can also select between the existing strings. This brings me to
another question.

Does any of the lzw algorithms use lazy matching?

-Pasi
--
/Dreams can be as real as events. Or so it seemed to me afterwards./
-- Lestat in "The Tale of the Body Thief"
byaarov@yahoo.com

2008-01-29, 6:58 pm

On Jan 29, 8:05 am, Pasi Ojala <alb...@mustatilhi.cs.tut.fi> wrote:
> On 2008-01-29, byaa...@yahoo.com <byaa...@yahoo.com> wrote:
>
>
> You can still decide between encoding a literal and encoding a string.
> You can also select between the existing strings.

Given a particular implementation of the algorithm however, this will
not happen correct? That is, given an implementation of LZW compress
routine and the code size, I suppose the encoded stream will be the
same?

> This brings me to another question.
>
> Does any of the lzw algorithms use lazy matching?
>
> -Pasi
> --
> /Dreams can be as real as events. Or so it seemed to me afterwards./
> -- Lestat in "The Tale of the Body Thief"

Mark Adler

2008-01-29, 6:58 pm

On Jan 29, 7:53=A0am, byaa...@yahoo.com wrote:
> If the code size (I guess this is the number of bits) is constant,
> then for a given input string, wont the LZW compress algorithm output
> the same identical encoded bytes?


Yes (for a given variant of LZW). However you don't have to use the
LZW compress algorithm to create a stream that can be decompressed by
the LZW decompress algorithm.

Mark
byaarov@yahoo.com

2008-01-29, 6:58 pm

On Jan 29, 8:46 am, Mark Adler <mad...@alumni.caltech.edu> wrote:
> On Jan 29, 7:53 am, byaa...@yahoo.com wrote:
>
>
> Yes (for a given variant of LZW). However you don't have to use the
> LZW compress algorithm to create a stream that can be decompressed by
> the LZW decompress algorithm.


I understand.

In my case, I am trying to ensure that for GIF files, I can re-create
the LZW stream. Since in GIF the stream is always created by the LZW
algorithm, I should be able to use the same code size and get the same
output as long as the same GIF library were used correct?

Also, do I have any hope if a different GIF library (and hence a
different LZW compress implementation) were used?

>
> Mark

John Reiser

2008-01-29, 6:58 pm

> If the code size (I guess this is the number of bits) is constant,
> then for a given input string, wont the LZW compress algorithm output
> the same identical encoded bytes?


For a substring with matches, commonly there are *several* preceding
substrings which match. Also, there may be matches of differing lengths.
"The" LZW compress algorithm may choose the longest match, then the
nearest, but all the other choices are valid, and may produce differing
encodings. Choosing the nearest preceding match, or the longest possible
match, does not necessarily give the shortest output.

--
byaarov@yahoo.com

2008-01-29, 6:58 pm

On Jan 29, 9:40 am, John Reiser <jrei...@BitWagon.com> wrote:
>
> For a substring with matches, commonly there are *several* preceding
> substrings which match. Also, there may be matches of differing lengths.
> "The" LZW compress algorithm may choose the longest match, then the
> nearest, but all the other choices are valid, and may produce differing
> encodings. Choosing the nearest preceding match, or the longest possible
> match, does not necessarily give the shortest output.


I am looking at the pseudo code from wikipedia
w = NIL;
while (read a char c) do
if (wc exists in dictionary) then
w = wc;
else
add wc to the dictionary;
output the code for w;
w = c;
endif
done
output the code for w;

In that pseudo code for example, given a current string "wc", there
can only be one match right?

I suppose in a different implementation it is possible to have
multiple matches of different lengths... Though my question now is,
given an algorithm's implementation similar to the above, there can
only be one match right?

Also, even if the algorithm were different and supported multiple
matches, given that same example implementation and code size, I am
assuming the output will always be the same.

I believe Mark already answered that question 2 posts ago in that,
given a variant of LZW, the output can be the same.
John Reiser

2008-01-29, 6:58 pm

byaarov@yahoo.com wrote:
> I am looking at the pseudo code from wikipedia ...

[snip]
> In that pseudo code for example, given a current string "wc", there
> can only be one match right?
>
> I suppose in a different implementation it is possible to have
> multiple matches of different lengths... Though my question now is,
> given an algorithm's implementation similar to the above, there can
> only be one match right?


The pseudo code writes to the output only after a longest match
(or end-of-file), so that is unique.

> Also, even if the algorithm were different and supported multiple
> matches, given that same example implementation and code size, I am
> assuming the output will always be the same.


The "same example implementation" is not a requirement for belonging to
the class that is commonly designated "LZW". There are other
implementations which may be called LZW which produce different output,
such as: limiting the length of a longest match, limiting the number
of substrings in the dictionary (by various policies), limiting the
total number of characters belonging to substrings in the dictionary
(by overlapping, or non-overlapping, or ...) and so on.

> I believe Mark already answered that question 2 posts ago in that,
> given a variant of LZW, the output can be the same.


Of course there would never be a bug in compression code :-)
A much more likely situation is that *every* implementation of
compression has had one or more bugs, and some of them are
still present. For LZW, some of these affect the output.
Compression is complex enough that it is hard to guarantee
identical output, except by insisting on identical code,
token by token.

--

Mark Adler

2008-01-29, 6:58 pm

On Jan 29, 8:53=A0am, byaa...@yahoo.com wrote:
> Since in GIF the stream is always created by the LZW
> algorithm,


It might not be. I recall a GIF encoder that would deliberately only
encode matches of runs of a single byte, i.e. only doing run-length
encoding for compression, in order to sidestep the Unisys patent.

Mark

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com