Home > Archive > Compression > October 2004 > Re: GIF87a/LZW problem
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Re: GIF87a/LZW problem
|
|
|
| Take your first byte 00 and put it into a buffer:
00
Take your next byte 01 and add it to the buffer (at the left hand end)
0100
and so on...
1C080100
now start taking codes out of this buffer from the right hand end
first one of 9 bits is (in binary) 100000000=256
next one is 000000000=0, next is 100000010 = 258 etc
"Patrick Herring" <ph@anweald.co.uk> wrote in message
news:416163e3.12817044@usenet.plus.net...
> I'm trying my hand at coding a gif interpreter, because I'm like that,
> and have succeeded except for decompressing the raster data, but I
> can't see what I'm missing, so to speak.
>
> My test img is from a win98 installation: c:\windows\web\tips.gif,
> which Gimp has no problems with.
>
> My program output is
> ...
> matched , 2C
> ImgLeft 0 ImgTop 0 ImgWidth 27 ImgHeight 36 ImgM 0 ImgI 0
> ImgBitsPerPixel 1 CodeSize 8
> found raster block size 242
> found raster block size 0
> RasterDataLZW
> 0001081C48B0A0C18308132A5CC8B02180041021
3A34083180458B12273E0C0080
>
A3C78E092626E02810C13C041D2DCE6B3892E0BC
971F03AC5C18D2E5CB8E2967266C69735ECC
043A
>
0FF21C6812A5C700401586243970E5CFA0142FE2
94191329D48223634E9D9A54A14FA945535A
6538
>
2F6BC7973E71765D58F662D18B6B152268D0562B
50BA0BE702A87B11E34ABC08F50A2C9B15E9
5AC0
>
0505977C19112D41C4831B241EFCB20001A890F7
4A266A128065C7033347268812EDE5C7579B
6E06
>
80002581CB3A45179CB71AED4D81B20DD26E7A1B
40EE83BB0DFE46183C746AB2AB87B395AC1C
F971
> 8D7BA14B9F4E9D7A40
> ClearCode 256
> EndOfInformationCode 257
> DictSize 257 CodeBits 9
> matched ; 3B
> CurPos 1046 length image file 1045
>
> which shows I've interpreted the overall structure down to the last
> byte, so I'm confident the lzw input is correct. The spec I'm
> following is at http://www.w3.org/Graphics/GIF/spec-gif87.txt which
> says there should be a starting Clear code i.e. a '256'd, and there
> isn't. There isn't an EOI code at the end either, I think. Am I right
> in starting with 9 as the number of bits per code? The decompress code
> finds a lot of codes not in the dictionary, which I thought should be
> rare. The output is clearly not enough bytes to satisfy
> ImgWidth*ImgHeight, about half in fact.
>
> Any help very much appreciated <g>.
>
> --
> Patrick Herring, Sheffield, UK
> http://www.anweald.co.uk
| |
| Patrick Herring 2004-10-05, 8:55 am |
| "Pat" <ebooks2goNOSPAM@yahoo.co.uk> wrote:
| Take your first byte 00 and put it into a buffer:
|
| 00
|
| Take your next byte 01 and add it to the buffer (at the left hand end)
|
| 0100
|
| and so on...
|
| 1C080100
|
| now start taking codes out of this buffer from the right hand end
| first one of 9 bits is (in binary) 100000000=256
|
| next one is 000000000=0, next is 100000010 = 258 etc
Yes, this is what I understood from Carsten's post though your
explicit description is usefully clear. In my prototyping language
it's simply:
RasterDataLZW = c2b( reverse( RasterDataLZW ) )
Code = right( RasterDataLZW, CodeBits )
RasterDataLZW
= left( RasterDataLZW, length( RasterDataLZW ) - CodeBits )
etc, though I expect anyone used to C will be horrified at that <g>.
Presumably this is originally from the wish to use the chip
architecture efficiently e.g. registers are LIFO. Certainly thinking
about what (probably) happens physically makes some sense of it.
Anyway, it works, I find the image. I assume it's OK to have the odd
bit left over after the EOI code (but if there's lots left over it
means the code size has gone wrong - yes?).
But there's some muck at the start and end of the uncompressed data.
It goes
444943542E3235360043484152000000...0000000000444943542E323537
where if I take off the non-00 stuff I still haven't got the right
number of bytes for the image. Anyone know what these header/trailer
sequences are? Is it possible they're LZW artefacts or should I go to
an image ng rather than ask here?
Thanks to you and Carsten for your help up this curve.
--
Patrick Herring, Sheffield, UK
http://www.anweald.co.uk
| |
| Pat Crowe 2004-10-05, 8:55 am |
| Well I suppose you've spotted that it says DICT.256 CHAR....
Very strange; I've not seen anything like it.
Are you sure? - I wouldn't have expected the gif to view properly.
"Patrick Herring" <ph@anweald.co.uk> wrote in message
news:416282a1.5747045@usenet.plus.net...
> "Pat" <ebooks2goNOSPAM@yahoo.co.uk> wrote:
>
> | Take your first byte 00 and put it into a buffer:
> |
> | 00
> |
> | Take your next byte 01 and add it to the buffer (at the left hand end)
> |
> | 0100
> |
> | and so on...
> |
> | 1C080100
> |
> | now start taking codes out of this buffer from the right hand end
> | first one of 9 bits is (in binary) 100000000=256
> |
> | next one is 000000000=0, next is 100000010 = 258 etc
>
> Yes, this is what I understood from Carsten's post though your
> explicit description is usefully clear. In my prototyping language
> it's simply:
>
> RasterDataLZW = c2b( reverse( RasterDataLZW ) )
> Code = right( RasterDataLZW, CodeBits )
> RasterDataLZW
> = left( RasterDataLZW, length( RasterDataLZW ) - CodeBits )
>
> etc, though I expect anyone used to C will be horrified at that <g>.
> Presumably this is originally from the wish to use the chip
> architecture efficiently e.g. registers are LIFO. Certainly thinking
> about what (probably) happens physically makes some sense of it.
>
> Anyway, it works, I find the image. I assume it's OK to have the odd
> bit left over after the EOI code (but if there's lots left over it
> means the code size has gone wrong - yes?).
>
> But there's some muck at the start and end of the uncompressed data.
> It goes
>
> 444943542E3235360043484152000000...0000000000444943542E323537
>
> where if I take off the non-00 stuff I still haven't got the right
> number of bytes for the image. Anyone know what these header/trailer
> sequences are? Is it possible they're LZW artefacts or should I go to
> an image ng rather than ask here?
>
> Thanks to you and Carsten for your help up this curve.
>
> --
> Patrick Herring, Sheffield, UK
> http://www.anweald.co.uk
| |
| Carsten Neubauer 2004-10-05, 3:55 pm |
| hi again,
> Anyway, it works, I find the image. I assume it's OK to have the odd
> bit left over after the EOI code (but if there's lots left over it
> means the code size has gone wrong - yes?).
>
>
> But there's some muck at the start and end of the uncompressed data.
> It goes
>
> 444943542E3235360043484152000000...0000000000444943542E323537
>
> where if I take off the non-00 stuff I still haven't got the right
> number of bytes for the image. Anyone know what these header/trailer
> sequences are? Is it possible they're LZW artefacts or should I go to
> an image ng rather than ask here?
you are pretty close, you only need to debug your code.
the garbage you mention shouldn't be in TIPS.GIF.
when decoding 0xF2 LZW-bytes (reading 215 9-bit-codes,
last is EOI) , there should be only 1 bit left.
perhaps some bits still get mixed up or lost.
i'd recommend to convert the GIF-image to an 8-bit-BMP. loaded into
an hex-editor you will see the header, colortable and image-data
immediately. so you know the uncompressed output byte by byte and
debugging gets much easyer.
greetings,
carsten
http://www.c14sw.de/
| |
| Patrick Herring 2004-10-05, 8:55 pm |
| "Pat Crowe" <pcjREMOVE@mqp.com> wrote:
| Well I suppose you've spotted that it says DICT.256 CHAR....
| Very strange; I've not seen anything like it.
| Are you sure? - I wouldn't have expected the gif to view properly.
I didn't notice at all, was thinking in hex and binary. Eckh, a silly
mistake. The language defaults variable values to the variable name,
unless you switch a flag on, which you always should so as to trap
uninitialised variables. The above is showing me getting the loop
logic wrong at the start and end and, inter alia, outputting as image
data the dictionary entry for the clear code. So, thanks!
| "Patrick Herring" <ph@anweald.co.uk> wrote in message
| news:416282a1.5747045@usenet.plus.net...
| > "Pat" <ebooks2goNOSPAM@yahoo.co.uk> wrote:
| >
| > | Take your first byte 00 and put it into a buffer:
| > |
| > | 00
| > |
| > | Take your next byte 01 and add it to the buffer (at the left hand end)
| > |
| > | 0100
| > |
| > | and so on...
| > |
| > | 1C080100
| > |
| > | now start taking codes out of this buffer from the right hand end
| > | first one of 9 bits is (in binary) 100000000=256
| > |
| > | next one is 000000000=0, next is 100000010 = 258 etc
| >
| > Yes, this is what I understood from Carsten's post though your
| > explicit description is usefully clear. In my prototyping language
| > it's simply:
| >
| > RasterDataLZW = c2b( reverse( RasterDataLZW ) )
| > Code = right( RasterDataLZW, CodeBits )
| > RasterDataLZW
| > = left( RasterDataLZW, length( RasterDataLZW ) - CodeBits )
| >
| > etc, though I expect anyone used to C will be horrified at that <g>.
| > Presumably this is originally from the wish to use the chip
| > architecture efficiently e.g. registers are LIFO. Certainly thinking
| > about what (probably) happens physically makes some sense of it.
| >
| > Anyway, it works, I find the image. I assume it's OK to have the odd
| > bit left over after the EOI code (but if there's lots left over it
| > means the code size has gone wrong - yes?).
| >
| > But there's some muck at the start and end of the uncompressed data.
| > It goes
| >
| > 444943542E3235360043484152000000...0000000000444943542E323537
| >
| > where if I take off the non-00 stuff I still haven't got the right
| > number of bytes for the image. Anyone know what these header/trailer
| > sequences are? Is it possible they're LZW artefacts or should I go to
| > an image ng rather than ask here?
| >
| > Thanks to you and Carsten for your help up this curve.
| >
| > --
| > Patrick Herring, Sheffield, UK
| > http://www.anweald.co.uk
|
|
--
Patrick Herring, Sheffield, UK
http://www.anweald.co.uk
| |
| Patrick Herring 2004-10-05, 8:55 pm |
| cneubau@aol.comNOJUNK (Carsten Neubauer) wrote:
| hi again,
|
| > Anyway, it works, I find the image. I assume it's OK to have the odd
| > bit left over after the EOI code (but if there's lots left over it
| > means the code size has gone wrong - yes?).
| >
| >
| > But there's some muck at the start and end of the uncompressed data.
| > It goes
| >
| > 444943542E3235360043484152000000...0000000000444943542E323537
| >
| > where if I take off the non-00 stuff I still haven't got the right
| > number of bytes for the image. Anyone know what these header/trailer
| > sequences are? Is it possible they're LZW artefacts or should I go to
| > an image ng rather than ask here?
|
| you are pretty close, you only need to debug your code.
Yep.
| the garbage you mention shouldn't be in TIPS.GIF.
| when decoding 0xF2 LZW-bytes (reading 215 9-bit-codes,
| last is EOI) , there should be only 1 bit left.
| perhaps some bits still get mixed up or lost.
Presumably you can have up to 7 bits left over due to the 8-bit byte
packing.
| i'd recommend to convert the GIF-image to an 8-bit-BMP. loaded into
| an hex-editor you will see the header, colortable and image-data
| immediately. so you know the uncompressed output byte by byte and
| debugging gets much easyer.
Hadn't thought of that, a very useful idea. An idea I did have, which
turned out to be of some use, is to view the interpreted gif data by
writing an HTML table with cells width=height={1,2,3} and the
appropriate bgcolor setting. Anything more than a small image takes a
while to load, but it saves you having to know how to address the
screen (probably not your problem <g> ).
Anyway, got it to work; there were several bugs needing squashing.
It's amazing how once it starts working you can't see how you could
have got it wrong before. Thanks!
I don't suppose anyone has examples of 87a gifs with very few colours
i.e. fewer than 16?
All of the example GIF87a LZW code I found on the web was either
simplified pseudo-code or in languages I don't know well enough, so
here's my code, which is in Rexx, useful here since it's general
clarity (I hope!) makes it executable pseudo-code:
CodeBits = CodeSize + 1
ClearCode = 2 ** CodeSize
say 'ClearCode' ClearCode
EndOfInformationCode = ClearCode + 1
say 'EndOfInformationCode' EndOfInformationCode
do i = 0 to ClearCode - 1
Dict.i = d2c( i )
end
RasterDataLZW = c2b( reverse( RasterDataLZW ) )
RasterData = ''
/* assume first code is the clear code */
do while RasterDataLZW \= ''
CurCode = b2d( right( RasterDataLZW, CodeBits ) )
RasterDataLZW,
= left( RasterDataLZW, length( RasterDataLZW ) - CodeBits )
select
when CurCode = EndOfInformationCode then do
say 'eoi found, ending LZW decompression'
if length( RasterDataLZW ) > 7 then
abend( 'too much raster left after eoi' RasterDataLZW )
leave /* do loop */
end
when CurCode = ClearCode then do
say 'clear code found, initialising dictionary'
DictSize = ClearCode + 1
CodeBits = CodeSize + 1
say 'DictSize' DictSize 'CodeBits' CodeBits
/* assume no more clear codes */
PrevCode = b2d( right( RasterDataLZW, CodeBits ) )
RasterDataLZW,
= left( RasterDataLZW, length( RasterDataLZW ) - CodeBits )
/* assume first code after clear is in dictionary */
RasterData = RasterData || Dict.PrevCode
end
otherwise do
if CurCode > DictSize then do /* code not in dictionary */
NewDictEntry = Dict.PrevCode || left( Dict.PrevCode, 1 )
RasterData = RasterData || NewDictEntry
end
else do
RasterData = RasterData || Dict.CurCode
NewDictEntry = Dict.PrevCode || left( Dict.CurCode, 1 )
end
DictSize = DictSize + 1
Dict.DictSize = NewDictEntry
CodeBitsOld = CodeBits
select
when DictSize = 1 then CodeBits = 2 /* ? */
when DictSize = 3 then CodeBits = 3 /* ? */
when DictSize = 7 then CodeBits = 4 /* ? */
when DictSize = 15 then CodeBits = 5
when DictSize = 31 then CodeBits = 6
when DictSize = 63 then CodeBits = 7
when DictSize = 127 then CodeBits = 8
when DictSize = 255 then CodeBits = 9
when DictSize = 511 then CodeBits = 10
when DictSize = 1023 then CodeBits = 11
when DictSize = 2047 then CodeBits = 12
when DictSize = 4096 then
abend( 'should have found a clear code, DictSize 4096' )
otherwise nop
end
if CodeBitsOld \= CodeBits then
say 'DictSize' DictSize 'CodeBits' CodeBits
PrevCode = CurCode
end
end
end
--
Patrick Herring, Sheffield, UK
http://www.anweald.co.uk
|
|
|
|
|