| ArarghMail705NOSPAM 2007-05-22, 9:56 pm |
| On Tue, 22 May 2007 20:41:47 +0200, Terje Mathisen
<spamtrap@crayne.org> wrote:
>Phil Carmody wrote:
>
>With a given data set like this, a 'static huffman', i.e. optimally
>compressed huffmann codes for those values that actually do occur, is
>probably gonna be quite good.
>
>My guess is that even arithmetic compression won't beat that by more
>than a few percent, and the decoding cost will be immensly larger.
>
>A static huffman decoder is most easily implemented as a few layers of
>lookup tables, where the primary table can decode zero, one or more
>tokens at once:
>
>The zero token case leads to a secondary table which handles the
>remaining bits. (BTW, you might get _very_ large token lengths if you
>have a maximally skewed distribution:
>
>Assume half the tokens are '1', half the remainder are '2', 1/8 is '3'
>etc up to 1/64K which have the value 'N', which leads to a one-bit token
>for '1', two bits for '2' etc up to 64K bits for the least common token.
>
>In such a case you can still use a table-driven approach, using
>byte-sized lookups:
>
>One byte will average 4 tokens decoded, and only if it is 255 will you
>need a secondary table. This second table can again lead to a third,
>forth, fifth table, with a total of 8K different tables needed, with
>8K*256 = 2 M entries in total.
>
>The first few levels will fit in L1 cache, while (nearly) all the tables
>will fit in L2, so the decoding speed will be great. :-)
>
Also, the OP could just use zlib, and not worry about the details. :-)
--
ArarghMail705 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html
To reply by email, remove the extra stuff from the reply address.
|