Home > Archive > Compression > January 2007 > Speeds of Huffman vs. Range coder?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Speeds of Huffman vs. Range coder?
|
|
| Jim Leonard 2007-01-19, 6:55 pm |
| I was thinking of adding entropy encoding to a low-resource project I'm
working on. For those who have implemented both Huffman coders and
Range coders, which is faster?
Also, are they asymmetric? Meaning, for both coders, is one operation
much faster than the other, or are both compression and decompression
operations roughly the same speed?
| |
| cr88192 2007-01-19, 9:55 pm |
|
"Jim Leonard" <MobyGamer@gmail.com> wrote in message
news:1169241036.944660.106530@a75g2000cwd.googlegroups.com...
>I was thinking of adding entropy encoding to a low-resource project I'm
> working on. For those who have implemented both Huffman coders and
> Range coders, which is faster?
>
> Also, are they asymmetric? Meaning, for both coders, is one operation
> much faster than the other, or are both compression and decompression
> operations roughly the same speed?
>
I say, if implemented well, huffman is faster.
the reason:
encode can be reduced down to:
a shift and an or operation (adding code to the output window);
an addition (adding code length);
a check and possibly emmiting output bytes (a few more masks and shifts, a
few assignments, ...).
decode can be reduced to:
a shift (getting the code from the input window);
a table index (and possibly a few other steps);
an addition (adding decoded code length);
a check and possibly reading more input bytes (as before).
for huffman, speed is fairly good.
for huffman, decode should be faster (encode usually involves extra steps,
such as gathering statistics, building the table, ...).
typically, range coding involves a few extra steps that don't help much wrt
performance:
one has to calculate the base and range of the symbol being encoded;
on encode, one has to do several multiplies and additions to adjust the
min/max values (or base/range, if represented this way), ...
one also has the additional tasks of adjusting the model and any related
tasks;
....
on decode, one can't as effectively optimize symbol fetching (a large lookup
table can't be built cost-effectively), so decoding involves a number of
multiplies, possibly a division, and a loop to figure out which particular
symbol is being decoded, followed by many of the tasks as in the encoder.
as such, IME for range coding, the encode step is fairly fast, but decoding
is somewhat slower.
so, I say, huffman is faster (if implemented efficiently, there are many
ways to implement it not-particularly efficiently, and many ways to
implement it dead slow as well...).
range coding gets better compression, and is maybe worth the costs in many
cases, but it is a lot slower.
or such.
others may disagree if they want.
| |
| John Reiser 2007-01-20, 6:56 pm |
| Jim Leonard wrote:
> I was thinking of adding entropy encoding to a low-resource project I'm
> working on. For those who have implemented both Huffman coders and
> Range coders, which is faster?
It depends on the implementation, where differences of 20% are common
even for the "same" algorithm on the "same" hardware. And "low-resource"
might constrain space during decompression, for instance. Huffman
decoding tables often are implemented in "common denominator mode",
where the table may be 512 long even though much of that is 4 copies
of a 128-element table [or 128 copies of a 4-element table] because the
"shape" of the code is mostly 7 bits, but with some 2-bit subcases.
The 512 usually saves time, but costs space. For a range decoder,
the state table of probabilities can behave similarly (approximately
double or halve in space, with generally inverse costs in time),
but the details and coefficients matter *a lot*.
> Also, are they asymmetric? Meaning, for both coders, is one operation
> much faster than the other, or are both compression and decompression
> operations roughly the same speed?
Usually in practice, decompression is several times faster, although
for a range coder in theory it could take just as long as compression.
Time for actual encoding is often very similar to time for decoding,
but parsing the input and deciding *how* to encode it takes longer.
For example, on a medium-speed machine the LZMA range coder in UPX
takes around 16 seconds to compress some common Linux kernels, but
only around 1 second or so to decompress them at boot. "gzip -9"
which uses Huffman encoding of some LZ variant with 32KB window,
takes 2 to 3 seconds to compress and less than 1 second to decompress.
Native UPX compression (pure deflate with an infinite window and modified
Elias gamma coding) takes a couple times longer than gzip for compression,
but can save around 10% in space, and often is 25% or more faster during
decompression. The native UPX decompressor also is tiny (400 bytes)
and requires no extra space at all.
--
| |
| cr88192 2007-01-20, 6:56 pm |
|
"John Reiser" <jreiser@BitWagon.com> wrote in message
news:eotmq202lss@enews4.newsguy.com...
> Jim Leonard wrote:
>
> It depends on the implementation, where differences of 20% are common
> even for the "same" algorithm on the "same" hardware. And "low-resource"
> might constrain space during decompression, for instance. Huffman
> decoding tables often are implemented in "common denominator mode",
> where the table may be 512 long even though much of that is 4 copies
> of a 128-element table [or 128 copies of a 4-element table] because the
> "shape" of the code is mostly 7 bits, but with some 2-bit subcases.
> The 512 usually saves time, but costs space. For a range decoder,
> the state table of probabilities can behave similarly (approximately
> double or halve in space, with generally inverse costs in time),
> but the details and coefficients matter *a lot*.
>
yes, probable.
a poorly written huffman coder can be a lot slower than a good range coder,
but a well written huffman coder is hard to beat
I had assumed a case where more than enough memory was available.
assuming deflate, one needs about 288 elems in the primary table, and 32 in
the secondary.
I think in my case, I had hard coded them at something like 288 and 32.
I had also at one point implemented an algo I called "bigflate", which was
essentially an "upgraded" form of deflate supporting longer matches and
windows (4GB window with a 64kB match limit, or at least in theory).
this had hardcoded the tables at 320 and 64 it seems.
>
> Usually in practice, decompression is several times faster, although
> for a range coder in theory it could take just as long as compression.
> Time for actual encoding is often very similar to time for decoding,
> but parsing the input and deciding *how* to encode it takes longer.
> For example, on a medium-speed machine the LZMA range coder in UPX
> takes around 16 seconds to compress some common Linux kernels, but
> only around 1 second or so to decompress them at boot. "gzip -9"
> which uses Huffman encoding of some LZ variant with 32KB window,
> takes 2 to 3 seconds to compress and less than 1 second to decompress.
> Native UPX compression (pure deflate with an infinite window and modified
> Elias gamma coding) takes a couple times longer than gzip for compression,
> but can save around 10% in space, and often is 25% or more faster during
> decompression. The native UPX decompressor also is tiny (400 bytes)
> and requires no extra space at all.
>
the problem is that algos like LZMA, deflate, ... have LZ77 mixed in, which
messes up the issue some: Huffman vs Range speed difference.
actually, assuming which is used, if implemented well, shouldn't make that
much difference in the case of LZ77 decoding, as the main limit is actually
more how quickly things can be moved in memory (where large moves seem
better than smaller ones).
from the description, I will make the following guesses:
they are probably using a hard-coded psuedo-huffman scheme (for literals and
possibly lengths).
elias gamma coding makes more sense for the distances, and possibly the
lengths.
for example, assuming I were designing it from the above description:
00xxxx 256..271 (LZ-escapes)
01xxxxx 0..31 (space, CR, LF, ...)
10xxxxx 96..127 (lower-case letters, a few symbols)
110xxxxxx 32..95 (numbers, uppercase letters, ...)
111xxxxxxx 128..255 extended ascii range.
or (simpler and better for binary data):
00xxxx 256..271
01xxxxxx 0..63
10xxxxxx 64..127
11xxxxxxx 128..255
albeit, if needed:
00xxxxx 256..288 also works (sane if using deflate-like escape codes).
00xxx 256..263 makes sense for gamma-codes (no length in prefix)
and for a possible escape scheme:
256: end of coded block
257: RLE-run (dist=1, or last char)
258: RLE-run (dist=2)
259: RLE-run (dist=3)
260: RLE-run (dist=4)
261: LZ-run (generic, overlapping)
262: LZ-run (non-overlapping)
note that making RLE-runs (technically overlapping matches with a very short
distance) special allows for saving a few bits (no dist gamma code).
one can distinguish overlapping and non-overlapping runs for speed reasons
(particularly, in not having to check explicitly).
however, for all I know their scheme may be somewhat different than this...
another (partly deflate compatible) approach would be to encode the data
only using fixed tables (via a custom encoder or zlib options).
this way, a specialized decoder could skip out on having to do any true
huffman decoding...
or such...
> --
|
|
|
|
|