| Malcolm Taylor 2005-11-26, 3:55 am |
| > Can the algorithm be easily generalized to models with more
> symbols. Eg, bytes instead of bits. Can it be generalized to handle
> data from a higher-order process, or do you need the statistics to
> be simplified down to order-0 with something like BWT. Is it faster
> in these circumsances than ARI? Can you compare the A&B constants
> with an ARI?
It can be used for bytes by doing a binary decomposition.
It is my understanding that the enumerative coding method can only cope
with modelling switching sources. That is, the data must be split into
classes based on the statistics, and the classes are coded entirely
independantly.
For example, you could split the data according to the context (eg. one
class is all bits following '01' another following '10') - this
ironically leads to BWT, with the output segmented on context boundaries
(can also be done on bytes).
The downside with the enumerative method, is that it will never be able
to cope with complex statistical models that use mixtures or weighting
(eg. paq or the more recent PPM variants). Still, it doesn't really
provide any advantages for them anyway, so not too much of a loss.
I do look forward to the BWT based on it though. It seems ideally
suited, and if the claims of speed are correct, then we could have a BWT
with the speed of a huffman version, but the compression of the best
arithmetic versions. It could potentially provide a new efficiency
benchmark!
Malcolm
PS. David, please cut him some slack. He has only displayed the same
zeal for his ideas, that you yourself have displayed for bijective
coders. We should be applauding that, not trying to tear him down.
|