Home > Archive > Compression > November 2005 > New combinatorial coder: tighter & faster than Arithmetic coding
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
New combinatorial coder: tighter & faster than Arithmetic coding
|
|
| nightlight 2005-11-20, 6:55 pm |
| There is a new algorithm for entropy coding, constrained
coding and fast combinatorial enumeration described in the
arXiv preprint cs.IT/0511057 (also submitted to DCC 2006):
http://arxiv.org/abs/cs.IT/0511057
It is coding based on enumerative combinatorics (walks
on Pascal triangle, counting of integer lattice paths).
The new algorithm significantly improves on the previously
known method "enumerative coding" (which is the regular
un/ranking procedure in combinatorics), increasing its speed
and reducing memory size by factor O(n), where n is number
of input symbols). It is the algorithm Rissanen was looking for
when he found a partial solution, arithmetic coding (in 1976).
The new algorithm, even the exploratory unoptimized
prototype runs much faster (see the table below, copied
from the of paper, with speedup factors over 200) and
always compresses better than the best of arithmetic
coders (such as Moffat et al 1998). Besides its usefulness
in entropy coding (for all types of file, image,...
compressors where Huffman, arithmetic or similar coding
is used in the final phase) and for encoding of complex objects
(e.g. permutations, trees, graphs, self-delimiting sequences,..
etc) the new algorithm extends significantly the domain of
enumerative combinatorics accessible to computer explorations.
Here is the abstract:
---------------------------------------------------------
Title: Quantized Indexing: Beyond Arithmetic Coding
Author: Ratko V. Tomic
Quantized Indexing is a fast and space-efficient form of
enumerative (combinatorial) coding, the strongest among
asymptotically optimal universal entropy coding algorithms.
The present advance in enumerative coding is similar to
that made by arithmetic coding with respect to its unlimited
precision predecessor, Elias coding.
The arithmetic precision, execution time, table sizes and
coding delay are all reduced by a factor O(n) at a
redundancy below log(e)/2^(g-1) bits/symbol (for n input
symbols and g-bit QI precision). Due to its tighter
enumeration, QI output redundancy is below that of
arithmetic coding (which can be derived as a lower
accuracy approximation of QI). The relative compression
gain vanishes in large n and in high entropy limits and
increases for shorter outputs and for less predictable data.
QI is significantly faster than the fastest arithmetic
coders, from factor 6 in high entropy limit to over 100
in low entropy limit (`typically' 10-20 times faster).
These speedups are result of using only 3 adds, 1 shift
and 2 array lookups (all in 32 bit precision) per less
probable symbol and no coding operations for the most
probable symbol.
Further, the exact enumeration algorithm is sharpened
and its lattice walks formulation is generalized.
A new numeric type with a broader applicability,
sliding window integer, is introduced.
---------------------------------------------------------
Below are few test results of QI vs. AC (AC: Moffat et al. 1998,
ver 3 from 2000, on its `best' settings, RAM to RAM coding,
all buffers prealloced, no slow streams; only inner coding
loops timed via sub-microsecond timers) for input sizes N
(K=1024 bits) and for given number of 1's randomly distributed
(except for the input "Vary" which represents a non-stationary
source). Both coders tested were binary order 0 coders (i.e.
they consider input data as a sequence of uncorrelated bits).
Each result is an average obtained from 500 random inputs.
Column Speed: coding times ratio TimeAC/TimeQI
(e.g. QI is up to 247.5 times faster than AC, cf. top right)
Column N: output size percent using (A/Q-1)*100
(e.g. AC output is up to 110% larger than QI's, cf. bottom left)
Input array Vary = array of int32 {...,-2,-1,0,+1,+2,...}.
---------------------------------------------------------------
#1's N:4K Speed N:8K Speed N:32K Speed N:128K Speed
---------------------------------------------------------------
8 6.846 68.3 6.421 112.8 5.447 199.6 5.966 247.5x
16 4.175 59.7 3.830 78.5 3.389 138.1 3.730 168.0
32 2.297 49.7 2.090 58.9 2.096 95.9 2.220 117.2
N/64 1.370 40.3 0.606 41.0 0.186 41.7 0.073 42.5
N/32 0.897 30.8 0.343 33.7 0.123 34.2 0.049 34.5
N/16 0.505 21.8 0.197 25.3 0.084 24.6 0.040 24.8
N/8 0.359 14.4 0.155 16.7 0.069 16.8 0.045 16.8
N/4 0.288 9.2 0.138 10.8 0.083 10.6 0.068 10.5
N/2 0.509 6.6 0.445 6.6 0.367 6.4 0.332 6.4
Vary 110.899 21.9 96.736 19.6 71.308 16.5 52.580 14.1
---------------------------------------------------------------
| |
| David A. Scott 2005-11-20, 6:55 pm |
| "nightlight" <nightlight@omegapoint.com> wrote in
news:1132497246.485901.250410@f14g2000cwb.googlegroups.com:
>
> The new algorithm, even the exploratory unoptimized
> prototype runs much faster (see the table below, copied
> from the of paper, with speedup factors over 200) and
> always compresses better than the best of arithmetic
> coders (such as Moffat et al 1998). Besides its usefulness
> in entropy coding (for all types of file, image,...
> compressors where Huffman, arithmetic or similar coding
> is used in the final phase) and for encoding of complex objects
> (e.g. permutations, trees, graphs, self-delimiting sequences,..
> etc) the new algorithm extends significantly the domain of
> enumerative combinatorics accessible to computer explorations.
>
>
Sounds like hype. Especially the statement
"always compresses better than the best of arithmetic coders"
This is obviouly a false statement. It is impossible
for any compressor to always compress better than every other
compressor. First of all the best compressors would have to
be bijective and then the best they can do is change the mappings
from one set to another. Take any compressor that could make one
stream or file smaller. Then the identity compressor could beat it
on files that the previous compressor lengthened since its a loose
loose situation if you make something else smaller or better compressed
then it has to make something else longer.
Second if you actaully took the time to test the online versions
of code that use Moffat style code you would soon relize it not the
best.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
| |
| nightlight 2005-11-20, 6:55 pm |
| > Sounds like hype. Especially the statement "always compresses better
> than the best of arithmetic coders" This is obviouly a false statement.
> It is impossible for any compressor to always compress better than
> every other compressor.
The two are not arbitrary unrelated algorithms -- arithmetic coding
(AC) and QI are both approximations of the exact enumerative coding
(the exact enumeration of messages). The difference is that AC first
approximates the enumeration problem itself (via stirling approximation
and ignoring some factors, cf. the paper p.2), then it applies
additional precision reduction (truncation to r-bit arithmetic). The QI
doesn't do the 1st AC approximation at all, i.e. it works from exact
enumeration and it performs only the second part of the AC
approximation and even here it selects precision truncation which has 4
times smaller redundancy than that of AC's truncation. So, the AC is a
lower accuracy approximation of QI, hence the AC always produces larger
or at best equal output as QI.
What you're referring to is trivial observation that for any given
algorithm A, one can create another algorithm B which will compress a
given input X better than A. It is trivial since you can select a
message X for which A produces an output of more than one bit, then you
simply create an algorithm B which outputs 0 for message X and outputs
1 followed by <output of A> otherwise.
The relation between AC and QI is not of this kind, though. The AC
message index is the same number that QI produces, but multiplied with
a number which is strictly greater than 1 (cf. footnote on p.2 on
double index majorization by AC). This may or may not (depending on
how fractions round to next whole bit) result in a greater number of
whole bits output by AC, but mathematically it cannot ever result in a
smaller number of whole bits in the AC output.
When Rissanen was looking to solve the precision problem of the exact
enumeration (in 1976), he ran into problem which seemed intractable
(due to a missing piece of formalism in the conventional EC, which the
paper above constructs as a natural setting for QI). Hence he
approximated the enumeration first (by majorizing the index), which
allowed him to construct a decodable index -- this was the arithmetic
coding. Rissanen (who is the original creator of arithmetic coding and
of MDL modeling principle) recognized instantly the solution in this
paper, his old nemesis conquered at last, commenting: "I understand
what you have done. Very clever!"
| |
| nightlight 2005-11-20, 6:55 pm |
| > Second if you actaully took the time to test the online versions of code that
> use Moffat style code you would soon relize it not the best.
One can take Moffat et al code (version 3, 2000) and select some other
'accuracy vs speed' tradeoffs. If you know of a version of that coder
which is faster and produces smaller output (the figures I listed refer
to binary 0-order coders test), you are welcome present the comparison
figures between the two and I will gladly check it out. Keep in mind,
the speed factors in the paper show QI is 200+ times faster than AC on
some inputs, and typically 10-20 times faster (this was a generic QI
coder, i.e. the same loop and the same steps coding the dense and the
sparse arrays, and not something tuned to run fast for sparse arrays,
such as run length coder, which otherwise blows on size and speed for
denser inputs). There are dime a dozen quasi-arithmetic coders, which
sacrifice some compression quality (say another 5-10% larger output) to
gain some speed (a factor 2-5).
The new algorithm does much less work (and a simpler work: just an add
of the table addend, a machine word) per coding step and always does
fewer coding steps, thus it has a fundamental reason for a much better
performance than AC.
| |
| Ben Rudiak-Gould 2005-11-20, 6:55 pm |
| nightlight wrote:
> http://arxiv.org/abs/cs.IT/0511057
I looked quickly through the paper, and here's my impression of what it's about.
The idea of "enumerative coding" is that you divide the set of possible
unencoded messages into disjoint subsets with the property that all of the
messages in a particular subset are equally likely to occur (according to
some model). Then the coded message is a code identifying the subset which
contains the message, followed by a code identifying an element of that subset.
For example, suppose that your messages are the bit strings [01]*, and you
divide them into subsets according to how many zeroes and ones they contain:
{}, {0}, {1}, {00}, {01,10}, {11}, {000}, {001,010,100}, ...
If our message has K ones and L zeroes, we can encode the index of the
appropriate subset by encoding K and L at a cost of log K + log L bits.
Encoding the index of the appropriate element of that subset takes
log C(K,K+L) = log ((K+L)! / (K! L!))
bits.
Compare this to arithmetic coding with a static order-0 model. Here we first
encode the length of the message (K+L) and p(one) = K/(K+L), which again
takes log K + log L bits. Then for each one bit in the message we encode log
((K+L)/K) bits, and for each zero bit we encode log ((K+L)/L) bits, for a
total of
K log (K+L)/K + L log (K+L)/L = log ((K+L)? / (K? L?))
bits, where I write N? to mean N^N.
Arithmetic coding is asymptotically as efficient as enumerative coding,
since O(N?) = O(N!), but enumerative coding will always use fewer bits,
since N! < N? for interesting values of N.
I think that this paper describes an efficient algorithm for enumerative
coding for the order-0 model I used above, but generalized to arbitrarily
many symbols. If so, it won't replace coders based on sophisticated models
with arithmetic-coding back ends, but it may find a niche in applications
that use static order-0 models, such as video compression.
-- Ben
| |
| Willem 2005-11-20, 6:55 pm |
| nightlight wrote:
)> Sounds like hype. Especially the statement "always compresses better
)> than the best of arithmetic coders" This is obviouly a false statement.
)> It is impossible for any compressor to always compress better than
)> every other compressor.
)
) The two are not arbitrary unrelated algorithms -- arithmetic coding
) (AC) and QI are both approximations of the exact enumerative coding
) (the exact enumeration of messages).
AC is not only enumerative coding. It is also used with adaptive models
and higher-order statistics.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| nightlight 2005-11-20, 6:55 pm |
| > If so, it won't replace coders based on sophisticated models
> with arithmetic-coding back ends,...
The AC modeling paradigm ('probability of next single symbol' modeling
interface bottleneck) was created following path of 'virtue out of
necessity'. The necessity being the particular entanglement of the
inner most coding steps with the model parameters p(a). The EC/QI can
code the same way, too, except they can also code a much better way
(Kolmogorov modeling paradigm). There is a bit on the modeling on the
page 9 in the paper and a much longer discussion in the tech report
cited:
TR05-0625 "Quantized indexing: Background information"
http://www.1stworks.com/ref/TR/tr05-0625a.pdf
As you may have noted in the row "Vary" of the test, which is a
non-probabilistic sequence for the order-0 coders, one can easily
provide inputs which will "surprise" catastrophically the arithmetic
0-order coder, but you can't provide inputs on which AC works well and
EC fails as dramatically (as does AC on the data "Vary"). The same
phenomenon exists for the higher order coders, as soon as the input
falkls outside of the "predictable" at a given order. The adaptive AC
fails badly, while EC merely comes out slighly sub-optimally (since the
effect of a "suprise" is always much greater on the predictor scheme
than on the 'descriptor' scheme; with EC only the frequency table
encoding may turn out suboptimal, while the bulk of output, the index
is still optimal).
The essential difference between the adaptive AC and EC/QI is that AC
subjects the modeling engine to an aritificial constraint (the
memoryless coder allowed to have only a single symbol at a time, a
limitation largely not relevant in practical coding since Morse
telegraph days). The EC can do that, too, but it normally doesn't.
Trying to predict furture, as AC modeling paradigm insists, is
inherently harder than describing the present or past. Even the
practical forms of AC modeling recognize the problem and allow a
backdoor cheating against the ideal pure prediction (via the fragile
and ad hoc escape mechanisms, where the coder is "allowed" to look
ahead and clue the decoder as to what is coming).
As another aspect of the distinction, note that the EC mdeling engine
normally needs to know only whether, say messages X and Y have the same
probabilities (whatever their values might be), while the AC modeling
engine needs to know what the actual values of the probabilities are,
which is a much harder task.
The upshot is that the native EC/QI modeling (the Kolmogorov/MDL
scheme) is much simpler and more robust than the predictive AC paradigm
(EC/QI can reuse the existent AC modeling engine, though, e.g. by
splitting the output into different probability classes, cf p. 9, [N7]).
| |
| Willem 2005-11-20, 6:55 pm |
| nightlight wrote:
)> If so, it won't replace coders based on sophisticated models
)> with arithmetic-coding back ends,...
)
) The AC modeling paradigm ('probability of next single symbol' modeling
) interface bottleneck) was created following path of 'virtue out of
) necessity'. The necessity being the particular entanglement of the
) ...
)
) As you may have noted in the row "Vary" of the test, which is a
) non-probabilistic sequence for the order-0 coders, one can easily
) provide inputs which will "surprise" catastrophically the arithmetic
) 0-order coder,
You don't seem to grasp the existence of higher-order models.
Would something like PPM be easy to implement using this QI coder ?
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| nightlight 2005-11-20, 6:55 pm |
| > AC is not only enumerative coding. It is also used with adaptive models
> and higher-order statistics.
The coding aspects in both cases are message enumerations -- with EC in
the exact lists of messages (performed using the virtual dictionary of
combinations), with AC in the cummulative probablity space (which only
asymptotically for large n approaches the exact EC coding, while being
uniformly suboptimal for any finite n). It is a historic fluke that EC
was presented in conjunction with the 'descriptive'/universal way of
modeling (MDL/Kolmogorov), while AC with the predictive way (such as
PPM).
Both coding methods can work with modeling engines either way, though.
The coding algorithm of AC does not separate cleanly the model
parameters from the coder arithmetic (by virtue of hardwiring
'probabilities of next single symbol' into its coding addends). Thus,
its native mode is the 'predictive' way, as a virtue out of necessity.
EC has a much cleaner separation between the modeling parameters and
the coding computations -- the modeling engine of EC defines the
enumerative classes (the subsets of "equiprobable messages"), while
coder computations deal only with the universal combinatorial
properties of sequences (belonging to a given class). The result is a
much faster operation, since the 'universal properties of sequences'
are independent of the particular source parameters, thus they can be
precomputed and optimized away, outside of the coding loop, as it were.
If AC were to code via precomputed values, it would need a separate
table for each set of source probabilities, which is impractical even
for a static binary source.
| |
| Willem 2005-11-20, 6:55 pm |
| nightlight wrote:
)> AC is not only enumerative coding. It is also used with adaptive models
)> and higher-order statistics.
)
) The coding aspects in both cases are message enumerations -- with EC in
) the exact lists of messages (performed using the virtual dictionary of
) combinations), with AC in the cummulative probablity space (which only
) asymptotically for large n approaches the exact EC coding, while being
) uniformly suboptimal for any finite n).
Not true. Adaptive coding is trivially capable of outperforming a model
that uses only the cumulative probability, in cases where there are local
variations in the probability distribution.
Also, adaptive coding works very well with higher-order models, where it
is intractable to transmit the model because of its exponential size.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| Ben Rudiak-Gould 2005-11-20, 6:55 pm |
| nightlight wrote:
>The AC modeling paradigm ('probability of next single symbol' modeling
>interface bottleneck) was created following path of 'virtue out of
>necessity'. [etc, etc, etc]
Look, I understand all of this. I understand the weaknesses of PPM and of
arithmetic coding, and I understand that an enumerative coder can do better
in principle. None of that matters. What matters is this: what is your
contribution? Can you, in practice, do a better job than state-of-the-art
compressors based on PPM+ari? Going on about theoretical benefits is a waste
of time. Theoretically, the best way to write a chess-playing program is to
use minimax on the complete game tree. Actual chess-playing algorithms are
gross hacks which attempt, in a highly inelegant way, to approximate that
ideal. Pointing out that that's true accomplishes nothing. Everyone versed
in the art knows it already.
>As you may have noted in the row "Vary" of the test, which is a
>non-probabilistic sequence for the order-0 coders, one can easily
>provide inputs which will "surprise" catastrophically the arithmetic
>0-order coder, but you can't provide inputs on which AC works well and
>EC fails as dramatically (as does AC on the data "Vary").
Two questions. First, do you understand the difference between adaptive
order-0 and static order-0 models? Second, how would the "QI" and "AC" in
your tests perform if the input consisted of N/2 zeroes followed by N/2 ones?
-- Ben
| |
| nightlight 2005-11-20, 6:55 pm |
| >> ) As you may have noted in the row "Vary" of the test, which is a
[color=darkred]
> You don't seem to grasp the existence of higher-order models.
That was a test of the same order 0 coders. At any order, you can
suprise the "predictive" scheme catastrophically (unless it cheats via
some ad hoc esc mechanism, allowing it to use temporarily universal way
of coding), while you can't do so to a descriptive/universal scheme.
> Would something like PPM be easy to implement using this QI coder ?
There are couple ways, at least, to do that. QI can code (via lattice
jumps) to the 'probability of the next single symbol' exactly as AC,
but that costs it much of speed advantage (since mantissa scaling used
by the jumps, requires multiplications & divisions). But it can also
code from the AC modeling engine instructions much faster by splitting
the outputs into the probability classes (which needs also an error
estimates on the probabilities given to the coder; the scheme grows the
classes in the manner of Context Tree Weighing modeling scheme).
OTOH, it could do much better, with any given a priori knowledge about
the input that is fed to PPM, by coding the native EC way, which is by
splitting the input into proper enumerative classes. Such segmentation
can be done based on the last s symbols (e.g. for order-s static Markov
source; a method similar to Context Tree Weighing to select the output
stream), or dynamically via BWT-like segmentation which doesn't assume
a priory idealized finite order Markov source (you can check the
discussion of this in the Ref [23] cited earlier). Even with the kludgy
ad hoc entropy coding schemes (such as MTF variants which crudely
mangle the enumerative classes present in the final BWT matrix), BWT is
already surprisingly competitive with PPM in compression quality, while
executing much quicker. With the proper handling of these presently
mangled BWT enumerative classes, which QI is naturally suited to do
quickly and optimally, BWT should consistently outperform PPM not just
in speed but in compression quality.
In short, AC is a particular historical approximation of QI, hence
anything AC can do, QI can do but generally much faster (or at least as
fast) and generally more accurately (or at least as accurately).
Similarly, the PPM or generally the predictive way of modeling via the
'probability of the next single symbol' is another historical accident
(tailored to a particular hardwired coding algorithm of AC), by no
means the only (and certainly not the best) way to model the properties
of finite sequences of symbols. AC algorithm happens to require that
all properties must be translated into exactly such form (as 'the
probabilities of the single next symbol') so it can encode it. That is
more of a handicap than a strength for the modeling engine to bend and
distort everything else around the AC computational peculiarities, to
have to translate and funnel all the conceivable properties of symbol
sequences available to modeling engine through that particular coder
interface bottleneck.
So, instead of questing anyones grasp of 'higher order models' you
might wish to reaxamine first, how do some manage to convince
themselves to perceive, with all their sincerity, a plain, gross
handicap as a great strength.
| |
| nightlight 2005-11-20, 6:55 pm |
| > Can you, in practice, do a better job than state-of-the-art compressors
> based on PPM+ari?
QI is a very new algorithm (and a first truly practical form of genuine
EC). A better BWT variant, without mangling of its contexts (as the MTF
phase presently does), is certainly high on the list of things to apply
QI to. It is still only one among many interesting possibilities
opened by the existence of practical EC. The whole field of practical
EC modeling is presently in a very primitive form and most people have
many misconceptions what it is (or generally on modeling, identifying
it entirely with the AC modeling paradigm). Hopefully a publication of
a clean, well performing reference implementation of QI (on which I am
working, and which will perform even better than the tests results of
the crude research prototype shown in the preprint) in the forthcoming
months, should bring more people to try out new ways of modeling and
answer those questions.
> First, do you understand the difference between adaptive order-0 and
> static order-0 models? Second, how would the "QI" and "AC" in
> your tests perform if the input consisted of N/2 zeroes followed by N/2
> ones?
The sequence "Vary" was a sequence similar to that, since the negative
half of that sequence is mostly 1's and the positive part mostly 0's.
The rest of the answer obviously depends on how the QI segments the
sequence, what are the rules of the game. If you insist on requiring
that QI/EC models ...000111... sequence in a single block of size N,
then obviously it will produce output which is N + log*(N) + 1/2 log(N)
bits (since it needs to encode the length of a sequence in log*(N) =
log(log(...N) bits and the count of 1's in log(N) bits, while the
index is C(N,N/2) which is Catalan number with approx N - 1/2 log(N)
bits). Of course, if you wish to compare apples to apples, then if the
predictive AC coder is allowed some adaptation, one would need to allow
QI to also have some kind of corresponding capabilities (i.e. not to
function as a static order-0 coder), such as allowing the encoder to
select the input segmentation, in which case it would produce output
smaller than adaptive AC (as array "Vary" illustrates). Comparing
static vs adaptive order-0 coders is comparing apples and oranges.
There is no instrinsic algorithmic constraint for EC/QI to code static
order-0 way (i.e. a la Lynch-Davisson 1966 EC coder). If you take AC
and EC/QI and make both code static order-0 way with the same
segmentation, the EC/QI will give you slighly smaller output and code
much faster. Then if you make AC adaptive, with adaptation rate faster
than N/2, you need to decide what is the corresponding EC/QI mode of
coding? Whatever it is, it certainly cannot be the same scheme that was
just tested as a comparison of the static order-0 coders, which is what
you seem to be hinting it ought to be. The most natural EC/QI
counterpart to AC adapting faster than N/2 is to allow EC/QI coder the
segmentation choice to at least 2 parts, in which case it will come out
on top again (as the array "Vary" showed; even though the QI coder
tested was somewhat more primitive than that: it had simply used fixed
1Kbit blocks).
In any case, this particular tangent is more about verbal pigeonholing,
rather than about the substance of the difference. The basic difference
between AC and QI coding algorithms is that AC enumerates messages via
cummulative probabilities and EC/QI via exact ranking of combinations
of symbol sequences. The latter way is more accurate and it is much
faster (since much of the coder work is with the universal sequence
properties, thus it is precomputed in the quantized binomial tables,
outside of the coding loop).
The modeling scheme is an entirely unrelated issue. You can select a
modeling scheme where both coders servicing that modeler perform
equally poorly (e.g. by forcing QI to do lattice jumps a la AC, in
which case it will have to work exactly as hard per symbol as AC; with
adaptive output splitting QI will compress well as adaptive AC, but
code much faster). Generally, though, given the same modeling scheme,
QI will always perform at least as well as AC, in speed and in accuracy
(but mostly better, especially in speed). Anything else is comparing
apples and oranges (note that the test in the row "Vary" is of that
apple-vs-orange kind, except that QI was the one slightly handicapped
since it wasn't really as adaptive as the EC counterpart of adaptive AC
ought to be i.e. free select at least one input partition for AC
allowed to adapt faster than N/2; QI/EC produces 0 length index for
sequence of all 1's or all 0's).
| |
| nightlight 2005-11-20, 6:55 pm |
| > Adaptive coding is trivially capable of outperforming a
> model that uses only the cumulative probability, in cases
> where there are local variations in the probability distribution.
You are comparing different modeling schemes (adaptive vs static),
while I was comparing the bare coding algorithms (both can be attached
to the same model). At any given point AC gives a position (an
interval) on the cummulative probability line for the sequence seen so
far, while EC/QI gives the exact ordinal of the same sequence in the
same list of sequences (the list is what model produces, we assume the
same model and look at the coder difference). The only coder level
difference is that the cummulative probabilty index is an approximation
(a uniform majorization via the Stirling approx. of binomials and
dropping of the two factors < 1, cf p. 2 and ref [23] for more details)
of the exact ordinal index and that the QI algorithm can compute it
generally much faster than AC algorithm can (and always at least as
fast). The reason for the speed difference is discussed in ref [23] --
it is basically that the QI algorithm has much cleaner separation
between the coding computations and the specific instance properties of
the source, than the AC alogotithm, hence QI can shift much of the
coding computation outside of the instance specific coding loop.
> Also, adaptive coding works very well with higher-order models, where
> it is intractable to transmit the model because of its exponential size.
This is an interesting point, but outside of the comparison of QI and
AC coding algorithms.
Taken on its own, though, your statement is a bit self-contradictory,
since the model information obviously can be transmitted in
sub-exponential size (e.g. via Adaptive AC scheme). What you really
mean is that the first simple-minded encoding of the model parameters
that you can came up with, off the top of your head, is exponential.
Well, ok, so what? For example, the BWT, which is a bit less obvious
way to extract and transmit a higher order models, is not exponential
in size (and it is also not an adaptive PPM scheme). In fact, even when
coupled with the crude MTF obliteration of the context boundaries, it
performs nearly as well as PPM in compression quality (while running
much faster).
| |
| nightlight 2005-11-20, 6:55 pm |
| The most relevant papers for this discussion are available online at:
Quantized Indexing References
http://www.1stworks.com/ref/tr/tr05-0625r.htm
Specifically of interest here are the papers:
5. J. Rissanen Generalised Kraft inequality and arithmetic coding
IBM J. Res. Dev. 20, 198-203, 19 76
http://www.research.ibm.com/journal.../ibmrd2003B.pdf
27. J. Rissanen Arithmetic codings as number representations
Acta Polyt. Scand., Math. & Comp. Sc. Vol. 31 pp 44-51, 1979
http://www.1stworks.com/ref/ariNumRepr.pdf
28. L.D. Davisson Universal noiseless coding
IEEE Trans. Inform. Theory IT-19 (6), 783-795, 1973
http://cg.ensmp.fr/%7Evert/proj/bib...73Universal.pdf
29. J. Rissanen, G.G. Langdon Universal Modeling and Coding
IEEE Trans. Inform. Theory IT-19 (1), 12-23, 1981
http://cg.ensmp.fr/%7Evert/proj/bib...81Universal.pdf
34. J.G. Cleary, I.H. Witten A Comparison of Enumerative and Adaptive
Codes, IEEE Trans. Inform. Theory IT-30 (2), 306-315, 1984
http://www.1stworks.com/ref/Cleary84Enum.pdf
| |
| Phil Carmody 2005-11-20, 6:55 pm |
| "nightlight" <nightlight@omegapoint.com> writes:
> the speed factors in the paper show QI is 200+ times faster than AC on
The question this raises for me is why AC implementations are 200x
slower than reading from RAM? That sounds a little slow. Computer
arithmetic is now incredibly fast, and RAM is slow in comparison.
/If/ AC is only 50x slower than reading from RAM, say, (a figure that
wouldn't surprise me), then there's no way on earth that QI could be
200 times faster in _any_ circumstances.
Phil
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow
| |
| Phil Carmody 2005-11-20, 6:55 pm |
| "nightlight" <nightlight@omegapoint.com> writes:
>
> The sequence "Vary" was a sequence similar to that, since the negative
> half of that sequence is mostly 1's and the positive part mostly 0's.
What do you mean by "negative half" and "positive part"?
Applying the concept of sign to sequences of bits is, erm, unconventional.
Phil
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow
| |
| nightlight 2005-11-20, 6:55 pm |
| > The question this raises for me is why AC implementations are 200x
> slower than reading from RAM?
The AC has to process and encode each bit, 32 per machine word, for
which QI needs only to establish if there is any of the "less frequent
symbols" (e.g. 1). Even though that AC implementation uses large loop
unrolling and other speed optimizations, it only needs 8 times more
work per one bit than QI needs to fetch the dword and answer "is it
only the more frequent symbols in this word" (e.g. w==0 or w==(-1)) to
get the speed factor of 256. If you look their code (binary coder, no
contexts):
http://www.cs.mu.oz.au/%7Ealistair/arith_coder/
you will see that they certainly do that much more work per encoding of
each bit, in fact even more than the ratios show, if you count the
numbers of instructions. But, as you note, the fast processors with
lots of cache mask out some of the speed difference, penalizing more
the new memory reads, which both coders have to do. On an older 400Mhz
Celeron laptop the speed ratios are 1.5-2 times larger in favor of QI
than on the recent VAIO lapop where those figures in the preprint came
from.
| |
| nightlight 2005-11-20, 9:55 pm |
| >> The sequence "Vary" was a sequence similar to that, since the negative
>
> What do you mean by "negative half" and "positive part"?
> Applying the concept of sign to sequences of bits is, erm, unconventional.
The negative 32 bit integer, say -1 has 32 1's. The idea was to give
both coders (which were order 0 binary coders, thus they view data as
sequence of uncorrelated bits) some simple sequence which has
regularity invisible to order-0 (but visible to some higher order
coder). It was meant to illustrate the robustness of the universal way
of coding of the EC/QI coder and the fragility of the
predictive/adaptive way of the AC coder.
The origin of the greater robustness in the "descriptive" way of
modeling is in the fact that effect of the "surprise" affects chiefly
the count component of the output, which for the 1K input block is on
average a 10 bit number (e.g. specifying # of 1's in the block). The
rest is a pure combinatorial index, giving the shortest possible
decription, absent any further knowledge by the coder, on how that
number of bits is placed in that block. With the adaptive AC, each
coding step and the entire output is affected by the incorrectly
conjectured probabilities. Thus it produces as much as twice the
output size of the QI.
Obviously, the AC can be made to code the 'universal'/descriptive way,
in which case the error would be much smaller (similar to the rest of
tables, for the matching density of 1's and input size i.e. few percent
extra output), while the speed ratio would still remain dramatically in
QI favor.
The example was meant to help demythologize a bit the
'adaptive/predictive coding mystique' that lots of people are
apparently mesmerized by (as the previous discussion in this thread
illustrates). The 'predictive coding', at any modeling order, is a
handicap (a historical fluke of bending the entire modeling engine to
the peculiarities of the particular interpretation of the AC
computational algorithm, the perceived need to have probability of next
symbol to perform the encoding), not a strength as often touted, and a
'descriptive'/universal type algorithm at the same order of modeling
will always come out ahead.
When AC is made to code the universal/descriptive way, the
"probabilities" used for the computation are simply interpreted as
normalized plain counts (instead of the presumed infinite n limits of
such ratios, the 'probabilities", which is the adaptive AC
interpretation of the AC algorithm and which contains a much bigger
implicit assumption about the infinite rest of the sequence, that in
the case of the array "Vary" takes it to a catastrophically wrong
track). But in this universal mode, AC uses the plain normalized counts
in the exactly same way as the counts of EC/QI, slightly less
accurately (by 1/2 log(n) excess bits in the AC output, for the
sequence of length n) and encoded much more slowly. The Rissanen's 1976
coder (which was the first finite precision arithmetic coder) was
coding this way. See the reference:
5. J. Rissanen Generalised Kraft inequality and arithmetic coding
IBM J. Res. Dev. 20, 198-203, 19 76
http://www.research.ibm.com/journal.../ibmrd2003B.pdf
where the quantities Phi (in eq. 5) are merely the approximate binomial
coefficients of regular EC (see footnote on page 2 in the preprint and
ref [23] for details). Although Rissanen suggests in that paper a table
based faster coding alternative (which allows skipping of the most
frequent symbol as in QI), for the AC one needs different tables of
O(n^2) size for each source probability, while QI needs only a single
table of O(n^2), the quantized binomials C(n,k), which is valid for all
quasi-stationary sources and all probabilities.
That's why the AC algorithm was a doubly partial solution of the
precision problem of the exact enumeration -- it not only produced an
approximate (and a larger) enumerative index, but it also left unsolved
the O(n^3) table size problem of the exact enumeration (which it then
worked around by trading the table size for quite a few extra CPU
cycles, as illustrated in the tests chart). The QI solves both problems
- it takes the factor O(n) out of the exact EC computation speed, as
did AC, and it takes the factor O(n) out of the table sizes, which AC
missed (thus QI doesn't need to trade off the coding speed to reduce
the table sizes as AC does; although QI can do that kind of tradeoffs
too, cf. page 9, [N2] in the preprint, but from a much better starting
point than AC). Rissanen was pleasantly surprised to see that "great
difficulty" (see the preprint pages 1-2, for citations) finaly solved
optimally and commented on the solution "Very clever!"
| |
| Malcolm Taylor 2005-11-20, 9:55 pm |
| How about comparing QI with a faster and more efficient binary
arithmetic coder such as the fpaq one (see
http://www.cs.fit.edu/~mmahoney/compression/ for more info).
Malcolm
nightlight wrote:
>
>
> The AC has to process and encode each bit, 32 per machine word, for
> which QI needs only to establish if there is any of the "less frequent
> symbols" (e.g. 1). Even though that AC implementation uses large loop
> unrolling and other speed optimizations, it only needs 8 times more
> work per one bit than QI needs to fetch the dword and answer "is it
> only the more frequent symbols in this word" (e.g. w==0 or w==(-1)) to
> get the speed factor of 256. If you look their code (binary coder, no
> contexts):
>
> http://www.cs.mu.oz.au/%7Ealistair/arith_coder/
>
> you will see that they certainly do that much more work per encoding of
> each bit, in fact even more than the ratios show, if you count the
> numbers of instructions. But, as you note, the fast processors with
> lots of cache mask out some of the speed difference, penalizing more
> the new memory reads, which both coders have to do. On an older 400Mhz
> Celeron laptop the speed ratios are 1.5-2 times larger in favor of QI
> than on the recent VAIO lapop where those figures in the preprint came
> from.
>
| |
| David A. Scott 2005-11-21, 3:55 am |
| "nightlight" <nightlight@omegapoint.com> wrote in
news:1132514981.050575.313440@g44g2000cwa.googlegroups.com:
>
> That was a test of the same order 0 coders. At any order, you can
> suprise the "predictive" scheme catastrophically (unless it cheats via
> some ad hoc esc mechanism, allowing it to use temporarily universal way
> of coding), while you can't do so to a descriptive/universal scheme.
>
>
Look is your so called coded able to actaully compress some files.
Do you have test files to test the order 0 codings. So one can campare
it to real world order 0 encoders. With out real examples one can test
I don't belive you.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
| |
| David A. Scott 2005-11-21, 3:55 am |
| Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote in
news:87wtj2ncpl.fsf@megaspaz.fatphil.org:
> "nightlight" <nightlight@omegapoint.com> writes:
>
> What do you mean by "negative half" and "positive part"?
> Applying the concept of sign to sequences of bits is, erm,
> unconventional.
>
> Phil
I guess that means if your given say a file with say one
thousand bytes of zero followed by say one thousand bytes of all
ones 0xFF he has no idea how his compressor or some arbitrary
arithmetic would compress it. It really should not be that hard
to test. But maybe he can't test it with his method yet.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
| |
| Willem 2005-11-21, 3:55 am |
| nightlight wrote:
) The negative 32 bit integer, say -1 has 32 1's. The idea was to give
) both coders (which were order 0 binary coders, thus they view data as
) sequence of uncorrelated bits) some simple sequence which has
) regularity invisible to order-0 (but visible to some higher order
) coder). It was meant to illustrate the robustness of the universal way
) of coding of the EC/QI coder and the fragility of the
) predictive/adaptive way of the AC coder.
If an adaptive order-0 AC coder performs badly on a sequence that has
varying statistics, then you wrote a shitty adaptive model.
A good adaptive model views symbols that are nearer with more weight.
It's no wonder that badly written code is outperformed.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| nightlight 2005-11-21, 3:55 am |
| > How about comparing QI with a faster and more efficient binary
> arithmetic coder such as the fpaq one (see
> http://www.cs.fit.edu/~mmahoney/compression/ for more info).
QI is a fundamental algoritm not a program (although several research
variants of the algoritham are implemented). If your algorithm needs to
do more work than about 3 adds, 1 shift and two 1-dim array reads, per
less probably symbol, and any operation at all (other than traversing
the array to examine what the symbols are) per more probable symbol,
then it is inherently slower algorithm than QI. Any arithmetic coder
falls in this category, since a general purpose AC needs to encode 0's
and 1's explicitly. Check the simplicity of the generic EC/QI coding
loop (from p. 8 in [23])
23. TR05-0625 "Quantized indexing: Background information"
http://www.1stworks.com/ref/TR/tr05-0625a.pdf
I=0; // init cumulative path index
for(k=n=0; n<M; ++n) // M is the number of input bits
{ // or a block size
if (getbit(n)==1) // get bit at 0-based offset n
{ // skip on bit=0, add on bit=1
++k; // increment count of 1's found
I=I+C(n,k); // add binomial to the path index
}
} // k is now the final count of 1's
The encoding step of the less frequent symbol (bit=1 by convention; for
the more frequent symbol only the outer loop executes) is simply an add
I=I+C(n,k); of a quantized binomial coefficient C(n,k), which is a 32
bit integer (taken out of an 1-dim array as (row[n])[k], with another
array giving the row n subarray) to a large integer "I" (which is of
the size of entire output). The 32 bit int C(n,k) is always added to
the leading 32 bits of "I" (thus at most 1 carry can occur into
leading 0's of "I"). The symbolic getbit(n) gets the next unprocessed
input bit, which is just a shift of a 32 bit word holding the next set
of 32 input bits. That is really as simple and as fast as entropy
coding can ever get. The reason for such extreme simplicity is that the
real caluculation, the one of the quantized addends C(n,k), was done
outside of the coding loop.
That's the basic beauty of the EC/QI's much cleaner division of labor
than the one of AC (see [23] pages 19-26, 30-32), where each addend is
inseparably entangled with the source instance properties (the
particular symbol probabilities at that point), hence the addends can't
be precomputed generically/universally for all sources an taken outside
of the AC coding loop. That gives QI a huge and a fundamental
algorithmic edge which can't be compensated by any special programming
tricks available uniquely to AC.
Note also that for the test we had stripped the Moffat et al. V3 binary
coder to its bare engine, replaced stream i/o with memory to memory
coding, no allocations were done within the timed loop, model decisions
were taken out (since only order 0 was tested) and all dynamical coding
options (alternatives that didn't matter) were hardwired as static to
avoid time waste in the test. So the arithmetic coder tested was
already quite a bit faster than the out-of-a-box Moffat et al code. We
wanted it to take its best shot, the best it possibly could (since QI
didn't need any such implementation/code level edge, being so much more
efficient at the fundamental, algorithmic level).
Any other arithmetic coder which improves on Moffat coder in these
kinds of simple common sense but otherwise trivial ways (as the
fpaq0.cpp seems to show), without providing any essential improvement
at the algorithmic level will do roughly the same as the chart showed
for the streamlined Moffat's coder.
| |
| Willem 2005-11-21, 3:55 am |
| nightlight wrote:
)> How about comparing QI with a faster and more efficient binary
)> arithmetic coder such as the fpaq one (see
)> http://www.cs.fit.edu/~mmahoney/compression/ for more info).
)
) QI is a fundamental algoritm not a program (although several research
) variants of the algoritham are implemented). If your algorithm needs to
) do more work than about 3 adds, 1 shift and two 1-dim array reads, per
How large are these arrays ?
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| nightlight 2005-11-21, 7:55 am |
| > If an adaptive order-0 AC coder performs badly on a sequence
> that has varying statistics, then you wrote a shitty adaptive
> model.
> A good adaptive model views symbols that are nearer with more weight.
>
Making AC adapt faster to more variable inputs, costs in extra
redundancy for the stationary inputs (which don't need a waste in code
space equivalent to an explicitly reserved escape code space used to
change the statistics on the fly). You can't have it both ways, optimal
for the stationary or slow varying source yet adapting fast to the
dynamic source, while coding within the adaptive AC scheme.
The faster AC adaptation implies fewer significant bits in the
probabilities p(a). You will have roughly 1/2 log(n) bits in p(a) if
you count exactly, with full weight, the n past bits, i.e. p(a) has
only half the precision bits of the precision of n. Any forgetting of
the older stuff (be it via sharp cutoff or some decaying law) simply
shortens the effective n, thus the precision of p(a). The fast changing
AC thus comes down to the similar approximation that quasi-arithmetic
coders do when they approximate p(a) with numbers which have only few
bits set, so they can scale the AC interval quicker via few shifts and
adds instead of multiplies. Here it would make AC more robust to the
sudden drastic changes of statistics, but at the cost of extra 5-10
percentage points compression loss for the stationary inputs.
Thus, this is a matter of a tradeoff you need to select. In the extreme
preference for the robustness to the input dynamics, you can easily
have AC code the 'universal' / descriptive way instead of the usual
adaptive / predictive way, i.e. make it code exactly as QI does, which
is extremely robust to statistical variability and which with QI is
optimal, while the AC coding this way will still produces an excess of
O(log(n)) output bits (where n is the input size or indexing block
size; the regular adaptive AC has this funadmental excess vs QI as
well, cf. preprint footnote on page 2). The O(log(n)) bits AC output
excess becomes pretty significant compared to the output size if the
source entropy is low (see top rows in the earlier chart) or if the n
is small (which may be required for very dynamic sources, in order to
segment the input into quasi-static sections), as illustrated by the
leftmost column in the test chart
Moffat et al 1998, which is the AC we used, explain their particular
tradeoff in their paper (hyperlinked earlier). You can tell them what
you think of their order 0 binary coder if you think you have a case.
I don't think you do. The problem is inherent in the adaptive /
predictive coding scheme -- they could choose to look Ok in the last
row of the chart, while looking much worse in the top rows (when just
few bits are set), or they could choose what they did, look bad in the
last row and look Ok in the top rows. On the other hand, there is no
such an input where they would look Ok, while QI would look bad (not
just at order-0, but at any order source, provided both coders code in
the same order model, obviously).
It is a trivial observation (but which you nevertheless keep bringing
up in different wordings), that you can make the two coders use
inadequate order model for some source and have both look very bad on
such source, compared to some other coder using a higher order model.
As long as you compare apples to apples and oranges to organges, thus
let them use the same order model (adequate or inadequate to the order
of the source), the adaptive AC will always do worse than QI, and very
much so for the very unpredicatble sources. That is the fundamental
weakness or fragility of any predictive scheme when faced with
unpredictable (for the model order that coder has) inputs.
The universal / descriptive schemes don't have such fatal fragility,
which was the whole point of the Davisson's minimax universal codes
(constructed using enumerative coding) -- they may code slightly
suboptimally, when given the inadequate order model for a given source,
but their worst case is never farther than about O(log(n)/n) bits per
symbol from the output of the best specialized coder which was tuned
specifically to the particular source. (And when they do know or model
the proper source parameters, the EC/QI code optimally.)
The point of the array "Vary" in the test was to illustrate how an
adaptive AC, when tuned to perform nearly optimally on the stationary
inputs, thus to appear fairly competitive with QI on such inputs (with
no more than about 6-7% worse output in the rest of the chart), is very
fragile and it breaks down catastrophically if the statistics suddenly
changes. The array "Vary" has only one drastic change of statistics, in
the very middle of the array when -1 goes to 0, hence the negative
effects on AC decrease as the array size increases. Had the statistics
changed again in the longer arrays such as in 128k input, the AC
performance would have been reset down, to the worse performance
figures from the shorter arrays.
| |
| nightlight 2005-11-21, 7:55 am |
| >> two 1-dim array reads, per
>
> How large are these arrays ?
In the code used for the test results in the preprint, the two arrays
were n^2/4 of 32 bit entries (just mantissa was stored; exponent was
computed cf. report page 9, [N2]) in the main quantized binomial table
and n pointers in the row pointer array (pointing into the rows of
binomial table). The n used was 1024, hence the big table was 256K
entries i.e. 1 Mbyte, which was about 0.05% of the memory (2G) in the
machine it was tested on. There would have been no visible difference
in the compression ratios for the the range of the tests, had only 16
bit mantissas been used by QI, thus the table could have been 1/2
Mbyte. But the research prototype code wasn't really optimized, it
simply kept table space for the maximum mantissa it may ever use and
there was no need to waste time coding separate 16-bit mantissa
routines to save a bit of table memory.
If one were aiming to merely match the AC (and not to be uniformly
ahead) on compression, a much smaller table could have been used, e.g.
n=256 needs for the big table only 16K entries of 8 bits each to
produce an output excess below 1 bit per 256 bit input block (the QI
worst case redundancy is below log(e)/2^(g-1) bits/symbol for precision
g). One can also trade off the table size for some execution time (cf.
preprint, p. 9, [N2]), by skipping the rows of binomials, e.g. by
skipping evey 2nd row in Pascal triangle, you need an extra add to
compute a missing binomial, hence on average it has on average 1/2 add
per step more work, but the big table size drops in half, e.g. to 8k
for the n=256 fixed block coder.
Similarly, the same tables of log(x!) used for the exponent
computations (see p.9 [N2]), are accurate enough to give you all but
the lowest 10-11 bits of the 32 bit mantissa for the quantized
binomials, although, again, this kind of saving (to cut down the big
table to 1/3rd) wasn't a great priority. For an embedded production
coder, yes, one would certainly take advantage of such savings. For a
desktop coder, e.g. in a video codec, it probably wouldn't matter much.
| |
| nightlight 2005-11-21, 7:55 am |
| >> two 1-dim array reads, per
> How large are these arrays ?
In the code used for the test results in the preprint, the two arrays
were n^2/4 of 32 bit entries (just mantissa was stored; exponent was
computed cf. report page 9, [N2]) in the main quantized binomial table
and n pointers in the row pointer array (pointing into the rows of
binomial table). The n used was 1024, hence the big table was 256K
entries i.e. 1 Mbyte, which was about 0.05% of the memory (2G) in the
machine it was tested on. There would have been no visible difference
in the compression ratios for the the range of the tests, had only 16
bit mantissas been used by QI, thus the table could have been 1/2
Mbyte. But the research prototype code wasn't really optimized, it
simply kept table space for the maximum mantissa it may ever use and
there was no need to waste time coding separate 16-bit mantissa
routines to save a bit of table memory.
If one were aiming to merely match the AC on compression (and not
to be uniformly ahead), a much smaller table could have been used
such as n=256, which needs for the big table only 16K entries of 8 bits
each to produce an output excess of less than 1 bit per 256 bit input
block
(the QI worst case redundancy is below log(e)/2^(g-1) bits/symbol for
precision g). One can also trade off the table size for some execution
time
(cf. preprint, p. 9, [N2]), by skipping the rows of binomials, e.g. by
skipping every 2nd row in Pascal triangle, you need an extra add to
compute a missing binomial, hence on average it has 1/2 add per step
more work, but the big table size drops in half, e.g. to 8k
for the n=256 fixed block coder.
Similarly, the same tables of log(x!) used for the exponent
computations (see p.9 [N2]), are accurate enough to give you all but
the lowest 10-11 bits of the 32 bit mantissa for the quantized
binomials, although, again, this kind of saving (to cut down the big
table to 1/3rd) wasn't a great priority. For an embedded production
coder, yes, one would certainly take advantage of such savings. For a
desktop coder, e.g. in a video codec, it probably wouldn't matter much.
| |
| Thomas Richter 2005-11-21, 7:55 am |
| Hi,
> QI is a fundamental algoritm not a program (although several research
> variants of the algoritham are implemented). If your algorithm needs to
> do more work than about 3 adds, 1 shift and two 1-dim array reads, per
> less probably symbol, and any operation at all (other than traversing
> the array to examine what the symbols are) per more probable symbol,
> then it is inherently slower algorithm than QI. Any arithmetic coder
> falls in this category, since a general purpose AC needs to encode 0's
> and 1's explicitly. Check the simplicity of the generic EC/QI coding
> loop (from p. 8 in [23])
This is incorrect. Check the MQ and QM coder family. Typical
implementations have about the same complexity and are fine
implementations of arithmetic coding.
So long,
Thomas
| |
| Willem 2005-11-21, 7:55 am |
| nightlight wrote:
)>> two 1-dim array reads, per
)>
)> How large are these arrays ?
)
) In the code used for the test results in the preprint, the two arrays
) were n^2/4 of 32 bit entries (just mantissa was stored; exponent was
) computed cf. report page 9, [N2]) in the main quantized binomial table
) and n pointers in the row pointer array (pointing into the rows of
) binomial table). The n used was 1024, hence the big table was 256K
) entries i.e. 1 Mbyte, which was about 0.05% of the memory (2G) in the
) machine it was tested on. ...
Are you familiar with the concepts of CPU caches ?
A cache miss on most CPUs is easily as costly as a multiply.
On modern CPUs it may get near the cost of a division.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| nightlight 2005-11-21, 7:55 am |
| > Are you familiar with the concepts of CPU caches ?
> A cache miss on most CPUs is easily as costly as a multiply.
> On modern CPUs it may get near the cost of a division.
Of course, that's why n=1K was used for the tests. Using larger n made
the active parts of the table (which are roughly the sqrt(n) wide band
around the straight line k = p*n for given probability of 1, p) fall
out of the cache on some machines, reducing the speed ratios against
AC. The 1k seems to be just about the right size for most of todays
PC's laptops & desktops. Now, with the older CPUs, e.g. 300-400Mhz old
Celerons from 6-7 years ago, the cache is small, but the ratio of the
cache speed vs main RAM speed is small enough, so that the speed ratios
in the chart (QI vs AC) look even better on such machines by about
factor 1.5-2, even with larger n (e.g. 2 or 4k). With larger n, the
compression is usually better for very low entropy, and it also runs
faster since the inter-block housekeeping may get costly if the blocks
are too small (due to the mixed radix coding used to eliminate bit
fraction losses at the block boundaries; in practice one may ignore
this tiny compression improvement, but it does matter for accuracy
benchmarks).
The production version of QI targeted to some CPU range, could easily
optimize the tradeoff by using the right kind of row skipping &
mantissa interpolation, to reduce the table size if the cache gain
offsets the extra calculation for the interpolations. Other simple
improvement (which wasn't used in the tests, bit which does work well
in another prototype coder) is to use variable n for different sections
of the table. Normally, for fixed block coding, the table is triangular
(Pascal triangle). But one can get much more bang per K of tables, by
using V shaped table, which has much longer n range near the lattice
axes (addends fo sparse densities) and shorter n near the center of
symmetry of Pascal triangle (the high entropy limit). In the simple
fixed block coding used in the test, the accuracy was a large overkill
in the high entropy regions, which could have easily (with more
complicated code) been cut in 1/2 or 1/3rd without affecting the
compression ratios. You can check Lawrence & Tjalkens et al papers on
this kind of enumerative coding:
12. J. C. Lawrence "A new universal coding scheme for the binary
memoryless source" IEEE Trans. Inform. Theory IT-23 (4), 466-472,
1977
http://citeseer.ist.psu.edu/lawrence77new.html
13. T.J. Tjalkens, F.M.J. Willems "A universal variable-to-fixed
length source code based on Lawrence's algorithm" IEEE Trans.
Inform. Theory IT-38 (2), 247-253, 1992
http://citeseer.ist.psu.edu/585782.html
| |
| nightlight 2005-11-21, 6:55 pm |
| > This is incorrect. Check the MQ and QM coder family. Typical
> implementations have about the same complexity and are fine
> implementations of arithmetic coding.
It is correect if you restrict the statement to full precision
arithmetic coders, as I did, (thus the Moffat et al 1998 coder is the
best of full AC implementations). The quasi-arithmetic coders which
sacrifice compression for speed are dime a dozen. Moffat98 paper shows
some benchmarks against the best Q coder implementation they found and
it runs only 2-3 times faster than their binary coder. That is still up
to 100 times slower than QI, at the great compression cost in the low
entropy range of the inputs (it would do Ok only on medium to high
entropy inputs, just a point or two below the full AC).
But, even the quasi-AC still needs to either code each symbol, 0 and 1,
or they need an O(n^2) size table for each different probability (they
may just use a few of such tables, with very coarse quantization of
probabilities) -- you can also check Rissanen's discussion of these two
tradeoffs he already outlined in his 1976 AC paper. An AC coder may
also use a multiplication table of O(n^2) size, thus do no
multiplications, but it still needs to code each symbol 0 and 1. To
avoid coding each symbol, like QI, there is no way for AC around having
a different O(n^2) table for each different probability (however many
quantized p() levels the pick).
If you think someone can do it, code via AC only the less frequent
symbol and just skip over the most frequent symbol and still use the
single O(n^2) table for all probabilities, post a link to a paper
showing that and I will show you where the error in their proof was.
Basically, the AC enumerative addends which avoid the coding on the
more probable symbol are of the form A(n,k) = p^k q^(n-k) (where p=p(1)
and q=1-p = p(0) >= p(1), k=count of 1's, n=number of coded symbols so
far; you get these 1-step addends by expanding the regular AC scaling,
cf. [23] pp. 19-22). They have the probability p hardwired into the
addend. You can have a table of A(n,k) for AC, just as you have the
quantized binomial table C(n,k) for QI, and then each coder will do
exactly the same amount of work per coding step, but the entries in the
AC table work only for the given p & q used to compute that table,
while the entries in the single QI table work for all p & q. The AC is
simply an approximation (of QI), which in the approximation process
(cf. preprint footnote on p.2 ) has mangled the delicate combinatorics
of the binomial coefficients, entangling them with the p & q, and has
lost the addend universality.
When I post the reference implementation of QI on the coder page (in
the next 2-3 months), anyone is welcome to test it against different
kinds of quasi-arithmetic coders. I would be curious to see the
results.
| |
| nightlight 2005-11-21, 6:55 pm |
| >> This is incorrect. Check the MQ and QM coder family. Typical[color=darkred]
It is correect if you restrict the statement to full precision
arithmetic coders, as I did, (thus the Moffat et al 1998 coder is the
best of full AC implementations). The quasi-arithmetic coders which
sacrifice 5-10% of compression for speed are dime a dozen. Moffat98
paper shows some benchmarks against the best Q coder implementation
they found and it runs only 2-3 times faster than their binary coder.
That is still up to 100 times slower than QI, at the great compression
cost in the low entropy range of the inputs (it would do Ok only
on medium to high entropy inputs, just a point or two below the full
AC).
But, even the quasi-AC still needs to either code each symbol, 0 and 1,
or they need an O(n^2) size table for each different probability (they
may just use a few of such tables, with very coarse quantization of
probabilities) -- you can also check Rissanen's discussion of these two
tradeoffs he already outlined in his 1976 AC paper. An AC coder may
also use a multiplication table of O(n^2) size, thus do no
multiplications, but it still needs to code each symbol 0 and 1. To
avoid coding each symbol, like QI, there is no way for AC around having
a different O(n^2) table for each different probability (however many
quantized p() levels the pick).
If you think someone can do it, code via AC only the less frequent
symbol and just skip over the most frequent symbol (without
renormalization) and still use the single O(n^2) table for
all probabilities, post a link to a paper showing that.
Basically, the AC enumerative addends which avoid the coding on the
more probable symbol are of the form A(n,k) = p^k q^(n-k) (where p=p(1)
and q=1-p = p(0) >= p(1), k=count of 1's, n=number of coded symbols so
far; you get these 1-step addends by expanding the regular AC scaling,
cf. [23] pp. 19-22). They have the probability p hardwired into the
addend. You can have a table of A(n,k) for AC, just as you have the
quantized binomial table C(n,k) for QI, and then each coder will do
exactly the same amount of work per coding step, but the entries in the
AC table work only for the given p & q used to compute that table,
while the entries in the single QI table work for all p & q. The AC is
simply an approximation (of QI), which in the approximation process
(cf. preprint footnote on p.2 ) has mangled the delicate combinatorics
of the binomial coefficients, entangling them with the p & q, and has
lost the addend universality.
When I post the reference implementation of QI on the coder page (in
the next 2-3 months), anyone is welcome to test it against different
kinds of quasi-arithmetic coders. I would be curious to see the
results.
[this may a duplicate, since I already posted this reply, but it didn't
show up somehow.]
| |
| Willem 2005-11-21, 6:55 pm |
| nightlight wrote:
) ...
) tradeoffs he already outlined in his 1976 AC paper. An AC coder may
) also use a multiplication table of O(n^2) size, thus do no
) multiplications, ...
Are you aware of the fact that a multiplication table is
a very stupid idea on any recent CPU ?
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| nightlight 2005-11-21, 6:55 pm |
| > Are you aware of the fact that a multiplication table is
> a very stupid idea on any recent CPU ?
Depends what you call "any recent CPU". For battery powered, embedded
processors (and there are more of those than deskotops; and they're as
"recent"), the absence of multiplications and divisions in a coder is
quite an important element for stretching the battery life. What may
be a "stupid idea" for a desktop, may be a smart idea for a cell phone
or a camera. I don't wish to appear condenscending or anything, but
scientific papers are written on this very topic, you know, and there
are people who specialize in how to transform the existent algorithms
so they consume less power. Multiplications & divisions are often the
key targets for elimination in these transformations. Since your
conversational manners in this thread haven't been exactly exemplary, I
will skip the relevant links and let you enlighten yourself on that
subject on your own.
| |
| Willem 2005-11-21, 6:55 pm |
| nightlight wrote:
) <snip>
) ... Since your
) conversational manners in this thread haven't been exactly exemplary, I
) will skip the relevant links and let you enlighten yourself on that
) subject on your own.
I assume that answering simple questions with several paragraphs of
in-depth analysis of the QI coder, most of which is irrelevant to said
question, is exemplary conversational manner ? I do admit that politicians
do this all the time, but I personally would not take example from them.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| nightlight 2005-11-21, 6:55 pm |
| > How about you stating what you feel a typical binary zero order
> based arthmetic coder would code one thousand bytes of all zero
> followed by one thousand bytes of all ones.
That question (besides being trivial) was already discussed in this
thread... see the post:
http://groups.google.com/group/sci....656f2937e768fa3
| |
| David A. Scott 2005-11-21, 6:55 pm |
| "nightlight" <nightlight@omegapoint.com> wrote in
news:1132590148.601644.143950@g43g2000cwa.googlegroups.com:
>
> That question (besides being trivial) was already discussed in this
> thread... see the post:
>
> http://groups.google.com/group/sci....656f2937e768fa3
>
Yes the question was trival on purpose to see if you had code
to actually do something. I see by the anwser its not yet quite
there. if it ever gets working and debuged it would be interesting
to see how well it can compress certain files. If your lucky it
might to better compression for a few files but then again it
might not. The proof is in the actual results not in your paper.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
| |
| nightlight 2005-11-21, 6:55 pm |
| > Yes the question was trival on purpose to see if you had
> code to actually do something. I see by the anwser its
> not yet quite there.
Of course there is plenty of code which compresses and
expands correctly arbitrary data. Each of those test points
listed in the preprint has 500 runs over random inputs, with
decoding output compared to input. The last decoding glitch
(where decoder output didn't compare to input) was fixed nearly
a year ago.
The preprint should also be enough for anyone with a bit of math
knowledge to understand the constructive proof of decodability and
the worst case redundancy derivation. The preprint & the algorithm
have been already discussed via emails with number of highly
competent people in the coding field (including Rissanen, the
inventor of arithmetic coding), and the presentations, with the
demo coder with a DLL to keep and play with, were given to
several companies.
It is still a research project, too, in the most interesting areas,
such as the whole field of modeling the universal/descriptive way
(the practical forms of Kolmogorov/MDL approach to modeling).
There is also a video codec being develped, using the full power
of QI, not just the coding but the more powerful new modeling. At
the moment these two projects have higher priority than the
publicizing of the algorithm or releasing the reference source
(which are just sideway offshoots from the main projects, as
time allows).
> if it ever gets working and debuged it would be interesting
> to see how well it can compress certain files.
The reference implementation, which performs at least as well as
the test figures show for the (unoptimized research) prototype code,
will be released within the next 2-3 months on the QI page.
| |
| David A. Scott 2005-11-22, 3:55 am |
| "nightlight" <nightlight@omegapoint.com> wrote in
news:1132611985.522491.301580@z14g2000cwz.googlegroups.com:
>
> The reference implementation, which performs at least as well as
> the test figures show for the (unoptimized research) prototype code,
> will be released within the next 2-3 months on the QI page.
>
>
In your paper you talk about a test input string of
00101001 since you seemed to pick it for an example.
What would this string compress to. That is if its not
to hard solve. I am trying to see what you get.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
| |
| nightlight 2005-11-22, 3:55 am |
| > In your paper you talk about a test input string of
> 00101001 since you seemed to pick it for an example.
> What would this string compress to. That is if its not
> to hard solve. I am trying to see what you get.
The answer is on page 6, where it shows the string index
I(S8)=2+6+35=43 and how to calculate it (eq. 17). The actual size in
bits of the index is L=log(56)=5.807... bits since the valid values of
the index are 0..55 (there are 56 paths from A to B). The fractional
bits of L don't normally go to waste since they are coded in the mixed
radix with other items sent, usually with the fractional bits of
indices for other blocks (cf. p.9 [N4]). The encoder also sends the
count of 1's per block, which is 3 here and the length of the entire
string which is 8. The latter two items get coded in variety of ways in
different variants and different parts of the coder/s and experimental
prototypes (cf. p.9 [N6]).
| |
| David A. Scott 2005-11-22, 6:55 pm |
| "nightlight" <nightlight@omegapoint.com> wrote in
news:1132649853.206220.13050@g14g2000cwa.googlegroups.com:
>
> The answer is on page 6, where it shows the string index
> I(S8)=2+6+35=43 and how to calculate it (eq. 17). The actual size in
> bits of the index is L=log(56)=5.807... bits since the valid values of
> the index are 0..55 (there are 56 paths from A to B). The fractional
> bits of L don't normally go to waste since they are coded in the mixed
> radix with other items sent, usually with the fractional bits of
> indices for other blocks (cf. p.9 [N4]). The encoder also sends the
> count of 1's per block, which is 3 here and the length of the entire
> string which is 8. The latter two items get coded in variety of ways in
> different variants and different parts of the coder/s and experimental
> prototypes (cf. p.9 [N6]).
>
>
Again that is not a complete anwser!!!
I see the 43 so you are claiming that along with a 3 for the number of ones
and along with a length indicator 8 that that you could encode this
complete string. But given the task to write this out as an complete
compression how would you do it. Do you write 3 universal numbers such
as 011 for 3 and 0001000 for the 8 and 00000101011 for 43
this is already longer than the string you started with.
A good arhtimetic zero order string compress easly beats this.
Could you at least finish your example to show how wonderful
your method is or is the string you choose to hard to deal with?
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
| |
| Sachin Garg 2005-11-22, 6:55 pm |
| David A. Scott wrote:
> "nightlight" <nightlight@omegapoint.com> wrote in
> news:1132649853.206220.13050@g14g2000cwa.googlegroups.com:
>
>
> Again that is not a complete anwser!!!
> I see the 43 so you are claiming that along with a 3 for the number of ones
> and along with a length indicator 8 that that you could encode this
> complete string. But given the task to write this out as an complete
> compression how would you do it. Do you write 3 universal numbers such
> as 011 for 3 and 0001000 for the 8 and 00000101011 for 43
> this is already longer than the string you started with.
> A good arhtimetic zero order string compress easly beats this.
> Could you at least finish your example to show how wonderful
> your method is or is the string you choose to hard to deal with?
I feel the bijective winds flowing :-)
Are you not trying to twist this discussion towards how his scheme
should be bijective to save these few more bits?
Sachin Garg [India]
http://www.sachingarg.com
| |
| David A. Scott 2005-11-22, 6:55 pm |
| "Sachin Garg" <schngrg@gmail.com> wrote in
news:1132673451.504542.315480@f14g2000cwb.googlegroups.com:
> David A. Scott wrote:
>
> I feel the bijective winds flowing :-)
>
> Are you not trying to twist this discussion towards how his scheme
> should be bijective to save these few more bits?
Why would I do that he claims its already optimal ;-)
And he had experts look at it ;-)
>
>
> Sachin Garg [India]
> http://www.sachingarg.com
Hope things have been well with you I have been sick alot this
year hope the next is better.
>
>
The guy stated it was better than arithmetic I am trying to see
how it would compress even his own example. Getting it to 3
universal codes is not proof of anything. Besides
00101001 it should compress any string of 8 bits with 3 ones to
roughly the same size. Not sure it even does that. The point is
a properly coded arithmetic is hard to beat there are no gaps or
unsed codings. Not sure he wants to or even can completely code a
simple example that was based on his on choosing.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
| |
| nightlight 2005-11-22, 6:55 pm |
| > A good arhtimetic zero order string compress easly beats this.
Not really. The Moffat98 coder outputs for that string 13 bits (in
"frugal bits" mode, hardwirded binary coder, no contexts) which is the
best it can do. One can, of course, rig an arithmetic coder to produce
fewer bits for such very short inputs, but that will cost it on longer
inputs. (So you don't need to feel compelled to tell me about some
coder xyz which puts out 8 bits for S8 or some such.)
> Could you at least finish your example to show how wonderful
> your method is or is the string you choose to hard to deal with?
That has no more relation to "my method" than the variable names I used
in the code. Even the index 43 is in the domain of conventional
enumerative coding (which was known at least since 19th century), thus
it has nothing to do with QI. A variant of QI which uses Fenwick's PE
(Punctured Elias) codes to encode the total input length and tapered
Huffman codes for counts of 1's, outputs 1+3+6=10 bits. The first few
of PE codes
[ see: http://citeseer.ist.psu.edu/fenwick96punctured.html ]
are listed below with their mapping to the total count N of input bytes
(this is the same input granularity, 1 byte, as used by the Moffat98
coder):
--------------------------
N Code
-------------------------
1 = 0
2 = 101
3 = 1001
4 = 11011
5 = 10001 ... etc.
------------------------
In this case N=1, and the code for the input byte count is a single
bit: 0. The count of 1's is coded as tapered Huffman code (which
encodes number k from 0..8 in two sets, 0..6 in 3 bits and 7,8 in 4
bits), thus it uses 3 bits for k=3. Hence the total output is 1+3+6
bits. (There are, of course many trivial ways to encode this S8 in
fewer bits, down to a single bit code.) In any case, none of this time
and bandwidth squandering chicken feed is "my method".
| |
| Sachin Garg 2005-11-22, 6:55 pm |
| David A. Scott wrote:
> "Sachin Garg" <schngrg@gmail.com> wrote in
> news:1132673451.504542.315480@f14g2000cwb.googlegroups.com:
>
>
> Why would I do that he claims its already optimal ;-)
> And he had experts look at it ;-)
>
>
> Hope things have been well with you I have been sick alot this
> year hope the next is better.
Best Wishes.
I hope it wasnt anything serious.
> The guy stated it was better than arithmetic I am trying to see
> how it would compress even his own example. Getting it to 3
> universal codes is not proof of anything. Besides
> 00101001 it should compress any string of 8 bits with 3 ones to
> roughly the same size. Not sure it even does that. The point is
> a properly coded arithmetic is hard to beat there are no gaps or
> unsed codings. Not sure he wants to or even can completely code a
> simple example that was based on his on choosing.
As far as compression ratios are concerned, with enough word-size AC
produces *very* close to theoritical SUMi(log2(1/Pi)) bits, which is
the best expected. I havent yet gone into details of what differences
it causes to how model is managed or updated, there might be something
to loose or gain there.
But if this new scheme can speed things up as claimed, it will ofcourse
be good. We will know when we see code, discussions never really end
:-)
I have made a post on this at http://www.c10n.info/archives/251
There are some more answers there.
Sachin Garg [India]
http://www.sachingarg.com
| |
| nightlight 2005-11-22, 6:55 pm |
| > As far as compression ratios are concerned, with enough word-size
> AC produces *very* close to theoritical SUMi(log2(1/Pi)) bits,
The redundancy of AC is roughly: R = r/n + 1/2^r bits/symbol, see
[27] p. 51:
27. J. Rissanen "Arithmetic codings as number representations"
Acta Polyt. Scand., Math. & Comp. Sc. Vol. 31 pp 44-51, 1979
http://www.1stworks.com/ref/ariNumRepr.pdf
Thus increasing of coder precision r (in bits), decreases the 2nd term
but increases the 1st term (the overhead of termination). The optimum r
is around log(n), where n is number of symbols in the input, which
gives optimum redundancy of R=log(n)/n + 1/n.
The EC/QI redundancy is 1/2 log(n)/n +1/n (which is also the
theoretical lower bound, i.e. no coder can do better than that), thus
the QI output is about 1/2 log(n) bits smaller than AC output at the
same precision r. This is roughly the AC output excess which shows up
in our tests and which is mostly visibile in the top rows where the
total output is small (low entropy limit). The row "Vary" is an
entirely different effect -- the fragility of the predictive AC
approach when trying to predict what can't be predicted in order-0 vs.
the robustness of descriptive EC/QI approach (in the same order-0
modeling).
| |
| nightlight 2005-11-23, 3:55 am |
| > K log (K+L)/K + L log (K+L)/L = log ((K+L)? / (K? L?))
>
> bits, where I write N? to mean N^N.
>
> Arithmetic coding is asymptotically as efficient as enumerative coding,
> since O(N?) = O(N!), but enumerative coding will always use fewer bits,
> since N! < N? for interesting values of N.
Roughly correct, except that it is not as bad as the difference N^N vs
N! ~ (N/e)^N since factor e in N! cancels out in the binomial
coefficients. The distinction left is the Stirling approximation itself
and subsequent dropping of combined sqrt() factors and the O(1+1/n)
factor, both smaller than 1, yielding thus larger addend than exact
EC/QI addend (and the final path count), with excess of about 1/2
log(Pi N pq) bits (p and q a probabilities of 1 and 0). You can see
exactly how these approximations work out and how to get AC by
approximating EC/QI in [23] pp. 19-26. Note that these AC
approximations are _before_ any precision reductions. The precision
reduction itself also works tighter with QI (because it is done bottom
up instead of top down as in AC), yielding 1/2^(r-1) which is half of
the corresponding redundancy for AC 1/2^(r-2).
The small (when compared to input size as usually done, but possibly
large when compared to oputput size, which is the relevant comparison,
the one that matters) additional redundancy for AC is only one among
multiple adverse side-effects of the AC approximation (cf. preprint,
footnotes on p. 2. and [23]).
23. TR05-0625 "Quantized indexing: Background information"
http://www.1stworks.com/ref/TR/tr05-0625a.pdf
| |
| David A. Scott 2005-11-23, 3:55 am |
| "nightlight" <nightlight@omegapoint.com> wrote in
news:1132680066.207052.52230@f14g2000cwb.googlegroups.com:
>
> Not really. The Moffat98 coder outputs for that string 13 bits (in
> "frugal bits" mode, hardwirded binary coder, no contexts) which is the
> best it can do. One can, of course, rig an arithmetic coder to produce
> fewer bits for such very short inputs, but that will cost it on longer
> inputs. (So you don't need to feel compelled to tell me about some
> coder xyz which puts out 8 bits for S8 or some such.)
>
>
Well the Moffat98 coder which is most likely not optimised for use
in compressing strings may not be a good source for the estimate of
13 bits. So your wrong once again a good zero order string compressor
can easily beat this. Using one that is really more for crypto than
for speed one gets
00101001 mapping to 1100011100 which is 8 bits to 10 bits still
10 bits is significantly less than 13.
if you use a string of 8 with less ones such as
00101000 it maps to 110010001 which is 9 bits
if you you drop down to 1 one such as
00100000 is maps to 1100111 which get us finally to 7 bits
and if you change all the ones to zeros you get
00000000 to 1000 4 bits
This is not a rigged compressor it is one based on 64 bit registers.
I think you lack an understanding of how arithmetic codeing can really
work. But then I guess that was obvious in you first post when you said
"The new algorithm, even the exploratory unoptimized
prototype runs much faster (see the table below, copied
from the of paper, with speedup factors over 200) and
always compresses better than the best of arithmetic
coders"
It clearly does not always compress better than the
best. But I guess you have made up your mind and don't
want the facts to get in your way. I will look again if
you ever get a working model in the next few years.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
| |
| nightlight 2005-11-23, 3:55 am |
| > So your wrong once again a good zero order string compressor
> can easily beat this. Using one that is really more for crypto than
> for speed one gets 00101001 mapping to 1100011100 which is 8
> bits to 10 bits still 10 bits is significantly less than 13.
You're welcome to take Moffat98 code (use frugal bits and F=27 or 26
and make sure to extract their raw bit count of the output, not the
byte rounded figure) at:
http://www.cs.mu.oz.au/~alistair/arith_coder/
and report results for a bit wider sample of input sizes than the 1
instance of 1 byte input you seem to be focused on. Take the same sizes
& densities used in the test chart QI vs AC, use 500 random samples or
so per point, and report how well does your coder do then. Taking 1
value of 1 byte as your compression sample is, to be frank, ridiculous
as a way to compare the compression algorithms (or to waste readers'
time or my time on such vacuous discussion).
> But then I guess that was obvious in you first post when you said
> "The new algorithm, even the exploratory unoptimized
> prototype runs much faster (see the table below, copied
> from the of paper, with speedup factors over 200) and
> always compresses better than the best of arithmetic
> coders"
> It clearly does not always compress better than the
> best.
The AC is an approximation of EC/QI and it approximates it by uniformly
increasing the sizes of coding addends (see the two last messages above
& cited refs). Hence, when all else is set the same (such as arithmetic
precision & model/side info available to coders, encodings of data
sizes etc), i.e. when apples to apples are compared, QI will always
produce smaller or at worst equal output to AC (where the "equal" case
occurs alsmost exclusively when one rounds the output to the whole
number of bits, in which case both round up to the same number of whole
bits).
This is not a matter of tests or benchmarks but of matematical
properties of the two algorithms that are shown in the preprint (and in
the cited tech reports). The test charts were put in the preprint to
illustrate the point not to prove it. If you wish to disprove
something there, you'll need to read and grasp that bit of elementary
math (which my 7th grade daughter was able to understand; well, she did
get a clean 800 on the math SAT she took a year after that).
If you think your coder is better than Moffat98 AC, show it. Present
your chart with the wider (than the ridiculous 1 instance of 1 byte)
input sample, at least the sizes & densities I listed in my chart, and
make sure _all is counted_ the same way as in the Moffat98 i.e. no size
or counts information is passed to your coder without counting it
within your total output size -- all such 'side information' you may be
passing to your decoder has to be accounted for. The test loop simply
calls expand loop with no sizes (expanded or compressed) given, but
with just a pointer to the compressed data (hence the data sizes
encoding must be a part of the compressed package & counted in the
compressed size to compare to Moffat98 AC).
| |
| Ignorant 2005-11-23, 7:55 am |
|
Last heard, Intel is looking for terribly resosource wasting
ANDALSO hungry cpu
intensive algorithms for their next generation dies. They surely will
be interested in
this new algorithm . Get someone gainfully employed with INTEL to be
intersted in
this %^&* new method.
| |
| David A. Scott 2005-11-23, 6:55 pm |
| "nightlight" <nightlight@omegapoint.com> wrote in
news:1132730530.984837.250880@z14g2000cwz.googlegroups.com:
| | |