For Programmers: Free Programming Magazines  


Home > Archive > Compression > January 2006 > Re: Quantized Indexing Source Code (update & alg. history)









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: Quantized Indexing Source Code (update & alg. history)
nightlight

2006-01-11, 3:55 am

>-- Willem wrote:
>
> Well, yeah. It is the 'always compresses better' that
> he keeps harping on about that just simply isn't true.
> And in his discussions, he keeps on demonstrating
> this misunderstanding of his.
> http://groups.google.com/group/comp...f830d20dcd0ee50


There is no misunderstanding here. There are few frivolous ways in
which "always compresses better" cannot obviously be true and which are
not worth cluttering a discussion with by guarding against when talking
with presumably informed participants. Hence, a statement "always
compresses better" made in this newsgroup, where one can assume
informed readers, should naturally be understood to include implicitly
allowances such as "excluding these frivolus cases (a),(b),(c)... to
which the statement does not apply".

It appears that, having no argument against the substance of the claim,
you have fallen to clinging onto some of these frivolus ways of
circumventing "always". Let me review few of these.

Say, we have agreed to a data set of 256 input samples that we wish to
use to test the QI claims against. You can always "prove" the QI claims
false using any the following "compression techniques" or
"enhancements" of AC:

a) You can write a QI work-alike (or if need be, just a clone, since
the QI source is publicly available), which will create exactly the
same output size as QI, at least on the given test. Hence it trivially
cannot be true that QI "always compresses better" than any other
entropy coder.

b) You may insist that the sample inputs must be fed to the coders as
"files", so that the OS will keep track of their lengths. Then you can
"enhance" any existent AC (or generally any other compressor which
assumes that its output must be a self-terminating code) by providing
the AC decoder with the compressed length you obtained from the OS,
which allows you to save approximately log(H) bits on the AC decode
termination cost. If per chance the competitor doesn't follow the suit
and "enhance" similarly his coder, or if he is entirely unaware that
you are using this age old kind of "compressor helper", you've just
pocketed a log(H) bits edge.

c) You can compile the test set into your coder and decoder executables
and then "compress" any of the 256 test inputs by simply transmitting
to the decoder the 8 bit number telling it which of the samples to
retrieve from the built in list.

d) You can refine the technique (c), so that it is not as blatantly
obvious and which doesn't take as much room. Instead of including all
of the 256 test files into your coder, you just select one and make an
"enhancement" on top of regular AC, so that for that selected input the
"enhancement" outputs a single bit 0, which decoder uses to retrieve
the built in pre-selected sample, and for the remaining 255 samples, it
outputs 1 followed by the original output of the AC. You can further
refine this, so it is harder to detect, by not storing the entire input
sample, but just some section of it, and also by not going all the way
to a single bit, but maybe few or some such. The AC with this
"enhancement" will come out worse off by 1 bit on average, but at least
it will "prove" the QI claim wrong. What's 1 little bit, compared to
the great "victory".

e) But even the "technology" (d), however subtle with its refinements
compared to (c), still leaves some vulnerabilities. It still kind of
looks like cheating, and one can get caught since there as an 'intenet'
hiding in there. Plus it breaks down if the data set changes and you
don't get chance to recompile for the new set. The breaktrough of the
present method is to do a randomized version of (d), where you
"enhance" the AC by letting a random number generator help select which
of the input message patterns, which may be just small bit-string
sections bound to occur by chance in any nontrivial test set, will get
shorter codewords and by how much. As in (d), at some cost to the
average performance, this kind of "compression technology" can "prove"
the QI claim wrong, although it may take much longer time than (d) to
find an actual example "disproving" the claim. But, in return for the
longer wait (possibly astronomically long) compared to (d), this
"compression technology" doesn't require any test set to be given
upfront and it is "good" not just against QI but against any other
genuine advance in the future, since it always can "beat them" at least
on some inputs. And you virtually cannot get caught, since the
deterministic intent of (d) has become a random fluctuation beyond
human will or control.

>From your latest line of arguments on 'random quantization' it would

seem you have taken your final fallback position -- the "compression
technology" of class (e), which is the hardened version of technique
(d), which in turn was a subtler variant of method (c). The random
generator is simply the set of discarded fractional parts of the AC
'range', which in turn can always be used to select a small, random
fluctuation in the codeword lengths (at some cost to the average output
size), hence implement the "compression technology" (e). Well, you're
welcome to continue clinging onto that.

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com