For Programmers: Free Programming Magazines  


Home > Archive > Compression > January 2006 > Re: Quantized Indexing Source Code (update & alg. history)









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: Quantized Indexing Source Code (update & alg. history)
nightlight

2006-01-10, 3:57 am

There was an small update to the source. The main addition (requested
in several emails) was an option:

#define _ASM 1 // Set to 0 to disable inline asm

in the Qitypes.h which allows disabling of the VC6 inline asm (used in
some macros). The speed of operations (mostly radix codes decoder)
drops by a few percent without the inline asm. There were also some
minor compiler warnings that some people emailed about which were
cleaned up in the latest code (I left the file name unchanged for the
old links to work):

http://www.1stworks.com/ref/C/QIC100.zip

There is also a thread about the algorithm discovery in the Computer
Chess Club ( http://www.talkchess.com/ <-- signup screen), where the
seed for it was planted way back in the last century. Here is the
excerpt for those interested in that aspect:

------ Computer Chess Club thread excerpt (few typos fixed) ----------

"Number of positions in chess -- The rest of the story"
http://www.talkchess.com/forums/1/message.html?476509

Posted by Ratko V. Tomic on January 03, 2006 at 08:12:12:

> Uri Blass: There is a better upper bound see:
>
> http://chessprogramming.org/cccsear...hp?art_id=77068
>
> Uri


Hi Uri, that little excercise in enumeration you brought up
(and mentioned in the other thread) set me off back then to
try make it work as a general purpose compression algorithm.
While a neat idea on paper, the problem was that the arithmetic
precision had to be of the size of output.

After struggling for a while, I searched the literature and
it turned out such compression algorithm already existed,
called "Enumerative Coding", since 1960s (first in Russian
literature, from Kolmogorov and his disciples, then shortly
thereafter here, in USA, from Lynch and Davisson). And, as
in my version, the precision problem was still unsolved after
over four decades of various attempts to make the algorithm
practical.

Since I arrived at it on my own, my conventions for
enumeration happened to be backwards from those that
existed in the literature (mine sorted right to left,
the so-called colex sorting of combinations, and built
up the enumerative addends bottom up, while the standard
scheme sorted lexicographically & worked recursively top
down, plus all my pictures were rotated 45 degrees from
theirs). Further, due to my playing with lattice methods
in QCD (in my physics graduate school days), I also had
my own visual representation of combinatorics as lattice
walks, which is a very intuitive, heuristically rewarding way
of looking at it, allowing one to see all of the combinatorial
identities at a glance (especially useful for tossing and
checking out algorithms in the head when going to sleep
or waking up, without a pencil and paper). The lattice
formulation turns out to have existed in the EC literature
as well (as the Schalkwijk's Pascal triangle walks), although
not in as general or elegant formalism as mine, lacking even
a notation for the lattice walks, key sets of paths, enumerative
classes, constraints... (stuff I worked out while doing physics).

Since that thread, I kept returning to the problem, on and
off, trying various ideas. Nothing worked. Then, in summer
2004, when my wife and kids went to a summer camp for
a w, I stayed home to work on a programming project
(a video codec). The first night home alone, at about 2AM,
while debugging the latest batch of code, out of nowhere
an idea popped into my head on that pesky enumeration
problem, something I didn't yet try. I quickly coded just a
toy version, allowing input buffers of 32 bits only, and by
dawn it worked -- a version using arithmetic precision of only
8 bits encoded & decoded correctly all 2^32 possible inputs.

That same early Sunday morning, it must have been around 6AM,
I called and woke up the company owner (I am a CTO & a chief
scientist), and he, still half awake, yielded to my enthusism and
agreed to suspend the original project, so I could try if the idea
works on data of any size.

At the time I didn't have a proof, and wasn't sure even at the
heuristic level, that it can always be decoded. I also didn't
know what the maximum or average redundancy would result
from the reduced precision. Within a w I had a simple
version of code working on buffers up to 4 kilobits, using only
16 bit arithmetic precision (instead of 4 kilobit precision).
It worked again, and it was very fast, even in that crude version.
The redundancy due to the limited precision arithmetic was
measured and it was on average about 0.05 bits (and always
below 0.07 bits) for the entire 4k block.

The next couple months I extended the algorithm to any input
size and to general alphabet (from the original binary alphabet).
I also found a proof of general decodability and an expression
for the max redundancy due to finite precision. The max
redundancy is always below log(e)/2^(g-1) for g bit arithmetic
precision (I use now g=32 bit precision). The fourty years old
puzzle was finally cracked. The accidental backwards conventions
of my initial approach turned out to be the critical element
exposing the key to the solution, which is virtually impossible
to spot from within the conventional enumerative coding scheme.

I also developed a new modeling scheme for combinatorial methods
of coding (such as the new algorithm) which is quite promising
on its own. It is basically a scheme along the lines of Kolmogorov's
algorithmic approach to information theory (in contrast to Shannon's
probabilistic approach, which is dominant at present, and where
modeling for arithmetic coding consists in calculating the
probabilities of the next single symbol).

The algorithm, which I named "Quantized Indexing", turned out
pretty amazing. It codes always tighter than the present best
entropy coding algorithm, the Arithmetic Coding (which is only
a particular approximation of QI), yet it codes much faster than
AC due to using only a simple table add (of a machine size word)
for the less frequent symbol and no coding operations for the
most frequent symbol (AC needs coding operations for both types
of symbols, and more expensive operations at that, such as mul, div).

As result, QI runs typically 10-20 times faster than the
fastest full Arithmetic Coder implementations (and always at
least 6 times faster, which occurs in high entropy limit,
while for very sparse data, the low entropy limit, QI runs
well over 200 times faster).

Recently, I posted a preprint about the algorithm to arXiv:

http://arxiv.org/abs/cs.IT/0511057

and also created a web page with additional more detailed
technical reports and the C source code:

http://www.1stworks.com/ref/qi.htm

A nice little surprise came when I emailed about the
preprint to Jorma Rissanen, the inventor of arithmetic coding
himself. He had struggled for several years with the very
same enumerative coding precision problem, inventing in the
process arithmetic coding as a compromise solution (in 1976,
while working for IBM). Although busy with a lecture tour,
he read the paper right away and was quite pleased to see
his old problem solved at last and a bit surprised at how
"very clever" the solution turned out to be.

So, that's the chain of unlikely events triggered by your
original code snippet for enumerating the chess positions.

David A. Scott

2006-01-10, 3:57 am

"nightlight" <nightlight@omegapoint.com> wrote in
news:1136364005.126391.156010@g43g2000cwa.googlegroups.com:

>
> The algorithm, which I named "Quantized Indexing", turned out
> pretty amazing. It codes always tighter than the present best
> entropy coding algorithm, the Arithmetic Coding (which is only
> a particular approximation of QI), yet it codes much faster than
> AC due to using only a simple table add (of a machine size word)
> for the less frequent symbol and no coding operations for the
> most frequent symbol (AC needs coding operations for both types
> of symbols, and more expensive operations at that, such as mul, div).
>
>



I know you claim to have this great code. But others such as Matt
the inventor of PAQ codes at least wrote a simple arithmetic coder
FPAQ0 to show how his methods would compete on files where zero order
arithmetic coding would come into play. Since your model is always
"tigther than the present best enropy coding algorithm" do you have
actually test code to compare with real files and test sets. Or is
the method not yet advanced enough to do real compression on files
yet.

Don't get me wrong Matt is not saying FAPQ0 is the best entropy
coder he knows it isn't that good. He is just showing how it works
for a certain model. The same sort of thing could be dome with your
method. It least that is if your method is comparable at all with
real world entropy coders. And since using the same model as most
one could calculate the true entropy of versus standard test models.

If one does this one can see Matts code does not produce on average
the shortest file. Maybe yours could do better if you ever actually
get real working code. But I find it hard to belive it could compress
as well as some current entropy encoders.

Where Matts code shines is his use of various models and a slick
way to combine them which makes his family of PAQ coders among the
best on the net. The point is if your code is half as good as you
claim then his simple 2 state enropy coder could be replaces by
you faster and tighter 2 state coders wich would bring you name fame.
But I won't hold my breath.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

David A. Scott

2006-01-10, 3:57 am

"nightlight" <nightlight@omegapoint.com> wrote in
news:1136379659.121605.50640@g44g2000cwa.googlegroups.com:

>
> You can download the source code which shows the coding aspects where
> QI is essentially different from the existent entropy coding
> algorithms. All that the source will show you is that QI is more...


I guess that means you don't yet have code where you can compare
it to even simple airhmtic file coders. I don't have faith in your
work from your earlier posts.

A simple No you can't actully test it in any real applications yet
would have been enough. Again from the earlier thread its not obvious
to me you have a full understanding of arithmetic coding methods.


>
> All that is, of course, measurable using the source code provided
> (which also includes very accurate timing functions). The above are not
> religious claims or invitation to believe or debate belief systems, but
> a simple statement of easily verifiable empirical facts. If you need to
> know also how it would do on "real" file, and can't extrapolate from
> how it does on memory buffers filled with arbitrary content, well, you
> are welcome to add such file related code and try it. Now, if you do
> add a file i/o which takes hundreds times longer than the coding, I can
> predict you won't see almost any speed difference.



Very funny.
Its very strange you make a big deal of claiming you compare it
against what you claim is a good entropy coder Moffat. Yet you don't
even test against it on a level playing ground. You think by modifying
Moffat that you are giving an honest test. However if you really wanted
an honest test since you are the new kid on the block. You would think
you could easily convert your code to work on files like Moffat's or are
you afraid to test it on the same playing field Moffat and others have
picked so various methods can be tested against yours.

I for one belive you shifted ground because you fear real aithemetic
coders and people could take existing software without modification and
show you directly that you coder does not lead to shorter compressed output
than already existing coders. I suspect this since most would have tested
the Moffat code on the same playground instead of moding it to one of your
choice where its not easier to compare against any other standard codings.

Look maybe you have something. If your method is any good at all
surely you could easily add the stuff you stripped out of Moffat's
to your code so that you can compare the compression results or is
there some reason you can't. If you can't than it would seem to be of
little use.

See:

http://groups.google.com/group/comp...hread/ffc7f7ca8
4c76378/792884848866dc4c?q=moffat&rnum=4#792884848866dc4c

From below if its true at all and if your code works at all you should
have the courage of your convictions to test it against Moffat and others
where they were designed to run. Doesn't yours work there whats the
problem?

YOUR QUOTE
"Note also that for the test we had stripped the Moffat et al. V3 binary
coder to its bare engine, replaced stream i/o with memory to memory
coding, no allocations were done within the timed loop, model decisions
were taken out (since only order 0 was tested) and all dynamical coding
options (alternatives that didn't matter) were hardwired as static to
avoid time waste in the test. So the arithmetic coder tested was
already quite a bit faster than the out-of-a-box Moffat et al code. We
wanted it to take its best shot, the best it possibly could (since QI
didn't need any such implementation/code level edge, being so much more
efficient at the fundamental, algorithmic level)."

If you wanted to take the best shot Again let me stated that for the
dense. If you wanted to take the best shot. Then test it like others on
files where Moffats was made to work. If you don't one can only wonder
what you are hiding. Especailly since you clain this is so much better
than arithmetic.

I thought about downloading as you suggested and converting it to files
so it really can honestly be compared to Moffat. But I realize from your
post that you would claim I did a poor job of converting so I will not do
it. After all your the one making the cliam its better than Moffat. Your
the one saying you compared it to Moffat. Yet you really only compared it
to a modifed verision of Moffat that you yourself modified.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

David A. Scott

2006-01-10, 3:57 am



NIGHTLIGHT what is wrong with you?

You claim a coder better than the best arithmetic, Yet
your only proof is your word by using Moffat code you
modifed yourself.

Look people here are willing to help those trying to
learn. Yet you never seem to anwser real questions.

Such as why you went to the trouble to modify Moffats
code yourself and then proclaim your method is better
than the best present day arithmetic encoders.

Please stop with the ridiculus post of refereces that
have nothing to do with this unless your trying to pull
the wool over peoples eyes.

Since we don't work for you and don't have to kiss
your ass for a job. If you really want to show its better
and can compress to smaller sizes than Moffats. Then do
an honest test on unmodifed code.

Why is it you can change his code and put him down
with code that is not even completely his?
Yet you seem unable to change your code to work with
the data his is designed for? Surely you have the
ability with your team of people to do that. Is it
that your afraid other current arithmetic coders
can already compress most files better?

Willem

2006-01-10, 3:57 am

nightlight wrote:
) That's not the "only" proof. Analysis of the algorithms,
) as sketched below (and covered in more detail in the the
) preprint & the tech reports) will give you the same answer,
) showing it even more clearly.

This analysis is no different from any other analysis in that
you have to make lots of assumptions. This means that if you use
such an analysis to make real-world predictions, then that depends
on how well your assumptions match the real world.



Because of the massive nature of your posts, I haven't been able to
find an answer to a few questions I have about the QI coder:

- if I read you correctly, it is not an adaptive coder,
so how do you transmit the model information for the QI coder ?

- how would you use a QI coder with an adaptive model ?

- assuming I have a stream of symbols, where at each position in
the stream, the probability distribution of the symbols is different,
then how does QI coder adapt itself to all those different distributions ?


(I have a few more questions, but stick to these for now.)


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
David A. Scott

2006-01-10, 3:57 am

"nightlight" <nightlight@omegapoint.com> wrote in
news:1136572988.625145.191290@z14g2000cwz.googlegroups.com:

>
> Of course, both aspects, the speed and the coding accuracy
> advantage of QI, are self-evident to anyone who understands the
> mathematical description of the two coding algorithms. QI is
> always more accurate since AC is its direct approximation
>


I wish we had someone here that was an expert on both
algorithms since from your previous posts it sure indicates
to me that you are no expert on arithmetic coding. I
admit I know very little about your QI however I now enough
about arithmetic coding to realize that proper optimal bijective
file coding is one of the best methods and it is an optimal
method something you don't seem to understand from your various
posts. Even if QI could be used to make some sort of optimal
file compressor it could never compress all files as well
as an optimal arithmetic. Since you can't grasp that simple
fact you can't be an expert in arithmetic so I doubt what you
think is self evident has any relationship to reality.


You seem to think your stuff is better than arithmetic when
used as an entropy encoder. Yet it appears this so called neat
method of yours has yet to be even tested in any simple code
where one is using an entropy coder. Why is that?

Even Matt did FPAQ0 can't you or your team do something
similar with QI or is it to complex of a task?


I know you could care less what I think. But many people
here would like to see real results. We get plenty of people
that can quote and talk about how good there stuff is and people
here realize talk is cheap. They want to see REAL RESULTS can
you do that or do you care what the average person here thinks
of your method.

Like mentioned above it appears you can modify Moffat which
is not the best but you picked what you wanted to pick and then
called it the best. I would continue asking new questions but
for some reason you seem not to anwser even the most simple
of questions.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

David A. Scott

2006-01-10, 3:57 am

"Sachin Garg" <schngrg@gmail.com> wrote in
news:1136588076.767621.250110@g14g2000cwa.googlegroups.com:

>
> Willem wrote:
>
> I agree that someone having enough time for reading his massive posts
> could rather read the paper and find answers himself.
>


If the paper is that small pdf file at his site it does not anwser
the simple questions being asked. So I don't see this as an anwser.
And if its some stuff where one would have to sign some nondiscloser
agreement I think that to would be a waste to time. Since the questions
asked of him are not that hard.

> I will try to answer some questions on the basis of my understanding
> of what he has posted here (I have not read his papers yet).
>
>
> He says that QI is based on enumerative coding, where models are
> conceptually different from PPM models which we are more familiar
> with.
>
>
> So it will probably mean that we will need to study EC from ground up
> and then see how QI fits in (atleast I will have to as I am not
> familiar with EC) and how it relates to PPM models.
>
> Or if someone can summarize the difference here for the people like
> me, please do so.
>


He seems to belive arithemtic compression is a poor approximation
of EC compression. Here is summary take two symbol one and zero
suppose you have 2 ones and 2 zeros then there are 4!/(2!*2!) or 6
combinations. his method would assign the 6 possible numbers to this
problem so you could say its exact. At least if numbers small. But
the paper really doesn't say how to do compression. You read various
other papers that show how to assign a number to a combination.

All he seem to do is get a number for a combination.
He has in his paper the example of 00101001 in which he calculates
the value as 2+6+35 = 43 this was done in long post message
59 at

http://groups.google.com/group/comp...hread/ffc7f7ca8
4c76378/30566aec6aa7d363?q=nightlight&rnum=1#30566aec6aa7d363

Big deal thats not compression. He state there

"The answer is on page 6, where it shows the string index
I(S8)=2+6+35=43 and how to calculate it (eq. 17). The actual size in
bits of the index is L=log(56)=5.807... bits since the valid values of
the index are 0..55 (there are 56 paths from A to B). The fractional
bits of L don't normally go to waste since they are coded in the mixed
radix with other items sent, usually with the fractional bits of
indices for other blocks (cf. p.9 [N4]). The encoder also sends the
count of 1's per block, which is 3 here and the length of the entire
string which is 8. The latter two items get coded in variety of ways in
different variants and different parts of the coder/s and experimental
prototypes (cf. p.9 [N6])."

This means he never compares it to an arithmetic compressor. He only
makes claims thats it better. I think if he could actually compress the
one example he gives in paper then he might be more beliveable. But he
wants to say its exact. But he can't be pinned down on any application
that compares it to an arithmetic. So not having any simple examples
done in a complete way says him being shown arithmetic ways that beat
his method. He leaves it to you to do the coding. At which point he
could claim you didn't do it the right way. Actually its a clever way
to prevent it from being compared to any real world entropy compressor.

So the only thing I got out of the paper besides the fact he never
completes anything is to say for a string made of only ones and zeros
the compressed encoded result is 3 items that are easy to combine
just don't ask him how to combine them since he seems not likely to do so.
1) the length of entire string
2) the number of ones
3) the index value.

He doesn't risk actually combining these in an output string
its possible he does not want to risk being laughed at or he
fears one could show how a simple bijective string compressor gets
better results. If you can think of any other reason no examples
like this are done to the end please Schin tell us what you think
it is.




>
> He said that QI is the "natural" choice for BWT post-processing. This
> probably means that QI itself cant be used for higher order adaptive
> coding but by using BWT, the higher-order-adaptive-modelling problem
> can be reduced into something which QI can handle.
>
>
> I dont know the answer to this one.
>


Maybe it depends on the defination of "natural"

> I hope this helps (my apologies to nightlight if I am mistaken
> somewhere, feel free to correct me).
>


I suspect he will paste in lots of links to various papers that
seems to be his style. So don't hold your breath for useful anwsers.
They may ot may not be related to your questions.

> As for nightlights's comments on QI's speed, I am afraid that as the
> modelling scheme for QI is different from modelling scheme for
> ArithCoding, we will need to compare speed of "QI+its modelling code"
> with "AC+its modelling code". Where both models should be of same
> order, or chosen to give same compression ratio. (My doubt here is
> "what if QI just *shifts* computation burden to modelling code instead
> of reducing it".)
>
> Sachin Garg [India]
> http://www.sachingarg.com
>




David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

David A. Scott

2006-01-10, 3:57 am

"nightlight" <nightlight@omegapoint.com> wrote in
news:1136572988.625145.191290@z14g2000cwz.googlegroups.com:


.....

> infinite precision, all intervals would fit exactly next to each
> other, without gaps. Since allowing intervals to overlap would
> result in non-decodable output (by Kraft inequality), any loss in
> precision for specifying interval boundaries must leave unused
> gaps in the output code space.
>
> The basic arithmetic difference in coding is the extra loss of
> precision for AC. A rough analogy would be as if two of us are
> balancing some expenses and I use exact integer number of cents
> from the receipts, while you take integers from the receipts
> and divide them by some large total, then, since you will
> generally get an infinite decimal fraction, you terminate it to
> some number of places. Hence you're making an error even before
> the first add, while my integer scheme won't have any error at
> all (until the sum reaches certain magnitude).
>
> The QI.exe included with the source has a command "cbr" which
> lists all such code space gaps for any n-symbol input, as well as
> the cumulative redundancy in bits resulting from the gaps.
>


.....


You don't seem to grasp the obvious. There are arithmetic coders
that have zero gaps. You don't know what you are talking about.
One such coder is arb255.exe you seem to repeat useless stuff over
and over with out actually thinking. You can't anwser simple
questions or provide simple anwsers. Who are you kidding?



David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

Willem

2006-01-10, 3:57 am

nightlight wrote:
) This truncation of the infinite fractions even for a small number
) of symbols (which is absent in QI's integer format of addends),
) is a loss of precision which leads to AC losing parts of its
) coding interval in each step. If one were to use fractions of
) infinite precision, all intervals would fit exactly next to each
) other, without gaps. Since allowing intervals to overlap would
) result in non-decodable output (by Kraft inequality), any loss in
) precision for specifying interval boundaries must leave unused
) gaps in the output code space.

This paragraph clearly demonstrates that you do not understand well
enough how Arith Encoding works. Any decent AC does *not* lose parts
of its coding interval each step. Try to get that through your head.
It *is* possible (hell, it's quite easy) to get the intervals to line
up exactly without infinite precision.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
David A. Scott

2006-01-10, 3:57 am

Willem <willem@stack.nl> wrote in
news:slrndrvkk7.188o.willem@toad.stack.nl:

> nightlight wrote:
> ) This truncation of the infinite fractions even for a small number
> ) of symbols (which is absent in QI's integer format of addends),
> ) is a loss of precision which leads to AC losing parts of its
> ) coding interval in each step. If one were to use fractions of
> ) infinite precision, all intervals would fit exactly next to each
> ) other, without gaps. Since allowing intervals to overlap would
> ) result in non-decodable output (by Kraft inequality), any loss in
> ) precision for specifying interval boundaries must leave unused
> ) gaps in the output code space.
>
> This paragraph clearly demonstrates that you do not understand well
> enough how Arith Encoding works. Any decent AC does *not* lose parts
> of its coding interval each step. Try to get that through your head.
> It *is* possible (hell, it's quite easy) to get the intervals to line
> up exactly without infinite precision.
>
>
> SaSW, Willem


It his inability to grasp this simple well understood fact that
makes one wonder if he understands the subject field at all. Of course
you can expect his usually reply with lines and lines of quoted text
having nothing to do with this fact which he either will not or can
not seem to grasp. He reminds me of the current crop of students who
are never corrected when thay make common errors so they just go on
making ever larger errors while never having learned how to learn
from mistakes.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

nightlight

2006-01-10, 3:57 am

> This analysis is no different from any other analysis
> in that you have to make lots of assumptions. This
> means that if you use such an analysis to make
> real-world predictions, then that depends
> on how well your assumptions match the real world.


I see now where you're misreading my redundancy
statements in this and in the earlier thread. The
redundancy I was deriving in [T3] at end of page 8,
and talking about here, is simply the number of
excess bits that the finite precision arithmetic
used by QI will add to the unlimited precision
enumerative code. The paragraph on page 8 after all,
starts with "To obtain QI redundancy d(g) _due to
SW rounding_ in (21)..." The upper bound for
_this_ redundancy d(g) obtained there is:

d(g) < log(e)/2^(g-1) .... (1)

No model or source assumptions are needed for (1), and
none is reflected in (1). The only parameter in (1) is
the assumed arithmetic precision of g bits (the SW
mantissa length). The eq. (1) is simply an upper bound
on any additional bits relative to the exact EC, produced
by QI due to its use of SW rounding up in eq. (21). It has
nothing to do with how well EC or QI will code in any
given setup (model, source, parameter coding method, etc).
If the exact EC model is perfect in a given setup, than
(1) shows what is the maximum that QI can fall short of
that "perfect" output. If EC is coding using an imperfect
model, resulting in some redundancy of R bits per symbol
relative to the best model, (1) shows the maximum that
QI can add to R. But (1) doesn't care or predict what
R is. The two types of redundancy are completely unrelated.

The question of how well the exact EC and AC can code in
different coding setups, is an entirely different and a
much larger topic, well covered in the literature, starting
with Davisson's [28] and many other papers since.

Krichevsky-Trofimov's paper [33] provides great many
bounds for variety of coding setups. Some related later
results are in [40],[41],[41a]-[41d]. The basic result
is that even for the unlimited precision coders (and
coding under 'everything else set the same') the exact
EC has a slightly lower redundancy than the exact AC (by
approximately 1 bit for the entire input, for max & avg).
This is the same difference as between the Huffman and
the Shannon-Fano prefix codes. Even the origin of the
difference is the same: the bottom-up addend construction,
as done by QI and Huffman, is tighter than the top down
addend construction, as done by AC & Shannon-Fano.

Now to your main questions, starting with the last one,
which is the most specific:

> assuming I have a stream of symbols, where at each
> position in the stream, the probability distribution
> of the symbols is different, then how does QI coder
> adapt itself to all those different distributions ?


This is the scenario of AC modeling engine feeding QI,
which was sketched in note N7 on p. 9, [T3]. Two ways
are described there:

a) --- QI coding AC style

QI can code the AC style by performing "Lattice
Jumps". It is simplest to see how this is done by looking
at the decoder, arriving at some point B=(x,y) (see
Fig. 1, p. 4, [T3]). The path count at B is N(x,y)=56.
The index at point B can have values 0..N(x,y)-1, hence
the length of the index at B is log(N(x,y))=log(56) bits.
If bit=1 gets decoded (as shown by path on Fig 1), the
decoder moves up, to point BA=(x,y-1), which has the
path count 21, hence the index has length log(21) bits.
Hence, upon decoding bit=1 at B, the index length has
dropped by log(56)-log(21)=log(8/3) bits, which is
precisely the ideal code length log(1/p) for bit=1
at B, where p=3/8=probability of 1 at B. If bit=0
gets decoded at B, decoder moves to point BL=(x-1,y)
where path count is N(x-1,y)=35, hence the index
length is log(35). In this case the index length
drops by log(56)-log(35)=log(8/5) which is exactly
same as the ideal code length log(1/q), where q=5/8
is probability of 0 at B. It is easy to see from
multiplicative recurrences for binomial coefficients
(eq's (1) & (2) from the previous post here) that
this pattern always holds - after every decode step,
the index length drops by exactly log(1/P), where P
is the probability of the decoded symbol. Analogous
relation holds for each encode step, where the index
length increases by the ideal code length of the
encoded symbol at that point. Note also that due to
integer arithmetic, this is not an approximate
optimality (such as one would get using truncated
infinite fractions, as AC does). With QI/EC, this
coding optimality at every point is built into
the table entries. { You can check the quantization
errors using e.g. QI.exe cbr n36, which shows no
quantization errors for n=36 (or below), and with
n=37, the 1st error for k=16, of just +1 in the
SW mantissa which adds 4e-10 bits to the index.}

With QI, for a general point B=(x,y), the quantized
path count L(x,y) (computed via (21)) is an SW integer
with a g-bit mantissa w(x,y) and exponent e(x,y).
The ideal code lengths and ratios for the steps
from B described above still hold, but only within
the limits d(g). In particular, L(x,y-1)/L(x,y) is
approx. =p=y/(x+y) and L(x-1,y)/L(x,y)=q=x/(x+y).

The index at B will have for the leading g bits
at the bit offset e(x,y) some g-bit integer Iw(x,y)
which is in the interval [0,w(x,y)-1] (this is a
simple consequence of the index at any point
ranging from 0 to path count-1 and the fact that
quantized path count L(x,y) has trailing zeros
after the leading g bits given by w(x,y), hence
L(x,y)-1 will decrement w(x,y)). We can thus view
for any point B(x,y) and index I(x,y), the Iw(x,y)
as a digit in the radix w(x,y).

Suppose now, decoder at B=(x,y) gets from the modeler
some probabilities p' and q' different from p,q. To
continue decoding, decoder makes a jump to another
lattice point B'=(x',y') where x'/(x'+y')=p' and
y'/(x'+y')=q'. One can use Farey fractions (see
[F]) to obtain the optimum such point for any
given binomial table size. Alternatively, one
can simply jump to another point on the same
front i.e. one would keep n fixed, x+y=n=x'+y'
and select point B' using x'=n*p'. The path
count at B' is L(x',y') with mantissa w(x',y')
and exponent e(x',y'), which are different from
w(x,y) and e(x,y). The exponent is easy to adjust:
you simply change the presumed position of the
least significant bit of the index I(x',y') (this
is the origin, A on Fig 1., toward which decoder
is heading, but hasn't reached yet since there
are more symbols to decode; in the QI source
code in file EncDec.c this presumed origin of the
index is given as argument "sb" to function qiDec()).

The main work is with the difference in the path
count mantissas w(x,y) and w(x',y') at B and B'.
Namely at B' the leading g bits of index Iw(x',y')
have to be a digit in the radix w'=w(x',y'). But we
only have a g-bit digit left over from B which is
in the radix w=w(x,y). So, the problem here is
that of radix conversion -- we have a digit Iw
in radix w and we need a digit Iw' in radix w'.
There are several ways to do this. A conceptually
simple one is as follows: decoder extracts the
digit Iw and encodes it as digit of some mixed radix
output integer M, which serves as an accumulator or
recycler for all such 'orphaned' Iw digits. The bits
of M (which are an arbitrary bit pattern, being a
binary form of a mixed radix integer) can simply be
reused, e.g. by growing M at the unprocessed end
of the compressed input (or just having M as separate
component). At this stage the encoder would have
done the opposite - it would have "decoded" (see
file Radix.c, function dec_radix()) the far end
of the compressed data (which was an arbitrary
binary pattern) into a digit Iw in radix w and
concatenated it to the leading end of the index.
There are other similar ways to perform this
radix conversion, all of them using amount of
processing per symbol very similar to the
conventional AC algorithm. They all also have
to perform explicit coding/decoding operations
(which include mul/div) for both, the most and
the least probable symbols, just as AC does.

The summary of this is that if you want the AC
modeling plus AC coding style, you get the AC
speed and the AC accuracy. The AC scheme, with its
'single next symbol probability' bottleneck interface
between the modeler & the coder (where the modeler
micro-manages the coder, symbol by symbol, and where
the whole coder+ modeler processing and interaction
is traversed from top to bottom on every symbol)
is simply intrinsically a poor division of labor
to allow for any high performance coding.

It is analogous to organizing car manufacturing,
and requiring that the next car can be started
only after the current car is complete and out
the door. That's a kind of conceptual constraint
imposed by the AC modeling "paradigm" as its
so-called "online" coding requirement. This
online" is taken to mean some kind of analog,
memoryless, CPU-less, Morse telegraph device.

That has nothing to do with the actual online
as it is done, or any actual requirements or
inherent design constraints. One normally has
a fairly large buffer space and processor which
can access any of it, running programs of high
complexity. Internet would grind to a halt if
its protocols interpreted "online" as a constraint
to have to send or receive a single bit (or a
single symbol) at a time. Even the old style
point-to-point modems had several KB buffers
to accumulate, batch and compress the data.
And similarly for disk sectors & clusters.

The point of the above critique of the present
AC "paradigm" is that it is simply a gratuitous,
historically accidental conceptual bottleneck and
an intrinsic performance drain. Any algorithm that
follows its prescribed division of labor will bog
down. Once you snap out of its "online" spell,
many better possibilities open up. For example,
even retaining the AC modeling engine, with its
"probability of the next single symbol" bottleneck
parametrization of all the information about the
sequence being encoded, but just allowing coder to
ignore the imagined "online" constraint, one can
get much better performance with QI as follows:

b) --- QI Modeling AC style

QI breaks the probabilities into classes, so that
each class includes an interval of probabilities of
size 1/sqrt(n), where n is the size of data to be
encoded. Since coder doesn't assume any more the Morse
telegraph kind of "online", it doesn't assume n is 1
but some much larger number. The modeler is still left
to work under the old "online" spell and imagine that
it has to convert, by hook or crook, all it knows
or that it could know about the input sequence into
the probabilities for the next single symbol p(c).

Consider now a binary sequence of n symbols, for which
the modeler produces, symbol by symbol probabilities
p of bit=1, with p in some interval D=[a,b), of size
d=b-a. We divide D into s=sqrt(n) equal subintervals
of lengths d/s. Each input symbol is assigned to
one of s enumerative classes (thus enumerated by a
separate index) based on the subinterval in which
the modeler's p at that point falls in. Hence, we're
using quantized probabilities to classify the symbols
as "equiprobable". The excess output E in bits per
symbol due to this 'p quantization' is about
(cf. [41c], p. 8):

E = [dp^2/p+dq^2/q)]*log(e)/2 .... (2)

where dp & dq are quantization errors of p and q. Since
dp & dq are =< 1/s, then E =< log(e)/2npq = O(1/n). Note
that since adaptive AC probability estimates have also
sampling error dp, dq of order 1/sqrt(n), this redundancy
is of the similar size as that of the adaptive AC. One can
further optimize this method (to reduce its worst case)
by selecting non-uniform partition of interval D, so that
the subintervals around smaller probabilities are shorter.

In practical situations, the AC modeler would be producing
its predictions p(c) based on statistics from the processed
part of the input, hence its probabilities would already
have a built in sampling error interval (which decreases
as 1/sqrt(n)), which can be used by the QI coder as the
partition criteria for the enumerative classes (instead of
an ad hoc partition described above). Various existent
methods for growing and splitting the contexts based on
such past statistics, such as CTW or Rissanen's Context,
would transfer here as methods for generating enumerative
classes adaptively.

For the multi-alphabet case one would perform the decomposition
described [T1] pp. 31-38 with the only difference that instead
of combining the symbol counts k(c) based on symbol's binary
code c, one would combine the modeler's probabilities p(c).

A special case of interest for this method are the finite
order Markov sources. Here, for order m, the probabilities
of the next symbol are defined by the m previous symbols.
For smaller m, one could simply bypass the computation
of probabilities (since QI doesn't need them) and simply
assign the enumerative class for the next input symbol
directly: using m previous symbols as the class tag
(hence there would be 2^m classes in binary case).

In this case we can notice another advantage of QI/EC
coder for modeling over the AC: to encode a symbol QI
needs to know only whether the symbol has the same or
different probabilities as some other symbols, but unlike
AC, QI doesn't also need to know what values these
probabilities have. Hence, QI places much lower demand
on the modeling engine, since the modeler here can simply
pass on to QI the context ID (the last m symbols) and
QI will code the symbol into the index for that ID,
whatever its probability may be.

In conclusion for this method (b), QI can use AC modeling
engine, with its full speed advantage over AC, and with
the redundancy being same as that of an adaptive AC in
the same setup.

> how would you use a QI coder with an adaptive model ?


Sections (a) and (b) above have couple answers.

> if I read you correctly, it is not an adaptive coder,
> so how do you transmit the model information for the
> QI coder ?


It is not "adaptive AC", but as illustrated in (a),(b)
above, it can function that way. The native QI modeling
is descriptive, in the sense of Rissanen's MDL. So the
QI model information is a much wider type of information
than just a list of probabilities (although it can be
that, too).

Consider an order-0 QI coding. The modeling analogous to
the order-0 adaptive AC, except more resilient against
the "surprises", becomes here the selection of the
segmentation of the input into contiguous sections,
based on measured symbol frequencies. Its resilience
to surprises is illustrated in the row "Vary" (cf. table
on page 10 in [T3]). The adaptive order-0 modeler
for QI has entire input sequence available and it
does not have to rely on a possibly false assumption
that the symbol frequencies in the initial parts of
the sequence are predictive of the frequencies in the
later parts, or gamble on which way might they
be predictive. While AC can code this way, too,
all it would achieve with that would be to advance
into a literal role of a less accurate and a lot
slower imitation of QI.

QI order-0 adaptive modeler identifies contiguous
quasi-stationary sections of the input sequence and
uses them as enumerative classes. There are many ways
to do such segmentation and even more ways to encode
it, along with the corresponding section counts, into
the output. Some of these methods, especially the
encoding aspect, were developed already for conventional
EC, such as described in [11]-[15], [23]. I have also
developed several for QI (some of which were touched
upon in [T2], where the general QI/EC modeling pattern
is presented, pp. 26-35).

Due to a lack practical EC coder and the exaggerated
dominance of the AC modeling paradigm (hypertrophied
to the point of pathology by the absence of practical
competition), this entire field of EC modeling is
highly under-explored. With the precision & performance
problems finally solved by QI, an algorithmic gold mine
has opened, where just about anything you do, and there
is more to do than an eye can see, is a new algorithm,
maybe a great new discovery to be taught to kids ever after.



-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

28. L.D. Davisson "Universal noiseless coding"
IEEE Trans. Inform. Theory IT-19 (6), 783-795, 1973
http://cg.ensmp.fr/~vert/proj/bibli...73Universal.pdf

33. R. Krichevsky, V. Trofimov
"The performance of universal encoding"
IEEE Trans. Inform. Theory IT-27 (2), 199-207, 1981
http://cg.ensmp.fr/~vert/proj/bibli...performance.pdf

41. M.Drmota, H-K. Hwang, W. Szpankowski
"Precise Average Redundancy of an Idealized
Arithmetic Coding" DCC 2002, 222-231.
http://citeseer.ist.psu.edu/drmota02precise.html

34. J.G. Cleary, I.H. Witten
"A Comparison of Enumerative and Adaptive Codes"
IEEE Trans. Inform. Theory IT-30 (2), 306-315, 1984
http://www.1stworks.com/ref/Cleary84Enum.pdf

F. Farey Fractions:
http://www.cut-the-knot.org/ctk/PickToFarey.shtml

41c. P.G. Howard, J.S. Vitter
"Practical Implementations of Arithmetic Coding"
Tech. Rep. No. 92-18, CS, Brown University, 1992
http://www.1stworks.com/ref/Howard92PractAri.pdf
Willem

2006-01-10, 3:57 am

nightlight wrote:
) > if I read you correctly, it is not an adaptive coder,
) > so how do you transmit the model information for the
) > QI coder ?
)
) It is not "adaptive AC", but as illustrated in (a),(b)
) above, it can function that way. The native QI modeling
) is descriptive, in the sense of Rissanen's MDL. So the
) QI model information is a much wider type of information
) than just a list of probabilities (although it can be
) that, too).
)
) <snip theoretical discussion>

I have read through your entire description, and I haven't found even
a single hint to the practical question of what exactly the QI coder
needs as modeling information to do its job.

Can I assume that 'enumerative coder' means that you need the exact
symbol counts for each of the subclasses ? And if not, then what do
you need ? Please try to explain this as simply as possible, within
a single paragraph.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
David A. Scott

2006-01-10, 3:57 am

nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in
news:SY6dnUrMLIWIZyLenZ2dnUVZ_s6dnZ2d@rc
n.net:

>
> I see now where you're misreading my redundancy
> statements in this and in the earlier thread.


Actually I think he is reading them correctly he
has an understanding of arithmetic coding its not clear
you do.


....


> Now to your main questions, starting with the last one,
> which is the most specific:
>
>
> This is the scenario of AC modeling engine feeding QI,
> which was sketched in note N7 on p. 9, [T3]. Two ways
> are described there:
>
> a) --- QI coding AC style


.....

>
> The summary of this is that if you want the AC
> modeling plus AC coding style, you get the AC
> speed and the AC accuracy. The AC scheme, with its
> 'single next symbol probability' bottleneck interface
> between the modeler & the coder (where the modeler
> micro-manages the coder, symbol by symbol, and where
> the whole coder+ modeler processing and interaction
> is traversed from top to bottom on every symbol)
> is simply intrinsically a poor division of labor
> to allow for any high performance coding.
>


I gues that means it doesn't work so hot, Took you
a long time to state it. So you are admiting the
arithmetic may beat the QI when you attempt to shoe
horn in your QI method where a arithmetic might be
a natural fit. No wonder you changed Moffat to what
ever porblem your doing instead of the other way around.


....

>
> b) --- QI Modeling AC style
>


....


> QI breaks the probabilities into classes, so that
> each class includes an interval of probabilities of
> size 1/sqrt(n), where n is the size of data to be
> encoded. Since coder doesn't assume any more the Morse
> telegraph kind of "online", it doesn't assume n is 1
> but some much larger number. The modeler is still left
> to work under the old "online" spell and imagine that
> it has to convert, by hook or crook, all it knows
> or that it could know about the input sequence into
> the probabilities for the next single symbol p(c).
>
> Consider now a binary sequence of n symbols, for which
> the modeler produces, symbol by symbol probabilities
> p of bit=1, with p in some interval D=[a,b), of size
> d=b-a. We divide D into s=sqrt(n) equal subintervals
> of lengths d/s. Each input symbol is assigned to
> one of s enumerative classes (thus enumerated by a
> separate index) based on the subinterval in which
> the modeler's p at that point falls in. Hence, we're
> using quantized probabilities to classify the symbols
> as "equiprobable". The excess output E in bits per
> symbol due to this 'p quantization' is about
> (cf. [41c], p. 8):


Well it likes like your having trouble since you need to
break it up into smaller chunks. Thats to bad since a real
entropy compressor would take a file of millions of symbols
and still compress to roughly the same length no matter how
the symbols arranged. Form what you wrote it seems like your
only doing local considerations.

....
....

>
>
> Sections (a) and (b) above have couple answers.
>
>
> It is not "adaptive AC", but as illustrated in (a),(b)
> above, it can function that way. The native QI modeling
> is descriptive, in the sense of Rissanen's MDL. So the
> QI model information is a much wider type of information
> than just a list of probabilities (although it can be
> that, too).
>
> Consider an order-0 QI coding. The modeling analogous to
> the order-0 adaptive AC, except more resilient against
> the "surprises", becomes here the selection of the
> segmentation of the input into contiguous sections,
> based on measured symbol frequencies. Its resilience
> to surprises is illustrated in the row "Vary" (cf. table
> on page 10 in [T3]). The adaptive order-0 modeler
> for QI has entire input sequence available and it
> does not have to rely on a possibly false assumption
> that the symbol frequencies in the initial parts of
> the sequence are predictive of the frequencies in the
> later parts, or gamble on which way might they
> be predictive. While AC can code this way, too,
> all it would achieve with that would be to advance
> into a literal role of a less accurate and a lot
> slower imitation of QI.
>


Again this show your lack of understanding just
what an order-0 adaptive AC coder does. If a file is made up
of 2 symbol types. And one wanted a to compress it by
the above method it would get the same length no matter
how things ordered. You could have all the zeros then the ones
or any combination you would get roughly the same length file.
In fact even in you own exaples of small strings you use the
fact to get a combination number for each arrangemetn based
entirely on the length of string and number of ones. That is
what the adaptive order-o arithemtic coder would do. If you
have to segment the file then local effects take over and you
would not get the same compression for different combinations
of the ones and zeros. The order if the symbols makes no
difference.

....


Here is a question I doubt you will anwser since you really
don't seem to answer anything. But I leave so others can see
just how your code can't solve simple problems.

I go back yo your on example with a string where you use QI
in lacttice jumps to fet an index. This truely is about as
easy as it gets. Something even you could but your fingers around.
You claim you need only three things.
a) this combination index
b) the number of ones in the string
c) the length of string

How would you combine this infromation into a string so decompresstion
could be done. This is something even you should be able to do.
Yet you will not. You may give several pages of references but
you will not give one complete example.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

nightlight

2006-01-10, 3:57 am

> Any decent AC does *not* lose parts of its coding
> interval each step. Try to get that through your
> head. It *is* possible (hell, it's quite easy) to
> get the intervals to line up exactly without
> infinite precision.


It seems you and few others here are either restricting
the noun "gap" to a subset of cases where gaps do occur,
or you are making certain implicit assumptions about
the source and the sequences it produces, without being
aware of making them.

In order to clarify which is the case here, why don't
you demonstrate your easy and gapless finite precision
binary AC coding for the finite input sequences produced
by a binary source S3, which satisfy the following
conditions:

a) Each sequence from S3 is exactly 3 bits long,

b) each sequence has exactly two bits=0 and one bit=1 and

c) each sequence is equally likely as any other.

How does your easy gapless AC coding for all possible
input sequences from S3 look like? Since the inputs
are pretty short, you should be able to show it all
using an AC working in 4 bit precision. But if you
need more, well use more (as long as the bit strings
fit on a 72 char line).

NOTE 1: The S3 output needs to be coded using a binary
order-0 AC in incrementing or decrementing or static mode,
which codes bit by bit as if the sequences have arbitrary
lengths. Namely, the short sequences are given above
only to allow you to show explicitly the coding, not
to allow switching over to let you construct the small
Huffman tree for the entire input (or some equivalent
ad hoc codes).

For example, the exactly same coding algorithm should be
demonstrable (e.g. as a little sample exe coder that
will work on a regular PC with, say, 1G of RAM) with
inputs of, say, exactly 30 million bits long with exactly
10 million bits=1 set. In this case you couldn't code it
via the multi-alphabet AC which treats the entire input
as a single character of a gigantic alphabet. Hence,
you can't use any such methods for S3 either. So,
just show it for a plain bitwise binary AC.

NOTE 2: You can't feed your AC input sequences with
0, 2 or 3 ones in order to fill the gaps and declare
them as "used". The S3 specified does not produce
any such sequences.


Willem

2006-01-10, 3:57 am

nightlight wrote:
) It seems you and few others here are either restricting
) the noun "gap" to a subset of cases where gaps do occur,
) or you are making certain implicit assumptions about
) the source and the sequences it produces, without being
) aware of making them.

Neither. See below.

) In order to clarify which is the case here, why don't
) you demonstrate your easy and gapless finite precision
) binary AC coding for the finite input sequences produced
) by a binary source S3, which satisfy the following
) conditions:
)
) a) Each sequence from S3 is exactly 3 bits long,
)
) b) each sequence has exactly two bits=0 and one bit=1 and
)
) c) each sequence is equally likely as any other.
)
) How does your easy gapless AC coding for all possible
) input sequences from S3 look like? Since the inputs
) are pretty short, you should be able to show it all
) using an AC working in 4 bit precision. But if you
) need more, well use more (as long as the bit strings
) fit on a 72 char line).
)
) NOTE 1: The S3 output needs to be coded using a binary
) order-0 AC in incrementing or decrementing or static mode,
) which codes bit by bit as if the sequences have arbitrary
) lengths. Namely, the short sequences are given above
) only to allow you to show explicitly the coding, not
) to allow switching over to let you construct the small
) Huffman tree for the entire input (or some equivalent
) ad hoc codes).
)
) For example, the exactly same coding algorithm should be
) demonstrable (e.g. as a little sample exe coder that
) will work on a regular PC with, say, 1G of RAM) with
) inputs of, say, exactly 30 million bits long with exactly
) 10 million bits=1 set. In this case you couldn't code it
) via the multi-alphabet AC which treats the entire input
) as a single character of a gigantic alphabet. Hence,
) you can't use any such methods for S3 either. So,
) just show it for a plain bitwise binary AC.
)
) NOTE 2: You can't feed your AC input sequences with
) 0, 2 or 3 ones in order to fill the gaps and declare
) them as "used". The S3 specified does not produce
) any such sequences.

From this note, I assume the model is allowed to 'know'
that the sequence will have a single '1' and two '0' bits,
and update its probabilities accordingly.


I'll sketch out an AC encoding, that should be enough:

The starting range is [0..256)

The steps for encoding the sequence '100' are:

Step 1: Encode the symbol '1'.
The probabilities are 1/3 for a '1' and 2/3 for a '0'.
Therefore, the range is subdivided as follows:
[0..170) for a '0', and [170..255) for a '1'.
Thus, the range is reduced to [170.255)

Step 2: Encode the symbol '0'.
The probabilities are 0 for a '1' and 1 for a '0'.
Therefore, the range is subdivided as follows:
[170..255) for a '0', and [255..255) for a '1'.
Thus, the range is reduced to [170.255)

Step 3: Encode the symbol '0'.
The probabilities are 0 for a '1' and 1 for a '0'.
Therefore, the range is subdivided as follows:
[170..255) for a '0', and [255..255) for a '1'.
Thus, the range is reduced to [170.255)



The steps for encoding the sequence '010' are:

Step 1: Encode the symbol '0'.
The probabilities are 1/3 for a '1' and 2/3 for a '0'.
Therefore, the range is subdivided as follows:
[0..170) for a '0', and [170..255) for a '1'.
Thus, the range is reduced to [0.170)

Step 2: Encode the symbol '1'.
The probabilities are 1/2 for a '1' and 1/2 for a '0'.
Therefore, the range is subdivided as follows:
[0..85) for a '0', and [85..170) for a '1'.
Thus, the range is reduced to [85.170)

Step 3: Encode the symbol '0'.
The probabilities are 0 for a '1' and 1 for a '0'.
Therefore, the range is subdivided as follows:
[85..170) for a '0', and [170..170) for a '1'.
Thus, the range is reduced to [85..170)


The steps for encoding the sequence '010' are:

Step 1: Encode the symbol '0'.
The probabilities are 1/3 for a '1' and 2/3 for a '0'.
Therefore, the range is subdivided as follows:
[0..170) for a '0', and [170..255) for a '1'.
Thus, the range is reduced to [0.170)

Step 2: Encode the symbol '0'.
The probabilities are 1/2 for a '1' and 1/2 for a '0'.
Therefore, the range is subdivided as follows:
[0..85) for a '0', and [85..170) for a '1'.
Thus, the range is reduced to [0.85)

Step 3: Encode the symbol '1'.
The probabilities are 1 for a '1' and 0 for a '0'.
Therefore, the range is subdivided as follows:
[0..0) for a '0', and [0..85) for a '1'.
Thus, the range is reduced to [0..85)


As you can see, the three possible sequences each lead to a range of
approximately the same size, and the ranges of all possible sequences,
when put together, form the starting range without any gaps.

This scheme easily scales up to sequences with millions of bits.


There is a coding gap caused by having to terminate the encoder, but that
is a fixed cost of two or three bits, not a loss at each step as you have
claimed. Furthermore, this termination cost can be avoided as well, with
careful handling.

I hope to have cleared up your misgivings about Arith Coding with this.



SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Phil Carmody

2006-01-10, 3:57 am

Willem <willem@stack.nl> writes:
> ) a) Each sequence from S3 is exactly 3 bits long,


> There is a coding gap caused by having to terminate the encoder,


Not in this particular case.

Phil
--
What is it: is man only a blunder of God, or God only a blunder of man?
-- Friedrich Nietzsche (1844-1900), The Twilight of the Gods
David A. Scott

2006-01-10, 3:57 am

nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in
news:HaudnSFNItBFil3eRVn-rg@rcn.net:

>
> It seems you and few others here are either restricting
> the noun "gap" to a subset of cases where gaps do occur,
> or you are making certain implicit assumptions about
> the source and the sequences it produces, without being
> aware of making them.
>
> In order to clarify which is the case here, why don't
> you demonstrate your easy and gapless finite precision
> binary AC coding for the finite input sequences produced
> by a binary source S3, which satisfy the following
> conditions:
>
> a) Each sequence from S3 is exactly 3 bits long,
>
> b) each sequence has exactly two bits=0 and one bit=1 and
>
> c) each sequence is equally likely as any other.
>
> How does your easy gapless AC coding for all possible
> input sequences from S3 look like? Since the inputs
> are pretty short, you should be able to show it all
> using an AC working in 4 bit precision. But if you
> need more, well use more (as long as the bit strings
> fit on a 72 char line).
>
> NOTE 1: The S3 output needs to be coded using a binary
> order-0 AC in incrementing or decrementing or static mode,
> which codes bit by bit as if the sequences have arbitrary
> lengths. Namely, the short sequences are given above
> only to allow you to show explicitly the coding, not
> to allow switching over to let you construct the small
> Huffman tree for the entire input (or some equivalent
> ad hoc codes).
>
> For example, the exactly same coding algorithm should be
> demonstrable (e.g. as a little sample exe coder that
> will work on a regular PC with, say, 1G of RAM) with
> inputs of, say, exactly 30 million bits long with exactly
> 10 million bits=1 set. In this case you couldn't code it
> via the multi-alphabet AC which treats the entire input
> as a single character of a gigantic alphabet. Hence,
> you can't use any such methods for S3 either. So,
> just show it for a plain bitwise binary AC.
>
> NOTE 2: You can't feed your AC input sequences with
> 0, 2 or 3 ones in order to fill the gaps and declare
> them as "used". The S3 specified does not produce
> any such sequences.
>
>


If you have a sequence that is defined by 3 bit strings you in
effect have 3 input symbols. With an adaptive coder you don't
need to know in advance that each of the 3 symbols is equal likely
but what the hell lets play the game and compress with a coder the
likes of arb2x or arb255 the only difference being you need two cells
one cell is split equal in half the other is split 1 to 2. Note
this split is not perfect but when one one is useing 64 bit registers
the split is dam close and no gaps.

To explain what is happening lets use 1 bit registers the first
splits so that a one is output when you have the 100 token
the next is split 50 50 so the mappings would be
100 goes to 1
010 goes to 01
001 goes to 00
in fact this is what you would get in huffman case. And you can see
that since its a complete tree any sequence of 1 and 0's on
decompression gives you any of the 3 symbol sequences no gaps.

You might complain in that if you had 30 million bits from
the source your compressing that the compressed length would
vary from 10 million bits in the unlikely event you have 10
million 100's being output even though it should have been
only occuring roughly 1/3 of the time. The compressed length
could also be as long as 20 million bits if the 100 never occured
at all highly unlikely. However using the huffman apprpximation
the average compressed length would be 16.666... million bits.

When you go to more bits it compresses better again no gaps
The ideal length for the 30 million bit sequence is which is
actually 10 million times -lg(1/3) which is roughly
15849625.0072115618145373894394782 bits for the 10 million.
while if using 64 bit register
its - lg (6148914691236517204/18446744073709551616) for 100
which is roughly
15849625.0072115618176657356346099 and slightly less
for the other 2 symbols. In short there would be no way
to tell your not actually using 1/3 and there would be no
gaps. Since you would need 15849625 bits give or take a
bit to express the compression.

If you have high enough interger precession math you could
calculate the index. However sooner of later when you do compression
you have to right that number out as ones and zeros. You gain
nothing in space savings by have the exact number since the
airhtmetic compression has no gaps. Sure you might be one bit
shorter for some combinations but then you will be a bit longer
for otherse. Its just a plain fact the bijective arithmetic
can do this kind of compression in an optimal way. It not
clear you understand how to do this with your method in general.



and

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

David A. Scott

2006-01-10, 3:57 am

nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in
news:HaudnSFNItBFil3eRVn-rg@rcn.net:

> In order to clarify which is the case here, why don't
> you demonstrate your easy and gapless finite precision
> binary AC coding for the finite input sequences produced
> by a binary source S3, which satisfy the following
> conditions:
>
> a) Each sequence from S3 is exactly 3 bits long,
>
> b) each sequence has exactly two bits=0 and one bit=1 and
>
> c) each sequence is equally likely as any other.
>
> How does your easy gapless AC coding for all possible
> input sequences from S3 look like? Since the inputs
> are pretty short, you should be able to show it all
> using an AC working in 4 bit precision. But if you
> need more, well use more (as long as the bit strings
> fit on a 72 char line).
>
> NOTE 1: The S3 output needs to be coded using a binary
> order-0 AC in incrementing or decrementing or static mode,
> which codes bit by bit as if the sequences have arbitrary
> lengths. Namely, the short sequences are given above
> only to allow you to show explicitly the coding, not
> to allow switching over to let you construct the small
> Huffman tree for the entire input (or some equivalent
> ad hoc codes).
>
> For example, the exactly same coding algorithm should be
> demonstrable (e.g. as a little sample exe coder that
> will work on a regular PC with, say, 1G of RAM) with
> inputs of, say, exactly 30 million bits long with exactly
> 10 million bits=1 set. In this case you couldn't code it
> via the multi-alphabet AC which treats the entire input
> as a single character of a gigantic alphabet. Hence,
> you can't use any such methods for S3 either. So,
> just show it for a plain bitwise binary AC.
>
> NOTE 2: You can't feed your AC input sequences with
> 0, 2 or 3 ones in order to fill the gaps and declare
> them as "used". The S3 specified does not produce
> any such sequences.
>
>
>


Nightlight here is what I am willing to do so others
and maybe even you can see the difference. This is
basically your example. But as usually you are not clear
enough. I would like anybody to be to test the code.

1) So first of all the input for the compressor has to be
files of ascii characters in multiples of 3 namely
100 or 010 001 that way the file will be exactly a multiple
of 3 bytes where only those 3 combinations allowed. So file
can be short as 3 byte or millions of bytes long but also
a multiply of 3.

2) I will map that bijectively to a packed binary file you
really don't need the format here but its for the arithmetic
coder I don't wish to mod arb255 very much so that it easy
for you and other to follow the changes I but in.

3) The bijective arithemtic coder will compress with no gaps using
the fixed and unchanging wieghts of as close to 1/3 that it can
get. ( One could design a cusom way but hay powers or 2 good enough)
The output will be an binary file where each of the 3 sequences maps
to roughly 1.5849625 bits per use.

4) The reverse is all so true. Take any file and do the reverse of
3 2 and 1 above and you get a unique sequemce of type 1 such that
if X is an ascii file in begining then steps 2 and 3 would be
compression and reverse of 3 and revese of 2 would be uncompression.
if Y is any file period then:
compress( uncompress (Y)) = Y
and
uncompress( compress(X)) = X

Even though this is your example not sure you can do this with your
method. It not even ideal for the arithmetic but even so it would be
trival to mod arb255 to do this.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

nightlight

2006-01-10, 3:57 am

First, few minor problems with your answer:

> The starting range is [0..256) ...
> Therefore, the range is subdivided as follows:
> [0..170) for a '0', and [170..255) for a '1'. ...


You start with full "range" [0..256), but then you switch
the endpoints from there on to [*..255). You also meld together
two mutually exclusive notations, the Pascal range A..B
with the semi-open interval math notation [A,B) for a "range".
The Pascal "range" A..B includes into the interval both points
A and B, while the semi-open interval [A,B) doesn't include B.
That notational meld of two mutually exclusive prescriptions
was probably what led to your endpoint mixup 256 vs. 255. The fact that
255 is divisible by 3, while 256 is not gave it an extra wishful nudge
to go with the 255. I will switch to a single coherent notation, [A,B),
hence use e.g. [*,256).

> There is a coding gap caused by having to terminate
> the encoder, but that is a fixed cost of two or three
> bits, not a loss at each step as you have claimed.


The AC doesn't normally build the complete, ready to go output code on
each step. Part of the output code is implicit in the coder's state
variables and the implementation conventions. If at any point the coder
is told to wrap it up and provide output code, the excess of 1-2 bits on
that total output code will be in that output in the case of an infinite
precision AC coder (ACX). This excess will be 2-3 bits in the case of a
finite precision AC coder (ACF). Hence, these 1-2 or 2-3 excess bits on
the total output are part of the AC output throughout -- they are only
represented differently at different stages (implicit in the algorithm
conventions and the internal state variables until the last step, at
which point the output code creation is finalized and they become
explicitly part of that output). Hence, at best your argument above is
vacuous (arguing about ambiguities in verbal conventions), and at worst
it is incorrect. I will grant you the benefit of the best case.

Well, on the positive side, at least you recognize that these excess
bits result in a "coding gap", which they do, of course.

There is an additional and unrelated misconception revealed in the last
part of that sentence:

> ... but that is a fixed cost of two or three
> bits, not a loss at each step as you have claimed.


An infinite number of additive contributions can easily produce a sum
with fixed upper bound, or within some fixed range, such as 1-2 excess
for ACX or 2-3 for ACF. For example 1 + 1/3 + 1/3^2 +... = 1.5 and the
partial sums after 2 terms are between 1.33 and 1.5. Hence, that the
cost is "fixed" (or rather the cost interval is fixed) gives you no
information on how many steps contributed to the cost.

> Furthermore, this termination cost can be avoided as
> well, with careful handling.


Not in the general AC coding setup. In a special cases, e.g. when the
number of symbols is prescribed and known upfront to encoder and
decoder, you can avoid, at least part of it. In a general AC setup,
without any such "side information" provided, the 1-2 bit excess for ACX
(or 2-3 for ACF) is a part of the standard ACX code length (needed to
select unambiguously the final codeword interval):

L = ceiling(log(1/Pc)) + 1 .... (1)

whole bits (see [41a], pp. 13-14), where Pc is the 'coding probability'
(the probabilities you compute with in AC) of the complete message. For
a stationary order-0 AC model, Pc is simply p^k * (1-p)^(n-k), where
p=AC's probability of 1's, k=count of 1's, n=count of all symbols.
Hence, the upper bound for ACX redundancy is 2 bits (unlike the Huffman
or exact EC code, where the +1 is absent and where the upper bound is 1,
resulting solely from rounding up of the log(1/Pc) to the next whole bit).

Now to the fundamental problem of your answer -- the complete and
hopeless conceptual meld of the three distinct concepts of "interval"
relevant in arithmetic coding. Hence, before we can head toward any
approximation of a coherent discussion, we need a finer resolution vew
of these three kinds of "intervals":

1. IVF ==> the coder's internal variables in finite precision specifying
finite precision intervals (these are your [0,85),
[85,170),... etc. variables).

2. IVX ==> the coder's internal variable specifying an infinite
precision interval (this type is implicit in your description, when you
are referring to your intervals as approximate). While one can't store
infinite precision binary fraction in an internal variable, one can
easily store an equivalent rational fractions with unlimited precision
integers for numerators and denominators (e.g. as done in [41a] pp.
13-14, 46).

3. CWI ==> Codeword interval - this is the interval defined by the AC
codeword (the explicit codeword ready for output or the implicit
codeword which could produced at each step). The AC codeword, implicit
or explicit, is a binary string C of finite length L (for ACX L is given
via (1)). The usual AC convention is to interpret C as representing
fractional binary digits of some rational number Z, defined as: Z = 0.C.
For example, if the AC codeword is C=1010 then L=4, Z=0.1010 =
1/2+0/4+1/8+0/16 = 5/8 = 0.625. The mapping of codewords C to intervals
CWI is used more generally than just for AC analysis e.g. in the Kraft
inequality proofs (cf. Cover & Thomas IT textbook [24]). The common
convention for this mapping is as follows ([24] eq. (24) which I use
below as (2); also [41a] pp. 13-14 & eq. (1.7)):

CWI = [Z, Z+G) .... (2)

where the "granularity" G (=the interval length) is defined as:

G=1/2^L .... (3)

Some examples of codewords C and their corresponding intervals CWI are
shown below:

C ... CWI
-------------------------
0 ... [0, 0.5)
1 ... [0.5, 1)
00 ... [0, 0.25)
01 ... [0.25, 0.5)
10 ... [0.5, 0.75)
11 ... [0.75, 1) ... etc.
-------------------------

The soft & fuzzy pedagogical descriptions of the AC encoding algorithm
usually describe construction of the nested IVX or IVF intervals from
the input sequence, then state that the coder transmits binary digits of
some point Z from within the final IVX or IVF interval as a
specification for that interval and they'll prescribe that the number of
digits should be given via (1) (sometimes omitting the +1 term). But,
the point Z given with L fractional bits C, can't all by itself specify
even the IVF, let alone IVX.

The only interval that Z can specify unambiguously, and only if further
supplemented with a precise convention, such as eqs. (2) and (3), is the
interval CWI. The higher resolution descriptions (such as [41a] pp.
13-14, 41-48) explain and derive code length (1) -- its origin is in the
requirement (in ICW terms used above) that the interval ICW has to fit
within the coder's final "internal variable" interval IVX or IVF (cf.
eq. (1.7) in [41a]).

We can look now at a set {C} = {C1,C2,...} containing all possible
codewords that an AC can produce from a given source (this may be an
infinite set). The corresponding intervals CWI(C1), CWI(C2),... form a
set {CWI(C)} of non-overlapping intervals (consequence of the AC output
decodability). Since
all CWI intervals fit within [0, 1), the non-overlapping property
implies that their lengths G(C1), G(C2),... can add up to at most 1, i.e.

G(C1)+G(C2)+G(C3)+... =< 1 ... (4)

The eq. (4), recalling the definition (3) of G's, is the Kraft
inequality which must hold for any uniquely decodable codes (cf. [24] p.
90, McMillan theorem), including our set {C}.

The concept of "coding gaps" (in code space or in code intervals), which
is our topic here, is normally used in the context of Kraft inequality
(4) as follows: The codes for which (4) holds as the exact equality are
called "compact" codes and such codes don't have "coding gaps". The
codes for which (4) holds only as the "<" part of the "=<" inequality
have "coding space gaps" or "coding gaps" (unused codes). The term
"interval gap" simply refers to these same gaps in terms of the CWI
intervals whose lenghts L(C1), L(C2)... are used in (4).

Even in the low res analysis you apparently did manage somehow to catch
a vague glimpse, however briefly, of what these gaps refer too, since
you said:

> There is a coding gap caused by having to terminate
> the encoder, but that is a fixed cost of two or three
> bits, ...


Clearly, you are realizing above that adding 2 or 3 bits as a
termination cost to the codewords, does create gaps somehow and
somewhere. But since that didn't seem to fit into the your
conceptual meld in which all three interval types are just the
"interval" and where you can make your internal variables interval IVF
fit together as tightly as you wish, you couldn't quite see how could
the "interval^3" have this gap. Now, with the three higher res interval
concepts in front of you, you can look at (3) and see that adding 2 or 3
to L will make the corresponding CWI interval length G smaller by 4 to 8
times, hence a pretty big gap will be opened in the CWI coverage (4) of
the interval [0,1) (since 75-87 percent of that CWI will turn into the
unused code space, the interval gap).

But seeing in it the low res back then, you concluded:

>... but that is a fixed cost of two or three
> bits, not a loss at each step as you have claimed.


Now look at (1) which shows how the L is computed -- the smaller the
probability Pc of the whole symbol sequence, the larger the L in (1)
will be. But Pc in (1) is the coding probability for the particular
sequence e.g. for order 0 model (as we used), Pc is a series of
truncated products of probabilities p & q of individual symbols
encountered. Now what happens if you have the actual p=1/3, which is an
infinite binary fraction 1/3=0.01010101... and use instead p'=0.010 for
AC coding? As you probably know from basic coding theory, the greater
the deviation of p' from the actual probability p, the longer on average
the codeword length L. In turn, the increase in avg. L shortens the
average CWI interval length G via (3), contributing to the increase in
CWI interval gap (which is the shortfall from 1.0 in Kraft inequality
(4)). Note also that the average increase in L is the result of
accumulation of these contributions along the way from all single coding
step coding probability deviations from the actual probability at that step.

The exactly same coding gaps happen with the QI coder due to its finite
precision. Since QI's construction of interval lengths (eq. (21) p. 8 in
[T3]) is bottom up, the accumulation is much more transparent here. You
can even see it as it accumulates step-by-step using the QI.exe program,
option "cbr" (to show complete binomial row; set also option n50 since
default is n=1024 which would scroll off). The QI errors are due to
rounding up to g (g=32) bit SWI mantissa in (21). The "cbr" command
shows all quantized SWI binomials in row n, with asterisk next to those
which were rounded up when computed via recurrence (21), which here, for
the binary coder, is simply the Pascal triangle recurrence: C(n,k) =
C(n-1,k) + C(n-1,k-1). The columns labeled dm16 and dm10 show how much
this increments of mantissa have accumulated over all such adds (for all
prior n). The "Extra Bits" column shows how much these dm contrinbute to
excess bits (usually 1/10^9). If the mantissa were shorter, or if n is
very large, these accumulations of increments would eventually cause
mantissa to overflow, which would then increase the SWI exponent by 1,
hence lenghtening the output by 1 bit.

Unlike QI, AC uses multiplicative binomial recurrences, described in
earlier post [P1], eq (1), (2). Even more unlike QI, which does this
only during the table construction (which is a universal table, the same
table for all source probabilities, hence it can be premade & loaded
from a file when the program starts), AC computes them from scratch on
every coding task and for every symbol along the way. The final
difference from QI, is that AC works top down, from the largest binomial
or addend (point B on Fig 1 in [T3], see also [T2] pp. 19), which in AC
is scaled to 1, and all other addends are smaller than this since
they're all divided by this largest binomial (which is at the endpoint
of the path, cf [T2] pp. 19-25 for details on this very close
correspondence between the two coders). Hence, the rounding down
arithmetic of AC works by fixing the starting binomial to 1 (or an
addend generally) and reducing the size of the lower order ones,
resulting at the end in smaller Pc in (1), thus longer output L. With QI
which computes from lower to higher binomials, the higher binomials are
incremented when rounding (since SWI rounding in (21) is up). So, while
QI's rounding up errors accumulate toward the larger addends making them
larger, the AC's rounding down errors accumulate towarad the smaller
addends making them smaller, which it eventually pays for via (1) in 1-2
excess bits for ACX or 2-3 excess bits for ACF (note that for AVF, the
excess can go beyond 2-3 if the input gets too long for the frequency
counters).

As to your IVF intervals fitting perfectly together, that is irrelevant
for the gaps in the ICW intervals (which are the intervals defined by
the codeword C, the quantity that is transmitted and to which (4), the
gaps, the unused codes & the redundancy apply). IVFs are your internal
variables and you're managing them as you wish (e.g. what to do with
their fit when you need to drop their digits in finite precision). If
you want to fit the IVFs together, all tight and snuggly, thats' fine
and you are free to do so, but that won't make the gaps in the CWI go
away or average L become shorter. For example, you could reduce your AC
precision to 4 bits and your avg. excess bits, the unused code space and
the CWI interval gaps will baloon (see table 3.4 p. 52 in [41a] for
effects of the 4 bit AC precision), yet all your internal variables IVFs
are still happily covering the entire interval [0,1), as snuggly as
ever, with no gaps in sight.

You're welcome, of course, to write out all the codes AC will produce on
that example (under conditions described i.e. no special ad hoc code
tweaks for the 3 short strings; or just run some AC, in decrementing
mode as you did in your description) and show how the CWI intervals for
them fit gaplessly or equivalently, show how these actual AC codes use
entire output code space without "coding gaps" (and, obviously, without
contradicting your earlier statement on "coding gaps" due to 2-3 bit
termination overhead).

> I hope to have cleared up your misgivings about Arith
> Coding with this.


Well, I hope yours will clear up, too :)

-- References ( http://www.1stworks.com/ref/RefLib.htm )

41a. P.A.J. Volf "Weighting Techniques In Data Compression: Theory and
Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf

24. T.M. Cover, J.A. Thomas
"Elements of Information Theory" Wiley 1991

P1. Earlier post on AC multiplicative recurrences:

http://groups.google.com/group/comp...1847a32daa9a571
David A. Scott

2006-01-10, 3:57 am

nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in
news:yIGdndvhfe_GGVzenZ2dnUVZ_tqdnZ2d@rc
n.net:


First some minor problems with you whole line of thought.
We can easily code this so most of your anwser has nothing
to do with reality. Second just what you want coded is unclear
I have tried to ask for enough details so others can look at
real test results instead of the long rants. Third are you
going to write code that would work on real files for this
example or is that to hard even though you designed this
yourself to make AC look bad you your stuff good?



>
> The AC doesn't normally build the complete, ready to go output code on
> each step. Part of the output code is implicit in the coder's state
> variables and the implementation conventions. If at any point the
> coder is told to wrap it up and provide output code, the excess of 1-2
> bits on that total output code will be in that output in the case of
> an infinite precision AC coder (ACX). This excess will be 2-3 bits in
> the case of a finite precision AC coder (ACF). Hence, these 1-2 or 2-3
> excess bits on the total output are part of the AC output throughout
> -- they are only represented differently at different stages (implicit
> in the algorithm conventions and the internal state variables until
> the last step, at which point the output code creation is finalized
> and they become explicitly part of that output). Hence, at best your
> argument above is vacuous (arguing about ambiguities in verbal
> conventions), and at worst it is incorrect. I will grant you the
> benefit of the best case.
>
> Well, on the positive side, at least you recognize that these excess
> bits result in a "coding gap", which they do, of course.
>


Actually thats not entirely true. As one does a full bijective
arithemtic file encoder for this you write out only what is done when
compressing you realize for each or the 3 possible output you need
roughly 1.58 bits So naturally the coder is carrying the excess around
till the input terminates. On termination some times you round up
sometimes you round down. The interesting thing is this.
Take the output file from a long set of inputs but change the last
byte of the compressed file. In fact change it so that you have 256
files each with a different last value. In turn each of these files
will decompress back to a valid input that if recompressed goes back
to same unique compressed file. THER ARE NO GAPS. Can you do this with
your method. I don't think so.


REST OF USELESS RANT CUT




David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

nightlight

2006-01-10, 3:57 am

--- Errata:

The citation:

> (cf. Cover & Thomas IT textbook [24]). The
> common convention for this mapping is as follows
> ([24] eq. (24)


should have in the last line:

([24] eq. 5.11 p. 84

Willem

2006-01-10, 3:57 am

nightlight wrote:
) First, few minor problems with your answer:
)
) > The starting range is [0..256) ...
) > Therefore, the range is subdivided as follows:
) > [0..170) for a '0', and [170..255) for a '1'. ...
)
) You start with full "range" [0..256), but then you switch
) the endpoints from there on to [*..255). You also meld together
) two mutually exclusive notations, the Pascal range A..B
) with the semi-open interval math notation [A,B) for a "range".

Semantics, and a typo. I trust it was obvious what I meant.

) > There is a coding gap caused by having to terminate
) > the encoder, but that is a fixed cost of two or three
) > bits, not a loss at each step as you have claimed.
)
) The AC doesn't normally build the complete, ready to go output code on
) each step. Part of the output code is implicit in the coder's state
) variables and the implementation conventions. If at any point the coder
) is told to wrap it up and provide output code, the excess of 1-2 bits on
) that total output code will be in that output in the case of an infinite
) precision AC coder (ACX). This excess will be 2-3 bits in the case of a
) finite precision AC coder (ACF). Hence, these 1-2 or 2-3 excess bits on
) the total output are part of the AC output throughout

This does not follow. As I see it, there are no excess bits in an Arith
Coder, until the coder is told to wrap it up. Can you give some clear
arguments why you claim that the bits added by termination are present
before termination ?

To any sane person, the phrase 'a loss at each step' implies that this
loss will grow as the number of steps increases.


) Well, on the positive side, at least you recognize that these excess
) bits result in a "coding gap", which they do, of course.

A coding gap of a whopping two bits. If your claim is that AC is worse
than QI because of those two bits, I laugh in your face.

I have snipped your whole discussion on those two or three bits because
they are irrelevant, and your claim that a cost that does not grow as
the number of steps increases is nevertheless incurred at each step is
ridiculous and arbitrary.

) Now to the fundamental problem of your answer -- the complete and
) hopeless conceptual meld of the three distinct concepts of "interval"
) relevant in arithmetic coding. Hence, before we can head toward any
) approximation of a coherent discussion, we need a finer resolution vew
) of these three kinds of "intervals":

I have snipped your entire discussion below, because in the end it still
boils down to the fixed cost of two or three bits of excess.

) You're welcome, of course, to write out all the codes AC will produce on
) that example (under conditions described i.e. no special ad hoc code
) tweaks for the 3 short strings; or just run some AC, in decrementing
) mode as you did in your description) and show how the CWI intervals for
) them fit gaplessly or equivalently, show how these actual AC codes use
) entire output code space without "coding gaps" (and, obviously, without
) contradicting your earlier statement on "coding gaps" due to 2-3 bit
) termination overhead).

As evident from your conclusion. As I said before, I will now laugh
in your face. You have written page upon page of explanation, all of
which in the end revolved around a fixed coding excess of a few bits.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Willem

2006-01-10, 3:57 am

nightlight wrote:
) In any case, this is only one component of the ACF excess, which was
) the particular subtopic we were discussing. You can make ACF perform at
) about that level of excess by using decrementing AC (see [34]). In that
) role, though, while ACF will code nearly as optimally as QI (within
) these few bits, which grow very slowly with N), all it achieves is to
) become a slightly less accurate and much slower exact work-alike of QI.
)
) The regular adaptive or static AC's that one normally finds in
) practical implementations, there will be an additional redundancy
) relative to QI which, in case of order-0 stationary sources can be
) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
) leading order 1/2 log(2P npq) for binary (for general alphabet of size
) A, there are A-1 such terms which are summed, resulting in approx 1/2
) changing to (A-1)/2). As shown in [T2], this error is due to the
) approximate enumeration of AC, where the Stirling approximation and the
) subsequent dropping of the sqrt() & other factors (each being <1),
) causes AC addends to increase relative to QI, leading to an excess of
) O(log(n)) bits on the total output.

This redundancy is offset by the need to transmit extra bits for the
modeling information, which is something you have neatly excluded in
your calculations.

) This O(log(n)) order redundancy was the dominant effect shown in the
) test table (p.10 in [T3]) for all but the last row (which illustrates
) different effect). Depending of course on what the total output size is
) (which depends on input size and density of 1's), this may amount to a
) tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
) table.

I assume your tests only counted the actual bits output by the encoder,
and not the bits needed for transmitting the model ? That makes it
an unfair comparison.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
David A. Scott

2006-01-10, 3:57 am

NNTP-Posting-Host: n8JTGId0GJMltOZqvtpIHw.user.domitilla.aioe.org
X-Complaints-To: abuse@aioe.org
User-Agent: Xnews/5.04.25
X-Face: *M,`iYi}2A#%e&7.vp<8f81-*udLB9<ppLF"bWcmt1_Z61X/9NPK/$nTz3[j\yWuLXM>nixvZ`=}%']'>qG%|M$ic5V6oey="NWI!LE.V|>|xmrCcP)w
Xref: number1.nntp.dca.giganews.com comp.compression:67606 sci.math:821731

"Matt Mahoney" <matmahoney@yahoo.com> wrote in
news:1136681911.708860.197090@g47g2000cwa.googlegroups.com:

> David A. Scott wrote:
>
> Actually all the code space is used because the first bit is 0 for an
> empty file.
>


You could leave the enpty file empty.

And yes one could think of the code space there as being used when
one likes at compression. I guess I was also looking at decompression
of a long file and wondering what happens to the rest of the file
to be decompressed if what is returned is marked as an EOF while the
uncompressed file has much wasted file space since following bytes
will not be looked at.


> Of course it could be improved by making it bijective instead of using
> 2*log2(n) bits to encode the length in bytes.
>


You don't have to go to that extreme and make it hard for most to
follow. You could use just log2(n) bits to encode the length. It still
would not be bijective but it would not complicate the code that much
more and a rough file integrity check could be done during
decompression without any additional output to the compressed file.

Matt if I can get nightlight to commit to coding his example of
the 3 symols types. I would like to play again with fpaq0. To see
how much better it can be made with as little change as possible.
I like your style but I don't think I will go to the wall and make
it bijective. But the nine times for each eight can be changed to
eight for eight with a ninth only needed for the last byte.


But it looks like nightlight is all talk and will not even
attempt to code his simple example. Maybe he has looked at the
example he himself made up and realizes that he can't beat a
real aritmetic coder that can be written to do his own example
where QI was suppost to shine and arithmetic fail.

> -- Matt Mahoney
>


Again you code really it neat for the other methods. Where this
nodel code is different is that in your other coding you have the
length field in front of file so you never add this extra cost
through out the file. I think you can add the cost at the back end
and save a little extra space with very little change in source
code.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

David A. Scott

2006-01-10, 3:57 am

Willem &