For Programmers: Free Programming Magazines  


Home > Archive > Compression > March 2007 > Entropy/compression rate









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Entropy/compression rate
roche.alexis@gmail.com

2007-02-15, 6:56 pm

I'm working on a compression algorithm and i need to compare my
compression rate to the maximum theorical rate.
My problem is to calculate the maximum theorical rate for each knowing
input binary datas.
For the moment I calculate the entropy corresponding to the '0'
presence probabilitie.

By exemple : I want to know the maximum theorical compression rate of
'00000000001'. So probability of '0' is 9/10 and for '1' is 1/10. The
entropy is 0.46899559358928117. So the mimimun medium size of the code
is 0.46899559358928117*N = 4.6899.
But now how can i calculate the maximum compression rate???

PS : Sorry if my english is not enough understanding

Thomas Richter

2007-02-15, 6:56 pm

roche.alexis@gmail.com wrote:

> I'm working on a compression algorithm and i need to compare my
> compression rate to the maximum theorical rate.
> My problem is to calculate the maximum theorical rate for each knowing
> input binary datas.


I doubt anyone's able to compute the maximal theoretical rate. (-:
(That would be something like the Kolmogorov entropy, and it cannot be
computed...)

What you likely mean is that you have some probabilistic model for your
information source, and want to compute the entropy within this model?

What is it? Zero-order, n-th order Markov?

> For the moment I calculate the entropy corresponding to the '0'
> presence probabilitie.


What is the zero-order presence probability? What is your alphabet,
what is your source, what is your model?

> By exemple : I want to know the maximum theorical compression rate of
> '00000000001'. So probability of '0' is 9/10 and for '1' is 1/10.


No, why? You just presented one string, this says nothing about the
probabilities. So, are you potentially meaning that:

a) your alphabet is 0/1
b) your source is a zero-order iid source?

> The
> entropy is 0.46899559358928117.


What you have here is the average number of bits required to describe
one symbol from a zero-order Markov source with probabilities 1/10 and
9/10th.

> So the mimimun medium size of the code
> is 0.46899559358928117*N = 4.6899.


....bits.

> But now how can i calculate the maximum compression rate???


I would define compression rate = input bits / output bits.

So long,
Thomas
roche.alexis@gmail.com

2007-02-15, 6:56 pm

Ok,
In fact i've got input datas and i want to calculate a theorical
maximum compression rate and to compare it to my compression method
results in order to know if we can improve it or not.
In other words knowing the number of '0' and '1' in input datas i want
to calculate the maximal compression rate.

I don't know if i am clear, so don't hesitate to say i it isn't.
Thanks.


byaarov@yahoo.com

2007-02-15, 6:56 pm

On Feb 15, 10:32 am, roche.ale...@gmail.com wrote:
> Ok,
> In fact i've got input datas and i want to calculate a theorical
> maximum compression rate and to compare it to my compression method
> results in order to know if we can improve it or not.
> In other words knowing the number of '0' and '1' in input datas i want
> to calculate the maximal compression rate.
>
> I don't know if i am clear, so don't hesitate to say i it isn't.
> Thanks.


>From what I understand, I dont think anyone can compute such a thing

as theoretical maximum compression. To do that, one must know the
source of the creation of the data. For example, if a sequence of
bits were created, which seems random to a compressor such as gzip,
but lets say I had another compressor that knew of the random seed and
algorithm that created the sequence, I can encode the data with just a
few bits. So it is all relative. Various transformations on the data
produce various compression rates.

So just given a sequence of 0s and 1s alone you cannot compute
compression rate. You need a third parameter, which is the statistics
of the alphabet. Then you can use shannons theorem, But then, one
may come up with a better compressor that isnt limited to that
alphabet.... which is why you need to understand how the source of the
generation of the data works

Bargav

Thomas Richter

2007-02-16, 7:56 am

roche.alexis@gmail.com wrote:

> In fact i've got input datas and i want to calculate a theorical
> maximum compression rate and to compare it to my compression method
> results in order to know if we can improve it or not.
> In other words knowing the number of '0' and '1' in input datas i want
> to calculate the maximal compression rate.


I afraid you still don't understand what I wanted to say. What is the
smallest output for the string 0000000001? Of course the optimal
compression for this is zero bit, generate it at the decompressor and
forget the original. You cannot compress any other string, but the one
you can compress you can represent by zero bits. Cool, infinite compression.

Not satisfied? Ok, here's a one-bit solution that works for all strings:
Compress your string by the single bit 0, expand all other strings by
the single bit 1 plus the orginal string. This compresses your input to
one bit, and makes all other inputs just one bit longer.

Not satisfied? (-: Maybe not, but I think you first need to understand
your question before you can understand the solution.

Thus, again: What is your alphabet, and what is your source?

So long,
Thomas
roche.alexis@gmail.com

2007-02-19, 7:55 am

On 16 f=E9v, 12:45, Thomas Richter <t...@math.tu-berlin.de> wrote:
> roche.ale...@gmail.com wrote:
>
> I afraid you still don't understand what I wanted to say. What is the
> smallest output for the string 0000000001? Of course the optimal
> compression for this is zero bit, generate it at the decompressor and
> forget the original. You cannot compress any other string, but the one
> you can compress you can represent by zero bits. Cool, infinite compressi=

on.
>
> Not satisfied? Ok, here's a one-bit solution that works for all strings:
> Compress your string by the single bit 0, expand all other strings by
> the single bit 1 plus the orginal string. This compresses your input to
> one bit, and makes all other inputs just one bit longer.
>
> Not satisfied? (-: Maybe not, but I think you first need to understand
> your question before you can understand the solution.
>
> Thus, again: What is your alphabet, and what is your source?
>
> So long,
> Thomas


I'm not really good in English, that could be the aim of our
ununderstanding.
I'm trying to explain better :
My source is constitued by 58*32 bits. The probability of '0' is about
0=2E90 to 1.
We have chosen a compression method based on RLE : we count the number
of '0' between two '1' with a maximum size of 32, and we compress the
result using an Huffman tree ( '0' for 32 and '1'+(size coded on 5
bits)).
We've got the practical result on 1000 events and what i want is to
know if we can really improve the compression whith adding another
method (by example Deflate or just LZ).
That why i asked if it is possible to know of to compute the maximum
theorical rate knowing the probabilty of '0'.
I've find something : f < (log(r) / H(X)) with f: factor of
compression , r : numer of code of the alphabet ans H(X) : entropy.
so for an alphabet of '1' or '0', we have f < 1/H(X) .

Is it right??

PS : thancks for you aswers

Thomas Richter

2007-02-19, 7:55 am

roche.alexis@gmail.com wrote:

> I'm trying to explain better :
> My source is constitued by 58*32 bits.


Ok, that's a new fact. It's also an important fact. C

> The probability of '0' is about 0.90 to 1.


Is that all you know? For example, does the probability of having a
zero or a one depend on the *history* of the previous bits? That is,
you you have a Markov source (yes, please look up this term if you don't
know what it is.)

> We have chosen a compression method based on RLE : we count the number
> of '0' between two '1' with a maximum size of 32, and we compress the
> result using an Huffman tree ( '0' for 32 and '1'+(size coded on 5
> bits)).


That would model, however, a different type of source, and thus your
statement about the "probability" of zeros and ones and the computation
of the zero-order entropy of *this* probability distribution has not
much to do with the rate you get by this type of RLE compression.
Instead, you have now a probability source with the alphabet containing
the symbols 0..58*32, where each symbol encodes the runs. (Aparently, it
is enough to encode the runs of the zeros).

Do you have reasons to believe that *these* symbols are zero-order
Markov? (Maybe not!) But if you, you need to find the probability of
each run-length, and compute from that the zero-order entropy.

So, now we have two models: The zero-order Markov source with alphabet
0,1 and the zero-order Markov model of runs, with alphabet 0..58*32.
One can compute the (one-symbol) probabilities of the second source from
the first source, of course. The probability of a run of n zeros (in the
second model) is the probability of a zero symbol in the first source to
the power of n.

> We've got the practical result on 1000 events and what i want is to
> know if we can really improve the compression whith adding another
> method (by example Deflate or just LZ).


But not on these terms. Deflate and LZ are *again* different models with
their own entropy. You always say "I have a big bunch of zeros and
ones", even though you don't. If "you only have a big bunch of zeros and
ones and these are a zero-order memoryless source", LZ and Deflate won't
compress better. But the actual reason why these programs work so well
is that (interesting) data does have more structure in reality.

What *else* do you know about your data? Try to think beyond the zeros
and ones and try to think about what they actually mean. From that
ground, work forwards on finding a suitable model for your data and
compute the entropy of that model.

> That why i asked if it is possible to know of to compute the maximum
> theorical rate knowing the probabilty of '0'.
> I've find something : f < (log(r) / H(X)) with f: factor of
> compression , r : numer of code of the alphabet ans H(X) : entropy.


No, and no again. Information science does not work by pulling a formula
from the rag and inserting the numbers. For the above formula, you
assume that:

a) you have a zero-order (memoryless) random source. That is, a machine
without internal states that generates a symbol from a certain symbol
set on the press of a button
b) You want to know how efficiently you can possibly encode an infitely
long stream of symbols from such a machine such that you can reconstruct
from this encoding the original output, given an infinitely long(!)
stream of symbols.
c) The answer to question b) given situation a) is the zero-order entropy.

However, you applied c) without even knowing a). Why do you expect
meaningful results?

> so for an alphabet of '1' or '0', we have f < 1/H(X) .
>
> Is it right??


How should I know? I still don't know what your random source is. It
might be right, it might be wrong. Likely, it is wrong.

By that I mean: *If* your random generator has no memory, and picks
zeros and ones "at random" with no memory whatsoever, and that is all
you know, then you cannot represent an infinitely long stream more
efficiently than with H_0 bits per symbol. There are then methods to
compress it asymptotically with H_0 bits, for example arithmetic coding.
There are also other simpler methods that allow larger losses, as for
example Huffman that requires an integer number of bits per symbol.

However, I still don't know whether these prequisites hold, and from
what I see in your question, they likely do not hold.

As said, please first try to understand the question before you can aim
at understanding the answer. This means the following homework for you:

Please look up what a random source is, what "memory" means, what a
"Markov source" is. Yes, that's work, but I afraid if you really want to
understand what you're doing, you need to know a bit more.

So long,
Thomas
roche.alexis@gmail.com

2007-02-19, 9:55 pm


> Is that all you know? For example, does the probability of having a
> zero or a one depend on the *history* of the previous bits? That is,
> you you have a Markov source (yes, please look up this term if you don't
> know what it is.)
>

Yes it is a 0 order Marjov source => each character is selected
independent of the last characters

That why we use for the entropy : H(S) = -
( p(0).Log(2)p(0)+p(1).Log(2)p(1))

In fact the data correspond to pad datector. each pad is represented
by a bit. We can't precissely now the probability of each pad to be
touch, so we can't estimate the probability of having a '1' or a '0'.
The simulation shows that the '0' bits represents 90 to 99% of our
datas, but depends on the evenement.
By example :
Event one : 0 muon is detected => 0 pad are touched, the data
represents the noice.
Event two : 1 muon is detected, X pads are touched.
.....
....
The estimation is about 1.1 muon per event.

>
> That would model, however, a different type of source, and thus your
> statement about the "probability" of zeros and ones and the computation
> of the zero-order entropy of *this* probability distribution has not
> much to do with the rate you get by this type of RLE compression.
> Instead, you have now a probability source with the alphabet containing
> the symbols 0..58*32, where each symbol encodes the runs. (Aparently, it
> is enough to encode the runs of the zeros).


You means that i have 2 differents sources ???
I believed that i had only one source consituted by 58*32=1856 codes
correspondind to my alphabet (0,1)

I you are interested we could continue directly by mail.


> Yes, that's work, but I afraid if you really want to
> understand what you're doing, you need to know a bit more.


You're right that I'm doing, but it's a quite abstract thing for a
poor lonesome Ingeneer like me.
In all cases thancks for you're help i'll continue my investigations

roche.alexis@gmail.com

2007-02-19, 9:55 pm

> Is that all you know? For example, does the probability of having a
> zero or a one depend on the *history* of the previous bits? That is,
> you you have a Markov source (yes, please look up this term if you don't
> know what it is.)


Yes it is a 0 order Marjov source => each character is selected
independent of the last characters

That why we use for the entropy : H(S) = -
( p(0).Log(2)p(0)+p(1).Log(2)p(1))

In fact the data correspond to pad datector. each pad is represented
by a bit. We can't precissely now the probability of each pad to be
touch, so we can't estimate the probability of having a '1' or a '0'.
The simulation shows that the '0' bits represents 90 to 99% of our
datas, but depends on the event.
By example :
Event one : 0 muon is detected => 0 pad are touched, the data
represents the noice.
Event two : 1 muon is detected, X pads are touched.
.....
....
The estimation is about 1.1 muon per event.

[color=darkred]
> That would model, however, a different type of source, and thus your
> statement about the "probability" of zeros and ones and the computation
> of the zero-order entropy of *this* probability distribution has not
> much to do with the rate you get by this type of RLE compression.
> Instead, you have now a probability source with the alphabet containing
> the symbols 0..58*32, where each symbol encodes the runs. (Aparently, it
> is enough to encode the runs of the zeros).


You means that i have 2 differents sources ???
I believed that i had only one source consituted by 58*32=1856 codes
correspondind to my alphabet (0,1)

I you are interested we could continue directly by mail.

> Yes, that's work, but I afraid if you really want to
> understand what you're doing, you need to know a bit more.


You're right that I'm doing, but it's a quite abstract thing for a
poor lonesome Engineer like me.
In all cases thancks for you're help i'll continue my investigations

Thomas Richter

2007-02-20, 3:55 am

roche.alexis@gmail.com wrote:
>
> Yes it is a 0 order Marjov source => each character is selected
> independent of the last characters
>
> That why we use for the entropy : H(S) = -
> ( p(0).Log(2)p(0)+p(1).Log(2)p(1))


Ok, so you *do* have the probabilities? In that case, you cannot
compress better (on average!) than the entropy. Specifically, given a
huge number of samples of your source, the average number of bits you
need to invest per symbol is given by H(S). You already computed this
quantity.

Now, if you can compress better than this by the runlength method this
only means that your model doesn't fit your data, i.e. you do not have
a zero-order Markov source.

> In fact the data correspond to pad datector. each pad is represented
> by a bit. We can't precissely now the probability of each pad to be
> touch, so we can't estimate the probability of having a '1' or a '0'.


Thus, not a zero-order Markov source?

> The simulation shows that the '0' bits represents 90 to 99% of our
> datas, but depends on the evenement.


Huh? This is a contradition to what you said above. Either you do have
probabilities, or you don't. Probabilities never depend on single
samples, but on large averages. Compression rates as predicted by the
entropy also only makes statements about averages (thus, on infinitely
long streams). You don't have inifinitely long streams, but you do have
a lot of samples. Thus,

a) if your simulation predicts the reality well, that is the probability
estimates are suitable, and,
b) the symbol probabilities are, indeed, independent, then

=> c) On average, you cannot compress better than H.

By that I mean: Given the source simulated by your input, compress a
huge number of samples generated by your random source, and measure the
compression output of the source in the number of bits required per
symbol, averaged over all the input samples. Then, for an optimal
compressor, the number of bits per sample is equal to H(S).

Fact is furthermore that Huffman coding can get "close" to this limit,
up to the precision of a single bit per sample. By that I mean that if
a symbol requires a predicted number of b = -log_2 p bits in the stream,
the huffman code will, for very large number of samples, require ceil(b)
bits, where ceil(b) is the smallest integer larger or equal to b.

One can improve on this with arithmetic coding, which in the limit,
really gets the b bits per symbol. In the limit means: "as the number of
symbols goes to infinity, the remainder will go to zero".

[color=darkred]
> By example :
> Event one : 0 muon is detected => 0 pad are touched, the data
> represents the noice.
> Event two : 1 muon is detected, X pads are touched.
> ....
> ...
> The estimation is about 1.1 muon per event.
>
>

If the events are really independent, then there is no need to use an
RLE compressor because huffman/arithmetic are optimal. The question
really is: Why are you using RLE then? Probably because you do not
believe that the probabilies are independent? Probably because your
experience showed that RLE works better than huffman coding? If the
latter is the case, then the above statement on the independence of the
symbols was aparently wrong.

So long,
THomas
roche.alexis@gmail.com

2007-02-20, 7:55 am


>
>
> Ok, so you *do* have the probabilities? In that case, you cannot
> compress better (on average!) than the entropy. Specifically, given a
> huge number of samples of your source, the average number of bits you
> need to invest per symbol is given by H(S). You already computed this
> quantity.
>


In fact for each simulated event we use the number of 0 and 1 and we
use them to calculate an H(S) corresponding to one event. And we
calculate the maximum compression rate with f= ln(2)/H(S). And for
each event we compare it to the rate obtained by our algo.




> Now, if you can compress better than this by the runlength method this
> only means that your model doesn't fit your data, i.e. you do not have
> a zero-order Markov source.
>
>
> Thus, not a zero-order Markov source?


How ca i have a non zero-order Markov source knowing that each pad are
independant ??


>
>
> Huh? This is a contradition to what you said above. Either you do have
> probabilities, or you don't. Probabilities never depend on single
> samples, but on large averages. Compression rates as predicted by the
> entropy also only makes statements about averages (thus, on infinitely
> long streams). You don't have inifinitely long streams, but you do have
> a lot of samples. Thus,
>
> a) if your simulation predicts the reality well, that is the probability
> estimates are suitable, and,
> b) the symbol probabilities are, indeed, independent, then
>
> => c) On average, you cannot compress better than H.
>
> By that I mean: Given the source simulated by your input, compress a
> huge number of samples generated by your random source, and measure the
> compression output of the source in the number of bits required per
> symbol, averaged over all the input samples. Then, for an optimal
> compressor, the number of bits per sample is equal to H(S).
>
> Fact is furthermore that Huffman coding can get "close" to this limit,
> up to the precision of a single bit per sample. By that I mean that if
> a symbol requires a predicted number of b = -log_2 p bits in the stream,
> the huffman code will, for very large number of samples, require ceil(b)
> bits, where ceil(b) is the smallest integer larger or equal to b.
>


Yes I agree, but in this case we would create an Huffman tree in each
event, right ?
But we have important real time countraint and we have to adapt it in
a FPGA, so i we want to implement Huffman code we would use a static
tree, so with lower performances.
For the aritmetic coding, I thinck it is to complex to impement in a
FPGA, because it need to use floating number.



> One can improve on this with arithmetic coding, which in the limit,
> really gets the b bits per symbol. In the limit means: "as the number of
> symbols goes to infinity, the remainder will go to zero".
>



> If the events are really independent, then there is no need to use an
> RLE compressor because huffman/arithmetic are optimal. The question
> really is: Why are you using RLE then? Probably because you do not
> believe that the probabilies are independent? Probably because your
> experience showed that RLE works better than huffman coding? If the
> latter is the case, then the above statement on the independence of the
> symbols was aparently wrong.
>


An earlier study shown that in practical tests RLE with 32 bits
windows get the best results.
But the results are not efficient, I you have already work with
physicians, everithing is not enough.
That why I wanted to calculate the theorical maximum compression rate
and compare it to our RLE compression in order to know if we can
improve it by adding an other compression method.
I though add Deflate, because it's quite simple to implement in an
FPGA, but it use Huffman coding, and It would be necessary to use
static Huffman tree, which is not the better compression method.

I'm looking for adding the real frame of our Datas, may be it could be
usefull for you to understand.

Thanks,
Alexis



Thomas Richter

2007-02-20, 9:55 pm

roche.alexis@gmail.com wrote:

>
> In fact for each simulated event we use the number of 0 and 1 and we
> use them to calculate an H(S) corresponding to one event.


That is not very sensible. The entropy makes statements about infinitely
long sequences, and thus you must collect a lot of data. Yes, I really
meant *probabilities*, as probabilities of a model. *NOT* relative
frequencies. The (usually unspoken) hope is that for very large streams,
the relative frequencies are close the probabilities of a model setup
"ad hoc" in advance, but this is not very sensible for 32*56 (what was
the number) of samples.

> And we
> calculate the maximum compression rate with f= ln(2)/H(S). And for
> each event we compare it to the rate obtained by our algo.


Well, this kind of entropy does not really provide much information.
(Hmm. Actually, no pun intended here).

>
>
> How ca i have a non zero-order Markov source knowing that each pad are
> independant ??


I don't know, I do not even know what a "pad" is, nor how it creates its
output. (-:

But again, I think you still confuse "Model, probability" with "relative
frequency". That's not the same thing. My statement - actually,
Shannon's theorem - holds, but it holds for

a) a long term average, and
b) for a given (fixed) model.

If you have reasons to believe that you cannot define probablities, or
rather, that the relative frequencies change slowly over large sample
sets (thus, not a Markov zero order model, but rather a "nonstationary
source" that is probabily Markovian in short time), then the above does
not hold. You might try to estimate probabilities from relative
frequencies - the usual trick is then to count them and "reset" or
"scale out" the statistics after some time. This is for example what
adaptive huffman or adaptive AC does.

It implies a more complicated model.


Thus, the following requires qualification:

i) Is the 32x58 data a sample from a long-term process which is
stationary (implied by Markov-zero order), that is, do you have
probabilities? If so, Shannon holds.
ii) Or are rather the 32x58 events independent samples, and nothing at
all can be said about their average behaivour?

From what you said, this seems rather to be the case, but then all bets
are off.

>
>
> Yes I agree, but in this case we would create an Huffman tree in each
> event, right ?


No, if the probablities are known, one fixed huffman tree is sufficient.
What is an "Event"? The 32x56 bit sequence? If so, no, the tree can
remain constant, because the probabilities are constant (almost by
definition, since this word only makes sense in a long-range average).

> But we have important real time countraint and we have to adapt it in
> a FPGA, so i we want to implement Huffman code we would use a static
> tree, so with lower performances.


That's fine, as long as you really have statements about the long-term
statistics, nothing's wrong with static huffman. The above theorem is
about *fixed* probabilities, i.e. stationary models, as for example
zero-order Markov.

> For the aritmetic coding, I thinck it is to complex to impement in a
> FPGA, because it need to use floating number.


No way. (-: There's a very good article on arithmetic coding by,
IIRC,Marc Nelson. Just a second... Here:

http://www.dogma.net/markn/articles/arith/part1.htm

(Highly recommended).

This implementation uses integers only, but requires multiplication. A
binary arithmetic coder can do completely without, see for example IBMs
MQ coder.

Beware, however! Arithmetic coders are often patented, so is the MQ
coder (pretty ideal for hardware, so a good choice if you can pay it!).


>
>
> An earlier study shown that in practical tests RLE with 32 bits
> windows get the best results.


Thus, your model is inappropriate. See above! Probably the source is
non-stationary, probably there is correlation.

> But the results are not efficient, I you have already work with
> physicians, everithing is not enough.
> That why I wanted to calculate the theorical maximum compression rate
> and compare it to our RLE compression in order to know if we can
> improve it by adding an other compression method.


You're asking about a different thing. The "maximum compression rate"
is, or at least one suitable definition is, the size of the shortest
program that can reproduce the same output. That is an operational
definition, nobody can compute this for "arbitrary" sequences. (And this
even provably IIRC, there is no terminating algorithm that could.)

For zero-order Markov, there is a theorem, even for n-th order Markov,
and then you *only* get results that hold "on average" for large number
of samples.

> I though add Deflate, because it's quite simple to implement in an
> FPGA, but it use Huffman coding, and It would be necessary to use
> static Huffman tree, which is not the better compression method.
>
> I'm looking for adding the real frame of our Datas, may be it could be
> usefull for you to understand.


Looking at the data is, I afraid, not really useful to come up with a
good model. You need to understand how the data is generated.

For example, is it really the case that the (long range, average)
probability of a pad being zero or one is independent from the pad index
in the bit array? I don't know what you're doing (reminds me a bit on
chip testing, but whatever) but probably it's not.

So long,
Thomas
roche.alexis@gmail.com

2007-02-21, 3:55 am

A pad is a sensor wich detect a hit from a particule, '1' a particule
is detected and '0' not. One pad coorespond to 1 bit of our datas,
that why I said the datas are independant.

Ok, so your advise is to re-study my source ....
So for that, I have to simulate on a large number of case (2000 could
be enought ??? (I'm sure not))
And I'll study the probability for each bit to be at '0'.

So i will have to calculate the entropy for 56*32 codes.. Am i right?
Knowing that i study an algorithm which fit to this entropy. Probably
an Huffman coding.

Thanck,
Alexis




Thomas Richter

2007-02-21, 3:55 am

roche.alexis@gmail.com wrote:

> A pad is a sensor wich detect a hit from a particule, '1' a particule
> is detected and '0' not. One pad coorespond to 1 bit of our datas,
> that why I said the datas are independant.


Ok, so that seems pretty much independent, unless there's something
special about the source of the particles and the geometry of the pads.

> Ok, so your advise is to re-study my source ....
> So for that, I have to simulate on a large number of case (2000 could
> be enought ??? (I'm sure not))
> And I'll study the probability for each bit to be at '0'.


Right. It is not clear what "large enough" will be. Depends on your
source. I would collect relative frequencies on various test runs over
larger, but not too large sample sets, and from there check whether the
frequencies are approximately constant. There are also classical tests
that can test the "quality" of a random source, like the chi-squared
test. Maybe that's also another direction to look at.

> So i will have to calculate the entropy for 56*32 codes.. Am i right?


Yup. Note again that this will then be optimal for the measurement
conditions you've had when measuring the probabilities. It is not clear
to me whether your sensor will always find these conditions later on.

> Knowing that i study an algorithm which fit to this entropy. Probably
> an Huffman coding.


That's the easiest choice and often appropriate.

So long,
Thomas
Thomas Richter

2007-02-24, 6:55 pm

Thomas Richter wrote:

>
> That's the easiest choice and often appropriate.


Forgot one thing: If your alphabet is just two symbols (0 and 1), then
you cannot compress by huffman. This is because a huffman symbol is at
least one bit large...

So long,
Thomas
Phil Carmody

2007-02-25, 6:55 pm

Thomas Richter <thor@math.tu-berlin.de> writes:
> Thomas Richter wrote:
>
>
> Forgot one thing: If your alphabet is just two symbols (0 and 1), then
> you cannot compress by huffman. This is because a huffman symbol is at
> least one bit large...


Tunstall codes?

Phil
--
"Home taping is killing big business profits. We left this side blank
so you can help." -- Dead Kennedys, written upon the B-side of tapes of
/In God We Trust, Inc./.
Thomas Richter

2007-02-26, 3:56 am

Phil Carmody wrote:

>
>
> Tunstall codes?


My initial pick would have been arithmetic, but that would also work,
yes. If the probabilities are known in advance, a static code would do
the job.

So long,
Thomas
Hatard75

2007-03-10, 11:36 am

Catherine Zeta Jone Throatjob!
http://Catherine-Zeta-Jone-Throatjo...hp?movie=148803
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com