Home > Archive > Compression > November 2004 > ziv-lempel complexity
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
ziv-lempel complexity
|
|
|
| I am trying to classify data streams into Compressible or not.
If it is compressible, the data stream will be compressed by using Zlib.
Otherwise, it is transfered in clear.
Entropy test (such as ENT) do not fit, because Zlib using a variant of LZ77.
Kolmogorov complexity seems not to be a pratical solution, either.
I know V.42bis do such compressiblity test, but I don't know how it was done.
Can anybody give me a pointer on this? Shall I use ziv-lemple complexity here?
Is there any live example on this?
Thanks
| |
| Mike B 2004-11-17, 8:55 pm |
| Surely if you are going to process the data stream to measure entropy,you
may as well just do a deflate on it anyway, and then decide based on the
compressed size whether to store or use the compressed stream?
regards
Mike
"John" <johncrane@yahoo.com> wrote in message
news:9a7b51f1.0411171123.70567613@posting.google.com...
> I am trying to classify data streams into Compressible or not.
> If it is compressible, the data stream will be compressed by using Zlib.
> Otherwise, it is transfered in clear.
>
> Entropy test (such as ENT) do not fit, because Zlib using a variant of
LZ77.
> Kolmogorov complexity seems not to be a pratical solution, either.
> I know V.42bis do such compressiblity test, but I don't know how it was
done.
>
> Can anybody give me a pointer on this? Shall I use ziv-lemple complexity
here?
> Is there any live example on this?
> Thanks
| |
| James K. 2004-11-21, 8:55 am |
| On 17 Nov 2004 11:23:12 -0800, johncrane@yahoo.com (John) wrote:
>I am trying to classify data streams into Compressible or not.
>If it is compressible, the data stream will be compressed by using Zlib.
>Otherwise, it is transfered in clear.
>
>Entropy test (such as ENT) do not fit, because Zlib using a variant of LZ77.
>Kolmogorov complexity seems not to be a pratical solution, either.
>I know V.42bis do such compressiblity test, but I don't know how it was done.
>
>Can anybody give me a pointer on this? Shall I use ziv-lemple complexity here?
>Is there any live example on this?
>Thanks
If you want to know the theoretical bound of compression, you may want
to check the entropy rate as well as the entropy of each symbol. Note
that LZ algorithm is a sort of variable-to-block coding algorithms.
BR,
------
James K. (txdiversity@hotmail.com)
[Home] http://home.naver.com/txdiversity
| |
| Matt Mahoney 2004-11-21, 3:55 pm |
| "John" <johncrane@yahoo.com> wrote in message
news:9a7b51f1.0411171123.70567613@posting.google.com...
> I am trying to classify data streams into Compressible or not.
> If it is compressible, the data stream will be compressed by using Zlib.
> Otherwise, it is transfered in clear.
>
> Entropy test (such as ENT) do not fit, because Zlib using a variant of
LZ77.
> Kolmogorov complexity seems not to be a pratical solution, either.
> I know V.42bis do such compressiblity test, but I don't know how it was
done.
>
> Can anybody give me a pointer on this? Shall I use ziv-lemple complexity
here?
> Is there any live example on this?
> Thanks
There is no test that can tell you if data is compressible. Kolmogorov
complexity is not only impractical to measure, but provably impossible.
Here is an example: an encrypted stream of 0 bits appears statistically
random and will not compress by any algorithm that doesn't know the key.
The most practical solution to your problem is to try compressing with zlib
and see if the output is smaller. Actually you don't need to output data
for this test, just simulate the compression front end. Build an index of
trigrams over a 32K sliding window using a hash table, and test whether the
frequency of matches is greater than random.
-- Matt Mahoney
| |
|
| Thanks Matt.
Could you give a further pointer? Which book, paper, or source code shall I read?
> There is no test that can tell you if data is compressible. Kolmogorov
> complexity is not only impractical to measure, but provably impossible.
> Here is an example: an encrypted stream of 0 bits appears statistically
> random and will not compress by any algorithm that doesn't know the key.
>
> The most practical solution to your problem is to try compressing with zlib
> and see if the output is smaller. Actually you don't need to output data
> for this test, just simulate the compression front end. Build an index of
> trigrams over a 32K sliding window using a hash table, and test whether the
> frequency of matches is greater than random.
>
> -- Matt Mahoney
| |
| Matt Mahoney 2004-11-22, 8:55 pm |
|
"John" <johncrane@yahoo.com> wrote in message
news:9a7b51f1.0411221041.77c61bf6@posting.google.com...
> Thanks Matt.
> Could you give a further pointer? Which book, paper, or source code shall
I read?
Bell, Witten and Cleary, "Modeling for text compression", 1989, is a classic
introduction to data compression.
[color=darkred]
>
>
zlib[color=darkred]
data[color=darkred]
of[color=darkred]
the[color=darkred]
|
|
|
|
|