For Programmers: Free Programming Magazines  


Home > Archive > Compression > February 2005 > general but interesting question for me









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author general but interesting question for me
Michael

2005-02-08, 3:55 pm

Well,
I would like to ask you, whether someone of you knows the answer to
the following question:

Assuming you have a bit sequence of, let's say 65536 Bits (8 kb)
This would mean, the amount of possibly occuring data is 2^65536
bitcombinations.

The question I would like to know is: Does anybody of you know, how
much of this data (in percentage) would be compressible with a common
compression algorithm like zip or so (maybe including header) ?

There must be also some data that means no compression but also no
increasing size of data, how much (in percent) data would it be ?

It would be great if anyone of you could answer this question....
Willem

2005-02-08, 3:55 pm

Michael wrote:
) Well,
) I would like to ask you, whether someone of you knows the answer to
) the following question:
)
) Assuming you have a bit sequence of, let's say 65536 Bits (8 kb)
) This would mean, the amount of possibly occuring data is 2^65536
) bitcombinations.
)
) The question I would like to know is: Does anybody of you know, how
) much of this data (in percentage) would be compressible with a common
) compression algorithm like zip or so (maybe including header) ?

Easy.

Less than 1/256 of all bitcombinations can be compressed by one byte.
Less than 1/65536 '' '' '' two bytes.

Etcetera. See the pattern ?

) There must be also some data that means no compression but also no
) increasing size of data, how much (in percent) data would it be ?
)
) It would be great if anyone of you could answer this question....


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Phil Carmody

2005-02-08, 3:55 pm

westpdmkids@yahoo.de (Michael) writes:
> Well,
> I would like to ask you, whether someone of you knows the answer to
> the following question:
>
> Assuming you have a bit sequence of, let's say 65536 Bits (8 kb)
> This would mean, the amount of possibly occuring data is 2^65536
> bitcombinations.
>
> The question I would like to know is: Does anybody of you know, how
> much of this data (in percentage) would be compressible with a common
> compression algorithm like zip or so (maybe including header) ?


Without specifiying the compression algorithm, it's impossible to say.
However, one thing's universally true - it will be a tiny percentage.
Practically none. Even if you come up with a deliberately contrived
compression algorithm to maximise the percentage, you still can't even
get 1/256 of the files to compress by one byte.

> There must be also some data that means no compression but also no
> increasing size of data, how much (in percent) data would it be ?


Without specifying the compression algorithm, it's again impossible to
say. However, as ZIP is sub-optimal in several ways, the vast majority
of files will expand.

Phil
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow
Matt Mahoney

2005-02-08, 8:55 pm

Michael wrote:
> Well,
> I would like to ask you, whether someone of you knows the answer to
> the following question:
>
> Assuming you have a bit sequence of, let's say 65536 Bits (8 kb)
> This would mean, the amount of possibly occuring data is 2^65536
> bitcombinations.
>
> The question I would like to know is: Does anybody of you know, how
> much of this data (in percentage) would be compressible with a common
> compression algorithm like zip or so (maybe including header) ?
>
> There must be also some data that means no compression but also no
> increasing size of data, how much (in percent) data would it be ?
>
> It would be great if anyone of you could answer this question....


In general, you can compress all but 1 of the 2^n bit strings of length
n, assuming you don't also need to compress strings of other lengths.
The following code will do it: drop any trailing 0 bits, then drop the
trailing 1 bit. To uncompress, append a 1 bit and then append 0 bits
until the length is n. A string of all 0 bits won't compress this way,
so code it as itself. It can be proven by the counting argument that
this is the best you can do, although you can choose which one of the
2^n strings won't compress.

If you need to compress all 2^(n+1)-1 strings up to n bits, then the
best you can do is to compress 2^(m+1)-2 strings to m bits or less, m <
n. The proof also uses the counting argument: there are 2^(m+1)-1
codes length 0 to m, and you need to reserve at least one code to flag
the remaining strings.

Either way, the number of strings that could be compressed
significantly is tiny. You could only compress less than 1/256 of
files by 1 byte regardless of the algorithm.

-- Matt Mahoney

Michael

2005-02-09, 3:55 pm

ehrm..I'm somehow ....

does it mean that

a) 1/256 of all possible bit combinations can't be compressed ?
b) 1/256 of all possible bit combinations CAN be compressed ?

If I understood right, both answers were given :)

If b) is the case, doesn't it mean, all lossless compression
algorithms are damn unefficient ?


To clear confusions up, I may have done...:
I meant, create files for each bitcombination possible.
1st File : 00000000......0001
2nd File : 00000000......0010
....
....
last file : 11111111....1111
How many of them would mean compression and how many of them not ?

If Zip isn't the best one to choose (I'm not that familar with
compression at all), you can use another one for this "experiment" :)
..

Thanks for your answers...
Phil Carmody

2005-02-09, 3:55 pm

westpdmkids@yahoo.de (Michael) writes:

> ehrm..I'm somehow ....


We were answering slightly different questions.

> does it mean that
>
> a) 1/256 of all possible bit combinations can't be compressed ?


Matt's answer, IIRC.

True if the size of the input file is fixed, and the size of the
output file, to bit-granularity, can be transmitted for free.
e.g.
0 -> 0 1 <65534 0s>
00 -> 0 0 1 <65533 0s>
cannot be distinguished unless you can transmit the lengths 1 and 2
for free. The way most people exchange data, this transmission is
_not_ for free. Thus this answer is unlikely to be relevant.

> b) 1/256 of all possible bit combinations CAN be compressed ?


My answer (and one other's too). Applies when the size of the
input file is not fixed.

> If I understood right, both answers were given :)
>
> If b) is the case, doesn't it mean, all lossless compression
> algorithms are damn unefficient ?


Without defining efficiency, it's not a meaningful question.
If you want to know whether they almost never suceed in making
the data smaller, then the answer's yes.

> To clear confusions up, I may have done...:
> I meant, create files for each bitcombination possible.
> 1st File : 00000000......0001
> 2nd File : 00000000......0010
> ...
> ...
> last file : 11111111....1111
> How many of them would mean compression and how many of them not ?


Generate a million random files. How many did your favourite compress?
1/2? 1/256? Fewer? None?

> If Zip isn't the best one to choose (I'm not that familar with
> compression at all), you can use another one for this "experiment" :)


Why don't you do the experiment yourself. Most of us know exactly
what the results will be, and don't need to experiment.

Phil
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow
guenther vonKnakspott

2005-02-09, 3:55 pm

Michael wrote:
> ehrm..I'm somehow ....
>
> does it mean that
>
> a) 1/256 of all possible bit combinations can't be compressed ?
> b) 1/256 of all possible bit combinations CAN be compressed ?


b)
Consider: if there are K=2^n files of n bits then there are 2^(n-1)
that is K/2 files of length n-1 (a bit shorter) 2^(n-2)=K/4 files of
length n-2, etc. and therefore K/256=2^(n-8) files one byte shorter.
That means that there can be at most K/256 "compressed" files.

> If I understood right, both answers were given :)
>
> If b) is the case, doesn't it mean, all lossless compression
> algorithms are damn unefficient ?


No that means that any algorithm which can actually compress 1/256 of
all possible 8kb files is damn efficient, being as good as it can get.

> If Zip isn't the best one to choose (I'm not that familar with
> compression at all), you can use another one for this "experiment" :)


The four response given to you are all correct regardless of the
algorithm, and reflect the best possible case.

regards.

Teo van der Vlies

2005-02-09, 8:55 pm

That depends of the propability of '1' and '0' bits.
If the data is random and the amount of '1' and '0' bits is almost equal;
compression will be weak.
If the same pattern is repeated many times; compression can be very strong.

Teo.



"Michael" <westpdmkids@yahoo.de> wrote in message
news:4c6bc8b8.0502080331.70ad6142@posting.google.com...
> Well,
> I would like to ask you, whether someone of you knows the answer to
> the following question:
>
> Assuming you have a bit sequence of, let's say 65536 Bits (8 kb)
> This would mean, the amount of possibly occuring data is 2^65536
> bitcombinations.
>
> The question I would like to know is: Does anybody of you know, how
> much of this data (in percentage) would be compressible with a common
> compression algorithm like zip or so (maybe including header) ?
>
> There must be also some data that means no compression but also no
> increasing size of data, how much (in percent) data would it be ?
>
> It would be great if anyone of you could answer this question....



guenther vonKnakspott

2005-02-09, 8:55 pm


Teo van der Vlies wrote:
> That depends of the propability of '1' and '0' bits.
> If the data is random and the amount of '1' and '0' bits is almost

equal;
> compression will be weak.
> If the same pattern is repeated many times; compression can be very

strong.
>
> Teo.


Are you blind or just stoned? Perhaps victim of a gouda poisoning?
After one clear question and 6 correct explanatory responses, you feel
compelled to post five lines of nothing?

Jeeezes!

Michael

2005-02-10, 3:55 pm

"guenther vonKnakspott" <guenther.vonKnakspott@gmx.de> wrote in message news:<1107984774.532952.256650@z14g2000cwz.googlegroups.com>...
> Teo van der Vlies wrote:
> equal;
> strong.
>
> Are you blind or just stoned? Perhaps victim of a gouda poisoning?
> After one clear question and 6 correct explanatory responses, you feel
> compelled to post five lines of nothing?
>
> Jeeezes!



:)

Anyway, thanks for you answers. I am now enlightened ;)
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com