Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Compression Paradox
Hy all,
I want to open a discussion about this strange aspect of compression.
We know in a binary string of length N we have 2^N different possible
configurations.
Mathematically it is highly improbable to compress a string of N bits
in a string (for example) of N/2 bits becouse we can make a
correspondence only for 2^(N/2) strings of a 2^N strings of length N.
So the compression "difficulty" increase exponentially with the
increasing of the compression ratio and with the increasing of the
string length .
This mathematical law is absolutely in contraddiction with everyday
experiences.
When I try to compress a file I am about sure to be able to compress
it with a ratio about 50% and greater is the file more I believe I am
able to obtain a great compression ratio !
We obtain this incredible result using so simple compression algorithm
( Is possible to write down a program of 10 lines and obtain gigabytes
of incompressible data for these compressor ) also
If we construct the most powerfull compression ( uncomputable ) this
result is mathematically impossible !

How is this possible?

How can we explain this paradox?

I suggest 2 different possible answer .

-1) The today compressor are not very powerfull as compressor but they
are very specialized for our data.

-2) For a string of N bits there are not 2^N possible strings
available .


In the first point I say that the compressor are able to make a
correspondence from short string to long string becouse they are able
to indentify a subset into the  2^N possible strings of used string .
This mean that we are not able to use a lot of string , we are not
able to use , to have , to see 2^N strings.
In the second point I say that for a system receiving informations,
sending informations there is a limit and this limit is not
approximable using exponential function .
I don't know if these explanation are correct ( but I think the
working compressor can be a good proof ) but in both case we can place
a limit on the available strings .
For example I place a limit of M on the size of the available
strings .
For a string of size N the distribution of available strings become
for example not 2^N but
Min(2^N,M/N)
Watching the behaviour of this formula and what happen in the real
there are something similar.
The compressor are not able to compress "small" string and in the
small string we have an exponential behaviour so in this side we have
a real mathematical behaviour so here the compressor are not able to
compress becouse it is about impossible to compress bit strings.
After a limit we have always a more different behaviour , increasing
the string size the available strings decrease so the compressor can
compress that string better and better and this is what happen when I
have a big file I am about sure to compress it.
When we have a string of length N we have again 2^N different possible
configurations becouse we don't know the bits string configuration but
when we get a string S we know this is one string on Min(2^N,M/N)
available and not 2^N available.
Roughly speaking thinking that all 2^N strings are available we are
making the hypothesis we are working with very very incredible
powerfull systems, I don't think so.
There are also other interesting consequence of this point of view but
I am out of topic ...

Denis.

Report this thread to moderator Post Follow-up to this message
Old Post
a.denis1@yahoo.com
03-31-08 12:03 AM


Re: Compression Paradox
On Mar 31, 1:28 am, a.den...@yahoo.com wrote:
>
> I suggest 2 different possible answer .
>
> -1) The today compressor are not very powerfull as compressor but they
> are very specialized for our data.
>
.. powerfull or not, for me it's hard to say, but with some
specialization for
some data types it's more effectivive - I agree with that.
I'm sure there are still more things to do in data compression field
and
I hope somebody will find more effective way to compress data.

> -2) For a string of N bits there are not 2^N possible strings
> available .

It's a basic of all binary data - definately there is 2^N
possibilities
for string of length N.


Regards

Apo

Report this thread to moderator Post Follow-up to this message
Old Post
Apo
03-31-08 12:57 PM


Re: Compression Paradox
> -1) The today compressor are not very powerfull as compressor but they
> are very specialized for our data.
As you say, the trich is that they are specialized on compressing
widely used data structures (text, waveforms, image frames etc).
Generally, they are more powerful as they are more specialized for one
(or more) data structures, or as they are more smart, and this cost in
terms of memory and computing power.
Some compressors approaches a compression ratio quite close to the
expected (uncompressible) enthropy of the file, so are definitely
powerful.

> -2) For a string of N bits there are not 2^N possible strings
> available .
There are 2^N possible strings by definition, and powerful or not
powerful compressor will perform poorly on most of them since no
definite structure can be found, but perform well on non random ones,
that are the ones meaningful for us.
And the more the file grows big, the more the compression algorithm
can learn about it and the more it can refine its strategy on the
specifical data structure, so in practice you will tend to see better
results as the file grows bigger... if the file has a sort of
structure the compression algorithm can understand, otherwise the
result will be equally poor.

Report this thread to moderator Post Follow-up to this message
Old Post
giorgio.tani
04-01-08 12:02 AM


Re: Compression Paradox
Hello,

> -2) For a string of N bits there are not 2^N possible strings
> available .

Nope, there surely are 2^N possible combinations

> -1) The today compressor are not very powerfull as compressor but they
> are very specialized for our data.

Yes, this is the point ! The definition of 'powerful' for a compressor
is quite strange, but you got the reason :)

The fact is this: when we define a compressor, we are also defining
what kind of input data we are willing to compress. When in zip-based
algorithms we encode an already-seen strings with a pointer, we are
stating: we are assuming our input to have some strings that repeat
themselves more likely than others, and we will reduce the size of
them by removing this redundancy. On the other side, we are also
stating 'if our data isn't made this way, our algorithm won't
compress'. But we probably don't care: in every message or data that
is usually meaningful to us, there is some kind of redundancy like
this that can be exploited. An exception could be of course already-
compressed files, but this is because the redundancies have been
already removed.

At this point, compression works because we are assuming that our
feasible inputs are only a small subset of the 2^N possible inputs,
and these can be mapped to files much shorter than N bits long. A
compressor will be implemented to 'chose' the inputs to be compressed
that most likely reflect the data of our interest. Another example:
image compression often uses assumptions of smoothness, as images that
represent something to us usually contain some areas where the
intensity of color (or of brightness) doesn't change suddenly. Since
we use this assumptions, we are saying 'we will be able to compress
images that have smooth areas; in other cases we will fail'.
Alternatively, we could also design a compressor that compresses (of a
few bits) images where close pixels are completely different one from
another, but I thing this could hardly be useful to anybody.

Hope I've been clear :)

So long,
Stefano

Report this thread to moderator Post Follow-up to this message
Old Post
Stefano Brocchi
04-01-08 12:03 AM


Re: Compression Paradox
It is not 2^N

You can exceed 2^N via this simple proceedure, albiet writ much larger
than my method :P

Take 3 bits for example, normally 2^3 = 8

But you can also substitute in 2^2 and 2^1 to have values as well,
thus getting (2^4)-2


The practical problem is in tracking it. Writ large.... it's not so
'difficult' to do, but admittedly the odds of 'getting it to work'
increase dramatically against you the more you wish to compress in
this direction :P

Report this thread to moderator Post Follow-up to this message
Old Post
Einstein
04-01-08 08:56 AM


Re: Compression Paradox
>
> There are 2^N possible strings by definition, and powerful or not
> powerful compressor will perform poorly on most of them since no
> definite structure can be found, but perform well on non random ones,
> that are the ones meaningful for us.
> And the more the file grows big, the more the compression algorithm
> can learn about it and the more it can refine its strategy on the
> specifical data structure, so in practice you will tend to see better
> results as the file grows bigger... if the file has a sort of
> structure the compression algorithm can understand, otherwise the
> result will be equally poor.

Yes, I agree with you but what I want to point out is that this is
mathematically wrong ... or better what this mean is that the bit
strings  have not an arbitrary bit arrangement becouse in this case it
is about impossible to compress them.
So why not conclude saying the number of N bit strings length are M/N
and not 2^N .
It seem to me we make a bigger approximation to use 2^N as a number of
available string instead of M/N or other  functions but not
exponential .

Denis.

Report this thread to moderator Post Follow-up to this message
Old Post
a.denis1@yahoo.com
04-01-08 11:58 PM


Re: Compression Paradox
On 1 Apr, 03:40, Einstein <michae...@gmail.com> wrote:
> It is not 2^N
>
> You can exceed 2^N via this simple proceedure, albiet writ much larger
> than my method :P
>
> Take 3 bits for example, normally 2^3 = 8
>
> But you can also substitute in 2^2 and 2^1 to have values as well,
> thus getting (2^4)-2
>
> The practical problem is in tracking it. Writ large.... it's not so
> 'difficult' to do, but admittedly the odds of 'getting it to work'
> increase dramatically against you the more you wish to compress in
> this direction :P

?
In a string of N bits you can have 2^N possible configurations

Report this thread to moderator Post Follow-up to this message
Old Post
a.denis1@yahoo.com
04-01-08 11:58 PM


Sponsored Links




Last Thread Next Thread Next
Search this forum -> 
Post New Thread

Compression archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 04:32 AM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.