Code Comments
Programming Forum and web based access to our favorite programming groups.Hy all, I want to open a discussion about this strange aspect of compression. We know in a binary string of length N we have 2^N different possible configurations. Mathematically it is highly improbable to compress a string of N bits in a string (for example) of N/2 bits becouse we can make a correspondence only for 2^(N/2) strings of a 2^N strings of length N. So the compression "difficulty" increase exponentially with the increasing of the compression ratio and with the increasing of the string length . This mathematical law is absolutely in contraddiction with everyday experiences. When I try to compress a file I am about sure to be able to compress it with a ratio about 50% and greater is the file more I believe I am able to obtain a great compression ratio ! We obtain this incredible result using so simple compression algorithm ( Is possible to write down a program of 10 lines and obtain gigabytes of incompressible data for these compressor ) also If we construct the most powerfull compression ( uncomputable ) this result is mathematically impossible ! How is this possible? How can we explain this paradox? I suggest 2 different possible answer . -1) The today compressor are not very powerfull as compressor but they are very specialized for our data. -2) For a string of N bits there are not 2^N possible strings available . In the first point I say that the compressor are able to make a correspondence from short string to long string becouse they are able to indentify a subset into the 2^N possible strings of used string . This mean that we are not able to use a lot of string , we are not able to use , to have , to see 2^N strings. In the second point I say that for a system receiving informations, sending informations there is a limit and this limit is not approximable using exponential function . I don't know if these explanation are correct ( but I think the working compressor can be a good proof ) but in both case we can place a limit on the available strings . For example I place a limit of M on the size of the available strings . For a string of size N the distribution of available strings become for example not 2^N but Min(2^N,M/N) Watching the behaviour of this formula and what happen in the real there are something similar. The compressor are not able to compress "small" string and in the small string we have an exponential behaviour so in this side we have a real mathematical behaviour so here the compressor are not able to compress becouse it is about impossible to compress bit strings. After a limit we have always a more different behaviour , increasing the string size the available strings decrease so the compressor can compress that string better and better and this is what happen when I have a big file I am about sure to compress it. When we have a string of length N we have again 2^N different possible configurations becouse we don't know the bits string configuration but when we get a string S we know this is one string on Min(2^N,M/N) available and not 2^N available. Roughly speaking thinking that all 2^N strings are available we are making the hypothesis we are working with very very incredible powerfull systems, I don't think so. There are also other interesting consequence of this point of view but I am out of topic ... Denis.
Post Follow-up to this messageOn Mar 31, 1:28 am, a.den...@yahoo.com wrote: > > I suggest 2 different possible answer . > > -1) The today compressor are not very powerfull as compressor but they > are very specialized for our data. > .. powerfull or not, for me it's hard to say, but with some specialization for some data types it's more effectivive - I agree with that. I'm sure there are still more things to do in data compression field and I hope somebody will find more effective way to compress data. > -2) For a string of N bits there are not 2^N possible strings > available . It's a basic of all binary data - definately there is 2^N possibilities for string of length N. Regards Apo
Post Follow-up to this message> -1) The today compressor are not very powerfull as compressor but they > are very specialized for our data. As you say, the trich is that they are specialized on compressing widely used data structures (text, waveforms, image frames etc). Generally, they are more powerful as they are more specialized for one (or more) data structures, or as they are more smart, and this cost in terms of memory and computing power. Some compressors approaches a compression ratio quite close to the expected (uncompressible) enthropy of the file, so are definitely powerful. > -2) For a string of N bits there are not 2^N possible strings > available . There are 2^N possible strings by definition, and powerful or not powerful compressor will perform poorly on most of them since no definite structure can be found, but perform well on non random ones, that are the ones meaningful for us. And the more the file grows big, the more the compression algorithm can learn about it and the more it can refine its strategy on the specifical data structure, so in practice you will tend to see better results as the file grows bigger... if the file has a sort of structure the compression algorithm can understand, otherwise the result will be equally poor.
Post Follow-up to this messageHello, > -2) For a string of N bits there are not 2^N possible strings > available . Nope, there surely are 2^N possible combinations > -1) The today compressor are not very powerfull as compressor but they > are very specialized for our data. Yes, this is the point ! The definition of 'powerful' for a compressor is quite strange, but you got the reason :) The fact is this: when we define a compressor, we are also defining what kind of input data we are willing to compress. When in zip-based algorithms we encode an already-seen strings with a pointer, we are stating: we are assuming our input to have some strings that repeat themselves more likely than others, and we will reduce the size of them by removing this redundancy. On the other side, we are also stating 'if our data isn't made this way, our algorithm won't compress'. But we probably don't care: in every message or data that is usually meaningful to us, there is some kind of redundancy like this that can be exploited. An exception could be of course already- compressed files, but this is because the redundancies have been already removed. At this point, compression works because we are assuming that our feasible inputs are only a small subset of the 2^N possible inputs, and these can be mapped to files much shorter than N bits long. A compressor will be implemented to 'chose' the inputs to be compressed that most likely reflect the data of our interest. Another example: image compression often uses assumptions of smoothness, as images that represent something to us usually contain some areas where the intensity of color (or of brightness) doesn't change suddenly. Since we use this assumptions, we are saying 'we will be able to compress images that have smooth areas; in other cases we will fail'. Alternatively, we could also design a compressor that compresses (of a few bits) images where close pixels are completely different one from another, but I thing this could hardly be useful to anybody. Hope I've been clear :) So long, Stefano
Post Follow-up to this messageIt is not 2^N You can exceed 2^N via this simple proceedure, albiet writ much larger than my method :P Take 3 bits for example, normally 2^3 = 8 But you can also substitute in 2^2 and 2^1 to have values as well, thus getting (2^4)-2 The practical problem is in tracking it. Writ large.... it's not so 'difficult' to do, but admittedly the odds of 'getting it to work' increase dramatically against you the more you wish to compress in this direction :P
Post Follow-up to this message> > There are 2^N possible strings by definition, and powerful or not > powerful compressor will perform poorly on most of them since no > definite structure can be found, but perform well on non random ones, > that are the ones meaningful for us. > And the more the file grows big, the more the compression algorithm > can learn about it and the more it can refine its strategy on the > specifical data structure, so in practice you will tend to see better > results as the file grows bigger... if the file has a sort of > structure the compression algorithm can understand, otherwise the > result will be equally poor. Yes, I agree with you but what I want to point out is that this is mathematically wrong ... or better what this mean is that the bit strings have not an arbitrary bit arrangement becouse in this case it is about impossible to compress them. So why not conclude saying the number of N bit strings length are M/N and not 2^N . It seem to me we make a bigger approximation to use 2^N as a number of available string instead of M/N or other functions but not exponential . Denis.
Post Follow-up to this messageOn 1 Apr, 03:40, Einstein <michae...@gmail.com> wrote: > It is not 2^N > > You can exceed 2^N via this simple proceedure, albiet writ much larger > than my method :P > > Take 3 bits for example, normally 2^3 = 8 > > But you can also substitute in 2^2 and 2^1 to have values as well, > thus getting (2^4)-2 > > The practical problem is in tracking it. Writ large.... it's not so > 'difficult' to do, but admittedly the odds of 'getting it to work' > increase dramatically against you the more you wish to compress in > this direction :P ? In a string of N bits you can have 2^N possible configurations
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.