Code Comments
Programming Forum and web based access to our favorite programming groups.It seems to me that the next wave of compression algorithms must address the concept of "Random Data". By Random Data, I take as meaning a compressed file down to its finite limit. As an example a large txt file coded with say arithmetic coding. Finite, finito, no more free space, etc. On the issue of further successfully reducing a "randomised" file it appears that the "Counting argument", or the 16 pidgeons into 15 holes will not permit further compression. However, it is possible that this counting argument is being wrongly applied to compressed data. If we take truly random data then mathematically there is an argument that would seem to support the notion that this data is not compressible. Also, if we compare say the output of the above arithmetic coder it may look very similar and have character counts equivalent to that of a randomly generated file. There is of course, one very large and significant difference that I have not seen noted in any of the literature. This is that these two random files have completely different motivations. One is unpredictable and the other is completely predictable. Now I know that there will be at least a few who will argue that there is indeed no difference between the two files mentioned above. And will waste as many hours trying to convince the rest of of their conviction as those people who have spent many fruitless hours in the pursuit of compressing "Random Data". Thus being the proof for an argumentative outcome that it is impossible. But is it? It is obviously a desirable commodity. The BBC in England would surely like to get their little hands on such an algorithm for their newly proposed free to air broadcasting rather than use the networking of shared files. Think too, of all the avenues available to technology with such an advance in data compression. Pornography phones! No more phoning those 1900 numbers to talk dirty to some grandma kniting clothes for her grandchildren at some call centre in Bombay! Get what you pay for. Also, Global TV via th internet. The list goes on. I suppose one could call this the search for the Holy Grail. If it were found how much would it be worth? And, if someone did find it would he be able to keep it? For whosoever did find the clue to this question would have a dramatic impact on technology as we know it today. Perhaps, if some sort of value were put on this quest then a solution might suddenly surface! I shall write more on the fallacy of the counting argument in a few days and why I think that these proponents have stiffled creative thought in the field of data compression.
Post Follow-up to this messageOn Sat, 14 Aug 2004 11:49:14 GMT, "The Phantom" <The Phantom@ Ghost.net> wrote a whole lot of hooey that ended with: >I shall write more on the fallacy of the counting argument in a few days an d >why I think that these proponents have stiffled creative thought in the >field of data compression. Ooooh. I'm waiting with masturbated breath for you to BREAK OPEN the conspiracy of mathematics. -- Sev
Post Follow-up to this message"The Phantom" <The Phantom@ Ghost.net> wrote in news:411dfc39@news.comindico.com.au: > > It seems to me that the next wave of compression algorithms must > address the concept of "Random Data". By Random Data, I take as > meaning a compressed file down to its finite limit. As an example a > large txt file coded with say arithmetic coding. Finite, finito, no > more free space, etc. > > On the issue of further successfully reducing a "randomised" file it > appears that the "Counting argument", or the 16 pidgeons into 15 holes > will not permit further compression. However, it is possible that this > counting argument is being wrongly applied to compressed data. If we > take truly > ... Sorry the Counting argument is not wrong. I often have wondered why so many people delude themsevles into thinking such a thing is possible. Sure its possible in the sense the tooth fairy is real or that I could be the most powerful god in the universe. I think the reason people fail to grasp the obvious is this. No file is random in the sense I could define a compressor and assign a number to it and that write the file out. The problem is this. You have to tell or the program has to know how to assign the number. Take any optimal file compressor. You howevr can take output and decide to change it. Yes certain files will get samller but in the big picture you just reordering it. So I think people my see a compressed file and then so a pattern so they qucicly think its not optimal. One always has to keep in mind it reduced or compressed for a model and the model determines the reordering. Even k.. complexity depends on a model since one could befine a machine such that one bit writes out any file that some one else says has high complexity. David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"
Post Follow-up to this message"David A. Scott" <daVvid_a_scott@email.com> wrote in message news:Xns954553EEA4D3H110W296LC45WIN3030R @130.133.1.4... > "The Phantom" <The Phantom@ Ghost.net> wrote in > news:411dfc39@news.comindico.com.au: > > .... > > Sorry the Counting argument is not wrong. ... Of course you are right. But you can't win an argument with a troll. -- Matt Mahoney
Post Follow-up to this message"Matt Mahoney" <matmahoney@yahoo.com> wrote in news:89xTc.20357$nx2.8666@newsread2.news.atl.earthlink.net: > "David A. Scott" <daVvid_a_scott@email.com> wrote in message > news:Xns954553EEA4D3H110W296LC45WIN3030R @130.133.1.4... > ... > > Of course you are right. But you can't win an argument with a troll. > > -- Matt Mahoney > Matt your not a troll. But I don't think I can win an argument with you either. But occasionly I have win people over to bijective lossless compression however not often since it requites a little more work but in the end you do save a few bytes. David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"
Post Follow-up to this message"David A. Scott" <daVvid_a_scott@email.com> wrote in message news:Xns9545B8C25926CH110W296LC45WIN3030 R@130.133.1.4... > "Matt Mahoney" <matmahoney@yahoo.com> wrote in > news:89xTc.20357$nx2.8666@newsread2.news.atl.earthlink.net: > holes this > > > Matt your not a troll. But I don't think I can win an argument with > you either. But occasionly I have win people over to bijective lossless > compression however not often since it requites a little more work but > in the end you do save a few bytes. Yeah. Wha't s a few bytes when you can put the whole Internet on a floppy disk? :-) -- Matt Mahoney
Post Follow-up to this messageThe wrote: ) I shall write more on the fallacy of the counting argument in a few days a nd ) why I think that these proponents have stiffled creative thought in the ) field of data compression. There is no fallacy in the counting theorem. I shall adress your point quite simply and clearly, even though you have not actually given any arguments to support it, but have only asserted it. ) Now I know that there will be at least a few who will argue that there is ) indeed no difference between the two files mentioned above. And will waste ) as many hours trying to convince the rest of of their conviction as those ) people who have spent many fruitless hours in the pursuit of compressing ) "Random Data". Thus being the proof for an argumentative outcome that it i s ) impossible. Proof that a file compressed to the limit is the same as a randomly generated file: A compressor removes redundant information from a file. A 'perfect' compressor thus removes all redundancy from that file. A file compressed to the limit therefore has no redundancy. If you know the first N bits of a file, and you can conclude from those bits that the next bit is more likely to be 1 than 0 (or vice versa) then that next bit is slightly redundant. Therefore in a file without redundancy, each bit is equally likely to be 1 or 0. In a randomly generated file, each bit is equally likely to be 1 or 0. Therefore there is no difference between a randomly generated file and a file compressed to the limit. QED SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
Post Follow-up to this messageWillem <willem@stack.nl> wrote in message news:<slrnchvp3s.12qt.willem@toad.stack.nl>...[co lor=darkred] > The wrote: > ) I shall write more on the fallacy of the counting argument in a few days and > ) why I think that these proponents have stiffled creative thought in the > ) field of data compression. > > There is no fallacy in the counting theorem. > I shall adress your point quite simply and clearly, even though you have > not actually given any arguments to support it, but have only asserted it. > > ) Now I know that there will be at least a few who will argue that there i s > ) indeed no difference between the two files mentioned above. And will was te > ) as many hours trying to convince the rest of of their conviction as thos e > ) people who have spent many fruitless hours in the pursuit of compressing > ) "Random Data". Thus being the proof for an argumentative outcome that it is > ) impossible. > > Proof that a file compressed to the limit is the same as a randomly > generated file: > > A compressor removes redundant information from a file. > A 'perfect' compressor thus removes all redundancy from that file. > A file compressed to the limit therefore has no redundancy. > If you know the first N bits of a file, and you can conclude from those > bits that the next bit is more likely to be 1 than 0 (or vice versa) then > that next bit is slightly redundant. Therefore in a file without > redundancy, each bit is equally likely to be 1 or 0. > In a randomly generated file, each bit is equally likely to be 1 or 0. > Therefore there is no difference between a randomly generated file and a > file compressed to the limit. QED > > > SaSW, Willem[/color] If each bit is equally likely to be 1 or 0.. would it mean there will be almost 50% of one and 50% or zero ? If it so.. then we can do compression again 100c50.. is smaller then 2^100; (not counting length overhead yet) lol. Regards, Ray
Post Follow-up to this messageRay wrote: ) If each bit is equally likely to be 1 or 0.. would it mean there will ) be almost 50% of one and 50% or zero ? No, this is a logical fallacy. First, it is very *likely* to be *almost* 50/50, not exactly. Furthermore, it is not *impossible* that all bits will be zero. Question: If I flip a fair coin 10 times, and the first 9 were 3 heads and 6 tails, is the 10th flip more likely to be heads or tails ? SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
Post Follow-up to this messageWillem <willem@stack.nl> wrote in news:slrnci15j6.gci.willem@toad.stack.nl: > > Question: If I flip a fair coin 10 times, and the first 9 were 3 heads > and 6 tails, is the 10th flip more likely to be heads or tails ? > > > My gues is its more likely to be tails. Why because there is a chnace that the coin is not fair you only falsely assumed it to be fair. Second even if the coin fair you may not flip it fairly. But then again you knowing I would pick tails you may force a head just to have me guess wrong. Know if you writting say an entropy compressor based on a binary spilt I think the have to go someware between 50/50 and 60/30 David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.