For Programmers: Free Programming Magazines  


Home > Archive > Compression > December 2005 > Re: Repeated compression of previously compressed data is not impossible,









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: Repeated compression of previously compressed data is not impossible,
Ben Rudiak-Gould

2005-12-11, 6:55 pm

jules.stocks@gmail.com wrote:
> Also I am considering getting together with a few friends to
> videotape a compression excercise photgraphed from several viewpoints,
> first showing the machines and reading the model numbers, doing a
> compression, then moving a floppy to a second machine, and reproducing
> a large file, etc. Obviously we'd have only one machine on at a time.
> The machines would not have ethernet cards, etc...


I'm afraid I can't see the point of this. Surely you can see that it's
trivially easy to fake such a test. Anyone gullible enough to believe you
because of the video would be gullible enough to believe you without it.

To convince me of a claim, you have to link the claim to me via a trust
network. For example, if I trust James Randi to detect deception, and James
Randi certifies that there's no deception in your test, then I will believe
that there's no deception in your test. There's no reason for me to believe
or disbelieve anything based on the claims of a stranger on Usenet. Of
course, mathematically precise statements are different, because I can
verify them independently.

> I've been a programmer for 40 years now and
> it's hard to accept the notion that a large file, say in excess of 1GB
> can be completely represented in as little as 25 bytes. (Notice that
> this does not mean that their are only 2^(25*8) possible files in the
> whole wide world because repeatable compression systems operate by
> *repeating* compression/ecompression cycles, and the 'final' process is
> final based on operator determination.) So N, the number of cycles
> used to compress/decompress becomes a factor in calculating the
> possible number of resulting target files. (See below where I comment
> on 'N'.)


.... and below you say ...

> My point is that N is not small at all.


It's easy to make a compressor satisfying this property. For example,
suppose I identify files with natural numbers by the following bijection:

"" <-> 0
"00" <-> 1
"01" <-> 2
... ...
"FF" <-> 256
"0000" <-> 257
... ...
"FFFF" <-> 65792
... ...

My compression algorithm consists of "subtracting 1" from the file if it's
not the empty file, and failing to compress if it's the empty file. By
iterating this algorithm I can reversibly compress any file down to a length
of 0, though the number of compression steps required will not be small at all.

For your claim to be interesting you have to bound the size of N in some
way. For example, if you can prove that your N is bounded by any polynomial
in the length of the file, then you've got something. If you believe that
you can prove this, please say so.

What you don't seem to understand is that when people talk about compressed
size on this group, they're talking about all of the data you need to
reconstitute the original file. That includes the number N.

> My latest RAD-cpx uses internal 'micro-cycles' and can also be used in
> conjunction with several physical passes; Usually a small block (128b)
> is associated with about five to ten micro-cycles, while a large block
> (>128kb) would need more than 100 micro-cycles. Each micro-cycle
> produces one extra nibble. But the gain from this process, after
> application, is that the client buffer is no longer random but is
> (typically, for a small buffer,) about 90% uniform, and a large buffer
> is usually more than 98% uniform. (Yes, I understand that these
> statements fly in the face of CS and mathematics, but this is exactly
> the truth!)


They don't fly in the face of anything, because they aren't precise enough.
Replace the words "usually" and "typically" with hard bounds, please.

There's nothing surprising or revolutionary about a program that can
"usually" or "typically" transform a buffer to make it more compressible.
The Burrows-Wheeler transform does this.

> The buffer size is unchanged (actually, because of the required
> nibbles, it's bigger!,) but any conventional compressor, on seeing a
> file containing mostly one's or zero's, quickly reduces it's size.
>
> When I consider the above I understand how difficult it is for a
> stranger to accept my remarks at face value -- and I don't think I'd
> believe someone saying things like this either.


No, it's very easy to believe, just not interesting. So far all of the
precise, verifiable claims you've made about your compressor have been
either trivial or provably wrong.

-- Ben
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com