Compressing compressed files?
rep_movsd

2004-05-12, 9:28 pm

I have a really crazy idea:

Suppose you took a compressed file (any format) and passed it to a
bijective decompressor. In principle the data should become less
random. After that you could recompress the data with a superior
algorithm.

My reasoning is as follows:
The set of input data for most general purpose compression algorithms
tends to overlap. Within a certain margin, the set of compressed
files overlaps too.

I tried this out on a couple of files with David Scott's BICOM but I
didn't get any positive result.

My idea was that since PPMII does 40-50% better than ZIP, and there
are a large number of compressed formats using LZ77, one could
recompress these files without needing to decompress them first or
knowing the exact file format.

Any holes in my theory?

Matt Mahoney

2004-05-12, 9:28 pm


"rep_movsd" <rep_movsd@yahoo.co.in> wrote in message
news:85e8883c.0405041116.13e556bd@posting.google.com...
> I have a really crazy idea:
>
> Suppose you took a compressed file (any format) and passed it to a
> bijective decompressor. In principle the data should become less
> random. After that you could recompress the data with a superior
> algorithm.


You would be better off decompressing with the same algorithm used to
compress. Otherwise it will just look like random data and you won't get
anywhere no matter what you do.
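To make "it will just look like random data" concrete, the sketch below measures the empirical order-0 entropy (bits per byte, taken from a byte histogram) of a text before and after compression. Assumptions: Python's standard `zlib` stands in for the first compressor, the input is a made-up string of decimal squares, and `order0_entropy` is a hypothetical helper, not part of any library. A histogram only sees order-0 statistics, so a result near 8 bits/byte is a hint that little redundancy is left, not a proof.

```python
import math
import zlib
from collections import Counter


def order0_entropy(data: bytes) -> float:
    """Empirical order-0 entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


# Hypothetical stand-in for "a file": decimal squares separated by spaces.
text = " ".join(str(i * i) for i in range(5000)).encode("ascii")
packed = zlib.compress(text, 9)  # zlib plays the role of "the first compressor"

print(f"original: {len(text):6d} bytes, {order0_entropy(text):.2f} bits/byte")
print(f"zlib'd:   {len(packed):6d} bytes, {order0_entropy(packed):.2f} bits/byte")
```

The original text uses only 11 distinct symbols (digits and space), so it cannot exceed about 3.46 bits/byte; the compressed bytes land close to the 8-bit ceiling.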

-- Matt Mahoney


David A. Scott

2004-05-12, 9:28 pm

rep_movsd@yahoo.co.in (rep_movsd) wrote in
news:85e8883c.0405041116.13e556bd@posting.google.com:

> I have a really crazy idea:
>
> Suppose you took a compressed file (any format) and passed it to a
> bijective decompressor. In principle the data should become less
> random. After that you could recompress the data with a superior
> algorithm.
>


Usually this does not lead to better compression, for several reasons.
First, the compressor used at the start usually adds extra data, so it is
better to start from the clean original and never introduce that extra
data. For encryption, however, I often use several bijective transforms
before the final encryption pass.
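One easily measured kind of "extra data added by the compressor" is container framing. As a sketch, assuming Python's `zlib` module: the same DEFLATE stream can be emitted raw, with a zlib wrapper, or with a gzip wrapper just by changing `wbits`, so the size difference is exactly the wrapper bytes a later recompressor would have to carry. The helper name `deflate` is made up for this illustration, and framing is only one part of what the original poster means by extra data.

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress with identical DEFLATE settings; only the wrapper differs.

    wbits=-15: raw DEFLATE stream, no header or trailer
    wbits=15:  zlib wrapper (2-byte header + 4-byte Adler-32 trailer)
    wbits=31:  gzip wrapper (10-byte header + 8-byte CRC-32/length trailer)
    """
    c = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()


data = b"the quick brown fox jumps over the lazy dog\n" * 500
raw = deflate(data, -15)
zl = deflate(data, 15)
gz = deflate(data, 31)

# Expected framing overhead relative to the raw stream: 6 and 18 bytes.
print(len(raw), len(zl) - len(raw), len(gz) - len(raw))
```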

> My reasoning is as follows:
> The set of input data for most general purpose compression algorithms
> tends to overlap. Within a certain margin, the set of compressed
> files overlaps too.
>
> I tried this out on a couple of files with David Scott's BICOM but I
> didn't get any positive result.
>


I like BICOM, but it's Matt Timmermans' bijective compressor, with full
CBC AES encryption that is fully bijective. He is a Canadian, and they have
more freedom in writing encryption code. Americans aren't trusted by
the government to write encryption code, and soon we will not be allowed
to have guns any more, at which time we might as well surrender to the
Moslems and their cult-like view of god.

> My idea was that since PPMII does 40-50% better than ZIP, and there
> are a large number of compressed formats using LZ77, one could
> recompress these files without needing to decompress them first or
> knowing the exact file format.
>


If compression is your goal, it is best to decompress first with the
original compressor and then compress with the compressor of your choice.
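A rough way to check this advice, sketched in Python under stated assumptions: `zlib` plays the original compressor and `lzma` the "superior" one, and the input is a hypothetical ~40 KB text block repeated three times, so the repeats lie beyond DEFLATE's 32 KB window but well inside the dictionary of `lzma`'s default preset. The sketch compares feeding `lzma` the zlib output directly against undoing zlib first.

```python
import lzma
import zlib

# Hypothetical input: a ~40 KB block of text repeated three times. The
# repeats sit farther apart than DEFLATE's 32 KB window, but well inside
# the dictionary that lzma's default preset uses.
block = " ".join(str(i * i) for i in range(5000)).encode("ascii")
original = (block + b"\n") * 3

first = zlib.compress(original, 9)              # the "original compressor"
stacked = lzma.compress(first)                  # stronger compressor fed compressed bytes
redone = lzma.compress(zlib.decompress(first))  # undo zlib first, then recompress

print(f"original:                {len(original)}")
print(f"zlib:                    {len(first)}")
print(f"lzma on zlib output:     {len(stacked)}")
print(f"zlib undone, then lzma:  {len(redone)}")
```

The stronger compressor barely touches the zlib output, while the same compressor given the decompressed data can exploit structure that zlib's output no longer exposes.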

> Any holes in my theory?
>


Here is an example. It's a bad example, but simple. Say you have a file
100k bytes long with a certain entropy based on some trusted model. When
you compress it with a typical compressor, you shorten the file, but due to
the headers and such found in a typical compressor, you add some entropy.
When you decompress with a bijective decompressor, there is no entropy change.
The file may be longer or shorter than 100k, take your pick, but it has
more entropy than the starting file.
You now use your superior compression method. No matter how small you
compress it, it will still have to carry the original entropy plus that
added by the first compressor. If you started with the original file, you
would have less entropy to work with, and it should compress smaller.

That being said, you could design a poor compressor that works like crap
on file X, but if you compress X with Y and then decompress with a chosen
bijective decompressor to get Z, it's possible for this crappy compressor to
work better on this combination, but only because it's designed that way.
Sooner or later, when you get to good compressors, the entropy argument is
going to come into play.
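The "entropy argument" ultimately rests on a counting fact: there are 2^n bit strings of length n but only 2^n - 1 strings that are strictly shorter (counting the empty string), so no lossless scheme can shrink every n-bit input. A bijective decompressor followed by a compressor is itself just another lossless map, so the same count limits the whole pipeline: any file it shrinks is paid for by others it does not. The brute-force sketch below (Python, hypothetical helper names) simply checks the count for small n.

```python
from itertools import product


def bit_strings(length: int) -> list[str]:
    """All bit strings of the given length (length 0 gives just the empty string)."""
    return ["".join(bits) for bits in product("01", repeat=length)]


def count_shorter(n: int) -> int:
    """Number of bit strings strictly shorter than n bits, counted by brute force."""
    return sum(len(bit_strings(k)) for k in range(n))


for n in range(1, 11):
    # 2**n possible inputs, but one fewer possible shorter outputs.
    print(f"n={n:2d}: {2 ** n:5d} inputs, {count_shorter(n):5d} shorter outputs")
```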


David A. Scott
--
My Crypto code
http://cryptography.org/cgi-bin/cry...sc/scott19u.zip
http://cryptography.org/cgi-bin/cry...sc/scott16u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

David A. Scott

2004-05-12, 9:28 pm

"Matt Mahoney" <matmahoney@yahoo.com> wrote in
news:rpUlc.4742$a47.2475@newsread3.news.atl.earthlink.net:

>
> "rep_movsd" <rep_movsd@yahoo.co.in> wrote in message
> news:85e8883c.0405041116.13e556bd@posting.google.com...
>
> You would be better off decompressing with the same algorithm used to
> compress. Otherwise it will just look like random data and you won't get
> anywhere no matter what you do.
>
> -- Matt Mahoney
>
>
>


Matt, it may look like random data to the human mind, depending on
which bijective decompressor was used. But to most people, random data
means incompressible data. Most times, decompressing a typical compressed
file leads to a large file. That file is not random, since many compressors
will easily compress it to a small size. Unfortunately, this small size is
usually larger than the file was before the bijective decompressor expanded it.
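A toy illustration of this last point, with a loud caveat: Python's standard library has no bijective decompressor, so the hypothetical `bits_as_text` below stands in for one by writing each byte as eight ASCII '0'/'1' characters. It loses nothing and expands the data 8x, but it is not bijective, since most byte strings are never among its outputs. The sketch checks that the expanded file compresses well with `zlib` yet, in this setup, does not come back below the size of the compressed file it started from.

```python
import zlib


def bits_as_text(data: bytes) -> bytes:
    """Hypothetical stand-in for a bijective decompressor (NOT actually bijective):
    spell out every byte as eight ASCII '0'/'1' characters."""
    return "".join(f"{b:08b}" for b in data).encode("ascii")


def text_as_bits(text: bytes) -> bytes:
    """Inverse of bits_as_text, to show that nothing was lost."""
    s = text.decode("ascii")
    return bytes(int(s[i:i + 8], 2) for i in range(0, len(s), 8))


original = " ".join(str(i * i) for i in range(5000)).encode("ascii")
packed = zlib.compress(original, 9)    # "a typical compressed file"
expanded = bits_as_text(packed)        # 8x larger, and far from random
repacked = zlib.compress(expanded, 9)  # shrinks a lot, but stays above len(packed)

print(len(packed), len(expanded), len(repacked))
```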


David A. Scott

Copyright 2008 codecomments.com