Home > Archive > Compression > November 2005 > The effect of encryption on compression ratios
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
The effect of encryption on compression ratios
|
|
| Adam M 2005-11-14, 9:55 pm |
| Which of the following statements is true:
1) Encrypting data makes it less "compressable"
2) Encrypting data makes it more "compressable"
3) Encrypting data has no effect on its compression ratio
It seems to me that #1 *should* be true, because the encrypted data should
have few (if any) recurring patterns that a compression algorithm could
exploit. However, I'm not sure if I could prove this to be true.
In particular, I'm talking about lossless (zlib) compression.
Essentially what I'm debating is what in order to compress and encrypt my
data.
Thanks,
Adam
| |
| Matt Mahoney 2005-11-14, 9:55 pm |
| Adam M wrote:
> Which of the following statements is true:
>
> 1) Encrypting data makes it less "compressable"
> 2) Encrypting data makes it more "compressable"
> 3) Encrypting data has no effect on its compression ratio
>
>
> It seems to me that #1 *should* be true, because the encrypted data should
> have few (if any) recurring patterns that a compression algorithm could
> exploit. However, I'm not sure if I could prove this to be true.
Encrypted data is not compressible at all. In order for an encryption
algorithm to be secure, it must be computationally indistinguishable
from a random oracle. This means that the encrypted data must be
indistinguishable from random by any test you could run in a million
CPU years. Compression is one such test. Compress your data before
you encrypt.
-- Matt Mahoney
| |
| Adam M 2005-11-15, 3:55 am |
| On Mon, 14 Nov 2005 19:26:59 -0800, Matt Mahoney wrote:
> Encrypted data is not compressible at all. In order for an encryption
> algorithm to be secure, it must be computationally indistinguishable
> from a random oracle. This means that the encrypted data must be
> indistinguishable from random by any test you could run in a million
> CPU years. Compression is one such test. Compress your data before
> you encrypt.
Hi Matt,
Excellent explanation, thanks.
-Adam
| |
| megagurka 2005-11-15, 3:55 am |
| Matt Mahoney wrote:
> Encrypted data is not compressible at all. In order for an encryption
> algorithm to be secure, it must be computationally indistinguishable
> from a random oracle.
Is this really true in general? Most encryption algorithms use a key
that is shorter than the redundancy in the plain text, thus it should
be possible to devise an encryption algorithm that kept some of the
redundancy in the encrypted data and still be secure against brute
force attacks.
/JN
| |
| Ben Rudiak-Gould 2005-11-15, 7:55 am |
| megagurka wrote:
>Matt Mahoney wrote:
>
>Is this really true in general? Most encryption algorithms use a key
>that is shorter than the redundancy in the plain text, thus it should
>be possible to devise an encryption algorithm that kept some of the
>redundancy in the encrypted data and still be secure against brute
>force attacks.
Any encryption algorithm produces ciphertext that is in principle almost as
compressible as the plaintext. But it's impossible for a compression
algorithm to find and exploit that redundancy in practice, if the cipher is
secure. This is pretty much the definition of a secure cipher.
-- Ben
| |
| nightlight 2005-11-16, 7:55 am |
| > Any encryption algorithm produces ciphertext that is in principle almost
> as compressible as the plaintext.
For compression to work at all one needs to restrict what kind of input
it gets. A set Gn of general n-symbol sequences is not compressible at
all and any existent compression algorithm will be on average an
expansion algorithm on Gn. If you had in mind such set (some set must
be assumed, at least implicitly, to attach an attribute
"compressible"), then it is trivially true that encrypted data is
equally compressible (i.e. incompressible) as the data from Gn.
On the other hand, if one is talking about a much smaller set Sn(C),
which is _all_ n symbol sequences which are compressible by a
compressor C, then the encryption will spread sequences from Sn(C),
outside into Gn uniformly (if the encryption is any good). Since any
replacement, much less mixing it uniformly with Gn, of a sequence from
Sn(C) by an outside sequence makes the new sequence incompressible
(otherwise this outside sequence would have been in Sn(C) by definition
of Sn(C)), the encrytion necessarily makes data less compressible by a
compressor C (which is any compressor). A good encryption makes it as
incompressible as a sequence from Gn, which means expandible on average.
| |
| Ignorant 2005-11-17, 3:55 am |
|
that is all nice but in a sysmmetric encryption routine where the
message size does not change where does the entropy hide?.
Is it that it has to be measured in a different way?. Then I can
compress with those new measures.?
No?
| |
| Matt Mahoney 2005-11-17, 6:55 pm |
| Ignorant wrote:
> that is all nice but in a sysmmetric encryption routine where the
> message size does not change where does the entropy hide?.
> Is it that it has to be measured in a different way?. Then I can
> compress with those new measures.?
> No?
Yes, you can compress as follows: try decrypting with all possible keys
and compressing the result. Save the best result along with the key
you used to obtain it. To decompress, you re-encrypt with the saved
key. Note that the compressed size of the encrypted text is the same
as the compressed plaintext plus the key.
The entropy is "hidden" by the fact that your compressor will be very
slow. If you can find a faster way, then by the random oracle model
criteria, the encryption is broken.
-- Matt Mahoney
| |
| Ben Rudiak-Gould 2005-11-17, 6:55 pm |
| nightlight wrote:
>Ben Rudiak-Gould wrote:
>
>For compression to work at all one needs to restrict what kind of input
>it gets. [...]
What I meant is that you can in principle compress the ciphertext to
(compressed plaintext, encryption algorithm, key), which is the same size as
the compressed plaintext up to an additive constant.
-- Ben
|
|
|
|
|