Re: Calgary Compression Challenge update
| xleobx@qmailcomq.com 2004-05-12, 9:28 pm |
| xleobx@qmailcomq.com wrote:
> The challenge entry (PAQ-based) by Alexander Ratushnyak is 637116 bytes.
> See http://www.mailcom.com/challenge/ for details.
It is 619922 now. Alexander used a little loophole in the rules. His latest
version does use less than 256 Mb of VM, but it is so close that a 256 Mb
(Linux) machine is thrashed and would have had no luck completing decompression
in 24 hours. But rules are rules: the challenge statement does not specify
the amount of RAM on a host computer, only the VM requirements.
Leo
| |
| Matt Mahoney 2004-05-12, 9:28 pm |
| <xleobx@qmailcomq.com> wrote in message
news:li_kc.91460$vn.256577@sea-read.news.verio.net...
> xleobx@qmailcomq.com wrote:
>
> It is 619922 now. Alexander used a little loophole in the rules. His latest
> version does use less than 256 Mb of VM, but it is so close that a 256 Mb
> (Linux) machine is thrashed and would have had no luck completing decompression
> in 24 hours. But rules are rules: the challenge statement does not specify
> the amount of RAM on a host computer, only the VM requirements.
>
> Leo
Very interesting program. Like his previous entry, it is derived from PAQ6.
The comments and white space have been removed, but it is still readable.
The good compression seems to come from mixing a very large number of
models, with a huge number of parameters tuned to the Calgary corpus. The
number of context mixers was increased from 2 to 12, each with a different
context. Each mixer has an SSE stage on its output. Those outputs are
combined by a weighted summation, which is then passed through 4 more SSE
stages in parallel, each selected by a different context, and combined again
with the pre-SSE probability for arithmetic coding. The coder uses David A.
Scott's modification from PAQ32, as further modified by Fabio Buffoni in
PAQ605fb.
Overall there are 3 different models, one for pic (identified by file size),
one for binary files, and one for text files. Text files are identified by
the absence of bytes greater than 127. The models are as follows:
Charmodel - contexts of length 0 through 10, some tuning.
Matchmodel - matches up to 4 contexts of length at least 8 (same as PAQ6).
Recordmodel - identifies 2 fixed record lengths with 8 contexts in 2D, some
  tuning.
Sparsemodel - 11 2-byte contexts from the last 5 bytes, some tuning.
Sparsemodel2 - 5 4-byte contexts from the last 18 bytes (introduced by Berto
  Destasio).
Analogmodel - 4 contexts using the upper bits from the last 8 bytes. (I had
  written this to model .wav and .bmp files, although there are none in the
  Calgary corpus.)
Picmodel - 33 2D contexts for modeling pic (bitmapped image). This includes
  some 3 bit contexts and is much more extensive than in PAQ6.
Wordmodel - 6 contexts consisting of 1-2 of the last 3 whole words split on
  either letters or white space.
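
As a rough illustration of the mixing pipeline described before the model
list (weighted summation, then 4 parallel SSE stages combined with the
pre-SSE probability), here is a minimal Python sketch. It is not taken from
the entry's source. The names SSE, mix and final_probability, the bucket
count, the learning rate, and the equal-weight final average are all
assumptions made for illustration, and the per-mixer SSE stages are left out
(the inputs are assumed to be already refined).

```python
class SSE:
    """Toy secondary symbol estimation: (context, quantized p) -> refined p."""

    def __init__(self, n_contexts, buckets=32, rate=0.02):
        self.buckets = buckets
        self.rate = rate
        # Start as an approximate identity map: each bucket predicts its midpoint.
        self.table = [[(b + 0.5) / buckets for b in range(buckets)]
                      for _ in range(n_contexts)]

    def _bucket(self, p):
        return min(int(p * self.buckets), self.buckets - 1)

    def refine(self, p, ctx):
        return self.table[ctx][self._bucket(p)]

    def update(self, p, ctx, bit):
        # Move the selected entry a small step toward the bit actually seen.
        b = self._bucket(p)
        self.table[ctx][b] += self.rate * (bit - self.table[ctx][b])


def mix(probabilities, weights):
    """Weighted summation of probabilities that the next bit is 1."""
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


def final_probability(mixer_outputs, weights, sse_stages, sse_contexts):
    pre_sse = mix(mixer_outputs, weights)
    refined = [s.refine(pre_sse, c) for s, c in zip(sse_stages, sse_contexts)]
    # Assumption: plain average of the pre-SSE value and the SSE outputs.
    return (pre_sse + sum(refined)) / (1 + len(refined))
```

The result of final_probability is what would be handed to the arithmetic
coder. After each bit is coded, update would be called on every SSE stage
with the same pre-SSE probability and context.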
The 4 binary files are each compressed separately from the 10 text files,
resulting in 5 archives, which still have a human-readable PAQ header (as
with the previous 2 submissions). The source is split into 2 files (one for
the counter state tables) and compressed with RAR. These 6 files are then
all stored in an HA archive with no further compression.
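
The file-type test and the grouping into 5 archives can be sketched as
follows. This is only an illustration, not the entry's code. The function
names classify and plan_archives are hypothetical, the pic size of 513216
bytes is the Calgary corpus file size as I recall it rather than a value
from the source, and pic is assumed to count as one of the 4 binary files.

```python
PIC_SIZE = 513216  # assumed size in bytes of "pic" in the Calgary corpus


def classify(data):
    """Pick a model family: pic by file size, text if no byte exceeds 127."""
    if len(data) == PIC_SIZE:
        return "pic"
    if all(b <= 127 for b in data):
        return "text"
    return "binary"


def plan_archives(files):
    """Group file names: each non-text file alone, all text files together."""
    archives = []
    text_group = []
    for name, data in files.items():
        if classify(data) == "text":
            text_group.append(name)
        else:
            archives.append([name])
    if text_group:
        archives.append(text_group)
    return archives
```

With the 14 Calgary files as input, this grouping would give 4
single-file archives plus one archive holding the 10 text files.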
-- Matt Mahoney