Code Comments
Programming Forum and web based access to our favorite programming groups.Hi all, I'm looking for the best way to post-process a bwt stream to get the best order 0 or 1 result. So far i've tried bwt+rle+mtf, and distance coding (DC). And got book1 from calgary corpus down to +/- 230.000 with my bwt+dc implementation using an order 0 arithmetic coder. But in this paper: http://sun.iinf.polsl.gliwice.pl/~sdeor/pub/deo01.pdf, they claim DC can bring book1 down to 2.241 bpp. Which would be (2.241/8)*768.771=215.352 bytes. Does anybody know exactly how they code their distances (i use 1 byte up to 254 and a flag byte + 3 byte distance for the rest)? (Or what other transformations they add to boost the performance?) This paper also mentions that wfc might be a better method to use, but the description they gave was pretty cryptic and i couldn't find any other comprehensible information on it. I do understand it is basically an adapted mtf coder which uses multiple weights from a local context. And i'm sure it's not that hard when you get how it works, but right now i haven't been able to implement it. Does anybody know a link to a comprehensive article on this? Any help with this will be much appreciated. Best regards, Rick
Post Follow-up to this messageRick D. wrote: > Hi all, > > I'm looking for the best way to post-process a bwt stream to get the > best order 0 or 1 result. So far i've tried bwt+rle+mtf, and distance > coding (DC). And got book1 from calgary corpus down to +/- 230.000 > with my bwt+dc implementation using an order 0 arithmetic coder. > > But in this paper: > http://sun.iinf.polsl.gliwice.pl/~sdeor/pub/deo01.pdf, > > they claim DC can bring book1 down to 2.241 bpp. Which would be > (2.241/8)*768.771=215.352 bytes. Does anybody know exactly how they > code their distances (i use 1 byte up to 254 and a flag byte + 3 byte > distance for the rest)? (Or what other transformations they add to > boost the performance?) > > This paper also mentions that wfc might be a better method to use, but > the description they gave was pretty cryptic and i couldn't find any > other comprehensible information on it. I do understand it is > basically an adapted mtf coder which uses multiple weights from a > local context. And i'm sure it's not that hard when you get how it > works, but right now i haven't been able to implement it. Does anybody > know a link to a comprehensive article on this? > > Any help with this will be much appreciated. > > Best regards, > Rick book1 can be better compressed with end of line coding but this is more of a trick than a legitimate improvement on compression. http://www.data-compression.info/Ju...ges.p df is a good paper which touches on WFC. The best compression method for BWT currently is (to the best of my knowledge) my M03 algorithm. A really rough paper is at http://www.michael-maniscalco.com/index2.htm There's a fully functional version at http://mij4x.datacompression.jp/ This achieves around 795,000 for the 18 file calgary corpus. Around 211,000 for book1. - Michael A Maniscalco
Post Follow-up to this messageHi, Rick D. wrote: > I'm looking for the best way to post-process a bwt stream to get the > best order 0 or 1 result. So far i've tried bwt+rle+mtf, and distance > coding (DC). And got book1 from calgary corpus down to +/- 230.000 > with my bwt+dc implementation using an order 0 arithmetic coder. > > But in this paper: > http://sun.iinf.polsl.gliwice.pl/~sdeor/pub/deo01.pdf, > > they claim DC can bring book1 down to 2.241 bpp. Which would be > (2.241/8)*768.771=215.352 bytes. Does anybody know exactly how they > code their distances (i use 1 byte up to 254 and a flag byte + 3 byte > distance for the rest)? (Or what other transformations they add to > boost the performance?) In the original distance coder, I coded the distances by their log2 (number of needed bits) and then sent those bits (raw, not arithmetic coded) without the leading one (always 1). The logs were encoded with an order-1 arithmetic coder with a predictor as context. The predictor was computed from previous distances of the same symbol like t his: pred_log = pred_log * weight1 + last_log * weight2 Of course there are a lot of things you could use additionally for predictin g (use your imagination), but AFAIR you should get down to 217k for start with almost any reasonable choice for weight1, weight2 (like for example both 0.5 ). There probably are some other better ways to encode the distances, IIRC ther e are distance coders that compress book down to 213k. Nowadays I'd do it quite differently too, but I can't seem to find any spare time :-) -- Edgar
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.