Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Looking for best way to post-process BWT data
Hi all,

I'm looking for the best way to post-process a bwt stream to get the
best order 0 or 1 result. So far i've tried bwt+rle+mtf, and distance
coding (DC). And got book1 from calgary corpus down to +/- 230.000
with my bwt+dc implementation using an order 0 arithmetic coder.

But in this paper:
http://sun.iinf.polsl.gliwice.pl/~sdeor/pub/deo01.pdf,

they claim DC can bring book1 down to 2.241 bpp. Which would be
(2.241/8)*768.771=215.352 bytes. Does anybody know exactly how they
code their distances (i use 1 byte up to 254 and a flag byte + 3 byte
distance for the rest)? (Or what other transformations they add to
boost the performance?)

This paper also mentions that wfc might be a better method to use, but
the description they gave was pretty cryptic and i couldn't find any
other comprehensible information on it. I do understand it is
basically an adapted mtf coder which uses multiple weights from a
local context. And i'm sure it's not that hard when you get how it
works, but right now i haven't been able to implement it. Does anybody
know a link to a comprehensive article on this?

Any help with this will be much appreciated.

Best regards,
Rick


Report this thread to moderator Post Follow-up to this message
Old Post
Rick D.
04-24-05 08:55 PM


Re: Looking for best way to post-process BWT data
Rick D. wrote:
> Hi all,
>
> I'm looking for the best way to post-process a bwt stream to get the
> best order 0 or 1 result. So far i've tried bwt+rle+mtf, and distance
> coding (DC). And got book1 from calgary corpus down to +/- 230.000
> with my bwt+dc implementation using an order 0 arithmetic coder.
>
> But in this paper:
> http://sun.iinf.polsl.gliwice.pl/~sdeor/pub/deo01.pdf,
>
> they claim DC can bring book1 down to 2.241 bpp. Which would be
> (2.241/8)*768.771=215.352 bytes. Does anybody know exactly how they
> code their distances (i use 1 byte up to 254 and a flag byte + 3 byte
> distance for the rest)? (Or what other transformations they add to
> boost the performance?)
>
> This paper also mentions that wfc might be a better method to use,
but
> the description they gave was pretty cryptic and i couldn't find any
> other comprehensible information on it. I do understand it is
> basically an adapted mtf coder which uses multiple weights from a
> local context. And i'm sure it's not that hard when you get how it
> works, but right now i haven't been able to implement it. Does
anybody
> know a link to a comprehensive article on this?
>
> Any help with this will be much appreciated.
>
> Best regards,
> Rick

book1 can be better compressed with end of line coding but this is
more of a trick than a legitimate improvement on compression.

http://www.data-compression.info/Ju...ges.p
df
is
a good paper which touches on WFC.

The best compression method for BWT currently is (to the best of
my knowledge) my M03 algorithm.

A really rough paper is at
http://www.michael-maniscalco.com/index2.htm

There's a fully functional version at
http://mij4x.datacompression.jp/

This achieves around 795,000 for the 18 file calgary corpus.  Around
211,000 for book1.


- Michael A Maniscalco


Report this thread to moderator Post Follow-up to this message
Old Post
michael
04-24-05 08:55 PM


Re: Looking for best way to post-process BWT data
Hi,

Rick D. wrote:

> I'm looking for the best way to post-process a bwt stream to get the
> best order 0 or 1 result. So far i've tried bwt+rle+mtf, and distance
> coding (DC). And got book1 from calgary corpus down to +/- 230.000
> with my bwt+dc implementation using an order 0 arithmetic coder.
>
> But in this paper:
> http://sun.iinf.polsl.gliwice.pl/~sdeor/pub/deo01.pdf,
>
> they claim DC can bring book1 down to 2.241 bpp. Which would be
> (2.241/8)*768.771=215.352 bytes. Does anybody know exactly how they
> code their distances (i use 1 byte up to 254 and a flag byte + 3 byte
> distance for the rest)? (Or what other transformations they add to
> boost the performance?)

In the original distance coder, I coded the distances by their log2
(number of needed bits) and then sent those bits (raw, not arithmetic coded)
without the leading one (always 1). The logs were encoded with an order-1
arithmetic coder with a predictor as context.

The predictor was computed from previous distances of the same symbol like t
his:

pred_log = pred_log * weight1 + last_log * weight2

Of course there are a lot of things you could use additionally for predictin
g
(use your imagination), but AFAIR you should get down to 217k for start with
almost any reasonable choice for weight1, weight2 (like for example both 0.5
).

There probably are some other better ways to encode the distances, IIRC ther
e
are distance coders that compress book down to 213k. Nowadays I'd do it
quite differently too, but I can't seem to find any spare time :-)

--
Edgar


Report this thread to moderator Post Follow-up to this message
Old Post
Edgar Binder
04-24-05 08:55 PM


Sponsored Links




Last Thread Next Thread Next
Search this forum -> 
Post New Thread

Compression archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 07:23 PM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.