Code Comments
Programming Forum and web based access to our favorite programming groups.hi , I am doing a project on compression .can any one suggest any good compression algortims other than MFT,WFC,SIF,RLE_EXP ,RLE_BIT which can be used in the place of second stage algorithms to improve compression ratio in an effecient time .if you find any new algorithms please let me know,which will be very much helpfull to me .thanking you your's friendly bharath
Post Follow-up to this message"BHARATH" <bharathkj@hotmail.com> wrote in message news:57dda274.0410102319.33baa2b1@posting.google.com... > hi , > I am doing a project on compression .can any one suggest any good > compression algortims other than MFT,WFC,SIF,RLE_EXP ,RLE_BIT which > can be used in the place of second stage algorithms to improve > compression ratio in an effecient time .if you find any new algorithms > please let me know,which will be very much helpfull to me .thanking > you > your's friendly > bharath What is the first stage? Burrows-Wheeler? -- Matt Mahoney
Post Follow-up to this message"Matt Mahoney" <matmahoney@yahoo.com> wrote in message news:<AgFad.722$SZ5.140@newsread2.ne ws.atl.earthlink.net>... > "BHARATH" <bharathkj@hotmail.com> wrote in message > news:57dda274.0410102319.33baa2b1@posting.google.com... > > What is the first stage? Burrows-Wheeler? > > -- Matt Mahoney yes i am using BWT in the first stage.Is their any good compression algorithms which can be used in the second stage .it will be more helpfull if i get the combination of algorthims to be used in the second stage.Is their any transformation algorithm other than bwt which can transform the data efficiently when compare to BWT?thanking you
Post Follow-up to this messageOn 12 Oct 2004 04:29:29 -0700, bharathkj@hotmail.com (BHARATH) wrote: > >yes i am using BWT in the first stage.Is their any good compression >algorithms which can be used in the second stage .it will be more Best I've come up with (so far) is to use bwt, then move-to-front, and then arithmetic compression. The size of the bwt block might also affect the final compression ratio ... not entirely sure. I've got my bet block size at 8k right now. Still does not do as good as pkzip though. :(
Post Follow-up to this message"Bob" <rock@almost.bestweb.net> wrote in message news:pesqm0hbbaau9idf1mftmsrndtsa3cdabl@ 4ax.com... > On 12 Oct 2004 04:29:29 -0700, bharathkj@hotmail.com (BHARATH) wrote: > > > > Best I've come up with (so far) is to use bwt, then move-to-front, > and then arithmetic compression. The size of the bwt block might > also affect the final compression ratio ... not entirely sure. > I've got my bet block size at 8k right now. > > Still does not do as good as pkzip though. :( > A bigger block will give you better compression, especially for text. 8K is pretty small. PKZIP gets better compression because it uses a bigger context window (not sure how big, but gzip uses 32K). In their original paper, Burrows and Wheeler tested on the 103 MB Hector corpus and got the best compression using one block for the entire input. http://gatekeeper.dec.com/pub/DEC/S...ts/SRC-124.ps.Z For modeling the transformed stream, I suspect a 1/t model would work well, where t is the distance back in the stream. For example, if your stream is ...ABCD, then you would assign relative probabilities of A=1/4, B=1/3, C=1/2, D=1 for the next symbol. I haven't tried it, though. -- Matt Mahoney
Post Follow-up to this messageOn Fri, 15 Oct 2004 02:30:20 GMT, "Matt Mahoney" <matmahoney@yahoo.com> wrote: > >"Bob" <rock@almost.bestweb.net> wrote in message > news:pesqm0hbbaau9idf1mftmsrndtsa3cdabl@ 4ax.com... oops, should be "bwt block size", not "bet block size". My mistake. > >A bigger block will give you better compression, especially for text. 8K i s >pretty small. OK. I'll have to try bigger block sizes. >PKZIP gets better compression because it uses a bigger >context window (not sure how big, but gzip uses 32K). I didn't think PKzip used BWT. Dunno about gzip. >In their original >paper, Burrows and Wheeler tested on the 103 MB Hector corpus and got the >best compression using one block for the entire input. >http://gatekeeper.dec.com/pub/DEC/S...ts/SRC-124.ps.Z > More support for the idea of a larger BWT blocksize. Thank you. I will check out that link. >For modeling the transformed stream, I suspect a 1/t model would work well, >where t is the distance back in the stream. For example, if your stream is >...ABCD, then you would assign relative probabilities of A=1/4, B=1/3, >C=1/2, D=1 for the next symbol. I haven't tried it, though. > Huh? You lost me with this one.
Post Follow-up to this messageBob wrote: )>A bigger block will give you better compression, especially for text. 8K is )>pretty small. ) ) OK. I'll have to try bigger block sizes. When he sais 'bigger block size' he means like a few megabytes in size. As a rule of thumb, doubling the block size gives you in the order of 1% better compression. That is, using 8MB blocksize versus 4MB on the linux kernel source gave me roughly a 1% improvement. )>PKZIP gets better compression because it uses a bigger )>context window (not sure how big, but gzip uses 32K). ) ) I didn't think PKzip used BWT. Dunno about gzip. It doesn't, it uses a 32K sliding context window. But that's still (somewhat) comparable. P.S. bzip2 uses a blocksize of between 100k and 900k depending on the commandline flag. SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
Post Follow-up to this message"Bob" <rock@almost.bestweb.net> wrote in message news:oq25n090h0l7bkm6qve11dvigrcilmst5d@ 4ax.com... > On Fri, 15 Oct 2004 02:30:20 GMT, "Matt Mahoney" > <matmahoney@yahoo.com> wrote: > > I didn't think PKzip used BWT. Dunno about gzip. They both use Limpel-Ziv. But there is a compression-memory tradeoff in just about all compression algorithms. well, is > > Huh? You lost me with this one. You want to use a locally adaptive order 0 model. The move-to-front model is good for BWT because the probability of a symbol is inversely related to the time since it was last seen. For example, if the first 4 symbols in your BWT stream was ABCD, then the MTF model would assign the highest probability to D since it was seen last, followed by C, B, and A in that order. Now if the stream was AAABAABBCD, then the MTF model would assign exactly the same probabilities since they would be in the same position in the queue. What I am suggesting is a model that also takes frequency into account, i.e. raise the probabilities of A and B to account for their higher frequency, but in inverse proportion to the time since they were last seen, for example: A = 1/10 + 1/9 + 1/8 + 1/6 + 1/5 B = 1/7 + 1/4 + 1/3 C = 1/2 D = 1 I can't say that this would work because I haven't tried it, just suggesting an experiment. It probably isn't done this way because there isn't an efficient way to compute it that I can think of. -- Matt Mahoney
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.