Code Comments
Programming Forum and web based access to our favorite programming groups.currently, i am looking for a mechanism to perform maximum compression ratio (compression speed is not an issue to be concerned) and maximum decompression time. I tested LZO http://www.oberhumer.com/opensource/lzo/, with lzo1x_1_compress and lzo1x_decompress. For the decompression time, it is very much dependent on the data type. At my 1.86MHz computer, for certain data pattern, it can achieve up to 1MB -> 1.3ms. However, for certain data pattern, it only can achieve up to 1MB -> 8ms. Please note that 1MB is the original data set. I was wondering is there any way/ tuning, so that I can have a much more consistent/ lower decompression time? thank you.
Post Follow-up to this message<yancheng.cheok@gmail.com> wrote in message news:1153152437.204499.126550@m79g2000cwm.googlegroups.com... > currently, i am looking for a mechanism to perform maximum compression > ratio (compression speed is not an issue to be concerned) and maximum > decompression time. > > I tested LZO http://www.oberhumer.com/opensource/lzo/, with > lzo1x_1_compress and lzo1x_decompress. > > For the decompression time, it is very much dependent on the data type. > At my 1.86MHz computer, for certain data pattern, it can achieve up to > 1MB -> 1.3ms. However, for certain data pattern, it only can achieve up > to 1MB -> 8ms. Please note that 1MB is the original data set. > > I was wondering is there any way/ tuning, so that I can have a much > more consistent/ lower decompression time? Have you tried this: http://www.1stworks.com/ref/TR/tr04-0815b.pdf The decompress is very fast.
Post Follow-up to this messageyancheng.cheok@gmail.com wrote: > > I was wondering is there any way/ tuning, so that I can have a much > more consistent/ lower decompression time? For LZO, you should read the documentation; there are different methods to use, and there are assembly versions of the decompressor for various platforms. There are other low-resource algorithms out there you might want to look into. Properly coded into assembler, they have faster decompression methods -- but the tradeoff is less compression, so you'll have to run your own tests to see if the faster decompression is worth the size tradeoff. Search comp.compression for discussions on: - LZSS variants (itself an LZ77 variant) like LZRW, LZS (Stac), RDC - Charles Bloom's LZP - Pasi Ojala's pucrunch (LZ77+RLE with rice codes) at http://www.cs.tut.fi/~albert/Dev/pucrunch/ Some more info is here [url]http://groups.google.com/group/comp.compression/msg/71dd340e19f2cb7d?dmode=source[ /url] as well. For an 8088 project, I standardized on RDC so that I could maximize use of the REP MOVSW and STOSW opcodes when decompressing. Good luck to you.
Post Follow-up to this messageDann Corbit wrote: > <yancheng.cheok@gmail.com> wrote in message > news:1153152437.204499.126550@m79g2000cwm.googlegroups.com... > > Have you tried this: > http://www.1stworks.com/ref/TR/tr04-0815b.pdf > > The decompress is very fast. Unfortunately, quantized indexing is only applicable to static bitwise order-0 models. If you want really fast decompression, use LZ77. Take a look at zlib. -- Matt Mahoney
Post Follow-up to this messageyancheng.cheok@gmail.com wrote: > For the decompression time, it is very much dependent on the data type. > At my 1.86MHz computer, for certain data pattern, it can achieve up to > 1MB -> 1.3ms. However, for certain data pattern, it only can achieve up > to 1MB -> 8ms. Please note that 1MB is the original data set. > > I was wondering is there any way/ tuning, so that I can have a much > more consistent/ lower decompression time? No, and the other suggestions are all wrong. RDC and LZ77 will suffer the same problem. It has to do with the # of tokens in the stream. Think about it, if the text compresses really well you're going to have fewer tokens and they're going to be longer runs. That is, you approach memset() speed. Now if it didn't compress well you're going to have more tokens of short runs. If you want a truly constant time algo you're going to be stuck. Shorter symbols take less time to decode. That's just the nature of information. Now if all you want is a lower upper bound, then it's a different problem. Then in that case switching codecs may help. Though IIRC LZO is just a LZ77 modification so the suggestions to try LZ77 straightup or zlib are a bit premature. Tom
Post Follow-up to this messagetomstdenis@gmail.com wrote: > No, and the other suggestions are all wrong. RDC and LZ77 will suffer > the same problem. The difference between RDC and LZO is that RDC is *so* simple that decompressing a token can be done in less than 10 instructions depending on architecture. LZO, while still incredibly efficient, is at least 40 (based on the x86 asm decoder). So it's not all about number of tokens -- with RDC there will be more of them, but they're faster to decde. With LZO there will be less, but they take longer to decode. So you have to try both with some sample data sets to see what is appropriate for your architecture and data. (The above isn't an advertisement for RDC but rather just the idea that you need to weigh more than just number of tokens in the stream.)
Post Follow-up to this messageJim Leonard wrote: > tomstdenis@gmail.com wrote: > > The difference between RDC and LZO is that RDC is *so* simple that > decompressing a token can be done in less than 10 instructions > depending on architecture. LZO, while still incredibly efficient, is > at least 40 (based on the x86 asm decoder). So it's not all about > number of tokens -- with RDC there will be more of them, but they're > faster to decde. With LZO there will be less, but they take longer to > decode. So you have to try both with some sample data sets to see what > is appropriate for your architecture and data. > > (The above isn't an advertisement for RDC but rather just the idea that > you need to weigh more than just number of tokens in the stream.) My point though is even with something as simple as LZ77 without huffman or whatever, you're still going to take longer with some inputs than others. Overall the process is still O(n) for streams of tokens. Tom
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.