Author Low CPU standard compressor?
Toni

2005-10-13, 3:55 am

Hi,

I need to compress an Oracle database as it is being backed-up to tape.

gzip gives me 10x ~ 12x compression (80 GB to ~7 GB) but takes 2h30 to
do so.

I would need a maybe not so good compressor that can do better than 4x
but faster than gzip (1h?).
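As a rough sanity check on what "faster" has to mean, the figures above can be turned into a required input rate. This is only a back-of-the-envelope sketch: it assumes 1 GB = 10^9 bytes and that the compressor, not the disk or the tape drive, is the bottleneck. The function name is made up for illustration.

```python
# Figures from the post: 80 GB of input, 2h30 with gzip, 1 h as the target.
# Assumption: 1 GB = 10**9 bytes, and the compressor is the only bottleneck.
INPUT_BYTES = 80 * 10**9


def required_rate_mb_s(total_bytes: int, seconds: float) -> float:
    """Input the compressor must consume per second, in MB/s."""
    return total_bytes / seconds / 10**6


gzip_rate = required_rate_mb_s(INPUT_BYTES, 2.5 * 3600)
target_rate = required_rate_mb_s(INPUT_BYTES, 1.0 * 3600)

print(f"gzip today : {gzip_rate:.1f} MB/s")
print(f"1 h target : {target_rate:.1f} MB/s")
print(f"speed-up   : {target_rate / gzip_rate:.1f}x")
```

So any candidate has to be about 2.5 times faster than the current gzip run while still staying above 4x compression.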

Unfortunately, using the hardware compressor in the tape device is not
an option.

Thanks for comments and ideas,

Toni

Mark Adler

2005-10-13, 6:55 pm

Toni wrote:
> I would need a maybe not so good compressor that can do better than 4x
> but faster than gzip (1h?).


Try gzip -1.
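For anyone who wants to see the effect of the level setting without a shell, here is a minimal sketch using Python's standard gzip module, which takes the same 1 to 9 levels. The helper name gzip_stream and the synthetic sample data are assumptions for illustration only; real database blocks will behave differently.

```python
import gzip
import io
import shutil


def gzip_stream(src, dst, level=1, chunk_size=1 << 20):
    """Copy file-like src to file-like dst through gzip at the given level."""
    with gzip.GzipFile(fileobj=dst, mode="wb", compresslevel=level) as gz:
        shutil.copyfileobj(src, gz, chunk_size)


# Synthetic, highly repetitive stand-in data (not a real database dump).
sample = b"ORACLE BLOCK 0001 " * 100_000

for level in (1, 6):
    out = io.BytesIO()
    gzip_stream(io.BytesIO(sample), out, level=level)
    print(f"level {level}: {len(sample)} -> {len(out.getvalue())} bytes")
```

Because the helper works on streams, the same function could sit between a backup reader and a tape writer without holding the whole dump in memory.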

mark

Toni

2005-10-13, 6:55 pm


Mark Adler wrote:
> Toni wrote:
>
> Try gzip -1.


Thanks Mark,

I have tried. This gives me a 1.5 ~ 1.8x increase in speed but I'd need
some more.
Right now I'm also experimenting with compress and bzip2. Other ideas I
want to try are zip and rar (if rar exists for Unix).

Toni

Nicolas Le Gland

2005-10-13, 6:55 pm

> I would need a maybe not so good compressor that can do better than 4x
> but faster than gzip (1h?).


Maybe this could help you:
http://www.maximumcompression.com/d...ry_mf3.php#data


Mark Adler

2005-10-13, 6:55 pm

Toni wrote:
> I have tried. This gives me a 1.5 ~ 1.8x increase in speed but I'd need
> some more.


If gzip -1 doesn't do it for you, then try lzop.

http://www.lzop.org/

mark

Fulcrum

2005-10-13, 6:55 pm


Nicolas Le Gland wrote:
>
> Maybe this could help you:
> http://www.maximumcompression.com/d...ry_mf3.php#data


Yep, LZOP is the answer here!

Toni

2005-10-14, 3:55 am

Thanks to all. I didn't know about lzop; it seems like the right thing for
my problem. Today I'll try to compile and test it.

Toni

Jim Leonard

2005-10-14, 6:55 pm

Toni wrote:
> Thanks to all. I didn't know about lzop; it seems like the right thing for
> my problem. Today I'll try to compile and test it.


I went through your dilemma last year and came up with three options:

LZRW1+RLE (which you've already worked with)
LZO
LZP (Charles Bloom)

Each of them has advantages and disadvantages, so it is best to
implement all three and see which one works best for your environment.

Jim Leonard

2005-10-14, 6:55 pm

Jim Leonard wrote:
> LZRW1+RLE (which you've already worked with)
> LZO
> LZP (Charles Bloom)
>
> Each of them has advantages and disadvantages, so it is best to
> implement all three and see which one works best for your environment.


It just dawned on me that you might not be aware of the
advantages/disadvantages. Here are my opinions, based on some informal
research last year (my focus was primarily DEcompression speed on
embedded 8088 platforms with low speed/low memory):

LZRW1+RLE
Pros: Decompression of runs is the fastest of all three if your CPU has
a repeating store opcode like x86's REP STOS
Cons: Compression is worst of all three

LZO
Pros: Pre-written for you as the LZO C library; decade-old library is
thoroughly tested and optimized; speed is excellent for a "generic"
multi-platform library
Cons: No official documentation for exactly *what* the LZO algorithms do,
so if you don't use the C library you have a lot of porting work ahead of
you (however, a simple, easy-to-understand Java port exists if you
understand Java)

LZP
Pros: Simple, clever concept; easy to implement; compression speed
excellent.
Cons: Requires same amount of memory for decompression as compression
(due to offsets being stored in the hash table)
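To make the LZP memory point concrete, here is a minimal byte-aligned sketch of the idea. It is not Charles Bloom's code or any published format: the 3-byte context, the 12-bit hash, the 255-byte match cap and the token layout are all assumptions chosen for brevity. What it does show is that the decoder has to rebuild exactly the same hash table of offsets as the encoder.

```python
MIN_CTX = 3       # bytes of context used for the prediction (assumed)
HASH_BITS = 12    # 4096-entry table of last-seen offsets (assumed)


def _hash(ctx) -> int:
    """Hash the last MIN_CTX bytes into a table index."""
    h = (ctx[0] << 8) ^ (ctx[1] << 4) ^ ctx[2]
    return h & ((1 << HASH_BITS) - 1)


def lzp_encode(data: bytes) -> bytes:
    table = [-1] * (1 << HASH_BITS)
    out = bytearray(data[:MIN_CTX])          # no context yet: store raw
    i = min(MIN_CTX, len(data))
    while i < len(data):
        h = _hash(data[i - MIN_CTX:i])
        pred, table[h] = table[h], i         # predicted offset, then update
        if pred >= 0:
            n = 0                            # length of match at the prediction
            while n < 255 and i + n < len(data) and data[pred + n] == data[i + n]:
                n += 1
            out.append(n)                    # length byte, no offset needed
            i += n
            if i == len(data):
                break
        out.append(data[i])                  # literal byte
        i += 1
    return bytes(out)


def lzp_decode(blob: bytes) -> bytes:
    table = [-1] * (1 << HASH_BITS)          # same table as the encoder
    out = bytearray(blob[:MIN_CTX])
    pos = min(MIN_CTX, len(blob))
    while pos < len(blob):
        i = len(out)
        h = _hash(out[i - MIN_CTX:i])
        pred, table[h] = table[h], i
        if pred >= 0:
            n = blob[pos]
            pos += 1
            for k in range(n):               # byte-wise copy, may overlap
                out.append(out[pred + k])
            if pos == len(blob):
                break
        out.append(blob[pos])
        pos += 1
    return bytes(out)


sample = b"abcabcabcabc" * 50 + b"xyz"
packed = lzp_encode(sample)
print(len(sample), "->", len(packed), lzp_decode(packed) == sample)
```

Note that no match offset is ever written to the output; the price is that lzp_decode cannot run without its own copy of the table, which is the con listed above.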

BTW, all of the above assumes you are using byte-aligned or
nybble-aligned codes (i.e. by "LZP" I am *not* talking about the
PPM-like order-2 variable-bit code stuff). Also, if you plan to
re-implement any of the above for a specific language or platform, you
have to try to understand how they work so you can pick the best one.
In my case, my 8088 environment only has three memory pointers
available, 64KB total, slow memory speed, and a 4-byte prefetch queue
so I had to take these factors into account when choosing what I was
going to use.

Mark Adler

2005-10-17, 3:55 am

Just for kicks, and since I was curious, here are some quick
comparisons I did between lzop, gzip, and bzip2. Each line is the
program, the compressed size and compression ratio (compressed size
divided by original size), and the execution time in seconds. The input
file was mostly header files and binary libraries. This was on a
1.5 GHz PowerPC. Your mileage may vary.

Starting with file of 248,046,595 bytes:

program   bytes      (ratio)  seconds
lzop -1   87790913   (0.354)      4.1
lzop -2   87565567   (0.353)      3.9
lzop -7   68455488   (0.276)     56.2
lzop -9   67886068   (0.274)    160.7

gzip -1   70038655   (0.282)     16.8
gzip -6   60523978   (0.244)     32.0
gzip -9   60025123   (0.242)    112.6

bzip2     48528470   (0.196)    214.8

Decompression times in seconds, on the default lzop -2, gzip -6, and
bzip2 output:

lzop     1.7
gun      2.2
gzip     3.6
bzip2   37.7

"gun" is a gzip decompressor that uses the latest version of zlib and
as a result is faster than gzip. One of these days gzip will be
updated accordingly.
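For readers without a C toolchain handy, the same idea of decoding a gzip stream directly with zlib can be sketched with Python's standard zlib module, which is a binding to the zlib library. This is an illustration, not a port of gun.c: the function name is made up, and it handles only a single gzip member.

```python
import gzip
import io
import zlib


def gunzip_stream(src, dst, chunk_size=1 << 16):
    """Decompress one gzip member from file-like src into file-like dst."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(inflater.decompress(chunk))
    dst.write(inflater.flush())


original = b"header files and binary libraries\n" * 10_000
out = io.BytesIO()
gunzip_stream(io.BytesIO(gzip.compress(original)), out)
print(out.getvalue() == original)
```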

To first order, lzop got 3:1, gzip 4:1, and bzip2 5:1 compression.
gzip took eight times as long as lzop, and bzip2 took seven times as
long as gzip.
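Those round numbers can be re-derived from the table with a few lines of arithmetic. The sketch below uses only the default-level rows as posted; the MB/s column assumes 1 MB = 10^6 bytes.

```python
ORIGINAL = 248_046_595  # input size in bytes, from the post

# (compressed bytes, seconds) for the default levels, copied from the table
results = {
    "lzop -2": (87_565_567, 3.9),
    "gzip -6": (60_523_978, 32.0),
    "bzip2": (48_528_470, 214.8),
}

for name, (size, secs) in results.items():
    ratio = ORIGINAL / size              # e.g. "3:1" means ratio is about 3
    rate = ORIGINAL / secs / 10**6       # input MB consumed per second
    print(f"{name:8s} {ratio:5.2f}:1  {rate:6.1f} MB/s")

gzip_vs_lzop = results["gzip -6"][1] / results["lzop -2"][1]
bzip2_vs_gzip = results["bzip2"][1] / results["gzip -6"][1]
print(f"gzip/lzop time: {gzip_vs_lzop:.1f}x, bzip2/gzip time: {bzip2_vs_gzip:.1f}x")
```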

By the way, lzop -3 through lzop -6 produced the same compressed size
as lzop -2. I don't know why lzop -2 got better compression in less
time than lzop -1, but perhaps that's why lzop -2 is the default.

What's clear is that you should just stick with the default for lzop,
which is lzop -2. For higher compression levels, you should use gzip
instead. Similarly, you needn't bother with gzip -9. For higher
compression than the default gzip -6, you should probably use bzip2.
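Since the results depend heavily on the input, it is worth repeating this kind of comparison on your own data. A minimal harness might look like the following sketch. It uses Python's standard zlib and bz2 modules as stand-ins for gzip and bzip2; lzop is left out because it is not in the Python standard library. The function name and the synthetic sample are assumptions, and timings taken through Python will not match the command-line tools.

```python
import bz2
import time
import zlib


def benchmark(data: bytes):
    """Return (name, compressed_bytes, ratio, seconds) for each codec."""
    codecs = {
        "zlib -1": lambda d: zlib.compress(d, 1),
        "zlib -6": lambda d: zlib.compress(d, 6),
        "zlib -9": lambda d: zlib.compress(d, 9),
        "bz2 -9": lambda d: bz2.compress(d, 9),
    }
    rows = []
    for name, compress in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        rows.append((name, len(packed), len(packed) / len(data), elapsed))
    return rows


# Synthetic stand-in input; replace with a real sample of your own data.
sample = b"".join(b"row %06d,some text,%d\n" % (i, i * 7) for i in range(50_000))

for name, size, ratio, secs in benchmark(sample):
    print(f"{name:8s} {size:9d} ({ratio:.3f}) {secs:6.2f}")
```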

mark

Toni

2005-10-17, 6:55 pm

Thanks Jim and Mark

In the end I've stayed with gzip -1. The client did not want to hear
about any compression tool which was not universally known (...by them).

I like compression algorithms and have experimented a bit with them,
mostly of the RLE kind when I worked in real-time embedded systems in a
past life (collecting servo data and similar). Now, with clients who
only use computers as a resource and don't know anything about them, it
is much more difficult to do anything at all.

Thanks again,

Toni

Ignorant

2005-10-17, 6:55 pm


A Java port of the LZO compressor? Where is it?
I used the old Java decompressor to understand the output tokens.

Ignorant

2005-10-17, 6:55 pm


The type of clients you are mentioning know the "REAL" usage of
computers as a tool. So what are you cribbing about? Develop for them.
Don't expect end users to develop skills to match the fancy resources
used by their smart developers.
PS: I am a simula-67 developer who has very recently upgraded his
skills to vb.net.

John Reiser

2005-10-17, 6:55 pm

> Decompression times in seconds, on the default lzop -2, gzip -6, and
> bzip2 output:
>
> lzop 1.7
> gun 2.2
> gzip 3.6
> bzip2 37.7
>
> "gun" is a gzip decompressor that uses the latest version of zlib and
> as a result is faster than gzip.


Please give the absolute version number of zlib that was tested.
"The latest version of zlib" is a relative designation whose meaning
may vary, especially from the point of view of the developer of zlib
decompression [Mark Adler] in contrast to an ordinary user.

According to http://www.zlib.net , as of August 7th, 2005 the latest
version of zlib is 1.2.3, dated July 18, 2005. Is that the one?

> One of these days gzip will be updated accordingly.


A similar proposal for re-implementing gzip using zlib has been
around for several years, but I can find little evidence of action.

Mark Adler

2005-10-17, 6:55 pm

John Reiser wrote:
> According to http://www.zlib.net , as of August 7th, 2005 the latest
> version of zlib is 1.2.3, dated July 18, 2005. Is that the one?


Yes, per the web site that is the latest version of zlib. gun.c is in
the examples directory.

> A similar proposal for re-implementing gzip using zlib has been
> around for several years, but I can find little evidence of action.


That would be because there has been no action.

mark

Earl Colby Pottinger

2005-11-03, 6:55 pm

"Ignorant" <shunya@vsnl.com> :

> The type of clients you are mentioning know the "REAL" usage of
> computers as a tool. So what are you cribbing about? Develop for them.
> Don't expect end users to develop skills to match the fancy resources
> used by their smart developers.
> PS: I am a simula-67 developer who has very recently upgraded his
> skills to vb.net.


This however runs both ways.

If the programmer gets too fancy with his code he could well end up
developing a system that is hard/expensive to maintain if the original
programmer is no longer available. Simpler systems are better in that sense.

However, the client is not the programmer or the available market of
programmers out there. What would you say if the client insists on using a
Bubble Sort because that is what they understand? And no matter how much you
warn them, they insist on that code being used?

I had the same happen to me when I had written an inventory program that used
fixed indexes for speed. The customer insisted on making the indexes as
small as possible to get max. speed (yes, this was a few years ago); I kept
asking to double the size of the indexes to be safe for future expansion.
Guess which company does not exist today? No, it was not because of the
program, but it really was because they did not plan for radical changes in
the boat building industry, they got left in the dust in the last industry
dip - all the other companies around them still exist today.

Earl Colby Pottinger

--
I make public email sent to me! Hydrogen Peroxide Rockets, OpenBeos,
SerialTransfer 3.0, RAMDISK, BoatBuilding, DIY TabletPC. What happened to
the time? http://webhome.idirect.com/~earlcp
Ignorant

2005-11-07, 9:55 pm


Well, all said and done, I have yet to come across a system which was
useful, reasonably sized and easy to maintain. The "easy" has to have
meaning in the context of how much I know about the software collection
anyway.
For a client who liked bubble sort, I would sweet-talk her/him into
parting with a lot of money to develop a super bubble sort. Develop your
own quicksort and have a constrained bubble sort at the last step. The
client will feel great.
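One way to read that suggestion, sketched here purely for illustration: let quicksort stop once partitions shrink below a small cutoff, then finish with a single bubble sort over the whole array. Because every element is already inside its final small block, the bubble sort needs only a handful of passes. The function name and the cutoff of 8 are arbitrary assumptions.

```python
import random


def hybrid_sort(items, cutoff=8):
    """Quicksort down to blocks of `cutoff`, then finish with bubble sort."""
    a = list(items)

    def quick(lo, hi):
        # Leave partitions of `cutoff` elements or fewer unsorted.
        while hi - lo + 1 > cutoff:
            pivot = a[(lo + hi) // 2]
            i, j = lo, hi
            while i <= j:
                while a[i] < pivot:
                    i += 1
                while a[j] > pivot:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            # Recurse into the smaller side, loop on the larger one.
            if j - lo < hi - i:
                quick(lo, j)
                lo = i
            else:
                quick(i, hi)
                hi = j

    quick(0, len(a) - 1)

    # "Constrained" bubble sort: elements are at most one block away from
    # home, so the early-exit loop stops after a few passes.
    n = len(a)
    swapped = True
    while swapped:
        swapped = False
        for k in range(n - 1):
            if a[k] > a[k + 1]:
                a[k], a[k + 1] = a[k + 1], a[k]
                swapped = True
        n -= 1
    return a


data = [random.randrange(1000) for _ in range(200)]
print(hybrid_sort(data) == sorted(data))
```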
