Code Comments
Hello,

Is there a compression library that automatically stops when a specified output size has been reached?

Normally, when compressing, e.g. with zlib, you specify the size of the source buffer, and zlib will tell you how large the output is.

I want to turn this around: I specify the output size, and the compression lib tells me how much of the input data it could compress into the destination buffer.

I think this is related to the issue of compressing streams versus compressing files?

Bram

Bram Stolk <bram@geenspam.sara.nl> wrote in message news:<1100707491.177591@blaat.sara.nl>...

> Normally, when compressing, e.g. with zlib, you
> specify the size of the source buffer, and zlib
> will tell you how large the output is.
>
> I want to turn this around: I specify the output
> size, and the compression lib tells me how much
> of the input data it could compress into the
> destination buffer.

This isn't normally possible, because the source actually has to be compressed before you know how large the output is. Instead of trying to answer your question on how to do it, could you tell us *why* you want to do this?

Hi,

> Is there a compression library that automatically
> stops when a specified output-size has been reached?

That depends a lot on what you're compressing and how. If you're talking about "generic" lossless file compression, then the question doesn't really make a lot of sense. If you truncate the compressed output before everything has been compressed, you lose data and with it the ability to reconstruct the input losslessly. If you generate files longer than necessary, you could have done better. Thus, there's no lossless compressor for which this makes sense.

For lossy compression: yes, clearly. The idea of trading quality against a target compression ratio is one of the building blocks of modern rate-allocation mechanisms as used in some codec designs, for example the EZW (Shapiro's embedded zerotree wavelet) or EBCOT (Embedded Block Coding with Optimized Truncation) algorithms. The goal of all these algorithms is to deliver optimal quality for a given output file size, i.e. "optimization under constraints". They are used in compression standards like JPEG2000 (not in traditional JPEG, though).

> Normally, when compressing, e.g. with zlib, you
> specify the size of the source buffer, and zlib
> will tell you how large the output is.
> I want to turn this around: I specify the output
> size, and the compression lib tells me how much
> of the input data it could compress into the
> destination buffer.

Unless you tell me the type of data you'd like to compress, I can't give a definite answer. For "general purpose" compressors there can be no such thing (except a greedy algorithm that just tries to compress and stops when the size gets too large), simply because there is no given model for the source data, and thus no way of knowing how large the output will be once it is compressed. In other words, there's no way of knowing whether a given file is "easily compressible" except by trying to compress it in the first place. For specific data, e.g. natural images, such models exist and allow (to a certain extent) estimates of the output size.

> I think this is related to the issue of compressing
> streams versus compressing files?

No, not really.

So long,
Thomas

Jim Leonard wrote:
> Bram Stolk <bram@geenspam.sara.nl> wrote in message news:<1100707491.177591@blaat.sara.nl>...
>
> This isn't normally possible because actual compression of the source
> has to occur before you know what the output is. Instead of trying to
> answer your question on how to do it, could you tell us *why* you want
> to do this?

Because I want the compressed data to fit the MTU size of a UDP datagram.

Additionally, I want each datagram to be self-contained, so that its part of the data can be decompressed irrespective of whether the other datagrams arrived or not.

So: lossless compression (of image data) over a lossy network (UDP). Parts of the image that do not survive the network, I accept as lost. Parts that do make it should be decompressible in their own right (self-contained).

bram

"Bram Stolk" <bram@geenspam.sara.nl> wrote in message news:1100800359.335989@blaat.sara.nl...
> Because I want the compressed data to fit the MTU size of a
> UDP datagram.
>
> Additionally, I want to be each datagram selfcontained, so that
> it can decompress its part of the data, irrespective of wether
> the other datagrams arrived or not.
>
> So: lossless compression (of image data), over lossy network (udp).
> Parts of the image that do not survive the network, I take for
> granted. Parts that do make it, should be decompressible in their
> own right (self contained).
>
> bram

zlib will compress until either the input buffer is empty or the output buffer is full. So just set the output buffer to your MTU size and compress as much data as you can. Start a new zlib stream for each packet.

Actually it's not quite this simple, because some data still needs to be flushed when you stop, so you might need to reserve a little space at the end of the packet.

-- Matt Mahoney
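
Editor's sketch of the "one fresh zlib stream per packet" idea above, in Python. The C API lets the caller cap `avail_out` directly; Python's `zlib` module does not expose that, so this sketch instead feeds small chunks and uses `Compress.copy()` to check how large the finished stream would be at each step. The names `fill_packet` and `packetize` and the `step` parameter are hypothetical, and the byte budget is assumed to already exclude any packet header.

```python
import zlib

def fill_packet(data: bytes, start: int, budget: int, step: int = 64):
    """Compress as much of data[start:] as fits into `budget` bytes.

    Returns (packet, consumed): a complete, independently decodable
    zlib stream and the number of input bytes it encodes.
    """
    comp = zlib.compressobj()
    emitted = b""              # output zlib has produced so far
    best, consumed = b"", 0    # last finished stream that still fit
    pos = start
    while pos < len(data):
        chunk = data[pos:pos + step]
        emitted += comp.compress(chunk)
        # Finish a *copy* of the compressor to learn the total size
        # if we stopped right here, without disturbing the original.
        candidate = emitted + comp.copy().flush()
        if len(candidate) > budget:
            break
        pos += len(chunk)
        best, consumed = candidate, pos - start
    if consumed == 0 and start < len(data):
        raise ValueError("budget too small for even one step of input")
    return best, consumed

def packetize(data: bytes, budget: int):
    """Split data into self-contained compressed packets of <= budget bytes."""
    packets, pos = [], 0
    while pos < len(data):
        packet, used = fill_packet(data, pos, budget)
        packets.append(packet)
        pos += used
    return packets
```

Copying the compressor state on every step is costly, so a larger `step` trades packet fill for speed. Because each packet starts a new stream, it cannot reuse history from earlier packets, which costs some compression ratio but keeps every datagram decodable on its own.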

"Matt Mahoney" <matmahoney@yahoo.com> wrote in message news:bW7nd.3016$Qh3.1797@newsread3.news.atl.earthlink.net...
> zlib will compress until either the input buffer is empty or the output
> buffer is full. So just set the output buffer to your MTU size and
> compress as much data as you can. Start a new zlib stream for each packet.
>
> Actually it's not quite this simple because some data needs to be flushed,
> so you might need to reserve a little space at the end of the packet.

Presumably he also needs to reserve space for position data for the decoded image portion.
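
Editor's sketch of what such position data might look like. The layout here (a 4-byte byte offset into the image followed by a 2-byte uncompressed length, both in network byte order) is purely an assumption for illustration; the thread does not specify a header format, and `wrap`/`unwrap` are hypothetical names.

```python
import struct

# Hypothetical header: 4-byte offset into the image, 2-byte uncompressed length.
HEADER = struct.Struct("!IH")

def wrap(offset: int, raw_len: int, payload: bytes) -> bytes:
    """Prefix a compressed payload with its position in the image."""
    return HEADER.pack(offset, raw_len) + payload

def unwrap(datagram: bytes):
    """Return (offset, raw_len, payload) from a received datagram."""
    offset, raw_len = HEADER.unpack_from(datagram)
    return offset, raw_len, datagram[HEADER.size:]
```

With a header like this, the compression budget per datagram shrinks by `HEADER.size` bytes, and the receiver can place each surviving piece without knowing which other datagrams arrived.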

Bram Stolk <bram@geenspam.sara.nl> wrote in message news:<1100800359.335989@blaat.sara.nl>...
> Because I want the compressed data to fit the MTU size of a
> UDP datagram.
>
> Additionally, I want to be each datagram selfcontained, so that
> it can decompress its part of the data, irrespective of wether
> the other datagrams arrived or not.

Your best bet, for speed and simplicity of implementation, is to define a block size equal to the MTU minus the IP and UDP header sizes, and compress each block. Some data will not be compressible, and so will fill the largest possible packet that can be sent. Some data will compress, and you just use a smaller packet to contain the compressed data.

There are standards for IP compression; check RFC 2395 and RFC 3173 for starters.
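
Editor's sketch of the fixed-block approach just described, assuming `payload_size` is whatever remains of the MTU after the IP and UDP headers. The one-byte `RAW`/`DEFLATED` tag is a hypothetical addition so the receiver can tell whether a block was compressed; it is not part of the original suggestion.

```python
import zlib

RAW, DEFLATED = 0, 1  # hypothetical one-byte tags, not from the thread

def encode_block(block: bytes) -> bytes:
    """Compress one block, falling back to raw bytes if that is not smaller."""
    packed = zlib.compress(block)
    if len(packed) < len(block):
        return bytes([DEFLATED]) + packed
    return bytes([RAW]) + block

def decode_block(datagram: bytes) -> bytes:
    """Invert encode_block for a single datagram."""
    tag, body = datagram[0], datagram[1:]
    return zlib.decompress(body) if tag == DEFLATED else body

def encode_all(data: bytes, payload_size: int):
    """Cut data into blocks that fit the payload even when stored raw."""
    block_size = payload_size - 1  # leave room for the tag byte
    return [encode_block(data[i:i + block_size])
            for i in range(0, len(data), block_size)]
```

This never overflows a datagram and needs no search, but it also never packs more than one block's worth of input into a packet, so compressible data yields many small packets rather than fewer full ones.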

As long as your final compressed size is below the output size you specified, simply do your compression, then pad your data with 00s or whatever you want at the end. You could have your compressed stream end with a terminating sequence, and the decompressor would then just ignore the rest of the data.

It really depends on what kind of compression you're looking for, but either way you can pad your data to bring it up to size, and your decompressor would know to ignore anything after the terminating sequence.

You could even use a known compression library, take its output, and then pad it, so you don't need to write the compression yourself. The decompressor should just decompress properly and terminate, and the rest of the data won't be used. If using an existing compressor isn't easy, just write one yourself.

----- Original Message -----
From: "Bram Stolk" <bram@geenspam.sara.nl>
Newsgroups: comp.compression
Sent: Wednesday, November 17, 2004 11:04 AM
Subject: Compression with specified output-size.

> Hello,
>
> Is there a compression library that automatically
> stops when a specified output-size has been reached?
>
> Normally, when compressing, e.g. with zlib, you
> specify the size of the source buffer, and zlib
> will tell you how large the output is.
>
> I want to turn this around: I specify the output
> size, and the compression lib tells me how much
> of the input data it could compress into the
> destination buffer.
>
> I think this is related to the issue of compressing
> streams versus compressing files?
>
> Bram
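
Editor's sketch of the padding idea, using zlib as the "known compression library". A zlib stream carries its own end marker, so Python's `zlib.decompressobj()` stops there and leaves trailing bytes in `unused_data`. The helper names `pad_to` and `unpad_and_decompress` are hypothetical. Note that this only applies when the whole input already compresses to less than the fixed size.

```python
import zlib

def pad_to(packed: bytes, size: int) -> bytes:
    """Pad an already compressed stream with zero bytes up to a fixed size."""
    if len(packed) > size:
        raise ValueError("compressed data exceeds the fixed size")
    return packed + b"\x00" * (size - len(packed))

def unpad_and_decompress(padded: bytes) -> bytes:
    """Decompress a padded stream; bytes after the end marker are ignored."""
    d = zlib.decompressobj()
    out = d.decompress(padded)
    if not d.eof:
        raise ValueError("input ended before the zlib end-of-stream marker")
    return out  # the padding is left behind in d.unused_data
```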

Matt Mahoney wrote:
> zlib will compress until either the input buffer is empty or the output
> buffer is full. So just set the output buffer to your MTU size and compress
> as much data as you can. Start a new zlib stream for each packet.
>
> Actually it's not quite this simple because some data needs to be flushed,
> so you might need to reserve a little space at the end of the packet.

Well, I already tried this approach with bzlib, which I think has the same stream interface?

If I do this, I get BZ_SEQUENCE_ERROR during the second BZ2_bzCompress call, the one meant to write out the remaining data.

Once you've promised bzlib a certain number of input bytes, you cannot withdraw that promise and finish up by resetting avail_in to 0.

Bram

> -- Matt Mahoney

AberAber wrote:
> As long as your final compression is lower than your output size specified,
> simply do your compression, then pad your data with 00s or whatever you want
> at the end. You could have your compression terminate on a sequence and it
> would then just ignore the rest of the data.

No, this is not the issue I'm trying to solve. The complete text, once compressed, is larger than my desired output size, so I want to know how much of my text would fit when compressed.

E.g.: the source text is 1 MB, I want lossless compression, but my compressed output should be 8000 bytes. Let's say the 1 MB compresses to 300 KB. What I want to know: how many bytes of my 1 MB will fit, when compressed, into the 8000 bytes?

I've given it more thought, and clearly this is a very difficult problem. A non-trivial compressor needs to 'look ahead' into the entire 1 MB. Only something as simple as RLE can process a stream without looking ahead and stop at any time when the output buffer is full. When stopped, it has a valid compressed stream.

Unfortunately, my guess is that photographic images will do very poorly with RLE.

Bram
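
Editor's sketch of the RLE property mentioned above: an encoder that stops as soon as the output budget is reached and reports how many input bytes it consumed. The (count, value) byte-pair format and the names `rle_fill` and `rle_decode` are assumptions chosen for brevity, not a format from the thread.

```python
def rle_fill(data: bytes, start: int, budget: int):
    """Encode runs as (count, value) byte pairs until `budget` is reached.

    Returns (packed, consumed). The output is valid wherever it stops,
    because every pair is complete on its own.
    """
    out = bytearray()
    pos = start
    while pos < len(data) and len(out) + 2 <= budget:
        value = data[pos]
        run = 1
        # Extend the run, capped at 255 so the count fits in one byte.
        while run < 255 and pos + run < len(data) and data[pos + run] == value:
            run += 1
        out += bytes((run, value))
        pos += run
    return bytes(out), pos - start

def rle_decode(packed: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for i in range(0, len(packed), 2):
        out += bytes([packed[i + 1]]) * packed[i]
    return bytes(out)
```

For example, `rle_fill(b"a" * 300 + b"b" * 5 + b"cde", 0, 6)` returns six bytes covering the first 305 input bytes. On data without long runs, such as photographic images, this format doubles the size instead of reducing it, which matches the concern raised in the post.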