Code Comments

Compression with specified output-size.
Hello,

Is there a compression library that automatically
stops when a specified output-size has been reached?

Normally, when compressing, e.g. with zlib, you
specify the size of the source buffer, and zlib
will tell you how large the output is.

I want to turn this around: I specify the output
size, and the compression lib tells me how much
of the input data it could compress into the
destination buffer.

I think this is related to the issue of compressing
streams versus compressing files?

Bram


Bram Stolk
11-17-04 08:55 PM


Re: Compression with specified output-size.
Bram Stolk <bram@geenspam.sara.nl> wrote in message news:<1100707491.177591@blaat.sara.nl>...
> Normally, when compressing, e.g. with zlib, you
> specify the size of the source buffer, and zlib
> will tell you how large the output is.
>
> I want to turn this around: I specify the output
> size, and the compression lib tells me how much
> of the input data it could compress into the
> destination buffer.

This isn't normally possible because actual compression of the source
has to occur before you know what the output is.  Instead of trying to
answer your question on how to do it, could you tell us *why* you want
to do this?

Jim Leonard
11-18-04 08:55 PM


Re: Compression with specified output-size.
Hi,

> Is there a compression library that automatically
> stops when a specified output-size has been reached?

Depends a lot on what you're compressing and how. If you're talking
about "generic" lossless file compression, then this question doesn't
really make a lot of sense. If you truncate the compressed output
before everything has been compressed, you lose data and the
ability to reconstruct the input losslessly. If you generate files
longer than necessary, you could have done better. Thus, there's no
lossless compressor for which this question makes sense.

For lossy compression: Yes, clearly. The concept of trading quality
against a target compression ratio is one of the building blocks of
modern rate-allocation mechanisms as they are used in some codec
designs, for example in the EZW (Shapiro's Embedded Zerotree Wavelet)
or EBCOT (Embedded Block Coding with Optimized Truncation)
algorithms. The goal of all these algorithms is to deliver
optimal quality for a given output file size, i.e. "optimization
under constraints". They are used in compression standards like
JPEG2000 (not in traditional JPEG, though).
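
To make the "truncate to a target size" idea concrete, here is a minimal sketch of an embedded bit stream. It is not EZW or EBCOT; it only assumes 8-bit samples and sends the most significant bit plane first, so that any prefix of the stream decodes to a coarser approximation. The function names are made up for this illustration.

```python
def encode_bitplanes(samples):
    """Pack 8-bit samples plane by plane, most significant plane first."""
    bits = [(s >> plane) & 1 for plane in range(7, -1, -1) for s in samples]
    out = bytearray()
    for k in range(0, len(bits), 8):
        group = bits[k:k + 8]
        byte = 0
        for b in group:
            byte = (byte << 1) | b
        out.append(byte << (8 - len(group)))  # left-align a short last group
    return bytes(out)


def decode_bitplanes(stream, n):
    """Rebuild n samples from a possibly truncated stream; missing bits read as 0."""
    samples = [0] * n
    for idx in range(min(len(stream) * 8, 8 * n)):
        bit = (stream[idx // 8] >> (7 - idx % 8)) & 1
        samples[idx % n] |= bit << (7 - idx // n)
    return samples
```

Cutting the encoded stream at any byte boundary keeps it decodable; the more bytes survive, the closer the reconstruction gets to the original samples.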

> Normally, when compressing, e.g. with zlib, you
> specify the size of the source buffer, and zlib
> will tell you how large the output is.

> I want to turn this around: I specify the output
> size, and the compression lib tells me how much
> of the input data it could compress into the
> destination buffer.

Unless you give me the type of data you'd like to compress, I can't
give a definite answer. For "general purpose" compressors there can
be no such thing (except a greedy algorithm that just tries to
compress and stops when the size gets too large) just because there is
no given model for the source data, and thus there is no way of
knowing how large the output would be when getting compressed. In
other words, there's no way of knowing whether a given file is "easily
compressible" except by trying to compress it in the first place. For
specific data, e.g. natural images, such models exist and allow (to a
certain extent) estimates of the output size.
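
The "just try it" approach can be sketched as a search over prefix lengths. This is only an illustration, assuming Python's standard zlib module and a hypothetical helper name; the binary search assumes the compressed size grows with the prefix length, which usually holds but is not guaranteed, so the result always fits but may not be the absolute maximum.

```python
import zlib


def largest_prefix_that_fits(data: bytes, budget: int) -> int:
    """Return a prefix length n with len(zlib.compress(data[:n])) <= budget."""
    if len(zlib.compress(b"")) > budget:
        raise ValueError("budget too small for even an empty zlib stream")
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(zlib.compress(data[:mid])) <= budget:
            lo = mid        # this prefix fits, try a longer one
        else:
            hi = mid - 1    # too large, try a shorter one
    return lo
```

Each probe compresses the prefix from scratch, so the cost is roughly log2(len(data)) full compressions. That is wasteful, but it shows that the answer only comes from actually compressing.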

> I think this is related to the issue of compressing
> streams versus compressing files?

No, not really.

So long,
Thomas

Thomas Richter
11-18-04 08:55 PM


Re: Compression with specified output-size.
Jim Leonard wrote:
> Bram Stolk <bram@geenspam.sara.nl> wrote in message news:<1100707491.177591@blaat.sara.nl>...
>
> This isn't normally possible because actual compression of the source
> has to occur before you know what the output is.  Instead of trying to
> answer your question on how to do it, could you tell us *why* you want
> to do this?

Because I want the compressed data to fit the MTU size of a
UDP datagram.

Additionally, I want each datagram to be self-contained, so that
it can decompress its part of the data, irrespective of whether
the other datagrams arrived or not.

So: lossless compression (of image data), over lossy network (udp).
Parts of the image that do not survive the network, I take for
granted. Parts that do make it, should be decompressible in their
own right (self-contained).

bram

Bram Stolk
11-18-04 08:55 PM


Re: Compression with specified output-size.
"Bram Stolk" <bram@geenspam.sara.nl> wrote in message
news:1100800359.335989@blaat.sara.nl...
> Jim Leonard wrote:
>
> Because I want the compressed data to fit the MTU size of a
> UDP datagram.
>
> Additionally, I want each datagram to be self-contained, so that
> it can decompress its part of the data, irrespective of whether
> the other datagrams arrived or not.
>
> So: lossless compression (of image data), over lossy network (udp).
> Parts of the image that do not survive the network, I take for
> granted. Parts that do make it, should be decompressible in their
> own right (self-contained).
>
>     bram

zlib will compress until either the input buffer is empty or the output
buffer is full.  So just set the output buffer to your MTU size and compress
as much data as you can.  Start a new zlib stream for each packet.

Actually it's not quite this simple because some data needs to be flushed,
so you might need to reserve a little space at the end of the packet.
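
A rough sketch of this idea in Python follows. Python's zlib module does not expose the output buffer size the way the C API's avail_out does, so this version feeds small input chunks and, before committing each one, checks on a copy of the compressor whether the finished stream would still fit. The function name, the 64-byte step and the return shape are assumptions made for the example.

```python
import zlib


def fill_packet(data: bytes, start: int, mtu: int, step: int = 64):
    """Compress as much of data[start:] as fits into one complete zlib stream
    of at most mtu bytes. Returns (packet, consumed_input_bytes)."""
    comp = zlib.compressobj()
    body = b""       # compressed output committed so far (stream not finished)
    packet = None    # last finished stream known to fit
    pos = start
    while pos < len(data):
        chunk = data[pos:pos + step]
        trial = comp.copy()
        trial_body = body + trial.compress(chunk)
        # Finishing is done on a throwaway copy so "trial" can keep accepting input.
        finished = trial_body + trial.copy().flush(zlib.Z_FINISH)
        if len(finished) > mtu:
            break
        comp, body, packet = trial, trial_body, finished
        pos += len(chunk)
    if packet is None:
        raise ValueError("mtu too small to hold even one input chunk")
    return packet, pos - start
```

Calling fill_packet repeatedly, advancing start by the consumed count each time, yields packets that each decompress on their own with zlib.decompress. Copying the compressor state at every step is slow; it is used here only to keep the sketch short.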

-- Matt Mahoney



Matt Mahoney
11-19-04 08:55 AM


Re: Compression with specified output-size.
"Matt Mahoney" <matmahoney@yahoo.com> wrote in message
news:bW7nd.3016$Qh3.1797@newsread3.news.atl.earthlink.net...
> "Bram Stolk" <bram@geenspam.sara.nl> wrote in message
 
>
> zlib will compress until either the input buffer is empty or the output
> buffer is full.  So just set the output buffer to your MTU size and compress
> as much data as you can.  Start a new zlib stream for each packet.
>
> Actually it's not quite this simple because some data needs to be flushed,
> so you might need to reserve a little space at the end of the packet.
>
Presumably he also needs to reserve space for position data for
the decoded image portion.
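
One way to picture that position data, as a sketch only: prefix each packet with the byte offset of the image portion it carries, so the receiver can place whatever arrives and leave gaps for what was lost. The 4-byte big-endian header layout and the helper names below are assumptions, not something specified in this thread.

```python
import struct
import zlib

HEADER = struct.Struct("!I")  # assumed layout: 4-byte big-endian byte offset


def make_packet(offset: int, chunk: bytes) -> bytes:
    """Build a self-contained packet: offset header plus its own zlib stream."""
    return HEADER.pack(offset) + zlib.compress(chunk)


def parse_packet(packet: bytes):
    """Return (offset, decompressed chunk) for one packet."""
    (offset,) = HEADER.unpack_from(packet)
    return offset, zlib.decompress(packet[HEADER.size:])


def reassemble(packets, total_len: int, fill: int = 0) -> bytes:
    """Place every received chunk at its offset; lost regions keep the fill value."""
    image = bytearray([fill]) * total_len
    for packet in packets:
        offset, chunk = parse_packet(packet)
        image[offset:offset + len(chunk)] = chunk
    return bytes(image)
```

The header bytes count against the datagram size, so the compressed payload budget shrinks by HEADER.size.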



Pete Fraser
11-19-04 08:55 AM


Re: Compression with specified output-size.
Bram Stolk <bram@geenspam.sara.nl> wrote in message news:<1100800359.335989@blaat.sara.nl>...
> Because I want the compressed data to fit the MTU size of a
> UDP datagram.
>
> Additionally, I want each datagram to be self-contained, so that
> it can decompress its part of the data, irrespective of whether
> the other datagrams arrived or not.

Your best bet, for speed and simplicity of implementation, is to
define a block size equal to the MTU minus the IP and UDP header
sizes, and compress each block.  Some data will not be compressible,
and so will fill the largest possible packet that can be sent.  Some
data will compress, and you just use a smaller packet to contain the
compressed data.
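
A minimal sketch of that fixed-block scheme, under stated assumptions: a 1500-byte Ethernet MTU, a 20-byte IPv4 header without options and an 8-byte UDP header, plus a one-byte flag (invented for this example) that tells the receiver whether the block was compressed or sent raw.

```python
import zlib

MTU = 1500                                  # assumed Ethernet MTU
PAYLOAD_LIMIT = MTU - 20 - 8                # minus IPv4 and UDP headers
BLOCK_SIZE = PAYLOAD_LIMIT - 1              # leave room for the flag byte

RAW, DEFLATED = b"\x00", b"\x01"            # hypothetical flag values


def encode_block(block: bytes) -> bytes:
    """Send the block compressed if that is smaller, otherwise raw."""
    if len(block) > BLOCK_SIZE:
        raise ValueError("block larger than BLOCK_SIZE")
    packed = zlib.compress(block)
    if len(packed) < len(block):
        return DEFLATED + packed
    return RAW + block


def decode_block(payload: bytes) -> bytes:
    """Invert encode_block using the leading flag byte."""
    flag, body = payload[:1], payload[1:]
    return zlib.decompress(body) if flag == DEFLATED else body
```

Because incompressible blocks fall back to raw bytes, no payload exceeds PAYLOAD_LIMIT; the price is that compressible blocks leave part of the datagram unused.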

There are standards for IP compression; check RFC 2395 and RFC 3173
for starters.

Jim Leonard
11-19-04 08:55 AM


Re: Compression with specified output-size.
As long as your final compressed size is lower than the output size you
specified, simply do your compression, then pad your data with 00s or
whatever you want at the end.  You could have your compressed stream end
with a terminating sequence, and the decompressor would then just ignore
the rest of the data.

It really depends on what kind of compression you're looking for, but either
way you can pad your data to get it to the size, and your decompressor would
know to ignore anything after the terminating sequence.

You could even use known compression libraries, take the output, and then
pad it so you don't need to write the compression yourself.  The decompressor
should just decompress properly and terminate, and the rest of the data
won't be used.  If using an existing compression library isn't easy, just
write the compressor yourself.
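
With zlib this works without any custom terminator, because a zlib stream marks its own end. A small sketch using Python's standard zlib module (the helper names are made up for this example):

```python
import zlib


def compress_padded(data: bytes, size: int, pad: bytes = b"\x00") -> bytes:
    """Compress data and pad the result with filler bytes up to exactly size."""
    packed = zlib.compress(data)
    if len(packed) > size:
        raise ValueError("compressed data does not fit in the requested size")
    return packed + pad * (size - len(packed))


def decompress_padded(packet: bytes) -> bytes:
    """Decompress up to the end of the zlib stream and ignore the padding."""
    d = zlib.decompressobj()
    out = d.decompress(packet)
    if not d.eof:
        raise ValueError("zlib stream is incomplete")
    # Whatever follows the end of the stream is left in d.unused_data.
    return out
```

This only helps when the whole input already compresses below the target size; it does not answer how much input fits when it does not.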





AberAber
11-19-04 08:55 AM


Re: Compression with specified output-size.
Matt Mahoney wrote:

>
> zlib will compress until either the input buffer is empty or the output
> buffer is full.  So just set the output buffer to your MTU size and compress
> as much data as you can.  Start a new zlib stream for each packet.
>
> Actually it's not quite this simple because some data needs to be flushed,
> so you might need to reserve a little space at the end of the packet.

Well, I already tried this approach with bzlib,
which I think has the same stream interface?

If I do this, I get:
BZ_SEQUENCE_ERROR during the second BZ2_bzCompress call to write
out the remaining stuff.

Once you've promised bzlib a certain number of input bytes, you cannot
withdraw that promise and finish up by resetting avail_in to 0.

Bram


Bram Stolk
11-19-04 01:55 PM


Re: Compression with specified output-size.
AberAber wrote:
> As long as your final compression is lower than your output size specified,
> simply do your compression, then pad your data with 00s or whatever you want
> at the end.  You could have your compression terminate on a sequence and it
> would then just ignore the rest of the data.

No, this is not the issue I am trying to solve.
The complete text, once compressed, is larger than my desired output size,
so I want to know how much of my text could fit when compressed.

E.g.: the source text is 1 MB, I want lossless compression, but my
compressed output should be 8000 bytes. Let's say the 1 MB compresses
to 300 KB.

What I want to know: how many bytes of my 1 MB will fit, when
compressed, into the 8000 bytes?

I've given it more thought, and clearly this is a very difficult problem.
A non-trivial compressor needs to 'look ahead' into the entire
1 MB. Only something as simple as RLE can process a stream without
looking ahead, and stop at any time when the output buffer is
full. When stopped, it has a valid compressed stream.

Unfortunately, my guess is that photographic images will do very
poorly on RLE.
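
For reference, a bounded RLE of the kind described above might look like the following sketch. It assumes a simple (count, value) byte-pair format with runs capped at 255, which is just one of many possible RLE layouts.

```python
def rle_encode_bounded(data: bytes, out_size: int):
    """Run-length encode data as (count, value) pairs, stopping before the
    output would exceed out_size. Returns (encoded, consumed_input_bytes)."""
    out = bytearray()
    i = 0
    while i < len(data) and len(out) + 2 <= out_size:
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out), i


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)
```

Whenever the encoder stops, the output is a valid stream for exactly the first consumed bytes of input, which is the property asked for at the start of the thread.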

Bram


Bram Stolk
11-19-04 01:55 PM

