Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

IMPROVED SECOND STAGE COMPRESSION ALGORITHM?
hi ,
I am doing a project on compression .can any one suggest any good
compression algortims other than MFT,WFC,SIF,RLE_EXP ,RLE_BIT  which
can be used in the place of second stage algorithms to improve
compression ratio in an effecient time .if you find any new algorithms
please let me know,which will be very much helpfull to me .thanking
you
your's friendly
bharath

Report this thread to moderator Post Follow-up to this message
Old Post
BHARATH
10-11-04 08:55 AM


Re: IMPROVED SECOND STAGE COMPRESSION ALGORITHM?
"BHARATH" <bharathkj@hotmail.com> wrote in message
news:57dda274.0410102319.33baa2b1@posting.google.com...
> hi ,
>     I am doing a project on compression .can any one suggest any good
> compression algortims other than MFT,WFC,SIF,RLE_EXP ,RLE_BIT  which
> can be used in the place of second stage algorithms to improve
> compression ratio in an effecient time .if you find any new algorithms
> please let me know,which will be very much helpfull to me .thanking
> you
>                 your's friendly
>                    bharath

What is the first stage?  Burrows-Wheeler?

-- Matt Mahoney



Report this thread to moderator Post Follow-up to this message
Old Post
Matt Mahoney
10-12-04 01:55 AM


Re: IMPROVED SECOND STAGE COMPRESSION ALGORITHM?
"Matt Mahoney" <matmahoney@yahoo.com> wrote in message news:<AgFad.722$SZ5.140@newsread2.ne
ws.atl.earthlink.net>...
> "BHARATH" <bharathkj@hotmail.com> wrote in message
> news:57dda274.0410102319.33baa2b1@posting.google.com... 
>
> What is the first stage?  Burrows-Wheeler?
>
> -- Matt Mahoney

yes i am using BWT in the first stage.Is their any good compression
algorithms which can be used in the second stage .it will be more
helpfull if i get the combination of algorthims to be used in the
second stage.Is their any transformation algorithm other than bwt
which can transform the data efficiently when compare to BWT?thanking
you

Report this thread to moderator Post Follow-up to this message
Old Post
BHARATH
10-12-04 01:55 PM


Re: IMPROVED SECOND STAGE COMPRESSION ALGORITHM?
On 12 Oct 2004 04:29:29 -0700, bharathkj@hotmail.com (BHARATH) wrote:
 
>
>yes i am using BWT in the first stage.Is their any good compression
>algorithms which can be used in the second stage .it will be more


Best I've come up with (so far) is to use bwt, then move-to-front,
and then arithmetic compression.  The size of the bwt block might
also affect the final compression ratio ... not entirely sure.
I've got my bet block size at 8k right now.

Still does not do as good as pkzip though.  :(


Report this thread to moderator Post Follow-up to this message
Old Post
Bob
10-15-04 08:55 AM


Re: IMPROVED SECOND STAGE COMPRESSION ALGORITHM?
"Bob" <rock@almost.bestweb.net> wrote in message
 news:pesqm0hbbaau9idf1mftmsrndtsa3cdabl@
4ax.com...
> On 12 Oct 2004 04:29:29 -0700, bharathkj@hotmail.com (BHARATH) wrote:
> 
>
>
> Best I've come up with (so far) is to use bwt, then move-to-front,
> and then arithmetic compression.  The size of the bwt block might
> also affect the final compression ratio ... not entirely sure.
> I've got my bet block size at 8k right now.
>
> Still does not do as good as pkzip though.  :(
>

A bigger block will give you better compression, especially for text.  8K is
pretty small.  PKZIP gets better compression because it uses a bigger
context window (not sure how big, but gzip uses 32K).  In their original
paper, Burrows and Wheeler tested on the 103 MB Hector corpus and got the
best compression using one block for the entire input.
http://gatekeeper.dec.com/pub/DEC/S...ts/SRC-124.ps.Z

For modeling the transformed stream, I suspect a 1/t model would work well,
where t is the distance back in the stream.  For example, if your stream is
...ABCD, then you would assign relative probabilities of A=1/4, B=1/3,
C=1/2, D=1 for the next symbol.  I haven't tried it, though.

-- Matt Mahoney




Report this thread to moderator Post Follow-up to this message
Old Post
Matt Mahoney
10-15-04 08:55 AM


Re: IMPROVED SECOND STAGE COMPRESSION ALGORITHM?
On Fri, 15 Oct 2004 02:30:20 GMT, "Matt Mahoney"
<matmahoney@yahoo.com> wrote:

>
>"Bob" <rock@almost.bestweb.net> wrote in message
> news:pesqm0hbbaau9idf1mftmsrndtsa3cdabl@
4ax.com...
 

oops,  should be "bwt block size", not "bet block size".  My mistake.
 
>
>A bigger block will give you better compression, especially for text.  8K i
s
>pretty small.

OK.  I'll have to try bigger block sizes.

>PKZIP gets better compression because it uses a bigger
>context window (not sure how big, but gzip uses 32K).

I didn't think PKzip used BWT.  Dunno about gzip.

>In their original
>paper, Burrows and Wheeler tested on the 103 MB Hector corpus and got the
>best compression using one block for the entire input.
>http://gatekeeper.dec.com/pub/DEC/S...ts/SRC-124.ps.Z
>

More support for the idea of a larger BWT blocksize.  Thank
you.  I will check out that link.

>For modeling the transformed stream, I suspect a 1/t model would work well,
>where t is the distance back in the stream.  For example, if your stream is
>...ABCD, then you would assign relative probabilities of A=1/4, B=1/3,
>C=1/2, D=1 for the next symbol.  I haven't tried it, though.
>

Huh?  You lost me with this one.




Report this thread to moderator Post Follow-up to this message
Old Post
Bob
10-18-04 01:55 AM


Re: IMPROVED SECOND STAGE COMPRESSION ALGORITHM?
Bob wrote:
)>A bigger block will give you better compression, especially for text.  8K 
is
)>pretty small.
)
) OK.  I'll have to try bigger block sizes.

When he sais 'bigger block size' he means like a few megabytes in size.
As a rule of thumb, doubling the block size gives you in the order of 1%
better compression.  That is, using 8MB blocksize versus 4MB on the linux
kernel source gave me roughly a 1% improvement.

)>PKZIP gets better compression because it uses a bigger
)>context window (not sure how big, but gzip uses 32K).
)
) I didn't think PKzip used BWT.  Dunno about gzip.

It doesn't, it uses a 32K sliding context window.
But that's still (somewhat) comparable.

P.S. bzip2 uses a blocksize of between 100k and 900k depending on the
commandline flag.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Report this thread to moderator Post Follow-up to this message
Old Post
Willem
10-18-04 01:55 AM


Re: IMPROVED SECOND STAGE COMPRESSION ALGORITHM?
"Bob" <rock@almost.bestweb.net> wrote in message
 news:oq25n090h0l7bkm6qve11dvigrcilmst5d@
4ax.com...
> On Fri, 15 Oct 2004 02:30:20 GMT, "Matt Mahoney"
> <matmahoney@yahoo.com> wrote: 
>
> I didn't think PKzip used BWT.  Dunno about gzip.

They both use Limpel-Ziv.  But there is a compression-memory tradeoff in
just about all compression algorithms. 
well, 
is 
>
> Huh?  You lost me with this one.

You want to use a locally adaptive order 0 model.  The move-to-front model
is good for BWT because the probability of a symbol is inversely related to
the time since it was last seen.  For example, if the first 4 symbols in
your BWT stream was ABCD, then the MTF model would assign the highest
probability to D since it was seen last, followed by C, B, and A in that
order.  Now if the stream was AAABAABBCD, then the MTF model would assign
exactly the same probabilities since they would be in the same position in
the queue.  What I am suggesting is a model that also takes frequency into
account, i.e. raise the probabilities of A and B to account for their higher
frequency, but in inverse proportion to the time since they were last seen,
for example:

A = 1/10 + 1/9 + 1/8 + 1/6 + 1/5
B = 1/7 + 1/4 + 1/3
C = 1/2
D = 1

I can't say that this would work because I haven't tried it, just suggesting
an experiment.  It probably isn't done this way because there isn't an
efficient way to compute it that I can think of.

-- Matt Mahoney



Report this thread to moderator Post Follow-up to this message
Old Post
Matt Mahoney
10-18-04 01:55 AM


Sponsored Links




Last Thread Next Thread Next
Search this forum -> 
Post New Thread

Compression archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 05:46 PM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.