Code Comments

Arithmetic compression
Hello!!

I have an arithmetic compressor that lets me set, at initialization,
the number of symbols to compress. It is a C# port that I wrote of
Mark Nelson's C code. I have run a lot of tests and it works fine.

My question is: when I take a lot of bytes (say, 100000) and push them
into the compressor, I get a compression ratio of, say, 50%, but if I push
the same bytes in bit by bit I get more compression (say, 60%).

Is this OK? If so, is there another way to determine the best
symbol size than trial and error?

Thanks a lot.

Federico

Federico
10-22-04 08:55 PM


Re: Arithmetic compression
On 22 Oct 2004 09:12:34 -0700, Federico wrote:

> My question is: when I take a lot of bytes (say, 100000) and push them
> into the compressor, I get a compression ratio of, say, 50%, but if I push
> the same bytes in bit by bit I get more compression (say, 60%).
What do you mean by "pushing into the compressor"?

> Is this OK? If so, is there another way to determine the best
> symbol size than trial and error?
No, this is odd. If you use the same model on the same files, you should be
getting the same ratio IMHO.

Eric

--
Eric Bodden, ICQ: 12656220, http://www.bodden.de, PGP: BB465582
Arithmetic Coding - educational example code and more
http://ac.bodden.de/

Eric Bodden
10-22-04 08:55 PM


Re: Arithmetic compression
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Eric Bodden <newsserver_mails@bodden.de> wrote:
> On 22 Oct 2004 09:12:34 -0700, Federico wrote: 
> What do you mean by "pushing into the compressor"? 
> No, this is odd. If you use the same model on the same files, you should be
> getting the same ratio IMHO.
> Eric

Why odd? When we use the same model on the same "random source", we
should get the same ratio. But here we have a specific file, which is
just a sample from a random source, so we can't learn anything definite
about the original random source.

For example, a file containing 0x01 0x02 0x04 0x08 0x10
could be a sample from this source:

0x01	probability = 0.2
0x02	probability = 0.2
0x04	probability = 0.2
0x08	probability = 0.2
0x10	probability = 0.2

We can't compress this source because it is a uniform distribution.

It could also be a sample from another source:

0	probability = 35/40
1	probability = 5/40

which can be compressed well.
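
To put rough numbers on this, here is a minimal Python sketch. It assumes an adaptive order-0 model whose counts all start at 1, and it measures the ideal code length rather than running a real arithmetic coder, so it may not match any particular implementation. The names adaptive_cost_bits and split_symbols are made up for illustration.

```python
import math


def split_symbols(data, width):
    """Split bytes into width-bit symbols, most significant bits first.

    width must divide 8 (1, 2, 4 or 8).
    """
    per_byte = 8 // width
    mask = (1 << width) - 1
    symbols = []
    for byte in data:
        for i in range(per_byte - 1, -1, -1):
            symbols.append((byte >> (i * width)) & mask)
    return symbols


def adaptive_cost_bits(symbols, alphabet_size):
    """Ideal code length, in bits, under an adaptive order-0 model.

    Assumption: every symbol starts with a count of 1, and the count of a
    symbol is incremented after it has been coded.
    """
    counts = {}
    total = 0.0
    for seen, sym in enumerate(symbols):
        p = (counts.get(sym, 0) + 1) / (seen + alphabet_size)
        total -= math.log2(p)
        counts[sym] = counts.get(sym, 0) + 1
    return total


if __name__ == "__main__":
    data = bytes([0x01, 0x02, 0x04, 0x08, 0x10])
    for width in (1, 2, 4, 8):
        cost = adaptive_cost_bits(split_symbols(data, width), 1 << width)
        print(f"{width}-bit symbols: {cost:.2f} bits")
```

For the five-byte file above this gives about 24.7 bits when the data is treated as 40 one-bit symbols and about 40.1 bits when it is treated as five 8-bit symbols. Under this assumption the 256-symbol model simply has not seen enough data to learn anything.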

- --
PaulLiu(劉穎駿)
E-mail address:PaulLiu.bbs@bbs.cis.nctu.edu.tw
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBej/ roQj7xTSiaUYRAqV9AJoDv7dKQ3bQ5MusdAXppWO
zd1FWlgCbBl2r
S23KZZ5/YG0zpTz1dEKYgNQ=
=yl6E
-----END PGP SIGNATURE-----

Ying-Chun Liu
10-23-04 01:55 PM


Re: Arithmetic compression
Ying-Chun Liu <PaulLiu.bbs@bbs.cis.nctu.edu.tw> wrote in message news:<cldfar$hc3$1@jupiter.ttn.net>...

> For example, a file containing 0x01 0x02 0x04 0x08 0x10
> could be a sample from this source:
>
> 	0x01	probability = 0.2
> 	0x02	probability = 0.2
> 	0x04	probability = 0.2
> 	0x08	probability = 0.2
> 	0x10	probability = 0.2
>
> We can't compress this source because it is a uniform distribution.
>
> It could also be a sample from another source:
>
> 	0	probability = 35/40
> 	1	probability = 5/40
>
> which can be compressed well.

Good example, but it's not only applicable to arithmetic encoders.  I
actually took advantage of something similar to that when I wrote a
VLSI sim dump database: when bitvectors are blasted apart in the value
"string" table, values would compress much better as you could
compress bit trains that were shifted off of byte boundaries and
wouldn't have been all that compressible otherwise:

let x be some long stream of bits, then we're presented with
x in one instance
and prefix bits + x + suffix bits in another, etc.

...this worked especially well with LZ/BWT compressor postprocessors
as the data itself isn't really character-based data and the frequent
(sub)string matches were able to overcome the 8x expansion from packed
characters.  As always, YMMV.
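
The effect can be reproduced in a small Python sketch. The setup is an assumption made for illustration: x is a random bit stream, each record is x wrapped in a few random prefix and suffix bits so that it lands on a different bit offset, and zlib stands in for the LZ stage. The helper names are hypothetical.

```python
import random
import zlib


def pack_bits(bits):
    """Pack a list of 0/1 ints into bytes, MSB first, zero-padding the tail."""
    padded = bits + [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def blast_bits(bits):
    """One ASCII character ('0' or '1') per bit: the 8x expansion."""
    return "".join(map(str, bits)).encode("ascii")


def make_records(rng, x_len=16000, shifts=(0, 3, 5, 6)):
    """Build records of the form prefix bits + x + suffix bits."""
    x = [rng.getrandbits(1) for _ in range(x_len)]
    records = []
    for k in shifts:
        prefix = [rng.getrandbits(1) for _ in range(k)]
        suffix = [rng.getrandbits(1) for _ in range(k)]
        records.append(prefix + x + suffix)
    return records


def compare(records):
    """Return (compressed size of packed form, compressed size of blasted form)."""
    packed = b"".join(pack_bits(r) for r in records)
    blasted = b"".join(blast_bits(r) for r in records)
    return len(zlib.compress(packed, 9)), len(zlib.compress(blasted, 9))


if __name__ == "__main__":
    packed_size, blasted_size = compare(make_records(random.Random(0)))
    print("packed, compressed: ", packed_size, "bytes")
    print("blasted, compressed:", blasted_size, "bytes")
```

On input like this the one-character-per-bit form should come out smaller after compression, because the repeated x becomes a plain substring match instead of four unrelated byte sequences. With data that has no shifted repeats, the 8x expansion is unlikely to pay off.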

-t

Anthony J Bybell
10-24-04 01:55 AM


Re: Arithmetic compression
Ying-Chun Liu <PaulLiu.bbs@bbs.cis.nctu.edu.tw> writes:

> Eric Bodden <newsserver_mails@bodden.de> wrote: 
>
> Why odd? When we use the same model on the same "random source", we
> should get the same ratio. But here we have a specific file, which is
> just a sample from a random source, so we can't learn anything definite
> about the original random source.
>
> For example, a file containing 0x01 0x02 0x04 0x08 0x10
> could be a sample from this source:
>
> 	0x01	probability = 0.2
> 	0x02	probability = 0.2
> 	0x04	probability = 0.2
> 	0x08	probability = 0.2
> 	0x10	probability = 0.2
>
> We can't compress this source because it is a uniform distribution.

Nonsense.

P(0x00) = 0.0
P(0x03) = 0.0
P(0x05) = 0.0
P(0x06) = 0.0
...

Phil

--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
... liver.   -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'

Phil Carmody
10-24-04 01:55 PM


Re: Arithmetic compression
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote:
> Ying-Chun Liu <PaulLiu.bbs@bbs.cis.nctu.edu.tw> writes: 
> Nonsense.
> P(0x00) = 0.0
> P(0x03) = 0.0
> P(0x05) = 0.0
> P(0x06) = 0.0

The symbols themselves have no inherent meaning.
What I meant is that you can choose 0x01 0x02 0x04 0x08 0x10 as the symbols,
or 0 and 1 as the symbols, to represent the same data.
The entropy function H(S) then gives a different value for each choice.

H(S) = \sum_{s} -p(s) \log p(s)
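
As a quick check of that claim, the following Python sketch evaluates H(S) (with log base 2, an assumption) on the five-byte example from earlier in the thread, once with bytes as symbols and once with single bits as symbols. It uses empirical frequencies only and ignores the cost of describing the model; the function name is made up for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Empirical order-0 entropy: the sum over s of -p(s) * log2 p(s)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    data = bytes([0x01, 0x02, 0x04, 0x08, 0x10])

    # Same data, two choices of symbol: whole bytes, or single bits (MSB first).
    as_bytes = list(data)
    as_bits = [(b >> i) & 1 for b in data for i in range(7, -1, -1)]

    for name, syms in (("bytes", as_bytes), ("bits", as_bits)):
        h = entropy_bits_per_symbol(syms)
        print(f"{name}: {len(syms)} symbols, H = {h:.4f} bits/symbol")
```

This gives log2(5), about 2.32 bits per symbol, for the byte view and about 0.54 bits per symbol for the bit view.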

- --
PaulLiu(劉穎駿)
E-mail address:PaulLiu.bbs@bbs.cis.nctu.edu.tw
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

 iD8DBQFBe7NooQj7xTSiaUYRAgOkAJwPzeWSduCq
QLbjpiZanIYnwPedtwCfVcNO
nAjDyp1QNZddtooEyb+9Vm0=
=Q6m8
-----END PGP SIGNATURE-----

Ying-Chun Liu
10-24-04 08:55 PM


Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.