Home > Archive > Compression > August 2007 > compression question
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
compression question
|
|
| arthur 2007-08-20, 6:56 pm |
| Hi All,
I have a set of blocks with three bits, the figures like (only those are
allowed):
000
100
010
001
101
011
Is any ideas to make average 2 bits per a block, the input about 16K long.
Thanks in advance,
Arthur
| |
| Willem 2007-08-20, 6:56 pm |
| arthur wrote:
) Hi All,
)
) I have a set of blocks with three bits, the figures like (only those are
) allowed):
)
) 000
) 100
) 010
) 001
) 101
) 011
)
) Is any ideas to make average 2 bits per a block, the input about 16K long.
If the 6 different symbols are randomly distributed, you will need log2(6)
bits per symbol, which is somewhat less than 2.6 bits.
To get down to 2 bits/block the data neds to have some kind of higher order
redundancy that you can exploit. If not, you're out of luck.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| cr88192 2007-08-21, 9:56 pm |
|
"arthur" <aamshukov@cogeco.ca> wrote in message
news:kikyi.132$tx1.66@read1.cgocable.net...
> Hi All,
>
> I have a set of blocks with three bits, the figures like (only those are
> allowed):
>
> 000
> 100
> 010
> 001
> 101
> 011
>
> Is any ideas to make average 2 bits per a block, the input about 16K long.
>
for a random distribution, you can get, as previously noted, a little closer
to 2.6 bits.
000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
now, directly, 3 such values would take 9 bits.
however, if we use some kind of an inverted base-6 scheme:
v'=v0+v1*6+v2*6
where v' is the output, and v0, v1, and v2 are the inputs.
we can fit 3 values in 8 bits.
we have 0.245 bits/byte wasted.
so, a more elaborate scheme could fit in an extra digit every 12 bytes or
so...
so 37 values in 12 bytes.
vs 32 values if encoded as 3 bit values as noted originally.
or, 36 if encoded with the scheme mentioned above.
not a lot to work with though.
if one wants to be able to code uneven endings (say, the stream ends with
only 1 or 2 values).
216..251 encode the 2-value case.
252..255 have a range of 4 (not enough for a digit).
however, for a stream with a non-trivial length, one could mix bytes with 2
and 3 digits near the end to get an exact-length ending.
2 2: 4 digits
3 2: 5 digits
3 3: 6 digits
or such...
> Thanks in advance,
>
> Arthur
>
| |
| arthur 2007-08-22, 6:56 pm |
| Thanks!
"arthur" <aamshukov@cogeco.ca> wrote in message
news:kikyi.132$tx1.66@read1.cgocable.net...
> Hi All,
>
> I have a set of blocks with three bits, the figures like (only those are
> allowed):
>
> 000
> 100
> 010
> 001
> 101
> 011
>
> Is any ideas to make average 2 bits per a block, the input about 16K long.
>
> Thanks in advance,
>
> Arthur
>
| |
| danilobrambilla@tiscali.it 2007-08-23, 7:56 am |
|
> now, directly, 3 such values would take 9 bits.
> however, if we use some kind of an inverted base-6 scheme:
> v'=v0+v1*6+v2*6
> where v' is the output, and v0, v1, and v2 are the inputs.
>
> we can fit 3 values in 8 bits.
>
> we have 0.245 bits/byte wasted.
>
> so, a more elaborate scheme could fit in an extra digit every 12 bytes or
> so...
>
> so 37 values in 12 bytes.
> vs 32 values if encoded as 3 bit values as noted originally.
> or, 36 if encoded with the scheme mentioned above.
Hi! This seems a really interesting solution, but sorry I wasn't able
to understand the magic. Could you please provide a little sample for
encoding and decoding steps? I have a similar problem to solve...Is it
possible to generalize this solution to n-bit word sequence (2 to 8
bit words).
Thank you!
| |
| cr88192 2007-08-23, 7:56 am |
|
<danilobrambilla@tiscali.it> wrote in message
news:1187860626.869329.70530@e9g2000prf.googlegroups.com...
>
>
> Hi! This seems a really interesting solution, but sorry I wasn't able
> to understand the magic. Could you please provide a little sample for
> encoding and decoding steps? I have a similar problem to solve...Is it
> possible to generalize this solution to n-bit word sequence (2 to 8
> bit words).
>
for the above, say we want to encode:
1 2 3 4 5 0 1 2 3 4 5
as bits, this would take:
001 010 011 100 101 000 001 010 011 100 101
0010 1001 1100 1010 0000 1010 0111 0010 1 (5 bytes total, uneven bit
alignment)
0x29 0xCA 0x0A 0x72 0x80
with the scheme, it takes (triples packed with first as v0):
0x79 0x22 0x79 0xFA
0110 0001 0010 0010 0110 0001 1111 1010
so, it takes 4 bytes, and aligns exactly with a byte boundary.
now, it assumes that the base is < a powe of 2.
for example, 4 bits, 0..15 would buy nothing, but if this were decimal, it
may actually be worth something.
ly, we can still only fit 2 digits per byte.
but, with BCD, we could only fit 8 digits in 32 bits (and no sign), but with
a similar scheme, we can fit 9 (0..999999999). likewise, we have about
2-bits space left over, so we could include both a sign, and an intermediate
carry/borrow status.
an example here (for sign) is mixing 2 and 10s complement.
0..999999999(0x00000000..0x3B9AC9FF): positive decimal
-999999999..-1 (0xC4653601..0xFFFFFFFF): negative decimal
this could be faster if we want to use them like normal integers, though
more intuitive may be to use 10s complement, or a fixed sign bit
(complicates arithmetic slightly).
another possible use of the encoding: self-terminating large numerical
strings (in storage the high-2 bits being reserved for all but the
terminal).
or such...
> Thank you!
>
>
| |
| danilobrambilla@tiscali.it 2007-08-23, 6:56 pm |
|
>
> for the above, say we want to encode:
> 1 2 3 4 5 0 1 2 3 4 5
>
> as bits, this would take:
> 001 010 011 100 101 000 001 010 011 100 101
> 0010 1001 1100 1010 0000 1010 0111 0010 1 (5 bytes total, uneven bit
> alignment)
> 0x29 0xCA 0x0A 0x72 0x80
>
> with the scheme, it takes (triples packed with first as v0):
> 0x79 0x22 0x79 0xFA
> 0110 0001 0010 0010 0110 0001 1111 1010
>
> so, it takes 4 bytes, and aligns exactly with a byte boundary.
OK here is where I don't understand. As you said, the first triple
would be v0 = 001, v1 = 101 and v2 = 011. Then you apply
v'=v0+v1*6+v2*6 so 001 + 101 * 6 + 011 * 6 = ??. How it would be
reversible?
thank you!
| |
| Willem 2007-08-23, 6:56 pm |
| danilobrambilla@tiscali.it wrote:
)
)>
)> for the above, say we want to encode:
)> 1 2 3 4 5 0 1 2 3 4 5
)>
)> as bits, this would take:
)> 001 010 011 100 101 000 001 010 011 100 101
)> 0010 1001 1100 1010 0000 1010 0111 0010 1 (5 bytes total, uneven bit
)> alignment)
)> 0x29 0xCA 0x0A 0x72 0x80
)>
)> with the scheme, it takes (triples packed with first as v0):
)> 0x79 0x22 0x79 0xFA
)> 0110 0001 0010 0010 0110 0001 1111 1010
)>
)> so, it takes 4 bytes, and aligns exactly with a byte boundary.
)
) OK here is where I don't understand. As you said, the first triple
) would be v0 = 001, v1 = 101 and v2 = 011. Then you apply
) v'=v0+v1*6+v2*6 so 001 + 101 * 6 + 011 * 6 = ??. How it would be
) reversible?
You forgot some parentheses:
v' = v0 + 6 * (v1 + 6* (v2) )
Or, in other words: v' = v0 + v1*6 + v2*36
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| cr88192 2007-08-23, 6:56 pm |
|
"Willem" <willem@stack.nl> wrote in message
news:slrnfcr52t.2lt2.willem@turtle.stack.nl...
> danilobrambilla@tiscali.it wrote:
> )
> )>
> )> for the above, say we want to encode:
> )> 1 2 3 4 5 0 1 2 3 4 5
> )>
> )> as bits, this would take:
> )> 001 010 011 100 101 000 001 010 011 100 101
> )> 0010 1001 1100 1010 0000 1010 0111 0010 1 (5 bytes total, uneven
> bit
> )> alignment)
> )> 0x29 0xCA 0x0A 0x72 0x80
> )>
> )> with the scheme, it takes (triples packed with first as v0):
> )> 0x79 0x22 0x79 0xFA
> )> 0110 0001 0010 0010 0110 0001 1111 1010
> )>
> )> so, it takes 4 bytes, and aligns exactly with a byte boundary.
> )
> ) OK here is where I don't understand. As you said, the first triple
> ) would be v0 = 001, v1 = 101 and v2 = 011. Then you apply
> ) v'=v0+v1*6+v2*6 so 001 + 101 * 6 + 011 * 6 = ??. How it would be
> ) reversible?
>
> You forgot some parentheses:
>
> v' = v0 + 6 * (v1 + 6* (v2) )
>
> Or, in other words: v' = v0 + v1*6 + v2*36
>
seems I originally mistyped it, and failed to notice.
yes, that is correct...
you know, in another group, people were flaming me becuase I was braindead
and scaled an integer using '(i*33)/100' instead of 'i/3'. I say, it was a
trivial error, not one of some kind of intentional malice or ignorance, a
simple braindead type error...
>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT
| |
| Hans-Peter Diettrich 2007-08-23, 6:56 pm |
| danilobrambilla@tiscali.it wrote:
> OK here is where I don't understand. As you said, the first triple
> would be v0 = 001, v1 = 101 and v2 = 011. Then you apply
> v'=v0+v1*6+v2*6
This formula should read:
v'=v0+6*(v1+6*(v2...))
Substitute 10 for 6, and you'll understand how digits can be inserted
and extracted (using div and mod) from numbers.
DoDi
| |
| Mark Adler 2007-08-24, 9:56 pm |
| On Aug 21, 7:26 pm, "cr88192" <cr88...@hotmail.com> wrote:
> we can fit 3 values in 8 bits.
> we have 0.245 bits/byte wasted.
Or you can use 64-bit arithmetic, and stuff 17 values into every 44
bits, which will only waste about 0.01 bits per byte. The bits would
be packed into bytes, so you can look at it as 34 values packed into
11 bytes.
With multiple-precision arithmetic, you can do better still with 29
values in 75 bits, 41 values in 106 bits, etc.
Mark
| |
| arthur 2007-08-25, 6:56 pm |
| Sorry, could you be more specific about
>use 64-bit arithmetic
and
>multiple-precision arithmetic
Thanks in advance,
Arthur
"Mark Adler" <madler@alumni.caltech.edu> wrote in message
news:1187997222.310842.254670@q3g2000prf.googlegroups.com...
> On Aug 21, 7:26 pm, "cr88192" <cr88...@hotmail.com> wrote:
>
> Or you can use 64-bit arithmetic, and stuff 17 values into every 44
> bits, which will only waste about 0.01 bits per byte. The bits would
> be packed into bytes, so you can look at it as 34 values packed into
> 11 bytes.
>
> With multiple-precision arithmetic, you can do better still with 29
> values in 75 bits, 41 values in 106 bits, etc.
>
> Mark
>
| |
| Mark Adler 2007-08-25, 6:56 pm |
| On Aug 25, 10:18 am, "arthur" <aamshu...@cogeco.ca> wrote:[color=darkred]
> Sorry, could you be more specific about
>
> and
Other posts in this thread already described how to encode and decode
base-n numbers (in this case n = 6). Then you simply use the standard
arithmetic operations provided by the machine, e.g. using the
arithmetic operators in C or Java (add, subtract, multiply, and divide
integers).
For multiple precision, you can use the Gnu GMP library in C or the
BigInteger class in Java. You can search on these in google.
Mark
|
|
|
|
|