For Programmers: Free Programming Magazines  


Home > Archive > Compression > October 2004 > reducing all files by one bit question.









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author reducing all files by one bit question.
Ernst Berg

2004-10-04, 3:55 pm

When we say files do we mean the binary number or the whole file?

Ernst
Eric Bodden

2004-10-04, 8:55 pm

When we say "strange question", do we mean the whole word or just the
letters?

Eric
Jogi Kuenstner

2004-10-06, 8:55 pm

Eric Bodden wrote:

> When we say "strange question", do we mean the whole word or just the
> letters?
>
> Eric


:-))))
Great.
Jogi
--
The particular mistake will not be repeated. There are plenty of
mistakes left that have not yet been used. A. Tanenbaum
JogiKue@kuenstner.de

Ernst Berg

2004-10-09, 3:55 am

Jogi Kuenstner <JogiKue@kuenstner.de> wrote in message news:<ck1lhv$73v$3@online.de>...
> Eric Bodden wrote:
>
>
> :-))))
> Great.
> Jogi




I don't have "Thunder and Lightening" to contribute to this group;
after all I am a hobbyist but, I was chasing my logical tail the other
day and started to fiddle around for fun and I found an amusing
encoding where any binary value represented as a string of length N
where N is the MSB of that value can be encoded one bit less.


Well I have an algorithm that encodes binary.
It's perhaps not practical but it does work.

Constraints :

The binary number is considered N digits long and the Nth digit is
the Most Significant digit (MSB).
The resulting encoding is a string of symbols where both zero and one
( set and reset bits ) have meaning and are instructions that cause
some work.
Thus for the sumbol 0 it means to add a value once and the symbol 1
means to add a value twice.

Given a binary number say of 1100101 I can show that encoded it is
100101 and as such is a savings of one bit. Like wise a number such as
100101 encodes to 00101 a savings of one bit.

1100101 = 100101


Starting with the MSB position minus one :

1100101
-x-----
1100101
- 100000
--------
1000101 MSB is not reset so repeat
- 100000
---------
100101 MSB has reset so emit the symbol 1
symbolizing a count of 2.

xxxxxx Code
1-----

Next 100101
--x---- Next power of two.

100101
- 10000
-------
10101 MSB has reset so emit a 0 symbol to
symbolize a count of 1.

xxxxxx Code
10----
Next 10101
---x--- Next power of 2 down.

10101
- 1000
------
1101 MSB has reset so emits a 0 symbol

xxxxxx Code
100---

Next 1101
----x-- Next power of 2 down.

1101
- 100
------
1001 MSB has not reset so repeat.

1001
- 100
-----
101 MSB has reset so emit a 1 symbol.

xxxxxx Code
1001--

Next 101
-----x- Next power of 2 down.

101
- 10
--------
11 MSB has reset so emit a 0 symbol.

xxxxxx Code
10010-

Next 11
------x Next power of 2 down.

11
- 1
----
10 MSB has not reset so repeat.

10
- 1
---
1 ***

MSB has reset and the last symbol is 1


xxxxxx Code
100101

------------------------------------------------------------------------

To decode start with the value of one and read LSB to MSB adding one
when the symbol is zero and adding 2 when it is 1.

Thus, in humor, 1100101 = 100101


Ernst
Sebastian Gesemann

2004-10-09, 9:12 pm

On 9 Oct 2004, Ernst Berg wrote:

> Jogi Kuenstner <JogiKue@kuenstner.de> wrote in message news:<ck1lhv$73v$3@online.de>...
>
>
> I don't have "Thunder and Lightening" to contribute to this group;
> after all I am a hobbyist but, I was chasing my logical tail the other
> day and started to fiddle around for fun and I found an amusing
> encoding where any binary value represented as a string of length N
> where N is the MSB of that value can be encoded one bit less.
>
> Well I have an algorithm that encodes binary.
> It's perhaps not practical but it does work.


[..]

> Thus, in humor, 1100101 = 100101



Heh, how fun. It took a little time to spot the
"flaw" of this encoding for me.

Here's a hint for you (in case you really think that this
encoding is of any value in the field of data compression):

This is "a binary value represented as a string of length N":
0110001
where N=7
Try encoding this one to 6 bit using your sheme.


HTH,
Sebastian

--
PGP-Key-ID (long): 572B1778A4CA0707

Matt Mahoney

2004-10-09, 9:12 pm


"Ernst Berg" <Ernst_Berg@sbcglobal.net> wrote in message
news:be9ae35b.0410082333.7360463@posting.google.com...
> Jogi Kuenstner <JogiKue@kuenstner.de> wrote in message

news:<ck1lhv$73v$3@online.de>...
>
>
>
> I don't have "Thunder and Lightening" to contribute to this group;
> after all I am a hobbyist but, I was chasing my logical tail the other
> day and started to fiddle around for fun and I found an amusing
> encoding where any binary value represented as a string of length N
> where N is the MSB of that value can be encoded one bit less.
>
>
> Well I have an algorithm that encodes binary.
> It's perhaps not practical but it does work.
>
> Constraints :
>
> The binary number is considered N digits long and the Nth digit is
> the Most Significant digit (MSB).
> The resulting encoding is a string of symbols where both zero and one
> ( set and reset bits ) have meaning and are instructions that cause
> some work.
> Thus for the sumbol 0 it means to add a value once and the symbol 1
> means to add a value twice.
>
> Given a binary number say of 1100101 I can show that encoded it is
> 100101 and as such is a savings of one bit. Like wise a number such as
> 100101 encodes to 00101 a savings of one bit.
>
> 1100101 = 100101
>
>
> Starting with the MSB position minus one :
>
> 1100101
> -x-----
> 1100101
> - 100000
> --------
> 1000101 MSB is not reset so repeat
> - 100000
> ---------
> 100101 MSB has reset so emit the symbol 1
> symbolizing a count of 2.
>
> xxxxxx Code
> 1-----
>
> Next 100101
> --x---- Next power of two.
>
> 100101
> - 10000
> -------
> 10101 MSB has reset so emit a 0 symbol to
> symbolize a count of 1.
>
> xxxxxx Code
> 10----
> Next 10101
> ---x--- Next power of 2 down.
>
> 10101
> - 1000
> ------
> 1101 MSB has reset so emits a 0 symbol
>
> xxxxxx Code
> 100---
>
> Next 1101
> ----x-- Next power of 2 down.
>
> 1101
> - 100
> ------
> 1001 MSB has not reset so repeat.
>
> 1001
> - 100
> -----
> 101 MSB has reset so emit a 1 symbol.
>
> xxxxxx Code
> 1001--
>
> Next 101
> -----x- Next power of 2 down.
>
> 101
> - 10
> --------
> 11 MSB has reset so emit a 0 symbol.
>
> xxxxxx Code
> 10010-
>
> Next 11
> ------x Next power of 2 down.
>
> 11
> - 1
> ----
> 10 MSB has not reset so repeat.
>
> 10
> - 1
> ---
> 1 ***
>
> MSB has reset and the last symbol is 1
>
>
> xxxxxx Code
> 100101
>
> ------------------------------------------------------------------------
>
> To decode start with the value of one and read LSB to MSB adding one
> when the symbol is zero and adding 2 when it is 1.
>
> Thus, in humor, 1100101 = 100101
>
>
> Ernst


Very interesting, but there is one problem. Try it with a string where the
first bit and at least one other bit is 0.

-- Matt Mahoney


Ernst Berg

2004-10-10, 3:55 am

Sebastian Gesemann <isnt@valid.net> wrote in message news:<Pine.LNX.4.44.0410092134440.6692-100000@raven.cs.upb.de>...
> On 9 Oct 2004, Ernst Berg wrote:
>
>
>
> [..]
>
>
>
> Heh, how fun. It took a little time to spot the
> "flaw" of this encoding for me.
>
> Here's a hint for you (in case you really think that this
> encoding is of any value in the field of data compression):
>
> This is "a binary value represented as a string of length N":
> 0110001
> where N=7
> Try encoding this one to 6 bit using your sheme.
>
>
> HTH,
> Sebastian



Heh.. I think it's fun too and no that isn't a 7 bit Binary number
it's a 6 bit binary :) MSB that is.

I couldn't see a way to represnt leading zeros without overhead...


Ernst
Ernst Berg

2004-10-10, 3:55 am

"Matt Mahoney" <matmahoney@yahoo.com> wrote in message news:<MFX9d.13457$gs1.28@newsread2.news.atl.earthlink.net>...
> "Ernst Berg" <Ernst_Berg@sbcglobal.net> wrote in message
> news:be9ae35b.0410082333.7360463@posting.google.com...
> news:<ck1lhv$73v$3@online.de>...
>
>
> Very interesting, but there is one problem. Try it with a string where the
> first bit and at least one other bit is 0.
>
> -- Matt Mahoney



Yes I agree... I spent some time trying naturally.

Ernst
Sebastian Gesemann

2004-10-10, 8:55 pm

On 9 Oct 2004, Ernst Berg wrote:

> Sebastian Gesemann <isnt@valid.net> wrote in message news:<Pine.LNX.4.44.0410092134440.6692-100000@raven.cs.upb.de>...
>
> Heh.. I think it's fun too and no that isn't a 7 bit Binary number
> it's a 6 bit binary :) MSB that is.


It's a definition issue.

Let me cite you again:

> I was chasing my logical tail the other
> day and started to fiddle around for fun
> and I found an amusing encoding where
> any binary value represented as a string of
> length N where N is the MSB of that value can
> be encoded one bit less.


With "where N is the MSB" you probably meant
"N is the smallest possible value so the number can
be expressed as a string of N bits".

This was not clear to me since MSB
is just a short for "most significant bit" (and
does not imply a set bit).

So, this whole complicated algorithm boils down to a
"1xxxx to yyyy"-transformer - You could have just removed
the leading bit instead since we already know it was '1'
:-)

The problem of this algorithm is: it accepts "numbers" and produces
bit strings WHICH CAN start with a zero bit. If you treat the outcome
as an ordinary number and remove the leading zero bits, then the
process will be irreversible because you need to know how many leading
zero bits you've produced to restore the original "number".

To avoid this you could add a leading '1' to the outcome of your
algorithem as a 'marker', treat this as a new number. Then the whole
thing weill be a reversible integer to integer transform - but with
no compression at all. :-)


Ghis!
Sebastian

--
PGP-Key-ID (long): 572B1778A4CA0707


Ernst Berg

2004-10-11, 8:55 am

Sebastian Gesemann <isnt@valid.net> wrote in message news:<Pine.LNX.4.44.0410102142280.25294-100000@raven.cs.upb.de>...
> On 9 Oct 2004, Ernst Berg wrote:
>
>
> It's a definition issue.
>
> Let me cite you again:
>
>
> With "where N is the MSB" you probably meant
> "N is the smallest possible value so the number can
> be expressed as a string of N bits".
>
> This was not clear to me since MSB
> is just a short for "most significant bit" (and
> does not imply a set bit).
>


Yes a Set bit is MSB to me in the context of a binary number.
I see your point thanks.


> So, this whole complicated algorithm boils down to a
> "1xxxx to yyyy"-transformer - You could have just removed
> the leading bit instead since we already know it was '1'
> :-)


Ah that wouldn't be fun :).

>
> The problem of this algorithm is: it accepts "numbers" and produces
> bit strings WHICH CAN start with a zero bit. If you treat the outcome
> as an ordinary number and remove the leading zero bits, then the
> process will be irreversible because you need to know how many leading
> zero bits you've produced to restore the original "number".
>
> To avoid this you could add a leading '1' to the outcome of your
> algorithem as a 'marker', treat this as a new number. Then the whole
> thing weill be a reversible integer to integer transform - but with
> no compression at all. :-)
>



Right you are: Application is the key.

I believe this is a functional Codec for a specific instance of data
since the symbols 0 and 1 are defined as add 1 or add 2.

I have been learning about encoding and it just gets more interesting
all the time..

I admit as well that without application this Codec is novelity.
At the moment it demonstrates integer to code - code to interger in a
simple way.

I worked out the other algorithm that does this same thing with
output of slightly different code but, I'll pass on posting more
unless there is some utility to do so.


Thanks Everyone for reading about this... I think it is neat.


Ernst
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com