For Programmers: Free Programming Magazines  


Home > Archive > Compression > March 2007 > Compressing n bit word sequence









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Compressing n bit word sequence
LiloLilo

2007-03-11, 6:56 pm

Hi all,

I have 7 buffers containing data as follows:

Buffer 1: A sequence of 2 bit words
Buffer 2: A sequence of 3 bit words
Buffer 3: A sequence of 4 bit words
Buffer 4: A sequence of 5 bit words
Buffer 5: A sequence of 6 bit words
Buffer 6: A sequence of 7 bit words
Buffer 7: A sequence of 9 bit words

I need to compress the content of each buffer using patent free algorithm
(possibly a simple one). Any idea?
I supposed to use LZW on a 8 bit word base step but I'm not so certain that
it would be a good idea...

Thank you all for help!

Best Regards, Danilo

RossClement@gmail.com

2007-03-12, 7:56 am

On Mar 11, 12:31 pm, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
> "LiloLilo" <danilo.brambi...@4ward.it> writes:
>
>
>
>
> There's nothing intrinsically 8-bit about LZW, or any compression
> algorithms, so you might as well use the same generic algorithm
> for all sequences. 9 bit is probably the only one that will require
> special coding, as everything else is the 8-bit code restricted to
> a smaller range.
>
> Phil


People should beware of what I write, as I am far from a compression
expert.

But I notice this part of the wikipedia page on arithmetic coding.

http://en.wikipedia.org/wiki/ Arith...br /> ic_coding

which mentions that various compression techniques tend to use Huffman
coding rather than arithmetic coding due to patent concerns.

But surely the original poster hasn't given enough information for the
question to be answered. Surely without some sort of description of
the nature of the redundacy in the original signal, and the
distribution of original values, it's impossible to recommend a
compression technique?

Cheers,

Ross-c

LiloLilo

2007-03-12, 9:55 pm

Hi,

you're right I missed some datails on buffers content because of I am not
sure about it. In fact I am filling them using a particular algorithm I am
working on, so all is quite experimental at the moment.
This wend I tryied LZW but I got bad results. In fact all buffers
increased in dimension, so I suppose that there aren't many ripetitive
patterns. In fact this isn't strange because for example in buffer 2 I would
have only 4 word (2 bit words) but using an 8 bit step LZW compression means
that almost 8 words must be found in sequence to achive some compression.

I'll probably try to use Huffman because I expect that the data distribution
would be a Gauss one. I'll post asap the distribution of a sample on each
buffer for support my question.

Thank you all and Best regards, Danilo

PS I am also far to be an expert on compression, I am just a junior
programmer working on image compression for fun...


<RossClement@gmail.com> wrote in message
news:1173696802.056919.278210@j27g2000cwj.googlegroups.com...
> On Mar 11, 12:31 pm, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
> wrote:
>
> People should beware of what I write, as I am far from a compression
> expert.
>
> But I notice this part of the wikipedia page on arithmetic coding.
>
> http://en.wikipedia.org/wiki/ Arith...br /> ic_coding
>
> which mentions that various compression techniques tend to use Huffman
> coding rather than arithmetic coding due to patent concerns.
>
> But surely the original poster hasn't given enough information for the
> question to be answered. Surely without some sort of description of
> the nature of the redundacy in the original signal, and the
> distribution of original values, it's impossible to recommend a
> compression technique?
>
> Cheers,
>
> Ross-c
>


RossClement@gmail.com

2007-03-12, 9:55 pm

On Mar 12, 8:36 pm, "LiloLilo" <danilo.brambi...@4ward.it> wrote:
> Hi,
>
> you're right I missed some datails on buffers content because of I am not
> sure about it. In fact I am filling them using a particular algorithm I am
> working on, so all is quite experimental at the moment.
> This wend I tryied LZW but I got bad results. In fact all buffers
> increased in dimension, so I suppose that there aren't many ripetitive
> patterns. In fact this isn't strange because for example in buffer 2 I would
> have only 4 word (2 bit words) but using an 8 bit step LZW compression means
> that almost 8 words must be found in sequence to achive some compression.
>
> I'll probably try to use Huffman because I expect that the data distribution
> would be a Gauss one. I'll post asap the distribution of a sample on each
> buffer for support my question.
>
> Thank you all and Best regards, Danilo
>
> PS I am also far to be an expert on compression, I am just a junior
> programmer working on image compression for fun...


OK. I'll repeat my warning about my not being an expert on
compression.

You might want to calculate autocorrelations for your values. If there
is no autocorrelation then Huffman coding *might* (see disclaimer) be
a good approach by itself. If there is significant autocorrelation
between the values of samples and previous values one (or two, or
three) steps back, then you might need to use a more complicated
approach to get the best compression.

Cheers,

Ross-c

LiloLilo

2007-03-14, 9:56 pm

Hi,

well I run some counters on buffers data, and find out that the distribution
is not so far from be uniform. This is more true looking at the buffers with
longest words.

For buffer 2 i got this counters (the first value is a word ID, the second
is the counter):

126 - 3666
127 - 5347
128 - 6351
129 - 6185

buffer 3:

124 - 4844
125 - 7436
126 - 9560
127 - 12227
128 - 13277
129 - 12871
130 - 11877
131 - 8988
132 - 0

buffer 4:

120 - 2081
121 - 2864
122 - 3845
123 - 5103
124 - 5319
125 - 6466
126 - 7553
127 - 8281
128 - 8727
129 - 8607
130 - 8183
131 - 6939
132 - 7021
133 - 5604
134 - 4241
135 - 3391

buffer 5:

112 - 370
113 - 425
114 - 478
115 - 598
116 - 708
117 - 925
118 - 1023
119 - 1290
120 - 1097
121 - 1301
122 - 1524
123 - 1729
124 - 1703
125 - 1911
126 - 1956
127 - 2106
128 - 2164
129 - 2170
130 - 2280
131 - 2149
132 - 2209
133 - 1955
134 - 1713
135 - 1537
136 - 1870
137 - 1588
138 - 1250
139 - 1054
140 - 863
141 - 739
142 - 659
143 - 497


ecc..

As you can see I think that also Huffman won't be really userfull
here...apart buffer 3

Bye and Thank you for help. Danilo

<RossClement@gmail.com> wrote in message
news:1173732690.848315.268550@30g2000cwc.googlegroups.com...
> On Mar 12, 8:36 pm, "LiloLilo" <danilo.brambi...@4ward.it> wrote:
>
> OK. I'll repeat my warning about my not being an expert on
> compression.
>
> You might want to calculate autocorrelations for your values. If there
> is no autocorrelation then Huffman coding *might* (see disclaimer) be
> a good approach by itself. If there is significant autocorrelation
> between the values of samples and previous values one (or two, or
> three) steps back, then you might need to use a more complicated
> approach to get the best compression.
>
> Cheers,
>
> Ross-c
>


Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com