For Programmers: Free Programming Magazines  


Home > Archive > Compression > October 2004 > adaptive huffman, ordinary data









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author adaptive huffman, ordinary data
cr88192

2004-10-15, 3:55 pm

oh well, I wrote something that is afaik similar to adaptive huffman (I just
came up with my own algos here, using what I did for the static case as the
base).

it typically gives slightly worse compression than the static version, and
takes a bit more time. this was not entirely unexpected however.

maybe someone can look at the fragment at the bottom and tell me if this is
even really "huffman", I feel unsure (similar goes for the static algo, as
far as I can tell, the only strong similarities are that it is based on
variable codes, uses a kind of tree, and I call it "huffman" lacking any
better term, ...).

I am unsure what is "typical" in terms of speed or compression (compared
with other algos of this sort), so I just posted about what I got from
testing...

yes, I know all this is crappy, eg, vs. deflate (which seems somewhat faster
and also compresses better).


I sat around and figured "hell, why don't I try compressing 'normal' data?"
(unlike what I had been focusing on before, normal data has somewhat
different properties).
the results of the basic algos are unimpressive:
exe's compress down to about 90%, text to about 75%.
for the arith coder, it was about 60% for the exe and 68% for the text.

I think, hell, text should compress better than that, given a lot of chars
are unused (many in the 0..31 range, 127..255), I would think I would be
looking at at least 50%, oh well...
it is possible though that the rle compressor adds some overhead, but it
should be insignifigant here.
possibly this could be related to me varying the tree construction a
little...

for the text (approx 1.3MB of globbed source), the static coder takes about
331ms, the adaptive coder 722ms, the adaptive coder+lz filter 11s, and the
arithmatic coder about 14s.
for the exe (also about 1.3MB), it is 373ms, 2.4s, 16.6s, and 21.8s here.

a form of lz77 does a lot here (with graphical deltas lz made things worse,
but with 'typical' data lz helps).
text compresses down to about 25%, exe's to about 45%.

a kind of character lru filter gives about 41% and 39% for the exe (static
and dynamic coding), 30% with lz77, and 27% with the arithmatic coder.

with the lru filter for the text I get 60% (both codings), 36% with lz77
added, and 50% with the arithmatic coder.


oh well, the core of the adaptive coder is fairly simple (annotated chunks
to point out the workings):
#define PDHUFF_MAXCODES 4096

typedef struct {
byte *lbits;
int nl, tot;
int *stat, *nums;
int *idx;
}PDHuff2_Table;

typedef struct {
PDBSIO *bs;
PDHuff2_Table *tbl;
}PDHuff2_CTX;



PDHuff2_Table *PDHuff2_NewInitialTable()
{
PDHuff2_Table *tmp;
int i;

tmp=kalloc(sizeof(PDHuff2_Table));
tmp->lbits=kalloc(256);
tmp-> stat=kalloc(PDHUFF_MAXCODES*sizeof(int))
;
tmp-> nums=kalloc(PDHUFF_MAXCODES*sizeof(int))
;
tmp-> idx=kalloc(PDHUFF_MAXCODES*sizeof(int));


for(i=0; i<PDHUFF_MAXCODES; i++)
{
tmp->stat[i]=1;
tmp->nums[i]=i;
tmp->idx[i]=i;
}
tmp->tot=PDHUFF_MAXCODES;

PDHuff2_RecalcTable(tmp);

return(tmp);
}

void PDHuff2_RecalcTable(PDHuff2_Table *tbl)
{
int i, j, k, b, nl, nt;
double f;

i=0;
nl=0;
nt=tbl->tot;
while(tbl->stat[i] && (i<PDHUFF_MAXCODES))
{
f=log2(((double)nt)/((double)tbl->stat[i]));
b=ceil(f);
if(!b)b=1;

tbl->lbits[nl]=b;

k=(1<<b)-1;
for(j=0; (j<k) && ((i+j)<PDHUFF_MAXCODES); j++)
nt-=tbl->stat[i+j];

i+=k;
nl++;
}

tbl->nl=nl;
}

void PDHuff2_AddCharTable(PDHuff2_Table *tbl, int v)
{
int i, j, c;

i=tbl->idx[v];
tbl->stat[i]++;
tbl->tot++;

c=0;
while((i>0) && (tbl->stat[i]>tbl->stat[i-1]))
{
j=tbl->stat[i];
tbl->stat[i]=tbl->stat[i-1];
tbl->stat[i-1]=j;

j=tbl->nums[i];
tbl->nums[i]=tbl->nums[i-1];
tbl->nums[i-1]=j;

tbl->idx[tbl->nums[i]]=i;
tbl->idx[tbl->nums[i-1]]=i-1;

c++;
i--;
}
if(c)PDHuff2_RecalcTable(tbl);
}

int PDHuff2_EncodeValue(PDBSIO *bs, PDHuff2_Table *tbl, int v)
{
int i, j, k, l;

j=tbl->idx[v];
for(i=0; i<tbl->nl; i++)
{
k=(1<<tbl->lbits[i])-1;

if(j<k)
{
PDBSIO_WriteBits(bs, j+1, tbl->lbits[i]);
return(0);
}
PDBSIO_WriteBits(bs, 0, tbl->lbits[i]);
j-=k;
}
return(-1);
}

int PDHuff2_DecodeValue(PDBSIO *bs, PDHuff2_Table *tbl)
{
int i, j, k, l;

j=0;
for(i=0; i<tbl->nl; i++)
{
k=PDBSIO_ReadBits(bs, tbl->lbits[i]);
if(k)return(tbl->nums[j+(k-1)]);
j+=(1<<tbl->lbits[i])-1;
}
return(-1);
}



Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com