Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

adaptive huffman, ordinary data
oh well, I wrote something that is afaik similar to adaptive huffman (I just
came up with my own algos here, using what I did for the static case as the
base).

it typically gives slightly worse compression than the static version, and
takes a bit more time. this was not entirely unexpected however.

maybe someone can look at the fragment at the bottom and tell me if this is
even really "huffman", I feel unsure (similar goes for the static algo, as
far as I can tell, the only strong similarities are that it is based on
variable codes, uses a kind of tree, and I call it "huffman" lacking any
better term, ...).

I am unsure what is "typical" in terms of speed or compression (compared
with other algos of this sort), so I just posted about what I got from
testing...

yes, I know all this is crappy, eg, vs. deflate (which seems somewhat faster
and also compresses better).


I sat around and figured "hell, why don't I try compressing 'normal' data?"
(unlike what I had been focusing on before, normal data has somewhat
different properties).
the results of the basic algos are unimpressive:
exe's compress down to about 90%, text to about 75%.
for the arith coder, it was about 60% for the exe and 68% for the text.

I think, hell, text should compress better than that, given a lot of chars
are unused (many in the 0..31 range, 127..255), I would think I would be
looking at at least 50%, oh well...
it is possible though that the rle compressor adds some overhead, but it
should be insignifigant here.
possibly this could be related to me varying the tree construction a
little...

for the text (approx 1.3MB of globbed source), the static coder takes about
331ms, the adaptive coder 722ms, the adaptive coder+lz filter 11s, and the
arithmatic coder about 14s.
for the exe (also about 1.3MB), it is 373ms, 2.4s, 16.6s, and 21.8s here.

a form of lz77 does a lot here (with graphical deltas lz made things worse,
but with 'typical' data lz helps).
text compresses down to about 25%, exe's to about 45%.

a kind of character lru filter gives about 41% and 39% for the exe (static
and dynamic coding), 30% with lz77, and 27% with the arithmatic coder.

with the lru filter for the text I get 60% (both codings), 36% with lz77
added, and 50% with the arithmatic coder.


oh well, the core of the adaptive coder is fairly simple (annotated chunks
to point out the workings):
#define PDHUFF_MAXCODES  4096

typedef struct {
byte *lbits;
int nl, tot;
int *stat, *nums;
int *idx;
}PDHuff2_Table;

typedef struct {
PDBSIO *bs;
PDHuff2_Table *tbl;
}PDHuff2_CTX;



PDHuff2_Table *PDHuff2_NewInitialTable()
{
PDHuff2_Table *tmp;
int i;

tmp=kalloc(sizeof(PDHuff2_Table));
tmp->lbits=kalloc(256);
tmp-> stat=kalloc(PDHUFF_MAXCODES*sizeof(int))
;
tmp-> nums=kalloc(PDHUFF_MAXCODES*sizeof(int))
;
tmp-> idx=kalloc(PDHUFF_MAXCODES*sizeof(int));


for(i=0; i<PDHUFF_MAXCODES; i++)
{
tmp->stat[i]=1;
tmp->nums[i]=i;
tmp->idx[i]=i;
}
tmp->tot=PDHUFF_MAXCODES;

PDHuff2_RecalcTable(tmp);

return(tmp);
}

void PDHuff2_RecalcTable(PDHuff2_Table *tbl)
{
int i, j, k, b, nl, nt;
double f;

i=0;
nl=0;
nt=tbl->tot;
while(tbl->stat[i] && (i<PDHUFF_MAXCODES))
{
f=log2(((double)nt)/((double)tbl->stat[i]));
b=ceil(f);
if(!b)b=1;

tbl->lbits[nl]=b;

k=(1<<b)-1;
for(j=0; (j<k) && ((i+j)<PDHUFF_MAXCODES); j++)
nt-=tbl->stat[i+j];

i+=k;
nl++;
}

tbl->nl=nl;
}

void PDHuff2_AddCharTable(PDHuff2_Table *tbl, int v)
{
int i, j, c;

i=tbl->idx[v];
tbl->stat[i]++;
tbl->tot++;

c=0;
while((i>0) && (tbl->stat[i]>tbl->stat[i-1]))
{
j=tbl->stat[i];
tbl->stat[i]=tbl->stat[i-1];
tbl->stat[i-1]=j;

j=tbl->nums[i];
tbl->nums[i]=tbl->nums[i-1];
tbl->nums[i-1]=j;

tbl->idx[tbl->nums[i]]=i;
tbl->idx[tbl->nums[i-1]]=i-1;

c++;
i--;
}
if(c)PDHuff2_RecalcTable(tbl);
}

int PDHuff2_EncodeValue(PDBSIO *bs, PDHuff2_Table *tbl, int v)
{
int i, j, k, l;

j=tbl->idx[v];
for(i=0; i<tbl->nl; i++)
{
k=(1<<tbl->lbits[i])-1;

if(j<k)
{
PDBSIO_WriteBits(bs, j+1, tbl->lbits[i]);
return(0);
}
PDBSIO_WriteBits(bs, 0, tbl->lbits[i]);
j-=k;
}
return(-1);
}

int PDHuff2_DecodeValue(PDBSIO *bs, PDHuff2_Table *tbl)
{
int i, j, k, l;

j=0;
for(i=0; i<tbl->nl; i++)
{
k=PDBSIO_ReadBits(bs, tbl->lbits[i]);
if(k)return(tbl->nums[j+(k-1)]);
j+=(1<<tbl->lbits[i])-1;
}
return(-1);
}




Report this thread to moderator Post Follow-up to this message
Old Post
cr88192
10-15-04 08:55 PM


Sponsored Links




Last Thread Next Thread Next
Search this forum -> 
Post New Thread

Compression archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 05:59 PM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.