For Programmers: Free Programming Magazines  


Home > Archive > Compression > November 2005 > huffman tree save to file









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author huffman tree save to file
xxxxyz@abv.bg

2005-11-07, 9:55 pm

Hi,

Could you suggest different ways for saving huffman tree in the
compressed file header?

Thanks.

Willem

2005-11-07, 9:55 pm

xxxxyz@abv.bg wrote:
) Could you suggest different ways for saving huffman tree in the
) compressed file header?

Saving the whole tree in the header is a waste of space
when only the code lengths suffice. Google for canonical huffman.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Nir Halowani

2005-11-07, 9:55 pm


xxxxyz@abv.bg wrote:
> Hi,
>
> Could you suggest different ways for saving huffman tree in the
> compressed file header?
>
> Thanks.


Hi,

As suggested you might considered using canonical Huffman since while
saving of canonical Huffman we don't need to store the codes, of course
we still need to pass to the decoder the lengths and symbols. It can be
done in the following way: First there's 16 bytes, which mean the
symbols that a given code length has. After that there's the symbols
starting with those for the code length 1, then 2... in alphabetical
order. Of course once you have the lengths and symbols you do again the
codes and you can start to decode.
Anyway, You might also consider using an adaptive Huffman tree since in
this method no Huffman header is needed.

- Nir

cr88192

2005-11-07, 9:55 pm


<xxxxyz@abv.bg> wrote in message
news:1131369984.016163.268610@g43g2000cwa.googlegroups.com...
> Hi,
>
> Could you suggest different ways for saving huffman tree in the
> compressed file header?
>

well, as noted by the last few responses, there are multiple ways.

storing the code lengths. this is what deflate does (more so, it huffman
compresses the code lengths as well).
pros: smaller tree size.
cons: harder to implement in my experience, and if your file is of
non-trivial size, imo, the huffman tree size doesn't matter that much.

storing the huffman tree in a more raw form.
this is simpler to implement, essentially, as one just loads and saves the
tree...
it is my oppinion that, typically, the data is large enough that the table
size doesn't matter much.

for example, assuming 9 bits per code (typically needed, eg, if you want to
use part of the codespace for algo related uses). with 1 bit as a node/leaf
indicator, for 256 symbols you need 2815 bits (or about 352 bytes), 512
symbols would need about 5631 bits (or 704 bytes).

if your entire compressed file is only a few kB, then, yes, this is a
problem. if your compressed file is, eg, like 100kB or more, imo, it doesn't
matter so much.


then again, one can also use arithmetic coding.
a paq-style coder is fairly easy to get working and can be decently fast
(ok, but significantly slower than huffman if some optimizations are used).
advantages are that it gets better compression, and is in my experience a
lot simpler to work with. messing with the huffman algo to write different
compressors is imo a pita (endlessly needing to jerk off the coding at the
bit-level, ...).

as a cost, it trudges along at a few MB/s, wheras with huffman one can get
decode speeds close to the 100MB/s mark on my computer (deflate actually
manages to go faster, but I suspect this is because lz77 offers slight "make
it faster" properties).


recently as a misc effort I made another coder that was able to pull off
decode speeds exceeding those of huffman (around 150-170MB/s on my
computer), but, as a cost, the compression ratio was rather poor.

> Thanks.
>



Nils

2005-11-07, 9:55 pm

Since Arithmetic coding seems to be at the basis of a lot of compression
methods, wouldn't it be nice if Intel would build in processor instructions
especially for this.. Just like e.g. MMX or SSE instructions.

The actual arithmetic coding code is so small this would not be very
difficult imo. And a huge amount of software would then benefit from this
built-in functionality, going at the speed of light with (more efficient)
arithmetic coding vs. Huffman, which is indeed a PITA, I agree with you.

Nils


Thomas Richter

2005-11-08, 7:55 am

Hi,

> Since Arithmetic coding seems to be at the basis of a lot of compression
> methods, wouldn't it be nice if Intel would build in processor instructions
> especially for this.. Just like e.g. MMX or SSE instructions.


> The actual arithmetic coding code is so small this would not be very
> difficult imo. And a huge amount of software would then benefit from this
> built-in functionality, going at the speed of light with (more efficient)
> arithmetic coding vs. Huffman, which is indeed a PITA, I agree with you.


There are two problems:

1) Arithmetic coding != arithmetic coding. There are several strategies how
to do that best. CPU-friendly implementations differ from hardware-friendly
implementations, and a lot of them come to my mind (MQ coder, QM coder, Qx
Coder...)

2) This area is a patent minefield. If intel would want to implement at least
one of the algorithms above, they would have to pay a big pile of money to
several companies claiming patents.

Given that a binary arithmetic coder can be made almost as fast as writing
out the bits directly, it seems unrealistic for me that anything like
that could realisticly happen.

So long
Thomas


cr88192

2005-11-08, 7:55 am


"Nils" <bla@bla.com> wrote in message
news:cc01d$43701f7b$54778e77$14828@news.chello.nl...
> Since Arithmetic coding seems to be at the basis of a lot of compression
> methods, wouldn't it be nice if Intel would build in processor
> instructions especially for this.. Just like e.g. MMX or SSE instructions.
>

would be nice, however, it could be argued that compression is sufficiently
rare, and sufficiently fast, that such instructions are not really justified
at present.

> The actual arithmetic coding code is so small this would not be very
> difficult imo. And a huge amount of software would then benefit from this
> built-in functionality, going at the speed of light with (more efficient)
> arithmetic coding vs. Huffman, which is indeed a PITA, I agree with you.
>

yes.

secondary problem:
beyond trivial crap, building this in likely wouldn't go along so well with
how the processor usually does things.

it would almost make more sense to push that computers come with a built in
fpga or something, along with something like a vhdl compiler/interpreter lib
or something. the app would then specify how the logic should work and how
the communication too/from the app should work. the fpga is then set up
however, and the results are ran back into the app (the fpga would be
emulated if not present, but this would need to be detected and avoided to
prevent apps from being overly slow).

> Nils
>



Willem

2005-11-08, 7:55 am

cr88192 wrote:
) it would almost make more sense to push that computers come with a built in
) fpga or something, along with something like a vhdl compiler/interpreter lib
) or something. the app would then specify how the logic should work and how
) the communication too/from the app should work. the fpga is then set up
) however, and the results are ran back into the app (the fpga would be
) emulated if not present, but this would need to be detected and avoided to
) prevent apps from being overly slow).

This only makes sense if a single thread runs on the CPU.

Embedded processors would benefit from this a lot, and I think
that this already exists out there. Maybe even with optimizing
compilers that figure out what logic to program for most speed.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
David A. Scott

2005-11-08, 6:55 pm

"xxxxyz@abv.bg" <xxxxyz@abv.bg> wrote in
news:1131369984.016163.268610@g43g2000cwa.googlegroups.com:

> Hi,
>
> Could you suggest different ways for saving huffman tree in the
> compressed file header?
>
> Thanks.
>


The best way is to save it automatically which occurs when doing
adaptive huffman. Adaptive is faster and one pass instead of 2.
If you hate adaptive and must use static then
if your always doing just a few types of files maybe a one byte
index to tell which tree your using based on previously derived
tables. None of the commonly used ways are very space efficent.
To do it space efficent is hard and then you might as well do
it adaptive arithmetic for most cases.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

cr88192

2005-11-08, 6:55 pm


"Willem" <willem@stack.nl> wrote in message
news:slrndn1b9f.jb4.willem@toad.stack.nl...
> cr88192 wrote:
> ) it would almost make more sense to push that computers come with a built
> in
> ) fpga or something, along with something like a vhdl compiler/interpreter
> lib
> ) or something. the app would then specify how the logic should work and
> how
> ) the communication too/from the app should work. the fpga is then set up
> ) however, and the results are ran back into the app (the fpga would be
> ) emulated if not present, but this would need to be detected and avoided
> to
> ) prevent apps from being overly slow).
>
> This only makes sense if a single thread runs on the CPU.
>

why so? couldn't it be set up that somehow multiple apps can share the fpga
(using a lib and possibly a driver as a proxy) as is the case with the
gpu?...

of course, me remembering back when I was using a ge-force 2, multiple apps
using gl at the same time would kill performance, but this is less so with
newer cards...

> Embedded processors would benefit from this a lot, and I think
> that this already exists out there. Maybe even with optimizing
> compilers that figure out what logic to program for most speed.
>

ok.

>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT



Willem

2005-11-08, 6:55 pm

)> This only makes sense if a single thread runs on the CPU.

cr88192 wrote:

) why so? couldn't it be set up that somehow multiple apps can share the fpga
) (using a lib and possibly a driver as a proxy) as is the case with the
) gpu?...

It takes time to reprogram the fpga, and that would have to be done on each
task switch. Alternatively, all the apps would have to agree on a single
programming, which defeats the purpose of it being programmable.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
cr88192

2005-11-09, 3:55 am


"Willem" <willem@stack.nl> wrote in message
news:slrndn236o.1rcg.willem@toad.stack.nl...
> )> This only makes sense if a single thread runs on the CPU.
>
> cr88192 wrote:
>
> ) why so? couldn't it be set up that somehow multiple apps can share the
> fpga
> ) (using a lib and possibly a driver as a proxy) as is the case with the
> ) gpu?...
>
> It takes time to reprogram the fpga, and that would have to be done on
> each
> task switch. Alternatively, all the apps would have to agree on a single
> programming, which defeats the purpose of it being programmable.
>

I mean, couldn't the library/driver, say, dedicate part of the fpga to one
"device", another part to another "device", ... effectively rebuilding the
logic in question on an as-needed basis. an app starts up, and is like "I
want a device with there properties". this goes off to the driver, which is
like "ok", as it rewrites the logic for the in-fpga whatever, and proxies
data too/from the app, or whatever.

sorry, I don't know much about fpgas...

alternatively, only one app could use it as a time, which may also be
reasonable...

>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT



Willem

2005-11-09, 7:55 am

cr88192 wrote:
) "Willem" <willem@stack.nl> wrote in message
) news:slrndn236o.1rcg.willem@toad.stack.nl...
)> It takes time to reprogram the fpga, and that would have to be done on
)> each
)> task switch. Alternatively, all the apps would have to agree on a single
)> programming, which defeats the purpose of it being programmable.
)>
) I mean, couldn't the library/driver, say, dedicate part of the fpga to one
) "device", another part to another "device", ... effectively rebuilding the
) logic in question on an as-needed basis. an app starts up, and is like "I
) want a device with there properties". this goes off to the driver, which is
) like "ok", as it rewrites the logic for the in-fpga whatever, and proxies
) data too/from the app, or whatever.
)
) sorry, I don't know much about fpgas...

I don't know that much either, but I do know reprogramming takes
quite a bit of time. The problem is that if you write an app to expect
an fpga, then you need a certain amount of space for that app. To be
able to support multiple apps, you need X times that amount of space.

Also, I'm not sure how many times you can reprogram an fpga in its
lifetime. I would imagine it breaks down after a million programming
cycles or so, maybe even a hundred thousand or ten thousand.

In an embedded device, you're basically just running one task, so there
it's a feasible option, you program it a few hundred times during
development, and after that, it stays unchanged.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
cr88192

2005-11-09, 7:55 am


"Willem" <willem@stack.nl> wrote in message
news:slrndn3iic.eg1.willem@toad.stack.nl...
> cr88192 wrote:
> ) "Willem" <willem@stack.nl> wrote in message
> ) news:slrndn236o.1rcg.willem@toad.stack.nl...
> )> It takes time to reprogram the fpga, and that would have to be done on
> )> each
> )> task switch. Alternatively, all the apps would have to agree on a
> single
> )> programming, which defeats the purpose of it being programmable.
> )>
> ) I mean, couldn't the library/driver, say, dedicate part of the fpga to
> one
> ) "device", another part to another "device", ... effectively rebuilding
> the
> ) logic in question on an as-needed basis. an app starts up, and is like
> "I
> ) want a device with there properties". this goes off to the driver, which
> is
> ) like "ok", as it rewrites the logic for the in-fpga whatever, and
> proxies
> ) data too/from the app, or whatever.
> )
> ) sorry, I don't know much about fpgas...
>
> I don't know that much either, but I do know reprogramming takes
> quite a bit of time. The problem is that if you write an app to expect
> an fpga, then you need a certain amount of space for that app. To be
> able to support multiple apps, you need X times that amount of space.
>
> Also, I'm not sure how many times you can reprogram an fpga in its
> lifetime. I would imagine it breaks down after a million programming
> cycles or so, maybe even a hundred thousand or ten thousand.
>
> In an embedded device, you're basically just running one task, so there
> it's a feasible option, you program it a few hundred times during
> development, and after that, it stays unchanged.
>

ok, makes sense...

(right now, messing with wireless networking stuff, my room is presently a
horrid mess as a result...).

>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT



Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com