Home > Archive > Compression > August 2005 > Encoding/decoding ranges of unsigned integers...
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Encoding/decoding ranges of unsigned integers...
|
|
|
| Here is the description of a problem:
Assume you have n integer ranges: [i, j], [k, l], [m, n]...
There is no particular order on the sequence of ranges.
There is no gap in any of the ranges.
Is there a smart way to encode these ranges into a representation that
would be both compact and efficient to encode/decode.
Before I go about and re-invent the wheel, I would like to know if there
is a well known algorithm to accomplish this.
| |
| Willem 2005-08-18, 8:59 am |
| d|dq wrote:
) Here is the description of a problem:
)
) Assume you have n integer ranges: [i, j], [k, l], [m, n]...
)
) There is no particular order on the sequence of ranges.
) There is no gap in any of the ranges.
In other words: It's okay to, say, sort the ranges ?
Is there a minimum and maximum value to the integers ?
Can the ranges overlap ?
) Is there a smart way to encode these ranges into a representation that
) would be both compact and efficient to encode/decode.
Sort the ranges in increasing order, calculate the differences/deltas,
encode those with an arith coder or whatever.
) Before I go about and re-invent the wheel, I would like to know if there
) is a well known algorithm to accomplish this.
It rather depends on the distribution of those ranges and other factors.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| John Reiser 2005-08-18, 5:55 pm |
| > Assume you have n integer ranges: [i, j], [k, l], [m, n]...
>
> There is no particular order on the sequence of ranges.
> There is no gap in any of the ranges.
If "no gap" implies that the given intervals are a partition
(disjoint cover) of the interval [min(i,k,m,...), max(j,l,n,...)]:
Sort the intervals, then encode one of the endpoints of
the large interval, and the ordered sequence of lengths
of the sorted subintervals.
--
| |
|
| Hi Willem,
Thanks for contributing your thoughts and time.
I answer your questions below...
Willem wrote:
> In other words: It's okay to, say, sort the ranges ?
> Is there a minimum and maximum value to the integers ?
> Can the ranges overlap ?
There could be millions of ranges. Sorting them would take too long.
(too long = longer than acceptable in a soft real-time system)
The minimum value would be 0.
The maximum value would be the greatest value you can store in an
unsigned 64-bit integer. (as stated by C99)
> ) Is there a smart way to encode these ranges into a representation that
> ) would be both compact and efficient to encode/decode.
>
> Sort the ranges in increasing order, calculate the differences/deltas,
> encode those with an arith coder or whatever.
I don't think sorting is an option. See above.
> It rather depends on the distribution of those ranges and other factors.
I agree. I am sorry I have not been clear enough.
The distribution of the ranges is undetermined. I could only say this:
At time t1,
if you were to sort the ranges from left to right ([2,9] < [10,11] < etc.),
you would find the density of ranges is higher on the left side of the
sorted sequence.
>
>
> SaSW, Willem
| |
|
| John Reiser wrote:
>
>
> If "no gap" implies that the given intervals are a partition
> (disjoint cover) of the interval [min(i,k,m,...), max(j,l,n,...)]:
> Sort the intervals, then encode one of the endpoints of
> the large interval, and the ordered sequence of lengths
> of the sorted subintervals.
>
Hi John,
Thanks for offering your thoughts.
1)
"no gap" does not mean this in this case.
I guess I have used this expression too casually.
There are 2 sample ranges:
[2-5] = [2,3,4,5]
[7-7] = [7]
"no gap" in any single range is what I meant. So, this is not a valid range:
[2,4,5] (3 is missing)
2)
No, it's not a disjoint cover.
If you were to sort a sample set of ranges, you could have something
like this:
[2-3], [5-12], [14-17], [20-45], [46-112], [114-456], etc...
Hence, it's not a cover and the density of ranges is higher at the
beginning of the sorted sequence.
| |
| John Reiser 2005-08-18, 5:55 pm |
| d|dq wrote:
> There could be millions of ranges. Sorting them would take too long.
> (too long = longer than acceptable in a soft real-time system)
>
> The minimum value would be 0.
> The maximum value would be the greatest value you can store in an
> unsigned 64-bit integer. (as stated by C99)
> The distribution of the ranges is undetermined [but] At time t1,
> if you were to sort the ranges from left to right ([2,9] < [10,11] < etc.),
> you would find the density of ranges is higher on the left side of the sorted sequence.
Choose a batch size of (say) 100 ranges, or the ranges that arrive
in the next fixed-in-advance interval of time (say, 0.1 second.)
Sort the batch. Encode the minimum left endpoint, the differences
(in order) between consecutive left endpoints, and the lengths (in
order) of the ranges. Take advantage of anything else that might
be discovered about the distribution of the two sequences of deltas
(distance between consecutive left endpoints; consecutive lengths.)
The statement of the problem promises very little, so it may be
unlikely to improve upon an ordinary entropy encoder.
--
| |
|
| John Reiser wrote:
> d|dq wrote:
>
>
>
>
>
> Choose a batch size of (say) 100 ranges, or the ranges that arrive
> in the next fixed-in-advance interval of time (say, 0.1 second.)
> Sort the batch. Encode the minimum left endpoint, the differences
> (in order) between consecutive left endpoints, and the lengths (in
> order) of the ranges. Take advantage of anything else that might
> be discovered about the distribution of the two sequences of deltas
> (distance between consecutive left endpoints; consecutive lengths.)
> The statement of the problem promises very little, so it may be
> unlikely to improve upon an ordinary entropy encoder.
>
In other words, I might as well use a general purpose encoder, right?
Are you familiar with LZO? (http://www.oberhumer.com/opensource/lzo/)
Will I save time if I simply use a library of this kind?
| |
| Thomas 2005-08-20, 9:55 pm |
| d|dq wrote:
> Here is the description of a problem:
>
> Assume you have n integer ranges: [i, j], [k, l], [m, n]...
>
> There is no particular order on the sequence of ranges.
> There is no gap in any of the ranges.
>
> Is there a smart way to encode these ranges into a representation that
> would be both compact and efficient to encode/decode.
>
> Before I go about and re-invent the wheel, I would like to know if there
> is a well known algorithm to accomplish this.
i don't know if this is applicable in your case.
But tag tree coding is quite a common method to encode large arrays of
integers. It is for example used in JPEG2000 to encode the packet headers.
|
|
|
|
|