For Programmers: Free Programming Magazines  


Home > Archive > Compression > April 2007 > Compressing a graph's datapoints









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Compressing a graph's datapoints
Casper

2007-04-15, 6:56 pm

This is slightly related to my previous post, "Smartest way of
compressing numbers".

If I have a large amount of X,Y data points, each data point can be
assumed to have "some" relation to its surrounding data point (I.e.
temperature samples 10 minutes apart are probably not extremely
different from another, however samples 10 hours could be very different).

Inspecting such a graph visually, it would appear that considerable
"representational space" could be saved, by dividing the large graph up
in successively smaller "windows" and simply associate an X,Y offset
with each one. My mind can do this relatively easy by inspection, but
how to implement this as an algorithm has me wondering:

A) Does this bear resemblance to any existing compression method?

B)I get the feeling this is a complex optimization problem (how large
should the window be in width vs. height and should I start out by as
many windows as possible or fewest) but perhaps it falls into a class
that is not NP bound. Which optimization algorithms does it sound like I
should make use of?

Thanks in advance,
/Casper


Phil Carmody

2007-04-15, 6:56 pm

Casper <casper@jbr.dk> writes:
> This is slightly related to my previous post, "Smartest way of
> compressing numbers".
>
> If I have a large amount of X,Y data points, each data point can be
> assumed to have "some" relation to its surrounding data point
> (I.e. temperature samples 10 minutes apart are probably not extremely
> different from another, however samples 10 hours could be very
> different).
>
> Inspecting such a graph visually, it would appear that considerable
> "representational space" could be saved, by dividing the large graph
> up in successively smaller "windows" and simply associate an X,Y
> offset with each one. My mind can do this relatively easy by
> inspection, but how to implement this as an algorithm has me wondering:
>
> A) Does this bear resemblance to any existing compression method?


Yes. Just compress the differences between subsequent points.
Delta coding has been used in many applications in the past.
(Such as audio, video, and telemetrics such as yours.)

> B)I get the feeling this is a complex optimization problem (how large
> should the window be in width vs. height and should I start out by as
> many windows as possible or fewest) but perhaps it falls into a class
> that is not NP bound. Which optimization algorithms does it sound like
> I should make use of?


If you look at one point at a time, your window is only 2 points wide -
previous, and current.

Phil
--
"Home taping is killing big business profits. We left this side blank
so you can help." -- Dead Kennedys, written upon the B-side of tapes of
/In God We Trust, Inc./.
Casper

2007-04-15, 9:55 pm

> If you look at one point at a time, your window is only 2 points wide -
> previous, and current.


Your suggestion is perhaps too far in the other end, by not sharing
anything within one "window". I.e. there must be some global agreement
regarding the minimum size of each number, which can not be smaller than
the largest delta - potentially wasting space.

What I am looking for, is some middle ground between one large window
(xMax-xMin, yMax-yMin) and storing fixed length delta values.

/Casper

Josiah Carlson

2007-04-16, 3:55 am

Casper wrote:
>
> Your suggestion is perhaps too far in the other end, by not sharing
> anything within one "window". I.e. there must be some global agreement
> regarding the minimum size of each number, which can not be smaller than
> the largest delta - potentially wasting space.
>
> What I am looking for, is some middle ground between one large window
> (xMax-xMin, yMax-yMin) and storing fixed length delta values.


Do you have all of your data before you compress it? Can you calculate
the largest and smallest deltas, maybe quantize, round, etc.? Do your
times between readings vary significantly? Would using a fourier-like
transform work reasonably well for approximating your data? Do you need
the measurements to be recovered *exactly*, or is a few percent
variation ok?

- Josiah
Casper

2007-04-16, 6:55 pm

> Do you have all of your data before you compress it?

Yes I do. It is the result of a database query.

> Can you calculate
> the largest and smallest deltas, maybe quantize, round, etc.?


Yes, its determining the optimal "sliding windows" that I find
difficult, locating the window boundaries. I can simply split the graph
into fixed smaller graphs but this does not necessarily result in a
particular optimized solution.

I.e. to represent the following graph, visual inspection and intuition
would split it into two and attach offset (0,20) to the 1'st window and
(14,40) to the 2'nd:

Y
^
50 | .... ..
40 | ... . Y-min 2'nd window
30 | . ..... ..
20 |. .. ... Y-min 1'st window
10 |
0 -------------------------- > X
0 10 20
^^^^^^^^^^^^^^ ^^^^^^^^^^^
X-min X-min
1'st window 2'nd window

> Do your
> times between readings vary significantly?


Potentially, so what I would do is to handle value and timestamps as two
different "streams" and join these upon decompression (n values must
have n timestamps so they are always in sync).

> Would using a fourier-like
> transform work reasonably well for approximating your data? Do you need
> the measurements to be recovered *exactly*, or is a few percent
> variation ok?


I would need the data with no loss in accuracy.

/Casper
George Johnson

2007-04-16, 6:55 pm

"Casper" <casper@jbr.dk> wrote in message
news:F5NUh.92176$4a6.350584@weber.videotron.net...
| > Do you have all of your data before you compress it?
|
| Yes I do. It is the result of a database query.
|
| > Can you calculate
| > the largest and smallest deltas, maybe quantize, round, etc.?
|
| Yes, its determining the optimal "sliding windows" that I find
| difficult, locating the window boundaries. I can simply split the graph
| into fixed smaller graphs but this does not necessarily result in a
| particular optimized solution.
|
| I.e. to represent the following graph, visual inspection and intuition
| would split it into two and attach offset (0,20) to the 1'st window and
| (14,40) to the 2'nd:
|
| Y
| ^
| 50 | .... ..
| 40 | ... . Y-min 2'nd window
| 30 | . ..... ..
| 20 |. .. ... Y-min 1'st window
| 10 |
| 0 -------------------------- > X
| 0 10 20
| ^^^^^^^^^^^^^^ ^^^^^^^^^^^
| X-min X-min
| 1'st window 2'nd window
|
| > Do your
| > times between readings vary significantly?
|
| Potentially, so what I would do is to handle value and timestamps as two
| different "streams" and join these upon decompression (n values must
| have n timestamps so they are always in sync).
|
| > Would using a fourier-like
| > transform work reasonably well for approximating your data? Do you need
| > the measurements to be recovered *exactly*, or is a few percent
| > variation ok?
|
| I would need the data with no loss in accuracy.
|
| /Casper

Look up "Optimal Spline" coding.
At the very worst you can reduce it down to a higher order polynomial
equation.

http://en.wikipedia.org/wiki/B%C3%A9zier_curve

http://en.wikipedia.org/wiki/Spline_(mathematics)

http://www.nar-associates.com/nurbs/c_code.html

An Interactive Introduction to Splines
http://www.ibiblio.org/e-notes/Splines/Intro.htm

Spline Curves
The applets here allow you to interactively experiment with different forms
of of spline curves.
http://www.cse.unsw.edu.au/~lambert/splines/



Josiah Carlson

2007-04-16, 6:55 pm

Casper wrote:
[snip]
> I would need the data with no loss in accuracy.


1. Try a simple delta coding technique, as suggested by others. You can
use an adaptive arithmetic coder to encode the differences, and/or if
you expect the differences to follow a curve like... 1/i or 1/2^i, use a
coding whose length is inversely proportional to its magnitude.

2. Try a low-frequency fourier/cosine transform on the x values, also
transmit deltas between the transform reconstruction and your original data.

Depending on the variation of your data, either of the above two should
give you fairly decent savings. Try it with the timestamps too.

- Josiah
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com