Subject: Re: issues, DCT vs. MDCT (Re: misc, possible promise: MDCT and image
Author: erpy
Date: 2007-01-12, 7:55 am

cr88192 wrote:

....snip
>
> gpu:
> the GPU is fast, but not very accurate;
> the decode quality would likely be somewhat worse on a GPU.
>
> mostly I am talking about issues where you have to look at it fairly
> hard, or zoom in, to notice (some other artifacts occur, but they are
> common with JPEG as well, even at very high quality settings).
>
>
> I think what it is mostly is that the combination of integer math, a
> windowing function, and byte output, leads to a general degradation of
> quality.
>
> for example, if I were using floats and/or higher quality fixed
> point, and the lapped transform were done using fixed point as well,
> quite possibly the quality would be higher, but so would the memory
> use and run time.
>
> it is a tradeoff I guess, the MDCT, itself, seemingly has an implied
> level of noise and loss in quality when implemented as I have, which
> is a little higher than DCT when implemented the same way (DCT lacking
> both lapping and a windowing function...).
>
>
> this makes sense, for example:
> between the horizontal and vertical passes, much of the accuracy is
> lost, why:
> because I have to do the windowing function, and as such have less
> total bits for storing accuracy (limited here by sizeof int, and the
> fact that going to long-long will make things several-X slower in this
> case).
>
> and, the lapped transform is done purely with bytes (already with a
> reduced dynamic range, which actually amplifies the noise when the
> dynamic range is restored), which thinking about it, is itself likely
> to constitute much of the degradation I have seen.
>
> ...
>
>
> so, yes, it probably will be possible to get good quality out of this,
> but likely as a cost to speed.
>
>
>
> for example, this is about what my IMDCT transform (or at least, part
> of it) looks like:
>
> /* One 8-coefficient row in, 16 windowed samples accumulated out. */
> void PDJPG_TransIMDCT_Horiz(int *iblk, byte *oblk)
> {
>     int a, b, c, d, e, f, g, h;
>     int i, j, k, l, m, n, o, p;
>
>     a=iblk[0]; b=iblk[1]; c=iblk[2]; d=iblk[3];
>     e=iblk[4]; f=iblk[5]; g=iblk[6]; h=iblk[7];
>
>     /* transform step: cosine terms as integers scaled by 256 */
>     i= a*162 -b*225 -c* 74 +d*254 -e* 25 -f*244 +g*120 +h*197;
>     j= a*120 -b*254 +c*162 +d* 74 -e*244 +f*197 +g* 25 -h*225;
>     k= a* 74 -b*197 +c*254 -d*225 +e*120 +f* 25 -g*162 +h*244;
>     l= a* 25 -b* 74 +c*120 -d*162 +e*197 -f*225 +g*244 -h*254;
>     m=-a*197 +b*120 +c*244 -d* 25 -e*254 -f* 74 +g*225 +h*162;
>     n=-a*225 -b* 25 +c*197 +d*244 +e* 74 -f*162 -g*254 -h*120;
>     o=-a*244 -b*162 -c* 25 +d*120 +e*225 +f*254 +g*197 +h* 74;
>     p=-a*254 -b*244 -c*225 -d*197 -e*162 -f*120 -g* 74 -h* 25;
>
>     /* window step: weights scaled by 64, accumulated into the byte
>        output; 8 intermediates are mirrored into 16 outputs.
>        note: >> on a negative int is implementation-defined in C */
>     oblk[ 0]+= (i* 6)>>16; oblk[ 1]+= (j*19)>>16;
>     oblk[ 2]+= (k*30)>>16; oblk[ 3]+= (l*41)>>16;
>     oblk[ 4]+=(-l*49)>>16; oblk[ 5]+=(-k*56)>>16;
>     oblk[ 6]+=(-j*61)>>16; oblk[ 7]+=(-i*64)>>16;
>     oblk[ 8]+= (m*64)>>16; oblk[ 9]+= (n*61)>>16;
>     oblk[10]+= (o*56)>>16; oblk[11]+= (p*49)>>16;
>     oblk[12]+= (p*41)>>16; oblk[13]+= (o*30)>>16;
>     oblk[14]+= (n*19)>>16; oblk[15]+= (m* 6)>>16;
> }
>
> misc:
>
> though dubious, I largely split the transform and window function into
> 2 steps in order to allow factoring of the former.
>
> I am not sure if I went off track and there is a more efficient way to
> do this though.
>
>
> alternatively, all of this can be done in a single pass by not
> splitting them (with a reduced total bit requirement and
> likely slightly improved accuracy). the cost though would be more or
> less doubling the number of additions and multiplies (adding up to
> about a 4x slowdown).
>
>
> or such...
>


Where do you get the assumption that GPUs are less accurate? At the
fragment (pixel) level most GPUs work at 128-bit precision, i.e. four
32-bit floating-point channels per pixel.
You should be working in floats while within the GPU, and convert values
to bytes only when retrieving the results with the CPU.
The tests we conducted on image processing and color space transformations
show not only that the GPU is more accurate but also that it is way faster
than a price-comparable CPU.
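
To make the point about intermediate precision concrete, here is a minimal C sketch. It is not code from either codec discussed in this thread. It assumes a hypothetical two-pass pipeline in which the first pass attenuates a sample by an arbitrary factor of 0.3 and the second pass restores it; the names `ATTEN`, `max_err_byte_path` and `max_err_float_path` are made up for the illustration. Rounding to a byte between the passes leaves an error that is magnified when the range is restored, while staying in floats until the end does not:

```c
#define ATTEN 0.3f  /* arbitrary attenuation factor, for illustration only */

/* Largest round-trip error over all byte inputs when the intermediate
   value is rounded to a byte between the two passes. */
static float max_err_byte_path(void)
{
    float worst = 0.0f;
    for (int x = 0; x < 256; x++) {
        unsigned char mid = (unsigned char)(x * ATTEN + 0.5f); /* pass 1, rounded */
        float back = mid / ATTEN;                              /* pass 2, restore */
        float err = back > x ? back - x : x - back;
        if (err > worst) worst = err;
    }
    return worst;
}

/* Same round trip, but the intermediate value stays a float. */
static float max_err_float_path(void)
{
    float worst = 0.0f;
    for (int x = 0; x < 256; x++) {
        float mid = x * ATTEN;     /* pass 1, no rounding */
        float back = mid / ATTEN;  /* pass 2, restore */
        float err = back > x ? back - x : x - back;
        if (err > worst) worst = err;
    }
    return worst;
}
```

With these assumptions the byte path is off by more than one grey level in the worst case, while the float path stays within float rounding error. A real transform has many more terms, so treat this only as an illustration of the mechanism.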

As an example:

Input frame: HDR 32bpp RGBE format - 720p resolution.

Doing RGBE to RGB floating point conversion, recovering chroma and luma
(always floating point) and a bunch more operations per-pixel, gave us
the following timing:

CPU: ~200 milliseconds/frame (0.2 secs)
GPU: ~130 microseconds/frame (0.00013 secs)
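
For readers unfamiliar with RGBE, the conversion step mentioned above can be sketched on the CPU side roughly as follows. This is not the code used for the timings; it is a minimal sketch of the commonly used Radiance-style decode (without the half-step mantissa offset some implementations add). The function names `rgbe_to_rgb` and `luma_rec709` are hypothetical, and the Rec. 709 luma weights are an assumption, since the weights actually used are not stated here:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Decode one RGBE pixel (three 8-bit mantissas plus a shared 8-bit
   exponent) into linear floating-point RGB. */
static void rgbe_to_rgb(const unsigned char rgbe[4], float rgb[3])
{
    assert(rgbe != NULL && rgb != NULL);
    if (rgbe[3] == 0) {                 /* exponent 0 encodes black */
        rgb[0] = rgb[1] = rgb[2] = 0.0f;
        return;
    }
    /* 2^(e - 128) removes the exponent bias; the extra 2^-8 turns the
       8-bit mantissa into a fraction. */
    float scale = ldexpf(1.0f, (int)rgbe[3] - (128 + 8));
    rgb[0] = rgbe[0] * scale;
    rgb[1] = rgbe[1] * scale;
    rgb[2] = rgbe[2] * scale;
}

/* Luma from linear RGB. Assumption: Rec. 709 weights. */
static float luma_rec709(const float rgb[3])
{
    return 0.2126f * rgb[0] + 0.7152f * rgb[1] + 0.0722f * rgb[2];
}
```

On a GPU the same arithmetic would run once per fragment, which is where the parallelism comes from.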

Yes... initially we were getting 0 milliseconds from the GPU, so we
decided to switch to the performance timer and get microseconds; then
it showed up... :) I suppose you can see the enormous difference in
speed, and why a CPU cannot beat that: a GPU nowadays
provides between 16 and 24 parallel floating-point pixel pipelines.
If you lay out your data accordingly, as matrices or as floating-point
textures to look up, you can do (almost) anything in image processing,
and in other fields.
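
As a side note on the timing: a millisecond readout reports 0 for anything shorter than one millisecond, which is why a microsecond timer was needed. The sketch below is a portable stand-in rather than the timer used in the tests; it assumes C11 `timespec_get` is available, and the helper names `timespec_to_us`, `now_us` and `whole_ms` are hypothetical:

```c
#include <time.h>

/* Convert a timespec to microseconds. */
static double timespec_to_us(struct timespec ts)
{
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Current time in microseconds, via C11 timespec_get. TIME_UTC is
   wall-clock time; a platform-specific monotonic timer is a better
   choice for real benchmarks. */
static double now_us(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return timespec_to_us(ts);
}

/* What a millisecond-granularity readout would show for the same span. */
static long whole_ms(double us)
{
    return (long)(us / 1000.0);
}
```

Typical use is `double t0 = now_us();` before the work and `now_us() - t0` after it. For the figures quoted above, `whole_ms(130.0)` is 0 while `whole_ms(200000.0)` is 200, and 200 ms against 130 microseconds is a ratio of roughly 1500 to 1.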

Latest models are even more generic...and can be programmed directly in
C - or a very similar language anyway.


Best,
E.

