| cr88192 2007-02-18, 3:55 am |
| well, recently I went and beat together some code to allow my apps (a 3D
engine, mapper, modeler, ...) to record video in real time (note, pure
software encoding).
it is annoyingly difficult to get good performance with this kind of thing
(i.e. to keep recording from hurting the framerate too badly).
for example, one wants to use the app and at the same time record an output
video to a file. I have it set up so that it encodes a frame every 1/15th of a
second, which has the annoying property of interfering with the smoothness
of the app's framerate.
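as an illustration of the "one frame every 1/15th of a second" part, here is a minimal sketch of how the render loop could decide when a frame is due. FramePacer, pacer_init and pacer_tick are made-up names for this example, not the ones in my code, and this only handles the scheduling; it does nothing about the cost of the encode itself.

```c
#include <assert.h>

/* Hypothetical helper (names invented for this sketch): tracks how much
   time has passed and reports when the next video frame is due. */
typedef struct {
    double interval;  /* seconds between recorded frames, e.g. 1.0/15 */
    double accum;     /* time accumulated since the last recorded frame */
} FramePacer;

void pacer_init(FramePacer *p, double fps)
{
    assert(p && fps > 0.0);
    p->interval = 1.0 / fps;
    p->accum = 0.0;
}

/* dt is the time in seconds since the previous call.
   Returns 1 if a frame should be recorded now, 0 otherwise. */
int pacer_tick(FramePacer *p, double dt)
{
    p->accum += dt;
    if (p->accum < p->interval)
        return 0;
    p->accum -= p->interval;
    /* after a long stall, drop the backlog instead of recording a burst
       of frames back to back */
    if (p->accum > p->interval)
        p->accum = 0.0;
    return 1;
}
```

the render loop would call pacer_tick once per rendered frame with the frame time, and only grab and encode the framebuffer when it returns 1.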
the codec used is motion-JPEG (I more or less reused my usual image-based
JPEG encoder for encoding the frames). I am using AVI as the container
format (once this was beaten together, getting functional output out of it
turned out to be annoyingly difficult).
counterintuitive as it is, one of the most time-consuming portions of the
whole process is downsampling. it takes longer to downsample an image (say,
from 800x600 to 320x240), than it does to actually encode the downsampled
image.
one ends up having to do a whole bunch of optimizations that seem to
somewhat hurt quality (such as aliased downsampling, ...).
but it does seem a little mysterious how existing codecs are as fast as they
are. then again, my code is compiled with debug and profile options, so oh
well...
oh well, here is an intermediate version of the downsampler:
/* 800x600 -> 320x240, 4 bytes per pixel, first 3 channels only.
   each 5x5 source block becomes a 2x2 output block; each output pixel is
   the average of one 2x2 corner of the block (row/column 2 is skipped). */
for(i=0; i<120; i++)
  for(j=0; j<160; j++)
    for(k=0; k<3; k++)
    {
      tb=buf+(i*5*800+j*5)*4+k;
      l=tb[0*800*4+0]+tb[0*800*4+4]+tb[1*800*4+0]+tb[1*800*4+4];
      buf[((i*2+0)*320+(j*2+0))*4+k]=l>>2;
      l=tb[0*800*4+12]+tb[0*800*4+16]+tb[1*800*4+12]+tb[1*800*4+16];
      buf[((i*2+0)*320+(j*2+1))*4+k]=l>>2;
      l=tb[3*800*4+0]+tb[3*800*4+4]+tb[4*800*4+0]+tb[4*800*4+4];
      buf[((i*2+1)*320+(j*2+0))*4+k]=l>>2;
      l=tb[3*800*4+12]+tb[3*800*4+16]+tb[4*800*4+12]+tb[4*800*4+16];
      buf[((i*2+1)*320+(j*2+1))*4+k]=l>>2;
    }
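for reference, here is a sketch of the same 2x2-corner averaging written as a function with separate source and destination buffers (the function, helper and constant names are made up for this example). one reason to consider this: going by the indexing above, when the loop runs in place the second output row (pixels 320..639 of the buffer) lands inside source row 0, so on the first block row the blocks starting at j=64 read pixels that were already overwritten. a separate destination sidesteps that, at the cost of a second buffer.

```c
#include <assert.h>

enum { SRC_W = 800, SRC_H = 600, DST_W = 320, DST_H = 240 };

/* average of the 2x2 group whose top-left sample is at p
   (4 bytes per pixel, stride = bytes per source row) */
static int avg2x2(const unsigned char *p, int stride)
{
    return (p[0] + p[4] + p[stride] + p[stride + 4]) >> 2;
}

/* Hypothetical out-of-place variant: src is 800x600, dst is 320x240,
   both 4 bytes per pixel. Only the first 3 channels are written; the
   4th byte of each dst pixel is left untouched. */
void downsample_800x600_to_320x240(const unsigned char *src,
                                   unsigned char *dst)
{
    const int ss = SRC_W * 4;  /* source row stride in bytes */
    const int ds = DST_W * 4;  /* destination row stride in bytes */
    int i, j, k;

    assert(src && dst && src != dst);

    for (i = 0; i < SRC_H / 5; i++)
        for (j = 0; j < SRC_W / 5; j++) {
            const unsigned char *tb = src + (i * 5 * SRC_W + j * 5) * 4;
            unsigned char *ob = dst + (i * 2 * DST_W + j * 2) * 4;
            for (k = 0; k < 3; k++) {
                /* rows 0-1 / 3-4 and columns 0-1 / 3-4 of the 5x5 block */
                ob[k]          = (unsigned char)avg2x2(tb + k, ss);
                ob[4 + k]      = (unsigned char)avg2x2(tb + 12 + k, ss);
                ob[ds + k]     = (unsigned char)avg2x2(tb + 3 * ss + k, ss);
                ob[ds + 4 + k] = (unsigned char)avg2x2(tb + 3 * ss + 12 + k, ss);
            }
        }
}
```

the per-block pointers also avoid recomputing the full index expression for every channel, though I have not measured whether that matters here.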
and here is a slower version:
/* same 5x5 -> 2x2 mapping, but each output pixel averages a 3x3 corner
   of the block (the corners overlap on source row/column 2). */
for(i=0; i<120; i++)
  for(j=0; j<160; j++)
    for(k=0; k<3; k++)
    {
      tb=buf+(i*5*800+j*5)*4+k;
      /* rows 0-2, columns 0-2 */
      l=(tb[0*800*4+0]+tb[0*800*4+4]+tb[0*800*4+8]+
         tb[1*800*4+0]+tb[1*800*4+4]+tb[1*800*4+8]+
         tb[2*800*4+0]+tb[2*800*4+4]+tb[2*800*4+8])/9;
      buf[((i*2+0)*320+(j*2+0))*4+k]=(l<0)?0:(l>255)?255:l;
      /* rows 0-2, columns 2-4 */
      l=(tb[0*800*4+8]+tb[0*800*4+12]+tb[0*800*4+16]+
         tb[1*800*4+8]+tb[1*800*4+12]+tb[1*800*4+16]+
         tb[2*800*4+8]+tb[2*800*4+12]+tb[2*800*4+16])/9;
      buf[((i*2+0)*320+(j*2+1))*4+k]=(l<0)?0:(l>255)?255:l;
      /* rows 2-4, columns 0-2 */
      l=(tb[2*800*4+0]+tb[2*800*4+4]+tb[2*800*4+8]+
         tb[3*800*4+0]+tb[3*800*4+4]+tb[3*800*4+8]+
         tb[4*800*4+0]+tb[4*800*4+4]+tb[4*800*4+8])/9;
      buf[((i*2+1)*320+(j*2+0))*4+k]=(l<0)?0:(l>255)?255:l;
      /* rows 2-4, columns 2-4 */
      l=(tb[2*800*4+8]+tb[2*800*4+12]+tb[2*800*4+16]+
         tb[3*800*4+8]+tb[3*800*4+12]+tb[3*800*4+16]+
         tb[4*800*4+8]+tb[4*800*4+12]+tb[4*800*4+16])/9;
      buf[((i*2+1)*320+(j*2+1))*4+k]=(l<0)?0:(l>255)?255:l;
    }
and here is the faster version (very crappy looking):
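/* no filtering at all: each output pixel is a single source pixel,
   taken from rows/columns 1 and 3 of the 5x5 block. */
for(i=0; i<120; i++)
  for(j=0; j<160; j++)
    for(k=0; k<3; k++)
    {
      tb=buf+(i*5*800+j*5)*4+k;
      l=(i*2*320+j*2)*4+k;
      buf[l]=tb[1*800*4+4];
      buf[l+4]=tb[1*800*4+12];
      buf[l+320*4]=tb[3*800*4+4];
      buf[l+320*4+4]=tb[3*800*4+12];
    }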
likewise, here are 2 versions of the JPEG encoder's UV downsampling (the
averaging one is currently disabled with #if 0):
#if 0
/* 2x2 average of each chroma plane */
for(i=0; i<ys3; i++)
  for(j=0; j<xs3; j++)
  {
    k=ub[(i*2)*xs2+j*2]+
      ub[(i*2)*xs2+j*2+1]+
      ub[(i*2+1)*xs2+j*2]+
      ub[(i*2+1)*xs2+j*2+1];
    ub[i*xs3+j]=k/4;
    k=vb[(i*2)*xs2+j*2]+
      vb[(i*2)*xs2+j*2+1]+
      vb[(i*2+1)*xs2+j*2]+
      vb[(i*2+1)*xs2+j*2+1];
    vb[i*xs3+j]=k/4;
  }
#endif
/* point sampling: keep the top-left sample of each 2x2 group */
for(i=0; i<ys3; i++)
  for(j=0; j<xs3; j++)
  {
    ub[i*xs3+j]=ub[(i*2)*xs2+j*2];
    vb[i*xs3+j]=vb[(i*2)*xs2+j*2];
  }
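the averaging variant can also be pulled out into a small function that handles one plane at a time. this is only a sketch: plane_halve_avg is a made-up name, it assumes even plane dimensions and a separate destination buffer, and unlike the k/4 above it rounds to nearest instead of truncating.

```c
#include <assert.h>

/* Hypothetical helper: halves one 8-bit plane in each dimension by
   averaging 2x2 groups. src is w x h, dst is (w/2) x (h/2);
   w and h are assumed to be even. */
void plane_halve_avg(const unsigned char *src, int w, int h,
                     unsigned char *dst)
{
    int hw = w / 2, hh = h / 2;
    int i, j;

    assert(src && dst && w > 0 && h > 0);
    assert((w % 2) == 0 && (h % 2) == 0);

    for (i = 0; i < hh; i++) {
        const unsigned char *r0 = src + (i * 2) * w;  /* upper source row */
        const unsigned char *r1 = r0 + w;             /* lower source row */
        for (j = 0; j < hw; j++) {
            int sum = r0[j * 2] + r0[j * 2 + 1] +
                      r1[j * 2] + r1[j * 2 + 1];
            /* +2 before the shift rounds to nearest */
            dst[i * hw + j] = (unsigned char)((sum + 2) >> 2);
        }
    }
}
```

it would be called once for the U plane and once for the V plane, with w and h playing the role of xs2 and the matching plane height.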
note, the jpeg encoding has been optimized less for the primary reason that
it takes much less of the runtime.
or something...
|