ADPCM question
| RossClement@gmail.com 2007-03-07, 6:56 pm |
| I'm a bit stuck with ADPCM. From what I've read, ADPCM is similar to a
blocked floating point approach, except that the data encoded is DPCM
rather than the original samples.
Now, since this is a lossy technique, but based on DPCM, then an over-
simple approach would be to simply encode the difference in a lossy
way, and then ignore the loss on the decoding side. But this would
result in errors accumulating in the sample. So, the ADPCM would have
to be designed to account for the errors. DPCM is based on the
difference between the current sample and the previous sample. If we
have a lossy technique, then at each step there are potentially two
previous samples: the previous "real" sample r_(i-1), and the previous
sample that would be output by the decoder, d_(i-1). If the delta value
r_i - d_(i-1) is stored, then each delta value would effectively correct
for previous errors.
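A minimal sketch of the two options described above, assuming a plain uniform quantizer with a fixed step and no clamping of the codes to a bit range. The function names are hypothetical and only serve the illustration. The "open-loop" encoder quantizes r_i - r_(i-1), while the "closed-loop" encoder quantizes r_i - d_(i-1) by tracking the decoder's output inside the encoder.

```python
def quantize(delta, step):
    """Return the integer code nearest to delta / step."""
    return round(delta / step)


def encode_open_loop(samples, step):
    """Quantize r_i - r_(i-1); the encoder ignores the decoder's output."""
    codes = []
    prev = samples[0]
    for r in samples[1:]:
        codes.append(quantize(r - prev, step))
        prev = r
    return codes


def encode_closed_loop(samples, step):
    """Quantize r_i - d_(i-1), where d is what the decoder will reconstruct."""
    codes = []
    d = samples[0]  # the first sample is assumed to be stored exactly
    for r in samples[1:]:
        q = quantize(r - d, step)
        codes.append(q)
        d += q * step  # mirror the decoder's update
    return codes


def decode(first, codes, step):
    """Rebuild samples by accumulating the dequantized deltas."""
    out = [first]
    for q in codes:
        out.append(out[-1] + q * step)
    return out


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


samples = [3 * i for i in range(100)]  # slow ramp, slope smaller than step / 2
step = 8
open_err = max_error(samples, decode(samples[0], encode_open_loop(samples, step), step))
closed_err = max_error(samples, decode(samples[0], encode_closed_loop(samples, step), step))
print(open_err, closed_err)
```

On this ramp every open-loop delta rounds to zero, so the decoder never moves and the error keeps growing, while the closed-loop error stays within half a step.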
That seems logical to me, but there is a problem if I have a block and
am trying to choose the step size. If I'm storing the delta values in,
say, 6 bits with one exponent per 64 bytes, then the actual delta
values depend on previous errors. The exponent can be chosen so that
all delta values in a block fit into 6 bits. But the choice of
exponent will affect the error, and hence affect the delta values
themselves. This could then create a nasty circularity with no proper
solution. I could find the smallest possible exponent where all delta
values are within the six bit range, but have a strong suspicion that
I have misunderstood the method. Have I?
Secondly, it's claimed that in FLAC, when a polynomial is fitted to the
samples, the "parameters for the fixed models can be transmitted in 3
bits" (http://flac.sourceforge.net/documentation_format_overview.html).
Where can I find documentation that explains this? Three bits seems an
awfully small amount of information to store the parameters of a fitted
polynomial.
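One way to read the "3 bits" claim: FLAC's fixed predictors are not fitted per block. There is a small set of predefined polynomial predictors of order 0 to 4 with hard-wired integer coefficients, so the encoder only has to transmit which order it picked, and a value from 0 to 4 fits in 3 bits. The sketch below illustrates that idea. The names `FIXED_COEFFS`, `fixed_residual` and `best_fixed_order` are hypothetical, and choosing the order by the sum of absolute residuals is a simplifying assumption, not necessarily what the reference encoder does.

```python
# Coefficients applied to s[n-1], s[n-2], ... for each fixed order.
# Order k predicts the next sample by assuming the k-th finite
# difference of the signal is zero.
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def fixed_residual(samples, order, start=None):
    """Residuals s[n] - prediction for n >= start (default: the order itself)."""
    coeffs = FIXED_COEFFS[order]
    if start is None:
        start = order
    residuals = []
    for n in range(start, len(samples)):
        pred = sum(c * samples[n - 1 - j] for j, c in enumerate(coeffs))
        residuals.append(samples[n] - pred)
    return residuals


def best_fixed_order(samples):
    """Pick the order with the smallest sum of absolute residuals.

    All orders are compared over the same range (n >= 4) so the sums
    are comparable; ties go to the lower order.
    """
    def cost(order):
        return sum(abs(e) for e in fixed_residual(samples, order, start=4))
    return min(FIXED_COEFFS, key=cost)


quadratic = [n * n for n in range(20)]
order = best_fixed_order(quadratic)
print(order, format(order, "03b"))  # the order is the only "parameter" sent
```

For a quadratic signal the order-3 predictor is exact, so all of its residuals are zero.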
| |
| RossClement@gmail.com 2007-03-07, 6:56 pm |
| I may have sort of answered my own question. I think. Using the "self-
correcting" model as described, I get much better quality than not
self-correcting (64 sample blocks, first sample absolute 16 bit
value). As far as the circularity goes, I take a plausible starting
value (smallest divisor that will make all delta values fit into 4
bits)-1, and work upwards until all self-correcting delta values fit
into 4 bits.
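The search described above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: the step ("divisor") is an integer, codes are signed two's-complement values, the first sample of the block is stored exactly, and all function names are made up for the example.

```python
def fits(codes, bits):
    """True if every code is representable as a signed value of that width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return all(lo <= q <= hi for q in codes)


def open_loop_codes(block, step):
    """Quantized differences between consecutive input samples."""
    return [round((b - a) / step) for a, b in zip(block, block[1:])]


def closed_loop_codes(block, step):
    """Quantized differences against the decoder's running output."""
    d = block[0]
    codes = []
    for r in block[1:]:
        q = round((r - d) / step)
        codes.append(q)
        d += q * step
    return codes


def choose_step(block, bits=4):
    """Start just below the smallest step that fits the open-loop deltas,
    then increase until the self-correcting deltas fit as well."""
    step = 1
    while not fits(open_loop_codes(block, step), bits):
        step += 1
    step = max(1, step - 1)
    while not fits(closed_loop_codes(block, step), bits):
        step += 1
    return step


def decode(first, codes, step):
    out = [first]
    for q in codes:
        out.append(out[-1] + q * step)
    return out


block = [0, 100, -50, 30, 200, 190, 0, 5]
step = choose_step(block)
codes = closed_loop_codes(block, step)
rebuilt = decode(block[0], codes, step)
print(step, codes, rebuilt)
```

The loop always terminates, because a large enough step makes every code collapse to -1, 0 or 1. It returns the first step at or above the starting guess that fits, which is not guaranteed to be the globally smallest one.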
On two of the three (44.1 kHz, mono, 30 secs) sample songs I'm using
the quality sounds very good. One of them has some audible artifacts
on the intro, but otherwise sounds good. If I use 5 bit ADPCM, then
the artifacts on the intro become less noticeable, but are still there.
I'm guessing that these are due to the intro of this song being a bad
case for the compression technique, rather than this being a bug in my
code. How likely is this?
| |
| cr88192 2007-03-07, 6:56 pm |
|
<RossClement@gmail.com> wrote in message
news:1173286606.774886.262810@64g2000cwx.googlegroups.com...
>I may have sort of answered my own question. I think. Using the "self-
> correcting" model as described, I get much better quality than not
> self-correcting (64 sample blocks, first sample absolute 16 bit
> value). As far as the circularity goes, I take a plausible starting
> value (smallest divisor that will make all delta values fit into 4
> bits)-1, and work upwards until all self-correcting delta values fit
> into 4 bits.
>
yes, I have done similar before as well.
yes, a self-correcting mechanism is more or less necessary, otherwise things
quickly turn into crap. usually what I had done was to simulate the decoder in
the encoder, so that predictions/... were always based off what the decoder
would have readily available, rather than the input samples.
> On two of the three (44.1 kHz, mono, 30 secs) sample songs I'm using
> the quality sounds very good. One of them has some audible artifacts
> on the intro, but otherwise sounds good. If I use 5 bit ADPCM, then
> the artifacts on the intro become less noticeable, but are still there.
> I'm guessing that these are due to the intro of this song being a bad
> case for the compression technique, rather than this being a bug in my
> code. How likely is this?
>
well, I was disappointed in my tests, in that I found it effectively
impossible (with my codec) to get perceptually-lossless results (good, yes,
lossless, no).
this challenge had been raised to me (ie: which point does it become
perceptually lossless?...). I couldn't pull this off, the ear is
surprisingly sensitive to even minor distortions (such as the calculations
used in using joint-stereo as opposed to true stereo, ...).
I had a lot better success at the low-end, ie, bitrate/quality limbo, than
at the high-end. eventually, I figure, my codec was useless for music, but
is probably fine for "most things", ie, sound effects, voice, ... where one
doesn't really care as much about quality (except in some cases).
like, most sound-effects and voice sound just fine at 11025 Hz 8-bit mono, but
this greatly reduces the quality of music. in my case, downsampling had
usually been used in the predecessor because it gave great payoffs wrt
bitrate.
in my case though, I had not used a fixed-bit scheme, but had instead used
scale-conversion, quantization, and huffman coding. actually, I had produced
a mix of predicted samples and some values occasionally used for tuning the
predictor (the encoder would scan forwards and do a kind of search to find
the best predictor and settings for a given block of samples), providing a
periodic reset point (this was per-block, to keep error from accumulating
too much), ..., which were then run through deflate, but close enough. the
tuning helped because different audio handles differently (noise and
explosions very much different than voice, which is very much different than
tones, ...).
results were good enough, and for what I was encoding results were better
than mp3 for the same bitrates (8-16kbps mp3 sounds like crap).
but, lossless was impossible. this codec would be a brutal crime against
music...
| |
| RossClement@gmail.com 2007-03-08, 7:56 am |
| On Mar 8, 12:27 am, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
> this challenge had been raised to me (ie: which point does it become
> perceptually lossless?...). I couldn't pull this off, the ear is
> surprisingly sensitive to even minor distortions (such as the calculations
> used in using joint-stereo as opposed to true stereo, ...).
I was hoping to write a program that would do blind listening tests
for various compression techniques against themselves and against
uncompressed versions. I'd have a whole lot of 30 second music
samples, compressed in various ways, and the program would present
pairs of them to the listener, who would decide which sounded better.
A chi-squared test (or similar) would then be used to conclude whether
their ability to distinguish between them was statistically
significant or not. And multiple songs would be used to test whether
they reliably identified the same technique as "better".
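A rough sketch of the statistics for one listener and one pair of encodings, assuming each trial is an independent forced choice and the null hypothesis is a 50/50 split. The function names are hypothetical. The chi-squared version uses one degree of freedom, for which the p-value can be computed with `math.erfc`; the exact binomial version is included as the "or similar" alternative, since it behaves better with few trials.

```python
import math


def chi_squared_preference(prefers_a, trials):
    """Chi-squared goodness-of-fit test (1 d.o.f.) against a 50/50 split.

    Returns (statistic, p_value).
    """
    expected = trials / 2
    prefers_b = trials - prefers_a
    stat = ((prefers_a - expected) ** 2 + (prefers_b - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def exact_binomial_p(prefers_a, trials):
    """Two-sided exact binomial p-value under a fair-coin null."""
    k = max(prefers_a, trials - prefers_a)
    tail = sum(math.comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


# Example: the listener picked encoding A in 70 of 100 blind trials.
stat, p = chi_squared_preference(70, 100)
print(stat, p, exact_binomial_p(70, 100))
```

Testing several songs and several listeners would need a correction for multiple comparisons, which this sketch leaves out.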
| |
| cr88192 2007-03-09, 3:56 am |
|
<RossClement@gmail.com> wrote in message
news:1173352099.287269.163190@p10g2000cwp.googlegroups.com...
> On Mar 8, 12:27 am, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
>
> I was hoping to write a program that would do blind listening tests
> for various compression techniques against themselves and against
> uncompressed versions. I'd have a whole lot of 30 second music
> samples, compressed in various ways, and the program would present
> pairs of them to the listener, who would decide which sounded better.
> A chi-squared test (or similar) would then be used to conclude whether
> their ability to distinguish between them was statistically
> significant or not. And multiple songs would be used to test whether
> they reliably identified the same technique as "better".
>
I was comparing between my codec and the original (I guess a flac version of
a raw CD rip). I was looking for the point where I could not distinguish
them, and was unable to find it...
then again, IMO, not even JPEG is perceptually lossless. make a JPEG, zoom
in, compare with original. even at high quality settings (90% or more), the
difference is clear (file gets larger, but quality does not).
in my JPEG coder (likely similar with gimp), much past about 90% quality,
one may as well just use PNG, as a 100% JPEG is IME both larger and looks
worse than the equivalent PNG...
as such, I think most people define "perceptually lossless" as "looks pretty
close to original" or "sounds pretty close to original", and not whether or
not one can distinguish them directly.
then again, I suspect most people can't normally see individual pixels at
1280x1024 either (or the meshwork dark grid between the phosphors, or the
pulsating of the redraw).
actually, CRTs are just odd in this way, for being at the same time both
light and dark, a subtle throbbing of momentary yet solid-seeming light over
a hidden darkness, but the illusion is imperfect.
well, worse with old macs it seems, in the right conditions you can just see
the screen redrawing (ie: after being outside in natural sunlight). screen
looks more black than light...
LCDs at least look solid, but one gets an LCD, and even new in a few spots
it may have dead pixels, and then adjacent pixels may go dark as well, so
then one has these annoying speckles where pixels have gone dead.
oh yes, and my natural vision often looks like some crappy camcorder, filled
with light and dark noise and speckles (and apparent continuous color
normalization, halo elimination, ... issues). it is a wonder I can see
anything at all...
combine this with apparent partial colorblindness, and yeah...
my world looks a little different than that seen on TV or on a computer. I
can see these versions, but it looks, different, like some alternative
universe posing as one's own...
and at times one can wonder if they are going deaf due to the background
noise generated by their computer.
....
or something...