
Subject: possibly cool: audio filter, decent bitrates
cr88192

2005-11-04, 6:55 pm

oh well, I was beating around with audio filtering, likely getting little
'useful' done this weekend, but oh well.

I was beating at it, feeling disappointed that I couldn't really get some
"fancy" ideas that popped up working (predicting samples via recent
frequencies, ...).

I was getting better results with a slightly fudged variation of dpcm, eg,
using the last 2 samples to predict the next one, based on the assumption
that the curve will at least sort-of follow a line.

ok, it is the midpoint between the previous sample and what the sample would
be if it followed a line. strict linear prediction was actually doing worse,
probably because sounds will more often follow curves than straight lines,
and a midpoint better approximates a curve?...
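
A minimal sketch of the midpoint predictor described above, in Python. The function names are made up for illustration, and the integer rounding and the zero starting history are assumptions; the post does not say how either was handled. This version is lossless; once the residuals are quantized, the encoder would have to predict from the reconstructed samples rather than the originals.

```python
def predict_midpoint(prev1, prev2):
    """Midpoint between the previous sample and the linear extrapolation."""
    linear = 2 * prev1 - prev2      # where the sample would land on a straight line
    return (prev1 + linear) // 2    # halfway between prev1 and that point


def encode_residuals(samples):
    """Return the prediction error for each integer sample."""
    residuals = []
    prev1 = prev2 = 0               # assumed starting history
    for s in samples:
        residuals.append(s - predict_midpoint(prev1, prev2))
        prev2, prev1 = prev1, s
    return residuals


def decode_residuals(residuals):
    """Rebuild the samples by running the same predictor on the decoder side."""
    samples = []
    prev1 = prev2 = 0
    for r in residuals:
        s = r + predict_midpoint(prev1, prev2)
        samples.append(s)
        prev2, prev1 = prev1, s
    return samples
```

Strict linear prediction would return `linear` itself; the midpoint version is equivalent to `prev1 + (prev1 - prev2) / 2`.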

(I had some other algos that were working better, but were slower and more
complicated, so I left them out for now).

this is followed by a tweaked variation of a square root (special handling
for negatives, essentially, it deals with the 'complex' case). similarly,
there is a specialized square function that treats negatives as complexes to
reverse the specialized square root.
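
One way to read the "square root with special handling for negatives" is a sign-preserving square root and its matching square. The sketch below assumes that reading; the names `signed_sqrt` and `signed_square` are hypothetical, and flooring the root is an assumption as well.

```python
import math


def signed_sqrt(x):
    """Square root of the magnitude, with the sign carried through."""
    r = math.isqrt(abs(x))          # integer square root, rounded down
    return -r if x < 0 else r


def signed_square(y):
    """Inverse of signed_sqrt, up to the precision lost by flooring."""
    return -(y * y) if y < 0 else y * y
```

Under this reading a 16-bit difference of 16384 maps to 128, and the round trip is only exact for perfect squares (for example, 10 comes back as 9).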

the results of this are then quantized and arithmetic coded, me being lazy,
using a further modified version of a paq-like algo (mostly changing how it
manages the context to better suit "numerical" data).
yeah, one good point of paq-like coders is that they are easy to reuse and
tweak in my experience, even if they could be faster...


oh well, all was not so good seeming, until I noticed my bitrates. the
bitrates are falling well within mp3's domain (I can get speech down to
about 2kbps and still sort of make it out, though it does sound terrible).

imo, this seemed surprising. one would expect it almost guaranteed that a
dpcm variant would suck vs. mdct or something?...

at bitrates > about 50kbps or so, output sounds pretty decent. it probably
depends a lot on the sound though (more complex sounds are likely to not
hold up as well, or generally generate higher bitrates, or something). more
testing is likely needed for this.

it does not sound or behave much like mp3. I don't have direct bitrate
control, rather, sample rate and the quantizer step are used for controlling
the bitrate.

at low bitrates, mp3 starts sounding like it is going through a metal can or
glass bottle or something in my experience. mine just sounds "crunchy". sine
waves turn into square waves, the waveform has a harsh square-ish look, it
starts containing pops and similar, ... at some lower limit, the sound turns
into garbage though.

I don't know, all this is possibly cool.


another algo that had worked, so may be worth considering:
splitting the sound into "tiles", then searching past samples and
subtracting out whatever was found that most closely resembled the current
tile. in my experience, this has reduced the total output entropy some. with
tones, nearly the entire wave was silenced, and many other sounds were
filtered pretty well (however, really noisy sounds typically did a lot
worse).
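
A rough sketch of the tile search, assuming the sum of absolute differences as the "most closely resembles" measure and restricting candidates to tiles that lie entirely in the past. Both choices, and the function names, are illustrative assumptions rather than the method actually used.

```python
def best_match_offset(samples, start, tile_size, max_back):
    """Find the earlier position whose tile is closest to the tile at `start`.

    Returns None when no past tile beats subtracting nothing at all.
    """
    current = samples[start:start + tile_size]
    best_pos = None
    best_cost = sum(abs(v) for v in current)     # cost of leaving the tile alone
    first = max(0, start - max_back)
    for pos in range(first, start - tile_size + 1):
        candidate = samples[pos:pos + tile_size]
        cost = sum(abs(a - b) for a, b in zip(current, candidate))
        if cost < best_cost:
            best_pos, best_cost = pos, cost
    return best_pos


def subtract_tile(samples, start, tile_size, pos):
    """Residual left after subtracting the past tile at `pos` (or nothing)."""
    current = samples[start:start + tile_size]
    if pos is None:
        return list(current)
    return [c - p for c, p in zip(current, samples[pos:pos + tile_size])]
```

For a pure periodic tone the best match is an exact copy one or more periods back, which is why the residual goes silent. The cost is roughly `max_back * tile_size` operations per tile, consistent with the search being slow.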

as a cost, this algo was also rather slow, and I am not totally sure if it
is "worth it" in the general case. likewise, I am unsure how well it will
hold up when quantization is used.

alternatively, it could be done as a lossless step post-quantization, at
least simplifying the process somewhat. luckily, from what I saw, the
results from tile searches were pretty repetitive, so they should at least
compress well (reducing the risk of adding too much overhead).

I could try at least.


any comments?...


Ignorant

2005-11-07, 9:55 pm

some untested/untried comments:
Speech/sound/noise modelling cannot be universally applicable.
The predictor will predict some value (frequency, decibels, ...), and the
delta from the predicted value is what has to be coded to get fidelity.
The predictor should be as good as the one in LZP for normal data, so that
it provides a decent distribution of the deltas.
Then the codec only has to bother with coding the deltas.



cr88192

2005-11-07, 9:55 pm


"Ignorant" <shunya@vsnl.com> wrote in message
news:1131188955.329102.267530@g43g2000cwa.googlegroups.com...
> some untested/untried comments:
> Speech/sound/noise modelling cannot be universally applicable.
> The predictor will predict some value (frequency, decibels, ...), and the
> delta from the predicted value is what has to be coded to get fidelity.
> The predictor should be as good as the one in LZP for normal data, so that
> it provides a decent distribution of the deltas.
> Then the codec only has to bother with coding the deltas.
>


my algo is not tuned much for any particular kind of sound, and the algo
itself does not care about the nature of the sound it is modelling (it does
not separate 'tone' from 'noise', partly since in the past I have never been
able to effectively separate these anyways).

however, not everything compresses equally well. typically noisier sounds
compress less well it seems.

quality between my filter and mp3:
for similar bitrates, they sound rather different, making exact comparison
difficult. mp3 opts to put the sound in a metal can, mine opts to make it
sound "harsh". however, I will lean on mp3's side in saying that, in
general, it sounds more normal...

the tile filter was of varying success, sometimes working quite well,
sometimes not so much. it is also slow to a barely (or maybe not) tolerable
level.

nothing is modeled per-se, choices for the tile filter are based mostly on
entropy (find the tile that can be subtracted for the greatest decrease in
entropy).
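
A sketch of what an entropy-driven choice could look like, assuming plain order-0 (symbol frequency) entropy of the residual tile as the score. The post does not say how entropy was measured, so the measure and the names below are assumptions.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Total order-0 Shannon entropy of the values, in bits."""
    counts = Counter(values)
    n = len(values)
    return -sum(c * math.log2(c / n) for c in counts.values())


def pick_tile_by_entropy(samples, start, tile_size, max_back):
    """Pick the past tile whose subtraction gives the lowest-entropy residual."""
    current = samples[start:start + tile_size]
    best_pos, best_bits = None, entropy_bits(current)   # baseline: subtract nothing
    for pos in range(max(0, start - max_back), start - tile_size + 1):
        past = samples[pos:pos + tile_size]
        bits = entropy_bits([c - p for c, p in zip(current, past)])
        if bits < best_bits:
            best_pos, best_bits = pos, bits
    return best_pos, best_bits
```

Order-0 entropy ignores magnitudes (a residual of all 7s scores the same as all 0s), so it is only a rough proxy for what a real context-modelling coder would spend.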

but, yeah, the general algo does need to be tuned for audio at least, as
otherwise it is not likely to do much of anything useful (typical data
compressors do rather poorly on audio data).


so what anyways? I don't know.
it is doubtful there was much of any real point to this.


too tired right now to write that much else...


Jim Leonard

2005-11-09, 6:55 pm

cr88192 wrote:
> another algo that had worked, so may be worth considering:
> splitting the sound into "tiles", then searching past samples and
> subtracting out whatever was found that most closely resembled the current
> tile. in my experience, this has reduced the total output entropy some. with
> tones, nearly the entire wave was silenced, and many other sounds were
> filtered pretty well (however, really noisy sounds typically did a lot
> worse).


This reminds me of mp3's psycho-acoustic modeling, where certain
frequencies are outright discarded if a neighboring/similar frequency
is significantly louder such that the original frequency would be
inaudible to the human ear because of it.

All of this sounds like you're applying traditional image techniques to
audio -- interesting, but I think you'll hit a limit. Generic image
techniques work on 2-dimensional data, while generic audio techniques
work on 1-dimensional data. You can apply 1d techniques to 2d by
treating each dimension separately -- but you can't really force a
linear 1d data source (sound) into a 2d structure (image) because it
generally doesn't correlate that way.

Still, your "tile" idea with audio is interesting, but you'd have to do
some FFT to determine what the best tile size is (for example, simple
beat detection so that each tile began with the beat/measure), and if
you're doing FFT on audio you might as well code up the rest of your
mp3 encoder :-) or better yet just use lame ;-)

It's always the same tradeoff -- decompression speed vs. complexity.

Jim Leonard

2005-11-09, 6:55 pm

cr88192 wrote:
> for similar bitrates, they sound rather different, making exact comparison
> difficult. mp3 opts to put the sound in a metal can, mine opts to make it
> sound "harsh". however, I will lean on mp3's side in saying that, in
> general, it sounds more normal...


The "harsh" is prediction error and/or over-quantization; this is
typical for ADPCM schemes. For example, most people are familiar with
4-bit ADPCM when used with 16-bit samples; some distortion, but it's
acceptable, especially if the sample rate is high. But try much lower
sample rates (8 kHz or lower), or smaller output tokens (3-bit or 2-bit
ADPCM) and you get the same "harsh" sound. 3-bit is usable for voice
only; 2-bit is mostly crap.

cr88192

2005-11-09, 6:55 pm


"Jim Leonard" <MobyGamer@gmail.com> wrote in message
news:1131566494.345640.72100@g44g2000cwa.googlegroups.com...
> cr88192 wrote:
>
> The "harsh" is prediction error and/or over-quantization; this is
> typical for ADPCM schemes. For example, most people are familiar with
> 4-bit ADPCM when used with 16-bit samples; some distortion, but it's
> acceptable, especially if the sample rate is high. But try much lower
> sample rates (8 kHz or lower), or smaller output tokens (3-bit or 2-bit
> ADPCM) and you get the same "harsh" sound. 3-bit is usable for voice
> only; 2-bit is mostly crap.
>


yeah. I wasn't quantizing to a fixed bit-count though, rather I was using an
integer divisor and arithmetic coding the results.

usually I was getting harsh results with quantizer values like 64 and above
(note: the difference is already converted to the single byte range).
a quantizer value of 64 leaves, typically, about 2 bits/sample (at most). 96
sometimes works. values like 128 (1 bit left), however, almost exclusively
turn the sound into either garbage or silence.

32 (about 3 bits) generally sounds ok, if a little poor. 16 and lower do not
have all that noticeable an effect on quality (results are similar to those
of adpcm).

1 implies no real loss apart from the original conversion from 16 bit to 8
bit (square root) samples, and from clamping to the byte range (for loud
samples with differences much above 16384, it is possible for quality loss
due to clamping).
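
The bit counts quoted above follow from dividing the 256-value byte range by the quantizer step. A small sketch, with hypothetical function names and an assumed round-toward-zero rule (the post only says "integer divisor"):

```python
import math


def quantize(residual, step):
    """Divide by the step, rounding toward zero (assumed rounding rule)."""
    return int(residual / step)


def dequantize(q, step):
    """Scale the quantized value back up; the remainder is lost."""
    return q * step


def bits_left(step, input_range=256):
    """Upper bound on bits per sample after dividing a byte-range value."""
    return math.log2(input_range / step)
```

With these definitions `bits_left(64)` is 2.0, `bits_left(128)` is 1.0, `bits_left(32)` is 3.0 and `bits_left(1)` is 8.0, matching the figures in the post. The arithmetic coder then usually spends fewer bits than this bound, since small values dominate.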

annoyingly, these effects are mildly noticeable in terms of sound quality
(for pure tones), but for most other sounds they don't really seem that
noticeable (some sounds, however, came off sounding a little "flat" vs the
originals).


cr88192

2005-11-09, 6:55 pm


"Jim Leonard" <MobyGamer@gmail.com> wrote in message
news:1131566264.396974.70580@g43g2000cwa.googlegroups.com...
> cr88192 wrote:
>
> This reminds me of mp3's psycho-acoustic modeling, where certain
> frequencies are outright discarded if a neighboring/similar frequency
> is significantly louder such that the original frequency would be
> inaudible to the human ear because of it.
>

ok.

nothing like this was done in my filter.

> All of this sounds like you're applying traditional image techniques to
> audio -- interesting, but I think you'll hit a limit. Generic image
> techniques work on 2-dimensional data, while generic audio techniques
> work on 1-dimensional data. You can apply 1d techniques to 2d by
> treating each dimension separately -- but you can't really force a
> linear 1d data source (sound) into a 2d structure (image) because it
> generally doesn't correlate that way.
>

I know, they are 1D filters.

I work by searching backwards for some maximum amount of time (this is
tunable, because longer searches make encoding take a lot longer).
as of yet, I have not figured out any good way to optimize these searches.

> Still, your "tile" idea with audio is interesting, but you'd have to do
> some FFT to determine what the best tile size is (for example, simple
> beat detection so that each tile began with the beat/measure), and if
> you're doing FFT on audio you might as well code up the rest of your
> mp3 encoder :-) or better yet just use lame ;-)
>

hmm, actually, using mpglib and lame is what I have been doing already.
just, I am not entirely sure about mp3, eg, because what if the patent issue
actually becomes a problem?...


yeah. at one point I had tried a variation of the algo that would also look
for the best length as well, but this was very slow. the current algo is a
little faster, but uses a fixed size (settable on the command line). I used
trial and error to determine a decent "general purpose" tile size (presently
about 64 samples).

the main problem is, however, that the search is unbearably slow if much
area is covered.

another thought is that it may be possible to develop a heuristic method to
deal with this issue (generating a set of common tiles and using an mru
scheme or something for dealing with them).

the problem then would be that both the encoder and decoder would need to
maintain this cache, which could slow decoding.
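
A sketch of what such an MRU tile cache might look like. This is purely illustrative: the class name, the capacity, and the absolute-difference match cost are all assumptions. The encoder and decoder would each hold one instance and apply the same `touch`/`add` calls in the same order so their caches stay in sync.

```python
class TileCache:
    """Small most-recently-used list of tiles shared by encoder and decoder."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.tiles = []                      # most recently used first

    def find_best(self, tile):
        """Index of the closest cached tile, or None if none helps."""
        best_idx = None
        best_cost = sum(abs(v) for v in tile)
        for i, cached in enumerate(self.tiles):
            cost = sum(abs(a - b) for a, b in zip(tile, cached))
            if cost < best_cost:
                best_idx, best_cost = i, cost
        return best_idx

    def touch(self, idx):
        """Move a tile that was just used to the front."""
        self.tiles.insert(0, self.tiles.pop(idx))

    def add(self, tile):
        """Insert a new tile at the front, dropping the least recently used."""
        self.tiles.insert(0, list(tile))
        del self.tiles[self.capacity:]
```

The encoder searches only `capacity` tiles instead of a long stretch of past samples; the decoder never searches, it just looks up the transmitted index and repeats the cache update.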

using this algo plain, however, could lead to a faster encoder (with
possibly better prediction).

> It's always the same tradeoff -- decompression speed vs. complexity.
>

yes.

my current algo decodes fairly quickly at least...


onehappymadman@yahoo.com

2005-11-09, 6:55 pm


cr88192 wrote:
> "Jim Leonard" <MobyGamer@gmail.com> wrote in message
> news:1131566264.396974.70580@g43g2000cwa.googlegroups.com...
> ok.
>
> nothing like this was done in my filter.
>
> I know, they are 1D filters.
>
> I work by searching backwards for some maximum amount of time (this is
> tunable, because longer searches make encoding take a lot longer).
> as of yet, I have not figured out any good way to optimize these searches.
>
> hmm, actually, using mpglib and lame is what I have been doing already.
> just, I am not entirely sure about mp3, eg, because what if the patent issue
> actually becomes a problem?...


have you looked into ogg vorbis?

"Ogg Vorbis is a completely open, patent-free, professional audio
encoding and streaming technology with all the benefits of Open
Source."

http://www.vorbis.com/

cr88192

2005-11-10, 9:55 pm


<onehappymadman@yahoo.com> wrote in message
news:1131580266.319888.312240@z14g2000cwz.googlegroups.com...
>
> cr88192 wrote:
<snip>
>
> have you looked into ogg vorbis?
>
> "Ogg Vorbis is a completely open, patent-free, professional audio
> encoding and streaming technology with all the benefits of Open
> Source."
>
> http://www.vorbis.com/
>

yes.

actually, ogg vorbis is not ruled out anyways, more like, ogg is my current
choice for if I am forced away from mp3 for some reason.

now, other thoughts were along the line of "but what if I don't want to use
ogg for some reason?...".

but, at this point, I am using mp3...

my fiddling had shown that, if I don't want to, I may not have to use
either. I can do some interesting things without resorting to dct (the core
of both mp3 and vorbis).

this was interesting, at least:
I can compress images without dct, getting "decent enough" results with just
filtering and entropy coding (but then using png for most things, as png
does what I want);
with a little more work, I can compress audio without dct as well.

the issue is then, just as cases pop up where I can't/don't want to use png,
but still want compression (typically with specialized
computationally-generated data), similar cases may pop up with audio ("sound
maps?...", ok, I would need something that couldn't be done well with normal
mixing and such...). issue is, not much needs to be done with sound, and
sound is fairly cheap to work with computationally (simple effects like
basic echoes and Doppler shifting), so yeah...

pointless, but oh well...

