
Audio Compression...
Eric D. Brown

2004-10-01, 3:55 am

Alright, I'm a bit on this...

In a 44,100 Hz 16-bit mono channel, there are approximately
1,740,509,884,615,490,592,578 different audio possibilities per 10,000th
of a second (88,200 / 10,000 = 8.82 | 256^8.82). I know the human ear
couldn't differentiate even a fraction of this many possibilities...
even in an ENTIRE SECOND. So, with this stated... am I wrong to assume
that there's still a lot to be gained in lossy audio compression?

Or am I referring to something that would be more like trying to
index all those audio possibilities? Even if we only had a specific
library of 300,000 possible sounds per second, per channel... we'd have
to store 26,460,000,000 bytes for the dictionary (88,200 * 300,000).
This wouldn't be that beneficial; however, would it be impossible to
find a mathematical process (or some kind of mathematical
hash on a sample of audio) to differentiate similar sounds over a period
of time?

I hope I'm not confusing you in my post... and I hope that it's not too
far-fetched!
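A quick check of the arithmetic in the post, using its own assumptions (44,100 samples/s, 16-bit mono, a hypothetical library of 300,000 one-second clips). The function names are illustrative only. It also shows the other side of the trade-off: naming one of 300,000 entries takes only about 18.2 bits of information per second, against 705,600 bits per second of raw PCM, but that saving only materializes if every second of real audio matches some entry closely enough.

```python
import math

SAMPLE_RATE = 44_100     # samples per second, from the post
BYTES_PER_SAMPLE = 2     # 16-bit mono
ENTRIES = 300_000        # hypothetical dictionary size, from the post


def bytes_per_second(rate: int = SAMPLE_RATE, width: int = BYTES_PER_SAMPLE) -> int:
    """Raw PCM data rate in bytes per second."""
    return rate * width


def dictionary_bytes(entries: int = ENTRIES) -> int:
    """Storage for `entries` one-second clips of raw PCM."""
    return bytes_per_second() * entries


def index_bits(entries: int = ENTRIES) -> float:
    """Information needed to pick one entry (a fixed-length code rounds this up)."""
    return math.log2(entries)


if __name__ == "__main__":
    print(bytes_per_second())        # 88200
    print(dictionary_bytes())        # 26460000000
    print(round(index_bits(), 1))    # 18.2
    print(bytes_per_second() * 8)    # 705600 raw bits per second
```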
Earl Colby Pottinger

2004-10-01, 3:55 pm

"Eric D. Brown" <nospam@nowhere.com> :

> Alright, I'm a bit on this...
>
> In a 44,100 Hz 16-bit mono channel, there are approximately
> 1,740,509,884,615,490,592,578 different audio possibilities per 10,000th
> of a second (88,200 / 10,000 = 8.82 | 256^8.82). I know the human ear
> couldn't differentiate even a fraction of this many possibilities...
> even in an ENTIRE SECOND. So, with this stated... am I wrong to assume
> that there's still a lot to be gained in lossy audio compression?
>
> Or am I referring to something that would be more like trying to
> index all those audio possibilities? Even if we only had a specific
> library of 300,000 possible sounds per second, per channel... we'd have
> to store 26,460,000,000 bytes for the dictionary (88,200 * 300,000).
> This wouldn't be that beneficial; however, would it be impossible to
> find a mathematical process (or some kind of mathematical
> hash on a sample of audio) to differentiate similar sounds over a period
> of time?
>
> I hope I'm not confusing you in my post... and I hope that it's not too
> far-fetched!


No, you are just so far off that it is not even funny. Most sounds in real life
don't have waveforms jumping up and down at random. The range of
possible sounds of interest to a human being is a fraction of the possible
waveforms. Compression software is designed for those types of sounds. In
the case where there are truly random/chaotic values in a waveform, we just
use lossless recording techniques. Always remember also that the sampling
equipment's limits will do some filtering of the recorded sound.
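A minimal sketch of Earl's point that real sounds are not random: predict each sample as equal to the previous one (the simplest possible predictor, used here as an illustrative stand-in, not what any particular codec does) and compare how much is left over for a 440 Hz tone and for uniform random noise. The function names, tone and noise parameters are hypothetical choices.

```python
import math
import random


def mean_abs(xs):
    """Average magnitude of a list of samples."""
    return sum(abs(x) for x in xs) / len(xs)


def residuals(samples):
    """What is left after predicting each sample as equal to the previous one."""
    return [b - a for a, b in zip(samples, samples[1:])]


def tone(n=4410, freq=440.0, rate=44_100, amp=10_000):
    """A 440 Hz sine wave, rounded to integer sample values."""
    return [round(amp * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]


def noise(n=4410, amp=10_000, seed=1):
    """Uniform random samples in [-amp, amp] (seeded so runs repeat)."""
    rng = random.Random(seed)
    return [rng.randint(-amp, amp) for _ in range(n)]


if __name__ == "__main__":
    for name, signal in (("tone", tone()), ("noise", noise())):
        ratio = mean_abs(residuals(signal)) / mean_abs(signal)
        print(f"{name}: residual / signal = {ratio:.2f}")
```

For the tone, the residual is only a few percent of the signal, so it could be coded in fewer bits; for the noise, the residual is larger than the signal itself, so prediction gains nothing.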

However, in real life even the limited range of sounds that humans are
interested in will cover far more than your very limited 300,000 possible
sounds. Remember you need to cover every possible sound a human may hear,
from traffic noise, to speech, to singing, to musical instruments and lots
more, and every combination of them all. Never watched a movie with
background music, two people talking and the traffic noise fading out?
Earl Colby Pottinger

--
I make public email sent to me! Hydrogen Peroxide Rockets, OpenBeos,
SerialTransfer 3.0, RAMDISK, BoatBuilding, DIY TabletPC. What happened to
the time? http://webhome.idirect.com/~earlcp
Phil Frisbie, Jr.

2004-10-01, 3:55 pm

Eric D. Brown wrote:

> Alright, I'm a bit on this...
>
> In a 44,100 Hz 16-bit mono channel, there are approximately
> 1,740,509,884,615,490,592,578 different audio possibilities per 10,000th
> of a second (88,200 / 10,000 = 8.82 | 256^8.82). I know the human ear
> couldn't differentiate even a fraction of this many possibilities...
> even in an ENTIRE SECOND. So, with this stated... am I wrong to assume
> that there's still a lot to be gained in lossy audio compression?
>
> Or am I referring to something that would be more like trying to
> index all those audio possibilities? Even if we only had a specific
> library of 300,000 possible sounds per second, per channel... we'd have
> to store 26,460,000,000 bytes for the dictionary (88,200 * 300,000).
> This wouldn't be that beneficial; however, would it be impossible to
> find a mathematical process (or some kind of mathematical
> hash on a sample of audio) to differentiate similar sounds over a period
> of time?


Have you read about MP3? That is just one example of a lossy compression
algorithm that first throws out the parts of the sounds that most people cannot
differentiate.
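A toy illustration of "throwing out what matters least". This is not MP3, which uses a filter bank and a psychoacoustic model to decide what listeners are unlikely to hear; here, purely as an assumption for illustration, "least important" just means "smallest DFT coefficient". The naive DFT, the test frame and all function names are hypothetical helpers.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft(); returns the real part, since the input signal was real."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def keep_largest(coeffs, keep):
    """Zero every coefficient except the `keep` largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:keep])
    return [c if k in kept else 0 for k, c in enumerate(coeffs)]


def test_frame(n=64, seed=7):
    """Two tones (bins 4 and 9) plus a little low-level noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * 4 * t / n)
            + 0.5 * math.sin(2 * math.pi * 9 * t / n)
            + rng.uniform(-0.01, 0.01)
            for t in range(n)]


if __name__ == "__main__":
    frame = test_frame()
    spectrum = dft(frame)
    for keep in (2, 4):
        approx = idft(keep_largest(spectrum, keep))
        err = max(abs(a - b) for a, b in zip(frame, approx))
        print(f"keep {keep} of {len(frame)} coefficients: max error {err:.3f}")
```

Keeping 4 of 64 coefficients reproduces this frame almost exactly because its energy sits in two tones; keeping only 2 drops the quieter tone entirely. Perceptual codecs such as MP3 make this kind of decision using a model of masking in human hearing rather than raw magnitude.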

Or are you thinking you might have an idea for a new algorithm?

> I hope I'm not confusing you in my post... and I hope that it's not too
> far-fetched!


--
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com

Jim Leonard

2004-10-01, 8:55 pm

"Eric D. Brown" <nospam@nowhere.com> wrote in message news:<10lpo5gprjnfg31@corp.supernews.com>...
> In a 44,100 Hz 16-bit mono channel, there are approximately
> 1,740,509,884,615,490,592,578 different audio possibilities per 10,000th
> of a second (88,200 / 10,000 = 8.82 | 256^8.82).


I can't even begin to explain how screwed up your math is. Where did
the "10,000th of a second" value come from and why is it factored into
your calculations?

In any digitized sound sampled at 44100 samples per second, there are
up to 22050 frequencies represented in that second. This is standard
Nyquist. With that in mind, try asking your question again.
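A small sketch of the Nyquist limit mentioned above: at 44,100 samples per second, only frequencies up to 22,050 Hz are represented unambiguously, and a 30,000 Hz cosine produces exactly the same samples as a 14,100 Hz one. The helper names and test frequencies are illustrative choices.

```python
import math

RATE = 44_100  # samples per second


def sample_cosine(freq, n=32, rate=RATE):
    """First n samples of cos(2*pi*freq*t) taken at `rate` samples per second."""
    return [math.cos(2 * math.pi * freq * i / rate) for i in range(n)]


def alias_of(freq, rate=RATE):
    """Frequency in [0, rate/2] that a real tone appears as after sampling."""
    f = freq % rate
    return rate - f if f > rate / 2 else f


if __name__ == "__main__":
    print(RATE / 2)             # 22050.0, the Nyquist frequency
    print(alias_of(30_000))     # 14100
    same = all(abs(a - b) < 1e-9
               for a, b in zip(sample_cosine(30_000), sample_cosine(14_100)))
    print(same)                 # True
```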
Eric D. Brown

2004-10-01, 8:55 pm

Jim Leonard wrote:
> "Eric D. Brown" <nospam@nowhere.com> wrote in message news:<10lpo5gprjnfg31@corp.supernews.com>...
>
>
>
> I can't even begin to explain how screwed up your math is. Where did
> the "10,000th of a second" value come from and why is it factored into
> your calculations?


I beg your pardon...

Is not 44,100 samples x 16 bits (2 bytes) = 88,200 bytes?

Is not 88,200 bytes (per second) / 10,000 (10,000th of a second) = 8.82
bytes per 10,000th of a second?

Is not 256^8.82 = approx. 1,740,509,884,615,490,592,578 different
possibilities?

My math is screwed up, eh?

>
> In any digitized sound sampled at 44100 samples per second, there are
> up to 22050 frequencies represented in that second. This is standard
> Nyquist. With that in mind, try asking your question again.


Yes, I understand that the sampling rate must be more than double the
highest frequency you want to reconstruct (Nyquist). If there are 44,100
bytes per second recorded on a mono, 8-bit channel... then changing just
the LSB of 1 byte of the sample will change the sound (though probably
not audibly to the human ear). That being said, refer back to my
original question.
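To put the LSB remark in numbers: a one-step change in a sample is tiny compared with full scale. The sketch below assumes signed PCM and measures relative to the full-scale peak (one common convention among several); it gives roughly -42 dB for 8-bit and -90 dB for 16-bit samples. Whether such a change is audible depends on the material and playback level, which this arithmetic does not capture. The function name is illustrative.

```python
import math


def lsb_db(bits: int) -> float:
    """Size of a one-step change relative to the full-scale peak, in dB.

    Assumes signed PCM, so the peak magnitude is 2**(bits - 1) steps.
    """
    full_scale = 2 ** (bits - 1)
    return 20 * math.log10(1 / full_scale)


if __name__ == "__main__":
    for bits in (8, 16):
        print(bits, round(lsb_db(bits), 1))   # 8 -42.1, then 16 -90.3
```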

It's great how somebody can ask a simple question in this place... and
quickly get thrown at with stones by those of you who think your shit
don't stink. News flash: Grab your next turd and stick it in your
nose... see how fast reality knocks your ass over!
Eric D. Brown

2004-10-01, 8:55 pm

Earl Colby Pottinger wrote:
> "Eric D. Brown" <nospam@nowhere.com> :
>
>
>
>
> No, you are just so far off that it is not even funny. Most sounds in real life
> don't have waveforms jumping up and down at random. The range of
> possible sounds of interest to a human being is a fraction of the possible
> waveforms. Compression software is designed for those types of sounds. In
> the case where there are truly random/chaotic values in a waveform, we just
> use lossless recording techniques. Always remember also that the sampling
> equipment's limits will do some filtering of the recorded sound.
>
> However, in real life even the limited range of sounds that humans are
> interested in will cover far more than your very limited 300,000 possible
> sounds. Remember you need to cover every possible sound a human may hear,
> from traffic noise, to speech, to singing, to musical instruments and lots
> more, and every combination of them all. Never watched a movie with
> background music, two people talking and the traffic noise fading out?
> Earl Colby Pottinger
>


Thanks Earl... BTW, the 300,000 index was just an example. I know the
human ear can differentiate over 300,000 different audio samples within
a one-second duration.
Phil Carmody

2004-10-02, 8:55 am

trixter@despammed.com (Jim Leonard) writes:

> "Eric D. Brown" <nospam@nowhere.com> wrote in message news:<10lpo5gprjnfg31@corp.supernews.com>...
>
> I can't even begin to explain how screwed up your math is. Where did
> the "10,000th of a second" value come from and why is it factored into
> your calculations?


He came up with it. He's permitted to do that; it's his post.
What period of time should he have used as an example? If 1/44100th of
a second, then his figure would be 65536. If he used 1 second, then the
figure would be 5.82*10^212406, and if he used 1/10000th of a second,
which he did, then the figure is 1740509884615490592578. He even
explains, using somewhat unusual punctuation and logic, where he got
the figure from - it's the parenthetical remark. (2^16)^(44100/10000)
would have been a more sensible remark, but both evaluate to the same
thing.
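The three figures above can be reproduced by working in logarithms, since the count for a full second, 2^(16 x 44,100), is far too large for a floating-point number. The count is (2^16)^samples, which is the same as 256^bytes; the function names are illustrative.

```python
import math


def log10_waveforms(duration_s: float, rate: int = 44_100, bits: int = 16) -> float:
    """log10 of the number of distinct mono PCM sample sequences lasting duration_s.

    That count is 2**(bits * samples); working in logs avoids overflow.
    """
    samples = rate * duration_s
    return bits * samples * math.log10(2)


def scientific(log10_value: float) -> str:
    """Format 10**log10_value as 'm.mm*10^e'."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return f"{mantissa:.2f}*10^{exponent}"


if __name__ == "__main__":
    print(round(10 ** log10_waveforms(1 / 44_100)))   # 65536
    print(scientific(log10_waveforms(1 / 10_000)))     # 1.74*10^21
    print(scientific(log10_waveforms(1.0)))            # 5.82*10^212406
```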

> In any digitized sound sampled at 44100 samples per second, there are
> up to 22050 frequencies represented in that second. This is standard
> Nyquist. With that in mind, try asking your question again.


You have a plank in your eye, sir. Try reading his post again.

Phil
--
They no longer do my traditional winks tournament lunch - liver and bacon.
It's just what you need during a winks tournament lunchtime to replace lost
.... liver. -- Anthony Horton, 2004/08/27 at the Cambridge 'Long Vac.'
Bumped

2004-10-03, 8:38 pm

Earl Colby Pottinger <earlcp@idirect.com> wrote in message news:<C9udnYDHQ8P-GMLcRVn-rg@look.ca>...
> "Eric D. Brown" <nospam@nowhere.com> :
>
>
> No, you are just so far off that it is not even funny. Most sounds in real life
> don't have waveforms jumping up and down at random. The range of
> possible sounds of interest to a human being is a fraction of the possible
> waveforms. Compression software is designed for those types of sounds. In


I think that what he means is that the human ear can only hear that
many distinct sounds at any particular instant. That in the normal
range of hearing, based on frequency (and 'other' stuff), the human
ear can only differentiate between "X" number of sounds.

Kind of like human sight. It can only see around 16 million different
shades of color. By sheer chance, that works out pretty well with
computers using 8 bits for each of R, G and B, for a total of 24 bits, or a
total of 16,777,216 different colors. Doing colors as 64 bits would
be a waste because no human could tell the difference.

A picture is made up of many of those separate, distinctly visible
pixels. Picture compression takes advantage of redundancy and
patterns. Motion picture compression takes advantage of
multiple frames and the similarities between them, including
similarities that are close but not identical.

I think he's saying that the human ear is a little like that, with
what we call "sound" being made up of many instances of separate
things.

And he's wondering if there was some way to take advantage of those
discrete sound events, and use some method to predict what the next
one would be, and so on, and use things like that as a form of audio
compression, rather than basing it on more traditional methods like
MP3, etc.

Kind of like compressing a regular text dictionary by using spelling
rules.

Or compressing a full book by using spelling and punctuation rules to
remove redundancy, and then throwing in a PKZIP-style, dictionary-based
compression method where previous snippets are stored for later use,
with a possible "lossy" nature to allow very similar snippets to be
used instead of identical snippets.
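The "dictionary with a lossy twist" idea might be sketched like this. It is a hypothetical toy, not an existing codec and not Bumped's own design: the signal is cut into fixed-size blocks, and a block is replaced by a reference to an earlier decoded block whenever every sample is within a chosen tolerance. The block size, tolerance and function names are arbitrary assumptions.

```python
def encode(samples, block=4, tolerance=2):
    """Toy lossy block matcher (hypothetical, not an existing codec).

    Each block becomes ("ref", j) if earlier decoded block j is within
    `tolerance` at every sample, otherwise ("raw", samples).
    """
    decoded = []   # blocks exactly as the decoder will reconstruct them
    tokens = []
    for start in range(0, len(samples), block):
        current = samples[start:start + block]
        match = next((j for j, prev in enumerate(decoded)
                      if len(prev) == len(current)
                      and all(abs(p - q) <= tolerance for p, q in zip(prev, current))),
                     None)
        if match is None:
            tokens.append(("raw", list(current)))
            decoded.append(list(current))
        else:
            tokens.append(("ref", match))
            decoded.append(decoded[match])
    return tokens


def decode(tokens):
    """Rebuild the sample list from ("raw", ...) and ("ref", ...) tokens."""
    blocks = []
    for kind, payload in tokens:
        blocks.append(list(payload) if kind == "raw" else blocks[payload])
    return [s for b in blocks for s in b]


if __name__ == "__main__":
    signal = [0, 10, 20, 10, 1, 11, 19, 10, 50, 60, 70, 80, 0, 9, 21, 11]
    tokens = encode(signal)
    print([kind for kind, _ in tokens])   # ['raw', 'ref', 'raw', 'ref']
    restored = decode(tokens)
    print(max(abs(a - b) for a, b in zip(signal, restored)))   # 1
```

Because the encoder compares against what the decoder will actually reconstruct, the error per sample never exceeds the tolerance. A practical scheme would also need a bounded search, a compact coding of the tokens, and some way to cope with repeats that do not fall on block boundaries.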


I'm not so sure of his numbers, or even that the idea is a good idea,
though.
Earl Colby Pottinger

2004-10-04, 3:55 am

"Eric D. Brown" <nospam@nowhere.com> :

> Thanks Earl... BTW, the 300,000 index was just an example. I know the
> human ear can differentiate over 300,000 different audio samples within
> a one-second duration.


OK, and Bumped has also pointed out to me that, if approached right with enough
different samples, there may be something to your idea, so don't give up.
Don't make the mistake of thinking you are the new genius of compression, but also
don't give up; if you have the time, you may figure out something new too.

Earl Colby Pottinger


Earl Colby Pottinger

2004-10-04, 3:55 am

jtmh.0210@bumpymail.com (Bumped) :

> I think that what he means is that the human ear can only hear that
> many distinct sounds at any particular instant. That in the normal
> range of hearing, based on frequency (and 'other' stuff), the human
> ear can only differentiate between "X" number of sounds.


I don't think the human ear is so limited. It is possible to listen to a
conversation even with background noises.

> Kind of like human sight. It can only see around 16 million different
> shades of color. By sheer chance, that works out pretty well with
> computers using 8 bits for each of R, G and B, for a total of 24 bits, or a
> total of 16,777,216 different colors. Doing colors as 64 bits would
> be a waste because no human could tell the difference.


Some people can see more than 256 levels (usually graphic artists and
photography people); in fact, I have seen some of the single-step differences on
VERY high-end Mac monitors myself.

On Macs there were video cards with more than eight bits per colour gun (10
bits?), and there are CMYB video cards that are using eight bits per colour
for a total of 32 bits of colour information. But I don't know how many bits
are driving each colour gun.

[SNIP suggestions]

I think you are right. I jumped the gun, maybe there is something to his
idea, it should be tested more first.

I was wrong, sorry.

Earl Colby Pottinger

Earl Colby Pottinger

2004-10-04, 3:55 am

"Eric D. Brown" <nospam@nowhere.com> :

> Yes, I understand that the sampling rate must be more than double the
> highest frequency you want to reconstruct (Nyquist). If there are 44,100
> bytes per second recorded on a mono, 8-bit channel... then changing just
> the LSB of 1 byte of the sample will change the sound (though probably
> not audibly to the human ear). That being said, refer back to my
> original question.
>
> It's great how somebody can ask a simple question in this place... and
> quickly get thrown at with stones by those of you who think your shit
> don't stink. News flash: Grab your next turd and stick it in your
> nose... see how fast reality knocks your ass over!


Why so hostile to him and not to my post? The point he raised is that you did
not give reasons for the choices you made; they seem to be values
picked out of thin air. Please remember we get a lot of posts here from
people who think they are smarter than the people in the Usenet group who have
been studying and posting about compression for the last ten years or more.
One thing that usually shows up with them is a lack of detail about their
compression ideas. The more detail and carefully done math, the better the
responses.

Earl Colby Pottinger


Jim Leonard

2004-10-04, 8:55 am

"Eric D. Brown" <nospam@nowhere.com> wrote in message news:<10lrfm3cj9kv171@corp.supernews.com>...
> Is not 88,200 bytes (per second) / 10,000 (10,000th of a second) = 8.82
> bytes per 10,000th of a second?


I suppose, reading your initial question over again since my initial
reply, that what you're really asking is "is it possible to represent
each finite sound in a dictionary for space savings", in which case
our friend the counting theorem resoundingly replies no, as the
overhead in managing said dictionary would eat the space savings. But
the reason I responded so harshly is because I felt you asked your
question immediately after you thought about it, instead of doing a
little legwork/research first.
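The counting argument referred to here fits in a few lines: there are 2^n bit strings of length n but only 2^n - 1 strictly shorter ones, so no lossless scheme, dictionary-based or otherwise, can shorten every possible input. Lossy schemes sidestep this bound only by letting many inputs map to the same output. The function names are illustrative.

```python
def strings_of_length(n: int) -> int:
    """Number of distinct bit strings exactly n bits long."""
    return 2 ** n


def strings_shorter_than(n: int) -> int:
    """Number of bit strings of length 0 .. n-1, i.e. 2**0 + ... + 2**(n-1)."""
    return sum(2 ** k for k in range(n))


if __name__ == "__main__":
    for n in (8, 16, 64):
        print(n, strings_of_length(n), strings_shorter_than(n))
```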

A sound, simple or complex, sampled at a rate of 44,100 Hz, can have
no more than 22050 distinct frequencies per second, regardless of
their amplitude and/or dynamic range. By converting words to bytes,
choosing a goofy divisor, etc. you decoupled the relationship between
the data and what it represented, which makes it difficult if not
impossible to talk about how to encode the representation.

Look, sorry if I responded harshly, but come on, you could have put a
little bit more thought into what you were asking.

> Grab your next turd and stick it in your
> nose... see how fast reality knocks your ass over!


How eloquent. Thanks, I'll have to try that sometime.
Bumped

2004-10-04, 3:55 pm

Earl Colby Pottinger <earlcp@idirect.com> wrote in message news:<8bWdnVIwitFVQf3cRVn-jg@look.ca>...
> jtmh.0210@bumpymail.com (Bumped) :
>
>
> I don't think the human ear is so limited. It is possible to listen to a
> conversation even with background noises.


As I said at the end, I wasn't so sure about his numbers!

Background noise, conversation, etc. are sequences of points. It's the
progression of time that adds the complexity.

It's the brain's processing and recognition ability that lets it sort
out a conversation in a noisy room.

I remember old computers (back in the '80s) that could only do 6-bit
D/A conversion. A mere 64 different values. But it was enough to
create rather complex sounds: voices, music, etc.
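The 6-bit example can be put in numbers with the usual rule of thumb for a full-scale sine, SQNR = 6.02*N + 1.76 dB: about 38 dB for N = 6, versus about 98 dB for N = 16. The sketch checks that rule with an idealized uniform quantizer; the quantizer, test frequency and names are assumptions, and real converters of that era may well have performed worse.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Idealized mid-rise uniform quantizer with 2**bits levels over [-1, 1]."""
    levels = 2 ** bits
    step = 2 / levels
    index = math.floor(x / step)
    index = max(-levels // 2, min(levels // 2 - 1, index))   # stay in range
    return (index + 0.5) * step


def sqnr_db(bits: int, n: int = 10_000) -> float:
    """Measured signal-to-quantization-noise ratio for a full-scale sine."""
    signal = [math.sin(2 * math.pi * 0.1234567 * i) for i in range(n)]
    error = [s - quantize(s, bits) for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_error = sum(e * e for e in error) / n
    return 10 * math.log10(p_signal / p_error)


if __name__ == "__main__":
    for bits in (6, 8, 16):
        print(bits, round(sqnr_db(bits), 1), round(6.02 * bits + 1.76, 1))
```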

Still, I too wonder just what the resolution of the human ear is.

EXCLUDING volume, just how sensitive is the ear? Can it hear the
difference between 16,000 Hz and 16,000.1 Hz? What about phase? Is the
frequency response linear or some other shape? And whatever else.

When all those factors are accounted for, the human ear might have
less detection ability than we normally expect. We might be getting
fooled by volume and the progression of time making the sequence of
points seem more complex.

That's just a guess / question, since I really don't know.

>
> Some people can see more than 256 levels (usually graphic artists and
> photography people); in fact, I have seen some of the single-step differences on
> VERY high-end Mac monitors myself.


I was basing that on some tests that were done some years ago. I
really can't remember when it was. (I'm wanting to say it was the
people who do the 'official' color reference cards, but I'm not sure.)
They printed out a lot of cards of one color and a letter or shape of
a very slightly different color, and based on their tests, the
'average' eye could tell around 16 million shades of color.

Some colors are easier to see than others, of course. Nearly
everybody has worse vision abilities with blue than red. You can see
more shades of red than of blue. (Some old video cards have 16-bit
modes that give 6 bits to red and green and just 4 to blue, for
that very reason.)

It's also possible that emissive color sources (monitors) are
different than the reflective color sources they tested. Right off
the top of my head, I don't know any research for emissive colors.
(It's not something I've ever had the urge to check into.)

Still, I doubt it'd be more than an extra bit here and there of actual
color change (while keeping the brightness the same).

Reflective is probably around the 16 million they determined years
ago. Maybe an extra bit or so. And it's probably not distributed
evenly.

For emissive, it's possible it might be a little different, but I
wouldn't think it'd be more than double the number of colors. And the
distribution might be slightly different.

> I think you are right. I jumped the gun, maybe there is something to his
> idea, it should be tested more first.
>
> I was wrong, sorry.


Okay. Fair enough.
Earl Colby Pottinger

2004-10-05, 3:55 am

32-bit cards mean we can give each colour 10 bits, and I am sure that exceeds
anyone's ability to tell the difference between single-step values, even when
compared side by side.
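"10 bits per colour in a 32-bit word" works because 3 x 10 = 30 bits, leaving 2 spare: 2^30 (about 1.07 billion) colours instead of the 2^24 (16,777,216) of 8 bits per channel. A minimal packing sketch; the bit order is an assumption, since real 10:10:10:2 formats differ, and the function names are illustrative.

```python
def pack_rgb10(r: int, g: int, b: int, a: int = 0) -> int:
    """Pack 10-bit R, G, B and a 2-bit extra field into one 32-bit word.

    Assumed layout (real 10:10:10:2 formats differ in order): a in bits 31-30,
    r in 29-20, g in 19-10, b in 9-0.
    """
    for value, width in ((r, 10), (g, 10), (b, 10), (a, 2)):
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in {width} bits")
    return (a << 30) | (r << 20) | (g << 10) | b


def unpack_rgb10(word: int):
    """Inverse of pack_rgb10: returns (r, g, b, a)."""
    return (word >> 20) & 0x3FF, (word >> 10) & 0x3FF, word & 0x3FF, (word >> 30) & 0x3


if __name__ == "__main__":
    word = pack_rgb10(1023, 512, 1, 3)
    print(hex(word), unpack_rgb10(word))
    print(2 ** 30, 2 ** 24)   # 1073741824 colours vs 16777216
```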

I also know some colour models assume 16-32 bits per colour gun for graphic
editing purposes. For example, dividing all colour values by 5 and then multiplying
them all by 4 will give very different results from multiplying all colour values
by 0.8 if you use only eight bits per gun.
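The rounding point can be checked exhaustively for 8-bit values. The sketch compares "divide by 5, then multiply by 4" in 8-bit integer math, a correctly rounded "multiply by 0.8", and the same divide-then-multiply done on 16-bit intermediates (a stand-in for a higher-precision colour model). All names are illustrative.

```python
def divide_then_multiply(v: int) -> int:
    """v / 5 * 4 in 8-bit integer math: the division truncates first."""
    return (v // 5) * 4


def exact_scale(v: int) -> int:
    """v * 0.8 rounded to the nearest integer, computed with integers only."""
    return (8 * v + 5) // 10


def high_precision_scale(v: int) -> int:
    """Same divide-then-multiply, but on 16-bit intermediates (v << 8),
    rounding back to 8 bits only at the end."""
    return (((v << 8) // 5) * 4 + 128) >> 8


if __name__ == "__main__":
    values = range(256)
    diffs = [exact_scale(v) - divide_then_multiply(v) for v in values]
    print(sum(d != 0 for d in diffs), max(diffs))                            # 204 3
    print(all(high_precision_scale(v) == exact_scale(v) for v in values))   # True
```

With 8-bit arithmetic, 204 of the 256 input values come out wrong, by up to 3 levels; carrying 8 extra bits through the intermediate steps makes every value match.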

Earl Colby Pottinger


Bumped

2004-10-05, 3:55 pm

Earl Colby Pottinger <earlcp@idirect.com> wrote in message news:<gYCdnYoF6f1r1f_cRVn-oQ@look.ca>...
> 32-bit cards mean we can give each colour 10 bits, and I am sure that exceeds
> anyone's ability to tell the difference between single-step values, even when
> compared side by side.


I don't think most current video cards give each color 10 bits when in
32 bit mode. (Some monitors / video cards may be different, of
course.)

I think they give each color 8 bits for a total of 24 and then an
extra 8 bits for the 'alpha' channel. Or the other 8 bits may even
get ignored. 32 bits is more about data alignment than any major
difference from 24 bit mode. And, of course, 24 bits of color is 16
million different colors.

But yeah, it's reasonably safe to say that 10 bits per color is
probably enough for visual purposes.


> I also know some colour models assume 16-32 bits per colour gun for graphic


There can indeed be good reasons to have more colors during video
editing or creation. Games or video creation etc. But they don't
make it to the video output.


But this has gotten a bit off topic.
Jim Leonard

2004-10-05, 8:55 pm

Earl Colby Pottinger <earlcp@idirect.com> wrote in message news:<gYCdnYoF6f1r1f_cRVn-oQ@look.ca>...
> 32-bit cards mean we can give each colour 10 bits, and I am sure that exceeds
> anyone's ability to tell the difference between single-step values, even when
> compared side by side.


Actually, "32-bit" color in the traditional sense is 24-bit color (8
bits per channel) plus an additional alpha channel for transparency.
An additional definition for 32-bit color, in the x86 Windows world
anyway, is 24-bit color that is doubleword-aligned (aligned on 32-bit
boundaries) for speed. "32-bit" display modes waste RAM but the speed
boost is generally worth it.

Typically, the usual upgrade to graphics from 24-bit color (8 bits per
channel) is to 36-bit color (12 bits per channel). Past that, there
is rarely a need to go higher for output/display purposes.








Copyright 2008 codecomments.com