Author massive modeling effort yields excellent audio compression
Mark Nelson

2008-04-02, 7:03 pm

This is pretty :

http://www.rochester.edu/news/show.php?id=3136

Researchers at the University of Rochester have digitally reproduced
music in a file nearly 1,000 times smaller than a regular MP3 file.

The music, a 20-second clarinet solo, is encoded in less than a single
kilobyte, and is made possible by two innovations: recreating in a
computer both the real-world physics of a clarinet and the physics of
a clarinet player.
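
To put those figures in perspective, here is a rough back-of-the-envelope
calculation in Python. Only the 20-second duration and the one-kilobyte
upper bound come from the article; the 128 kbit/s MP3 bitrate and the CD
parameters are assumptions made here for illustration.

```python
# Back-of-the-envelope data rates for a 20-second recording.
# From the article: 20 seconds, "less than a single kilobyte".
# Assumed here (not in the article): 128 kbit/s MP3, standard CD audio.

DURATION_S = 20
MODEL_BYTES = 1024                                  # upper bound from the article
MP3_KBPS = 128                                      # assumed MP3 bitrate
CD_RATE_HZ, CD_BITS, CD_CHANNELS = 44_100, 16, 2    # standard CD audio


def size_bytes(bits_per_second, seconds):
    """Size in bytes of a constant-bitrate stream."""
    return bits_per_second * seconds / 8


model_bps = MODEL_BYTES * 8 / DURATION_S
mp3_bytes = size_bytes(MP3_KBPS * 1000, DURATION_S)
cd_bytes = size_bytes(CD_RATE_HZ * CD_BITS * CD_CHANNELS, DURATION_S)

print(f"model: at most {model_bps:.1f} bit/s")
print(f"MP3:   {mp3_bytes:,.0f} bytes, {mp3_bytes / MODEL_BYTES:.1f}x the model file")
print(f"CD:    {cd_bytes:,.0f} bytes, {cd_bytes / MODEL_BYTES:.1f}x the model file")
```

Under these assumptions a full kilobyte works out to roughly 410 bit/s, and
the MP3 ratio lands in the low hundreds rather than near 1,000, so the
quoted figure presumably rests on a smaller file or a higher MP3 bitrate;
the article states neither.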

The achievement, announced today at the International Conference on
Acoustics, Speech, and Signal Processing held in Las Vegas, is not yet a
flawless reproduction of an original performance, but the researchers
say it's getting close.

"This is essentially a human-scale system of reproducing music," says
Mark Bocko, professor of electrical and computer engineering and co-
creator of the technology. "Humans can manipulate their tongue,
breath, and fingers only so fast, so in theory we shouldn't really
have to measure the music many thousands of times a second like we do
on a CD. As a result, I think we may have found the absolute least
amount of data needed to reproduce a piece of music."

In replaying the music, a computer literally reproduces the original
performance based on everything it knows about clarinets and clarinet
playing. Two of Bocko's doctoral students, Xiaoxiao Dong and Mark
Sterling, worked with Bocko to measure every aspect of a clarinet that
affects its sound--from the backpressure in the mouthpiece for every
different fingering, to the way sound radiates from the instrument.
They then built a computer model of the clarinet, and the result is a
virtual instrument built entirely from the real-world acoustical
measurements.

The team then set about creating a virtual player for the virtual
clarinet. They modeled how a clarinet player interacts with the
instrument including the fingerings, the force of breath, and the
pressure of the player's lips to determine how they would affect the
response of the virtual clarinet. Then, says Bocko, it's a matter of
letting the computer "listen" to a real clarinet performance to infer
and record the various actions required to create a specific sound.
The original sound is then reproduced by feeding the record of the
player's actions back into the computer model.
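
The listen, infer, replay loop described above can be illustrated with a
toy sketch. This is not the Rochester team's model: the `instrument`
function, the `NOTES_HZ` fingering table, the frame size, and the
brute-force search are all hypothetical stand-ins, and the "recording" is
produced by the same toy instrument, which makes the inference far easier
than it would be with a real clarinet.

```python
import math

RATE = 8000      # samples per second (toy value)
FRAME = 400      # samples per control frame, i.e. 20 control updates per second
NOTES_HZ = [220.0, 246.94, 261.63, 293.66, 329.63]   # hypothetical fingering table


def instrument(note, breath, start):
    """Toy 'virtual clarinet': a few odd harmonics scaled by breath pressure."""
    f = NOTES_HZ[note]
    out = []
    for n in range(start, start + FRAME):
        t = n / RATE
        s = sum(math.sin(2 * math.pi * k * f * t) / k for k in (1, 3, 5))
        out.append(breath * s)
    return out


def synthesize(controls):
    """Replay a list of (note, breath) control frames through the model."""
    samples = []
    for i, (note, breath) in enumerate(controls):
        samples.extend(instrument(note, breath, i * FRAME))
    return samples


def analyze(samples):
    """Infer (note, breath) per frame by trying every note in the table."""
    controls = []
    for i in range(len(samples) // FRAME):
        frame = samples[i * FRAME:(i + 1) * FRAME]
        best = None
        for note in range(len(NOTES_HZ)):
            basis = instrument(note, 1.0, i * FRAME)
            energy = sum(b * b for b in basis)
            # Least-squares breath level for this note, clipped at zero.
            breath = max(0.0, sum(x * b for x, b in zip(frame, basis)) / energy)
            err = sum((x - breath * b) ** 2 for x, b in zip(frame, basis))
            if best is None or err < best[0]:
                best = (err, note, breath)
        controls.append((best[1], round(best[2], 2)))
    return controls


performance = [(0, 0.5), (2, 0.8), (2, 0.6), (4, 0.9)]   # the player's "actions"
recording = synthesize(performance)                       # what a microphone would hear
recovered = analyze(recording)                            # actions inferred from the sound
print(recovered)
print(len(recording), "samples ->", len(recovered), "control frames")
```

In this toy setup the recovered control frames match the original ones, and
only the short list of (note, breath) pairs would need to be stored. With a
real recording the model never fits exactly, which is where the hard part
lies.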

At present the results are a very close, though not yet a perfect,
representation of the original sound.

"We are still working on including 'tonguing,' or how the player
strikes the reed with the tongue to start notes in staccato passages,"
says Bocko. "But in music with more sustained and connected notes the
method works quite well and it's difficult to tell the synthesized
sound from the original."

As the method is refined the researchers imagine that it may give
computer musicians more intuitive ways to create expressive music by
including the actions of a virtual musician in computer synthesizers.
And although the human vocal tract is highly complex, Bocko says the
method may in principle be extended to vocals as well.

The current method handles only a single instrument at a time; however,
in other work in the University's Music Research Lab with post-
doctoral researcher Gordana Velikic and Dave Headlam, professor of
music theory at the University of Rochester's Eastman School of Music,
the team has produced a method of separating multiple instruments in a
mix so the two methods can be combined to produce a very compact
recording.

Bocko believes that the quality will continue to improve as the
acoustic measurements and the resulting synthesis algorithms become
more accurate, and he says this process may represent the maximum
possible data compression of music.

"Maybe the future of music recording lies in reproducing performers
and not recording them," says Bocko.

This research is funded by the National Science Foundation.

|
| Mark Nelson - http://marknelson.us
|

Christian

2008-04-02, 7:03 pm

> This is pretty :
>
> http://www.rochester.edu/news/show.php?id=3136
>
> Researchers at the University of Rochester have digitally reproduced
> music in a file nearly 1,000 times smaller than a regular MP3 file.

Well, that's surely a good work and goes a couple of steps into the
right direction. But - to be honest - this is nothing really
revolutionary! The MIDI standard has been proposed a *quarter century*
ago, and MIDI also describes sounds (e.g. in form of musical terms)
rather than using samples. Only a couple of years later, samples of
instruments (like this clarinet) have been combined with such a
"musical description", and file formats like MOD emerged. Such formats
became very popular and achieved similar astounding "compression
ratios". A proper method for a good 'wav2mod' converter is long
overdue ... ;-)

Christian

Thomas Pornin

2008-04-02, 7:03 pm

According to Christian <iBBiS@gmx.de>:
> Well, that's surely a good work and goes a couple of steps into the
> right direction. But - to be honest - this is nothing really
> revolutionary! The MIDI standard has been proposed a *quarter century*
> ago, and MIDI also describes sounds (e.g. in form of musical terms)
> rather than using samples.


The "revolutionary" part is probably in the inverse transform, i.e.
transforming an actual recording into a MIDI-like format. I can envision
that this process is hard to do automatically, and it seems that it is
what the scientists achieved here. Which is a great advance.

If they can do it for non-musical human voice, then phone companies
will be very interested. Huge amounts of money are just waiting to
happen on the smart guy who will reduce bandwidth consumption for
phone calls.


--Thomas Pornin

Mark Nelson

2008-04-02, 7:03 pm

On Apr 2, 10:41 am, Christian <iB...@gmx.de> wrote:
>
>
>
> Well, that's surely a good work and goes a couple of steps into the
> right direction. But - to be honest - this is nothing really
> revolutionary! The MIDI standard has been proposed a *quarter century*
> ago, and MIDI also describes sounds (e.g. in form of musical terms)
> rather than using samples. Only a couple of years later, samples of


Yep, there's no doubt that the idea of modeling musical instruments
has been around a long time. And of course, much speech compression
depends on modeling of either the human vocal tract or other features
of speech.

The hard part here, of course, is what you call wav2mod - mapping the
sampled music onto the model.

An interesting question would be whether their analysis of the played
music, when properly modeled, simply reproduces the physical actions
of the person playing the tune. If that was the case, you could
reverse engineer a recording of Benny Goodman playing Rhapsody In Blue
and see exactly how he played it - fingering, breath, etc. Pretty
crazy if it could be done.

|
| Mark Nelson - http://marknelson.us
|

