MPEG-4 Low Delay AAC

An important topic for many real-world codec applications is delay. When announcers use codecs for a broadcast remote, they often need natural two-way interaction with other program participants back at the studio or with callers on telephone lines.

Because it is a hot topic for engineers working in the field of Internet telephony, a number of studies have been conducted to determine users' reactions to delays in telephone conversations. The data apply directly to the application of hi-fi codecs to remotes, so it is interesting to take a peek over the shoulder of the telecom boys to see what they have learned.

Usually in broadcast, we try to arrange our system set-up so that there is no path for the remote announcer's voice to return to his headphones. But sometimes echo is unavoidable. For example, this can occur when a telephone hybrid has leakage or when a studio announcer has open-air headphones turned up loud. (Could happen, no?)

When there is no echo, it has been discovered that anything less than 100 ms one-way delay permits normal interactivity. Between 100 and 250 ms is considered "acceptable." ITU-T standard G.114 recommends 150 ms as the maximum for "good" interactivity.
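To make these thresholds concrete, here is a minimal Python sketch that sorts a one-way delay figure into the bands quoted above. The function names, the label for delays beyond 250 ms, and the example delay figures are assumptions made for illustration; they are not taken from G.114 or from any codec's specification.

```python
# Thresholds quoted in the text for the no-echo case (one-way delay, in ms).
NORMAL_BELOW_MS = 100      # under this: normal interactivity
ACCEPTABLE_UP_TO_MS = 250  # 100 to 250 ms: "acceptable"
G114_GOOD_MAX_MS = 150     # G.114 maximum for "good" interactivity


def classify_one_way_delay(delay_ms: float) -> str:
    """Return the interactivity band for a one-way delay with no echo."""
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if delay_ms < NORMAL_BELOW_MS:
        return "normal"
    if delay_ms <= ACCEPTABLE_UP_TO_MS:
        return "acceptable"
    # The text does not name this band; the label is an assumption.
    return "beyond acceptable"


def within_g114_good(delay_ms: float) -> bool:
    """True if the delay is at or under the 150 ms figure cited from G.114."""
    return delay_ms <= G114_GOOD_MAX_MS


if __name__ == "__main__":
    # Hypothetical one-way budget: codec delay plus network delay, in ms.
    codec_ms, network_ms = 80, 40
    total_ms = codec_ms + network_ms
    print(total_ms, classify_one_way_delay(total_ms), within_g114_good(total_ms))
```

Note that the whole one-way path counts toward the total, so a codec's own delay is only one part of the budget.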

Echo introduces a different case. As you might expect, echo is more or less annoying depending upon both the length of time it is delayed and its level. Telephone researchers have measured and quantified reactions, and ITU-T G.131 reports the findings and makes recommendations.

There are codecs using other than perceptual technologies that have lower delay, but they are not as powerful. That is, for a given bitrate, they do not achieve fidelity as good as the MPEG ones we have been examining. The common G.722 is an example. It uses ADPCM (Adaptive Differential Pulse Code Modulation), which can have delay as low as 10 ms, but with much poorer quality. So the question arises: Is it possible to have high quality and low delay in the same codec? Until recently, the answer was no. But new developments in codecs have changed the picture.

One of the main objectives in audio coding is to provide the best tradeoff between quality and bitrate. In general, this goal can only be achieved at the cost of a certain coding delay. Codecs for voice telephone applications have used ADPCM and CELP because they have much lower delay than perceptual codecs, as is required for interactive conversation. These are optimized for voice and can have reasonably good performance for speech signals, but are poor for music or mixed signals that include voice and ambient sounds.
