Analysis and coding of high quality audio signals
Ning, Daryl (2003) Analysis and coding of high quality audio signals. PhD thesis, Queensland University of Technology.
Digital audio is increasingly a part of our daily lives. Unfortunately, the high bitrate of the raw digital signal makes it an extremely expensive representation. Applications such as digital audio broadcasting, high definition television, and internet audio require high quality audio at low bitrates. The field of audio coding addresses this issue by reducing the bitrate of digital audio while maintaining a high perceptual quality. Developing an efficient audio coder requires a detailed analysis of the audio signals themselves, and it is important to find a representation that can concisely model any general audio signal. In this thesis, we propose two new high quality audio coders based on two different audio representations: the sinusoidal-wavelet representation, and the warped linear predictive coding (WLPC)-wavelet representation. In addition to high quality coding, it is also important for audio coders to be flexible in their application. With the increasing popularity of internet audio, it is advantageous for audio coders to address issues related to real-time audio delivery. Bitstream scalability has therefore been targeted in this thesis, and a third audio coder capable of bitstream scalability is also proposed. The performance of each of the proposed coders was evaluated by comparison with the MPEG layer III coder.
The first coder proposed is based on a hybrid sinusoidal-wavelet representation. This assumes that each frame of audio can be modelled as a sum of sinusoids plus a noisy residual. The discrete wavelet transform (DWT) is used to decompose the residual into subbands that approximate the critical bands of human hearing. A perceptually derived bit allocation algorithm is then used to minimise the audible distortions introduced by quantising the DWT coefficients. Listening tests showed that the coder delivers near-transparent quality for a range of critical audio signals at 64 kbps. It also outperforms the MPEG layer III coder operating at the same bitrate. This coder, however, is only useful for high quality coding, and is difficult to scale to lower rates.
The second coder proposed is based on a hybrid WLPC-wavelet representation. In this approach, the spectrum of the audio signal is estimated by an all-pole filter using warped linear prediction (WLP). WLP operates on a warped frequency domain, where the resolution can be adjusted to approximate that of the human auditory system. This makes the inherent noise shaping of the synthesis filter even more suited to audio coding. The excitation to this filter is transformed using the DWT and perceptually encoded. Listening tests showed that near-transparent coding is achieved at 64 kbps. The coder was also found to be slightly superior to the MPEG layer III coder operating at the same bitrate.
The third proposed coder is similar to the WLPC-wavelet coder, but modified to achieve bitstream scalability. A noise model for high frequency components is included to keep the overall bitrate low, and a two-stage quantisation scheme for the DWT coefficients is implemented. The first stage uses fixed-rate scalar and vector quantisation to provide a coarse approximation of the coefficients. This allows low bitrate, low quality versions of the input signal to be embedded in the overall bitstream. The second stage of quantisation adds detail to the coefficients, and hence enhances the quality of the output signal. Listening tests showed that signal quality improves gracefully as the bitrate increases from 16 kbps to 80 kbps. The performance of this coder is comparable to that of the MPEG layer III coder operating at a similar (but fixed) bitrate.
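The two-stage quantisation idea behind the scalable coder can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes plain uniform scalar quantisers for both stages (the thesis uses fixed-rate scalar and vector quantisation in the first stage), and the function names, step sizes, and coefficient values are all hypothetical.

```python
# Illustrative sketch only: uniform scalar quantisers stand in for the
# thesis's fixed-rate scalar/vector quantisation. All names and values
# here are hypothetical.

def quantise(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def dequantise(indices, step):
    """Reconstruct values from quantiser indices."""
    return [i * step for i in indices]

def encode_two_stage(coeffs, coarse_step, fine_step):
    """Return (base layer, enhancement layer) indices for the coefficients."""
    # Stage 1: coarse approximation, decodable on its own at low bitrate.
    base = quantise(coeffs, coarse_step)
    base_rec = dequantise(base, coarse_step)
    # Stage 2: quantise what stage 1 missed, using a finer step.
    residual = [c - b for c, b in zip(coeffs, base_rec)]
    enhancement = quantise(residual, fine_step)
    return base, enhancement

def decode(base, enhancement, coarse_step, fine_step, use_enhancement=True):
    """Decode the base layer, optionally refined by the enhancement layer."""
    rec = dequantise(base, coarse_step)
    if use_enhancement:
        rec = [r + e * fine_step for r, e in zip(rec, enhancement)]
    return rec

def max_error(original, reconstructed):
    """Largest absolute reconstruction error."""
    return max(abs(x - y) for x, y in zip(original, reconstructed))

# Hypothetical DWT coefficients for one frame.
coeffs = [0.93, -0.41, 0.07, 1.62, -1.18]
base, enh = encode_two_stage(coeffs, coarse_step=0.5, fine_step=0.05)
low = decode(base, enh, 0.5, 0.05, use_enhancement=False)
high = decode(base, enh, 0.5, 0.05, use_enhancement=True)
```

A decoder that receives only `base` still produces a usable, coarser reconstruction, while one that also receives `enh` reduces the error. This is the sense in which a low quality version of the signal is embedded in the overall bitstream.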
|Item Type:||QUT Thesis (PhD)|
|Supervisor:||Deriche, Mohamed & Chandran, Vinod|
|Keywords:||audio coding, audio compression, psychoacoustics, wavelet transform, sinusoidal analysis, warped linear prediction, scalability|
|Department:||Faculty of Built Environment and Engineering|
|Institution:||Queensland University of Technology|
|Copyright Owner:||Copyright Daryl Ning|
|Deposited On:||03 Dec 2008 03:49|
|Last Modified:||28 Oct 2011 19:39|