Minimum mean square error spectral peak envelope estimation for automatic vowel classification

Jaishree Venugopal, Graduate Student, D. of Electrical and Computer Eng., Old Dominion U., USA.
Stephen A. Zahorian, Prof., D. of Electrical and Computer Eng., Old Dominion U., USA.
Montri Karnjanadecha, D. of Computer Eng., F. of Eng., PSU.
Corresponding e-mail : montri@coe.psu.ac.th

Presented : The 6th International Conference on Spoken Language Processing, 16-20 Oct. 2000, Beijing, China
Key words : cepstral analysis, speech processing, feature extraction

Spectral feature computation remains a difficult problem for accurate machine recognition of vowels, especially in the presence of noise or for otherwise degraded acoustic signals. In this work, a new peak envelope method for vowel classification is developed, based on a missing frequency components model of speech recognition. According to this model, vowel recognition depends only on the locations of spectral peaks. Moreover, the smoothing and interpolation of the sampled spectra performed in the cepstral analysis method commonly used in automatic speech recognition result in a loss of valuable information. The new method for feature extraction presented in this paper is based on minimum mean square error curve fitting of cosine-like basis vectors to all peaks in the speech spectrum. A mathematical model for smoothly tracking spectral envelopes using only spectral peak information, and ignoring other parts of the spectrum, is presented. A software algorithm for the model was developed and tested for various speaker types using a neural network classifier. Vowel classification experiments were conducted based on the features derived from the spectral peaks. The classification rates of the peak method under various signal-to-noise ratios were also evaluated. The basic conclusion is that the new features perform the same as cepstral features for clean speech, but have advantages when the signal is degraded by noise.
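
The curve-fitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the peak-picking rule or the exact form of the "cosine-like" basis vectors, so this sketch assumes simple local-maximum peak picking on a log magnitude spectrum and plain cosines cos(pi*k*f) on a frequency axis normalized to [0, 1]. All function names are hypothetical. The coefficients are chosen to minimize the squared error at the peak locations only; samples between peaks are ignored.

```python
import math

def find_peaks(spectrum):
    """Return indices of strict local maxima (assumed peak-picking rule)."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i - 1] < spectrum[i] > spectrum[i + 1]]

def cosine_basis(freq, n_terms):
    """Evaluate cos(pi*k*freq) for k = 0..n_terms-1, with freq in [0, 1]."""
    return [math.cos(math.pi * k * freq) for k in range(n_terms)]

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_peak_envelope(spectrum, n_terms):
    """Least-squares fit of cosine basis vectors to the spectral peaks only.

    Returns (coefficients, peak_indices). The coefficients play the role of
    the feature vector; non-peak samples do not enter the fit.
    """
    n = len(spectrum)
    peaks = find_peaks(spectrum)
    if len(peaks) < n_terms:
        raise ValueError("need at least as many peaks as basis terms")
    rows = [cosine_basis(i / (n - 1), n_terms) for i in peaks]
    targets = [spectrum[i] for i in peaks]
    # Normal equations (B^T B) c = B^T y, built from the peak samples alone.
    gram = [[sum(r[j] * r[k] for r in rows) for k in range(n_terms)]
            for j in range(n_terms)]
    rhs = [sum(r[j] * t for r, t in zip(rows, targets))
           for j in range(n_terms)]
    return solve_linear(gram, rhs), peaks

def envelope(coeffs, n_points):
    """Evaluate the fitted envelope at every sample of the frequency axis."""
    return [sum(c * b for c, b in
                zip(coeffs, cosine_basis(i / (n_points - 1), len(coeffs))))
            for i in range(n_points)]
```

Under these assumptions, calling `fit_peak_envelope(log_spectrum, n_terms)` on one frame yields coefficients that could be passed to a classifier, and `envelope(coeffs, len(log_spectrum))` reconstructs the smooth curve through the peaks. By contrast, a cepstral-style fit would use every spectral sample, valleys included.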
