An investigation of variable block length methods for calculation of spectral/temporal features for automatic speech recognition

Montri Karnjanadecha, D. of Computer Eng., F. of Eng., PSU.
Stephen A. Zahorian, Prof., D. of Electrical and Computer Eng., Old Dominion U., USA
Corresponding e-mail : montri@coe.psu.ac.th

Presented : The 6th International Conference on Spoken Language Processing, 16-20 Oct. 2000, Beijing, China
Key words : speech recognition, speech processing, feature extraction

This paper presents an investigation of non-uniform time sampling methods for spectral/temporal feature extraction for use in automatic speech recognition. In most current methods for signal modeling of speech information, "dynamic" features are determined from frame-based parameters using a fixed time sampling, i.e., fixed block length and fixed block spacing. This work explores new methods in which block length and/or block spacing are variable. Three methods are suggested, and each was tested with the TIMIT database using a standard HMM recognizer. Phone recognition experiments were conducted using the standard 39-phone set. The methods were also evaluated with various HMM model complexities. Experimental results indicated that none of the proposed non-uniform time sampling methods performs significantly better than fixed time sampling. However, the best results obtained with the front end are comparable to those obtained with current state-of-the-art systems. Also, the performance of our monophone system surpasses that of most reported context-dependent monophone systems.
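
The distinction between fixed and variable time sampling can be illustrated with a minimal sketch. It is not one of the three methods evaluated in the paper, which the abstract does not describe. The function names and the per-block `lengths` and `spacings` arguments are hypothetical; the sketch only shows how block boundaries over a sequence of frames differ when length and spacing are constant versus when they change from block to block.

```python
def fixed_blocks(num_frames, block_length, block_spacing):
    """Return (start, end) frame ranges with constant length and spacing."""
    blocks = []
    start = 0
    # Keep adding blocks while a full block still fits in the frame sequence.
    while start + block_length <= num_frames:
        blocks.append((start, start + block_length))
        start += block_spacing
    return blocks


def variable_blocks(num_frames, lengths, spacings):
    """Return (start, end) frame ranges where each block has its own size.

    Hypothetical illustration: lengths[i] is the length of block i in frames,
    and spacings[i] is the shift from the start of block i to block i + 1.
    """
    blocks = []
    start = 0
    for length, spacing in zip(lengths, spacings):
        # Stop as soon as a block would run past the last frame.
        if start + length > num_frames:
            break
        blocks.append((start, start + length))
        start += spacing
    return blocks


if __name__ == "__main__":
    print(fixed_blocks(10, block_length=4, block_spacing=2))
    print(variable_blocks(10, lengths=[2, 4, 6], spacings=[1, 2, 3]))
```

In a real front end, each (start, end) range would select the frame-based parameters from which one set of dynamic features is computed.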