
Original scientific paper

Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing

Antonio Vasilijević (ORCID: orcid.org/0000-0002-0862-3726); Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia
Davor Petrinović (ORCID: orcid.org/0000-0003-3950-7864); Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia


Full text: English, PDF, 1,180 KB

pp. 132-146




Abstract

One of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCCs). MFCCs are based on a filter bank algorithm whose filters are equally spaced on a perceptually motivated mel frequency scale. The values of the mel cepstral vector, as well as the properties of the corresponding cepstral distance, are determined by several parameters used in mel cepstral analysis. The aim of this work is to examine how well the MFCC measure agrees with human perception for different values of these analysis parameters. Analysis of the mel filter bank parameters shows that a filter bank with 24 bands, a bandwidth of 220 mels and a band overlap coefficient equal to or greater than one gives optimal spectral distortion (SD) distance measures. For this kind of mel filter bank, the difference between vowels can be recognised when the full-length mel cepstral SD RMS measure is higher than 0.4 to 0.5 dB. We further show that the use of a truncated mel cepstral vector (12 coefficients) is justified for speech recognition, but may be questionable for speaker recognition. We also analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. The results showed a high correlation between SD distances calculated from the aperiodic and the periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor. The rare exceptions where aliasing is present were also analysed.
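The distance measure discussed in the abstract can be illustrated with a minimal Python sketch. It computes the Euclidean distance between two mel cepstral vectors, either full-length or truncated to the first few coefficients (for example 12). The function names `cepstral_distance` and `distance_to_db` are hypothetical, and the dB conversion uses a commonly quoted convention (cepstra of the natural-log power spectrum, zeroth coefficient excluded); this is an assumption and not necessarily the SD RMS definition used in the paper.

```python
import math


def cepstral_distance(c1, c2, n_coeffs=None):
    """Euclidean distance between two cepstral vectors.

    If n_coeffs is given, only the first n_coeffs coefficients of each
    vector are compared (a truncated cepstral vector).
    """
    if len(c1) != len(c2):
        raise ValueError("cepstral vectors must have the same length")
    if n_coeffs is not None:
        c1, c2 = c1[:n_coeffs], c2[:n_coeffs]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def distance_to_db(distance):
    """Convert a cepstral Euclidean distance to an RMS log-spectral value in dB.

    Assumption (not taken from the paper): the coefficients describe the
    natural-log power spectrum and the zeroth coefficient is excluded, which
    gives the frequently used factor (10 / ln 10) * sqrt(2).
    """
    return 10.0 / math.log(10.0) * math.sqrt(2.0) * distance


# Illustrative values only, not data from the paper.
frame_a = [0.0, 0.0, 1.0]
frame_b = [3.0, 4.0, 1.0]
full = cepstral_distance(frame_a, frame_b)
truncated = cepstral_distance(frame_a, frame_b, n_coeffs=1)
print(full, truncated, distance_to_db(full))
```

Comparing the full-length and truncated distances for the same pair of frames shows how much of the total distance is carried by the higher-order coefficients that truncation discards.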

Keywords

Aliasing; Digital speech processing; MFCC; Mel cepstrum; SD Measure; Speech recognition

Hrčak ID:

71297

URI

https://hrcak.srce.hr/71297

Publication date:

22 July 2011

