
Original scientific paper

Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing

Antonio Vasilijević (ORCID: orcid.org/0000-0002-0862-3726); Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia
Davor Petrinović; Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia

Full text: English, PDF (1 MB), pp. 132-146, downloads: 1,848
APA 6th Edition
Vasilijević, A. and Petrinović, D. (2011). Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing. Automatika, 52 (2), 132-146. Retrieved from https://hrcak.srce.hr/71297
MLA 8th Edition
Vasilijević, Antonio and Davor Petrinović. "Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing." Automatika, vol. 52, no. 2, 2011, pp. 132-146. https://hrcak.srce.hr/71297. Accessed 16.10.2021.
Chicago 17th Edition
Vasilijević, Antonio and Davor Petrinović. "Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing." Automatika 52, no. 2 (2011): 132-146. https://hrcak.srce.hr/71297
Harvard
Vasilijević, A., and Petrinović, D. (2011). 'Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing', Automatika, 52(2), pp. 132-146. Retrieved from: https://hrcak.srce.hr/71297 (Accessed: 16.10.2021.)
Vancouver
Vasilijević A, Petrinović D. Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing. Automatika [Internet]. 2011 [accessed 16.10.2021.];52(2):132-146. Available at: https://hrcak.srce.hr/71297
IEEE
A. Vasilijević and D. Petrinović, "Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing", Automatika, vol. 52, no. 2, pp. 132-146, 2011. [Online]. Available at: https://hrcak.srce.hr/71297. [Cited: 16.10.2021.]

Abstract
Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCC). MFCCs are computed with a filter bank algorithm whose filters are equally spaced on the perceptually motivated mel frequency scale. The value of the mel cepstral vector, as well as the properties of the corresponding cepstral distance, is determined by several parameters of the mel cepstral analysis. The aim of this work is to examine how well the MFCC measure agrees with human perception for different values of these parameters. An analysis of the mel filter bank parameters shows that a filter bank with 24 bands, a bandwidth of 220 mels and a band overlap coefficient equal to or greater than one gives optimal spectral distortion (SD) distance measures. For this kind of mel filter bank, the difference between vowels can be recognised when the full-length mel cepstral SD RMS measure is higher than 0.4-0.5 dB. We further show that the use of a truncated mel cepstral vector (12 coefficients) is justified for speech recognition, but may be arguable for speaker recognition. We also analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. The results showed a high correlation between SD distances calculated from the aperiodic and periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor. There are rare exceptions where aliasing is present, and these were also analysed.
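
The two ingredients named in the abstract, filters equally spaced on the mel scale and a Euclidean distance between cepstral vectors, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The mel mapping `2595 * log10(1 + f/700)` is one common variant and is assumed here, as is the 0 to 4000 Hz analysis range in the usage note; the 220 mel bandwidth and the band overlap coefficient are not modelled. The dB conversion is the textbook relation between a cepstral distance and an RMS log-spectral distortion for natural-log cepstra with the gain term excluded, which may differ from the paper's exact SD RMS definition. All function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # Assumed mel mapping (a common variant; the paper may use another one).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of the assumed mapping above.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centres(n_bands, f_low_hz, f_high_hz):
    # Centre frequencies (Hz) of n_bands filters equally spaced in mels,
    # placed strictly between the two edge frequencies.
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_bands)]


def cepstral_distance(c_a, c_b):
    # Euclidean distance between two cepstral vectors of equal length.
    # The caller is expected to leave out the gain coefficient c0.
    if len(c_a) != len(c_b):
        raise ValueError("cepstral vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_a, c_b)))


def cepstral_distance_db(c_a, c_b):
    # Textbook conversion to an RMS log-spectral distortion in dB:
    # (10 / ln 10) * sqrt(2 * sum_k (c_a[k] - c_b[k])^2).
    # Assumes natural-log cepstra; not necessarily the paper's definition.
    return (10.0 / math.log(10.0)) * math.sqrt(2.0) * cepstral_distance(c_a, c_b)
```

For example, `mel_band_centres(24, 0.0, 4000.0)` returns 24 centre frequencies whose spacing widens towards higher frequencies, and passing only the first 12 coefficients of each vector to `cepstral_distance_db` gives the truncated variant of the distance discussed in the abstract.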

Keywords
Aliasing; Digital speech processing; MFCC; Mel cepstrum; SD Measure; Speech recognition

Hrčak ID: 71297

URI
https://hrcak.srce.hr/71297


Visits: 2,315