
Original scientific article

https://doi.org/10.32985/ijeces.14.3.8

Enhancement in Speaker Identification through Feature Fusion using Advanced Dilated Convolution Neural Network

Hema Kumar Pentapati ; Department of Electrical, Electronics and Communication Engineering, GITAM School of Technology, Visakhapatnam-530045, India
Sridevi K ; Department of Electrical, Electronics and Communication Engineering, GITAM School of Technology, Visakhapatnam-530045, India


Full text: English PDF, 1,326 KB

pp. 301-310



Abstract

Identifying speakers accurately poses several challenges, and extracting discriminative features is a vital step in the speaker identification task. Speaker identification is now widely investigated using deep learning. Complex and noisy speech data degrades the performance of Mel Frequency Cepstral Coefficients (MFCC), so MFCC alone fails to represent speaker characteristics accurately. In this work, a novel text-independent speaker identification system is developed that enhances performance by fusing Log-Mel spectrum and excitation features. The excitation information arises from the vibration of the vocal folds and is represented by the Linear Prediction (LP) residual. The features extracted from the excitation are the residual phase, sharpness, Energy of Excitation (EoE), and Strength of Excitation (SoE). The extracted features are processed by a dilated convolutional neural network (dilated CNN) to perform the identification task. Extensive evaluation shows that fusing the excitation features gives better results than existing methods: accuracy reaches 94.12% for 11 complex classes and 91.34% for 80 speakers, and the Equal Error Rate (EER) is reduced to 1.16%. The proposed model is evaluated on the LibriSpeech corpus using MATLAB 2021b and outperforms the existing baseline models, achieving an accuracy improvement of 1.34% over the baseline system.
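As a rough illustration of the feature-fusion idea described above, the sketch below computes a log-Mel spectrum and an LP residual and stacks simplified frame-level excitation descriptors onto the spectral features. It is a minimal sketch, not the authors' implementation: librosa and SciPy are assumed libraries (the paper used MATLAB 2021b), the file path and analysis parameters are illustrative, and the EoE and "sharpness" formulas are placeholder approximations; the paper's residual-phase and SoE measures and the dilated CNN classifier are not reproduced here.

```python
import numpy as np
import librosa
import scipy.signal

# Load a speech utterance (path and sampling rate are illustrative)
y, sr = librosa.load("speaker_utterance.flac", sr=16000)

frame_len, hop_len = 400, 160   # 25 ms frames, 10 ms hop at 16 kHz

# 1) Vocal-tract information: log-Mel spectrum
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=512, win_length=frame_len, hop_length=hop_len, n_mels=40
)
log_mel = librosa.power_to_db(mel)            # shape: (40, n_frames)

# 2) Excitation information: LP residual obtained by inverse filtering
lp_order = 12
a = librosa.lpc(y, order=lp_order)            # LP coefficients [1, a1, ..., ap]
residual = scipy.signal.lfilter(a, [1.0], y)  # prediction error e[n]

# Frame the residual and compute simple excitation descriptors
frames = librosa.util.frame(residual, frame_length=frame_len, hop_length=hop_len)
n_frames = min(frames.shape[1], log_mel.shape[1])

# Energy of Excitation (EoE): log energy of the residual per frame (simplified)
eoe = np.log(np.sum(frames[:, :n_frames] ** 2, axis=0) + 1e-10)

# Residual "sharpness": kurtosis-like peakedness of the residual per frame (simplified)
sharpness = np.mean(frames[:, :n_frames] ** 4, axis=0) / (
    np.mean(frames[:, :n_frames] ** 2, axis=0) ** 2 + 1e-10
)

# 3) Fuse the spectral and excitation streams into one frame-level feature map
fused = np.vstack([log_mel[:, :n_frames], eoe[np.newaxis, :], sharpness[np.newaxis, :]])
print(fused.shape)   # (42, n_frames)
```

Each column of `fused` is then a frame-level vector combining vocal-tract and source information, which a dilated CNN could consume as a two-dimensional feature map.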

Keywords

Log-MelSpectrum; MFCC; speaker identification; excitation features; Convolutional Neural Network (CNN); LP residual; deep learning; Deep Neural Network

Hrčak ID:

296699

URI

https://hrcak.srce.hr/296699

Publication date:

28 March 2023
