Tehnički vjesnik, Vol. 32 No. 3, 2025.
Original scientific paper
https://doi.org/10.17559/TV-20240914001990
Comparative Analysis of CNN Architectures for Eight-Class Facial Expression Recognition: A Performance and Error Pattern Study
Kyoungjong Park *
Department of Business Administration, Gwangju University, 277 Hyodeok-ro, Nam-gu, Gwangju 61743, Korea
* Corresponding author.
Abstract
This paper presents a systematic evaluation of deep learning architectures for facial expression recognition, with the aim of improving recognition accuracy through advanced CNN models. Three architectures are investigated: Conv2D with Max Pooling (M1), Conv2D with Max Pooling & Dropout (M2), and EfficientNet-B0 (M3). Each is assessed on its ability to recognize eight facial expressions (Anger, Content, Disgust, Fear, Happiness, Neutral, Sadness, and Surprise). The experiments use the Tsinghua facial expression database, on which human evaluators achieve a baseline recognition rate of 79.08%. The models are compared using standardized metrics, such as accuracy measurements and confusion matrices. EfficientNet-B0 performs best, with an average accuracy of 86.47%, followed by Conv2D with Max Pooling at 81.68%; both exceed the accuracy of human evaluators. The Conv2D with Max Pooling & Dropout model is less effective, at 73.25%, which is below the human baseline. Heat map analysis reveals specific recognition patterns: happiness has the highest recognition rate (96%) and sadness the lowest (63%). The study makes three main contributions: (1) empirical evidence for the superiority of EfficientNet-B0 in facial expression recognition, (2) a comprehensive analysis of error patterns through heat map visualization, and (3) practical insights into the limitations of dropout layers in expression recognition tasks. These findings advance the technical understanding of CNN architectures in emotion recognition systems and provide practical guidelines for implementing efficient facial expression recognition systems in real-world applications.
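The per-class recognition rates mentioned in the abstract are the kind of figure usually read off a row-normalized confusion matrix. The following is a minimal sketch of that computation in plain Python, not the author's implementation. The function names and the toy labels are hypothetical and chosen only for illustration; only the eight class names come from the abstract.

```python
# The eight expression classes named in the abstract.
CLASSES = ["Anger", "Content", "Disgust", "Fear",
           "Happiness", "Neutral", "Sadness", "Surprise"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Count matrix with rows = true class, columns = predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recognition_rate(matrix, classes=CLASSES):
    """Diagonal count divided by row total, i.e. per-class recall."""
    rates = {}
    for i, name in enumerate(classes):
        total = sum(matrix[i])
        # Classes with no samples have an undefined rate.
        rates[name] = matrix[i][i] / total if total else float("nan")
    return rates


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else float("nan")


# Hypothetical toy labels for illustration, NOT data or results from the paper.
y_true = ["Happiness", "Happiness", "Sadness", "Sadness", "Neutral", "Fear"]
y_pred = ["Happiness", "Happiness", "Neutral", "Sadness", "Neutral", "Surprise"]

cm = confusion_matrix(y_true, y_pred)
rates = per_class_recognition_rate(cm)
accuracy = overall_accuracy(cm)
```

In this toy example one of the two Sadness samples is predicted as Neutral, so that error appears as an off-diagonal count in the Sadness row. A heat map of the row-normalized matrix makes this kind of confusion visible at a glance.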
Keywords
Deep Learning; CNN; Conv2D with Max Pooling; Conv2D with Max Pooling & Dropout; EfficientNet-B0; Facial Expression Recognition
Hrčak ID: 330576
Publication date: 1 May 2025