Original scientific paper

https://doi.org/10.17559/TV-20240914001990

Comparative Analysis of CNN Architectures for Eight-Class Facial Expression Recognition: A Performance and Error Pattern Study

Kyoungjong Park ; Department of Business Administration, Gwangju University, 277 Hyodeok-ro, Nam-gu, Gwangju 61743, Korea *

* Corresponding author.


Full text: English pdf 647 Kb

pp. 1095-1106

downloads: 308



Abstract

This paper presents a systematic evaluation of deep learning architectures for facial expression recognition, with the aim of improving recognition accuracy through advanced CNN models. It investigates three architectures, Conv2D with Max Pooling (M1), Conv2D with Max Pooling & Dropout (M2), and EfficientNet-B0 (M3), and examines their effectiveness in recognizing eight facial expressions (Anger, Content, Disgust, Fear, Happiness, Neutral, Sadness, and Surprise). The experiments use the Tsinghua facial expression database, on which human evaluators reach a baseline recognition rate of 79.08%. The models are compared using standardized metrics, such as accuracy measurements and confusion matrices. EfficientNet-B0 achieves the best performance with an average accuracy of 86.47%, followed by Conv2D with Max Pooling at 81.68%; both exceed the accuracy of human evaluators. The Conv2D with Max Pooling & Dropout model is less effective at 73.25%, which falls below the human baseline. Heat map analysis reveals specific recognition patterns: happiness has the highest recognition rate (96%), while sadness has the lowest (63%). The study makes three main contributions: (1) empirical evidence for the superiority of EfficientNet-B0 in facial expression recognition, (2) a comprehensive error pattern analysis through heat map visualization, and (3) practical insights into the limitations of dropout layers in expression recognition tasks. These findings advance the technical understanding of CNN architectures in emotion recognition systems and provide practical guidelines for implementing efficient facial expression recognition systems in real-world applications.
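The abstract reports per-expression recognition rates read from confusion-matrix heat maps. As a minimal sketch of how such figures can be derived, the Python snippet below builds a confusion matrix and computes per-class recognition rates and overall accuracy. The class names come from the abstract; the function names and the toy labels are hypothetical and are not the paper's code or data.

```python
# Class names are taken from the abstract; everything else is illustrative.
CLASSES = ["Anger", "Content", "Disgust", "Fear",
           "Happiness", "Neutral", "Sadness", "Surprise"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of samples of each true class that were predicted correctly."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(row[i] / total if total else 0.0)
    return rates


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels as class indices (hypothetical, NOT the paper's data).
y_true = [0, 0, 4, 4, 4, 4, 6, 6, 6, 5]
y_pred = [0, 2, 4, 4, 4, 4, 6, 5, 5, 5]

cm = confusion_matrix(y_true, y_pred, len(CLASSES))
recall = per_class_recall(cm)

for name, rate in zip(CLASSES, recall):
    print(f"{name:<10} {rate:.2f}")
print(f"Overall accuracy: {overall_accuracy(cm):.2f}")
```

In this toy data, two of the three Sadness samples are misclassified as Neutral, which appears as an off-diagonal count in the Sadness row. A heat map of the same matrix makes that kind of error pattern visible at a glance.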

Keywords

Deep Learning; CNN; Conv2D with Max Pooling; Conv2D with Max Pooling & Dropout; EfficientNet-B0; Facial Expression Recognition

Hrčak ID:

330576

URI

https://hrcak.srce.hr/330576

Publication date:

1 May 2025

Visits: 544 *