EEG-Based Emotion Prediction with Neural Network Models

The term "emotion" refers to an individual's response to an event, person, or condition. In recent years, the number of papers studying emotion estimation has increased. In this study, a dataset of EEG brainwaves recorded under three different emotional states has been analysed in order to classify feelings. In the dataset, six film clips were used to elicit positive and negative emotions from one male and one female participant, whereas no stimulus was used to elicit the neutral mood. Various classification approaches have been applied to the dataset, including the MLP, SVM, PNN, KNN, and decision tree methods. The bagged tree technique, utilized on this dataset for the first time, has achieved a 98.60 percent success rate in this study. In addition, the dataset has been classified using the PNN approach, achieving a success rate of 94.32 percent.


INTRODUCTION
A psychological response to an event, individual, or thing is called an emotion. Emotions play a significant part in daily human life, appearing in both psychological and physiological contexts, and have been studied in various fields, including virtual reality, e-therapy, health, and brain-machine interfaces. Recently, there has been a significant rise in studies that predict emotions. Methods such as facial expressions, mimicry, and voice signals have been used to predict emotions in the literature [1,2]; however, these signals are susceptible to manipulation and interpretation, so erroneously collected data lowers the prediction accuracy rate. In contrast, physiological signals that are difficult to control consciously, such as the electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), blood pressure (BP), and heart rate (HR), have gained relevance for accurate and reliable emotion classification. Among these, EEG data, which can be acquired with low-cost, wireless, mobile devices, has become increasingly common for predicting emotions [3].
Emotion models are classified into discrete and dimensional models. The discrete model represents six essential emotions (nervousness, fear, sorrow, anger, joy, and surprise) as positive or negative, whereas the dimensional model describes emotions by their locations on the arousal-valence plane [4]. Fig. 1 depicts an emotional pattern separated into four distinct zones on the arousal-valence plane.
The main purpose of this study is to determine the best classification method for predicting positive, negative, and neutral emotional responses from EEG signals. The success rates of the k-nearest neighbour (KNN), support vector machine (SVM), probabilistic neural network (PNN), and multi-layer perceptron (MLP) classifiers have been studied, the latter in both PCA and non-PCA scenarios. Moreover, the effects of various parameters on classification accuracy have been examined. Consequently, it is shown that the bagged tree ensemble method achieves an accuracy of 98.6 percent.

RELATED WORKS
Numerous performance analyses have been published in the literature to predict emotional states based on EEG signals. In [1], the best result, 87.16 percent, has been reached with Random Forest classifiers, using OneR to extract features from the raw dataset. In [6], the features of the EEG signals have been extracted using the OneR, BayesNet, and InfoGain techniques and categorized using classifiers such as MLP, random forest (RF), and SMO; the most accurate rate, 97.89 percent, has been achieved with InfoGain and Random Forest. In [7], SVM, KNN, LightGBM, XGBoost, RF, logistic regression, and two neural network classification models have been used to predict emotion; the classification results, obtained after normalization by retaining the 150 most important features with the PCA method, varied between 90.61 percent and 97.0 percent. Additionally, in [8], which aims to classify positive and negative emotions elicited by visual stimuli, support vector machines have been used to classify the features obtained by particle swarm optimization and genetic screening, and success rates between 48.85 percent and 57.42 percent have been obtained. In [9], support vector machines have been used to classify EEG data collected with audio-visual stimuli as positive or negative emotions, achieving a success rate of 74.55 percent on average. Emotion classification utilizing common spatial patterns has been shown to be exceptionally successful in [10], with 93.5 percent being the overall best outcome. In [11], which proposes a convolutional neural network (CNN) based model that extracts the EEG signal properties with the discrete wavelet transform and Welch's power spectral density estimation, the best estimation result has been 90.16 percent.
The rest of this paper is organized as follows. Details regarding the dataset and the classification algorithms are given in Section 3, and the outcomes of the methods used are given in Section 4. Finally, the analysis of the results achieved throughout the research is given in Section 5.

Dataset
This study analyzed the EEG-based emotion dataset from [1] and [6]. The MUSE EEG headband has been used to collect data from four electrodes, at the TP9, AF7, AF8, and TP10 points shown in Fig. 2. The dataset contains 2132 samples, each with 2549 features, collected from one female and one male participant. Six movie clips have been utilized as stimuli to evoke positive and negative emotions, whereas no stimulus has been used to elicit the neutral emotion. EEG signals have been collected for three minutes per emotional state, 36 minutes in total. The mean, maximum-minimum, and fast Fourier transform statistics have been used in the EEG signal processing, at a sampling frequency of 150 Hz. Emotion-related changes in the brain waveforms are shown in Fig. 3. The three classes are identified as follows: in Class 1, clips from the films Marley and Me, Up, and My Girl elicited negative emotions; in Class 2, positive emotions were evoked by film clips from La La Land, Slow Life, and Funny Dogs; and Class 3 contains the neutral state, for which no stimulus was used.
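As a hypothetical illustration of the per-window statistics mentioned above (mean, minimum-maximum, and FFT-based features), the following sketch summarises one synthetic EEG window sampled at 150 Hz; the dataset's actual feature pipeline, which produces 2549 features, is considerably more elaborate.

```python
import numpy as np

def extract_features(window, n_fft_bins=8):
    """Summarise one EEG window (illustrative helper, not the paper's
    exact pipeline): mean, min, max, and low-frequency FFT magnitudes."""
    feats = [window.mean(), window.min(), window.max()]
    spectrum = np.abs(np.fft.rfft(window))   # magnitude spectrum
    feats.extend(spectrum[:n_fft_bins])      # keep the lowest-frequency bins
    return np.array(feats)

fs = 150                                     # sampling frequency (Hz)
t = np.arange(fs) / fs                       # one 1-second window
window = np.sin(2 * np.pi * 10 * t)          # synthetic 10 Hz oscillation
features = extract_features(window)
print(features.shape)                        # (3 + 8,) = (11,)
```

Stacking such feature vectors over consecutive windows and electrodes yields a sample matrix of the kind classified in the following sections.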

Classification Methods
This section describes the classification methods used: MLP, SVM, PNN, KNN, and decision trees, each of which employs either five-fold cross-validation or an 80 percent training split.

Multi-Layer Neural Networks
The multi-layer neural network employs the Levenberg-Marquardt algorithm, a well-known training approach for identifying brain signals. The classifier consists of an input layer, two hidden layers, and an output layer. In addition, a PCA transformation is applied, which reduces the dataset's dimensionality and removes irrelevant information in order to identify the significant data attributes [12,13]. The significant advantages of PCA for large-scale datasets include reduced noise sensitivity, decreased memory and capacity requirements, and more efficient performance in low-dimensional spaces [14]. When PCA is utilized, the system is trained on the top 100, 150, and 200 most relevant attributes, using five-fold cross-validation.

Support Vector Machine
EEG signal classification techniques frequently use a two-group classifier called the support vector machine, whose nonlinear basis functions relocate linearly inseparable planes to a new space [15,16]. Furthermore, the SVM takes advantage of high-dimensional features via kernel functions, which are utilized to increase model performance [17]. In this paper, which employs the quadratic kernel function, 80 percent of the data is used for training and 20 percent for testing.
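The two pipelines above can be sketched as follows. This is a hedged illustration using scikit-learn on synthetic stand-in data, not the paper's actual setup: the MLP here uses scikit-learn's solver rather than Levenberg-Marquardt, and the quadratic kernel is expressed as a degree-2 polynomial kernel.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the EEG feature matrix (3 emotion classes).
X, y = make_classification(n_samples=600, n_features=300, n_informative=40,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=0)

models = {
    "PCA(150) + MLP": make_pipeline(
        PCA(n_components=150),                      # keep the 150 strongest components
        MLPClassifier(hidden_layer_sizes=(64, 32),  # two hidden layers, as in the paper
                      max_iter=1000, random_state=0)),
    "quadratic SVM": SVC(kernel="poly", degree=2),  # quadratic kernel function
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: {model.score(X_te, y_te):.3f}")
```

The hidden-layer sizes and the synthetic data dimensions are illustrative choices; the paper does not report the exact layer widths.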

K-Nearest Neighbour
The k-nearest neighbour method is a classifier based on the distances and similarities between data points [12]. When data with an unknown decision class is received, the k closest data points are used to identify which decision class the data belongs to, and the data is then labeled with the majority class among those k neighbours [17]. This study examines the performance of the KNN classifier using the Euclidean distance over a variety of k values, with the dataset divided into 80 percent for training and 20 percent for testing.
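A minimal sketch of this k-sweep, assuming scikit-learn and synthetic stand-in data (the specific k values tried here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=0)

# Sweep k; metric="euclidean" matches the distance used in the study.
for k in (3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean").fit(X_tr, y_tr)
    print(k, round(knn.score(X_te, y_te), 3))
```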
Probabilistic Neural Network
A probabilistic neural network is a pattern recognition system that considers all possible outcomes before identifying a winner. The network structure uses Parzen estimators to approximate the class probability density functions, which are then combined via Bayes' theorem [18]. The link weights forward the features' input vectors to the hidden layer, whose nodes are activated according to the Euclidean distances between the weights and the vectors [16]. This paper evaluates the classification performance using a five-fold cross-validation approach with a spread parameter of 0.0993, which has been identified as the most critical parameter for reducing errors in the PNN.
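The core of a PNN can be sketched in a few lines: for each class, Gaussian (Parzen) kernels centred on the training points are averaged into a density estimate, and the class with the highest density wins. This is a simplified NumPy illustration on tiny synthetic clusters, not the paper's trained network; the spread value is the one reported above.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, spread=0.0993):
    """Minimal PNN: per class, average Gaussian kernels centred on the
    training points, then pick the class with the highest density."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)  # squared distances
        scores.append(np.exp(-d2 / (2 * spread ** 2)).mean(axis=1))
    return classes[np.argmax(np.stack(scores), axis=0)]

# Two well-separated synthetic clusters as a sanity check.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(1, 0.1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.array([[0.0, 0.0], [1.0, 1.0]])
print(pnn_predict(X_train, y_train, X_test))   # expected: [0 1]
```

A smaller spread makes each kernel narrower, so the decision depends more on the nearest training points, which is why the spread is the critical tuning parameter.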

Decision Trees
Decision trees, which resemble tree topologies, are a basic algorithm made of nodes and branches; each branch represents a probability state and divides the data into subgroups. Each rule corresponds to a branch produced during the separation process [19].
However, decision trees are prone to over-learning. Ensemble algorithms such as bagged decision trees and AdaBoost, on the other hand, increase generalization and thereby mitigate the effects of over-learning [20]. Therefore, this study examines the bagged trees and AdaBoost algorithms.
a. Bagged Trees
In bagged trees, each new training set is formed by sampling from the existing training set with replacement; some samples are excluded from a given bag entirely, whereas others are included more than once. Furthermore, the algorithm randomly picks a subset of predictors to consider at each decision split. The bootstrapped training sets are used to grow trees that are not pruned [20]. By reducing variance and combining the outcomes of several decision trees, bagging enhances generalization and reduces the consequences of over-learning. In this paper, the classification performance is evaluated with 80 percent of the data for training and 20 percent for testing.
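A hedged sketch of bagged decision trees, using scikit-learn's `BaggingClassifier` (whose default base learner is an unpruned decision tree) on synthetic stand-in data; the ensemble size here is an illustrative choice, as the paper does not report it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=30, n_informative=12,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=0)

# Each of the 100 trees is grown on a bootstrap sample of the training set.
bag = BaggingClassifier(n_estimators=100, bootstrap=True,
                        random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {bag.score(X_te, y_te):.3f}")
```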
b. AdaBoost
The AdaBoost technique, which constructs a powerful classifier from numerous weak classifiers, operates on the principle of retraining the classifier at each step and increasing the weights of the samples that were predicted inaccurately in the previous step. This method is intended to identify and correct inaccurate predictions, thereby increasing the classification accuracy of the trained models [21]. Although AdaBoost is more resistant to over-learning than many other classification algorithms, it is commonly influenced by noisy data and outliers [20].
The algorithm's performance can be optimized by modifying the learning rate, which is highly proportional to the system accuracy. A learning rate greater than one causes oscillation and makes it impossible to find the global optimum; in contrast, an extremely small learning rate increases training time due to the greatly increased number of steps. In this study, which used 80 percent of the dataset for training, the effect of varying the learning rate between zero and one has been examined.
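The learning-rate sweep can be sketched as follows, assuming scikit-learn's `AdaBoostClassifier` and synthetic stand-in data; the specific rates tried are illustrative points in the (0, 1] range examined by the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=30, n_informative=12,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=0)

# Smaller learning rates shrink each weak learner's contribution,
# trading more boosting rounds for smoother convergence.
for lr in (0.1, 0.5, 1.0):
    clf = AdaBoostClassifier(n_estimators=100, learning_rate=lr,
                             random_state=0).fit(X_tr, y_tr)
    print(lr, round(clf.score(X_te, y_te), 3))
```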

RESULTS
This section discusses the various methods used to classify the EEG-based emotion prediction dataset and the implications of modifying certain classifier parameters. Depending on whether or not the PCA technique is used, the training is performed on different features of the data. The accuracy, sensitivity, and specificity of the classifiers are also examined. Fig. 4 illustrates the accuracy, sensitivity, and specificity values achieved by combining PCA, extracting the 100, 150, and 200 most significant features, with the MLP classification algorithm. The 150 most essential features extracted with PCA yield an accuracy rate of 90.21 percent, whereas the rates for 100 and 200 features are 88.86 percent and 88.38 percent, respectively. Data loss occurs when PCA extracts too few features, whereas over-learning can occur when more features are extracted; therefore, classification performance has been evaluated using the top 100, 150, and 200 features.
For each classifier, the accuracy, sensitivity, and specificity percentages are shown in Tab. 1. Sensitivity represents the rate of correctly predicted positive samples for each class, while specificity represents the rate of correctly predicted negative samples [22]. Fig. 5 shows the accuracy, sensitivity, and specificity achieved by the MLP technique when trained on the first 100, 150, and 200 features of the dataset without PCA. MLP-based training and testing are more effective without the PCA approach; the system achieves a 96.53 percent success rate with the first 200 features. In other words, a larger number of features increases the likelihood of high accuracy.
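For a multi-class problem, the per-class sensitivity and specificity reported above follow from a one-vs-rest reading of the confusion matrix. A short sketch, with an illustrative matrix rather than the paper's actual counts:

```python
import numpy as np

def sensitivity_specificity(cm):
    """Per-class sensitivity (recall) and specificity from a confusion
    matrix whose rows are true labels and columns are predictions."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly predicted per class
    fn = cm.sum(axis=1) - tp         # missed positives per class
    fp = cm.sum(axis=0) - tp         # false alarms per class
    tn = cm.sum() - tp - fn - fp     # everything else
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative 3-class matrix (not the paper's results).
cm = [[50, 2, 1],
      [3, 45, 2],
      [0, 1, 49]]
sens, spec = sensitivity_specificity(cm)
print(np.round(sens, 3), np.round(spec, 3))
```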
The accuracy curves of the classifiers used in the study are shown in Fig. 6. The best training achievement, 98.6 percent, belongs to the bagged trees classifier. The PNN, by comparison, has the lowest success rate among these classifiers, at 94.32 percent. Training performance as a function of the learning rate is shown in Tab. 2. The AdaBoost classifier's accuracy rate increases as the learning rate parameter is raised, and the sensitivity and specificity are affected accordingly. As seen in Fig. 7, the learning rate has a significant impact on the accuracy results: the algorithm's success rate rises in direct proportion to the learning rate.
Selecting the optimal k value also impacts system performance; choosing a k smaller or larger than this value increases the error rate. As seen in Tab. 3, for the dataset classified using KNN, the optimal k value is 5, and when k is increased to 9, the error rate rises. The bagged trees test results are depicted in Fig. 9 as a confusion matrix, where labels 1, 2, and 3 denote negative, positive, and neutral emotions, respectively. The number of misclassifications in the bagged trees model is 4 for negative emotions, 10 for positive emotions, and 1 for neutral emotions. Consequently, compared with the other models studied, the proposed bagged trees model considerably decreases the number of erroneous classifications.

Figure 9. Test set: bagged trees confusion matrix

CONCLUSION
This study analyzes the prediction of positive, negative, and neutral emotions from EEG signals, utilizing the MLP, SVM, PNN, KNN, and decision tree classification methods. Depending on whether the PCA approach is utilized, the training has been evaluated on various subsets of the dataset's features. The accuracy of the MLP classification with the 150 most important features extracted using PCA is 90.21 percent; accordingly, MLP-based training and testing are more successful when the PCA technique is not used. The accuracy of the AdaBoost-classified emotion prediction increases as the learning rate parameter is increased. The proposed model, bagged trees, significantly reduces the occurrence of incorrect classifications.
To the best of the authors' knowledge, this is the first application of the bagged tree ensemble approach to this dataset, and it has achieved a success rate of 98.60 percent. In addition, the classification of the dataset using the PNN yielded a success rate of 94.32 percent. However, the system trained on the 100 most essential features extracted with PCA and classified using the MLP has the lowest success rate, at 89.86 percent. The bagged trees technique improves performance by 8 percent over the MLP classifier with the PCA approach, and by 4 percent over the PNN classifier.

Notice
The paper was presented at the International Congress of Electrical and Computer Engineering (ICECENG '22), which took place in Bandırma (Turkey), on February 9-12, 2022. The paper will not be published anywhere else.