Original scientific paper
https://doi.org/10.1080/00051144.2019.1602293
Two new feature selection metrics for text classification
Durmuş Özkan Şahin; Department of Computer Engineering, Ondokuz Mayıs University, Samsun, Turkey
Erdal Kılıç; Department of Computer Engineering, Ondokuz Mayıs University, Samsun, Turkey
Abstract
Extracting meaningful information from data has become a central problem, and data mining techniques have gained importance as a result. Text classification is one of the most commonly studied areas of data mining. Its main difficulty is that large data sizes increase the required time and decrease classification success. The main purpose of this study is to determine suitable feature selection methods for text classification. Frequently used feature selection metrics, such as Chi-square and Information Gain, were applied to different data sets and their performance was measured. In addition, two filter-based feature selection metrics are proposed as alternatives to the current ones. The first is the Relevance Frequency Feature Selection metric, obtained by adding new parameters to the Relevance Frequency method, which is used for term weighting in text classification. The second is the Alternative Accuracy2 metric, obtained by changing the parameters of the Accuracy2 metric. The proposed Relevance Frequency Feature Selection and Alternative Accuracy2 metrics were observed to give results as successful as those of the frequently used current metrics.
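For readers unfamiliar with filter-based feature selection, the following is a minimal sketch of how the baseline metrics named in the abstract score a single term against a single class. It uses the common textbook definitions of Chi-square and Information Gain and the usual Relevance Frequency term weight, log2(2 + a / max(1, c)). It does not reproduce the two proposed metrics, whose parameters are not given in the abstract. The function names and the tp/fp/fn/tn argument convention are assumptions made for illustration.

```python
import math

# Assumed convention for one term and one class (document counts):
#   tp: documents in the class that contain the term
#   fp: documents outside the class that contain the term
#   fn: documents in the class that do not contain the term
#   tn: documents outside the class that do not contain the term


def chi_square(tp, fp, fn, tn):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    n = tp + fp + fn + tn
    denom = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (tp * tn - fp * fn) ** 2 / denom


def _entropy(probs):
    """Shannon entropy in bits, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(tp, fp, fn, tn):
    """Class entropy minus class entropy conditioned on term presence."""
    n = tp + fp + fn + tn
    h_class = _entropy([(tp + fn) / n, (fp + tn) / n])
    h_cond = 0.0
    present, absent = tp + fp, fn + tn
    if present:
        h_cond += present / n * _entropy([tp / present, fp / present])
    if absent:
        h_cond += absent / n * _entropy([fn / absent, tn / absent])
    return h_class - h_cond


def relevance_frequency(tp, fp):
    """Relevance Frequency term weight: log2(2 + tp / max(1, fp))."""
    return math.log2(2 + tp / max(1, fp))


def top_k_terms(scores, k):
    """Keep the k highest-scoring terms from a {term: score} mapping."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A filter method of this kind scores every term independently of the classifier, then keeps only the highest-ranked terms, which is what `top_k_terms` illustrates.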
Keywords
Text classification; text mining; feature selection; term selection
Hrčak ID:
239777
Publication date:
20.5.2019.