Original scientific paper

https://doi.org/10.1080/00051144.2019.1602293

Two new feature selection metrics for text classification

Durmuş Özkan Şahin ; Department of Computer Engineering, Ondokuz Mayıs University, Samsun, Turkey
Erdal Kılıç ; Department of Computer Engineering, Ondokuz Mayıs University, Samsun, Turkey


Full text: English, PDF, 2,079 KB

pp. 162-171

Downloads: 487


Abstract

Extracting meaningful information from data has become a central problem, and data mining techniques have gained importance as a result. Text classification is one of the most commonly studied areas of data mining. Its main difficulty is that large data sizes increase the required time and decrease classification success. The main purpose of this study is to determine suitable feature selection methods for text classification. Frequently used feature selection metrics such as Chi-square and Information Gain were applied to different data sets, and their performance was measured. Two filter-based feature selection metrics are proposed as alternatives to the existing ones. The first is the Relevance Frequency Feature Selection metric, obtained by adding new parameters to the Relevance Frequency method used for term weighting in text classification. The second is the Alternative Accuracy2 metric, obtained by changing the parameters of the Accuracy2 metric. The proposed Relevance Frequency Feature Selection and Alternative Accuracy2 metrics were observed to give results as successful as those of the frequently used existing metrics.
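For orientation, the baseline quantities named in the abstract can be sketched from a per-term 2x2 contingency table. The sketch below uses the commonly cited textbook forms of Chi-square, Information Gain, Accuracy2 and Relevance Frequency. It does not reproduce the two new metrics, because their formulas are not given on this page. The function names, the argument names (`tp`, `fp`, `fn`, `tn`) and the example counts are illustrative assumptions, not taken from the paper.

```python
import math

# Assumed notation (not from the paper): for one term and one target class,
#   tp = positive-class documents containing the term
#   fp = negative-class documents containing the term
#   fn = positive-class documents without the term
#   tn = negative-class documents without the term


def chi_square(tp, fp, fn, tn):
    """Chi-square statistic of the 2x2 term/class table."""
    n = tp + fp + fn + tn
    denom = (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    if denom == 0:
        return 0.0
    return n * (tp * tn - fp * fn) ** 2 / denom


def _entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(tp, fp, fn, tn):
    """Reduction in class entropy from knowing whether the term occurs."""
    n = tp + fp + fn + tn
    if n == 0:
        return 0.0
    prior = _entropy([tp + fn, fp + tn])
    with_term = (tp + fp) / n * _entropy([tp, fp])
    without_term = (fn + tn) / n * _entropy([fn, tn])
    return prior - with_term - without_term


def accuracy2(tp, fp, fn, tn):
    """Standard Accuracy2: |true positive rate - false positive rate|."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return abs(tpr - fpr)


def relevance_frequency(tp, fp):
    """Standard rf term weight: log2(2 + tp / max(1, fp))."""
    return math.log2(2 + tp / max(1, fp))


def top_k_terms(tables, metric, k):
    """Rank terms by a filter metric and keep the k highest-scoring ones.

    `tables` maps each term to its (tp, fp, fn, tn) counts.
    """
    ranked = sorted(tables, key=lambda t: metric(*tables[t]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical counts for three terms over 100 documents (50 per class).
    tables = {
        "goal": (40, 10, 10, 40),     # strongly associated with the class
        "the": (25, 25, 25, 25),      # independent of the class
        "match": (30, 20, 20, 30),    # weakly associated
    }
    print(top_k_terms(tables, chi_square, 2))
```

A filter approach of this kind scores every term independently of the classifier and keeps only the highest-ranked terms, which is why it scales to large vocabularies.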

Keywords

Text classification; text mining; feature selection; term selection

Hrčak ID:

239777

URI

https://hrcak.srce.hr/239777

Publication date:

20 May 2019

Visits: 1,535 *