
Original scientific paper

SOM-US: A Novel Under-Sampling Technique for Handling Class Imbalance Problem

Ajay Kumar ; KIET Group of Institutions, India

Full text: English, PDF (1,312 KB)

pp. 69-75




Class imbalance classification is a significant research challenge in data mining and machine learning because most real-world datasets are imbalanced. When a dataset is highly imbalanced, most available classification techniques frequently underperform on minority-class cases, because they maximize overall accuracy and disregard the relative distribution of each class. Techniques based on sampling methods, cost-sensitive learning, and ensemble methods have recently been employed to handle the class imbalance problem. This paper proposes a new clustering-based under-sampling (US) technique, called SOM-US, that handles the class imbalance problem using the self-organizing map (SOM). To validate the proposed approach, an experimental study applied SOM-US to a NASA software defect dataset in order to improve the capability of a logistic regression classifier for software defect prediction. The proposed approach was compared with six existing under-sampling methods on two performance measures. The results demonstrate that SOM-US significantly improves the prediction capability of logistic regression over the other under-sampling techniques for software defect prediction.
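The abstract does not spell out the SOM-US procedure, so the following is only a minimal sketch of what a SOM-based, clustering-driven under-sampling step could look like. It is not the author's method. It assumes that a small rectangular SOM is trained on the majority class, that each majority sample is assigned to its best-matching node, and that samples are then kept round-robin across nodes, nearest to the node weight first. The function names `train_som` and `som_undersample`, the grid size, and the learning-rate and neighborhood schedules are all hypothetical choices made for illustration.

```python
import math
import random


def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}.

    Hypothetical helper: linear decay of learning rate and neighborhood
    radius is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    # Initialise each node's weights from a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            # Best-matching unit: the node whose weights are closest to x.
            bmu = min(weights, key=lambda n: _sqdist(weights[n], x))
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def som_undersample(majority, n_keep, rows=3, cols=3, seed=0):
    """Return sorted indices of majority samples to keep (assumed rule)."""
    weights = train_som(majority, rows, cols, seed=seed)
    # Group sample indices by their best-matching node.
    clusters = {}
    for idx, x in enumerate(majority):
        bmu = min(weights, key=lambda n: _sqdist(weights[n], x))
        clusters.setdefault(bmu, []).append(idx)
    # Within each node, order members from nearest to farthest.
    for node, members in clusters.items():
        members.sort(key=lambda i: _sqdist(weights[node], majority[i]))
    queues = [members for _, members in sorted(clusters.items())]
    target = min(n_keep, len(majority))
    kept, depth = [], 0
    # Round-robin over nodes so every occupied region stays represented.
    while len(kept) < target:
        for q in queues:
            if depth < len(q) and len(kept) < target:
                kept.append(q[depth])
        depth += 1
    return sorted(kept)
```

In a defect-prediction setting, `majority` would hold the feature vectors of the non-defective modules and `n_keep` would typically be set to the number of defective modules, so that the reduced majority set and the full minority set can be passed together to a classifier such as logistic regression.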

Keywords

Class Imbalance; Under-Sampling; Software Defect Prediction
