Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection

Ata, Oğuz; Hazim, Layth

doi:10.17559/TV-20180427091048

Tehnički vjesnik, Vol. 27 No. 2, 2020.

Original scientific paper

https://doi.org/10.17559/TV-20180427091048

Oğuz Ata orcid.org/0000-0003-4511-7694 ; Altinbas University, Institute of Science, Dept. of Information Technologies, Istanbul, Turkey
Layth Hazim orcid.org/0000-0001-8066-2175 ; Tikrit University, Cisco Networking Academy, Dept. of Computer of Science, Salah Al-Din, Iraq

Full text: English, PDF, 598 KB

pp. 618-626

Downloads: 1,847

Cite

APA 6th Edition

Ata, O., & Hazim, L. (2020). Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection. Tehnički vjesnik, 27 (2), 618-626. https://doi.org/10.17559/TV-20180427091048

MLA 8th Edition

Ata, Oğuz, and Layth Hazim. "Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection." Tehnički vjesnik, vol. 27, no. 2, 2020, pp. 618-626. https://doi.org/10.17559/TV-20180427091048. Accessed 24 Nov. 2024.

Chicago 17th Edition

Ata, Oğuz, and Layth Hazim. "Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection." Tehnički vjesnik 27, no. 2 (2020): 618-626. https://doi.org/10.17559/TV-20180427091048

Harvard

Ata, O. and Hazim, L. (2020). 'Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection', Tehnički vjesnik, 27(2), pp. 618-626. https://doi.org/10.17559/TV-20180427091048

Vancouver

Ata O, Hazim L. Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection. Tehnički vjesnik [Internet]. 2020 [cited 2024 Nov 24];27(2):618-626. https://doi.org/10.17559/TV-20180427091048

IEEE

O. Ata and L. Hazim, "Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection", Tehnički vjesnik, vol. 27, no. 2, pp. 618-626, 2020. [Online]. https://doi.org/10.17559/TV-20180427091048

Abstract

Banks lose millions of dollars each year for several reasons, the most important of which is credit card fraud. A key challenge in detecting this kind of fraud is the skewed class distribution ("class imbalance") of the data. In this study, we therefore explore four data mining techniques, namely Naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Random Forest (RF), on actual credit card transactions from European cardholders. This paper offers four major contributions. First, we used under-sampling to balance the dataset, because its classes are highly imbalanced, implying a skewed distribution. Second, we applied NB, SVM, KNN, and RF to the under-sampled data to classify the transactions as fraudulent or genuine, then measured and compared their performance using a confusion matrix. Third, we adopted 10-fold cross-validation (CV) to test the accuracy of the four models, together with its standard deviation, and compared the results across all models. Fourth, we examined these models against the entire (skewed) dataset using the confusion matrix and the AUC (Area Under the ROC Curve) ranking measure to determine which model would be best to use for a particular type of fraud. The best accuracies obtained for the NB, SVM, KNN and RF classifiers are 97.80%, 97.46%, 98.16% and 98.23%, respectively. Comparative results over four dataset splits (75:25, 90:10, 66:34 and 80:20) showed that RF performs better than NB, SVM, and KNN, and that the proposed models achieved better results on the entire (skewed) dataset than on the under-sampled dataset.
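
As an illustration of two steps named in the abstract, random under-sampling and evaluation with a confusion matrix, the following is a minimal Python sketch. It is not the authors' code. The function names (undersample, confusion_counts, metrics) and the toy data of 1000 genuine and 10 fraudulent transactions are assumptions made for this example only, and the sketch uses the standard library alone rather than any particular data mining toolkit.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Sample without replacement so no transaction is duplicated.
        for row in rng.sample(group, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the fraud class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall derived from the confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy data: 1000 genuine (0) and 10 fraudulent (1) rows.
    rows = list(range(1010))
    labels = [0] * 1000 + [1] * 10
    X_bal, y_bal = undersample(rows, labels, seed=42)
    print(len(X_bal), y_bal.count(0), y_bal.count(1))

    # A classifier that always predicts "genuine" on the skewed data.
    always_genuine = [0] * len(labels)
    print(metrics(*confusion_counts(labels, always_genuine)))
```

On the skewed toy data, the always-genuine classifier reaches high accuracy while its recall on the fraud class is zero, which is one reason a confusion matrix and a ranking measure such as AUC are informative alongside accuracy.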

Keywords

credit card fraud detection; data mining; K-Nearest Neighbour; Naïve Bayesian; Random Forest; Support Vector Machine

Hrčak ID:

236820

URI

https://hrcak.srce.hr/236820

Publication date:

15 April 2020

Visits: 3,831
