
Preliminary communication

https://doi.org/10.18045/zbefri.2021.1.163

An implementation of ensemble methods, logistic regression, and neural network for default prediction in Peer-to-Peer lending

Aneta Dzik-Walczak (ORCID: orcid.org/0000-0002-0192-0226); University of Warsaw, Faculty of Economic Sciences, Warsaw, Poland
Mateusz Heba (ORCID: orcid.org/0000-0001-6929-907X); University of Warsaw, Faculty of Economic Sciences, Warsaw, Poland



page 163-197



Abstract

Credit scoring has become an important issue because competition among financial institutions is intense, and even a small improvement in predictive accuracy can yield significant savings. Financial institutions seek optimal strategies based on credit scoring models, so credit scoring tools are studied extensively. As a result, various parametric statistical methods, non-parametric statistical tools, and soft computing approaches have been developed to improve the accuracy of credit scoring models. In this paper, different approaches are used to classify customers into those who repay a loan and those who default. The purpose of this study is to investigate the performance of two credit scoring techniques: a logistic regression model estimated on categorized variables transformed with WOE (Weight of Evidence), and neural networks. We also combine multiple classifiers and test whether ensemble learning performs better. To evaluate the feasibility and effectiveness of these methods, the analysis is performed on Lending Club data. In addition, we investigate peer-to-peer lending, also called social lending. The results indicate that the logistic regression model can outperform neural networks. The proposed ensemble model (a combination of logistic regression and a neural network, obtained by averaging the probabilities from both models) has a higher AUC, Gini coefficient, and Kolmogorov-Smirnov statistic than the other models. We therefore conclude that the ensemble model can reduce potential losses due to misclassification costs.
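The ensemble and the three evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy probabilities, and the coding of default as label 1 are assumptions made for illustration, and the Gini coefficient is computed with the common credit-scoring identity Gini = 2·AUC − 1.

```python
def average_probabilities(p_logit, p_nn):
    """Ensemble by simple averaging of two models' predicted default probabilities."""
    return [(a + b) / 2 for a, b in zip(p_logit, p_nn)]


def split_by_class(y_true, scores):
    """Return (scores of defaulters, scores of non-defaulters); label 1 = default (assumed)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    return pos, neg


def auc(y_true, scores):
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly; ties count 0.5."""
    pos, neg = split_by_class(y_true, scores)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini coefficient derived from AUC."""
    return 2 * auc(y_true, scores) - 1


def ks_statistic(y_true, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = split_by_class(y_true, scores)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical toy data, not taken from the Lending Club sample.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_logit = [0.10, 0.20, 0.35, 0.30, 0.40, 0.60, 0.70, 0.90]
p_nn = [0.20, 0.10, 0.25, 0.50, 0.50, 0.40, 0.80, 0.70]
p_ens = average_probabilities(p_logit, p_nn)

for name, p in [("logit", p_logit), ("nn", p_nn), ("ensemble", p_ens)]:
    print(name, round(auc(y, p), 4), round(gini(y, p), 4), round(ks_statistic(y, p), 4))
```

The pairwise AUC is quadratic in the sample size, which is acceptable for a small illustration; on a full loan portfolio a rank-based computation or a library routine would be the practical choice.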

Keywords

credit scoring, ensemble methods, logistic regression, neural nets, peer-to-peer lending

Hrčak ID: 259598

URI: https://hrcak.srce.hr/259598

