An implementation of ensemble methods, logistic regression, and neural network for default prediction in Peer-to-Peer lending
; University of Warsaw – Faculty of Economic Sciences, Warsaw, Poland.
Mateusz Heba orcid.org/0000-0001-6929-907X ; University of Warsaw – Faculty of Economic Sciences, Warsaw, Poland.
Credit scoring has become an important issue because competition among financial institutions is intense and even a small improvement in predictive accuracy can result in significant savings. Financial institutions therefore seek optimal strategies based on credit scoring models, and credit scoring tools are studied extensively. As a result, various parametric statistical methods, non-parametric statistical tools, and soft computing approaches have been developed to improve the accuracy of credit scoring models. In this paper, different approaches are used to classify customers into those who repay a loan and those who default. The purpose of this study is to investigate the performance of two credit scoring techniques: a logistic regression model estimated on categorized variables transformed with Weight of Evidence (WOE), and neural networks. We also combine multiple classifiers and test whether ensemble learning performs better. To evaluate the feasibility and effectiveness of these methods, the analysis is performed on Lending Club data. In addition, we investigate peer-to-peer lending, also called social lending. The results indicate that the logistic regression model can outperform neural networks. The proposed ensemble model (a combination of logistic regression and a neural network, obtained by averaging the probabilities from both models) has a higher AUC, Gini coefficient, and Kolmogorov-Smirnov statistic than the other models. We therefore conclude that the ensemble model can reduce the potential risk of losses arising from misclassification costs.
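The averaging ensemble and the three evaluation measures named in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the labels, and the two probability vectors are hypothetical, and the toy numbers are invented for demonstration and say nothing about the results on Lending Club data. The Gini coefficient is computed with the standard relation Gini = 2·AUC − 1, and the Kolmogorov-Smirnov statistic as the largest gap between the empirical score distributions of defaulters and non-defaulters.

```python
def split_scores(y_true, scores):
    """Separate scores of defaulters (label 1) from non-defaulters (label 0)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    return pos, neg


def auc(y_true, scores):
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly.

    Ties count as one half. Quadratic in sample size, fine for a small demo.
    """
    pos, neg = split_scores(y_true, scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini coefficient derived from AUC: Gini = 2 * AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


def ks_statistic(y_true, scores):
    """Largest absolute gap between the two empirical score CDFs."""
    pos, neg = split_scores(y_true, scores)
    gaps = []
    for t in sorted(set(scores)):
        cdf_pos = sum(p <= t for p in pos) / len(pos)
        cdf_neg = sum(n <= t for n in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)


def average_ensemble(p_a, p_b):
    """Ensemble by simple averaging of two models' predicted probabilities."""
    return [(a + b) / 2.0 for a, b in zip(p_a, p_b)]


# Hypothetical toy data: 1 = default, 0 = repaid (not from the paper).
y_true = [0, 0, 1, 0, 1, 1]
p_logit = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]  # assumed logistic regression output
p_nn = [0.20, 0.30, 0.60, 0.50, 0.90, 0.40]     # assumed neural network output
p_ens = average_ensemble(p_logit, p_nn)

for name, p in [("logit", p_logit), ("nn", p_nn), ("ensemble", p_ens)]:
    print(name, auc(y_true, p), gini(y_true, p), ks_statistic(y_true, p))
```

In practice the two probability vectors would come from models fitted on a training sample and scored on a held-out sample; the averaging step itself does not depend on how each model was estimated.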