Tenfold Bootstrap as Resampling Method in Classification Problems
Keywords: the bootstrap, cross validation, repeated train/test splitting
In this research, we propose the bootstrap procedure as a train/test splitting method for machine learning classification algorithms. We show that this resampling method can be a reliable alternative to cross validation and to repeated random train/test splitting. The bootstrap procedure improves the classifier's accuracy and classification scores while significantly reducing computational time. We also show that ten iterations of the bootstrap procedure are sufficient to achieve better performance of the classification algorithm. With these findings, we offer a way to reduce computing time on large datasets while introducing a new practical application of the bootstrap procedure.
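The tenfold bootstrap splitting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common convention that each iteration draws a training set of size n with replacement and uses the out-of-bag observations as the test set, repeated ten times; the function name and seed are illustrative.

```python
import numpy as np

def tenfold_bootstrap_splits(n_samples, n_iterations=10, seed=0):
    """Yield (train_idx, test_idx) pairs for bootstrap train/test splitting.

    Each iteration draws n_samples training indices with replacement;
    the observations never drawn (out-of-bag) form the test set.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_iterations):
        # Sample training indices uniformly with replacement.
        train_idx = rng.integers(0, n_samples, size=n_samples)
        # Out-of-bag indices: those not selected for training.
        oob_mask = np.ones(n_samples, dtype=bool)
        oob_mask[train_idx] = False
        test_idx = np.flatnonzero(oob_mask)
        yield train_idx, test_idx
```

A classifier's score would then be averaged over the ten out-of-bag test sets, analogously to averaging over the folds of tenfold cross validation.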
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.