The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems

Vrigazova, Borislava

doi:10.2478/bsrj-2021-0015

Business Systems Research : International journal of the Society for Advancing Innovation and Research in Economy, Vol. 12 No. 1, 2021.

Original scientific paper

https://doi.org/10.2478/bsrj-2021-0015

Borislava Vrigazova orcid.org/0000-0001-9335-6927 ; Sofia University, Faculty of Economics and Business Administration

Pages 228-242

Abstract

Background: The bootstrap can be an alternative to cross-validation as a training/test set splitting method, since it reduces computing time in classification problems compared with tenfold cross-validation. Objectives: This research investigates what proportion should be used to split the dataset into training and test sets so that the bootstrap can be competitive with other resampling methods in terms of accuracy. Methods/Approach: Different train/test split proportions are used with the following resampling methods: the bootstrap, leave-one-out cross-validation, tenfold cross-validation, and random repeated train/test splitting. Their performance is tested on several classification methods: logistic regression, the decision tree, and k-nearest neighbours. Results: The findings suggest that using a different structure of the test set (e.g. 30/70, 20/80) can further optimize the performance of the bootstrap when applied to logistic regression and the decision tree. For k-nearest neighbours, tenfold cross-validation with a 70/30 train/test splitting ratio is recommended. Conclusions: Depending on the characteristics and the preliminary transformations of the variables, the bootstrap can improve classification accuracy.
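To make the idea of a bootstrap train/test split with a chosen proportion concrete, the following is a minimal Python sketch, not the paper's procedure. It assumes one plausible reading: the training set is drawn with replacement at a chosen fraction of the dataset size, and the rows never drawn (the out-of-bag rows) form the test set. The function names, the toy data, the small k-nearest-neighbour classifier, and the fractions tried are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def bootstrap_split(n_samples, train_fraction, rng):
    """Return (train_idx, test_idx) for one bootstrap resample.

    Assumption: the training set is drawn WITH replacement and has
    round(train_fraction * n_samples) rows; the test set is every row
    that was never drawn (the out-of-bag rows).
    """
    n_train = max(1, round(train_fraction * n_samples))
    train_idx = [rng.randrange(n_samples) for _ in range(n_train)]
    drawn = set(train_idx)
    test_idx = [i for i in range(n_samples) if i not in drawn]
    return train_idx, test_idx


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def bootstrap_accuracy(X, y, train_fraction, n_rounds=50, seed=0):
    """Mean out-of-bag accuracy over n_rounds bootstrap resamples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        train_idx, test_idx = bootstrap_split(len(X), train_fraction, rng)
        if not test_idx:  # skip a round that left no out-of-bag rows
            continue
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    if not scores:
        raise ValueError("no round produced a non-empty test set")
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=30, seed=1):
    """Two well-separated Gaussian clusters (synthetic, illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for fraction in (0.8, 0.7, 0.3, 0.2):
        acc = bootstrap_accuracy(X, y, fraction)
        print(f"train fraction {fraction:.1f}: mean out-of-bag accuracy {acc:.3f}")
```

The same loop could be repeated with tenfold cross-validation, leave-one-out cross-validation, or repeated random splitting in place of `bootstrap_split` to compare resampling methods at each proportion. The accuracies printed for the synthetic data say nothing about the paper's results.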

Keywords

the bootstrap; classification; cross-validation; repeated train/test splitting

Hrčak ID:

258032

URI

https://hrcak.srce.hr/258032

Publication date:

28.5.2021.
