Novel Approach to Choosing Principal Components Number in Logistic Regression

Authors

  • Borislava Vrigazova, Sofia University

DOI:

https://doi.org/10.54820/PUCR5250

Keywords:

ANOVA, PCA, Bootstrap, logistic regression

Abstract

The established approach to choosing the number of principal components for prediction models is to examine the contribution of each principal component to the total variance of the target variable. A combination of potentially important principal components can then be chosen to explain a large share of the variance in the target. Sometimes several such combinations must be explored to achieve the highest classification accuracy. This research proposes a novel automatic way of deciding how many principal components to retain in order to improve classification accuracy. We do so by combining principal components with ANOVA selection. To further improve the accuracy of this automatic approach, we use the bootstrap procedure for model selection. We call this procedure Bootstrapped-ANOVA PCA selection. Our results suggest that it can automate the selection of principal components and improve the accuracy of classification models, in our example, logistic regression.
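The abstract describes the procedure only at a high level, so the code below is a minimal sketch of one plausible reading of it, not the author's implementation. It assumes scikit-learn: the features are standardized, all principal components are computed, the components are ranked by their one-way ANOVA F-statistic against the class label (`f_classif`), and the bootstrap is used to decide how many of the top-ranked components to keep by averaging logistic-regression accuracy on the out-of-bag rows of each resample. The out-of-bag criterion, the hypothetical helper names `anova_ranked_components` and `bootstrap_choose_k`, and the breast-cancer demo data are assumptions made for illustration.

```python
# Sketch of one plausible reading of "Bootstrapped-ANOVA PCA selection".
# ASSUMPTIONS (not from the paper): scikit-learn tools, mean out-of-bag accuracy
# as the bootstrap criterion, and the hypothetical helper names below.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def anova_ranked_components(X_train, y_train, X_test):
    """Project both sets onto PCs fitted on X_train, ordered by descending ANOVA F."""
    scaler = StandardScaler().fit(X_train)
    pca = PCA().fit(scaler.transform(X_train))       # keep every component for now
    Z_train = pca.transform(scaler.transform(X_train))
    Z_test = pca.transform(scaler.transform(X_test))
    F, _ = f_classif(Z_train, y_train)                # one-way ANOVA: each PC vs. class label
    order = np.argsort(np.nan_to_num(F))[::-1]        # most class-separating PC first
    return Z_train[:, order], Z_test[:, order]


def bootstrap_choose_k(X, y, k_values, n_boot=20, seed=0):
    """Pick the number k of ANOVA-ranked PCs with the best mean out-of-bag accuracy."""
    rng = np.random.default_rng(seed)
    n = len(y)
    acc = {k: [] for k in k_values}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)              # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)         # rows never drawn serve as a test set
        if oob.size == 0 or np.unique(y[idx]).size < 2:
            continue                                  # skip degenerate resamples
        Z_tr, Z_oob = anova_ranked_components(X[idx], y[idx], X[oob])
        for k in k_values:
            clf = LogisticRegression(max_iter=1000).fit(Z_tr[:, :k], y[idx])
            acc[k].append(clf.score(Z_oob[:, :k], y[oob]))
    mean_acc = {k: float(np.mean(a)) for k, a in acc.items()}
    best_k = max(mean_acc, key=mean_acc.get)          # k with the highest mean accuracy
    return best_k, mean_acc


X, y = load_breast_cancer(return_X_y=True)
best_k, mean_acc = bootstrap_choose_k(X, y, k_values=list(range(1, 11)))
print("chosen number of components:", best_k)
for k, a in mean_acc.items():
    print(f"k={k:2d}  mean out-of-bag accuracy={a:.3f}")
```

Refitting the scaler, the PCA and the ANOVA ranking inside every resample keeps the out-of-bag rows unseen by each step of the selection, so the averaged accuracies are not inflated by information leakage. The paper's own bootstrap scheme and selection rule may differ from this sketch.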

References

Breiman, L. (1995), “Better Subset Regression Using the Nonnegative Garrote”, Technometrics, Vol. 37 No. 4, pp. 373-384.

Efron, B. (1979), “Bootstrap methods: another look at the jackknife”, The Annals of Statistics, Vol. 7 No. 1, pp. 1-26.

Gajjar, S., Kulahci, M., Palazoglu, A. (2017), “Selection of non-zero loadings in sparse principal component analysis”, Chemometrics and Intelligent Laboratory Systems, Vol. 162, pp. 160-171.

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013), An Introduction to Statistical Learning, Springer, New York.

Kim, S., Rattakorn, P. (2011), “Unsupervised feature selection using weighted principal components”, Expert Systems with Applications, Vol. 38 No. 5, pp. 5704-5710.

Pacheco, J., Casado, S., Porras, S. (2013), “Exact methods for variable selection in principal component analysis: Guide functions and pre-selection”, Computational Statistics & Data Analysis, Vol. 57 No. 1, pp. 95-111.

Prieto-Moreno, A., Llanes-Santiago, O., García-Moreno, E. (2015), “Principal components selection for dimensionality reduction using discriminant information applied to fault diagnosis”, Journal of Process Control, Vol. 33, pp. 14-24.

Rahoma, A., Imtiaz, S., Ahmed, S. (2021), “Sparse principal component analysis using bootstrap method”, Chemical Engineering Science, Vol. 246, paper no. 116890.

Salata, S., Grillenzoni, C. (2021), “A spatial evaluation of multifunctional Ecosystem Service networks using Principal Component Analysis: A case of study in Turin, Italy”, Ecological Indicators, Vol. 127, pp. 1-13.

Sharifzadeh, S., Ghodsi, A., Clemmensen, L., Ersboll, B. (2017), “Sparse supervised principal component analysis (SSPCA) for dimension reduction and variable selection”, Engineering Applications of Artificial Intelligence, Vol. 65, pp. 168-177.

Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso”, Journal of the Royal Statistical Society, Series B (Methodological), Vol. 58 No. 1, pp. 267-288.

Vrigazova, B. (2020), “Tenfold Bootstrap as Resampling Method in Classification Problems”, in Proceedings of the ENTRENOVA-ENTerprise REsearch InNOVAtion Conference, virtual conference, pp. 74-83.

Zou, H. (2006), “The adaptive lasso and its oracle properties”, Journal of the American Statistical Association, Vol. 101 No. 476, pp. 1418-1429.

Published

2022-03-29

How to Cite

Vrigazova, B. (2022). Novel Approach to Choosing Principal Components Number in Logistic Regression. ENTRENOVA - ENTerprise REsearch InNOVAtion, 7(1), 1–12. https://doi.org/10.54820/PUCR5250

Section

Mathematical and Quantitative Methods