Estimation of Random Accuracy and its Use in Validation of Predictive Quality of Classification Models within Predictive Challenges

Lučić, Bono; Batista, Jadranko; Bojović, Viktor; Lovrić, Mario; Sović Kržić, Ana; Bešlo, Drago; Nadramija, Damir; Vikić-Topić, Dražen

doi:10.5562/cca3551

Croatica Chemica Acta, Vol. 92 No. 3, 2019.

Original scientific paper

Bono Lučić orcid.org/0000-0001-7232-2007 ; NMR Centre, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia
Jadranko Batista ; Faculty of Science and Education, University of Mostar, Matice hrvatske b.b., BA-88000 Mostar, Bosnia and Herzegovina
Viktor Bojović ; NMR Centre, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia
Mario Lovrić ; Srebrnjak Children’s Hospital, Srebrnjak 100, HR-10000 Zagreb, Croatia
Ana Sović Kržić ; Department of Electronic Systems and Informational Processing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb, Croatia
Drago Bešlo ; Faculty of Agrobiotechnical Sciences Osijek, Josip Juraj Strossmayer University of Osijek, Vladimira Preloga 1, HR-31000 Osijek, Croatia
Damir Nadramija ; PharmaS, Radnička cesta 47, Zagreb, Croatia
Dražen Vikić-Topić ; NMR Centre, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia

Full text: English pdf 2.684 Kb

pp. 379-391

Abstract

Shortcomings of the Pearson correlation coefficient as a measure of the accuracy of properties predicted by a model are analysed. We discuss two cases that often occur when a model is applied to predict properties of a new external set of compounds. The first problem is that the correlation coefficient is insensitive to the systematic error that must be expected when predicting properties of a novel external set of compounds, which is not a random sample selected from the training set. The second problem is that an external set can be arbitrarily large or small and can have an arbitrary and uneven distribution of the measured values of the target variable, which are not known in advance. Under these conditions, the correlation coefficient can be an overoptimistic measure of agreement between predicted values and the corresponding experimental values, and can lead to an overly optimistic conclusion about the predictive ability of the model. Because of these shortcomings, the standard error of prediction (root-mean-square error) is suggested as a better measure of the predictive capability of a model. For classification models, the difference between the real accuracy and the most probable random accuracy of the model performs very well in ranking different models according to predictive quality and, at the same time, has an obvious interpretation.
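The two points made in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the helper names (`pearson_r`, `rmse`, `accuracy_above_random`), the toy data, and in particular the definition of random accuracy, which is taken to be the chance agreement computed from the marginal totals of the two-class contingency table (the same quantity used in Cohen's kappa). The paper's own definition of the most probable random accuracy may differ.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(observed, predicted):
    """Root-mean-square error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def accuracy_above_random(tp, fn, fp, tn):
    """Return (accuracy, random accuracy, difference) for a 2x2 contingency table.

    ASSUMPTION: random accuracy is the chance agreement expected from the
    marginal totals; this is a stand-in, not necessarily the paper's formula.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    random_accuracy = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return accuracy, random_accuracy, accuracy - random_accuracy


# Regression case: every prediction carries the same systematic error (+2.0).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [o + 2.0 for o in observed]
r_shifted = pearson_r(observed, shifted)   # correlation ignores the constant shift
rmse_shifted = rmse(observed, shifted)     # RMSE reports the shift in full

# Classification case: 10 positives, 90 negatives, model always predicts "negative".
acc, rand_acc, delta = accuracy_above_random(tp=0, fn=10, fp=0, tn=90)

print(f"r = {r_shifted:.3f}, RMSE = {rmse_shifted:.3f}")
print(f"accuracy = {acc:.2f}, random = {rand_acc:.2f}, difference = {delta:.2f}")
```

In this toy setting the correlation coefficient stays at its maximum even though every prediction is off by two units, while the RMSE exposes the error. Likewise, the trivial classifier reaches 90 % accuracy on the imbalanced set, but its accuracy does not exceed the assumed random accuracy, so the difference is zero.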

This work is licensed under a Creative Commons Attribution 4.0 International License.

Keywords

model validation; QSPR; QSAR; two-class variable; classification model; contingency table; estimation; prediction; test set; correlation coefficient; predictive error; classification accuracy; model ranking; random accuracy

Hrčak ID:

238284

URI

https://hrcak.srce.hr/238284

Publication date:

29 July 2019
