The Difference Between the Accuracy of Real and the Corresponding Random Model is a Useful Parameter for Validation of Two-State Classification Model Quality

Batista, Jadranko; Vikić-Topić, Dražen; Lučić, Bono

doi:10.5562/cca3117

Croatica Chemica Acta, Vol. 89 No. 4, 2016.

Original scientific paper

Jadranko Batista ; University of Mostar, Faculty of Science and Education, Mostar, Bosnia and Herzegovina
Dražen Vikić-Topić ; NMR Center, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia
Bono Lučić orcid.org/0000-0001-7232-2007 ; NMR Center, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia

Full text: English pdf 642 Kb

pp. 527-534

Abstract

The simplest and most commonly used measure for assessing the quality of a classification model is the parameter Q2 = 100 (p + n) / N (%), called the classification accuracy, where p, n and N are the total number of correctly predicted compounds in the first class, the total number of correctly predicted compounds in the second class, and the total number of elements (compounds) in the data set, respectively. Moreover, the most probable accuracy that can be obtained by a random model is calculated for a two-state model by the formula Q2,rnd = 100 [(p + u)(p + o) + (n + u)(n + o)] / N^2 (%), where u and o are the total numbers of under-predictions (class 1 predicted by the model as class 2) and over-predictions (class 2 predicted by the model as class 1) in the data set, respectively. Finally, the difference between these two parameters, ΔQ2 = Q2 - Q2,rnd, is introduced, and it is suggested that ΔQ2 be computed and reported for each two-state classification model to assess its contribution over the accuracy of the corresponding random model. When the data set is ideally balanced, with the same number of elements in both classes, the two-state classification problem is the most difficult, with maximal Q2 = 100 % and Q2,rnd = 50 %, giving the maximal ΔQ2 = 50 %. The usefulness of the ΔQ2 parameter is illustrated in a comparative analysis of two-class classification models from the literature for the prediction of the secondary structure of membrane proteins and of several quantitative structure-property models. The real contributions of these models over the random level of accuracy are calculated, and their ΔQ2 values are compared with each other and with the value of ΔQ2 (= 50 %) for the most difficult two-state classification model.
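
As a minimal sketch of the three quantities defined in the abstract, the Python function below computes Q2, Q2,rnd and ΔQ2 from the four counts p, n, u and o. The function name `delta_q2`, the `Q2Result` container and the counts used in the usage lines are illustrative assumptions, not taken from the paper; N is assumed to equal p + n + u + o.

```python
from typing import NamedTuple


class Q2Result(NamedTuple):
    """Hypothetical container for the three accuracy values, all in percent."""
    q2: float        # accuracy of the real model
    q2_rnd: float    # most probable accuracy of the corresponding random model
    delta_q2: float  # difference q2 - q2_rnd


def delta_q2(p: int, n: int, u: int, o: int) -> Q2Result:
    """Compute Q2, Q2,rnd and their difference for a two-state model.

    p: class-1 elements predicted correctly
    n: class-2 elements predicted correctly
    u: under-predictions (class 1 predicted as class 2)
    o: over-predictions (class 2 predicted as class 1)
    """
    total = p + n + u + o  # N, assumed to be the sum of the four counts
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    q2 = 100.0 * (p + n) / total
    # (p + u) is the observed size of class 1 and (p + o) its predicted size;
    # (n + o) and (n + u) play the same roles for class 2.
    q2_rnd = 100.0 * ((p + u) * (p + o) + (n + u) * (n + o)) / total**2
    return Q2Result(q2, q2_rnd, q2 - q2_rnd)


# Illustrative counts only (not data from the paper).
balanced_perfect = delta_q2(p=50, n=50, u=0, o=0)
always_class_1 = delta_q2(p=90, n=0, u=0, o=10)
print(balanced_perfect)
print(always_class_1)
```

With these made-up counts, a perfect model on a balanced set reaches the maximal ΔQ2 of 50 %, whereas a model that assigns every element of a 90:10 set to class 1 has Q2 = 90 % but ΔQ2 = 0 %, which is the kind of case the parameter is meant to expose.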

Keywords

classification model; Q2 accuracy; overall classification accuracy; random classification accuracy; classification accuracy difference; correct class estimation; under-prediction; over-prediction; class imbalance; membrane structure modeling; QSAR classification modeling

Hrčak ID:

181475

URI

https://hrcak.srce.hr/181475

Publication date:

19 December 2016
