Original scientific paper
https://doi.org/10.17559/TV-20190109015453

CUBOS: An Internal Cluster Validity Index for Categorical Data

Xiaonan Gao (ORCID: orcid.org/0000-0002-0154-4742) ; Donlinks School of Economics and Management, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China
Sen Wu ; Donlinks School of Economics and Management, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China

Full text: English, PDF (584 KB), pp. 486-494
APA 6th Edition
Gao, X. & Wu, S. (2019). CUBOS: An Internal Cluster Validity Index for Categorical Data. Tehnički vjesnik, 26(2), 486-494. https://doi.org/10.17559/TV-20190109015453
MLA 8th Edition
Gao, Xiaonan, and Sen Wu. "CUBOS: An Internal Cluster Validity Index for Categorical Data." Tehnički vjesnik, vol. 26, no. 2, 2019, pp. 486-494. https://doi.org/10.17559/TV-20190109015453. Accessed 13 June 2021.
Chicago 17th Edition
Gao, Xiaonan, and Sen Wu. "CUBOS: An Internal Cluster Validity Index for Categorical Data." Tehnički vjesnik 26, no. 2 (2019): 486-494. https://doi.org/10.17559/TV-20190109015453
Harvard
Gao, X. and Wu, S. (2019). 'CUBOS: An Internal Cluster Validity Index for Categorical Data', Tehnički vjesnik, 26(2), pp. 486-494. https://doi.org/10.17559/TV-20190109015453
Vancouver
Gao X, Wu S. CUBOS: An Internal Cluster Validity Index for Categorical Data. Tehnički vjesnik [Internet]. 2019 [cited 2021 Jun 13];26(2):486-494. https://doi.org/10.17559/TV-20190109015453
IEEE
X. Gao and S. Wu, "CUBOS: An Internal Cluster Validity Index for Categorical Data", Tehnički vjesnik, vol. 26, no. 2, pp. 486-494, 2019. [Online]. https://doi.org/10.17559/TV-20190109015453

Abstract
An internal cluster validity index is a powerful tool for evaluating clustering performance. Designing such indices for categorical data is challenging because distances between categorical attribute values are difficult to measure. Existing indices ignore both the relationships between different categorical attribute values and the detailed distribution information between data objects. To address these problems, we propose a novel index called Categorical data cluster Utility Based On Silhouette (CUBOS). Specifically, we first clarify why the paradigm of the Silhouette index is superior for exploring the details of clustering results. We then propose the Improved Distance metric for Categorical data (IDC), inspired by Category Distance, to measure the distance between categorical data accurately. Finally, we combine the Silhouette paradigm with IDC to construct CUBOS, which overcomes the aforementioned shortcomings and, in experiments on several UCI datasets, produces more accurate evaluation results than the baseline indices.
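
To make the Silhouette paradigm concrete, the following is a minimal sketch of a Silhouette-style index over categorical objects. The abstract does not define IDC or the exact form of CUBOS, so this sketch substitutes the simple matching distance (the fraction of attributes on which two objects differ) as a stand-in; it is not the authors' metric. The names `matching_distance` and `silhouette_categorical` are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Row = Sequence[Hashable]


def matching_distance(a: Row, b: Row) -> float:
    """Fraction of attributes on which two objects differ.

    Assumption: a plain stand-in distance, NOT the IDC metric from the paper.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def silhouette_categorical(
    data: Sequence[Row],
    labels: Sequence[Hashable],
    dist: Callable[[Row, Row], float] = matching_distance,
) -> float:
    """Mean Silhouette value of a clustering under a pluggable distance."""
    # Group object indices by cluster label.
    clusters: Dict[Hashable, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the Silhouette index needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the object's own cluster.
        a = sum(dist(data[i], data[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


data = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large")]
good = silhouette_categorical(data, [0, 0, 1, 1])  # identical objects grouped together
bad = silhouette_categorical(data, [0, 1, 0, 1])   # identical objects split apart
```

Because the distance is passed in as the `dist` argument, a more refined categorical metric could replace the stand-in without changing the Silhouette computation itself, which mirrors the combination the abstract describes.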

Keywords
categorical data; clustering; distance metric; evaluation; internal cluster validity index

Hrčak ID: 219541

URI
https://hrcak.srce.hr/219541
