Investigation of the optimal number of clusters by the adaptive EM algorithm
Abstract
This paper investigates the optimal number of clusters for datasets modeled as a Gaussian mixture. For that purpose, an adaptive method based on a modified Expectation Maximization (EM) algorithm is developed. The modification is made within the hidden variable of the standard EM algorithm. Assuming that the data follow a multivariate normal distribution, with each component of the Gaussian mixture corresponding to one cluster, the modification exploits the fact that the squared Mahalanobis distance of the samples follows a chi-square distribution. In addition, a quantitative measure is constructed to determine the number of clusters. The proposed method is demonstrated on several numerical examples.
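The distributional fact the abstract relies on can be checked numerically. The following is a minimal sketch, not the authors' algorithm: it draws samples from a single two-dimensional Gaussian (the mean, covariance, sample size, and the function name `squared_mahalanobis` are all illustrative assumptions) and compares the squared Mahalanobis distances against a chi-square distribution with 2 degrees of freedom.

```python
import numpy as np


def squared_mahalanobis(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from a Gaussian (mean, cov)."""
    diff = X - mean
    # Solve cov @ z = diff.T rather than forming the explicit inverse of cov.
    solved = np.linalg.solve(cov, diff.T).T
    # Row-wise dot product diff_i . solved_i.
    return np.einsum("ij,ij->i", diff, solved)


# Illustrative parameters (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(mean, cov, size=20000)

d2 = squared_mahalanobis(X, mean, cov)

# For 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2),
# so its 95% quantile is -2 * ln(0.05).
threshold = -2.0 * np.log(0.05)

# Fraction of samples inside the 95% ellipse, and the sample mean of d2
# (a chi-square variable with 2 degrees of freedom has mean 2).
inside_fraction = np.mean(d2 <= threshold)
mean_d2 = d2.mean()
```

If the samples really come from the assumed Gaussian, `inside_fraction` should be close to 0.95 and `mean_d2` close to 2. A clear departure from those values is the kind of signal that a chi-square-based test on a mixture component could use, although how the paper incorporates this into the hidden variable of EM is not specified in the abstract.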
License
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).