Original scientific paper

https://doi.org/10.17559/TV-20200918143701

Clustering Algorithm Based on Sparse Feature Vector without Specifying Parameter

Huixia He ; School of Economics and Management, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, China
Guiying Wei ; School of Economics and Management, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, China
Sen Wu* ; School of Economics and Management, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, China
Xiaonan Gao ; School of Economics and Management, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, China


Full text: English, PDF (585 KB)

pages 1974-1981

Abstract

Parameter setting is an essential factor affecting algorithm performance in data mining. CABOSFV is an efficient clustering algorithm for binary data with sparse features, but its threshold parameter is difficult to specify. To remove this difficulty, this paper proposes a clustering algorithm based on sparse feature vector without specifying parameter (CASP). First, a method for calculating an upper limit of the threshold is defined, which determines the threshold range. Next, the data are sorted by a sparseness index, and clustering is carried out on the sorted data using an adjusted sparse feature vector. An interval search strategy then finds a suitable threshold within the defined range, and the clustering result obtained with that threshold is returned as the output. Experiments on 7 UCI datasets show that CASP outperforms the baseline algorithms in both effectiveness and efficiency. CASP not only simplifies the parameter decision process but also obtains desirable clustering results quickly and stably, which demonstrates its practicability.
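
The pipeline summarized in the abstract (sort the data, cluster in one pass under a threshold, search an interval of thresholds) can be illustrated with a minimal sketch. The abstract does not define the dissimilarity measure, the sparseness index, the upper limit of the threshold, or the rule for picking the suitable threshold, so everything below is an assumption made for illustration: each object is a set of feature indices equal to 1, the dissimilarity of a group is taken as (features present in some but not all objects) / (group size × features present in all objects), the number of ones per object stands in for the sparseness index, and the upper limit is simply passed in by the caller. The function names `one_pass_cluster` and `sweep_thresholds` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Set, Tuple


def one_pass_cluster(rows: Sequence[Set[int]], threshold: float) -> List[List[int]]:
    """Single-pass, threshold-based clustering of sparse binary rows.

    Each row is the set of feature indices whose value is 1.
    Returns clusters as lists of original row indices.
    """
    # ASSUMPTION: rows with more ones are processed first; this stands in
    # for the sparseness-index ordering, which the abstract does not define.
    order = sorted(range(len(rows)), key=lambda i: -len(rows[i]))

    clusters = []  # per-cluster summary: size, common features, union of features
    for i in order:
        row = rows[i]
        best, best_d = None, None
        for c in clusters:
            common = c["common"] & row
            if not common:
                continue  # no shared feature: treat as infinitely dissimilar
            union = c["union"] | row
            # ASSUMED dissimilarity of the cluster after adding this row.
            d = (len(union) - len(common)) / ((c["n"] + 1) * len(common))
            if d <= threshold and (best is None or d < best_d):
                best, best_d = c, d
        if best is None:
            # No existing cluster accepts the row: open a new one.
            clusters.append(
                {"n": 1, "common": set(row), "union": set(row), "members": [i]}
            )
        else:
            # Merge the row into the least dissimilar admissible cluster.
            best["n"] += 1
            best["common"] &= row
            best["union"] |= row
            best["members"].append(i)
    return [c["members"] for c in clusters]


def sweep_thresholds(
    rows: Sequence[Set[int]], upper: float, steps: int = 10
) -> List[Tuple[float, List[List[int]]]]:
    """Cluster once per threshold on an evenly spaced grid in (0, upper].

    ASSUMPTION: `upper` is supplied by the caller; the paper's formula for
    the upper limit is not given in the abstract. No threshold is selected
    here, because the selection criterion is not given either.
    """
    results = []
    for k in range(1, steps + 1):
        t = upper * k / steps
        results.append((t, one_pass_cluster(rows, t)))
    return results
```

Calling `sweep_thresholds(rows, upper, steps)` on a small list of index sets returns one clustering per candidate threshold, which makes it easy to see how the number of clusters shrinks as the threshold grows. Choosing among those clusterings is left to the caller, since the criterion the authors use is not stated here.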

Keywords

CABOSFV; clustering; sparse feature; threshold parameter

Hrčak ID:

248249

URI

https://hrcak.srce.hr/248249

Publication date:

19.12.2020.