Technical gazette, Vol. 27 No. 6, 2020.
Original scientific paper
https://doi.org/10.17559/TV-20200918143701
Clustering Algorithm Based on Sparse Feature Vector without Specifying Parameter
Huixia He; School of Economics and Management, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, China
Guiying Wei; School of Economics and Management, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, China
Sen Wu*; School of Economics and Management, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, China
Xiaonan Gao; School of Economics and Management, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, China
Abstract
Parameter setting strongly affects algorithm performance in data mining. CABOSFV is an efficient clustering algorithm for binary data with sparse features, but its threshold parameter is difficult to specify. To remove this difficulty, this paper proposes CASP, a clustering algorithm based on sparse feature vector without specifying parameter. First, a method for calculating an upper limit of the threshold is defined, which determines the threshold range. Next, the data are sorted by a sparseness index, and clustering is performed on the sorted data using an adjusted sparse feature vector. An interval search strategy then finds a suitable threshold within the defined range, and the clustering obtained with that threshold is returned as the result. Experiments on 7 UCI datasets show that CASP outperforms the baseline algorithms in both effectiveness and efficiency. CASP simplifies the parameter decision process and obtains desirable clustering results quickly and stably, which indicates the practicality of the algorithm.
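The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions because the abstract does not define them: the dissimilarity is taken to be the commonly cited CABOSFV form e / (|X| · a), where a counts attributes equal to 1 for every object in a set and e counts attributes that differ within the set; the sparseness index is assumed to be the number of 1s per object; and the upper limit of the threshold is passed in by the caller, since its formula is not given here. The function names `sfd`, `cluster_once`, and `scan_thresholds` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def sfd(n: int, common: Set[int], union: Set[int]) -> float:
    """Assumed sparse feature dissimilarity: e / (n * a).

    a = attributes that are 1 for all n objects (common),
    e = attributes that are 1 for some but not all objects (union - common).
    """
    if not common:
        return float("inf")  # no shared attribute: treat as maximally dissimilar
    return len(union - common) / (n * len(common))


def cluster_once(rows: List[Set[int]], threshold: float) -> List[List[int]]:
    """One-pass clustering; rows[i] holds the attribute indices equal to 1."""
    # Assumption: sort by number of 1s (ascending), ties broken by index.
    order = sorted(range(len(rows)), key=lambda i: (len(rows[i]), i))
    clusters: List[Dict] = []
    for i in order:
        best_d, best_c = float("inf"), None
        for c in clusters:
            # Dissimilarity of the cluster if object i were added to it.
            d = sfd(len(c["members"]) + 1, c["common"] & rows[i], c["union"] | rows[i])
            if d < best_d:
                best_d, best_c = d, c
        if best_c is not None and best_d <= threshold:
            best_c["members"].append(i)
            best_c["common"] = best_c["common"] & rows[i]
            best_c["union"] = best_c["union"] | rows[i]
        else:
            clusters.append({"members": [i], "common": set(rows[i]), "union": set(rows[i])})
    return [c["members"] for c in clusters]


def scan_thresholds(
    rows: List[Set[int]], upper: float, steps: int
) -> List[Tuple[float, List[List[int]]]]:
    """Evaluate evenly spaced thresholds in (0, upper]; upper is caller-supplied."""
    results = []
    for k in range(1, steps + 1):
        t = upper * k / steps
        results.append((t, cluster_once(rows, t)))
    return results


rows = [{0, 1, 2}, {0, 1, 2}, {5, 6}, {5, 6, 7}]
loose = cluster_once(rows, 0.5)
strict = cluster_once(rows, 0.1)
```

The sketch returns one clustering per candidate threshold and leaves the choice of a "suitable" threshold to the caller, because the abstract does not state the selection criterion used by CASP.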
Keywords
CABOSFV; clustering; sparse feature; threshold parameter
Hrčak ID: 248249
Publication date: 19.12.2020.