A common framework of partition-based clustering for large scale dataset using sampling and its MapReduce implementation

Jin, Ran; Kou, Chunhai; Liu, Ruijuan; Guo, Tao

doi:10.17559/TV-20150126121041

Technical gazette, Vol. 23 No. 1, 2016.

Original scientific paper

Ran Jin ; (1) School of Computer Science and Information Technology, Zhejiang Wanli University, No. 8 South QianHu Road, Ningbo, Zhejiang, 315100, China / (2) College of Computer Science and Technology, Zhejiang University, No.38 Zheda Road, Hangzhou, Zhejiang, 310
Chunhai Kou ; School of Science, Donghua University, No. 2999 North Renmin Road, Songjiang district, Shanghai, 201620, China
Ruijuan Liu ; School of Information Science and Technology, Donghua University, No. 2999 North Renmin Road, Songjiang district, Shanghai, 201620, China
Tao Guo ; School of Information Science and Technology, Donghua University, No. 2999 North Renmin Road, Songjiang district, Shanghai, 201620, China

page 25-33

Abstract

Clustering is one of the central tasks in data mining, and partition-based clustering algorithms such as k-means are among the most popular solutions. However, with the growth of cloud computing and big data, large-scale datasets have become a major challenge for clustering: executing the clustering algorithm is too time-consuming, parameters are difficult to optimize, and cluster quality is poor. To address this, we propose a common framework for partition-based clustering algorithms such as k-means and design its MapReduce implementation. Specifically, to handle the representation of large-scale datasets, we employ a sampling technique. Then, inspired by the k-means algorithm, we propose a common clustering procedure and provide a k-means-based implementation. Furthermore, we implement the proposed framework using the MapReduce programming model. Experiments show that our method is efficient for large-scale datasets.
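The abstract does not give the sampling scheme or the layout of the MapReduce jobs, so the following is only a minimal single-process sketch of the general idea, not the authors' implementation. It assumes uniform random sampling, squared Euclidean distance, and a k-means iteration split into a map step (assign each sampled point to its nearest center) and a reduce step (average the points per center). All function names and parameters (`sample_points`, `map_phase`, `reduce_phase`, `kmeans_on_sample`, `fraction`) are hypothetical.

```python
import random
from collections import defaultdict


def sample_points(points, size, rng):
    """Uniform random sample without replacement (assumed sampling scheme)."""
    return rng.sample(points, min(size, len(points)))


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step: emit (cluster index, (point, 1)) for every point."""
    for p in points:
        yield nearest(p, centers), (p, 1)


def reduce_phase(pairs, centers):
    """Reduce step: average the points emitted for each cluster index."""
    sums, counts = {}, defaultdict(int)
    for key, (p, n) in pairs:
        acc = sums.setdefault(key, [0.0] * len(p))
        for d, x in enumerate(p):
            acc[d] += x
        counts[key] += n
    # A cluster that received no points keeps its previous center.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else centers[i]
        for i in range(len(centers))
    ]


def kmeans_on_sample(points, k, fraction=0.1, iterations=20, seed=0):
    """Run k-means on a sample, then label the full dataset in one pass."""
    rng = random.Random(seed)
    size = max(k, int(len(points) * fraction))
    sample = sample_points(points, size, rng)
    centers = rng.sample(sample, k)  # initial centers drawn from the sample
    for _ in range(iterations):
        new_centers = reduce_phase(map_phase(sample, centers), centers)
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

Calling `kmeans_on_sample(points, k=2, fraction=0.2)` on a list of coordinate tuples returns the centers learned from the sample and one label per point of the full dataset. In an actual MapReduce deployment the map and reduce functions would run as distributed tasks over partitions of the data; here they are plain Python functions chained in memory.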

Keywords

large scale dataset; MapReduce; partition-based clustering; sampling

Hrčak ID:

153152

URI

https://hrcak.srce.hr/153152

Publication date:

19.2.2016.
