A common framework of partition-based clustering for large scale dataset using sampling and its MapReduce implementation

Jin, Ran; Kou, Chunhai; Liu, Ruijuan; Guo, Tao

doi:10.17559/TV-20150126121041

Tehnički vjesnik, Vol. 23 No. 1, 2016.

Original scientific paper

https://doi.org/10.17559/TV-20150126121041

Ran Jin ; (1) School of Computer Science and Information Technology, Zhejiang Wanli University, No. 8 South QianHu Road, Ningbo, Zhejiang, 315100, China / (2) College of Computer Science and Technology, Zhejiang University, No.38 Zheda Road, Hangzhou, Zhejiang, 310
Chunhai Kou ; School of Science, Donghua University, No. 2999 North Renmin Road, Songjiang district, Shanghai, 201620, China
Ruijuan Liu ; School of Information Science and Technology, Donghua University, No. 2999 North Renmin Road, Songjiang district, Shanghai, 201620, China
Tao Guo ; School of Information Science and Technology, Donghua University, No. 2999 North Renmin Road, Songjiang district, Shanghai, 201620, China

Full text: Croatian pdf 1.713 Kb

pp. 25-33

downloads: 402

cite

APA 6th Edition

Jin, R., Kou, C., Liu, R., & Guo, T. (2016). A common framework of partition-based clustering for large scale dataset using sampling and its MapReduce implementation. Tehnički vjesnik, 23 (1), 25-33. https://doi.org/10.17559/TV-20150126121041

MLA 8th Edition

Jin, Ran, et al. "A common framework of partition-based clustering for large scale dataset using sampling and its MapReduce implementation." Tehnički vjesnik, vol. 23, no. 1, 2016, pp. 25-33. https://doi.org/10.17559/TV-20150126121041. Accessed 19 Apr. 2024.

Chicago 17th Edition

Jin, Ran, Chunhai Kou, Ruijuan Liu, and Tao Guo. "A common framework of partition-based clustering for large scale dataset using sampling and its MapReduce implementation." Tehnički vjesnik 23, no. 1 (2016): 25-33. https://doi.org/10.17559/TV-20150126121041

Harvard

Jin, R., et al. (2016). 'A common framework of partition-based clustering for large scale dataset using sampling and its MapReduce implementation', Tehnički vjesnik, 23(1), pp. 25-33. https://doi.org/10.17559/TV-20150126121041

Vancouver

Jin R, Kou C, Liu R, Guo T. A common framework of partition-based clustering for large scale dataset using sampling and its MapReduce implementation. Tehnički vjesnik [Internet]. 2016 [cited 2024 Apr 19];23(1):25-33. https://doi.org/10.17559/TV-20150126121041

IEEE

R. Jin, C. Kou, R. Liu, and T. Guo, "A common framework of partition-based clustering for large scale dataset using sampling and its MapReduce implementation", Tehnički vjesnik, vol. 23, no. 1, pp. 25-33, 2016. [Online]. https://doi.org/10.17559/TV-20150126121041

Full text: English pdf 1.713 Kb

pp. 25-33

downloads: 659

Abstract

Clustering is one of the central tasks in data mining, and partition-based clustering algorithms such as k-means are among the most popular solutions. However, with the growth of cloud computing and big data, large-scale datasets have become a major challenge for clustering: the algorithms take too long to execute, their parameters are difficult to optimize, and the resulting clusters are of poor quality. To address this, we propose a common framework for partition-based clustering algorithms such as k-means and design its MapReduce implementation. Specifically, to handle the representation of a large-scale dataset, we employ a sampling technique. Then, inspired by the k-means algorithm, we propose a common clustering procedure and provide a k-means-based implementation. Furthermore, we implement the proposed framework using the MapReduce programming model. Experiments show that our method is efficient for large-scale datasets.
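The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea (sample the data, then run k-means iterations phrased as a map step and a reduce step). It is not the authors' framework: the function names `nearest`, `map_phase`, `reduce_phase`, and `sampled_kmeans`, the uniform random sampling, the fixed iteration cap, and the in-memory execution in place of a real MapReduce cluster are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))

def map_phase(points, centers):
    """Map step: emit (cluster index, point) for every point."""
    for point in points:
        yield nearest(point, centers), point

def reduce_phase(pairs, centers):
    """Reduce step: average the points assigned to each cluster."""
    sums = {}
    counts = defaultdict(int)
    for idx, point in pairs:
        if idx in sums:
            sums[idx] = [s + p for s, p in zip(sums[idx], point)]
        else:
            sums[idx] = list(point)
        counts[idx] += 1
    # A cluster that received no points keeps its previous center.
    return [tuple(s / counts[i] for s in sums[i]) if i in sums else centers[i]
            for i in range(len(centers))]

def sampled_kmeans(data, k, sample_size, iterations=20, seed=0):
    """Cluster a uniform random sample of data (assumes sample_size >= k)."""
    rng = random.Random(seed)
    # Hypothetical sampling choice: uniform sampling without replacement.
    sample = rng.sample(data, min(sample_size, len(data)))
    centers = [tuple(p) for p in rng.sample(sample, k)]
    for _ in range(iterations):
        new_centers = reduce_phase(map_phase(sample, centers), centers)
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    return centers
```

In an actual MapReduce deployment the map and reduce steps would run as distributed tasks over data partitions; here they are ordinary Python functions so that the data flow can be followed on a single machine.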

Keywords

large scale dataset; MapReduce; partition-based clustering; sampling

Hrčak ID:

153152

URI

https://hrcak.srce.hr/153152

Publication date:

19 February 2016

Data in other languages: Croatian

Visits: 2,064 *
