Comparative Analysis of Classic Clustering Algorithms and Girvan-Newman Algorithm for Finding Communities in Social Networks
Klíčová slova:
data mining, datasets, clusters, communities, graphs, social networks, ICT, Girvan-Newman algorithm, clustering algorithmsAbstrakt
Nowadays finding patterns in large social network datasets is a growing challenge and an important subject of interest. One of current problems in this field is identifying clusters within social networks with large number of nodes. Social network clusters are not necessarily disjoint sets; rather they may overlap and have common nodes, in which case it is more appropriate to designate them as communities. Although many clustering algorithms handle small datasets well, they are usually extremely inefficient on large datasets. This paper shows comparative analysis of frequently used classic graph clustering algorithms and well-known Girvan-Newman algorithm that is used for identification of communities in graphs, which is especially optimized for large datasets. The goal of the paper is to show which of the algorithms give best performances on given dataset. The paper presents real problem of data clustering, algorithms that can be used for its solution, methodology of analysis, results that were achieved and conclusions that were derived.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Reference
Barabasi, A. L. (2016), Network Science, Cambridge University Press, Cambridge, UK.
Behnke, L. (2015), “Implementation of an agglomerative hierarchical clustering algorithm in Java”, available at: https://github.com/lbehnke/hierarchical-clustering-java(April 15, 2016).
Gephi Consortium (2016), “GEPHI”, available at:https://gephi.org/(March 10, 2016).
Girvan, M., Newman, M.E.J. (2002), “Community structure in social and biological networks”, in Proceedings of the National Academy of Sciences of the United States of America, pp. 7821-7826.
Kuchar, J. (2014), “The Girvan Newman Clustering plugin for Gephi”, available at: https://github.com/jaroslav-kuchar/GirmanNewmanClustering(March 10, 2016).
Leskovec, J., Kleinberg, J., Faloutsos, C. (2007), “Graph evolution: Densication and shrinking diameters”, ACM Transactions on Knowledge Discovery from Data, Vol. 1 No.1.
Leskovec, J., Rajaraman, A., Ullman, J. D. (2014), Mining of Massive Datasets, Palo Alto, CA, USA.
Ljucović, J., Tomović, S. (2016), “Analyzing clusters in the University of Montenegro collaboration network”, in Proceedings of MECO 2016, Budva, Montenegro.
Maimon, O., Rokach, L. (2010), Data Mining and Knowledge Discovery Handbook, Springer, USA.
O’Madadhain, J. (2016), “JUNG - Java Universal Network/Graph Framework”, available at: http://jung.sourceforge.net/ (April 05, 2016).
Zafarani, R., Abbasi, M.A., Liu, H. (2014), Social Media Mining: An Introduction, Cambridge University Press.