Tehnički vjesnik, Vol. 27 No. 1, 2020.
Original scientific paper
https://doi.org/10.17559/TV-20191002034614
Clustering Single-cell RNA-sequencing Data based on Matching Clusters Structures
Yizhang Wang
orcid.org/0000-0002-0687-7802
; College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, 2699 Qianjin Street, Changchun, 130012, China
You Zhou
orcid.org/0000-0003-0013-1281
; College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, 2699 Qianjin Street, Changchun, 130012, China
Wie Pang
orcid.org/0000-0002-1761-6659
; The School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, UK
Yanchun Liang*
; College of Computer Science and Technology, Jilin University, College of Computer Science, Zhuhai College of Jilin University, 2699 Qianjin Street, Changchun, 130012, China
Shu Wang*
; College of Computer Science, Zhuhai College of Jilin University, Zhuhai, 519041, China
Abstract
Single-cell sequencing technology can generate RNA-sequencing data at the single-cell level, and one important task in analysing such data is to identify cell types without supervised information. Clustering is an unsupervised approach that can yield new biological insights, especially when exploring the biological functions of specific cell types. However, it is challenging for traditional clustering methods to obtain high-quality cell type recognition results. In this research, we propose a novel Clustering method based on Matching Clusters Structures (MCSC) for identifying cell types in single-cell RNA-sequencing data. First, MCSC obtains two different groups of clustering results from the same K-means algorithm, which is possible because its initial centroids are randomly selected. Then, for one group, MCSC uses shared nearest neighbour information to calculate a label transition matrix, which gives the label transition probability between any two initial clusters. Each initial cluster may be reassigned if the merging result after label transition satisfies a consensus function that maximizes the degree of structural matching between the two groups of clustering results. In essence, MCSC may be interpreted as a label training process. We evaluate the proposed MCSC on five commonly used datasets and compare it with several classical and state-of-the-art algorithms. The experimental results show that MCSC outperforms the other algorithms.
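The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the label transition matrix or the consensus function, so the sketch assumes a row-normalised count of shared nearest neighbours for the former and the Rand index (pairwise agreement) for the latter. All function names (`kmeans`, `label_transition_matrix`, `pair_agreement`, `refine`), the parameter values, and the synthetic data are hypothetical and serve only to show the flow of two K-means runs, a transition matrix, and consensus-guided merging.

```python
import numpy as np

def kmeans(X, k, seed, n_iter=50):
    """Plain Lloyd's K-means with randomly chosen initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster became empty.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

def label_transition_matrix(X, labels, k, n_neighbors=10):
    """ASSUMED definition: row-normalised shared-nearest-neighbour counts
    accumulated between the clusters of neighbouring points."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    nbrs = [set(row) for row in np.argsort(dists, axis=1)[:, :n_neighbors]]
    T = np.zeros((k, k))
    for i in range(len(X)):
        for j in nbrs[i]:
            T[labels[i], labels[j]] += len(nbrs[i] & nbrs[j])
    row_sums = T.sum(axis=1, keepdims=True)
    return np.divide(T, row_sums, out=np.zeros_like(T), where=row_sums > 0)

def pair_agreement(a, b):
    """Rand index, used here as a stand-in consensus function (assumption)."""
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))

def refine(labels_a, labels_b, T):
    """Greedily merge each cluster of partition A into its most probable
    transition target, keeping a merge only if agreement with B improves."""
    T_off = T.copy()
    np.fill_diagonal(T_off, 0.0)  # ignore self-transitions
    current = labels_a.copy()
    best = pair_agreement(current, labels_b)
    for src in range(T.shape[0]):
        dst = int(T_off[src].argmax())
        if T_off[src, dst] == 0.0:
            continue  # no neighbouring cluster to move to
        trial = current.copy()
        trial[trial == src] = dst
        score = pair_agreement(trial, labels_b)
        if score > best:
            current, best = trial, score
    return current, best

# Synthetic stand-in for an expression matrix (150 "cells", 20 "genes").
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 20))
X = np.vstack([c + rng.normal(size=(50, 20)) for c in centers])

k = 5
labels_a = kmeans(X, k, seed=1)  # first group of clustering results
labels_b = kmeans(X, k, seed=2)  # second group, different random init
T = label_transition_matrix(X, labels_a, k)
refined, score = refine(labels_a, labels_b, T)
print("agreement before:", pair_agreement(labels_a, labels_b))
print("agreement after: ", score)
```

The greedy acceptance rule guarantees only that the agreement between the two partitions never decreases; whether it improves depends on the data and on how the two random initialisations differ.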
Keywords
clustering; consensus function; single-cell sequencing
Hrčak ID:
234164
Publication date:
15.2.2020.