
Original scientific paper

https://doi.org/10.17559/TV-20191002034614

Clustering Single-cell RNA-sequencing Data based on Matching Clusters Structures

Yizhang Wang (ORCID: orcid.org/0000-0002-0687-7802) ; College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, 2699 Qianjin Street, Changchun, 130012, China
You Zhou (ORCID: orcid.org/0000-0003-0013-1281) ; College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, 2699 Qianjin Street, Changchun, 130012, China
Wie Pang (ORCID: orcid.org/0000-0002-1761-6659) ; The School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, UK
Yanchun Liang* ; College of Computer Science and Technology, Jilin University, College of Computer Science, Zhuhai College of Jilin University, 2699 Qianjin Street, Changchun, 130012, China
Shu Wang* ; College of Computer Science Zhuhai College of Jilin University Zhuhai, 519041, China



page 89-95




Abstract

Single-cell sequencing technology generates RNA-sequencing data at the level of individual cells, and an important task in analysing such data is identifying cell types without supervised information. Clustering is an unsupervised approach that can yield new biological insights, especially when exploring the biological functions of specific cell types. However, traditional clustering methods struggle to produce high-quality cell type recognition results. In this research, we propose a novel Clustering method based on Matching Clusters Structures (MCSC) for identifying cell types in single-cell RNA-sequencing data. First, MCSC obtains two different groups of clustering results from the same K-means algorithm, which is possible because its initial centroids are selected randomly. Then, for one group, MCSC uses shared nearest neighbour information to calculate a label transition matrix, which gives the label transition probability between any two initial clusters. An initial cluster may be reassigned if the merging result after label transition satisfies a consensus function that maximizes the structural matching degree of the two groups of clustering results. In essence, MCSC may be interpreted as a label training process. We evaluate MCSC on five commonly used datasets and compare it with several classical and state-of-the-art algorithms. The experimental results show that MCSC outperforms the other algorithms.
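
The pipeline described in the abstract can be illustrated with a rough sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names (`kmeans`, `transition_matrix`, `rand_index`, `mcsc_sketch`), the way shared nearest neighbour counts are aggregated into a row-normalised transition matrix, the use of the Rand index as a stand-in for the "structural matching degree", and the greedy accept-if-better rule standing in for the consensus function. It should not be read as the authors' implementation.

```python
import numpy as np


def kmeans(X, k, seed, n_iter=50):
    """Plain Lloyd's K-means; initial centroids are randomly chosen data points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels


def transition_matrix(X, labels, k, n_neighbors):
    """Assumed construction: cluster-level shared-nearest-neighbour mass, row-normalised."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbour
    nn = np.argsort(dists, axis=1)[:, :n_neighbors]
    N = np.zeros((n, n))
    N[np.arange(n)[:, None], nn] = 1.0           # N[i, j] = 1 if j is a neighbour of i
    snn = N @ N.T                                # snn[i, j] = number of shared neighbours
    M = np.eye(k)[labels]                        # one-hot cluster membership
    W = M.T @ snn @ M                            # shared-neighbour mass between clusters
    return W / np.maximum(W.sum(axis=1, keepdims=True), 1.0)


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (stand-in matching degree)."""
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))


def mcsc_sketch(X, k_init=6, n_neighbors=10, seeds=(0, 1)):
    """Two K-means groupings, then greedy label transitions on the first one."""
    labels_a = kmeans(X, k_init, seeds[0])
    labels_b = kmeans(X, k_init, seeds[1])
    T = transition_matrix(X, labels_a, k_init, n_neighbors)
    score = rand_index(labels_a, labels_b)
    for src in range(k_init):
        probs = T[src].copy()
        probs[src] = 0.0                         # ignore staying in the same cluster
        dst = int(probs.argmax())
        if probs[dst] == 0.0:
            continue                             # no neighbouring cluster to move to
        candidate = np.where(labels_a == src, dst, labels_a)
        new_score = rand_index(candidate, labels_b)
        if new_score > score:                    # keep the merge only if matching improves
            labels_a, score = candidate, new_score
    return labels_a, labels_b, T, score
```

Calling `mcsc_sketch(X)` on a cells-by-features NumPy array `X` returns the relabelled first grouping, the second grouping, the transition matrix, and the final matching score. The pairwise distance matrices make this sketch quadratic in the number of cells, so it is only suitable for small examples.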

Keywords

clustering; consensus function; single-cell sequencing

Hrčak ID:

234164

URI

https://hrcak.srce.hr/234164

Publication date:

15.2.2020.
