Journal article details
Genome Biology
Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
Kenong Su [1], Wenjing Ma [1], Hao Wu [2]
[1] Department of Computer Science, Emory University, 400 Dowman Drive, 30322, Atlanta, GA, USA; [2] Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, 30322, Atlanta, GA, USA
Keywords: Supervised cell typing; Reference dataset construction; scRNA-seq
DOI: 10.1186/s13059-021-02480-2
Source: Springer
【 Abstract 】

Background: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and computational performance. Despite these advantages, the performance of supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset.

Results: In this work, we perform extensive real data analyses to systematically evaluate these factors in supervised cell type identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types on cell type prediction. Next, we focus on how discrepancies between reference and target datasets, and data preprocessing steps such as imputation and batch effect correction, affect prediction performance. We also investigate strategies for pooling and purifying reference data.

Conclusions: Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and using a multi-layer perceptron (MLP) as the classifier, along with the F-test as the feature selection method. All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping).
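
The F-test feature selection recommended above can be illustrated with a minimal sketch. It assumes that "F-test" means a one-way ANOVA F-statistic computed per gene across the reference cell types, which the abstract does not spell out; the authors' actual implementation is in their GitHub repository. The function names `anova_f_scores` and `top_genes` and the toy expression values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def anova_f_scores(expr, labels):
    """Return a one-way ANOVA F-statistic for each gene.

    expr   : list of cells, each a list of expression values (one per gene)
    labels : cell type label for each cell in the reference
    Assumption: this is a generic F-test, not the paper's own code.
    """
    n_cells, n_genes = len(expr), len(expr[0])
    groups = defaultdict(list)
    for row, lab in zip(expr, labels):
        groups[lab].append(row)
    k = len(groups)  # number of cell types

    scores = []
    for g in range(n_genes):
        grand_mean = sum(row[g] for row in expr) / n_cells
        ss_between = ss_within = 0.0
        for rows in groups.values():
            vals = [r[g] for r in rows]
            mean = sum(vals) / len(vals)
            ss_between += len(vals) * (mean - grand_mean) ** 2
            ss_within += sum((v - mean) ** 2 for v in vals)
        ms_between = ss_between / (k - 1)
        ms_within = ss_within / (n_cells - k)
        if ms_within == 0:
            # No within-type variance: perfectly separating or constant gene.
            scores.append(float("inf") if ms_between > 0 else 0.0)
        else:
            scores.append(ms_between / ms_within)
    return scores


def top_genes(expr, labels, n_top):
    """Indices of the n_top genes with the largest F-statistics."""
    scores = anova_f_scores(expr, labels)
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return order[:n_top]


# Toy reference: 4 cells x 3 genes, two cell types (made-up values).
toy_expr = [
    [5.0, 1.0, 2.0],
    [6.0, 1.2, 0.5],
    [0.5, 1.1, 1.9],
    [1.0, 0.9, 0.6],
]
toy_labels = ["T cell", "T cell", "B cell", "B cell"]
selected = top_genes(toy_expr, toy_labels, n_top=1)
```

In this toy reference only the first gene differs strongly between the two cell types, so it receives the highest score. In a full workflow the selected genes would then be used to train a classifier such as an MLP on the pooled reference cells.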

【 License 】

CC BY   
