Journal article details
Genome Biology
Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics
Research
Ayshwarya Subramanian [1], Yiming Yang [2], Bo Li [3], Mikhail Alperovich [4]
[1] Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Brigham and Women's Hospital, Harvard Medical School, Boston, USA; Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Immunology and Inflammatory Diseases, Department of Medicine, Massachusetts General Hospital, 02114, Boston, MA, USA; Present Address: Department of Cellular and Tissue Genomics, Genentech Inc., South San Francisco, CA, USA; Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Immunology and Inflammatory Diseases, Department of Medicine, Massachusetts General Hospital, 02114, Boston, MA, USA; Present Address: Department of Cellular and Tissue Genomics, Genentech Inc., South San Francisco, CA, USA; Department of Medicine, Harvard Medical School, 02115, Boston, MA, USA; Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA; MIT PRIMES, Massachusetts Institute of Technology, Cambridge, MA, USA; Lexington High School, Lexington, MA, USA; Present Address: Wake Technical Community College, Raleigh, USA
Keywords: scRNA-seq; Quality control (QC); Data-driven; Single cell; Adaptive QC; Exploratory data analysis (EDA); Biological variation
DOI: 10.1186/s13059-022-02820-w
Received 2021-08-18, accepted 2022-11-23, published 2022
Source: Springer
【 Abstract 】

Background: Quality control (QC) of cells, a critical first step in single-cell RNA sequencing data analysis, has largely relied on arbitrarily fixed, data-agnostic thresholds applied to QC metrics such as gene complexity and the fraction of reads mapping to mitochondrial genes. The few existing data-driven approaches perform QC at the level of samples or studies without accounting for biological variation.

Results: We first demonstrate that QC metrics vary with both tissue and cell type across technologies, study conditions, and species. We then propose data-driven QC (ddqc), an unsupervised adaptive QC framework that performs flexible, data-driven QC at the level of cell types while retaining critical biological insights and improving power for downstream analysis. ddqc applies an adaptive threshold based on the median absolute deviation to four QC metrics (gene and UMI complexity, and the fractions of reads mapping to mitochondrial and ribosomal genes). ddqc retains over a third more cells than conventional data-agnostic QC filters. Finally, we show that ddqc recovers biologically meaningful trends in the gradation of gene complexity among cell types, which can help answer questions of biological interest such as which cell types express the fewest and the most transcripts overall, and ribosomal transcripts specifically.

Conclusions: ddqc retains cell types such as metabolically active parenchymal cells and specialized cells such as neutrophils, which are often lost by conventional QC. Taken together, our work proposes a revised paradigm for quality filtering best practices, iterative QC, and provides a data-driven QC framework compatible with observed biological diversity.
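The adaptive threshold described in the abstract can be illustrated with a minimal sketch. This is not the ddqc implementation: the function name `mad_filter`, the cutoff of 2 median absolute deviations, the one-sided direction of each test, and the toy data are all assumptions made for illustration, and the cluster labels stand in for cell types obtained from a prior clustering step.

```python
from statistics import median


def mad_filter(values, clusters, n_mads=2.0, direction="upper"):
    """Per-cluster outlier mask based on the median absolute deviation (MAD).

    Hypothetical helper, not part of ddqc. `direction="upper"` keeps cells at
    or below median + n_mads * MAD (e.g. for mitochondrial fraction);
    `direction="lower"` keeps cells at or above median - n_mads * MAD
    (e.g. for gene complexity). n_mads=2.0 is an assumed default.
    """
    if direction not in ("upper", "lower"):
        raise ValueError("direction must be 'upper' or 'lower'")
    keep = [False] * len(values)
    for label in set(clusters):
        # Indices of the cells assigned to this cluster.
        idx = [i for i, c in enumerate(clusters) if c == label]
        vals = [float(values[i]) for i in idx]
        med = median(vals)
        mad = median([abs(v - med) for v in vals])
        for i, v in zip(idx, vals):
            if direction == "upper":
                keep[i] = v <= med + n_mads * mad
            else:
                keep[i] = v >= med - n_mads * mad
    return keep


# Toy data: cluster "b" has a naturally high mitochondrial percentage.
clusters = ["a"] * 5 + ["b"] * 5
pct_mito = [1, 2, 3, 2, 30, 20, 22, 21, 23, 60]

adaptive = mad_filter(pct_mito, clusters)   # threshold set per cluster
fixed = [p <= 10 for p in pct_mito]         # conventional fixed 10% cutoff

print(sum(adaptive), "cells kept by the per-cluster MAD filter")
print(sum(fixed), "cells kept by the fixed cutoff")
```

In this toy example the fixed cutoff discards every cell in cluster "b", whereas the per-cluster threshold removes only the single extreme cell in each cluster, which is the kind of behavior the abstract attributes to cell-type-level QC.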

【 License 】

CC BY   
© The Author(s) 2022
