Journal Article Details
Frontiers in Oncology, Volume 10
Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data
Lauren L. Hsu [2], Aedin C. Culhane [2]
[1] Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States;
[2] Division of Biostatistics and Computational Biology, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, United States;
Keywords: data integration; matrix factorization; single cell; scRNA-seq; normalization; standardization
DOI: 10.3389/fonc.2020.00973
Source: DOAJ
[Abstract]

Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method that is relatively fast, and can easily scale to large datasets when used with sparse-matrix representations. In this review, we provide a guide to PCA and related methods. We describe the relationship between PCA and singular value decomposition, the difference between PCA of a correlation and covariance matrix, the impact of scaling, log-transforming, and standardization, and how to recognize a horseshoe or arch effect in a PCA. We describe canonical correlation analysis (CCA), a popular matrix factorization approach for the integration of single-cell data from different platforms or studies. We discuss alternatives to CCA and why additional preprocessing or weighting datasets within the joint decomposition should be considered.
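
The relationship between PCA and singular value decomposition, and the difference between PCA of a covariance and a correlation matrix, can be illustrated with a minimal sketch. This is not the authors' code: the helper name `pca_svd`, the toy Poisson count matrix, and the rows-as-cells layout are all assumptions made for illustration.

```python
import numpy as np

def pca_svd(X, scale=False):
    """PCA of an observations-by-features matrix via SVD (hypothetical helper).

    scale=False corresponds to PCA of the covariance matrix;
    scale=True standardizes each column first, which corresponds to
    PCA of the correlation matrix.
    """
    Xc = X - X.mean(axis=0)                    # center each feature (column)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)       # unit variance per feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)          # variance explained per component
    scores = U * s                             # coordinates of the observations
    return scores, Vt.T, eigvals

# Toy data (assumed): 200 "cells", 3 "genes" with very different mean counts.
rng = np.random.default_rng(0)
X = rng.poisson(lam=[1.0, 5.0, 50.0], size=(200, 3)).astype(float)

_, _, ev_cov = pca_svd(X)                      # covariance PCA
_, _, ev_cor = pca_svd(X, scale=True)          # correlation PCA

# Reference eigenvalues computed directly from the two matrices.
cov_eigs = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
cor_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

print(np.allclose(ev_cov, cov_eigs), np.allclose(ev_cor, cor_eigs))
```

In the unscaled case the highest-variance feature tends to dominate the first component, which is one reason the choice of scaling and standardization affects the resulting decomposition.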

[License]

Unknown   
