Journal article

【Abstract】

Using latent variables in gene expression data can help correct for unobserved confounders and increase statistical power for expression quantitative trait loci (eQTL) detection. Probabilistic estimation of expression residuals (PEER) and principal component analysis (PCA) are widely used methods that remove unwanted variation and improve eQTL discovery power in bulk RNA-seq analysis. However, their performance has not been evaluated extensively in single-cell eQTL analysis, especially across different cell types. Potential challenges arise from the structure of single-cell RNA-seq data, including sparsity, skewness, and the mean-variance relationship. Here, we show through a series of analyses that PEER and PCA require additional quality control and data transformation steps on the pseudo-bulk matrix to obtain valid latent variables; without these steps, the resulting factors can be highly correlated (Pearson's correlation r = 0.63 to 0.99). Incorporating valid PEER factors (PFs) or principal components (PCs) in the eQTL association model identifies 1.7% to 13.3% more eGenes. Sensitivity analysis showed that the relationship between the number of eGenes detected and the number of PFs/PCs fitted varied significantly across cell types. In addition, using highly variable genes to generate latent variables achieved eGene discovery power similar to that of using all genes while saving considerable computational resources (approximately 6.2-fold faster).
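
The workflow summarized in the abstract (pseudo-bulk aggregation, quality control, transformation, optional restriction to highly variable genes, then PCA) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `pseudobulk_mean` and `latent_pcs`, the 10% expression filter, the `log1p` transform, per-gene standardization, ranking genes by variance, and the simulated counts. PEER itself is not shown; only the PCA branch is sketched, using NumPy's SVD.

```python
import numpy as np


def pseudobulk_mean(counts, donor_ids):
    """Average single-cell counts (cells x genes) within each donor."""
    donors = np.unique(donor_ids)
    pb = np.vstack([counts[donor_ids == d].mean(axis=0) for d in donors])
    return donors, pb


def latent_pcs(pb, n_pcs=5, min_nonzero_frac=0.1, n_hvg=None):
    """Return donor-level PCs from a pseudo-bulk matrix (donors x genes).

    All thresholds and transforms are illustrative assumptions.
    """
    # QC (assumed rule): keep genes detected in enough donors.
    keep = (pb > 0).mean(axis=0) >= min_nonzero_frac
    x = np.log1p(pb[:, keep])  # assumed transform to reduce skewness

    # Optionally keep only the most variable genes to cut compute cost.
    if n_hvg is not None and n_hvg < x.shape[1]:
        top = np.argsort(x.var(axis=0))[::-1][:n_hvg]
        x = x[:, top]

    # Drop constant genes, then standardize each gene across donors.
    x = x[:, x.std(axis=0) > 0]
    x = (x - x.mean(axis=0)) / x.std(axis=0)

    # PCA via SVD: the scores are the left singular vectors scaled by
    # the singular values.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


# Simulated sparse counts: 20 donors x 30 cells each, 200 genes.
rng = np.random.default_rng(0)
gene_rates = rng.gamma(shape=0.5, scale=1.0, size=200)
counts = rng.poisson(gene_rates, size=(600, 200))
donor_ids = np.repeat(np.arange(20), 30)

donors, pb = pseudobulk_mean(counts, donor_ids)
pcs = latent_pcs(pb, n_pcs=5, n_hvg=100)

# Pairwise correlation between the fitted components.
pc_corr = np.corrcoef(pcs, rowvar=False)
```

In an eQTL association model, the columns of `pcs` would be added as covariates alongside genotype. Principal components computed this way are uncorrelated by construction, so the `pc_corr` check is mainly a sanity test here; the same check applied to factors from another method is one way to spot the highly correlated factors the abstract warns about.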

【License】

CC BY
© The Author(s) 2023

Genome Biology
Pitfalls and opportunities for applying latent variables in single-cell eQTL analyses
Research
Drew Neavin¹ Seyhan Yazar¹ Angli Xue² Joseph E. Powell³
[1] Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, 2010, Sydney, NSW, Australia; Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, 2010, Sydney, NSW, Australia; School of Biomedical Sciences, University of New South Wales, 2052, Sydney, NSW, Australia; Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, 2010, Sydney, NSW, Australia; UNSW Cellular Genomics Futures Institute, University of New South Wales, 2052, Sydney, NSW, Australia
Keywords: Single-cell RNA-seq; Pseudo-bulk; Latent variable; PEER factors; Principal component analysis; Normalization; eQTL mapping
DOI: 10.1186/s13059-023-02873-5
Received 2022-06-22; accepted 2023-02-13; published 2023
Source: Springer