BMC Bioinformatics
An introduction to new robust linear and monotonic correlation coefficients
Zoran Bursac [1], Habib Tabatabai [2], Karan P. Singh [3], Mohammad Tabatabai [4], Stephanie Bailey [4], Derek Wilus [4]
[1] Department of Biostatistics, Florida International University, 33199, Miami, FL, USA
[2] Department of Civil and Environmental Engineering, University of Wisconsin Milwaukee, 53211, Milwaukee, WI, USA
[3] Department of Epidemiology and Biostatistics, University of Texas Health Sciences Center at Tyler, 75708, Tyler, TX, USA
[4] Meharry Medical College, 37208, Nashville, TN, USA
Keywords: Pearson correlation; Spearman correlation; Quadrant correlation; Median correlation; Minimum covariance determinant correlation; Dissimilarity measures; Gene expression; Williams syndrome
DOI: 10.1186/s12859-021-04098-4
Source: Springer
【 Abstract 】

Background: The most common measure of association between two continuous variables is the Pearson correlation (Maronna et al. in Safari an OMC. Robust statistics, 2019. https://login.proxy.bib.uottawa.ca/login?url=https://learning.oreilly.com/library/view/-/9781119214687/?ar&orpq&email=^u). When outliers are present, the Pearson correlation does not accurately measure association, and robust measures are needed. This article introduces three new robust measures of correlation: Taba (T), TabWil (TW), and TabWil rank (TWR). The estimators T and TW measure linear association between two continuous or ordinal variables, whereas TWR measures monotonic association. The robustness of the proposed measures is examined through simulation, in comparison with the Pearson (P), Spearman (S), Quadrant (Q), Median (M), and Minimum Covariance Determinant (MCD) correlations. Taba distance is used to analyze genes, and statistical tests are used to identify the genes most significantly associated with Williams Syndrome (WS).

Results: Based on the root mean square error (RMSE) and bias, the three proposed correlation measures are highly competitive with classical measures such as P and S as well as robust measures such as Q, M, and MCD. Our findings indicate that TBL2 was the most significant gene among patients diagnosed with WS and had the most significant reduction in gene expression level compared with controls (P value = 6.37E-05).

Conclusions: Overall, when the distribution is bivariate Log-Normal or bivariate Weibull, TWR performs best in terms of bias and T performs best with respect to RMSE. Under the Normal distribution, MCD performs well with respect to both bias and RMSE, and the TW, TWR, T, S, and P correlations perform comparably. The identification of TBL2 may serve as a diagnostic tool for WS patients. A Taba R package has been developed and is available to perform all necessary computations for the proposed methods.
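The sensitivity of the Pearson correlation to outliers, which motivates the robust measures above, can be illustrated with a minimal sketch. It does not implement T, TW, or TWR, whose definitions are not given in this abstract; it only compares the classical Pearson, Spearman, and quadrant correlations on a clean and a contaminated sample. The function names, the sample size, the target correlation of 0.8, and the contamination scheme (ten points placed at (10, -10)) are all assumptions chosen for illustration, not the simulation design of the article.

```python
import random
import statistics


def pearson(x, y):
    """Classical product-moment correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def quadrant(x, y):
    """Quadrant correlation: mean sign of the product of deviations from the medians."""
    mx, my = statistics.median(x), statistics.median(y)
    signs = []
    for a, b in zip(x, y):
        product = (a - mx) * (b - my)
        signs.append((product > 0) - (product < 0))
    return statistics.fmean(signs)


def simulate(n=200, rho=0.8, n_outliers=10, seed=1):
    """Return (clean, contaminated) samples; both are (x, y) pairs of lists.

    Hypothetical setup: bivariate normal data with correlation rho, where the
    contaminated copy has its last n_outliers points replaced by (10, -10).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x]
    keep = n - n_outliers
    x_bad = x[:keep] + [10.0] * n_outliers
    y_bad = y[:keep] + [-10.0] * n_outliers
    return (x, y), (x_bad, y_bad)


if __name__ == "__main__":
    clean, contaminated = simulate()
    for name, estimator in [("Pearson", pearson), ("Spearman", spearman), ("Quadrant", quadrant)]:
        print(f"{name}: clean={estimator(*clean):.3f} contaminated={estimator(*contaminated):.3f}")
```

In this setup the handful of outliers is enough to turn the Pearson estimate negative, while the rank-based and sign-based estimates move much less. That is the behavior a robust estimator is meant to show, although it says nothing about how T, TW, or TWR themselves would perform.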

【 License 】

CC BY   
