BMC Genetics
Multi-view singular value decomposition for disease subtyping and genetic associations
Henry R Kranzler [2], Jinbo Bi [1], Jiangwen Sun [1]
[1] Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Storrs, CT 06269, USA
[2] Treatment Research Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 24105, USA
Keywords: Matrix decomposition; Biclustering; Subtyping; Multi-view data analysis; Genotype-phenotype association
DOI: 10.1186/1471-2156-15-73
Received 2014-01-29, accepted 2014-06-06, published 2014
【 Abstract 】

Background

Accurate classification of patients with a complex disease into subtypes has important implications for medicine and healthcare. Using more homogeneous disease subtypes in genetic association analysis will facilitate the detection of new genetic variants that are not detectable using the non-differentiated disease phenotype. Subtype differentiation can also improve diagnostic classification, which can in turn inform clinical decision making and treatment matching. Currently, the most sophisticated methods for disease subtyping perform cluster analysis using patients’ clinical features. Without guidance from genetic information, the resultant subtypes are likely to be suboptimal and efforts at genetic association may fail.

Results

We propose a multi-view matrix decomposition approach that integrates clinical features with genetic markers to detect confirmatory evidence for a disease subtype. This approach groups patients into clusters that are consistent between the clinical and genetic dimensions of data; it simultaneously identifies the clinical features that define the subtype and the genotypes associated with the subtype. A simulation study validated the proposed approach, showing that it identified hypothesized subtypes and associated features. Compared with the latest biclustering and multi-view data analysis methods on real-life disease data, the proposed approach identified clinical subtypes of a disease that differed from each other more significantly in the genetic markers, demonstrating its superior performance.

Conclusions

The proposed algorithm is an effective and superior alternative to the disease subtyping methods employed to date. Integrating phenotypic features with genetic markers in the subtyping analysis is a promising approach for concurrently identifying disease subtypes and their genetic associations.
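
Illustrative sketch (not the authors' algorithm): the abstract describes a decomposition in which one patient grouping is shared by a clinical view and a genetic view, while each view gets its own set of selected features. The Python below is a minimal illustration of that idea under stated assumptions: a rank-one fit with a shared patient vector u and per-view feature vectors v_k, alternating updates, and soft-thresholding to zero out weak features. The function names, the penalty value lam, the membership cutoff 0.3, and the toy data are all hypothetical and are not taken from the paper. With lam = 0 the procedure reduces to finding the leading singular vectors of the two views concatenated column-wise.

```python
import math

def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values inside [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def shared_rank_one(views, lam=0.0, n_iter=50):
    """Fit X_k ~ u v_k^T for every view X_k with one shared patient vector u.

    views: list of matrices (lists of rows); every view has the same rows
    (patients) but its own columns (features). Returns (u, [v_1, v_2, ...]).
    The uniform start assumes the data are not column-centered.
    """
    n = len(views[0])
    u = [1.0 / math.sqrt(n)] * n
    vs = []
    for _ in range(n_iter):
        # Update each view's feature vector: v_k = soft_threshold(X_k^T u).
        vs = []
        for X in views:
            d = len(X[0])
            v = [soft_threshold(sum(X[i][j] * u[i] for i in range(n)), lam)
                 for j in range(d)]
            vs.append(v)
        # Update the shared patient vector: u is proportional to sum_k X_k v_k.
        new_u = []
        for i in range(n):
            total = 0.0
            for X, v in zip(views, vs):
                total += sum(X[i][j] * v[j] for j in range(len(v)))
            new_u.append(total)
        norm = math.sqrt(sum(a * a for a in new_u))
        if norm == 0.0:  # every feature was thresholded away
            break
        u = [a / norm for a in new_u]
    return u, vs

# Toy data (hypothetical): patients 0-2 form a subtype marked by clinical
# feature 0 and genetic marker 1; all other entries are small noise.
clinical = [
    [2.0, 0.1, 0.0],
    [2.1, 0.0, 0.1],
    [1.9, 0.1, 0.1],
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.1],
]
genetic = [
    [0.0, 1.0, 0.1, 0.0],
    [0.1, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]

u, vs = shared_rank_one([clinical, genetic], lam=0.5)
# Patients with a large loading on u are read as subtype members.
members = [i for i, a in enumerate(u) if abs(a) > 0.3]
# Features with a nonzero loading are read as selected, per view.
selected = [[j for j, x in enumerate(v) if x != 0.0] for v in vs]
print(members, selected)
```

Because u is shared, a patient enters the group only when the clinical and genetic views jointly support it, which is the consistency property the abstract emphasizes. How the published method enforces sparsity and selects patients may differ from this sketch.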

【 License 】

   
2014 Sun et al.; licensee BioMed Central Ltd.
