Journal Article Details
BMC Bioinformatics
Gene expression prediction using low-rank matrix completion
Research Article
Kshitij Marwah1  Arnav Kapur1  Gil Alterovitz2 
[1]Biomedical Cybernetics Laboratory, Harvard Medical School, 02115, Boston, MA, USA
[2]Biomedical Cybernetics Laboratory, Harvard Medical School, 02115, Boston, MA, USA
[3]Department of Health Science and Technology, Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 02139, Cambridge, MA, USA
Keywords: Prediction; Machine learning; Gene expression
DOI  :  10.1186/s12859-016-1106-6
Received 2015-11-20, accepted 2016-05-28, published 2016
Source: Springer
【 Abstract 】
Background: High-throughput biological data have grown exponentially over the past decade, supported by technologies such as microarrays and RNA-Seq. Most data generated with these methods encode large amounts of rich information and are used to determine diagnostic and prognostic biomarkers. Although data storage costs have fallen, capturing data with these technologies is still expensive. Moreover, the assay itself, from sample preparation to raw value measurement, takes a long time (on the order of days). There is an opportunity to reduce both the cost and the time of generating such expression datasets.
Results: We propose a framework in which complete gene expression values can be reliably predicted in silico from partial measurements. This is achieved by modelling expression data as a low-rank matrix and then applying recently developed matrix completion techniques based on nonlinear convex optimisation. We evaluated the prediction of gene expression data on 133 studies comprising a combined total of 10,921 samples. We show that such datasets can be reconstructed with a low relative error even at high missing-value rates (>50 %), and that the predicted datasets can be reliably used as surrogates for further analysis.
Conclusion: This method has potentially far-reaching applications, including how biomedical data are sourced and generated, and transcriptomic prediction by optimisation. We show that gene expression data can be computationally constructed, thereby potentially reducing the cost of gene expression profiling. The method shows great promise for opening new avenues of research on low-rank matrix completion in the biological sciences.
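The Results paragraph describes completing a partially measured expression matrix under a low-rank model via convex optimisation. The abstract does not name a specific solver, so the sketch below is only an illustration under stated assumptions: it uses the Soft-Impute iteration (repeated singular value soft-thresholding, a standard way to solve nuclear-norm-regularised completion) on synthetic data. The function names `soft_impute` and `demo_relative_error`, the matrix sizes, the rank, and the regularisation schedule are all hypothetical choices and are not taken from the paper.

```python
import numpy as np


def soft_impute(X, observed, lam_final=0.1, n_lams=30, inner_iters=20):
    """Estimate a low-rank matrix from the entries of X where `observed` is True.

    Assumption: this is the Soft-Impute iteration with a decreasing
    regularisation schedule (warm starts), not necessarily the solver
    used in the paper.
    """
    X_obs = np.where(observed, X, 0.0)          # unobserved entries set to zero
    Z = np.zeros_like(X_obs)                    # current low-rank estimate
    lam_start = 0.5 * np.linalg.norm(X_obs, 2)  # half the largest singular value
    for lam in np.geomspace(lam_start, lam_final, n_lams):
        for _ in range(inner_iters):
            # Keep measured values, fill the rest with the current estimate.
            filled = np.where(observed, X_obs, Z)
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            # Soft-threshold the singular values (proximal step of the nuclear norm).
            Z = (U * np.maximum(s - lam, 0.0)) @ Vt
    return Z


def demo_relative_error(seed=0):
    """Synthetic check: hide about half of a rank-3 matrix and predict it back."""
    rng = np.random.default_rng(seed)
    genes, samples, rank = 200, 60, 3           # hypothetical sizes
    truth = rng.normal(size=(genes, rank)) @ rng.normal(size=(rank, samples))
    observed = rng.random(truth.shape) > 0.5    # roughly 50 % of entries measured
    estimate = soft_impute(truth, observed)
    missing = ~observed
    # Relative error measured only on the entries that were never observed.
    return np.linalg.norm((estimate - truth)[missing]) / np.linalg.norm(truth[missing])


if __name__ == "__main__":
    print(f"relative error on missing entries: {demo_relative_error():.4f}")
```

Real expression matrices are only approximately low-rank and contain measurement noise, so in practice the final regularisation strength would be chosen by holding out some measured entries rather than fixed in advance as it is here.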
【 License 】

CC BY   
© Kapur et al. 2016
