学位论文详细信息
Entropy-based machine learning algorithms applied to genomics and pattern recognition
Machine Learning, Decision Trees, Convolutional Filters, Genomics, Cancer, Entropy
Moon, Wooyoung
关键词: Machine Learning, Decision Trees, Convolutional Filters, Genomics, Cancer, Entropy;   
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/104838/MOON-DISSERTATION-2019.pdf?sequence=1&isAllowed=y
美国|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】

Transcription factors (TF) are proteins that interact with DNA to regulate the transcription of DNA to RNA and play key roles in both healthy and cancerous cells. Thus, gaining a deeper understanding of the biological factors underlying transcription factor (TF) binding specificity is important for understanding the mechanism of oncogenesis. As large, biological datasets become more readily available, machine learning (ML) algorithms have proven to make up an important and useful set of tools for cancer researchers. However, there remain many areas for potential improvements for these ML models, including a higher degree of model interpretability and overall accuracy. In this thesis, we present decision tree (DT) methods applied to DNA sequence analysis that result in highly interpretable and accurate predictions.We propose a boosted decision tree (BDT) model using the binary counts of important DNA motifs to predict the binding specificity of TFs belonging to the same protein family of binding similar DNA sequences. We then proceed to introduce a novel application of Convolutional Decision Trees (CDT) and demonstrate that this approach has distinct advantages over the BDT modeil while still accurately predicting the binding specificty of TFs. The CDT models are trained using the Cross Entropy (CE) optimization method, a Monte Carlo optimization method based on concepts from information theory related to statistical mechanics. We then further study the CDT model as a general pattern recognition and transfer learning technique and demonstrate that this approach can learn translationally invariant patterns that lead to high classification accuracy while remaining more interpretable and learning higher quality convolutional filters compared to convolutional neural networks (CNN).

【 预 览 】
附件列表
Files Size Format View
Entropy-based machine learning algorithms applied to genomics and pattern recognition 4159KB PDF download
  文献评价指标  
  下载次数:15次 浏览次数:15次