Dissertation Details
Unsupervised and semi-supervised training methods for eukaryotic gene prediction
Keywords: Hidden Markov models; Self-training; Gene annotation; Genome annotation; Viterbi algorithm; Unsupervised training; Gene prediction; Gene finding
Author: Ter-Hovhannisyan, Vardges
University: Georgia Institute of Technology
Department: Biology
Full text (PDF): https://smartech.gatech.edu/bitstream/1853/26645/1/TerHovhannisyan_Vardges_200812_phd.pdf
Country: United States | Language: English
Source: SMARTech Repository
【 Abstract 】

This thesis describes new gene-finding methods for eukaryotic gene prediction. Current methods for deriving the model parameters of gene prediction algorithms rely on curated or experimentally validated sets of genes or gene elements. Assembling such training sets often requires considerable time and expert effort, especially for species in the early stages of genome sequencing. Unsupervised training allows model parameters to be determined from anonymous genomic sequence. Given the ever-growing rate of eukaryotic genome sequencing, the practical applicability of unsupervised training is of critical importance. Three distinct training procedures are developed for a diverse group of eukaryotic species. GeneMark-ES is developed for species with strong donor and acceptor splice site signals, such as Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. The second version of the algorithm, GeneMark-ES-2, introduces an enhanced intron model to better describe the gene structure of fungal species, which possess relatively weak donor and acceptor splice sites and a well-conserved branch point signal. GeneMark-LE, a semi-supervised training approach, is designed for eukaryotic species with a small number of introns. The results indicate that the developed unsupervised training methods perform well, both in comparison with other training methods and as estimated on the set of genes supported by EST-to-genome alignments. Analysis of novel genomes reveals interesting biological findings and shows that the current set of annotated fungal genomes includes several species that are candidates for under-annotation or over-annotation.
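The keywords list the Viterbi algorithm and self-training alongside unsupervised training. As a rough illustration of that general idea only, the sketch below runs iterative Viterbi training on a toy two-state HMM over DNA: label the sequence with the current parameters, re-estimate the parameters from that labeling, and repeat until the labeling stops changing. It is not the GeneMark-ES model, which describes full gene structure; the two states (loosely "AT-rich" and "GC-rich"), the function names `viterbi`, `reestimate` and `viterbi_train`, and the add-one pseudocount scheme are all hypothetical choices made for this example.

```python
import math

ALPHABET = "ACGT"


def viterbi(seq, log_init, log_trans, log_emit):
    """Return the most probable state path for seq under a log-space HMM."""
    n = len(log_init)
    obs = [ALPHABET.index(c) for c in seq]
    # score[t][k]: best log-probability of any path ending in state k at position t
    score = [[log_init[k] + log_emit[k][obs[0]] for k in range(n)]]
    back = [[0] * n]
    for t in range(1, len(obs)):
        row, ptr = [], []
        for k in range(n):
            j = max(range(n), key=lambda i: score[t - 1][i] + log_trans[i][k])
            row.append(score[t - 1][j] + log_trans[j][k] + log_emit[k][obs[t]])
            ptr.append(j)
        score.append(row)
        back.append(ptr)
    # Trace back from the best-scoring final state.
    state = max(range(n), key=lambda k: score[-1][k])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]


def _normalize_log(counts):
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def reestimate(seq, path, n, pseudo=1.0):
    """Re-estimate log-space parameters from counts along one labeled path (with pseudocounts)."""
    init = [pseudo] * n
    trans = [[pseudo] * n for _ in range(n)]
    emit = [[pseudo] * len(ALPHABET) for _ in range(n)]
    init[path[0]] += 1
    for t, (c, s) in enumerate(zip(seq, path)):
        emit[s][ALPHABET.index(c)] += 1
        if t > 0:
            trans[path[t - 1]][s] += 1
    return (_normalize_log(init),
            [_normalize_log(r) for r in trans],
            [_normalize_log(r) for r in emit])


def viterbi_train(seq, log_init, log_trans, log_emit, max_iter=20):
    """Alternate Viterbi labeling and re-estimation until the labeling stops changing."""
    path = None
    for _ in range(max_iter):
        new_path = viterbi(seq, log_init, log_trans, log_emit)
        if new_path == path:
            break  # converged: the labeling is stable
        path = new_path
        log_init, log_trans, log_emit = reestimate(seq, path, len(log_init))
    return path, (log_init, log_trans, log_emit)


if __name__ == "__main__":
    # Toy "anonymous" sequence: an AT-rich block followed by a GC-rich block.
    seq = "AT" * 20 + "GC" * 20
    log = math.log
    init = [log(0.5), log(0.5)]
    trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
    # Weak starting bias: state 0 slightly prefers A/T, state 1 slightly prefers C/G.
    emit = [[log(p) for p in (0.3, 0.2, 0.2, 0.3)],
            [log(p) for p in (0.2, 0.3, 0.3, 0.2)]]
    path, _ = viterbi_train(seq, init, trans, emit)
    print("".join(map(str, path)))  # 40 zeros followed by 40 ones
```

This uses hard (Viterbi) assignments rather than the soft expected counts of Baum-Welch; the loop starts from weakly informative parameters and lets the sequence itself sharpen them, which is the sense in which no curated training set is needed in this toy setting.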
