Entropy
Analysis of Data Complexity in Human DNA for Gene-Containing Zone Prediction
Ricardo E. Monge [1], Juan L. Crespo [2]
[1] Escuela de Ciencias de la Computación y de la Informática, Universidad de Costa Rica, San Pedro de Montes de Oca, San José, Código Postal 2060-San José, Costa Rica
[2] Escuela de Ingeniería Eléctrica, Universidad de Costa Rica, San Pedro de Montes de Oca, San José, Código Postal 2060-San José, Costa Rica
Keywords: information complexity; DNA; genomic variability; gene prediction; nucleic acid sequence
DOI: 10.3390/e17041673
Source: DOAJ
Abstract
This study extends the analysis of genomic data by computing a variety of complexity measures. We analyze the effect of window size and evaluate the precision and recall of gene-zone prediction using a much larger dataset (full chromosomes). A technique based on separating two cases (gene-containing and non-gene-containing regions) has been developed as a basic gene predictor for automated DNA analysis. The predictor was tested on several human DNA sequences obtained from public databases in three experiments: the first covers window size and other parameters; the second analyzes a full human chromosome (198 million nucleotides); and the third tests variability across five individual subjects. All three experiments achieved high recall and precision, indicating that the predictor is effective.
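The windowed approach described above can be illustrated with a minimal sketch. It assumes Shannon entropy of the nucleotide distribution as the single complexity measure and a hypothetical fixed threshold as the separation rule; the abstract does not specify which measures, window sizes, or decision rule the authors used, so the names `window_entropies`, `predict`, and the threshold direction are illustrative only. Precision and recall are computed in the standard way from per-window labels.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol distribution in seq."""
    counts = Counter(seq)
    n = len(seq)
    # H = sum over symbols of p * log2(1/p); equals 0.0 for a single repeated base
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def window_entropies(dna: str, window: int, step: int) -> List[float]:
    """Entropy of each window of length `window`, advancing `step` bases at a time."""
    return [shannon_entropy(dna[i:i + window])
            for i in range(0, len(dna) - window + 1, step)]


def predict(scores: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical separation rule: flag a window as gene-containing if score >= threshold."""
    return [s >= threshold for s in scores]


def precision_recall(predicted: Sequence[bool],
                     actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of per-window gene-zone predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))      # correctly flagged
    fp = sum(p and not a for p, a in zip(predicted, actual))  # flagged, not a gene zone
    fn = sum(a and not p for p, a in zip(predicted, actual))  # missed gene zone
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    dna = "AAAAAAAA" + "ACGTACGT"          # toy sequence, not real genomic data
    scores = window_entropies(dna, window=8, step=8)
    labels = predict(scores, threshold=1.0)
    print(scores, labels, precision_recall(labels, [False, True]))
```

Varying `window` and `step` in such a sketch mirrors the kind of window-size study the first experiment describes; in practice several complexity measures would be combined rather than a single entropy score.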
License
Unknown