Thesis Details
Analysis of the impact of sequencing errors on BLAST using fault injection
Lee, So Youn ; Iyer, Ravishankar K.
Keywords: blast; sequencing error; fault injection; sequence alignment; smith-waterman algorithm; ssearch
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/45490/So%20Youn_Lee.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

This thesis investigates the impact of sequencing errors on post-sequencing computational analyses, including local alignment search and multiple sequence alignment. While the error rates of sequencing technologies are commonly reported, the significance of these numbers cannot be fully grasped without putting them in the perspective of their impact on the downstream analyses used for biological research, forensics, disease diagnosis, and similar applications. I quantified this impact using fault injection: faults were injected into the input sequence data, the analyses were run, and any change in the output was interpreted as the impact of the faults, or errors. Three commonly used tools were studied: BLAST, SSEARCH, and ProbCons. The main contributions of this work are the application of fault injection to reliability analysis in bioinformatics and the quantitative demonstration that a small error rate in the sequence data can significantly alter the output of the analysis.

BLAST and SSEARCH are both local alignment search tools, but BLAST is a heuristic implementation, while SSEARCH is based on the optimal Smith-Waterman algorithm. The output error rates were larger than the corresponding fault rates by one to two orders of magnitude, indicating that a small error rate in the sequence can drastically change the analysis output. False negative (FN) error rates were much larger than false positive (FP) rates. FN errors have the more serious impact, because FP errors can be controlled by more selective subsequent filtering. SSEARCH overall had a smaller standard deviation in the error rates; a small standard deviation is important for predicting the confidence of the output from the input quality. As the cost of running optimal algorithms like SSEARCH has decreased with advances in computing technology, their use should be increasingly encouraged in order to obtain accurate results.

ProbCons is a multiple sequence alignment algorithm. Errors were measured with the sum-of-pairs (SP) and true column (TC) scores and were defined with respect to BAliBASE, a benchmark for multiple sequence alignment algorithms. The results showed no significant correlation between the fault and error rates. Errors measured with SP scores remained of the same order of magnitude as the fault rate; errors measured with TC scores tended to be larger, but varied without correlation to the fault rate. Such randomness makes systematic improvement of multiple sequence alignment difficult, and optimizing the alignment with a single objective function, while the benchmark is aligned largely with human intervention, may be a counterproductive approach to multiple sequence alignment.
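
The fault-injection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The substitution-only fault model, the function names `inject_faults` and `error_rates`, and the definitions of the FN and FP rates used here (fraction of original hits lost, and fraction of new hits not in the original output) are all assumptions made for illustration; the abstract does not specify them.

```python
import random

BASES = "ACGT"


def inject_faults(seq, fault_rate, rng):
    """Return a copy of seq in which each base is replaced by a different
    base with probability fault_rate (assumed substitution-only model)."""
    out = []
    for base in seq:
        if rng.random() < fault_rate:
            # Pick any base other than the original one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def error_rates(golden_hits, faulty_hits):
    """Compare search hits from the fault-free run (golden) with hits from
    the fault-injected run. Returns (fn_rate, fp_rate) under the assumed
    definitions: FN = golden hits that disappeared, as a fraction of golden
    hits; FP = hits that newly appeared, as a fraction of faulty-run hits."""
    golden, faulty = set(golden_hits), set(faulty_hits)
    fn_rate = len(golden - faulty) / len(golden) if golden else 0.0
    fp_rate = len(faulty - golden) / len(faulty) if faulty else 0.0
    return fn_rate, fp_rate


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed for repeatability
    query = "ACGT" * 25              # hypothetical 100-base query
    mutated = inject_faults(query, 0.01, rng)
    changed = sum(a != b for a, b in zip(query, mutated))
    print("bases changed:", changed)
```

In an actual experiment the two hit lists passed to `error_rates` would come from running a search tool such as BLAST or SSEARCH on the original and the fault-injected query; that step is omitted here because it depends on the local installation and database.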

【 Preview 】
Attachments
Files | Size | Format | View
Analysis of the impact of sequencing errors on BLAST using fault injection | 1334KB | PDF | download
  Document metrics
  Downloads: 4   Views: 37