BMC Bioinformatics
Evaluating genome architecture of a complex region via generalized bipartite matching
Proceedings
Shay Zakov [1], Vineet Bafna [1], Christine Lo [1], Sangwoo Kim [1]
[1] Department of Computer Science and Engineering, University of California, San Diego, CA, USA
Keywords: Killer Cell Immunoglobulin Like Receptor; Human Leucocyte Antigen; Donor Genome; Bipartite Match; Minimum Cost Flow
DOI: 10.1186/1471-2105-14-S5-S13
Source: Springer
【 Abstract 】
With the remarkable development of inexpensive sequencing technologies and supporting computational tools, we have the promise of medicine being personalized by knowledge of the individual genome. Current technologies provide high throughput, but short reads. Reconstruction of the donor genome is based either on de novo assembly of the (short) reads, or on mapping donor reads to a standard reference. While such techniques demonstrate high success rates for inferring 'simple' genomic segments, they are confounded by segments with complex duplication patterns, including regions of direct medical relevance, like the HLA and the KIR regions.

In this work, we address this problem with a method for assessing the quality of a predicted genome sequence for complex regions of the genome. This method combines two natural types of evidence: sequence similarity of the mapped reads to the predicted donor genome, and distribution of reads across the predicted genome. We define a new scoring function for read-to-genome matchings, which penalizes sequence dissimilarities and deviations from the expected read location distribution, and present an efficient algorithm for finding matchings that minimize the penalty. The algorithm is based on a formal problem, first defined in this paper, called Coverage Sensitive many-to-many min-cost bipartite Matching (CSM). This new problem variant generalizes the standard (one-to-one) weighted bipartite matching problem, and can be solved using network flows. The resulting Java-based tool, called SAGE (Scoring function for Assembled GEnomes), is freely available upon request. We demonstrate over simulated data that SAGE can be used to infer correct haplotypes of the highly repetitive KIR region on human chromosome 19.
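To make the flow-based formulation more concrete, the sketch below solves only the standard one-to-one weighted bipartite matching problem that CSM is said to generalize, using successive shortest augmenting paths on a min-cost flow network. It is not the CSM algorithm or the SAGE implementation: the function name `min_cost_matching`, the cost matrix, and the reading of rows as reads and columns as genome positions are assumptions made purely for illustration, and the coverage-sensitive, many-to-many penalty of the paper is not modeled.

```python
from collections import deque


def min_cost_matching(cost):
    """One-to-one min-cost matching of rows to columns via min-cost flow.

    Hypothetical illustration: cost[i][j] is the penalty for matching
    read i to position j. Returns (total_cost, {row: column}).
    """
    n, m = len(cost), len(cost[0])
    source, sink = n + m, n + m + 1
    graph = [[] for _ in range(n + m + 2)]  # node -> list of edge indices
    to, cap, wt = [], [], []

    def add_edge(u, v, c, w):
        # A forward edge gets an even index e; its residual twin is e ^ 1.
        graph[u].append(len(to)); to.append(v); cap.append(c); wt.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); wt.append(-w)

    for i in range(n):
        add_edge(source, i, 1, 0)            # each row is matched at most once
        for j in range(m):
            add_edge(i, n + j, 1, cost[i][j])
    for j in range(m):
        add_edge(n + j, sink, 1, 0)          # each column is matched at most once

    total = 0
    while True:
        # Queue-based Bellman-Ford: residual edges may carry negative weights.
        dist = [float("inf")] * (n + m + 2)
        prev_edge = [-1] * (n + m + 2)
        in_queue = [False] * (n + m + 2)
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            in_queue[u] = False
            for e in graph[u]:
                v = to[e]
                if cap[e] > 0 and dist[u] + wt[e] < dist[v]:
                    dist[v] = dist[u] + wt[e]
                    prev_edge[v] = e
                    if not in_queue[v]:
                        in_queue[v] = True
                        queue.append(v)
        if prev_edge[sink] == -1:
            break                            # no augmenting path is left
        # Push one unit of flow along the cheapest augmenting path.
        v = sink
        while v != source:
            e = prev_edge[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]
        total += dist[sink]

    # A saturated forward edge row -> column is a matched pair.
    matching = {}
    for i in range(n):
        for e in graph[i]:
            if e % 2 == 0 and cap[e] == 0:
                matching[i] = to[e] - n
    return total, matching
```

For the toy matrix `[[4, 1, 3], [2, 0, 5], [3, 2, 2]]`, the function returns a total cost of 5 with rows 0, 1, 2 matched to columns 1, 0, 2. A many-to-many variant would presumably change edge capacities and add coverage-dependent costs, but those details are not given in the abstract and are left out here.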
【 License 】
CC BY
© Lo et al.; licensee BioMed Central Ltd. 2013