Gene fusion is known to play an important role in tumour cells. Tumour heterogeneity refers to the fact that tumour cells exhibit multiple morphologies and phenotypes, including gene fusion. Since tumour heterogeneity can be explained with an alternative splicing model, fusion gene transcripts can be modelled in the same way to interpret tumour heterogeneity. However, many alternative splicing tools struggle to compute fusion gene models because they have to enumerate paths in the splicing graph, which makes a strict filter necessary. In the gene fusion problem the number of exons to model is doubled, so the computation becomes much more complex and the filtering can be deemed too heavy. The research in this thesis uses a recent alternative splicing tool that models the splicing graph directly and solves the optimization problem over that graph, so nothing is filtered out before the optimization problem is solved. The splicing graph and the coverage of each exon (node) and junction (arc) are computed from paired-end RNA sequencing data. The graph is then transformed into a canonical convex min-cost flow problem. Solving this optimization problem is time-consuming; afterwards, the flow is decomposed into paths, which model transcripts, using a simple heuristic. The results show that this approach works as a sensitive classifier for fusion candidates supported by only a few paired-end fragments. With slight modification, and applied as a filtering scheme to the fusion candidates of Chimerascan, which contain the most false positives, the method outperformed TopHat and deFuse in terms of $F_3$ score.
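The decomposition step can be illustrated with a minimal sketch. It assumes the min-cost flow has already been solved and is given as integer flow values on the arcs of an acyclic splicing graph. The greedy rule used here (always follow the outgoing arc with the most remaining flow) is only one possible simple heuristic and is not claimed to be the one used in the thesis. The function name `decompose_flow`, the node labels and the flow values are all hypothetical.

```python
def decompose_flow(flow, source, sink):
    """Greedily split an arc flow on a DAG into weighted source-to-sink paths.

    `flow` maps arcs (u, v) to non-negative integer flow values and is
    assumed to satisfy flow conservation at every inner node.
    """
    remaining = {arc: f for arc, f in flow.items() if f > 0}
    paths = []
    while True:
        # Walk from the source, always taking the outgoing arc
        # that still carries the most flow.
        path, node = [source], source
        while node != sink:
            out = [(f, v) for (u, v), f in remaining.items() if u == node]
            if not out:
                break  # no flow left to follow from this node
            _, node = max(out)
            path.append(node)
        if node != sink:
            break  # the source has no remaining flow: decomposition done
        arcs = list(zip(path, path[1:]))
        weight = min(remaining[a] for a in arcs)  # bottleneck of the path
        for a in arcs:
            remaining[a] -= weight
            if remaining[a] == 0:
                del remaining[a]
        paths.append((path, weight))
    return paths


# Hypothetical toy graph: gene A has exons A1, A2; gene B has exons B1, B2.
# The arc (A1, B2) plays the role of a fusion junction.
toy_flow = {
    ("s", "A1"): 5, ("A1", "A2"): 3, ("A2", "t"): 3,
    ("s", "B1"): 4, ("B1", "B2"): 4, ("B2", "t"): 6,
    ("A1", "B2"): 2,
}
paths = decompose_flow(toy_flow, "s", "t")
for path, weight in paths:
    print(" -> ".join(path), "weight", weight)
```

In this toy input, the path that passes through the arc `("A1", "B2")` corresponds to a candidate fusion transcript, and its weight is the amount of flow assigned to it.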
Flow network model for detection and quantification of gene fusion