BMC Research Notes
cFinder: definition and quantification of multiple haplotypes in a mixed sample
Christian Gabriel [1], Thomas Lion [3], Peter Valent [2], Sandra Preuner-Stix [3], Stephan Dreiseitl [4], Johannes Pröll [1], Karin Wiesinger [1], Agnes Barna [1], Julia Hafenscher [1], Norbert Niklas [1]
[1] Red Cross Transfusion Service for Upper Austria, Krankenhausstraße 7, Linz, 4017, Austria
[2] Division of Hematology and Hemostaseology, Department of Medicine I, Ludwig Boltzmann Cluster Oncology, Medical University of Vienna, Vienna, Austria
[3] Children’s Cancer Research Institute, Vienna, Austria
[4] University of Applied Sciences Upper Austria, Softwarepark 11, Hagenberg, 4232, Austria
Keywords: Software; Next-generation sequencing; Clone quantification; Haplotype identification; Mixed sample
DOI: 10.1186/s13104-015-1382-7
Received 2014-07-11; accepted 2015-08-24; published 2015
【 Abstract 】
Background
Next-generation sequencing makes it possible to determine the genetic composition of a mixed sample. For instance, resistance testing for BCR-ABL1 requires identifying clones and defining compound mutations; combined with exact quantification, this can provide additional information for diagnosis and therapy decisions. This applies not only to oncology but also to the characterization of viral, bacterial or fungal infections. Retrieving multiple (more than two) haplotypes and their proportions from the data with conventional software is difficult and cumbersome, and requires multiple manual steps.
Results
We therefore developed a tool called cFinder that automatically detects haplotypes and accurately quantifies them within one sample. BCR-ABL1 samples containing multiple clones were used for testing; cFinder identified all previously found clones together with their abundance and even refined some results. Additionally, reads with multiple haplotypes were simulated using GemSIM, and detection was very close to linear (R² = 0.96). Our aim is not to deduce haploblocks by statistical means, but to characterize the composition of one sample precisely. Accordingly, cFinder reports the connections of variants (haplotypes) with their read count and relative occurrence (percentage). The software can be downloaded from http://sourceforge.net/projects/cfinder/.
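As a rough illustration of the kind of report described above (haplotypes with their read count and percentage), the following Python sketch tallies the combination of variants seen on each read. It is not cFinder's algorithm: the function name `quantify_haplotypes`, the read representation (a dict from reference position to observed base), and the toy positions are assumptions made for this example, and alignment, paired-end handling and sequencing errors are ignored.

```python
from collections import Counter


def quantify_haplotypes(reads, variant_sites):
    """Count variant combinations (haplotypes) across reads.

    reads: list of dicts mapping reference position -> observed base
           (hypothetical, simplified stand-in for aligned reads).
    variant_sites: dict mapping position -> variant base of interest.
    Returns a list of (haplotype, read_count, percentage), most abundant first,
    where a haplotype is the tuple of positions carrying the variant base.
    """
    counts = Counter()
    for read in reads:
        # Only reads covering every site can be assigned to a haplotype.
        if not all(pos in read for pos in variant_sites):
            continue
        haplotype = tuple(
            sorted(pos for pos, alt in variant_sites.items() if read[pos] == alt)
        )
        counts[haplotype] += 1

    total = sum(counts.values())
    return [(hap, n, 100.0 * n / total) for hap, n in counts.most_common()]


if __name__ == "__main__":
    # Toy data: two variant sites at made-up positions 100 and 200.
    toy_sites = {100: "T", 200: "A"}
    toy_reads = (
        [{100: "T", 200: "G"}] * 6    # variant at 100 only
        + [{100: "T", 200: "A"}] * 3  # compound: both variants on one read
        + [{100: "C", 200: "G"}] * 1  # no variant (empty haplotype)
        + [{100: "T"}]                # does not span both sites, skipped
    )
    for hap, n, pct in quantify_haplotypes(toy_reads, toy_sites):
        print(hap, n, f"{pct:.1f}%")
```

The empty tuple stands for reads carrying none of the listed variants. Requiring a read to span all sites is the simplest possible choice here; a real tool would need a strategy for partially overlapping reads.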
Conclusions
cFinder implements an efficient algorithm that can be run on a low-performance desktop computer. Furthermore, it takes paired-end information into account (if available) and is generally open to any current next-generation sequencing technology and alignment strategy. To our knowledge, this is the first software that enables researchers without extensive bioinformatic support to define multiple haplotypes and determine how they contribute to a sample.
【 License 】
2015 Niklas et al.