Genome Biology
NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks
Mian Umair Ahsan [1], Qian Liu [1], Li Fang [1], Kai Wang [2]
[1] Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, 19104, Philadelphia, PA, USA; [2] Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, 19104, Philadelphia, PA, USA
Keywords: Variant calling; Long-range haplotype; Deep learning; Difficult-to-map regions
DOI: 10.1186/s13059-021-02472-2
Source: Springer
【 Abstract 】
Long-read sequencing enables variant detection in genomic regions that are considered difficult-to-map by short-read sequencing. To fully exploit the benefits of longer reads, here we present a deep learning method NanoCaller, which detects SNPs using long-range haplotype information, then phases long reads with called SNPs and calls indels with local realignment. Evaluation on 8 human genomes demonstrates that NanoCaller generally achieves better performance than competing approaches. We experimentally validate 41 novel variants in a widely used benchmarking genome, which could not be reliably detected previously. In summary, NanoCaller facilitates the discovery of novel variants in complex genomic regions from long-read sequencing.
【 License 】
CC BY