BMC Research Notes
PSR: polymorphic SSR retrieval
Nunzio D’Agostino [1], Concita Cantarella [1]
[1] Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria - Centro di ricerca per l’orticoltura, Via Cavalleggeri 25, Pontecagnano Faiano, 84098, Italy
Keywords: SAM/BAM format; NGS; Polymorphic microsatellites; Length polymorphism; Simple sequence repeats
DOI: 10.1186/s13104-015-1474-4
Received 2015-03-06, accepted 2015-09-21, published 2015
【 Abstract 】
Background
With the advent of high-throughput sequencing technologies, large-scale identification of microsatellites became affordable and was directed especially at non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by exploiting sequence redundancy. Few of the tools implemented so far for genotyping microsatellite repeats can manage huge amounts of sequence data and handle the SAM/BAM file format. Most of them were developed for, and tested on, human or model organisms with high-quality reference genomes.
Results
In this note we describe polymorphic SSR retrieval (PSR), a read counter and simple sequence repeat (SSR) length polymorphism detection tool. It is written in Perl and was developed to identify length polymorphisms in perfect microsatellites by exploiting next generation sequencing (NGS) data. PSR was developed with non-model plant species in mind, for which a de novo transcriptome assembly is generally the first sequence resource available for SSR mining. PSR is divided into two modules: the read-counting module (PSR_read_retrieval) identifies all the reads that cover the full length of perfect microsatellites; the comparative module (PSR_poly_finder) detects both heterozygous and homozygous alleles at each microsatellite locus across all genotypes under investigation. To call a length polymorphism and reduce the number of false positives, the user can define two threshold values: the minimum number of reads overlapping the repetitive stretch and the minimum read depth. The first parameter determines whether a microsatellite-containing sequence is processed at all, while the second is decisive for the identification of minor alleles. PSR was tested on two different case studies. The first aims at the identification of polymorphic SSRs in a set of de novo assembled transcripts defined by RNA sequencing of two different plant genotypes. The second investigates sequence variations within a collection of newly sequenced chloroplast genomes. In both cases, PSR results agree with those obtained by capillary gel separation.
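The two-module logic and the two thresholds described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not PSR's Perl code. It assumes a deliberately simplified setting: reads are plain strings and a read "covers the full length" of a repeat when both flanking sequences are found around it, whereas PSR itself works on SAM/BAM alignments. The function names, the flank-matching approach, and the default values of `min_reads` and `min_depth` are all hypothetical.

```python
import re
from collections import Counter


def spanning_repeat_length(read, motif, left_flank, right_flank):
    """Return the number of motif units if the read spans the whole perfect
    repeat (both flanks present around it), otherwise None.
    Flank matching is an illustrative stand-in for alignment coordinates."""
    pattern = (re.escape(left_flank)
               + "((?:" + re.escape(motif) + ")+)"
               + re.escape(right_flank))
    match = re.search(pattern, read)
    if match is None:
        return None
    return len(match.group(1)) // len(motif)


def call_alleles(reads, motif, left_flank, right_flank,
                 min_reads=5, min_depth=2):
    """Count spanning reads per repeat length, then apply two thresholds:
    min_reads decides whether the locus is processed at all,
    min_depth decides which repeat lengths are kept as alleles.
    Both default values are assumptions chosen for this example."""
    counts = Counter()
    for read in reads:
        units = spanning_repeat_length(read, motif, left_flank, right_flank)
        if units is not None:
            counts[units] += 1
    if sum(counts.values()) < min_reads:
        return None  # too few spanning reads: locus is skipped
    return sorted(units for units, depth in counts.items()
                  if depth >= min_depth)


def is_polymorphic(alleles_by_genotype):
    """A locus is flagged when the called allele sets differ between
    genotypes; genotypes without a call are ignored."""
    called = [tuple(a) for a in alleles_by_genotype.values() if a]
    return len(set(called)) > 1


if __name__ == "__main__":
    def make_read(units):
        return "ACGT" + "AG" * units + "TTC"

    genotype_a = [make_read(3)] * 6
    genotype_b = [make_read(3)] * 4 + [make_read(5)] * 3 + [make_read(4)]
    calls = {
        "A": call_alleles(genotype_a, "AG", "ACGT", "TTC"),
        "B": call_alleles(genotype_b, "AG", "ACGT", "TTC"),
    }
    print(calls, is_polymorphic(calls))
```

In this toy input, genotype B carries two repeat lengths supported by several reads each, while the single read with four units falls below `min_depth` and is discarded as a likely artefact. This is the role the abstract assigns to the second threshold.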
Conclusion
PSR was developed specifically to automate the gene-based and genome-wide identification of polymorphic microsatellites from NGS data. It overcomes the limitations of existing, time-consuming approaches based on tools developed in the pre-NGS era.
【 License 】
2015 Cantarella and D'Agostino.