BMC Bioinformatics
The taxonomic name resolution service: an online tool for automated standardization of plant names
Brad Boyle [3], Nicole Hopkins [9], Zhenyuan Lu [8], Juan Antonio Raygoza Garay [9], Dmitry Mozzherin [7], Tony Rees [6], Naim Matasci [9], Martha L Narro [9], William H Piel [5], Sheldon J McKay [8], Sonya Lowry [9], Chris Freeland [2], Robert K Peet [4], Brian J Enquist [1]
[1] The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA
[2] Missouri Botanical Garden, 4344 Shaw Blvd., St. Louis, MO, 63110, USA
[3] The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA
[4] Department of Biology, CB 3280, University of North Carolina, Chapel Hill, NC, 27599-3280, USA
[5] Yale-NUS College, 6 College Avenue East, Singapore, 138614, Singapore
[6] Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania, 7001, Australia
[7] Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA, 02543, USA
[8] Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724-2202, USA
[9] BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, AZ, 85721-0240, USA
Keywords: Plants; Taxonomy; Database integration; Biodiversity informatics
DOI: 10.1186/1471-2105-14-16
Received 2012-09-25, accepted 2013-01-02, published 2013
【 Abstract 】

Background

The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this ‘names problem’ has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science.

Results

The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets.
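
The match-then-resolve workflow described above can be illustrated with a toy sketch. This is not the TNRS implementation, which builds on existing name-parsing and fuzzy-matching applications. The sketch substitutes Python's standard `difflib` for approximate matching, and the three-entry reference table, the `resolve_name` function and the 0.8 similarity cutoff are hypothetical choices made only for illustration.

```python
from difflib import get_close_matches

# Hypothetical mini reference taxonomy (illustration only): name -> accepted name.
# An accepted name maps to itself; a synonym maps to its accepted name.
REFERENCE = {
    "Solanum lycopersicum": "Solanum lycopersicum",
    "Lycopersicon esculentum": "Solanum lycopersicum",  # synonym
    "Zea mays": "Zea mays",
}


def resolve_name(raw, reference=REFERENCE, cutoff=0.8):
    """Return (matched_name, accepted_name), or None if no name is close enough."""
    # Collapse repeated whitespace and normalise case to "Genus epithet".
    cleaned = " ".join(raw.split()).capitalize()
    # Approximate string match against every name in the reference table.
    candidates = get_close_matches(cleaned, list(reference), n=1, cutoff=cutoff)
    if not candidates:
        return None
    matched = candidates[0]
    # Convert a synonym to its accepted name (accepted names map to themselves).
    return matched, reference[matched]


if __name__ == "__main__":
    for name in ["Lycopersicon esculentun", "zea  mais", "Xyz abc"]:
        print(repr(name), "->", resolve_name(name))
```

In this sketch a misspelled synonym such as "Lycopersicon esculentun" is first matched to the reference spelling and then mapped to the accepted name, while a string with no sufficiently similar reference name is returned as unmatched rather than forced onto the nearest candidate. A production service would additionally parse authorities, ranks and annotations before matching, as the abstract describes.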

Conclusions

We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.

【 License 】

© 2013 Boyle et al.; licensee BioMed Central Ltd.
