BMC Bioinformatics
Integration of open access literature into the RCSB Protein Data Bank using BioLit
Andreas Prlić [1], Marco A Martinez [2], Dimitris Dimitropoulos [1], Bojan Beran [1], Benjamin T Yukich [1], Peter W Rose [1], Philip E Bourne [2], J Lynn Fink [2]
[1] San Diego Supercomputer Center, University of California San Diego, 9500 Gilman Drive, Mailcode 0505, La Jolla, CA 92093-0505, USA
[2] Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, Mailcode 0743, La Jolla, CA, 92093-0743 USA
DOI  :  10.1186/1471-2105-11-220
Received 2009-08-13, accepted 2010-04-29, published 2010
【 Abstract 】

Background

Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright restriction, this distinction between database and literature is blurring. To exploit this opportunity we present the integration of open access literature with the RCSB Protein Data Bank (PDB).

Results

BioLit provides an enhanced view of articles, with semantic markup and links to biological databases based on the content of each article. For example, words that match terms in existing biological ontologies are highlighted, and database identifiers are linked to their database of origin. Among other functions, it identifies PDB IDs mentioned in the open access literature by parsing the full text of all research articles in PubMed Central (PMC) and exposing the results as simple XML Web Services. Here, we integrate BioLit results with the RCSB PDB website by using these services to find PDB IDs that are mentioned in research articles and then retrieving the abstract, figures, and text excerpts for those articles. A new RCSB PDB literature view permits browsing through the figures and abstracts of the articles that mention a given structure. The BioLit Web Services that provide the underlying data are publicly accessible, and a Java client library is provided for querying them.
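
To make the idea of detecting PDB IDs in full text concrete, the following is a minimal sketch and not BioLit's actual implementation, which this record does not describe. It relies on the fact that classic PDB IDs are four characters long (a digit from 1 to 9 followed by three alphanumerics). The function name `find_pdb_ids`, the pattern, and the requirement of a nearby "PDB" cue word are all assumptions chosen for illustration.

```python
import re
from typing import List

# Classic PDB IDs: one digit (1-9) followed by three alphanumerics.
# Assumption for this sketch: only accept an ID that directly follows a
# "PDB" cue (optionally "PDB ID", "PDB code", "PDB entry"), so that
# ordinary four-character tokens such as years are not reported.
PDB_MENTION = re.compile(
    r"\bPDB(?:\s+(?:ID|code|entry))?\s*[:#]?\s*([1-9][A-Za-z0-9]{3})\b",
    re.IGNORECASE,
)


def find_pdb_ids(text: str) -> List[str]:
    """Return distinct PDB IDs mentioned in text, upper-cased, in order of first appearance."""
    found: List[str] = []
    for match in PDB_MENTION.finditer(text):
        pdb_id = match.group(1).upper()
        if pdb_id not in found:
            found.append(pdb_id)
    return found


if __name__ == "__main__":
    sample = "Hemoglobin (PDB ID: 4hhb) was compared with PDB 1TIM in 2010."
    print(find_pdb_ids(sample))
```

Requiring a cue word trades recall for precision: it skips bare IDs that appear without "PDB" nearby, which a production text-mining pipeline would likely handle with additional context or validation against the database.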

Conclusions

The integration between literature and websites, as demonstrated here with the RCSB PDB, provides a broader view of how a given structure has been analyzed and used. This approach detects the mention of a PDB structure even if it is not formally cited in the paper. Other structures related through the same literature references can also be identified, possibly providing new scientific insight. To our knowledge, this is the first time that database and literature have been integrated in this way, and it speaks to the opportunities afforded by open and free access to both database and literature content.

【 License 】

   
2010 Prlić et al; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.
