Journal Article Details
BMC Bioinformatics
S3QL: A distributed domain specific language for controlled semantic integration of life sciences data
Helena F Deus [8], Miriã C Correa [1], Romesh Stanislaus [2], Maria Miragaia [4], Wolfgang Maass [3], Hermínia de Lencastre [6], Ronan Fox [7], Jonas S Almeida [5]
[1] Laboratório Nacional de Computação Científica, Av. Getúlio Vargas, 333, Quitandinha, 25651-075 Petrópolis, Brasil
[2] Sanofi Pasteur, 38 Sidney Street, Cambridge, MA 02139, USA
[3] Research Center for Intelligent Media, Furtwangen University, Furtwangen, Germany
[4] Laboratory of Molecular Genetics, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Av. da República, Estação Agronómica Nacional, 2780-157 Oeiras, Portugal
[5] Division of Informatics, Department of Pathology, University of Alabama at Birmingham, 619 South 19th Street, Birmingham, Alabama, USA
[6] Laboratory of Microbiology, The Rockefeller University, 10021 New York, USA
[7] Digital Enterprise Research Institute, National University of Ireland at Galway, IDA Business Park, Lower Dangan, Galway, Ireland
[8] Biomathematics, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Av. da República, Estação Agronómica Nacional, 2780-157 Oeiras, Portugal
Keywords: knowledge organization system, policy; SPARQL; RDF; KOS; Linked Data; S3DB
Others  :  1127995
DOI  :  10.1186/1471-2105-12-285
Received 2011-04-14, accepted 2011-07-14, published 2011
【 Abstract 】

Background

The value and usefulness of data increase when it is explicitly interlinked with related data. This is the core principle of Linked Data. For life sciences researchers, harnessing the power of Linked Data to improve biological discovery is still challenged by a need to keep pace with rapidly evolving domains and requirements for collaboration and control as well as with the reference semantic web ontologies and standards. Knowledge organization systems (KOSs) can provide an abstraction for publishing biological discoveries as Linked Data without complicating transactions with contextual minutiae such as provenance and access control.

We have previously described the Simple Sloppy Semantic Database (S3DB) as an efficient model for creating knowledge organization systems using Linked Data best practices with explicit distinction between domain and instantiation and support for a permission control mechanism that automatically migrates between the two. In this report we present a domain specific language, the S3DB query language (S3QL), to operate on its underlying core model and facilitate management of Linked Data.

Results

Reflecting the data driven nature of our approach, S3QL has been implemented as an application programming interface for S3DB systems hosting biomedical data, and its syntax was subsequently generalized beyond the S3DB core model. This achievement is illustrated with the assembly of an S3QL query to manage entities from the Simple Knowledge Organization System. The illustrative use cases include gastrointestinal clinical trials, genomic characterization of cancer by The Cancer Genome Atlas (TCGA) and molecular epidemiology of infectious diseases.

Conclusions

S3QL was found to provide a convenient mechanism to represent context for interoperation between public and private datasets hosted at biomedical research institutions and linked data formalisms.

【 License 】

   
2011 Deus et al; licensee BioMed Central Ltd.
