Source Code for Biology and Medicine
Git can facilitate greater reproducibility and increased transparency in science
Karthik Ram [1]
[1] Environmental Science, Policy, and Management, University of California, Berkeley, Berkeley, CA 94720, USA
Keywords: Open science; Version control; Reproducible research
DOI: 10.1186/1751-0473-8-7
Received 2013-01-25, accepted 2013-02-06, published 2013
【 Abstract 】
Background
Reproducibility is the hallmark of good science. Maintaining a high degree of transparency in scientific reporting is essential not just for gaining trust and credibility within the scientific community but also for facilitating the development of new ideas. Sharing data and computer code associated with publications is becoming increasingly common, motivated partly by data deposition requirements from journals and mandates from funders. Despite this increase in transparency, it is still difficult to reproduce or build upon the findings of most scientific publications without access to a more complete workflow.
Findings
Version control systems (VCS), which have long been used to maintain code repositories in the software industry, are now finding new applications in science. One such open source VCS, Git, provides a lightweight yet robust framework that is ideal for managing the full suite of research outputs such as datasets, statistical code, figures, lab notes, and manuscripts. For individual researchers, Git provides a powerful way to track and compare versions, retrace errors, and explore new approaches in a structured manner, all while maintaining a full audit trail. For larger collaborative efforts, Git and Git hosting services make it possible for everyone to work asynchronously and merge their contributions at any time, while maintaining a complete authorship trail. In this paper I provide an overview of Git along with use cases that highlight how this tool can be leveraged to make science more reproducible and transparent, foster new collaborations, and support novel uses.
【 License 】
2013 Ram; licensee BioMed Central Ltd.