BMC Bioinformatics
Trends in the production of scientific data analysis resources
Proceedings
Constantin Georgescu [1], Jonathan D Wren [2], Jason Hennessey [3]
[1] Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, 73104-5005, Oklahoma City, OK, USA; Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, USA; University of Oklahoma Health Sciences Center, Stephenson Cancer Center, USA; Department of Geriatric Medicine, University of Oklahoma Health Sciences Center, USA; Computer Science Department, Boston University, 111 Cummington Street, 02215, Boston, MA, USA
Keywords: Gini Coefficient; Lorenz Curve; Team Size; European Bioinformatics Institute; Uniform Resource Locator
DOI: 10.1186/1471-2105-15-S11-S7
Source: Springer
【 Abstract 】
Background: As the amount of scientific data grows, peer-reviewed Scientific Data Analysis Resources (SDARs) such as published software programs, databases and web servers have had a strong impact on the productivity of scientific research. SDARs are typically linked to via an Internet URL, and URLs have been shown to decay in a time-dependent fashion. What is less clear is whether SDAR-producing group size or prior experience in SDAR production correlates with SDAR persistence, or whether certain institutions or regions account for a disproportionate number of peer-reviewed resources.

Methods: We first quantified the current availability of over 26,000 unique URLs published in MEDLINE abstracts/titles over the past 20 years, then extracted authorship, institutional and ZIP code data. We estimated which URLs were SDARs by using keyword proximity analysis.

Results: We identified 23,820 non-archival URLs produced between 1996 and 2013, of which 11,977 were classified as SDARs. Production of SDARs, as measured with the Gini coefficient, is more widely distributed among institutions (0.62) and ZIP codes (0.65) than scientific research in general, which tends to be disproportionately clustered within elite institutions (0.91) and ZIP codes (0.96). The top 1% of institutions produced an estimated 68% of published research but accounted for only 16% of SDARs. Some labs produced many SDARs (maximum detected = 64), but 74% of SDAR-producing authors have published only one SDAR. Interestingly, decayed SDARs have significantly fewer authors on average (4.33 +/- 3.06) than available SDARs (4.88 +/- 3.59) (p < 8.32 × 10⁻⁴). Approximately 3.4% of URLs, as published, contain errors in their entry/format, including DOIs and links to clinical trials registry numbers.

Conclusion: SDAR production is less dependent upon institutional location and resources than scientific research in general, and SDAR online persistence does not seem to be a function of infrastructure or expertise. Yet SDAR team size correlates positively with SDAR accessibility, suggesting that a sociological factor may be involved. While a detectable URL entry error rate of 3.4% is relatively low, it raises the question of whether this is a general error rate that extends to other published entities.
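
The Gini coefficients and top-1% shares reported in the Results can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `gini` and `top_share` and the sample counts are hypothetical, and the rank-weighted formula used here is one standard way to compute the coefficient from per-institution (or per-ZIP-code) output counts.

```python
def gini(values):
    """Gini coefficient of non-negative counts.

    0 means output is spread perfectly evenly; values approaching 1 mean
    output is concentrated in very few producers.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with x sorted ascending and ranks i starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(values, fraction=0.01):
    """Share of the total contributed by the top `fraction` of producers."""
    xs = sorted(values, reverse=True)
    k = max(1, int(len(xs) * fraction))  # always include at least one producer
    return sum(xs[:k]) / sum(xs)


# Hypothetical counts of resources per institution (illustration only).
counts = [100] + [1] * 99
print(f"Gini = {gini(counts):.2f}, top 1% share = {top_share(counts):.2f}")
```

On this reading, a lower coefficient for SDARs (0.62) than for research in general (0.91) means SDAR output is spread across more institutions.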
【 License 】
Unknown
© Hennessey et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.