International Journal of Health Geographics
An open source software for fast grid-based data-mining in spatial epidemiology (FGBASE)
Alain-Jacques Valleron [1], David M Baker [1]
[1] Université Pierre et Marie Curie, Paris, France
Keywords: Type 1 diabetes; Geographical grid; Software; Environmental factors; Cluster; Computational epidemiology
Others: 1135994; DOI: 10.1186/1476-072X-13-46
Received 2014-08-12, accepted 2014-09-24, published 2014
【 Abstract 】
Background
Examining whether disease cases are clustered in space is an important part of epidemiological research. Another important part of spatial epidemiology is testing whether patients suffering from a disease are more, or less, exposed to environmental factors of interest than adequately defined controls. Both approaches involve determining the number of cases and controls (or population at risk) in specific zones. For cluster searches, this often must be done for millions of different zones, and doing it by calculating distances can lead to very lengthy computations. In this work we discuss the computational advantages of geographical grid-based methods and introduce open source software (FGBASE) that we have created for this purpose.
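To illustrate why a grid can make zone counts cheap, the following minimal sketch bins cases into grid cells once and then answers any rectangular-window count with four lookups in a cumulative-sum table. This is one common technique, shown here under assumptions: the abstract does not say that FGBASE uses it, and the names `bin_cases`, `build_prefix_sums` and `window_count` are hypothetical.

```python
def bin_cases(cells, n_rows, n_cols):
    """Count cases per grid cell; `cells` is an iterable of (row, col) indices."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for row, col in cells:
        grid[row][col] += 1
    return grid


def build_prefix_sums(grid):
    """Return a (rows+1) x (cols+1) table where entry [r][c] is the sum of
    all cells in grid[0:r][0:c]. Built once, in time proportional to grid size."""
    n_rows = len(grid)
    n_cols = len(grid[0]) if n_rows else 0
    prefix = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for r in range(n_rows):
        running = 0  # sum of the current row up to column c
        for c in range(n_cols):
            running += grid[r][c]
            prefix[r + 1][c + 1] = prefix[r][c + 1] + running
    return prefix


def window_count(prefix, r0, c0, r1, c1):
    """Number of cases in the window with inclusive corners (r0, c0) and (r1, c1).
    Four table lookups, independent of the window size."""
    return (prefix[r1 + 1][c1 + 1] - prefix[r0][c1 + 1]
            - prefix[r1 + 1][c0] + prefix[r0][c0])


if __name__ == "__main__":
    # Hypothetical case counts on a 3 x 3 grid (illustrative data only).
    grid = [[1, 0, 2],
            [0, 3, 0],
            [4, 0, 5]]
    prefix = build_prefix_sums(grid)
    print(window_count(prefix, 0, 0, 1, 1))  # top-left 2 x 2 window
```

With distance calculations, each candidate zone requires a pass over the cases; with the table, the per-zone cost is constant, which is the kind of saving that matters when millions of zones are scanned. The sketch covers rectangular windows only.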
Methods
Geographical grids based on the Lambert Azimuthal Equal Area projection are well suited for spatial epidemiology because they preserve area: each cell of the grid has the same area. We describe how data are projected onto such a grid, as well as grid-based algorithms for spatial epidemiological data-mining. The software program that we have developed (FGBASE) implements these grid-based methods.
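As a rough illustration of projecting a point onto such a grid, the sketch below applies the spherical form of the Lambert Azimuthal Equal Area projection and then divides by the cell size to obtain integer cell indices. The projection centre and offsets are assumed to follow the ETRS89-LAEA Europe definition (EPSG:3035); the abstract does not state which parameters FGBASE uses. The function names and the 1 km default cell size are hypothetical.

```python
import math

# Assumptions, not taken from the abstract: EPSG:3035 centre and offsets,
# and a spherical Earth with the authalic radius. The official definition is
# ellipsoidal, so coordinates from this sketch are only approximate.
EARTH_RADIUS_M = 6371007.2
LAT0_DEG, LON0_DEG = 52.0, 10.0
FALSE_EASTING_M, FALSE_NORTHING_M = 4321000.0, 3210000.0


def laea_project(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to LAEA easting/northing in metres."""
    lat = math.radians(lat_deg)
    lat0 = math.radians(LAT0_DEG)
    dlon = math.radians(lon_deg) - math.radians(LON0_DEG)
    # The denominator is zero only at the point opposite the projection centre.
    denom = (1.0 + math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    k = math.sqrt(2.0 / denom)
    x = EARTH_RADIUS_M * k * math.cos(lat) * math.sin(dlon)
    y = EARTH_RADIUS_M * k * (math.cos(lat0) * math.sin(lat)
                              - math.sin(lat0) * math.cos(lat) * math.cos(dlon))
    return x + FALSE_EASTING_M, y + FALSE_NORTHING_M


def to_cell(lat_deg, lon_deg, cell_size_m=1000.0):
    """Return (column, row) indices of the equal-area cell containing the point."""
    x, y = laea_project(lat_deg, lon_deg)
    return math.floor(x / cell_size_m), math.floor(y / cell_size_m)


if __name__ == "__main__":
    # Approximate coordinates of Paris, used only as an example input.
    print(to_cell(48.8566, 2.3522))
```

Because the projection preserves area, every cell produced this way covers the same ground area, so counts per cell are directly comparable. For real analyses a projection library that implements the ellipsoidal formulas would be the safer choice.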
Results
The grid-based algorithms are extremely fast, particularly for cluster searches. When applied to a cohort of French Type 1 Diabetes (T1D) patients, as an example, the grid-based algorithms detected potential clusters in a few seconds on a modern laptop. This compares very favorably to an equivalent cluster search using distance calculations instead of a grid, which took over 4 hours on the same computer. In the case study we discovered 4 potential clusters of T1D cases near the cities of Le Havre, Dunkerque, Toulouse and Nantes. One example of environmental analysis with our software was to study whether a significant association could be found between T1D and distance to vineyards with heavy pesticide use. None was found. In both examples, the software facilitates the rapid testing of hypotheses.
Conclusions
Grid-based algorithms for mining spatial epidemiological data reduce computational complexity and thereby speed up computations. We believe that these methods and this software tool (FGBASE) will lower the computational barriers to entry for those performing epidemiological research.
【 License 】
2014 Baker and Valleron; licensee BioMed Central Ltd.