International Journal of Health Geographics
Testing for clustering at many ranges inflates family-wise error rate (FWE)
Leslie A McClure [1], Matthew Shane Loop [1]
[1] Department of Biostatistics, University of Alabama at Birmingham, 1665 University Boulevard, RPHB 327, 35294 Birmingham, Alabama, USA
Keywords: Multiple testing; Family-wise error rate (FWE); Point process; Overall clustering; Ripley’s K function
Others: 1133639; DOI: 10.1186/1476-072X-14-4
Received 2014-11-03, accepted 2015-01-05, published 2015
【 Abstract 】
Background
Testing for clustering at multiple ranges within a single dataset is common practice in spatial epidemiology. Whether this practice affects the type I error rate has not been documented.
Methods
We estimated the family-wise error rate (FWE) of the difference in Ripley’s K functions test when testing at an increasing number of ranges, each at an alpha level of 0.05. Case and control locations were generated from a Cox process on a square area the size of the continental US (≈3,000,000 mi²). Two thousand Monte Carlo replicates were used to estimate the FWE, with 95% confidence intervals, when testing for clustering at one range, as well as at 10, 50, and 100 equidistant ranges.
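The simulation design described above can be illustrated with a much smaller sketch. It is not the authors' code, and it makes several simplifying assumptions: points are drawn uniformly on a unit square rather than from a Cox process, the K function estimate has no edge correction, each range is tested with a one-sided random labelling (permutation) test, and the function names and default sizes (`estimate_fwe`, `n_reps`, `n_perm`, `max_range`) are hypothetical. A replicate counts as a family-wise error if any range is significant at the chosen alpha.

```python
import numpy as np

def k_difference(dist, is_case, ranges):
    """Naive K_cases(s) - K_controls(s) at each range s (unit-area window, no edge correction)."""
    out = np.zeros(len(ranges))
    for sign, mask in ((1.0, is_case), (-1.0, ~is_case)):
        sub = dist[np.ix_(mask, mask)]
        n = sub.shape[0]
        pair_d = np.sort(sub[np.triu_indices(n, k=1)])      # each unordered pair once
        counts = np.searchsorted(pair_d, ranges, side="right")  # pairs with distance <= s
        out += sign * 2.0 * counts / (n * (n - 1))
    return out

def pointwise_pvalues(points, is_case, ranges, n_perm, rng):
    """One-sided random labelling p-value at each range, tested separately."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    observed = k_difference(dist, is_case, ranges)
    null = np.array([k_difference(dist, rng.permutation(is_case), ranges)
                     for _ in range(n_perm)])
    return ((null >= observed).sum(axis=0) + 1) / (n_perm + 1)

def estimate_fwe(n_ranges, n_reps=200, n_points=60, n_perm=99,
                 max_range=0.25, alpha=0.05, seed=1):
    """Share of null replicates in which at least one range is declared significant."""
    rng = np.random.default_rng(seed)
    ranges = np.linspace(max_range / n_ranges, max_range, n_ranges)  # equidistant ranges
    is_case = np.arange(n_points) < n_points // 2   # half cases, half controls
    rejections = 0
    for _ in range(n_reps):
        points = rng.random((n_points, 2))          # null: no clustering of cases
        p = pointwise_pvalues(points, is_case, ranges, n_perm, rng)
        rejections += bool((p <= alpha).any())
    return rejections / n_reps
```

Calling `estimate_fwe(1)` and `estimate_fwe(10)` with the same seed reuses the same simulated datasets and permutations, so the second estimate can only be equal to or larger than the first. The absolute values from this toy setting should not be expected to match the published estimates.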
Results
The estimated FWE and 95% confidence intervals when testing 10, 50, and 100 ranges were 0.22 (0.20 - 0.24), 0.34 (0.31 - 0.36), and 0.36 (0.34 - 0.38), respectively.
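Two quick calculations help put these numbers in context. The sketch below assumes a normal-approximation (Wald) interval for a proportion estimated from 2,000 replicates; the abstract does not say which interval method was used, so this is only a plausibility check, and both function names are hypothetical. It also computes the FWE that would result if the tests at different ranges were independent, 1 - (1 - 0.05)^m, as a reference point.

```python
import math

def wald_interval(p_hat, n_reps, z=1.96):
    """Normal-approximation 95% interval for an estimated proportion (assumed method)."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n_reps)
    return p_hat - half_width, p_hat + half_width

def fwe_if_independent(n_tests, alpha=0.05):
    """FWE for n_tests independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n_tests, fwe_hat in ((10, 0.22), (50, 0.34), (100, 0.36)):
    low, high = wald_interval(fwe_hat, 2000)
    print(n_tests, round(low, 3), round(high, 3),
          round(fwe_if_independent(n_tests), 3))
```

Under independence the FWE would be roughly 0.40, 0.92, and 0.99 for 10, 50, and 100 tests. The reported estimates are well below these values, which is what one would expect if tests at neighbouring ranges are strongly correlated, yet they remain far above the nominal 0.05.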
Conclusions
Testing for clustering at multiple ranges within a single dataset inflated the FWE above the nominal level of 0.05. Investigators should construct simultaneous critical envelopes (available in the spatstat package in R), or use a test statistic that integrates the test statistics from each range, as suggested by the creators of the difference in Ripley’s K functions test.
【 License 】
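The second recommendation, a single statistic that combines all ranges, might look like the following sketch. It is an illustration under stated assumptions rather than the original test: the K estimate has no edge correction and assumes a unit-area window, the standard deviation at each range is estimated from the random labelling draws instead of an analytic variance, and `integrated_test` is a hypothetical name. Because only one p-value is produced for the whole set of ranges, there is no multiplicity across ranges to correct for.

```python
import numpy as np

def k_difference(points, is_case, ranges):
    """Naive K_cases(s) - K_controls(s) at each range s (unit-area window, no edge correction)."""
    out = np.zeros(len(ranges))
    for sign, mask in ((1.0, is_case), (-1.0, ~is_case)):
        pts = points[mask]
        n = len(pts)
        d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
        pair_d = np.sort(d[np.triu_indices(n, k=1)])    # each unordered pair once
        out += sign * 2.0 * np.searchsorted(pair_d, ranges, side="right") / (n * (n - 1))
    return out

def integrated_test(points, is_case, ranges, n_perm=199, seed=0):
    """One random labelling p-value for the sum of standardized differences over all ranges."""
    rng = np.random.default_rng(seed)
    observed = k_difference(points, is_case, ranges)
    null = np.array([k_difference(points, rng.permutation(is_case), ranges)
                     for _ in range(n_perm)])
    all_d = np.vstack([observed, null])        # row 0 is the observed labelling
    sd = all_d.std(axis=0, ddof=1)             # per-range spread (assumed estimator)
    sd[sd == 0] = np.inf                       # ranges with no variation contribute zero
    stat = (all_d / sd).sum(axis=1)            # integrated statistic per labelling
    return float((stat >= stat[0]).mean())     # observed labelling counts itself
```

For example, with 40 controls spread uniformly over the unit square and 40 cases packed tightly around one location, `integrated_test(points, is_case, np.linspace(0.02, 0.2, 10))` should return a small p-value, while points labelled at random should not.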
2015 Loop and McClure; licensee BioMed Central.
【 Figures 】
Figure 1.
Figure 2.