Journal article details
International Journal of Health Geographics
Testing for clustering at many ranges inflates family-wise error rate (FWE)
Leslie A McClure [1], Matthew Shane Loop [1]
[1] Department of Biostatistics, University of Alabama at Birmingham, 1665 University Boulevard, RPHB 327, 35294 Birmingham, Alabama, USA
Keywords: Multiple testing; Family wise error rate (FWE); Point process; Overall clustering; Ripley’s K function
DOI  :  10.1186/1476-072X-14-4
Received 2014-11-03, accepted 2015-01-05, published 2015
【 Abstract 】

Background

Testing for clustering at multiple ranges within a single dataset is a common practice in spatial epidemiology. It is not documented whether this approach has an impact on the type 1 error rate.

Methods

We estimated the family-wise error rate (FWE) for the difference in Ripley’s K functions test when testing at an increasing number of ranges at an alpha-level of 0.05. Case and control locations were generated from a Cox process on a square area the size of the continental US (≈3,000,000 mi²). Two thousand Monte Carlo replicates were used to estimate the FWE with 95% confidence intervals when testing for clustering at one range, as well as 10, 50, and 100 equidistant ranges.
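
The simulation design described above can be illustrated with a minimal Python sketch. It is not the authors' code and simplifies their setup in several labelled ways: points are uniform on a unit square rather than drawn from a Cox process, Ripley's K is estimated without edge correction, the null distribution comes from random relabelling of cases and controls, and each range is tested one-sided. All function names and default sizes are hypothetical.

```python
import numpy as np

def k_hat(dist, ranges, area=1.0):
    """Naive Ripley's K estimate (no edge correction) from a pairwise distance matrix."""
    n = dist.shape[0]
    # Count ordered pairs (i != j) within each range; subtract the n zero diagonal entries.
    within = (dist[:, :, None] <= ranges).sum(axis=(0, 1)) - n
    return area * within / (n * (n - 1))

def k_difference(dist, is_case, ranges):
    """D(r) = K_cases(r) - K_controls(r) for one labelling of the points."""
    cases = np.flatnonzero(is_case)
    controls = np.flatnonzero(~is_case)
    return (k_hat(dist[np.ix_(cases, cases)], ranges)
            - k_hat(dist[np.ix_(controls, controls)], ranges))

def rejects_at_any_range(dist, is_case, ranges, n_perm, alpha, rng):
    """Pointwise random-labelling test at every range; True if any range rejects."""
    observed = k_difference(dist, is_case, ranges)
    permuted = np.array([k_difference(dist, rng.permutation(is_case), ranges)
                         for _ in range(n_perm)])
    # One-sided Monte Carlo p-value per range (assumption: large D means case clustering).
    p_values = (1 + (permuted >= observed).sum(axis=0)) / (n_perm + 1)
    return bool((p_values <= alpha).any())

def estimate_fwe(n_ranges, n_reps=100, n_points=40, n_perm=99, alpha=0.05, seed=0):
    """Fraction of null datasets in which at least one range is declared significant."""
    rng = np.random.default_rng(seed)
    ranges = np.linspace(0.05, 0.5, n_ranges)  # equidistant ranges (hypothetical limits)
    rejections = 0
    for _ in range(n_reps):
        # Null data: uniform points with labels assigned at random (half are cases).
        points = rng.random((n_points, 2))
        dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
        is_case = rng.permutation(n_points) < n_points // 2
        rejections += rejects_at_any_range(dist, is_case, ranges, n_perm, alpha, rng)
    return rejections / n_reps

if __name__ == "__main__":
    for n_ranges in (1, 10, 50):
        print(n_ranges, estimate_fwe(n_ranges))
```

Calling `estimate_fwe` with a growing number of ranges shows how the chance of at least one false rejection changes when every range is tested at the same alpha-level. The sizes here are kept small so the sketch runs quickly; the study itself used two thousand replicates.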

Results

The estimated FWE and 95% confidence intervals when testing 10, 50, and 100 ranges were 0.22 (0.20 - 0.24), 0.34 (0.31 - 0.36), and 0.36 (0.34 - 0.38), respectively.
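
As a rough check on these figures, the sketch below computes a normal-approximation (Wald) binomial interval for each reported FWE, using the 2,000 replicates stated in the Methods. It also computes the FWE that fully independent tests would give, 1 − (1 − α)^k. The abstract does not say which interval method was used, so the Wald interval is an assumption, and the helper names are hypothetical.

```python
import math

def wald_interval(p_hat, n, z=1.959964):
    """Normal-approximation 95% interval for a proportion estimated from n replicates."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def fwe_if_independent(n_tests, alpha=0.05):
    """FWE for n_tests independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

if __name__ == "__main__":
    for n_ranges, fwe in [(10, 0.22), (50, 0.34), (100, 0.36)]:
        low, high = wald_interval(fwe, 2000)
        print(f"{n_ranges:>3} ranges: FWE {fwe:.2f}, "
              f"interval ({low:.2f}, {high:.2f}), "
              f"independent-test value {fwe_if_independent(n_ranges):.2f}")
```

Because the reported point estimates are rounded, recomputed intervals may differ from the published ones in the last digit. The reported FWEs are well below the independent-test values, which is what one would expect if tests at neighbouring ranges are positively correlated.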

Conclusions

Testing for clustering at multiple ranges within a single dataset inflated the FWE above the nominal level of 0.05. Investigators should construct simultaneous critical envelopes (available in spatstat package in R), or use a test statistic that integrates the test statistics from each range, as suggested by the creators of the difference in Ripley’s K functions test.
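
The two remedies named above can be sketched in Python on generic curves of test statistics. The integrated statistic sums the standardized statistic over ranges, in the spirit of the statistic proposed with the difference in K functions test. The maximum statistic corresponds to a one-sided simultaneous envelope. This is an illustration, not the spatstat implementation: the standard deviations are estimated empirically from the pooled curves, the null curves are synthetic random walks standing in for D(r), and all names are hypothetical.

```python
import numpy as np

def monte_carlo_p(stat_obs, stat_sim):
    """One-sided Monte Carlo p-value of an observed statistic against simulated ones."""
    return (1 + np.sum(stat_sim >= stat_obs)) / (len(stat_sim) + 1)

def global_tests(observed, simulated):
    """Return (integrated, maximum) p-values; each is one test over all ranges."""
    curves = np.vstack([observed, simulated])
    # Pooled empirical standard deviation per range (assumption: not an analytic variance).
    sd = curves.std(axis=0, ddof=1)
    scaled = curves / sd
    integrated = scaled.sum(axis=1)  # sum of standardized statistics over ranges
    maximum = scaled.max(axis=1)     # largest standardized statistic over ranges
    return (monte_carlo_p(integrated[0], integrated[1:]),
            monte_carlo_p(maximum[0], maximum[1:]))

def pointwise_any_reject(observed, simulated, alpha=0.05):
    """Separate test at every range; True if any range rejects."""
    p = (1 + (simulated >= observed).sum(axis=0)) / (simulated.shape[0] + 1)
    return bool((p <= alpha).any())

def compare_error_rates(n_reps=1000, n_sim=99, n_ranges=50, alpha=0.05, seed=1):
    """Estimate the FWE of pointwise testing and of the two global tests under the null."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(3)
    for _ in range(n_reps):
        # Synthetic null curves: random walks, so neighbouring ranges are correlated.
        curves = rng.standard_normal((n_sim + 1, n_ranges)).cumsum(axis=1)
        observed, simulated = curves[0], curves[1:]
        p_int, p_max = global_tests(observed, simulated)
        hits += [pointwise_any_reject(observed, simulated, alpha),
                 p_int <= alpha, p_max <= alpha]
    return dict(zip(("pointwise", "integrated", "maximum"), hits / n_reps))

if __name__ == "__main__":
    print(compare_error_rates())
```

Both global statistics reduce each curve to a single number before ranking it against the simulations, so only one test is carried out per dataset regardless of how many ranges are examined.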

【 License 】

   
2015 Loop and McClure; licensee BioMed Central.

【 Figures 】

Figure 1.

Figure 2.
