Journal article details
BMC Medical Research Methodology
t-tests, non-parametric tests, and large studies—a paradox of statistical practice?
Morten W Fagerland [1]
[1] Unit of Biostatistics and Epidemiology, Oslo University Hospital, Oslo, N-0407, Norway
Keywords: Statistical practice; Sample size; Welch test; Wilcoxon-Mann-Whitney test; Non-parametric test; T-test
DOI: 10.1186/1471-2288-12-78
Received 2012-01-11, accepted 2012-06-14, published 2012
【 Abstract 】

Background

During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences.

Methods

A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation.
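The kind of simulation described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the author's code: the lognormal distribution, the rescaling about the mean, the function names (`welch_p`, `wmw_p`, `rejection_rates`), and the normal approximations for both p-values are all choices made here. Note that rescaling about the mean keeps the means equal but not the medians, so it is simpler than the design in the paper.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_p(x, y):
    """Two-sided p-value for the unequal-variance (Welch) t statistic.

    Assumption: a standard normal reference distribution is used instead of
    the t distribution, which is reasonable only for large samples.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def wmw_p(x, y):
    """Two-sided WMW p-value via the large-sample normal approximation.

    Assumption: continuous data, so no tie correction is applied.
    """
    m, n = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(i + 1 for i, (_, g) in enumerate(pooled) if g == 0)
    u = rank_sum_x - m * (m + 1) / 2
    sd_u = math.sqrt(m * n * (m + n + 1) / 12)
    z = (u - m * n / 2) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rates(n, sd_ratio, sigma, reps, seed=1):
    """Proportion of p < 0.05 for each test over `reps` simulated studies.

    Group X is lognormal(0, sigma). Group Y is an independent lognormal draw
    rescaled about the common mean, so its SD is `sd_ratio` times that of X
    while the means stay equal (Y may take negative values).
    """
    rng = random.Random(seed)
    mu_x = math.exp(sigma ** 2 / 2)  # mean of lognormal(0, sigma)
    hits_t = hits_w = 0
    for _ in range(reps):
        x = [rng.lognormvariate(0, sigma) for _ in range(n)]
        y = [mu_x + sd_ratio * (rng.lognormvariate(0, sigma) - mu_x)
             for _ in range(n)]
        hits_t += welch_p(x, y) < 0.05
        hits_w += wmw_p(x, y) < 0.05
    return hits_t / reps, hits_w / reps
```

Calling, for example, `rejection_rates(n=1000, sd_ratio=1.1, sigma=1.0, reps=500)` returns the pair (t-test rate, WMW rate) for one configuration; repeating the call over a grid of `n` and `sigma` values gives a rough picture of how the two rates diverge as sample size and skewness grow.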

Results

The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p < 0.05 with the WMW test can be greater than 90% if the standard deviations differ by 10% and the number of observations is 1000 in each group. The high rejection rates of the WMW test should be interpreted as the power to detect that the probability that a random observation from one of the distributions is less than a random observation from the other distribution is greater than 50%.
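The probability that the WMW test responds to can be estimated directly from two samples as the fraction of cross-group pairs in which the first value is smaller. The sketch below is illustrative only; the function name `prob_x_less_than_y` is hypothetical, and ties are counted as "not less", which is appropriate only for continuous data.

```python
from bisect import bisect_left


def prob_x_less_than_y(x, y):
    """Estimate P(X < Y) as the share of (x_i, y_j) pairs with x_i < y_j."""
    xs = sorted(x)
    # bisect_left(xs, v) counts the x values strictly below v.
    wins = sum(bisect_left(xs, v) for v in y)
    return wins / (len(x) * len(y))
```

An estimate close to 0.5 means neither group tends to produce smaller values; with large samples, even a small departure from 0.5 is enough for the WMW test to reject, whether or not the means differ.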

Conclusions

Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.

【 License 】

   
2012 Fagerland; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.

Figure 2.

Figure 3.
