BMC Medical Research Methodology
t-tests, non-parametric tests, and large studies—a paradox of statistical practice?
Morten W Fagerland [1]
[1] Unit of Biostatistics and Epidemiology, Oslo University Hospital, Oslo, N-0407, Norway
Keywords: Statistical practice; Sample size; Welch test; Wilcoxon-Mann-Whitney test; Non-parametric test; t-test
Other ID: 1136637; DOI: 10.1186/1471-2288-12-78
Received 2012-01-11; accepted 2012-06-14; published 2012
Abstract
Background
During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences.
Methods
A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation.
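The simulation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's code. The abstract does not name the distributions, so the pair below is a hypothetical construction: X is lognormal, and Y is a shifted, rescaled lognormal with a different shape parameter, tuned so that both have median 1, equal means, and a 10% larger standard deviation for Y. Both tests use large-sample normal approximations. The Welch statistic is referred to a standard normal rather than a t distribution, and the WMW statistic U uses its no-ties normal approximation without continuity correction. The helper names `matched_pair`, `welch_p`, `wmw_p`, and `rejection_rates` are hypothetical.

```python
import math
import random
import statistics


def lognormal_shape(s):
    """Return (mean - 1, SD) of exp(s * Z) with Z ~ N(0, 1); its median is 1."""
    excess = math.expm1(s * s / 2)
    sd = math.sqrt(math.expm1(s * s) * math.exp(s * s))
    return excess, sd


def matched_pair(s1, sd_ratio):
    """Hypothetical design: X = exp(s1*Z), Y = a + b*exp(s2*Z').

    Both have median 1 and equal means, and SD(Y) = sd_ratio * SD(X).
    Returns (s2, a, b).
    """
    excess1, sd1 = lognormal_shape(s1)

    def params(s2):
        excess2, sd2 = lognormal_shape(s2)
        b = excess1 / excess2            # equal means: 1 + b*excess2 = 1 + excess1
        return 1 - b, b, b * sd2 / sd1   # a = 1 - b keeps the median at a + b = 1

    # Assumed: the SD ratio exceeds sd_ratio at lo and equals 1 at hi = s1.
    lo, hi = 1e-3, s1
    for _ in range(200):                 # bisection on s2
        mid = (lo + hi) / 2
        if params(mid)[2] > sd_ratio:
            lo = mid
        else:
            hi = mid
    s2 = (lo + hi) / 2
    a, b, _ = params(s2)
    return s2, a, b


def two_sided_normal_p(z):
    """Two-sided p-value against a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2))


def welch_p(x, y):
    """Welch statistic with a normal (large-sample) reference distribution."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return two_sided_normal_p((statistics.fmean(x) - statistics.fmean(y)) / se)


def wmw_p(x, y):
    """WMW test via the normal approximation to U (assumes no ties)."""
    nx, ny = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = rank_sum_x - nx * (nx + 1) / 2
    z = (u - nx * ny / 2) / math.sqrt(nx * ny * (nx + ny + 1) / 12)
    return two_sided_normal_p(z)


def rejection_rates(n, reps, s1=1.0, sd_ratio=1.1, alpha=0.05, seed=1):
    """Proportion of replicates with p < alpha, as (Welch, WMW)."""
    rng = random.Random(seed)
    s2, a, b = matched_pair(s1, sd_ratio)
    rej_t = rej_w = 0
    for _ in range(reps):
        x = [rng.lognormvariate(0.0, s1) for _ in range(n)]
        y = [a + b * rng.lognormvariate(0.0, s2) for _ in range(n)]
        rej_t += welch_p(x, y) < alpha
        rej_w += wmw_p(x, y) < alpha
    return rej_t / reps, rej_w / reps
```

For example, `rejection_rates(1000, reps=500)` returns the proportions of replicates with p < 0.05 for the Welch test and the WMW test at 1000 observations per group. Repeating this for increasing `n` traces how the two rejection rates behave in this hypothetical design. The numbers need not match the paper's, since its distributions may differ.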
Results
The WMW test produces, on average, smaller p-values than the t-test, and the discrepancy grows with sample size, skewness, and difference in spread. For heavily skewed data with standard deviations that differ by 10% and 1000 observations in each group, the WMW test can give p < 0.05 in more than 90% of samples. These high rejection rates should be interpreted as the power to detect that the probability that a random observation from one distribution is less than a random observation from the other exceeds 50%.
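The probability behind this interpretation can be estimated directly. In the sketch below (function names hypothetical), the first function counts, over all pairs, how often an observation from the first group is smaller than one from the second, with ties counted as one half. The second obtains the same estimate from the rank sum, which is how the WMW statistic U is computed; it assumes no ties. The hypothesis the abstract ties the WMW rejections to is whether this probability differs from 0.5.

```python
def prob_less_pairs(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) by comparing every (x, y) pair."""
    score = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                score += 1.0
            elif xi == yj:
                score += 0.5
    return score / (len(x) * len(y))


def prob_less_ranks(x, y):
    """Same estimate from the rank sum of x in O(n log n); assumes no ties."""
    nx, ny = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u_x = rank_sum_x - nx * (nx + 1) / 2   # number of pairs with x > y
    return 1 - u_x / (nx * ny)


x = [1, 4, 6]
y = [2, 3, 5, 7]
print(prob_less_pairs(x, y))   # 7/12, about 0.583
print(prob_less_ranks(x, y))   # same value from the rank sum
```

The pairwise version is quadratic and only suitable for small samples; the rank version is what makes the WMW test cheap for studies with 1000 observations per group.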
Conclusions
Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
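For the large-sample setting recommended here, a Welch-type confidence interval for the difference in means is simple to compute. The sketch below (function name `welch_ci` is hypothetical) uses the Welch standard error and, as a simplifying assumption, a standard normal quantile in place of the t quantile with Welch-Satterthwaite degrees of freedom. It also returns those degrees of freedom so the approximation can be checked.

```python
import math
import statistics


def welch_ci(x, y, level=0.95):
    """Return (mean difference, (lower, upper), Welch-Satterthwaite df).

    Assumption: the normal quantile stands in for the t quantile, which is
    reasonable only when df is large.
    """
    nx, ny = len(x), len(y)
    vx_n = statistics.variance(x) / nx   # squared standard error of mean(x)
    vy_n = statistics.variance(y) / ny   # squared standard error of mean(y)
    se = math.sqrt(vx_n + vy_n)
    df = (vx_n + vy_n) ** 2 / (vx_n ** 2 / (nx - 1) + vy_n ** 2 / (ny - 1))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    diff = statistics.fmean(x) - statistics.fmean(y)
    return diff, (diff - z * se, diff + z * se), df
```

With about 1000 observations per group the returned df is large, so the normal quantile is a close stand-in for the t quantile; for small samples, a t quantile should be used instead.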
License
2012 Fagerland; licensee BioMed Central Ltd.
Figures
Figure 1.
Figure 2.
Figure 3.