Journal Article Details
Frontiers in Applied Mathematics and Statistics
Finite Sample Corrections for Parameters Estimation and Significance Testing
Tay, Darrell JiaJie [1]; Cheong, Siew Ann [1]; Li, Sai Ping [2]; Teh, Boon Kin [2]
[1]Complexity Institute, Nanyang Technological University, Singapore
[2]Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
Keywords: Significance testing; Finite Sample Effects; Curve fitting; maximum likelihood; Distribution Noise
DOI: 10.3389/fams.2018.00002
Subject classification: Mathematics (General)
Source: Frontiers
【 Abstract 】
An increasingly important problem in the era of Big Data is fitting data to distributions. However, many stop at visually inspecting the fits or use the coefficient of determination as a measure of the goodness of fit. In general, goodness-of-fit measures do not allow us to tell which of several distributions fits the data best. Also, the likelihood of drawing the data from a distribution can be low even when the fit is good. To overcome these limitations, Clauset et al. advocated a three-step procedure for fitting any distribution: (i) estimate the parameter(s) accurately, (ii) choose and calculate an appropriate goodness-of-fit measure, and (iii) test its significance to determine how likely this goodness of fit is to appear in samples drawn from the distribution. When we perform this significance testing on exponential distributions, we often obtain low significance values despite the fits being visually good. This led to our realization that most fitting methods do not account for effects due to the finite number of elements and the finite largest element. The former produces a sample-size dependence in the goodness of fit, and the latter introduces a bias in the estimated parameter and the goodness of fit. We propose modifications to account for both and show that these corrections improve the significance of the fits of both real and simulated data. In addition, we use simulations and analytical approximations to verify that the convergence rate of the estimated parameters towards their true values depends on how fast the largest element converges to infinity, and we provide fast inversion formulas to obtain p-values directly from the adjusted test statistics, in place of running more Monte Carlo simulations.
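The three-step procedure described in the abstract can be sketched for an exponential distribution as follows. This is a minimal illustration of the uncorrected baseline only, not the finite-sample corrections the paper proposes. It assumes the Kolmogorov-Smirnov distance as the goodness-of-fit measure (the abstract does not name one), and the function names `fit_exponential_rate`, `ks_statistic`, and `monte_carlo_p_value` are hypothetical.

```python
import math
import random


def fit_exponential_rate(sample):
    """Step (i): maximum-likelihood estimate of the rate (1 / sample mean)."""
    return len(sample) / sum(sample)


def ks_statistic(sample, rate):
    """Step (ii): Kolmogorov-Smirnov distance between the empirical CDF
    and the fitted exponential CDF F(x) = 1 - exp(-rate * x).
    The choice of the KS distance is an assumption of this sketch."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


def monte_carlo_p_value(sample, n_sim=500, seed=0):
    """Step (iii): fraction of synthetic samples whose KS distance is
    at least as large as the observed one."""
    rng = random.Random(seed)
    n = len(sample)
    rate = fit_exponential_rate(sample)
    d_obs = ks_statistic(sample, rate)
    exceed = 0
    for _ in range(n_sim):
        synthetic = [rng.expovariate(rate) for _ in range(n)]
        # Refit each synthetic sample so it is treated like the data.
        d_sim = ks_statistic(synthetic, fit_exponential_rate(synthetic))
        if d_sim >= d_obs:
            exceed += 1
    return rate, d_obs, exceed / n_sim


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.expovariate(2.0) for _ in range(200)]
    rate, d_obs, p = monte_carlo_p_value(data)
    print(f"rate={rate:.3f}  KS={d_obs:.4f}  p={p:.3f}")
```

Each synthetic sample is refitted before its distance is computed, so the simulated statistics go through the same estimation step as the data. A low p-value from this baseline on visually good fits is the symptom the abstract attributes to finite-sample effects.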
【 License 】

CC BY   
