BMC Research Notes
Input data quality control for NDNQI national comparative statistics and quarterly reports: a contrast of three robust scale estimators for multiple outlier detection
Nancy Dunton2  Byron J Gajewski2  Jonathan D Mahnken1  Brandon Crosser2  Qingjiang Hou1
[1] Department of Biostatistics, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, USA; [2] School of Nursing, University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
Keywords: Quality control; Outlier; FAST-MCD; Median absolute deviation; Interquartile range; NDNQI
DOI: 10.1186/1756-0500-5-456
Received 2012-03-19, accepted 2012-08-17, published 2012
【 Abstract 】
Background
To evaluate institutional nursing care performance in the context of national comparative statistics (benchmarks), approximately one in every three major healthcare institutions across the United States (over 1,800 hospitals) have joined the National Database for Nursing Quality Indicators® (NDNQI®). With over 18,000 hospital units currently contributing data for nearly 200 quantitative measures, reliable and efficient screening of the input data for every quantitative measure is critical to the integrity, validity, and on-time delivery of NDNQI reports.
Methods
Using Monte Carlo simulation and quantitative NDNQI indicator examples, we compared two ad-hoc methods based on robust scale estimators, the interquartile range (IQR) and the median absolute deviation from the median (MAD), with the classic, theoretically based Minimum Covariance Determinant (FAST-MCD) approach for initial univariate outlier detection.
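As a rough illustration of the two ad-hoc screens named above, the following Python sketch scales each observation's distance from the median by a MAD-based or IQR-based estimate of spread and flags values beyond a cutoff. The function names, the cutoff of 3, the use of the median as the center, and the sample data are assumptions made for illustration; the abstract does not state the exact rule or critical value used for NDNQI screening.

```python
from statistics import median, quantiles

# Constants that make each estimator consistent for the standard
# deviation under a normal distribution (standard textbook values).
MAD_CONST = 1.4826
IQR_CONST = 1.349


def mad_scale(values):
    """Median absolute deviation from the median, rescaled to estimate sigma."""
    center = median(values)
    return MAD_CONST * median([abs(v - center) for v in values])


def iqr_scale(values):
    """Interquartile range, rescaled to estimate sigma."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / IQR_CONST


def flag_outliers(values, scale_fn, cutoff=3.0):
    """Return values whose robust z-score exceeds the (assumed) cutoff."""
    center = median(values)
    scale = scale_fn(values)
    if scale == 0:
        # Degenerate case: at least half the data are identical.
        return []
    return [v for v in values if abs(v - center) / scale > cutoff]


# Hypothetical unit-level indicator values with one suspicious entry.
sample = [2.1, 2.4, 2.2, 2.5, 2.3, 2.6, 2.0, 9.8]
print(flag_outliers(sample, mad_scale))
print(flag_outliers(sample, iqr_scale))
```

Both estimators ignore the magnitude of extreme values when computing spread, which is why a single wild entry does not inflate the scale and mask itself.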
Results
While the theoretically based FAST-MCD used in one dimension can be sensitive and is better suited for identifying groups of outliers because of its high breakdown point, the ad-hoc IQR and MAD approaches are fast, easy to implement, and could be more robust and efficient, depending on the distributional property of the underlying measure of interest.
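For intuition about the one-dimensional MCD case mentioned above, here is a minimal Python sketch. In one dimension the minimum covariance determinant reduces to finding the subset of h observations with the smallest variance, and that subset is always contiguous in the sorted data, so it can be found exactly by sliding a window. This is not the FAST-MCD algorithm itself (which is an approximate search designed for multivariate data), and it omits the consistency correction and reweighting step that production implementations typically apply. The function name and the default `h = n // 2 + 1` are assumptions for illustration.

```python
from math import sqrt
from statistics import mean


def univariate_mcd(values, h=None):
    """Exact raw MCD location and scale for one-dimensional data.

    Scans every contiguous window of h sorted points and keeps the
    one with the smallest variance. The returned scale is the raw
    window standard deviation, with no consistency correction.
    """
    xs = sorted(values)
    n = len(xs)
    if h is None:
        h = n // 2 + 1  # assumed default: just over half the data
    best_location, best_variance = None, None
    for start in range(n - h + 1):
        window = xs[start:start + h]
        m = mean(window)
        variance = sum((x - m) ** 2 for x in window) / h
        if best_variance is None or variance < best_variance:
            best_location, best_variance = m, variance
    return best_location, sqrt(best_variance)


# Hypothetical data: the tight cluster determines location and scale,
# while the extreme value has no influence on either.
location, scale = univariate_mcd([2.1, 2.4, 2.2, 2.5, 2.3, 2.6, 2.0, 9.8])
print(location, scale)
```

Because only the tightest half of the data enters the estimate, up to roughly half of the observations can be contaminated before the estimate breaks down. The same property means the raw scale can be small relative to the full sample's spread, so any cutoff applied to distances from this location needs to account for that.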
Conclusion
Because most NDNQI indicators have highly skewed distributions within a short data screening window, the FAST-MCD approach, when applied to one-dimensional raw data, could produce higher false alarm rates for potential outliers than the IQR and MAD approaches at the same preset critical value, and thus overburden data quality control at both the data entry and administrative ends in our setting.
【 License 】
2012 Hou et al.; licensee BioMed Central Ltd.