Modern Information Technologies and IT-Education (Современные информационные технологии и IT-образование)
Evaluation of Statistical Data Quality in the Problem of Calculating the Integral Characteristic of a System for a Number of Observations
Tatyana Zhgun [1]
[1] Yaroslav-the-Wise Novgorod State University
Keywords: composite index; data quality; data errors; principal component analysis; method of finite differences
DOI: 10.25559/SITITO.16.202002.295-303
Source: DOAJ
Abstract
The construction of a composite index of a system can be treated as a problem of separating signal from noise. The signal in this case is the set of weight coefficients of the linear convolution of indicators. The weights to be determined should reflect the structure of the system being evaluated. However, principal component analysis and factor analysis yield different structures of principal components and principal factors for different observations. One possible reason is the presence of unavoidable errors in the data used. Solving the problem requires a detailed understanding of how input data errors influence the parameters of the calculated model. The article discusses the use of the finite difference method for evaluating statistical data quality in the problem of calculating the integral characteristic of a system for a number of observations. For this technique to be applicable, the data must be well approximated by polynomials of degree lower than the number of observations minus one. This assumption is tested empirically on a specific data set of 37 variables characterizing the quality of life of the population of Russia for 2010-2017. The dependence of the quality of data approximation on the degree of polynomial regression is analyzed. The results of the numerical experiment support the legitimacy of evaluating data errors using the finite difference method. Applying the finite difference apparatus to the data shows the presence of fatal errors ranging from 0.59% to 28.92%. Therefore, composite characteristics of objects derived from such data must take the presence of fatal errors into account. In particular, the number of parameters characterizing the system should be large enough for averaging to compensate for random errors.
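The finite-difference idea in the abstract can be illustrated with a minimal sketch. It is not the article's procedure. It assumes equally spaced yearly observations, independent noise with equal variance, and an underlying trend that is a polynomial of degree lower than the difference order. Under those assumptions the differences of the trend vanish, and the variance of the k-th difference of the noise equals C(2k, k) times the noise variance. The function name `noise_std_from_differences` and the synthetic series are hypothetical and chosen only for illustration.

```python
from math import comb

import numpy as np


def noise_std_from_differences(y, order):
    """Estimate the noise standard deviation of a series from its
    finite differences of the given order.

    Assumes (illustrative assumption, not taken from the article):
    equally spaced observations, independent equal-variance noise,
    and a polynomial trend of degree lower than `order`.
    """
    y = np.asarray(y, dtype=float)
    if not 1 <= order < y.size:
        raise ValueError("order must be between 1 and len(y) - 1")
    d = np.diff(y, n=order)  # differences of a lower-degree polynomial are zero
    # The sum of squared binomial coefficients C(k, j)^2 equals C(2k, k),
    # so dividing by it rescales the difference variance to the noise variance.
    return float(np.sqrt(np.mean(d ** 2) / comb(2 * order, order)))


# Synthetic example: 8 yearly observations, as in a 2010-2017 series.
years = np.arange(2010, 2018)
t = years - years[0]
trend = 50.0 + 1.5 * t - 0.1 * t ** 2          # quadratic trend (hypothetical)
rng = np.random.default_rng(0)
noisy = trend + rng.normal(0.0, 0.5, size=t.size)

clean_estimate = noise_std_from_differences(trend, order=3)
noisy_estimate = noise_std_from_differences(noisy, order=3)
print(f"error estimate, clean series: {clean_estimate:.2e}")
print(f"error estimate, noisy series: {noisy_estimate:.3f}")
```

With only eight observations a third-order difference leaves five values, so the estimate is rough. This is consistent with the abstract's requirement that the approximating polynomial have a degree well below the number of observations minus one.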
License
Unknown