Journal Article Details
Journal of Biometrics & Biostatistics
Using Statistical Techniques and Replication Samples for Missing Values Imputation with an Application on Metabolomics
article
Akram Yazdani [1], Azam Yazdani [1]
[1] Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai
Keywords: Statistical techniques; Missing value imputation; Empirical distribution; Optimal imputation; Metabolomics; Replication samples
DOI  :  10.4172/2155-6180.1000393
Source: Hilaris Publisher
【 Abstract 】

Background: Data preparation, including missing value imputation and transformation, is the first step in any data analysis and requires careful attention. We use replication samples, together with statistical techniques, to identify the empirical distribution of missing values, and we apply these techniques to imputation in metabolomics data.

Results: Using replication samples, we obtained the empirical distribution of missing values. Applying the techniques to metabolites, we observed that missing values are distributed approximately uniformly across the metabolite range; imputing them with the lowest values is therefore not appropriate. To make the simulation realistic, we designed a simulation study based on the empirical distribution of missing values and used it to find an optimal imputation approach. Our findings validated the optimal approach previously introduced for metabolomics.

Conclusions: Our analysis used replication samples as a new approach to metabolite imputation: it identified the empirical distribution of missing values, designed a simulation study close to reality, and compared different imputation approaches to select an optimal one. The results validate the optimal approach for metabolite imputation using a different data set and a different method. Our aim is to encourage researchers to pay more attention to metabolite imputation, since imputing missing metabolomic values with the lowest value is becoming a common practice, for example in genomic-metabolomic data analysis.
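
The abstract does not spell out how replication samples reveal where missing values fall, so the following is only an illustrative sketch under one plausible reading, not the authors' procedure: when exactly one of two replicate measurements is missing, the observed partner serves as a proxy for the missing value, and its rank among all observed values indicates where in the metabolite range the missing value lies. The function name `missing_position_ranks`, the simulated data, and the 10% dropout rate are all hypothetical.

```python
import math
import random
from bisect import bisect_right


def missing_position_ranks(rep_a, rep_b):
    """Hypothetical helper: for replicate pairs in which exactly one value is
    missing (NaN), return the empirical-CDF rank, in (0, 1], of the observed
    partner among all observed values of the metabolite."""
    observed = sorted(v for v in list(rep_a) + list(rep_b) if not math.isnan(v))
    ranks = []
    for a, b in zip(rep_a, rep_b):
        if math.isnan(a) != math.isnan(b):  # exactly one replicate is missing
            partner = b if math.isnan(a) else a
            ranks.append(bisect_right(observed, partner) / len(observed))
    return ranks


# Simulated example (assumed data): each replicate is dropped at random,
# independently of its abundance, with probability 0.1.
random.seed(0)
rep_a, rep_b = [], []
for _ in range(500):
    level = random.lognormvariate(2.0, 0.5)  # assumed true abundance
    for rep in (rep_a, rep_b):
        noisy = level * random.gauss(1.0, 0.05)  # assumed measurement noise
        rep.append(math.nan if random.random() < 0.1 else noisy)

ranks = missing_position_ranks(rep_a, rep_b)

# Count how many proxy values fall in each quarter of the observed range.
quartile_counts = [0, 0, 0, 0]
for r in ranks:
    quartile_counts[min(int(r * 4), 3)] += 1
print(quartile_counts)
```

If missingness were driven by low abundance, the counts would pile up in the first quartile and lowest-value imputation would be defensible; roughly even counts across quartiles correspond to the approximately uniform pattern the abstract reports.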

【 License 】

Unknown   
