Journal article

【Abstract】

Background: High-throughput metabolomics makes it possible to measure the relative abundances of numerous metabolites in biological samples, which is useful in many areas of biomedical research. However, missing values (MVs) are common in metabolomics datasets and can arise for both technical and biological reasons. Typically, such MVs are substituted by a minimum value, which may lead to different results in downstream analyses.

Results: Here we present a modified version of the K-nearest neighbor (KNN) approach that accounts for truncation at the minimum value, i.e., KNN truncation (KNN-TN). We compare imputation results based on KNN-TN with results from other KNN approaches, namely KNN based on correlation (KNN-CR) and KNN based on Euclidean distance (KNN-EU). Our approach assumes that the data follow a truncated normal distribution with the truncation point at the limit of detection (LOD). The effectiveness of each approach was assessed using the root mean square error (RMSE) as well as the metabolite list concordance index (MLCI), which measures the influence on downstream statistical testing. Through extensive simulation studies and application to three real data sets, we show that KNN-TN has lower RMSE values than the other two KNN procedures and than simpler imputation methods that substitute missing values with the metabolite mean, zero, or the LOD. MLCI values for KNN-TN and KNN-EU were roughly equivalent, and superior to those of the other four methods in most cases.

Conclusion: Our findings demonstrate that KNN-TN generally imputes the missing values of the different datasets better than KNN-CR and KNN-EU when missingness arises from a combination of missing at random and an LOD. Although the results shown in this study come from metabolomics, the method could be applied to any high-throughput technology that has missing values due to an LOD.

【License】

CC BY
© The Author(s). 2017

BMC Bioinformatics
Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies
Methodology Article
Shesh N. Rai¹ Jasmit S. Shah² Guy N. Brock³ Aruni Bhatnagar⁴ Bradford G. Hill⁴ Andrew P. DeFilippis⁴
[1] Department of Bioinformatics and Biostatistics, University of Louisville, 40202, Louisville, KY, USA;Department of Bioinformatics and Biostatistics, University of Louisville, 40202, Louisville, KY, USA;Department of Medicine, Division of Cardiovascular Medicine, Diabetes and Obesity Center, University of Louisville, 40202, Louisville, KY, USA;Department of Bioinformatics and Biostatistics, University of Louisville, 40202, Louisville, KY, USA;Present Affiliation: Department of Biomedical Informatics, The Ohio State University, 43210, Columbus, OH, USA;Department of Medicine, Division of Cardiovascular Medicine, Diabetes and Obesity Center, University of Louisville, 40202, Louisville, KY, USA;
Keywords: Metabolomics; Missing value; Imputation; Truncated normal; High dimensional data; K-nearest neighbors
DOI : 10.1186/s12859-017-1547-6
Received 2016-09-15, accepted 2017-02-13, published 2017
Source: Springer