Journal article details
BMC Bioinformatics
Variable selection for binary classification using error rate p-values applied to metabolomics data
Methodology Article
Johan A. Westerhuis [1], J. Hendrik Venter [2], Carolus J. Reinecke [3], Mari van Reenen [4]
[1]Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, The Netherlands
[2]Centre for Human Metabolomics, Faculty of Natural Sciences, North-West University (Potchefstroom Campus), Private Bag X6001, Potchefstroom, South Africa
[3]Centre for Business Mathematics and Informatics, Faculty of Natural Sciences, North-West University (Potchefstroom Campus), Private Bag X6001, Potchefstroom, South Africa
[4]Centre for Human Metabolomics, Faculty of Natural Sciences, North-West University (Potchefstroom Campus), Private Bag X6001, Potchefstroom, South Africa
[5]Centre for Human Metabolomics, Faculty of Natural Sciences, North-West University (Potchefstroom Campus), Private Bag X6001, Potchefstroom, South Africa
[6]Department of Statistics, Faculty of Natural Sciences, North-West University (Potchefstroom Campus), Private Bag X6001, Potchefstroom, South Africa
Keywords: Variable selection; Significance testing; Non-parametric; Binary classification; Metabolomics
DOI: 10.1186/s12859-015-0867-7
Received 2015-06-11, accepted 2015-12-21, published 2016
Source: Springer
【 Abstract 】
Background
Metabolomics datasets are often high-dimensional though only a limited number of variables are expected to be informative given a specific research question. The important task of selecting informative variables can therefore become complex. In this paper we look at discriminating between two groups. Two tasks need to be performed: (i) finding variables which differ between the two groups; and (ii) determining how the selected variables can be used to classify new subjects. We introduce an approach using minimum classification error rates as test statistics to find discriminatory and therefore informative variables. The thresholds resulting in the minimum error rates can be used to classify new subjects. This approach transforms error rates into p-values and is referred to as ERp.

Results
We show that non-parametric hypothesis testing, based on minimum classification error rates as test statistics, can find statistically significantly shifted variables. The discriminatory ability of variables becomes more apparent when error rates are evaluated based on their corresponding p-values, as relatively high error rates can still be statistically significant. ERp can handle unequal and small group sizes, as well as account for the cost of misclassification. ERp retains (if known) or reveals (if unknown) the shift direction, aiding in biological interpretation. The threshold resulting in the minimum error rate can immediately be used to classify new subjects.

We use NMR generated metabolomics data to illustrate how ERp is able to discriminate subjects diagnosed with Mycobacterium tuberculosis infected meningitis from a control group. The list of discriminatory variables produced by ERp contains all biologically relevant variables with appropriate shift directions discussed in the original paper from which this data is taken.

Conclusions
ERp performs variable selection and classification, is non-parametric and aids biological interpretation while handling unequal group sizes and misclassification costs. All this is achieved by a single approach which is easy to perform and interpret. ERp has the potential to address many other characteristics of metabolomics data. Future research aims to extend ERp to account for a large proportion of observations below the detection limit, as well as expand on interactions between variables.
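
The idea of turning a minimum classification error rate into a p-value can be illustrated with a short Python sketch for a single variable. This is not the authors' implementation. The abstract only states that the test is non-parametric, so the label-permutation null distribution used here is an assumption, and the function names (`min_error_rate`, `erp_pvalue`, `classify`) are hypothetical. Misclassification costs and unequal-group weighting, which the abstract says ERp supports, are left out.

```python
import random


def min_error_rate(x, y):
    """Scan candidate thresholds on one variable; return (rate, threshold, direction).

    y holds group labels: 0 = control, 1 = case.
    direction +1 means "predict case when x > threshold", -1 the reverse.
    """
    n = len(x)
    values = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct values.
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = (1.0, None, 0)  # returned unchanged if x has no two distinct values
    for c in cuts:
        for direction in (1, -1):
            errors = sum(
                ((xi > c) if direction == 1 else (xi < c)) != bool(yi)
                for xi, yi in zip(x, y)
            )
            rate = errors / n
            if rate < best[0]:
                best = (rate, c, direction)
    return best


def erp_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value for the minimum error rate (assumed null: labels exchangeable)."""
    rng = random.Random(seed)
    observed, threshold, direction = min_error_rate(x, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small error rates are extreme, so count permutations at least as small.
        if min_error_rate(x, labels)[0] <= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return observed, threshold, direction, p_value


def classify(value, threshold, direction):
    """Assign a new subject to group 1 (case) or 0 (control) using the fitted threshold."""
    return int(value > threshold) if direction == 1 else int(value < threshold)


# Toy data: six controls followed by six cases, perfectly separated.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [0] * 6 + [1] * 6
rate, threshold, direction, p_value = erp_pvalue(x, y)
```

On this toy variable the fitted threshold falls between the two groups and the returned direction records that cases have the higher values, which matches the abstract's point that the shift direction is revealed alongside the test. The same threshold can then be passed to `classify` for a new subject.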
【 License 】

CC BY
© van Reenen et al. 2016
