BMC Bioinformatics
Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment
Methodology Article
David Causeur [1], Anne-Laure Boulesteix [2], Roman Hornung [2]
[1] Applied Mathematics Department, Agrocampus Ouest, 65 rue de St. Brieuc, 35042, Rennes, France
[2] Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, D-81377, Munich, Germany
Keywords: Batch effects; High-dimensional data; Data preparation; Prediction; Latent factors
DOI: 10.1186/s12859-015-0870-z
Received 2015-09-24, accepted 2015-12-22, published 2016
Source: Springer
【 Abstract 】
Background: In the context of high-throughput molecular data analysis it is common that the observations included in a dataset form distinct groups; for example, measured at different times, under different conditions or even in different labs. These groups are generally denoted as batches. Systematic differences between these batches not attributable to the biological signal of interest are denoted as batch effects. If ignored when conducting analyses on the combined data, batch effects can lead to distortions in the results. In this paper we present FAbatch, a general, model-based method for correcting for such batch effects in the case of an analysis involving a binary target variable. It is a combination of two commonly used approaches: location-and-scale adjustment and data cleaning by adjustment for distortions due to latent factors. We compare FAbatch extensively to the most commonly applied competitors on the basis of several performance metrics. FAbatch can also be used in the context of prediction modelling to eliminate batch effects from new test data. This important application is illustrated using real and simulated data. We implemented FAbatch and various other functionalities in the R package bapred, available online from CRAN.

Results: FAbatch is seen to be competitive in many cases and above average in others. In our analyses, the only cases where it failed to adequately preserve the biological signal were when there were extremely outlying batches and when the batch effects were very weak compared to the biological signal.

Conclusions: As seen in this paper, batch effect structures found in real datasets are diverse. Current batch effect adjustment methods are often either too simplistic or make restrictive assumptions, which can be violated in real datasets. Due to the generality of its underlying model and its ability to perform well, FAbatch represents a reliable tool for batch effect adjustment for most situations found in practice.
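As an illustration of the location-and-scale component mentioned in the abstract, the following is a minimal sketch that standardizes each feature within each batch. It is not FAbatch and not the bapred interface: it ignores the binary target variable and performs no latent factor adjustment. The function name `location_scale_adjust` and the simulated data are assumptions made purely for this example.

```python
import numpy as np


def location_scale_adjust(X, batch):
    """Center and scale every feature separately within each batch.

    Hypothetical helper for illustration only: plain per-batch
    standardization, without the latent factor step of FAbatch.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    adjusted = np.empty_like(X)
    for b in np.unique(batch):
        rows = batch == b
        mu = X[rows].mean(axis=0)          # batch-specific location
        sd = X[rows].std(axis=0, ddof=1)   # batch-specific scale
        sd[sd == 0] = 1.0                  # leave constant features unscaled
        adjusted[rows] = (X[rows] - mu) / sd
    return adjusted


# Simulated data (assumption): two batches of 20 observations, 5 features,
# with the second batch shifted and stretched to mimic a batch effect.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 5))
X[batch == 1] = 3.0 + 2.0 * X[batch == 1]

X_adj = location_scale_adjust(X, batch)
```

After this step every feature has mean 0 and unit standard deviation within each batch, so the artificial shift and stretch are removed. Any biological signal confounded with batch would be removed as well, which is one reason more elaborate models are used in practice.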
【 License 】
CC BY
© Hornung et al. 2016