Journal Article Details
BMC Medical Informatics and Decision Making
Ensembles of randomized trees using diverse distributed representations of clinical events
Research
Henrik Boström [1], Jing Zhao [1], Hercules Dalianis [1], Aron Henriksson [1]
[1] Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, SE-16407, Kista, Sweden
Keywords: Random forest; Distributional semantics; Heterogeneous data; Electronic health records; Pharmacovigilance; Adverse drug events
DOI: 10.1186/s12911-016-0309-0
Source: Springer
【 Abstract 】

Background: Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events – modeled in an ensemble of semantic spaces – for the purpose of predictive modeling.

Methods: Three different ways of exploiting a set of (ten) distributed representations of four types of clinical events – diagnosis codes, drug codes, measurements, and words in clinical notes – are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces – corresponding to the considered data types – of a given context window size.

Results: The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases.

Conclusions: The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy – significantly outperforming the considered alternatives – involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.
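
The per-tree sampling step described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the names `DATA_TYPES`, `sample_tree_plan`, `represent`, and `plan_forest` are hypothetical, each example is assumed to carry a single event per data type, and one semantic space is assumed to be drawn independently for each data type (the paper may couple the choice across types). Tree induction itself is omitted.

```python
import random

# The four clinical event types named in the abstract.
DATA_TYPES = ["diagnosis", "drug", "measurement", "word"]


def sample_tree_plan(n_examples, n_spaces, rng):
    """Draw the inputs for one tree.

    Returns a bootstrap replicate (indices sampled with replacement) and,
    for each data type, the index of one randomly selected semantic space.
    Assumption: spaces are drawn independently per data type.
    """
    bootstrap = [rng.randrange(n_examples) for _ in range(n_examples)]
    chosen = {t: rng.randrange(n_spaces) for t in DATA_TYPES}
    return bootstrap, chosen


def represent(example, chosen, spaces):
    """Build one feature vector for an example.

    example: dict mapping data type -> event identifier (simplification:
             one event per type).
    chosen:  dict mapping data type -> index of the selected semantic space.
    spaces:  dict mapping data type -> list of dicts (event id -> vector),
             one dict per semantic space.
    The whole feature set is kept; only the space it is expressed in varies.
    """
    features = []
    for t in DATA_TYPES:
        features.extend(spaces[t][chosen[t]][example[t]])
    return features


def plan_forest(n_trees, n_examples, n_spaces, seed=0):
    """Produce one (bootstrap, chosen spaces) plan per tree in the forest."""
    rng = random.Random(seed)
    return [sample_tree_plan(n_examples, n_spaces, rng) for _ in range(n_trees)]
```

Each plan would then be used to train one randomized tree on the bootstrap replicate, with every example encoded by `represent` using that tree's chosen spaces.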

【 License 】

CC BY   
© The Author(s) 2016
