Journal: BMC Medical Informatics and Decision Making

Ensembles of randomized trees using diverse distributed representations of clinical events


Abstract

Background: Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events (modeled in an ensemble of semantic spaces) for the purpose of predictive modeling.

Methods: Three different ways of exploiting a set of (ten) distributed representations of four types of clinical events (diagnosis codes, drug codes, measurements, and words in clinical notes) are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces (corresponding to the considered data types) of a given context window size.

Results: The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases.

Conclusions: The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy, which significantly outperforms the considered alternatives, involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.
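
The sampling scheme described under Methods can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: each tree is assumed to draw one context window size and use the semantic spaces of that size for all data types; the per-space feature matrices are assumed to be precomputed; and a toy one-split stump stands in for a real randomized tree. The names `fit_toy_stump`, `fit_forest`, `predict`, `spaces`, and `labels` are hypothetical.

```python
import random
from collections import Counter


def fit_toy_stump(X, y, rng):
    """Stand-in base learner (assumption): one split on a random feature at its mean."""
    j = rng.randrange(len(X[0]))
    threshold = sum(row[j] for row in X) / len(X)
    fallback = Counter(y).most_common(1)[0][0]

    def majority(side_labels):
        # Majority class on one side of the split; overall majority if the side is empty.
        return Counter(side_labels).most_common(1)[0][0] if side_labels else fallback

    left = majority([lab for row, lab in zip(X, y) if row[j] <= threshold])
    right = majority([lab for row, lab in zip(X, y) if row[j] > threshold])
    return lambda row: left if row[j] <= threshold else right


def fit_forest(spaces, y, n_trees=25, seed=0):
    """spaces maps a context window size to the full training matrix in that space."""
    rng = random.Random(seed)
    windows = sorted(spaces)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap replicate
        w = rng.choice(windows)                      # one window size per tree
        X_boot = [spaces[w][i] for i in idx]         # all features, represented in space w
        y_boot = [y[i] for i in idx]
        forest.append((w, fit_toy_stump(X_boot, y_boot, rng)))
    return forest


def predict(forest, instance):
    """instance maps each window size to the instance's vector in that space."""
    votes = Counter(tree(instance[w]) for w, tree in forest)
    return votes.most_common(1)[0][0]


# Toy data: six instances, two hypothetical window sizes (2 and 4).
spaces = {
    2: [[0.0, 1.0], [0.1, 0.9], [0.05, 0.95],
        [1.0, 0.0], [0.9, 0.1], [0.95, 0.05]],
    4: [[0.1, 0.9, 0.0], [0.0, 1.0, 0.1], [0.05, 0.95, 0.05],
        [0.9, 0.1, 1.0], [1.0, 0.0, 0.9], [0.95, 0.05, 0.95]],
}
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(spaces, labels)
print(predict(forest, {2: [0.05, 0.95], 4: [0.05, 0.95, 0.05]}))
```

Because each tree stores the window size it was trained on, a new instance has to be available in every semantic space at prediction time. In a realistic setting the stump would presumably be replaced by a full randomized tree learner.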
