Random Forests with Latent Variables to Foster Feature Selection in the Context of Highly Correlated Variables. Illustration with a Bioinformatics Application

Conference paper, International Symposium on Intelligent Data Analysis

Abstract

The random forest model is a popular framework for classification and regression. When dense dependences exist among the variables, it may be beneficial to capture them through latent variables, which are then used to build the random forest. In this paper, we present Sylva, a generalization of the T-Trees model (Botta et al., 2008), which is so far the only attempt to integrate latent variables into the random forest learning scheme. Sylva is an innovative hybrid approach in which an adapted random forest framework benefits from the modeling of dependences via FLTM, a forest of latent tree models (Mourad et al., 2011). FLTM drives the on-the-fly generation of the latent variables used to learn the random forest. In the unprecedented large-scale study reported here, Sylva, instantiated with different clustering methods, is compared to T-Trees on high-dimensional real-world datasets in the context of genetic association studies. We show that Sylva does not significantly increase the already high predictive power of T-Trees. In contrast, in Sylva the importance measure distribution corresponding to top-ranked variables is significantly skewed towards higher values than in T-Trees, which meets the feature selection objective.
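The abstract does not spell out how dependent variables are grouped and summarized. As a rough illustration of the general idea of replacing blocks of highly correlated variables with latent summaries, the sketch below uses a greedy correlation-threshold clustering and a per-sample mean as the "latent" variable. This is a hypothetical stand-in, not FLTM (which learns latent tree models) and not Sylva; the function names `pearson`, `cluster_correlated`, `latent_summary`, the 0.8 threshold, and the toy data are assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_correlated(columns, threshold=0.8):
    """Hypothetical greedy clustering: each unassigned variable seeds a cluster
    and absorbs every later unassigned variable whose correlation with the seed
    is >= threshold. Negatively correlated variables stay in separate clusters
    in this sketch, so that averaging does not cancel them out."""
    clusters = []
    assigned = set()
    for i in range(len(columns)):
        if i in assigned:
            continue
        members = [i]
        assigned.add(i)
        for j in range(i + 1, len(columns)):
            if j not in assigned and pearson(columns[i], columns[j]) >= threshold:
                members.append(j)
                assigned.add(j)
        clusters.append(members)
    return clusters


def latent_summary(columns, members):
    """Hypothetical stand-in for a latent variable: per-sample mean of the cluster's variables."""
    n_samples = len(columns[0])
    return [sum(columns[m][s] for m in members) / len(members) for s in range(n_samples)]


# Toy data: each inner list is one variable (e.g. a SNP coded 0/1/2) over six samples.
columns = [
    [0, 1, 2, 2, 1, 0],  # v0
    [0, 1, 2, 2, 1, 1],  # v1, strongly correlated with v0
    [2, 1, 0, 0, 1, 2],  # v2, perfectly anti-correlated with v0
    [1, 0, 1, 0, 1, 0],  # v3, uncorrelated with v0
]

if __name__ == "__main__":
    for members in cluster_correlated(columns):
        print(members, latent_summary(columns, members))
```

One could then train a random forest on these summaries, in place of or alongside the original variables; the sketch deliberately stops before that step, since the abstract does not describe how Sylva combines latent and observed variables.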