Journal: Evolutionary Computation

A Tandem Evolutionary Algorithm for Identifying Causal Rules from Complex Data

Abstract

We propose a new evolutionary approach for discovering causal rules in complex classification problems from batch data. Key aspects include (a) the use of a hypergeometric probability mass function as a principled fitness statistic that quantifies the probability that the observed association between a given clause and target class is due to chance, taking into account the size of the dataset, the amount of missing data, and the distribution of outcome categories, (b) tandem age-layered evolutionary algorithms for evolving parsimonious archives of conjunctive clauses, and disjunctions of these conjunctions, each of which has a probabilistically significant association with outcome classes, and (c) separate archive bins for clauses of different orders, with dynamically adjusted order-specific thresholds. The method is validated on majority-on and multiplexer benchmark problems exhibiting various combinations of heterogeneity, epistasis, overlap, noise in class associations, missing data, extraneous features, and imbalanced classes. We also validate on a more realistic synthetic genome dataset with heterogeneity, epistasis, extraneous features, and noise. In all synthetic epistatic benchmarks, we consistently recover the true causal rule sets used to generate the data. Finally, we discuss an application to a complex real-world survey dataset designed to inform possible ecohealth interventions for Chagas disease.