Journal of Intelligent Information Systems

Rank correlated subgroup discovery


Abstract

Subgroup discovery (SD) and exceptional model mining (EMM), its generalization to handle more complex targets, are two mature fields at the frontier of data mining and machine learning. More precisely, EMM aims to find coherent subgroups of a dataset where multiple targets interact in an unusual way. Correlation model classes have already been defined to discover interesting subgroups when dealing with two numerical targets. However, in this supervised setting, the two numerical targets are fixed before the subgroup search. To make unsupervised exploration possible, we propose to search for arbitrary subsets of numerical targets whose correlation is exceptional for an automatically found subgroup. This involves solving two challenges: the definition of a model that evaluates the interest of a subgroup for a subset of numerical targets, and the definition of a pattern language that enumerates both subgroups and targets and lends itself to effective search strategies. We propose an integrated solution to both challenges. We introduce the problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A rank-correlated subgroup is identified by both conditions on descriptive attributes, whether numeric or nominal, and a pattern on numeric attributes that captures (positive or negative) rank correlations based on a generalization of Kendall's tau. We define a new branch-and-bound algorithm that exploits pruning properties based on two upper bounds and a closure property. An empirical study on several datasets demonstrates the efficiency and effectiveness of the algorithm.
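To make the idea of an exceptional rank correlation concrete, the following minimal sketch compares the ordinary pairwise Kendall's tau (tau-a) on a whole dataset with its value inside subgroups selected by a descriptive attribute. It is an illustration only and is not the generalized measure, pattern language, or branch-and-bound algorithm of the paper; the toy records and the helper names `kendall_tau` and `tau_where` are assumptions made for this example.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two records")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
        score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical toy data: (descriptive attribute, numeric target x, numeric target y)
records = [
    ("a", 1, 10), ("a", 2, 20), ("a", 3, 30), ("a", 4, 40),
    ("b", 1, 40), ("b", 2, 30), ("b", 3, 20), ("b", 4, 10),
]


def tau_where(rows, condition):
    """Kendall's tau between x and y over the rows that satisfy `condition`."""
    selected = [r for r in rows if condition(r)]
    return kendall_tau([r[1] for r in selected], [r[2] for r in selected])


tau_all = tau_where(records, lambda r: True)
tau_a = tau_where(records, lambda r: r[0] == "a")
tau_b = tau_where(records, lambda r: r[0] == "b")
print(tau_all, tau_a, tau_b)
```

On this toy data the two targets show no rank correlation overall (tau is 0), while the subgroup with attribute "a" is perfectly positively correlated (tau is 1) and the subgroup with attribute "b" is perfectly negatively correlated (tau is -1). This contrast between a subgroup and the full dataset is the kind of deviation that subgroup discovery looks for.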
