Journal of Intelligent Information Systems

Efficient discovery of correlated patterns using multiple minimum all-confidence thresholds


Abstract

Correlated patterns are an important class of regularities in a database. Although no measure is universally accepted as the best for judging the interestingness of a pattern, all-confidence is emerging as a popular measure for discovering such patterns because it satisfies both the anti-monotonic and null-invariance properties. The former property makes pattern mining practicable in real-world applications. The latter lets the user discover patterns involving both frequent and rare items without generating a huge number of patterns. In this paper, we show that although the measure satisfies the null-invariance property, mining patterns that contain both frequent and rare items with a single minimum all-confidence (minAllConf) threshold leads to a dilemma known as the "rare item problem." At a high minAllConf, the discovered correlated patterns involving rare items are very short. At a low minAllConf, combinatorial explosion can occur, producing too many patterns. To confront the problem, the paper introduces an alternative model based on the concept of multiple minAllConf thresholds. The proposed model generalizes the existing model of correlated patterns and lets the user specify a different minAllConf for each pattern depending on the frequencies of its items. A pattern-growth algorithm, called GCoMine, is also proposed to discover the patterns. Experimental results show that GCoMine is efficient and that the proposed model addresses the problem effectively.
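To make the rare item problem concrete, the following is a minimal sketch rather than the paper's GCoMine algorithm. It uses the standard definition of all-confidence (the support of a pattern divided by the largest support of any single item in it). The per-item threshold check in `is_correlated_multi` is only one plausible reading of "multiple minAllConf thresholds" and is an assumption, as are the toy transactions, the threshold values, and all function names.

```python
def support(pattern, transactions):
    """Number of transactions that contain every item of the pattern."""
    items = set(pattern)
    return sum(1 for t in transactions if items <= t)


def all_confidence(pattern, transactions):
    """Standard all-confidence: sup(pattern) / max single-item support."""
    max_item_sup = max(support([i], transactions) for i in pattern)
    if max_item_sup == 0:
        return 0.0
    return support(pattern, transactions) / max_item_sup


def is_correlated_single(pattern, transactions, min_all_conf):
    """Existing model: one minAllConf threshold for every pattern."""
    return all_confidence(pattern, transactions) >= min_all_conf


def is_correlated_multi(pattern, transactions, item_min_all_conf):
    """Assumed reading of the multiple-threshold model (not taken from the
    abstract): every item i in the pattern carries its own threshold, and
    the pattern must satisfy sup(pattern) / sup(i) >= threshold(i)."""
    sup_x = support(pattern, transactions)
    for i in pattern:
        sup_i = support([i], transactions)
        if sup_i == 0 or sup_x / sup_i < item_min_all_conf[i]:
            return False
    return True


# Hypothetical toy database: bread and milk are frequent, caviar is rare.
transactions = [{"bread", "milk"}] * 6 + [
    {"bread"},
    {"milk"},
    {"bread", "caviar"},
    {"milk", "caviar"},
]

# Hypothetical per-item thresholds, chosen only for illustration.
thresholds = {"bread": 0.1, "milk": 0.7, "caviar": 0.5}

for pattern in (["bread", "milk"], ["bread", "caviar"], ["milk", "caviar"]):
    print(
        pattern,
        all_confidence(pattern, transactions),
        is_correlated_single(pattern, transactions, 0.5),
        is_correlated_single(pattern, transactions, 0.1),
        is_correlated_multi(pattern, transactions, thresholds),
    )
```

In this toy data, a single threshold of 0.5 keeps only the frequent pair and drops every pattern containing the rare item, while a single threshold of 0.1 admits all three pairs. Under the assumed per-item check, the same threshold applied to every item reduces to the single-threshold test, which is consistent with the abstract's statement that the proposed model generalizes the existing one.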
