...
Published in: Knowledge and Information Systems

Mining top-k frequent patterns without minimum support threshold


Abstract

Finding frequent patterns plays an important role in mining association rules, sequences, episodes, and Web logs, and in discovering many other interesting relationships among data. Frequent pattern mining methods often produce so many frequent itemsets that they cannot be used effectively, whereas the number of highly correlated patterns is usually very small and may even be one. Most existing frequent pattern mining techniques require the setting of many input parameters and may involve multiple passes over the database. Minimum support is the parameter most widely used in frequent pattern mining to discover statistically significant patterns. Specifying an appropriate minimum support is a challenging task for a data analyst, because the choice of value is somewhat arbitrary. Typically, the algorithm must be executed repeatedly, heuristically tuning the minimum support over a wide range until the desired result is obtained, which is a very time-consuming process. An inappropriate minimum support may also cause an algorithm to fail to find the true patterns. We present a novel method that efficiently retrieves the top few maximal frequent patterns in order of significance without using the minimum support parameter. Instead, only a more human-understandable parameter must be specified: the desired number of itemsets, k. Our technique requires only a single pass over the database and the generation of length-two itemsets. We propose the association ratio graph, a compact structure containing concise information, which is created in time quadratic in the size of the database. Algorithms are described that use this graph structure to discover the top-most and top-k maximal frequent itemsets without a minimum support threshold. To achieve this effectively, the method constructs an all-path source-to-destination tree to discover all maximal cycles in the graph.
The results can be ranked in decreasing order of significance. Results are presented that demonstrate the performance advantages of this approach.
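The abstract does not give the association ratio formula or the details of the all-path source-to-destination tree. The following is therefore only a minimal Python sketch of the parts it does describe, under stated assumptions. It performs a single scan that counts items and length-two itemsets. It then builds an item graph whose edge weight uses a hypothetical ratio (pair support divided by the smaller item support). Finally, it ranks itemsets by a user-supplied k instead of a minimum support threshold. It stops at length-two itemsets and does not reproduce the paper's cycle-based search for longer maximal itemsets. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations
import heapq


def count_items_and_pairs(transactions):
    """Scan the database once, counting single items and length-two itemsets."""
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, duplicates dropped
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts


def association_ratio_graph(item_counts, pair_counts):
    """Build an undirected, weighted item graph from the pair counts.

    ASSUMPTION (hypothetical): edge weight = support(a, b) / min(support(a), support(b)).
    The abstract does not define the association ratio; substitute the real one.
    """
    graph = {}
    for (a, b), n_ab in pair_counts.items():
        weight = n_ab / min(item_counts[a], item_counts[b])
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def top_k_pairs(pair_counts, k):
    """Return the k most frequent length-two itemsets, ties broken alphabetically.

    The user supplies k; no minimum support threshold is involved.
    """
    return heapq.nsmallest(k, pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, consider five market-basket transactions: `["bread", "milk"]`, `["bread", "diaper", "beer", "eggs"]`, `["milk", "diaper", "beer", "cola"]`, `["bread", "milk", "diaper", "beer"]`, and `["bread", "milk", "diaper", "cola"]`. On these, `top_k_pairs(pair_counts, 3)` returns `[(("beer", "diaper"), 3), (("bread", "diaper"), 3), (("bread", "milk"), 3)]`. `heapq.nsmallest` keeps only k candidates while scanning, so the ranking step does not need to sort every pair.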
