
Efficient algorithms for mining colossal patterns in high dimensional databases


Abstract

Mining association rules plays an important role in decision support systems. To mine strong association rules, it is first necessary to mine frequent patterns. Many algorithms have been developed to mine frequent patterns efficiently, such as Apriori, Eclat, FP-Growth, PrePost, and FIN. However, these are efficient only when the database contains a small number of items. When a database has a large number of items (from thousands to hundreds of thousands) but a small number of transactions, these algorithms cannot run when the minimum support threshold is also small, because the search space is huge. This gives rise to the problem of mining colossal patterns in high dimensional databases. In 2012, Sohrabi and Barforoush proposed the BVBUC algorithm for mining colossal patterns based on a bottom up scheme. However, it needs more time to check subsets and supersets, because it generates a lot of candidates, and it consumes more memory to store them. In this paper we propose new, efficient algorithms for mining colossal patterns. Firstly, the CP (Colossal Pattern)-tree is designed. Next, we develop two theorems to rapidly compute patterns of nodes and prune nodes without the loss of information in colossal patterns. Based on the CP-tree and these theorems, an algorithm (named CP-Miner) is proposed to solve the problem of mining colossal patterns. A sorting strategy for efficiently mining colossal patterns is then developed. This strategy helps to reduce the number of significant candidates and the time needed to check subsets and supersets. The PCP-Miner algorithm, which uses this strategy, is then proposed, and we also conduct experiments to show the efficiency of these algorithms. © 2017 Elsevier B.V. All rights reserved.
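
To make the bottom up idea concrete, the following is a minimal brute-force sketch in Python. It is not CP-Miner, PCP-Miner, or BVBUC, and it uses no CP-tree or pruning theorems. It assumes, for illustration only, that a colossal pattern is a frequent pattern with no frequent proper superset, and the function name `colossal_patterns` is hypothetical. The sketch enumerates groups of transactions (rows) rather than itemsets, which is the natural direction when items are many and transactions are few.

```python
from itertools import combinations


def colossal_patterns(transactions, min_sup):
    """Brute-force row enumeration (illustrative only, not the paper's method).

    Assumption: a colossal pattern is a frequent pattern with no frequent
    proper superset. min_sup is an absolute transaction count.
    """
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    rows = [frozenset(t) for t in transactions]

    # Every frequent pattern occurs in at least min_sup rows, so it is
    # contained in the intersection of some group of exactly min_sup rows.
    candidates = set()
    for group in combinations(rows, min_sup):
        pattern = frozenset.intersection(*group)
        if pattern:
            candidates.add(pattern)

    # Superset check: keep only candidates not strictly contained in another.
    return {p for p in candidates if not any(p < q for q in candidates)}


# Toy database: 4 transactions over 5 items (made-up data).
db = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d", "e"},
    {"a", "b", "e"},
    {"a", "b", "c"},
]
result = colossal_patterns(db, min_sup=2)
```

On this toy database with `min_sup=2`, the candidate intersections are `abcd`, `abc`, `abe`, and `ab`, and only `abcd` and `abe` survive the superset check. The number of row groups grows combinatorially with the number of transactions, and the final filter compares candidates pairwise; this is the kind of candidate generation and subset/superset checking cost that the abstract says the proposed algorithms aim to reduce.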

Bibliographic details

  • Source
    Knowledge-Based Systems | 2017, No. 15 | pp. 75-89 | 15 pages
  • Author affiliations

    Ton Duc Thang Univ, Div Data Sci, Ho Chi Minh City, Vietnam|Ton Duc Thang Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam;

    Ho Chi Minh City Univ Technol, Fac Informat Technol, Ho Chi Minh City, Vietnam|Sejong Univ, Coll Elect & Informat Engn, Seoul, South Korea;

    VSB Tech Univ Ostrava, Fac Elect Engn & Comp Sci, Dept Comp Sci, 17 Listopadu 15-2172, Ostrava 70833, Czech Republic;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords

    Bottom up; Colossal patterns; Data mining; High dimensional databases;

