Efficient algorithms for mining colossal patterns in high dimensional databases

Thanh-Long Nguyen; Bay Vo; Snasel Vaclav

Abstract

Mining association rules plays an important role in decision support systems. To mine strong association rules, it is first necessary to mine frequent patterns. Many algorithms have been developed to mine frequent patterns efficiently, such as Apriori, Eclat, FP-Growth, PrePost, and FIN. However, these are efficient only when the database contains a small number of items. When a database has a large number of items (from thousands to hundreds of thousands) but a small number of transactions, these algorithms cannot run when the minimum support threshold is also small, because the search space is huge. This gives rise to the problem of mining colossal patterns in high dimensional databases. In 2012, Sohrabi and Barforoush proposed the BVBUC algorithm for mining colossal patterns based on a bottom-up scheme. However, it needs more time to check subsets and supersets, because it generates many candidates and consumes more memory to store them. In this paper we propose new, efficient algorithms for mining colossal patterns. First, the CP (Colossal Pattern)-tree is designed. Next, we develop two theorems to rapidly compute the patterns of nodes and prune nodes without losing information about colossal patterns. Based on the CP-tree and these theorems, an algorithm (named CP-Miner) is proposed to solve the problem of mining colossal patterns. A sorting strategy for efficiently mining colossal patterns is then developed. This strategy helps to reduce the number of significant candidates and the time needed to check subsets and supersets. The PCP-Miner algorithm, which uses this strategy, is then proposed, and we also conduct experiments to show the efficiency of these algorithms. (C) 2017 Elsevier B.V. All rights reserved.
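The bottom-up idea and the cost of subset/superset checking mentioned in the abstract can be illustrated with a naive sketch. This is not CP-Miner, PCP-Miner, or BVBUC, and the abstract does not give the paper's formal definition of a colossal pattern; the sketch assumes, purely for illustration, that the target patterns are the maximal itemsets shared by at least `min_support` transactions. The function name `naive_colossal_patterns` and the sample data are hypothetical.

```python
from itertools import combinations


def naive_colossal_patterns(transactions, min_support):
    """Naive row-enumeration sketch (an illustration, not the paper's algorithm).

    Assumption: a target pattern is a maximal itemset contained in at least
    `min_support` transactions.
    """
    candidates = set()
    # Bottom-up over rows: intersect the items of every group of
    # `min_support` transactions; each non-empty intersection is a
    # pattern supported by at least `min_support` rows.
    for rows in combinations(transactions, min_support):
        common = frozenset.intersection(*map(frozenset, rows))
        if common:
            candidates.add(common)
    # Superset check: drop every candidate that is a proper subset of
    # another candidate. This quadratic pass is the kind of cost the
    # abstract says grows with the number of candidates.
    return [p for p in candidates if not any(p < q for q in candidates)]


# Hypothetical toy database: few rows, items shared across rows.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"c", "d"},
]
patterns = naive_colossal_patterns(transactions, min_support=2)
print(sorted(sorted(p) for p in patterns))
```

Enumerating row combinations rather than item combinations is only feasible because the number of transactions is small; the number of candidates, and with it the pairwise superset check, is what a pruning or sorting strategy would aim to reduce.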


Bibliographic record

Source
Knowledge-Based Systems | 2017, No. 15 | pp. 75-89 | 15 pages
Authors
Thanh-Long Nguyen; Bay Vo; Snasel Vaclav
Author affiliations

Ton Duc Thang Univ, Div Data Sci, Ho Chi Minh City, Vietnam|Ton Duc Thang Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam;

Ho Chi Minh City Univ Technol, Fac Informat Technol, Ho Chi Minh City, Vietnam|Sejong Univ, Coll Elect & Informat Engn, Seoul, South Korea;

VSB Tech Univ Ostrava, Fac Elect Engn & Comp Sci, Dept Comp Sci, 17 Listopadu 15-2172, Ostrava 70833, Czech Republic;

Original format: PDF
Language: English
Keywords
Bottom up; Colossal patterns; Data mining; High dimensional databases

Similar literature

1. An efficient dynamic switching algorithm for mining colossal closed itemsets from high dimensional datasets [J]. Vanahalli Manjunath K., Patil Nagamma. Data & Knowledge Engineering, 2019, No. Sepa
2. An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets [J]. Vanahalli Manjunath K., Patil Nagamma. Information Sciences: An International Journal, 2019
3. Efficient colossal pattern mining in high dimensional datasets [J]. Mohammad Karim Sohrabi, Ahmad Abdollahzadeh Barforoush. Knowledge-Based Systems, 2012
4. Constraint-Based Method for Mining Colossal Patterns in High Dimensional Databases [C]. Thanh-Long Nguyen, Bay Vo, Bao Huynh. International Conference on Information Systems Architecture and Technology, 2018
5. Data mining analysis of digital library database usage patterns as a tool facilitating efficient user navigation [D]. Gibson, Ian Eric. 2001
6. TSARM-UDP: An Efficient Time Series Association Rules Mining Algorithm Based on Up-to-Date Patterns [O]. Qiang Zhao, Qing Li, Deshui Yu. 2021
7. An Efficient Algorithm for Mining Maximal Frequent Sequential Patterns in Large Databases [O]. Qiu-bin SU, Lu LU, Bin CHENG. 2018
8. Efficient bit string implementation of a database cross-field association system (with an application to protein sequence patterns) [R]. Guigo, R, Vazquez, I, Smith, T F. 1992
