Frequent Itemset Mining Based on a Distributed Inverted Index

Li Xuedi (李雪迪); Zheng Yan (郑彦)

Abstract

Frequent itemset mining is the core of association rule mining, and the mining method directly affects how efficiently frequent itemsets are generated. When mining frequent itemsets from massive data, the Eclat algorithm runs short of memory and computing resources. To address this, the paper proposes DiiEclat, an algorithm that mines frequent itemsets through a distributed inverted index. An inverted index is equivalent to a vertical layout of the data. The index is partitioned by transaction ID and stored across different index nodes; each node intersects the transaction sets it holds, and a retrieval agent then merges the per-node intersection results. The execution times of DiiEclat, Eclat, Diffset, and Eclat opt are compared on four datasets: chess, mushroom, T40I10D100K, and T10I4D100K. The results show that, by vertically partitioning the transaction sets and computing in parallel, DiiEclat addresses the low efficiency of intersection operations and the memory shortage encountered during mining, and that the algorithm is efficient and scalable.
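The partition-intersect-merge idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's DiiEclat implementation: the function names (`build_partitioned_index`, `local_tidset`, `support`, `frequent_itemsets`), the modulo-based assignment of transactions to nodes, and the simple prefix-extension search are all assumptions made for illustration, and the "nodes" are plain in-memory dictionaries visited in sequence rather than in parallel.

```python
from collections import defaultdict


def build_partitioned_index(transactions, num_nodes):
    """Build one inverted index (item -> set of TIDs) per node.

    Assumption: a transaction is assigned to node `tid % num_nodes`.
    """
    nodes = [defaultdict(set) for _ in range(num_nodes)]
    for tid, items in transactions.items():
        index = nodes[tid % num_nodes]
        for item in items:
            index[item].add(tid)
    return nodes


def local_tidset(index, itemset):
    """Intersect the node-local TID sets of every item in `itemset`."""
    tidsets = [index.get(item, set()) for item in itemset]
    return set.intersection(*tidsets) if tidsets else set()


def support(nodes, itemset):
    """Merge step: TIDs are disjoint across nodes, so the global support
    is the sum of the sizes of the per-node intersections."""
    return sum(len(local_tidset(index, itemset)) for index in nodes)


def frequent_itemsets(transactions, num_nodes, min_support):
    """Enumerate frequent itemsets by extending frequent prefixes.

    Only frequent itemsets are extended, which is safe because every
    subset of a frequent itemset is itself frequent.
    """
    nodes = build_partitioned_index(transactions, num_nodes)
    items = sorted({item for t in transactions.values() for item in t})
    result = {}
    frontier = [(item,) for item in items]
    while frontier:
        next_frontier = []
        for itemset in frontier:
            count = support(nodes, itemset)
            if count >= min_support:
                result[itemset] = count
                # Extend only with larger items to avoid duplicates.
                next_frontier.extend(
                    itemset + (item,) for item in items if item > itemset[-1]
                )
        frontier = next_frontier
    return result


# Hypothetical toy data: TID -> set of items.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}
result = frequent_itemsets(transactions, num_nodes=2, min_support=3)
```

Because each transaction ID lives on exactly one node, the merged support does not depend on the number of nodes; in a real deployment each call to `local_tidset` would run on its own index node, and only the counts (or TID sets) would travel to the retrieval agent.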

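Bibliographic Record

Source
Computer Technology and Development (《计算机技术与发展》) | 2016, No. 3 | pp. 101-104 | 4 pages
Authors
Li Xuedi (李雪迪); Zheng Yan (郑彦)
Affiliation

College of Computer, Nanjing University of Posts and Telecommunications (南京邮电大学计算机学院), Nanjing, Jiangsu 210003

Original format: PDF
Language: Chinese
CLC classification: Programming; software engineering
Keywords
Eclat algorithm; frequent itemsets; inverted index; parallel computing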

