Foreign-language conference: Proof of Designed Reliability

Index support for frequent itemset mining in a relational DBMS



Abstract

Many efforts have been devoted to coupling data mining activities with relational DBMSs, but true integration into the relational DBMS kernel has rarely been achieved. This paper presents a novel indexing technique that represents transactions in a succinct form suitable for tightly integrating frequent itemset mining into a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, so the index can be reused for mining itemsets with any support threshold. Furthermore, the stored information is structured to allow selective access to only those index blocks needed for the current extraction phase. The index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Experiments have been run on various datasets characterized by different data distributions. The execution time of the frequent itemset extraction task exploiting the index is always comparable with, and sometimes faster than, a C++ implementation of the FP-growth algorithm accessing data stored in a flat file.
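As a rough illustration of why a complete, threshold-free representation can be reused for any support threshold, the following is a minimal Python sketch. It does not reproduce the paper's index structure or its PostgreSQL integration, and it mines by intersecting transaction-id sets (an Eclat-style approach) rather than by FP-growth. The names `build_index` and `mine`, and the toy transactions, are hypothetical and chosen only for this example.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def build_index(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map every item to the ids of the transactions that contain it.

    No support threshold is applied, so the index is complete and can
    serve later mining runs with any threshold. (Illustrative only; this
    is not the on-disk index described in the abstract.)
    """
    index: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def mine(index: Dict[str, Set[int]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose absolute support is >= min_support."""
    result: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str],
               prefix_tids: Optional[Set[int]],
               candidates: List[str]) -> None:
        for i, item in enumerate(candidates):
            # Support of prefix + item = size of the intersected id sets.
            tids = index[item] if prefix_tids is None else prefix_tids & index[item]
            if len(tids) >= min_support:
                itemset = prefix | {item}
                result[itemset] = len(tids)
                # Only later items are tried, so each itemset is built once.
                extend(itemset, tids, candidates[i + 1:])

    extend(frozenset(), None, sorted(index))
    return result


# Toy data: the index is built once and then reused for two thresholds.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
index = build_index(transactions)
frequent_at_3 = mine(index, min_support=3)
frequent_at_2 = mine(index, min_support=2)
```

Because `build_index` discards nothing, lowering the threshold from 3 to 2 needs no rebuild. An index pruned at threshold 3 would have lost the information needed to find the itemsets that only become frequent at 2.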
