Model-based probabilistic frequent itemset mining

Thomas Bernecker; Reynold Cheng; David W. Cheung; Hans-Peter Kriegel; Sau Dan Lee; Matthias Renz; Florian Verhein; Liang Wang; Andreas Zuefle


Abstract

Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods that capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both the tuple and the attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach best fits different types of data sets.
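The idea of replacing exact computation with a Poisson or normal model can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes tuple uncertainty with independent transactions, where `probs[i]` is the probability that transaction i contains the itemset, so the support is a sum of independent Bernoulli variables. The function names, the dynamic-programming baseline, and the continuity correction in the normal model are assumptions made for this illustration.

```python
import math


def exact_frequentness(probs, minsup):
    """Exact P(support >= minsup), computed by dynamic programming."""
    # dist[k] = probability that exactly k of the transactions seen so far
    # contain the itemset (a Poisson binomial distribution).
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # itemset absent from this transaction
            nxt[k + 1] += q * p       # itemset present in this transaction
        dist = nxt
    return sum(dist[minsup:])


def poisson_frequentness(probs, minsup):
    """Poisson model: support ~ Poisson(lam), lam = expected support."""
    lam = sum(probs)
    term = math.exp(-lam)             # P(X = 0)
    cdf = 0.0
    for k in range(minsup):
        cdf += term                   # accumulate P(X = k) for k < minsup
        term *= lam / (k + 1)         # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cdf)


def normal_frequentness(probs, minsup):
    """Normal model: support ~ N(mu, var), with a continuity correction
    (the correction is an assumption of this sketch)."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:                    # all probabilities are 0 or 1
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    probs = [0.5, 0.5, 0.5, 0.5]      # hypothetical existence probabilities
    for f in (exact_frequentness, poisson_frequentness, normal_frequentness):
        print(f.__name__, f(probs, 2))
```

The exact baseline costs time quadratic in the number of transactions per itemset, while both models need only one pass to collect the expected support (and the variance for the normal model). An itemset would then be reported for task (1) when the returned probability exceeds the chosen threshold, or ranked by it for task (2).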

Bibliographic details

Source
Knowledge and information systems | 2013, Issue 1 | 37 pages
Authors
Thomas Bernecker; Reynold Cheng; David W. Cheung; Hans-Peter Kriegel; Sau Dan Lee; Matthias Renz; Florian Verhein; Liang Wang; Andreas Zuefle
Original format: PDF
Language: English
Chinese Library Classification: Automation system theory

Similar literature

1. Model-based probabilistic frequent itemset mining [J]. Thomas Bernecker, Reynold Cheng, David W. Cheung. Knowledge and information systems. 2013, Issue 1.
2. Probabilistic maximal frequent itemset mining methods over uncertain databases [J]. Li Haifeng, Hai Mo, Zhang Ning. Intelligent data analysis. 2019, Issue 6.
3. Probabilistic frequent itemset mining over uncertain data streams [J]. Haifeng Li, Ning Zhang, Jianming Zhu. Expert Systems with Application. 2018, Issue DECa.
4. Accelerating Probabilistic Frequent Itemset Mining: A Model-Based Approach [C]. Liang Wang, Reynold Cheng, Sau Dan Lee. CIKM 10; ACM conference on information and knowledge management. 2011.
5. Mining Frequent Itemsets from Uncertain Data: Extensions to Constrained Mining and Stream Mining [D]. Hao, Boyu. 2010.
6. Unravelling associations between unassigned mass spectrometry peaks with frequent itemset mining techniques [O]. Trung Nghia Vu, Aida Mrzic, Dirk Valkenborg. 2014.
7. Model-based probabilistic frequent itemset mining [O]. 2013.
