A probabilistic condensed representation of data for stream mining

Abstract

Data mining and machine learning algorithms usually operate directly on the data. However, if the data is not available all at once or consists of billions of instances, these algorithms easily become infeasible with respect to memory and run time. As a solution to this problem, we propose a framework, called MiDEO (Mining Density Estimates inferred Online), in which algorithms are designed to operate on a condensed representation of the data. In particular, we propose to use density estimates, which are able to represent billions of instances in a compact form and can be updated when new instances arrive. As an example of an algorithm that operates on density estimates, we consider the task of mining association rules, which we regard as a form of simple statements about the data. The algorithm, called POEt (Pattern mining on Online density esTimates), is evaluated on synthetic and real-world data and is compared to state-of-the-art algorithms.
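
The idea of mining rules from a compact, incrementally updated summary rather than from the raw stream can be illustrated with a minimal Python sketch. This is not the MiDEO or POEt implementation, whose estimator the abstract does not specify. As a stated assumption, a fixed-size table of pairwise co-occurrence frequencies stands in for the density estimate, and the names `PairwiseDensityEstimate` and `mine_rules` are hypothetical.

```python
from itertools import combinations, permutations


class PairwiseDensityEstimate:
    """Hypothetical stand-in for an online density estimate over binary items.

    Its size depends only on the number of items, not on how many
    instances have been seen.
    """

    def __init__(self, n_items):
        self.n_items = n_items
        self.n_seen = 0
        # joint[i][j]: number of instances containing both items i and j;
        # joint[i][i]: number of instances containing item i.
        self.joint = [[0] * n_items for _ in range(n_items)]

    def update(self, instance):
        """Fold one instance (a set of item indices) into the summary."""
        self.n_seen += 1
        items = sorted(instance)
        for i in items:
            self.joint[i][i] += 1
        for i, j in combinations(items, 2):
            self.joint[i][j] += 1
            self.joint[j][i] += 1

    def prob(self, i, j=None):
        """Estimated P(i), or P(i, j) when j is given."""
        if self.n_seen == 0:
            return 0.0
        j = i if j is None else j
        return self.joint[i][j] / self.n_seen


def mine_rules(estimate, min_support, min_confidence):
    """Read single-item rules i -> j off the estimate, never the raw data."""
    rules = []
    for i, j in permutations(range(estimate.n_items), 2):
        support = estimate.prob(i, j)
        p_i = estimate.prob(i)
        if p_i == 0.0 or support < min_support:
            continue
        confidence = support / p_i  # estimated P(j | i)
        if confidence >= min_confidence:
            rules.append((i, j, support, confidence))
    return rules


# Usage: instances arrive one at a time and are discarded after the update.
estimate = PairwiseDensityEstimate(n_items=3)
for instance in [{0, 1}, {0, 1, 2}, {0}, {1, 2}]:
    estimate.update(instance)
for rule in mine_rules(estimate, min_support=0.5, min_confidence=0.6):
    print(rule)
```

The sketch only covers rules between single items. A richer density estimate would be needed to answer queries about larger itemsets, which is the role the abstract assigns to the condensed representation.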

Bibliographic details

Source
2014 International Conference on Data Science and Advanced Analytics | 2014 | pp. 297-303 | 7 pages
Conference location: Shanghai (CN)
Authors
Geilke, Michael; Karwath, Andreas; Kramer, Stefan
Author affiliation
Johannes Gutenberg-Univ. Mainz, Mainz, Germany
Original format: PDF
Language: English

Similar literature

1. Probabilistic frequent itemset mining over uncertain data streams [J]. Haifeng Li, Ning Zhang, Jianming Zhu. Expert Systems with Applications. 2018, Issue DECa.
2. A novel approach for mining probabilistic frequent itemsets over uncertain data streams [J]. Tianlai Li, Fangai Liu, Xinhua Wang. International Journal of Applied Decision Sciences. 2018, Issue 3.
3. Condensed representations of changes in dynamic graphs through emerging subgraph mining [J]. Angelo Impedovo, Corrado Loglisci, Michelangelo Ceci. Engineering Applications of Artificial Intelligence. 2020, Issue Sepa.
4. A probabilistic condensed representation of data for stream mining [C]. Geilke Michael, Karwath Andreas, Kramer Stefan. International Conference on Data Science and Advanced Analytics. 2014.
5. Mining Frequent Itemsets from Uncertain Data: Extensions to Constrained Mining and Stream Mining [D]. Hao, Boyu. 2010.
6. An efficient reversible privacy-preserving data mining technology over data streams [O]. Chen-Yi Lin, Yuan-Hung Kao, Wei-Bin Lee.
7. Condensed Representations for Data Mining [O]. Jean-Francois Boulicaut.
8. Probabilistic Stream Relational Algebra: A Data Model for Sensor Data Streams [R]. Liu, H., Hwang, S., Ivastava, J. 2004.
