Conference: 2014 International Conference on Data Science and Advanced Analytics

A probabilistic condensed representation of data for stream mining



Abstract

Data mining and machine learning algorithms usually operate directly on the data. However, if the data is not available all at once or consists of billions of instances, these algorithms can easily become infeasible in terms of memory and run time. As a solution to this problem, we propose a framework called MiDEO (Mining Density Estimates inferred Online), in which algorithms are designed to operate on a condensed representation of the data. In particular, we propose to use density estimates, which can represent billions of instances in a compact form and can be updated as new instances arrive. As an example of an algorithm that operates on density estimates, we consider the task of mining association rules, which we regard as a form of simple statements about the data. The algorithm, called POEt (Pattern mining on Online density esTimates), is evaluated on synthetic and real-world data and compared to state-of-the-art algorithms.
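The abstract does not say which density estimator MiDEO uses or how POEt derives rules from it, so the following is only a rough sketch of the general idea under stated assumptions. It substitutes a plain grid histogram (a hypothetical choice, not the paper's estimator): each arriving instance increments one cell count and is then discarded, so memory depends on the number of occupied cells rather than on the number of instances. Rules of the form "attribute i in bin a ⇒ attribute j in bin c" are then scored from the estimate alone, with support P(A ∧ C) and confidence P(A ∧ C) / P(A). The names `OnlineHistogramDensity`, `mine_rules`, `bin_width`, `min_support`, and `min_confidence` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]
Item = Tuple[int, int]  # (attribute index, bin index)
Rule = Tuple[Item, Item, float, float]  # (antecedent, consequent, support, confidence)


class OnlineHistogramDensity:
    """Hypothetical stand-in for a MiDEO density estimate: a grid histogram.

    Memory grows with the number of occupied cells, not with the number of
    instances seen; each update touches a single cell.
    """

    def __init__(self, bin_width: float) -> None:
        self.bin_width = bin_width
        self.counts: Counter = Counter()  # cell -> number of instances in it
        self.n = 0  # total instances absorbed so far

    def _cell(self, x: Sequence[float]) -> Cell:
        # Map each attribute value to the index of its bin (floor division).
        return tuple(int(v // self.bin_width) for v in x)

    def update(self, x: Sequence[float]) -> None:
        # Absorb one arriving instance; the raw instance itself is not stored.
        self.counts[self._cell(x)] += 1
        self.n += 1

    def prob(self, conditions: Dict[int, int]) -> float:
        # Estimated probability that every attribute d falls in bin conditions[d].
        if self.n == 0:
            return 0.0
        mass = sum(
            c for cell, c in self.counts.items()
            if all(cell[d] == b for d, b in conditions.items())
        )
        return mass / self.n


def mine_rules(est: OnlineHistogramDensity, min_support: float,
               min_confidence: float) -> List[Rule]:
    """Score single-item rules A => C using only the density estimate."""
    # Candidate items are the (attribute, bin) pairs of occupied cells, so P(A) > 0.
    items = sorted({(d, b) for cell in est.counts for d, b in enumerate(cell)})
    rules: List[Rule] = []
    for (da, ba), (dc, bc) in permutations(items, 2):
        if da == dc:
            continue  # antecedent and consequent must use different attributes
        support = est.prob({da: ba, dc: bc})  # P(A and C)
        if support < min_support:
            continue
        confidence = support / est.prob({da: ba})  # P(C | A)
        if confidence >= min_confidence:
            rules.append(((da, ba), (dc, bc), support, confidence))
    return rules


# Usage: stream five 2-D instances, then mine rules from the estimate alone.
est = OnlineHistogramDensity(bin_width=1.0)
for x in [(0.2, 5.1), (0.7, 5.9), (0.4, 5.5), (2.3, 0.1), (2.8, 0.6)]:
    est.update(x)

rules = mine_rules(est, min_support=0.3, min_confidence=0.9)
for antecedent, consequent, s, c in rules:
    print(antecedent, "=>", consequent, f"support={s:.2f}", f"confidence={c:.2f}")
```

With `bin_width=1.0` the five points occupy only two cells, so the estimate keeps two counts instead of five instances, and the loop prints four rules, each with confidence 1.00. A grid histogram is about the simplest density estimate that can be updated online; its number of cells can grow quickly with many attributes or fine bins, so it illustrates mining on a condensed representation rather than standing in for the estimator evaluated in the paper.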
