Journal: Statistics and Computing

Multiple changepoint detection in categorical data streams



Abstract

The need for efficient tools is pressing in the era of big data, particularly in streaming data applications. As data streams are ubiquitous, the ability to accurately detect multiple changepoints, without affecting the continuous flow of data, is an important issue. Change detection for categorical data streams is understudied, and existing work commonly introduces fixed control parameters while providing little insight into how they may be chosen. This is ill-suited to the streaming paradigm, motivating the need for an approach that introduces few parameters which may be set without requiring any prior knowledge of the stream. This paper introduces such a method, which can accurately detect changepoints in categorical data streams with fixed storage and computational requirements. The detector relies on the ability to adaptively monitor the category probabilities of a multinomial distribution, where temporal adaptivity is introduced using forgetting factors. A novel adaptive threshold is also developed which can be computed given a desired false positive rate. This method is then compared to sequential and nonsequential change detectors in a large simulation study which verifies the usefulness of our approach. A real data set consisting of nearly 40 million events from a computer network is also investigated.
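As a rough illustration of the forgetting-factor idea mentioned in the abstract, the following minimal sketch tracks the category probabilities of a multinomial stream using exponentially down-weighted counts. It is not the paper's detector: the class name `ForgettingMultinomial`, the fixed forgetting factor, and the uniform fallback before any data arrive are all assumptions made for this example, and the adaptive threshold described in the abstract is not reproduced here.

```python
from typing import List


class ForgettingMultinomial:
    """Hypothetical helper: exponentially weighted category-probability estimate.

    Storage is fixed at one float per category plus one total weight,
    regardless of how many observations have been seen.
    """

    def __init__(self, num_categories: int, forgetting_factor: float = 0.95) -> None:
        if not 0.0 < forgetting_factor <= 1.0:
            raise ValueError("forgetting_factor must lie in (0, 1]")
        self.lam = forgetting_factor          # assumed fixed; 1.0 means no forgetting
        self.counts = [0.0] * num_categories  # down-weighted count per category
        self.weight = 0.0                     # down-weighted number of observations

    def update(self, category: int) -> None:
        # Shrink all past evidence by lam, then add the new observation.
        self.counts = [self.lam * c for c in self.counts]
        self.counts[category] += 1.0
        self.weight = self.lam * self.weight + 1.0

    def probabilities(self) -> List[float]:
        # Before any data, fall back to a uniform guess (an assumption of this sketch).
        if self.weight == 0.0:
            k = len(self.counts)
            return [1.0 / k] * k
        return [c / self.weight for c in self.counts]


# Usage: a stream that switches from category 0 to category 2 halfway through.
est = ForgettingMultinomial(num_categories=3, forgetting_factor=0.9)
for x in [0] * 50 + [2] * 50:
    est.update(x)
probs = est.probabilities()  # most of the mass should now sit on category 2
```

With a forgetting factor below 1, older observations fade geometrically, so the estimate follows the stream after a change instead of averaging over its whole history. A change detector could then compare such estimates against a threshold, but how to set that threshold is exactly what the paper addresses and is left out of this sketch.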
