Meaningful Rule Discovery and Adaptive Classification of Multi-Dimensional Time Series Data

Abstract

The ability to make predictions about future events is at the heart of much of science, so it is not surprising that prediction has been a topic of great interest in the data mining community for the last decade. Most previous work has attempted to predict the future based on the current value of a stream. However, for many problems the actual values are irrelevant, whereas the shape of the current time series pattern may foretell the future. The handful of research efforts that consider this variant of the problem have met with limited success; in particular, it is now understood that most of these efforts allow the discovery of spurious rules. We believe rule discovery in real-valued time series has failed thus far because most efforts have more or less indiscriminately applied the ideas of symbolic stream rule discovery to real-valued time series. We attribute this lack of progress to two related factors: the lack of effective algorithms for rule discovery in one-dimensional time series, which results in poor-quality, random rules; and the limited accuracy of the classifiers built for multi-dimensional time series, which are needed to make accurate predictions.

In recent years, Dynamic Time Warping (DTW) has emerged as the distance measure of choice for virtually all time series data mining applications. For example, virtually all applications that process data from wearable devices use DTW as a core subroutine. This is the result of significant progress in improving DTW's efficiency, together with multiple empirical studies showing that DTW-based classifiers at least equal (and generally surpass) the accuracy of all their rivals across dozens of datasets. Thus far, most research has considered only the one-dimensional case, and practitioners generalize to the multi-dimensional case in one of two ways: dependent warping (a single warping path shared by all dimensions) or independent warping (each dimension warped separately and the resulting distances summed).
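To make the two generalizations concrete, here is a minimal sketch. It assumes the common textbook formulation of unconstrained DTW (no warping window) with a squared-Euclidean local cost and no final square root; the function names `dtw_independent` and `dtw_dependent` are hypothetical, chosen for illustration only, and this is not presented as the dissertation's implementation.

```python
from typing import Callable, Sequence

# A multi-dimensional series as a sequence of points: shape (length, dimensions).
Series = Sequence[Sequence[float]]


def _dtw(n: int, m: int, cost: Callable[[int, int], float]) -> float:
    """Unconstrained DTW by dynamic programming (no warping window).

    cost(i, j) is the local cost of aligning point i of one series with
    point j of the other. Returns the cumulative cost (no square root).
    """
    inf = float("inf")
    # D[i][j]: cost of the cheapest warping path aligning the first i and first j points.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(i - 1, j - 1) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_independent(q: Series, c: Series) -> float:
    """Independent warping: each dimension gets its own path; distances are summed."""
    total = 0.0
    for k in range(len(q[0])):
        total += _dtw(len(q), len(c), lambda i, j, k=k: (q[i][k] - c[j][k]) ** 2)
    return total


def dtw_dependent(q: Series, c: Series) -> float:
    """Dependent warping: one path shared by all dimensions, with the local
    cost being the squared Euclidean distance between whole points."""
    return _dtw(len(q), len(c),
                lambda i, j: sum((x - y) ** 2 for x, y in zip(q[i], c[j])))


if __name__ == "__main__":
    # In q the bump in dimension 0 comes before the bump in dimension 1; in c
    # it comes after. Separate paths can absorb this, one shared path cannot.
    q = [(0, 0), (1, 0), (0, 1), (0, 0)]
    c = [(0, 0), (0, 1), (1, 0), (0, 0)]
    print(dtw_independent(q, c), dtw_dependent(q, c))  # 0.0 2.0
```

Under this formulation the independent distance can never exceed the dependent one, because the single shared path is also a valid path for each dimension taken alone; the two measures diverge most when the dimensions are out of phase with each other, as in the toy example.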
In general, the community appears to believe either that the two approaches are equivalent or that the choice is irrelevant.

In this dissertation, we strive to solve these problems. Our contributions are as follows.

First, we show why applying the ideas of symbolic stream rule discovery to real-valued data is not directly suitable for rule discovery in time series. Beyond our novel definitions and representations, which allow meaningful and extendable specifications of rules, we present novel algorithms that quickly discover high-quality rules in very large datasets and accurately predict the occurrence of future events.

Second, we show that the two most commonly used multi-dimensional DTW methods can produce different classifications, and neither one dominates the other. This seems to suggest that one should learn the best method for each particular application. However, we show that this is not necessary: a simple, principled rule can be applied on a case-by-case basis to predict which of the two methods to trust at the time of classification. Our method ensures that classification results are at least as accurate as those of the better of the two rival methods and, in many cases, it is significantly more accurate. We demonstrate our ideas with the most extensive set of multi-dimensional time series classification experiments ever attempted.
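The abstract states that a simple, principled rule decides, query by query, which of the two DTW variants to trust, but it does not spell the rule out. The sketch below is therefore only one hypothetical form such a rule could take, not the dissertation's rule: it runs 1-nearest-neighbor classification under both measures and uses the ratio of the two nearest-neighbor distances to pick which answer to return (when both classifiers agree, the choice does not matter). The `threshold` and `eps` parameters and all function names are assumptions; a real threshold would presumably have to be tuned on training data.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Series = Sequence[Sequence[float]]  # shape: (length, dimensions)


def _dtw(n: int, m: int, cost: Callable[[int, int], float]) -> float:
    # Unconstrained DTW by dynamic programming; cost(i, j) is the local cost.
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(i - 1, j - 1) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_i(q: Series, c: Series) -> float:
    # Independent warping: one DTW per dimension, distances summed.
    return sum(_dtw(len(q), len(c), lambda i, j, k=k: (q[i][k] - c[j][k]) ** 2)
               for k in range(len(q[0])))


def dtw_d(q: Series, c: Series) -> float:
    # Dependent warping: a single path shared by all dimensions.
    return _dtw(len(q), len(c),
                lambda i, j: sum((x - y) ** 2 for x, y in zip(q[i], c[j])))


def nn_classify(query: Series, train: List[Tuple[Series, str]],
                dist: Callable[[Series, Series], float]) -> Tuple[Optional[str], float]:
    # 1-nearest-neighbor: label and distance of the closest training series.
    best_label, best_dist = None, float("inf")
    for series, label in train:
        d = dist(query, series)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


def adaptive_classify(query: Series, train: List[Tuple[Series, str]],
                      threshold: float, eps: float = 1e-9) -> Optional[str]:
    # HYPOTHETICAL selection rule, an assumption for illustration only.
    # Since DTW_D >= DTW_I for every pair, min_d >= min_i. A score far above 1
    # means the best shared path is much costlier than the best per-dimension
    # paths (dimensions look out of phase), so trust independent warping.
    label_d, min_d = nn_classify(query, train, dtw_d)
    label_i, min_i = nn_classify(query, train, dtw_i)
    score = min_d / (min_i + eps)
    return label_i if score > threshold else label_d


if __name__ == "__main__":
    q = [(0, 0), (1, 0), (0, 1), (0, 0)]
    train = [([(0, 0), (0, 1), (1, 0), (0, 0)], "A"),
             ([(0, 0), (1, 0), (0, 0), (0, 0)], "B")]
    print(nn_classify(q, train, dtw_d))  # ('B', 1.0)
    print(nn_classify(q, train, dtw_i))  # ('A', 0.0)
    print(adaptive_classify(q, train, threshold=2.0))  # A
```

In this toy case the two 1-NN classifiers disagree, which mirrors the abstract's observation that neither method dominates; the hypothetical score is very large here, so the sketch sides with independent warping.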

Bibliographic details

  • Author

    Shokoohi-Yekta, Mohammad.

  • Author affiliation

    University of California, Riverside.

  • Degree-granting institution: University of California, Riverside.
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 114
  • Original format: PDF
  • Language: English