Data stream classification techniques for multiple novel classes and dynamic feature spaces.

Abstract

Multi-step methodologies and multi-scan algorithms that suit conventional knowledge discovery and data mining cannot be readily applied to data streams. Data stream classification is more challenging because of practical constraints on efficient processing and because of the temporal behavior of the stream. Two well-studied aspects are infinite length and concept drift. Since a data stream may be considered a continuous process that is theoretically infinite in length, it is impractical to store and use all the historical data for training. Data streams also frequently experience concept drift as a result of changes in the underlying concepts. However, two other important characteristics of data streams, namely concept evolution and feature evolution, are rarely addressed in the literature. Concept evolution occurs when novel classes arrive in the stream, and feature evolution occurs when new features emerge in the stream. This dissertation addresses concept evolution and feature evolution in addition to the existing challenges of infinite length and concept drift. Although a few data stream classification techniques address concept evolution, none of them considers feature evolution. In this dissertation, the concept evolution and feature evolution phenomena are studied, and the insights are used to construct improved novel class detection techniques. First, the dynamic nature of the feature space is considered, and an effective solution is provided for classification and novel class detection when the feature space is dynamic. Second, an adaptive threshold is proposed for outlier detection, which is a vital part of novel class detection. Third, a probabilistic approach to novel class detection using a discrete Gini coefficient is proposed, and its effectiveness is shown both theoretically and empirically. Finally, the issue of multiple novel classes occurring simultaneously is addressed, and a solution is provided to detect more than one novel class at the same time. Comparison with state-of-the-art data stream classification techniques on several real and synthetic data streams establishes the effectiveness of the proposed approach.
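
The abstract names a discrete Gini coefficient as the statistic behind the probabilistic novel class detector but does not give its formula. As a rough illustration only, the sketch below computes the generic Gini coefficient of a set of non-negative values. The function name `gini_coefficient` and the suggestion that it could be applied to per-instance outlier scores are assumptions made here, not the dissertation's definition.

```python
def gini_coefficient(values):
    """Generic Gini coefficient of non-negative numbers.

    Returns 0.0 when all values are equal and approaches 1.0 when
    almost all of the total is concentrated in a single value.
    Illustrative only; not the dissertation's exact formulation.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Degenerate input: treat as perfectly even.
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    # Hypothetical outlier scores for a small buffer of stream instances.
    print(gini_coefficient([0.2, 0.2, 0.2, 0.2]))
    print(gini_coefficient([0.0, 0.0, 0.0, 0.9]))
```

A value near 0 means the scores are spread evenly, while a value near 1 means a few instances dominate. How such a value would be thresholded to declare a novel class is left open here, since the abstract does not say.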

Bibliographic record

  • Author

    Chen, Qing

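  • Author affiliation

    The University of Texas at Dallas

  • Degree-granting institution The University of Texas at Dallas
  • Subject Engineering Computer; Computer Science
  • Degree Ph.D.
  • Year 2010
  • Pagination 123 p.
  • Total pages 123
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Rehabilitation medicine
  • Keywords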
