Journal: Neurocomputing

A survey on data preprocessing for data stream mining: Current status and future directions

Abstract

Data preprocessing and reduction have become essential techniques in current knowledge discovery scenarios, which are dominated by increasingly large datasets. These methods aim to reduce the complexity inherent in real-world datasets so that they can be easily processed by current data mining solutions. Advantages of such approaches include, among others, a faster and more precise learning process and a more understandable structure of raw data. However, data preprocessing techniques for data streams still have a long road ahead of them, even though online learning is growing in importance thanks to the development of the Internet and of technologies for massive data collection. Throughout this survey, we summarize, categorize, and analyze those contributions on data preprocessing that cope with streaming data. This work also takes into account the existing relationships between the different families of methods (feature and instance selection, and discretization). To enrich our study, we conduct thorough experiments using the most relevant contributions and present an analysis of their predictive performance, reduction rates, computational time, and memory usage. Finally, we offer general advice about existing data stream preprocessing algorithms and discuss emerging challenges to be faced in the domain of data stream preprocessing. (C) 2017 Elsevier B.V. All rights reserved.