Conference proceedings: IEEE International Congress on Big Data

Dimensional scalability of supervised and unsupervised concept drift detection: An empirical study

Abstract

Big Data presents challenges for predictive analytics algorithms because the underlying populations may be non-stationary. Concept drift detection algorithms can be used to detect changes in the underlying distribution so that models can be retrained. Most concept drift detection methods are known to scale only to a relatively small number of features (a few hundred). However, in many domains, datasets with thousands or even tens of thousands of features are becoming common. This paper studies the behavior of two supervised concept drift detection algorithms, the Drift Detection Method (DDM) and the Early Drift Detection Method (EDDM), and one unsupervised algorithm, Friedman and Rafsky's algorithm, on high-dimensional datasets. Our goal was to determine whether these algorithms scale, first by studying how execution time grows with the dimension of the dataset, and second by comparing their accuracy on high-dimensional datasets. The algorithms were run on datasets with up to 100,000 features. Results show that the execution time of each algorithm grows linearly with the dimension. The performance of the unsupervised algorithm degraded significantly on datasets close to 100,000 dimensions. Our results also show that the drift detection accuracy of the three algorithms did not degrade as the number of features increased.
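To make two of the compared detectors concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. `DDM` follows the usual Drift Detection Method rule: it tracks the running error rate p and its binomial standard deviation s = sqrt(p(1 - p)/n), remembers the lowest p_min + s_min seen so far, and signals a warning when p + s exceeds p_min + 2·s_min and a drift when it exceeds p_min + 3·s_min. The 30-instance warm-up, the 2/3 thresholds, and the strict inequalities (so a model with zero observed error does not fire immediately) are assumed defaults. `friedman_rafsky_cross_edges` is a hypothetical helper showing the core idea of Friedman and Rafsky's test: build the minimum spanning tree of the pooled reference and current samples and count edges that join the two samples; far fewer cross edges than the null expectation 2mn/N suggests the distributions differ. It returns that raw count and expectation rather than the full standardized test statistic, which is a simplification.

```python
import math
import random


class DDM:
    """Minimal Drift Detection Method sketch over a stream of 0/1 prediction errors.

    Assumed defaults: 30-instance warm-up; warning at 2 and drift at 3 standard
    deviations above the lowest p_min + s_min observed so far.
    """

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n, self.p, self.s = 0, 0.0, 0.0
        self.p_min, self.s_min = math.inf, math.inf

    def update(self, error):
        """Feed one outcome (1 = misclassified, 0 = correct); return the detector state."""
        self.n += 1
        self.p += (error - self.p) / self.n  # running error rate
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)  # binomial std. deviation
        if self.n < self.min_instances:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:  # best model state so far
            self.p_min, self.s_min = self.p, self.s
        level = self.p + self.s
        if level > self.p_min + self.drift_level * self.s_min:
            self.reset()  # a caller would retrain its model here
            return "drift"
        if level > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def friedman_rafsky_cross_edges(xs, ys):
    """Count minimum-spanning-tree edges that join a point of xs to a point of ys.

    Returns (cross_edges, expected_if_same_distribution). Prim's algorithm on the
    complete Euclidean graph needs O(N^2) distances, each costing O(d).
    """
    points = list(xs) + list(ys)
    labels = [0] * len(xs) + [1] * len(ys)
    total = len(points)
    in_tree = [False] * total
    best = [math.inf] * total  # cheapest known edge linking each point to the tree
    parent = [-1] * total
    best[0] = 0.0
    cross = 0
    for _ in range(total):
        # Add the non-tree point that is closest to the current tree.
        u = min((i for i in range(total) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0 and labels[parent[u]] != labels[u]:
            cross += 1
        for v in range(total):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v], parent[v] = dist, u
    expected = 2 * len(xs) * len(ys) / total  # mean under random relabelling
    return cross, expected


if __name__ == "__main__":
    # Error stream: 10% errors for 1000 instances, then 60% errors.
    stream = [int(i % 10 == 0) for i in range(1000)] + [int(i % 5 < 3) for i in range(400)]
    detector = DDM()
    states = [detector.update(e) for e in stream]
    print("first drift at instance", states.index("drift"))

    rng = random.Random(0)
    ref = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(20)]
    moved = [[rng.gauss(3, 1) for _ in range(50)] for _ in range(20)]
    print(friedman_rafsky_cross_edges(ref, moved))
```

In this sketch DDM only consumes the 0/1 error stream, so the detector's own per-instance cost does not depend on the number of features; any feature-dependent cost comes from the classifier producing the predictions. The spanning-tree step, by contrast, computes O(N^2) distances of O(d) each, so for a fixed window size its running time grows linearly with the dimension d.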
