Journal: Knowledge and Information Systems

Facing the reality of data stream classification: coping with scarcity of labeled data

Abstract

Recent approaches to data stream classification are mostly based on supervised learning algorithms, which can be trained only on labeled data. Manual labeling is both costly and time-consuming. Therefore, in a real streaming environment, where large volumes of data arrive at high speed, only a small fraction of the data can be labeled. As a result, only a limited number of instances are available for training and updating the classification models, which leads to poorly trained classifiers. We apply a novel technique that overcomes this problem by using both unlabeled and labeled instances to train and update the classification model. Each classification model is built as a collection of micro-clusters using semi-supervised clustering, and an ensemble of these models is used to classify unlabeled data. Empirical evaluation on both synthetic and real data shows that our approach outperforms state-of-the-art stream classification algorithms that use ten times more labeled data than our approach.
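
The abstract's idea of building each model as a set of micro-clusters from a partially labeled chunk, then classifying with an ensemble, can be illustrated with a minimal sketch. This is not the authors' implementation: plain k-means with deterministic seeding stands in for the paper's semi-supervised clustering, each micro-cluster is reduced to a centroid plus the majority label of its labeled members, and the ensemble uses simple majority voting. The names `kmeans`, `build_model`, and `classify` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def nearest_index(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iters=10):
    """Plain k-means, seeded with the first k points (an assumption made
    here for determinism; it stands in for semi-supervised clustering)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest_index(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep the old
        # centroid when a group is empty.
        centroids = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def build_model(chunk, k):
    """chunk: list of (point, label) pairs; label is None when unlabeled.
    Clusters ALL points, then labels each cluster by the majority label of
    its labeled members. Clusters with no labeled member are dropped."""
    centroids = kmeans([p for p, _ in chunk], k)
    votes = [Counter() for _ in centroids]
    for p, label in chunk:
        if label is not None:
            votes[nearest_index(p, centroids)][label] += 1
    return [(c, v.most_common(1)[0][0]) for c, v in zip(centroids, votes) if v]


def classify(ensemble, point):
    """Each model votes with the label of its nearest micro-cluster; the
    ensemble returns the majority label, or None if no model can vote."""
    votes = Counter()
    for model in ensemble:
        if model:
            _, label = min(model, key=lambda mc: dist(point, mc[0]))
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None
```

In this sketch, a stream would be handled chunk by chunk: call `build_model` on each new chunk, append the result to a list of models, and pass that list to `classify`. How the ensemble is pruned or refreshed over time is left out, since the abstract does not specify it.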
