Facing the reality of data stream classification: coping with scarcity of labeled data

Mohammad M. Masud; Clay Woolam; Jing Gao; Latifur Khan; Jiawei Han; Kevin W. Hamlen; Nikunj C. Oza

Abstract

Recent approaches for classifying data streams are mostly based on supervised learning algorithms, which can only be trained with labeled data. Manual labeling of data is both costly and time-consuming. Therefore, in a real streaming environment where large volumes of data appear at a high speed, only a small fraction of the data can be labeled. Thus, only a limited number of instances will be available for training and updating the classification models, leading to poorly trained classifiers. We apply a novel technique to overcome this problem by utilizing both unlabeled and labeled instances to train and update the classification model. Each classification model is built as a collection of micro-clusters using semi-supervised clustering, and an ensemble of these models is used to classify unlabeled data. Empirical evaluation on both synthetic and real data reveals that our approach outperforms state-of-the-art stream classification algorithms that use ten times more labeled data than our approach.
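
The abstract describes each model as a collection of micro-clusters built from a partially labeled chunk, with an ensemble of such models voting on unlabeled instances. The following is a minimal sketch of that idea, not the authors' algorithm: plain k-means stands in for the paper's semi-supervised clustering, each micro-cluster is reduced to a centroid plus the majority label of its labeled members, and the names build_model and classify are hypothetical. Concept drift handling and ensemble refinement are left out.

```python
from collections import Counter
from math import dist


def build_model(points, labels, k, iters=10):
    """Summarize one chunk as micro-clusters; labels[i] is None if unlabeled.

    Assumption: plain k-means replaces the paper's semi-supervised clustering.
    """
    dims = len(points[0])
    # Deterministic init: take initial centroids at even strides through the chunk.
    step = max(1, len(points) // k)
    centroids = [points[i] for i in range(0, len(points), step)][:k]

    def assign():
        # Group point indices by their nearest centroid.
        members = [[] for _ in centroids]
        for i, p in enumerate(points):
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            members[nearest].append(i)
        return members

    for _ in range(iters):
        members = assign()
        # Move each centroid to the mean of its members (unchanged if empty).
        centroids = [
            tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dims))
            if idx else centroids[c]
            for c, idx in enumerate(members)
        ]
    members = assign()

    model = []
    for c, idx in enumerate(members):
        # Only the labeled members of a cluster contribute to its label.
        votes = Counter(labels[i] for i in idx if labels[i] is not None)
        if votes:  # clusters with no labeled member are dropped in this sketch
            model.append((centroids[c], votes.most_common(1)[0][0]))
    return model


def classify(ensemble, x):
    """Majority vote over models; each model votes with its nearest micro-cluster."""
    votes = Counter()
    for model in ensemble:
        if model:
            _, label = min(model, key=lambda mc: dist(x, mc[0]))
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


# Two toy chunks in which only a minority of the points carry a label.
chunk1 = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels1 = ["a", None, None, "b", None, None]
chunk2 = [(0.3, 0.2), (0.0, 0.4), (4.9, 5.3), (5.1, 4.7)]
labels2 = [None, "a", "b", None]

ensemble = [build_model(chunk1, labels1, k=2), build_model(chunk2, labels2, k=2)]
print(classify(ensemble, (0.1, 0.1)), classify(ensemble, (5.0, 5.0)))  # prints: a b
```

In this toy run each chunk has one labeled point per class, yet the unlabeled points still shape the centroids, which is the sense in which unlabeled data contributes to training.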

Bibliographic details

Source
Knowledge and Information Systems | 2011, Issue 1 | pp. 213-244 | 32 pages
Authors
Mohammad M. Masud; Clay Woolam; Jing Gao; Latifur Khan; Jiawei Han; Kevin W. Hamlen; Nikunj C. Oza
Author affiliations

Department of Computer Science, University of Texas at Dallas, Richardson, TX, 75080, USA;

Department of Computer Science, University of Texas at Dallas, Richardson, TX, 75080, USA;

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA;

Department of Computer Science, University of Texas at Dallas, Richardson, TX, 75080, USA;

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA;

Department of Computer Science, University of Texas at Dallas, Richardson, TX, 75080, USA;

Intelligent Systems Division, NASA Ames Research Center, Moffett Field, CA, 94035, USA;

Original format: PDF
Language: English
Keywords
Data stream classification; Semi-supervised clustering; Ensemble classification; Concept drift;
