Employing unlabeled data to improve the classification performance of SVM, and its application in audio event classification

Leng Yan; Sun Chengli; Xu Xinyan; Yuan Qi; Xing Shuning; Wan Honglin; Wang Jingjing; Li Dengwang


Abstract

In many classification tasks, labeled samples are difficult to acquire, whereas unlabeled samples are easy to obtain. Active learning (AL) can be used to address the labeling problem. Among the many AL algorithms, those that focus on labeling the unlabeled samples within the margin band of an SVM are an effective way to reduce the manual labeling workload. AL requires human involvement, but the time and energy that humans can provide are often limited, which severely restricts AL-based sample labeling. This work is therefore motivated by what can be done after the AL process ends. For an AL algorithm that explores the unlabeled samples within the margin band of an SVM, we investigate whether, after it stops, such unlabeled samples can continue to be explored by semi-supervised learning (SSL). One of the challenges in designing such an SSL algorithm is how to estimate the confidence of unlabeled samples and then select those with high confidence. In this work, we propose three criteria for determining confidence: 1) the smoothness assumption; 2) the explored positive samples and the explored negative samples should be as similar as possible to the labeled positive samples and the labeled negative samples, respectively; 3) the explored positive samples and the explored negative samples should be as different as possible from the labeled negative samples and the labeled positive samples, respectively. Based on these three criteria, an SSL algorithm, SSL_3C, is proposed. Furthermore, we applied SSL_3C to audio event classification and conducted experiments on two public datasets. Experimental results demonstrate that SSL_3C can effectively improve classification performance after the AL process. The selected unlabeled samples are not only of high confidence but also very informative.
Moreover, SSL_3C is not sensitive to the sizes of the labeled and unlabeled training sets. The contributions of this work are twofold: first, we propose an effective SSL algorithm for exploring the unlabeled samples within the margin band of an SVM; second, we propose three criteria for determining the confidence of unlabeled samples, under which the explored unlabeled samples are not only of high confidence but also very informative. Since the labeling problem exists in many classification fields and SSL_3C can effectively reduce the manual labeling workload, the proposed SSL_3C should find widespread application in many other fields. (C) 2016 Elsevier B.V. All rights reserved.
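
The abstract does not give the formulas behind SSL_3C, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes that the margin band is the region where the SVM decision value satisfies |f(x)| < 1, that similarity is measured with an RBF kernel, and that criteria 2 and 3 can be combined as "mean similarity to same-class labeled samples minus mean similarity to opposite-class labeled samples". The function names, the `gamma` parameter, and the `threshold` value are all hypothetical, and criterion 1 (the smoothness assumption) is not modeled here.

```python
import math


def rbf_similarity(a, b, gamma=1.0):
    """RBF similarity between two feature vectors (an assumed similarity measure)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def in_margin_band(decision_value):
    """True when a sample lies strictly inside the band |f(x)| < 1."""
    return abs(decision_value) < 1.0


def mean_similarity(x, samples, gamma=1.0):
    """Average RBF similarity between x and a non-empty list of samples."""
    return sum(rbf_similarity(x, s, gamma) for s in samples) / len(samples)


def confidence(x, decision_value, labeled_pos, labeled_neg, gamma=1.0):
    """Hypothetical confidence score loosely inspired by criteria 2 and 3.

    The predicted label is the sign of the decision value. The score is high
    when x resembles labeled samples of the predicted class (criterion 2) and
    differs from labeled samples of the other class (criterion 3).
    """
    sim_pos = mean_similarity(x, labeled_pos, gamma)
    sim_neg = mean_similarity(x, labeled_neg, gamma)
    predicted = 1 if decision_value >= 0 else -1
    score = sim_pos - sim_neg if predicted == 1 else sim_neg - sim_pos
    return predicted, score


def select_confident(unlabeled, decision_values, labeled_pos, labeled_neg,
                     threshold=0.5, gamma=1.0):
    """Keep margin-band samples whose hypothetical score reaches the threshold."""
    selected = []
    for x, f in zip(unlabeled, decision_values):
        if not in_margin_band(f):
            continue  # only samples inside the margin band are explored
        label, score = confidence(x, f, labeled_pos, labeled_neg, gamma)
        if score >= threshold:
            selected.append((x, label, score))
    return selected


if __name__ == "__main__":
    # Toy data; the decision values would normally come from a trained SVM.
    labeled_pos = [(1.0, 1.0)]
    labeled_neg = [(-1.0, -1.0)]
    unlabeled = [(0.9, 0.9), (0.0, 0.0), (3.0, 3.0)]
    decision_values = [0.8, 0.1, 2.5]
    for x, label, score in select_confident(
            unlabeled, decision_values, labeled_pos, labeled_neg):
        print(x, label, round(score, 3))
```

In this toy run, the third sample is skipped because it lies outside the band, and the second is rejected because it is equally similar to both labeled classes. In an SSL loop, the selected samples would be added to the training set with their predicted labels before retraining the SVM.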

Bibliographic record

Source
Knowledge-Based Systems | 2016, No. 15 | pp. 117-129 | 13 pages
Authors
Leng Yan; Sun Chengli; Xu Xinyan; Yuan Qi; Xing Shuning; Wan Honglin; Wang Jingjing; Li Dengwang;
Author affiliations

Shandong Normal Univ, Sch Phys & Elect, Shandong Prov Key Lab Med Phys & Image Proc Techn, Jinan 250014, Peoples R China;

Nanchang Hangkong Univ, Sch Informat, Nanchang 330063, Peoples R China;

Shandong Coll Elect Technol, Dept Comp Sci & Technol, Jinan 250014, Peoples R China;

Shandong Normal Univ, Sch Phys & Elect, Shandong Prov Key Lab Med Phys & Image Proc Techn, Jinan 250014, Peoples R China;

Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250014, Peoples R China;

Shandong Normal Univ, Sch Phys & Elect, Shandong Prov Key Lab Med Phys & Image Proc Techn, Jinan 250014, Peoples R China;

Shandong Normal Univ, Sch Phys & Elect, Shandong Prov Key Lab Med Phys & Image Proc Techn, Jinan 250014, Peoples R China;

Shandong Normal Univ, Sch Phys & Elect, Shandong Prov Key Lab Med Phys & Image Proc Techn, Jinan 250014, Peoples R China;

Original format: PDF
Text language: English
Keywords
Semi-supervised learning; Active learning; SVM; Margin band; Audio event classification;

