Fast and de-noise support vector machine training method based on fuzzy clustering method for large real world datasets

OMID NAGHASH ALMASI; MODJTABA ROUHANI


Abstract

Classifying large, real-world datasets is a challenging problem in machine learning. Among machine learning methods, the support vector machine (SVM) is a well-known approach with high generalization ability. Unfortunately, as the amount of training data grows and the data contain noise, the performance of the SVM decreases significantly. In this paper, a fast, de-noising two-stage method for training SVMs on large, real-world datasets is proposed. In the first stage, data that contain noise or are suspected to be noisy are identified and eliminated from the genuine training dataset. Identification and elimination are based on the movement of the center of the convex hull data in the training dataset. The convex hull data are computed via the QHull algorithm. Next, the well-known fuzzy clustering method (FCM) is applied to compress and reduce the size of the training dataset. Finally, the reduced and purified cluster centers are used for training the SVM. A set of experiments is conducted on four benchmark datasets from the UCI database. The training time and the generalization of the proposed approach are compared with those of FCM-SVM and a normal SVM. The results indicate that the proposed method reduces training time and is considerably successful in removing noisy data from the training dataset. Therefore, the proposed method can achieve higher generalization performance than the other methods on large, real-world datasets.
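The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact elimination rule, so everything below is an assumption for illustration only: the sketch treats a hull vertex as suspected noise when removing it shifts the center of the hull vertices by more than a hypothetical `threshold`, uses a 2-D monotone-chain hull as a stand-in for QHull, and uses a standard fuzzy c-means update with a simple deterministic initialization. The function names (`drop_suspected_noise`, `fuzzy_c_means`, `reduce_class`) are hypothetical and not taken from the paper.

```python
import math

def convex_hull(points):
    """Hull vertices of 2-D points (Andrew's monotone chain, stand-in for QHull)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def centre(points):
    """Arithmetic mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def drop_suspected_noise(points, threshold):
    """ASSUMED rule: drop a hull vertex if removing it moves the centre of
    the hull vertices by more than `threshold`. Each vertex is tested
    against the original point set, not against earlier removals."""
    hull = set(convex_hull(points))
    base = centre(list(hull))
    kept = []
    for p in points:
        if p in hull:
            rest = [q for q in points if q != p]
            moved = centre(convex_hull(rest))
            if math.hypot(moved[0] - base[0], moved[1] - base[1]) > threshold:
                continue  # suspected noise: large movement of the hull centre
        kept.append(p)
    return kept

def fuzzy_c_means(points, c, m=2.0, iters=50):
    """Standard fuzzy c-means on 2-D points; returns the c cluster centres.
    Initial centres are evenly spaced picks from the input (an assumption)."""
    n = len(points)
    centres = [points[(k * n) // c] for k in range(c)]
    for _ in range(iters):
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        u = []
        for p in points:
            d = [max(math.hypot(p[0] - v[0], p[1] - v[1]), 1e-12) for v in centres]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            u.append([w / total for w in inv])
        # Centre update: mean of the points weighted by u_ik^m.
        centres = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            centres.append((sum(wi * p[0] for wi, p in zip(w, points)) / total,
                            sum(wi * p[1] for wi, p in zip(w, points)) / total))
    return centres

def reduce_class(points, c, threshold):
    """De-noise one class, then compress it to c cluster centres."""
    return fuzzy_c_means(drop_suspected_noise(points, threshold), c)
```

In this sketch, `reduce_class` would be called once per class label, and the returned centers, tagged with that label, would form the reduced training set passed to any SVM trainer (for example `sklearn.svm.SVC`). For data with more than two features, `scipy.spatial.ConvexHull`, which wraps Qhull, could replace the 2-D hull routine.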


Bibliographic record

Source: Turkish Journal of Electrical Engineering and Computer Sciences, 2016, Issue 1, 15 pages
Authors: OMID NAGHASH ALMASI; MODJTABA ROUHANI
Original format: PDF
Chinese Library Classification: Industrial economy

Similar documents

1. A new training method for support vector machines: Clustering k-NN support vector machines [J]. Emre Comak, Ahmet Arslan. Expert Systems with Applications, 2008, Issue 3.
2. A fuzzy support vector machine algorithm for classification based on a novel PIM fuzzy clustering method [J]. Zhenning Wu, Huaguang Zhang, Jinhai Liu. Neurocomputing, 2014, Issue feba11.
3. Fast-forward solver for inhomogeneous media using machine learning methods: artificial neural network, support vector machine and fuzzy logic [J]. Abdolrazzaghi Mohammad, Hashemy Soheil, Abdolali Ali. Neural Computing & Applications, 2018, Issue 12.
4. Efficient resampling methods for training support vector machines with imbalanced datasets [C]. Batuwita Rukshan, Palade Vasile. The 2010 International Joint Conference on Neural Networks, 2010.
5. Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions [D]. Bloodgood, Michael. 2009.
6. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification [O]. Mustafa Serter Uzer, Nihat Yilmaz, Onur Inan. 2013.
7. Efficient resampling methods for training support vector machines with imbalanced datasets [O]. Rukshan Batuwita, Vasile Palade. 2010.
