Journal: Turkish Journal of Electrical Engineering and Computer Sciences

Fast and de-noise support vector machine training method based on fuzzy clustering method for large real world datasets

Abstract

Classifying large, real-world datasets is a challenging problem in machine learning. Among machine learning methods, the support vector machine (SVM) is a well-known approach with high generalization ability. Unfortunately, as the number of training samples increases and the data contain noise, the performance of the SVM decreases significantly. This paper proposes a fast, de-noising two-stage method for training SVMs on large, real-world datasets. In the first stage, data points that are noisy or suspected to be noisy are identified and eliminated from the training dataset. Identification and elimination are based on the movement of the center of the convex hull data in the training dataset. The convex hull data are computed via the QHull algorithm. In addition, the well-known fuzzy clustering method (FCM) is applied to compress the training dataset and reduce its size. Finally, the reduced and purified cluster centers are used to train the SVM. Experiments are conducted on four benchmark datasets from the UCI database. The training time and generalization performance of the proposed approach are compared with those of FCM-SVM and a standard SVM. The results indicate that the proposed method reduces training time and is considerably successful at removing noisy data from the training dataset. The proposed method can therefore achieve higher generalization performance than the other methods on large, real-world datasets.
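The compression step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: that FCM refers to the standard fuzzy c-means algorithm, that clustering is run separately for each class, and that the fuzzifier `m`, the number of clusters, and the synthetic two-class data are reasonable stand-ins. The function names `fuzzy_c_means` and `compress_by_class` are hypothetical, and the convex-hull noise-removal stage is not shown because the abstract does not give its details.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means (assumed variant); returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix U (n_samples x n_clusters); rows sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Each center is the membership-weighted mean of all samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center, floored to avoid 1/0.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def compress_by_class(X, y, clusters_per_class):
    """Replace each class's samples with its cluster centers (per-class use is an assumption)."""
    centers, labels = [], []
    for label in np.unique(y):
        c, _ = fuzzy_c_means(X[y == label], clusters_per_class)
        centers.append(c)
        labels.append(np.full(len(c), label))
    return np.vstack(centers), np.concatenate(labels)


# Synthetic two-class data standing in for a large training set (illustrative only).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (500, 2)), rng.normal(5.0, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)

# 1000 samples are reduced to 20 labeled cluster centers.
centers, labels = compress_by_class(X, y, clusters_per_class=10)
print(centers.shape, labels.shape)
```

The reduced set `centers`, `labels` could then be passed to any SVM trainer in place of the full data, for example scikit-learn's `SVC().fit(centers, labels)`. The training-time saving comes from the SVM seeing far fewer points.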