Smart Cities Symposium

Scalable parallel SVM on cloud clusters for large datasets classification


Abstract

This paper proposes a new parallel support vector machine (PSVM) that is efficient in terms of time complexity. The support vector machine is one of the popular classifiers for data analysis and pattern classification. However, an SVM requires large memory (on the order of 100 GB or more) to process big data (i.e., on the order of 1 TB or more). This paper proposes executing SVMs in parallel on several clusters to analyze and classify big data. In this approach, the data are divided into n equal partitions. Each partition is used by an individual cluster to train an SVM. The outcomes of the SVMs executed on the several clusters are then combined by another SVM, referred to as the final SVM. The inputs to this final SVM are the support vectors (SVs) of the SVMs that were trained on the different clusters, while the desired output for each SV is its corresponding class output. We evaluated the proposed method on high performance computing (HPC) clusters and Amazon cloud clusters (ACC) using different benchmark datasets. Experimental results show that, compared to the existing stand-alone SVM, the proposed method is efficient in terms of training time while keeping the error rate and memory requirement minimal.
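The abstract describes a partition-and-merge scheme: split the data into n equal parts, train one SVM per part on its own cluster, then train a final SVM on the union of the resulting support vectors. The sketch below illustrates that flow on a single machine with scikit-learn; the function name, parameters, and the local loop standing in for the per-cluster training are assumptions for illustration, not the authors' implementation.

import numpy as np
from sklearn.svm import SVC

def parallel_svm_sketch(X, y, n_partitions=4, **svm_params):
    """Train one SVM per data partition, then a final SVM on the
    collected support vectors and their original labels.

    Hypothetical helper mimicking the multi-cluster scheme locally."""
    # Split the data into n roughly equal, randomly assigned partitions.
    partitions = np.array_split(np.random.permutation(len(X)), n_partitions)

    sv_list, sv_labels = [], []
    for part in partitions:
        # In the paper, each of these fits would run on a separate cluster.
        clf = SVC(**svm_params).fit(X[part], y[part])
        # Keep only this partition's support vectors and their labels.
        sv_list.append(clf.support_vectors_)
        sv_labels.append(y[part][clf.support_])

    # The final SVM is trained on the union of all partitions' support vectors.
    final_clf = SVC(**svm_params).fit(np.vstack(sv_list),
                                      np.concatenate(sv_labels))
    return final_clf

For example, parallel_svm_sketch(X_train, y_train, n_partitions=8, kernel='rbf', C=1.0) would mimic training across eight clusters, with only the support vectors passed on to the final merge stage.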
