Future Generation Computer Systems

On the performance of high dimensional data clustering and classification algorithms

Abstract

There is often a need to perform machine learning tasks on voluminous amounts of data. These tasks have applications in fields such as pattern recognition, data mining, bioinformatics, and recommendation systems. Here we evaluate the performance of four clustering algorithms and two classification algorithms supported by Mahout within two different cloud runtimes, Hadoop and Granules. Our benchmarks use the same Mahout backend code, ensuring a fair comparison. The differences between these implementations stem from how the Hadoop and Granules runtimes (1) support and manage the lifecycle of individual computations, and (2) orchestrate the exchange of data between different stages of the computational pipeline during successive iterations of the clustering algorithm. We include an analysis of our results for each of these algorithms in a distributed setting, as well as a discussion of measures for failure recovery.
