On the performance of high dimensional data clustering and classification algorithms

Kathleen Ericson; Shrideep Pallickara

Abstract

Machine learning tasks often need to be performed on voluminous amounts of data. These tasks have applications in fields such as pattern recognition, data mining, bioinformatics, and recommendation systems. Here we evaluate the performance of four clustering algorithms and two classification algorithms supported by Mahout within two different cloud runtimes, Hadoop and Granules. Our benchmarks use the same Mahout backend code, ensuring a fair comparison. The differences between these implementations stem from how the Hadoop and Granules runtimes (1) support and manage the lifecycle of individual computations, and (2) orchestrate the exchange of data between different stages of the computational pipeline during successive iterations of the clustering algorithm. We include an analysis of our results for each of these algorithms in a distributed setting, as well as a discussion of measures for failure recovery.

Bibliographic details

Source
Future Generation Computer Systems | 2013, No. 4 | pp. 1024-1034 | 11 pages
Authors
Kathleen Ericson; Shrideep Pallickara
Affiliations

Computer Science Department, Colorado State University, Fort Collins, CO 80523, USA;

Computer Science Department, Colorado State University, Fort Collins, CO 80523, USA;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
machine learning; distributed stream processing; hadoop; mahout; clustering; classification; granules;

Similar literature

1. Emulation of high-performance correlation-based quantum clustering algorithm for two-dimensional data on FPGA [J]. Quantum information processing. 2020, No. 6
2. Emulation of high-performance correlation-based quantum clustering algorithm for two-dimensional data on FPGA [J]. Nature reviews Drug discovery. 2020, No. 6
3. Genetic Algorithm Based Dimensionality Reduction for Improving Performance of K-Means Clustering: A Case Study for Categorization of Medical Dataset [J]. Asha Gowda Karegowda, Vidya T. Shama, M.A. Jayaram. International journal of soft computing. 2012, No. 5
4. Classification Performances Of Data Mining Clustering Algorithms For Remotely Sensed Multispectral Image Data [C]. Hamza Erol, Bala Mikat Tyoden, Recep Erol. IEEE International Conference on Innovations in Intelligent Systems and Applications. 2018
5. Building a Decision Cluster Classification Model by a Clustering Algorithm to Classify Large High Dimensional Data with Multiple Classes [D]. Li, Yan. 2010
6. Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms [O]. Yu Guo, Armin Graber, Robert N McBurney. 2010
7. On the performance of high dimensional data clustering and classification algorithms [O]. Kathleen Ericson, Shrideep Pallickara. 2013
8. Feature extraction and classification algorithms for high dimensional data [R]. Lee, Chulhee; Landgrebe, David. 1993
