Scalable implementation of dependence clustering in Apache Spark

Abstract

This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. The article also introduces a fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment, and benchmarks the proposed algorithm against Spectral clustering. Results on real-life data indicate that the implementation scales well while also demonstrating good performance on densely connected graphs.
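
The abstract does not describe the diffusion procedure itself, so the following is only a generic illustration of what diffusion on a graph means in the spectral clustering family: a probability distribution is pushed repeatedly along edges according to random-walk transition probabilities. It is a minimal plain-Python sketch, not the paper's approximate procedure and not GraphX code; the toy graph and the names `row_normalize` and `diffuse` are hypothetical.

```python
def row_normalize(weights):
    """Turn edge weights into random-walk transition probabilities per vertex."""
    return {
        u: {v: w / sum(nbrs.values()) for v, w in nbrs.items()}
        for u, nbrs in weights.items()
    }


def diffuse(transition, start, steps):
    """Push a probability distribution along the edges `steps` times."""
    dist = dict(start)
    for _ in range(steps):
        nxt = {}
        for u, mass in dist.items():
            # Each vertex sends its mass to its neighbours; receivers sum it up.
            for v, p in transition[u].items():
                nxt[v] = nxt.get(v, 0.0) + mass * p
        dist = nxt
    return dist


# Hypothetical toy graph: two tightly linked pairs joined by one weak edge.
weights = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 0.1},
    "c": {"b": 0.1, "d": 1.0},
    "d": {"c": 1.0},
}
transition = row_normalize(weights)
after_two = diffuse(transition, {"a": 1.0}, steps=2)
```

After two steps, most of the mass that started at `a` is still inside the tightly linked pair `a`/`b`, which is the intuition clustering methods of this type exploit. The per-edge send-and-sum pattern in `diffuse` is the kind of computation that GraphX's `aggregateMessages` primitive expresses on a distributed graph.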

Bibliographic details

Source
2017 Evolving and Adaptive Intelligent Systems | 2017 | pp. 1-6 | 6 pages
Conference location: Ljubljana (SI)
Author
Elena Ivannikova
Author affiliation
Department of Mathematical Information Technology, University of Jyväskylä, PO Box 35 (Agora), 40014 Jyväskylä
Conference organizer
Original format: PDF
Text language: eng
Chinese Library Classification
Keywords
Sparks; Clustering algorithms; Approximation algorithms; Algorithm design and analysis; Clustering methods; Eigenvalues and eigenfunctions; Data analysis;

Similar literature

1. Scalability of Artificial Neural Network in Apache Spark Powered Cluster [J]. Advanced Science Letters. 2017, No. 6
2. A comparison on scalability for batch big data processing on Apache Spark and Apache Flink [J]. Diego García-Gil, Sergio Ramírez-Gallego, Salvador García. Big Data Analytics. 2017, No. 1
3. Parallel particle swarm optimization classification algorithm variant implemented with Apache Spark [J]. Al-Sawwa Jamil, Ludwig Simone A. CONCURRENCY PRACTICE & EXPERIENCE. 2020, No. 2
4. Scalable implementation of dependence clustering in Apache Spark [C]. Elena Ivannikova. IEEE Conference on Evolving and Adaptive Intelligent Systems. 2017
5. A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark's GraphX [D]. Langewisch, Ryan P. 2015
6. SparkRA: Enabling Big Data Scalability for the GATK RNA-seq Pipeline with Apache Spark [O]. Zaid Al-Ars, Saiyi Wang, Hamid Mushtaq. 2020
7. Scalable implementation of dependence clustering in Apache Spark [O]. Ivannikova, Elena. 2017
