Efficient $k$-Means++ Approximation with MapReduce

Xu Y.; Qu W.; Li Z.; Min G.; Li K.; Liu Z.

Abstract

$k$-means is undoubtedly one of the most popular clustering algorithms owing to its simplicity and efficiency. However, the algorithm is highly sensitive to the chosen initial centers, so a proper initialization is crucial for obtaining a good solution. To address this problem, $k$-means++ was proposed: it chooses the centers sequentially so as to achieve a solution that is provably close to the optimal one. However, due to its weak scalability, $k$-means++ becomes inefficient as the size of the data increases. To improve its scalability and efficiency, this paper presents a MapReduce $k$-means++ method that drastically reduces the number of MapReduce jobs by using only one MapReduce job to obtain the $k$ centers. The $k$-means++ initialization algorithm is executed in the Mapper phase and the weighted $k$-means++ initialization algorithm is run in the Reducer phase. Because this new MapReduce $k$-means++ method replaces the iterations among multiple machines with computation on a single machine, it significantly reduces the communication and I/O costs. We also prove that the proposed MapReduce $k$-means++ method obtains an approximation to the optimal solution of $k$-means. To reduce the expensive distance computation of the proposed method, we further propose a pruning strategy that avoids a large number of redundant distance computations. Extensive experiments on real and synthetic data are conducted, and the performance results indicate that the proposed MapReduce $k$-means++ method is much more efficient and achieves a good approximation.
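The one-job structure described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the function names (`weighted_kmeanspp`, `mapper`, `reducer`, `one_job_seeding`) are hypothetical, the "cluster" is simulated by slicing a list into splits, and weighting each local center by the number of points nearest to it is an assumption, since the abstract does not state how the Reducer's weights are formed. The pruning strategy is not shown because the abstract gives no details of it.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeanspp(points, weights, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to weight * (squared distance to the nearest chosen center)."""
    first = rng.choices(range(len(points)), weights=weights, k=1)[0]
    centers = [points[first]]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        scores = [w * d for w, d in zip(weights, d2)]
        if sum(scores) == 0:  # every point already coincides with a center
            break
        idx = rng.choices(range(len(points)), weights=scores, k=1)[0]
        centers.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for p, d in zip(points, d2)]
    return centers


def mapper(split, k, rng):
    """Mapper phase: plain k-means++ seeding on one data split.
    Emits (center, count) pairs; the count is an assumed weighting."""
    centers = weighted_kmeanspp(split, [1.0] * len(split), k, rng)
    counts = [0] * len(centers)
    for p in split:
        nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        counts[nearest] += 1
    return list(zip(centers, counts))


def reducer(mapper_outputs, k, rng):
    """Reducer phase: weighted k-means++ seeding over all local centers."""
    points = [c for out in mapper_outputs for c, _ in out]
    weights = [float(n) for out in mapper_outputs for _, n in out]
    return weighted_kmeanspp(points, weights, k, rng)


def one_job_seeding(data, k, num_splits, seed=0):
    """Simulate a single MapReduce job: one map pass, then one reduce."""
    rng = random.Random(seed)
    splits = [data[i::num_splits] for i in range(num_splits)]
    return reducer([mapper(s, k, rng) for s in splits if s], k, rng)
```

For example, `one_job_seeding(data, k=3, num_splits=4)` returns up to three initial centers drawn from `data`. Only the small set of weighted local centers reaches the Reducer, which is the kind of saving in communication and I/O that the abstract attributes to the single-job design.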


Bibliographic details

Source: Parallel and Distributed Systems, IEEE Transactions on, 2014, Issue 12, pp. 3135-3144 (10 pages)
Author affiliation: School of Information Science and Technology, Dalian Maritime University, Dalian, China
Original format: PDF
Language: English
Keywords: Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Educational institutions; Scalability; Standards; MapReduce; approximation; k-means; k-means++; scalability
