International Conference on Computing for Geospatial Research and Application

On the Organization of Cluster Voting with Massive Distributed Streams

Abstract

Data processing is one of the major challenges in Big Data. In this paper we investigate optimal processing algorithms for massive data streams and propose a new one, the multi-buffer based majority algorithm. The algorithm runs in O(n) time and can select prevalent elements with frequencies as low as 1%. Our experiments indicate that the multi-buffer based majority algorithm improves on both accuracy and efficiency. We also use the multi-buffer based algorithm to process data streams on a single system and on a distributed system; these experiments indicate that it performs better on the distributed system. Finally, we explain the experimental results and identify several major factors that influence result accuracy: the stream size, the range of elements in the stream, the frequency of the predominant elements, and the buffer sets used.
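The abstract does not describe the internals of the multi-buffer based majority algorithm, so the sketch below is not the authors' method. It illustrates the general idea of extending majority voting from one counter to several "buffers" using a different, well-known technique, the Misra-Gries frequent-items algorithm: with k - 1 counters, one O(n) pass keeps a counter for every element occurring more than n/k times, so k = 100 matches the 1% threshold mentioned above. Shard summaries are combined with the standard mergeable-summary rule (add counts, subtract the k-th largest), and a second pass confirms exact counts. The function names (`misra_gries`, `merge_summaries`, `verify`) and the synthetic stream are hypothetical, chosen for illustration.

```python
import random
from collections import Counter
from functools import reduce
from typing import Dict, Hashable, Iterable, List

Summary = Dict[Hashable, int]


def misra_gries(stream: Iterable[Hashable], k: int) -> Summary:
    """One-pass summary with at most k - 1 counters (Misra-Gries, not the paper's algorithm).

    Any element occurring more than n / k times is guaranteed to keep a
    counter; stored counts may underestimate true counts by up to n / k.
    """
    counters: Summary = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # All counters busy: decrement every counter (and implicitly x),
            # dropping counters that reach zero. Amortized O(1) per element.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def merge_summaries(a: Summary, b: Summary, k: int) -> Summary:
    """Combine two shard summaries, keeping at most k - 1 counters."""
    merged = Counter(a)
    merged.update(b)  # adds counts key by key
    if len(merged) <= k - 1:
        return dict(merged)
    # Subtract the k-th largest count; only counters strictly above it survive.
    kth = sorted(merged.values(), reverse=True)[k - 1]
    return {x: c - kth for x, c in merged.items() if c > kth}


def verify(shards: List[List[Hashable]], candidates: Summary, k: int) -> Summary:
    """Second pass: exact counts for candidates, keeping those above n / k."""
    n = sum(len(s) for s in shards)
    exact: Counter = Counter()
    for shard in shards:
        exact.update(x for x in shard if x in candidates)
    return {x: c for x, c in exact.items() if c * k > n}


if __name__ == "__main__":
    # Hypothetical synthetic stream: "a" is 5%, "b" is 2%, the rest are unique.
    stream: List[Hashable] = ["a"] * 50 + ["b"] * 20 + list(range(930))
    random.Random(0).shuffle(stream)
    k = 100  # report elements above n / k, i.e. above 1% of the stream

    # Single system: one summary over the whole stream.
    single = verify([stream], misra_gries(stream, k), k)

    # Simulated distributed run: four shards summarized independently, then merged.
    shards = [stream[i::4] for i in range(4)]
    summaries = [misra_gries(s, k) for s in shards]
    merged = reduce(lambda a, b: merge_summaries(a, b, k), summaries)
    distributed = verify(shards, merged, k)

    # Both results equal {"a": 50, "b": 20}.
    print(single, distributed)
```

The merge rule is what makes the distributed variant work in this sketch: each node only ships at most k - 1 counters, and the merged summary keeps the same n/k guarantee over the combined stream, so the final verification pass is the only step that touches every element twice.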