Conference: IEEE International Conference on Fuzzy Systems

A Two-Stage Fuzzy C-Means Data Placement Strategy for Scientific Cloud Workflows

Abstract

Cloud computing technologies now make it possible to run massive data applications, such as scientific workflows, in a distributed fashion. They greatly help in processing the very large scientific datasets stored across distributed data centers. However, processing massive data through scientific workflows is costly in terms of data transmission, execution delay, and bandwidth. Data placement optimization techniques are therefore needed to noticeably reduce workflow execution and data transmission costs. Placing massive data volumes becomes a hard problem whenever a workflow task requires datasets located in different data centers. This work proposes a data placement strategy for scientific cloud workflows based on the fuzzy c-means clustering technique. The proposed method is a two-stage strategy. The first stage, which is offline, groups the initial datasets into k data centers and then regroups them using fuzzy c-means. The second stage, which is online and follows execution of the workflow, places the generated datasets in the data centers according to their dependencies, again using fuzzy c-means. The proposed two-stage strategy appears to be effective in reducing the overall data placement amount compared with state-of-the-art strategies.
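
The abstract gives no implementation details, so the following is only a minimal sketch of the fuzzy c-means step both stages rely on, not the authors' method. It assumes each dataset is described by a hypothetical 0/1 dependency vector (one entry per workflow task that uses it), a fuzzifier m = 2, seeding from the first k points, and placement by highest membership with no storage-capacity limits. The names memberships_for, fuzzy_c_means, and place are illustrative.

```python
import math

def memberships_for(point, centers, m=2.0):
    """Fuzzy membership of one point in every cluster (each row sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [j for j, d in enumerate(dists) if d == 0.0]
    if zeros:
        # The point sits exactly on one or more centers: share membership there.
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dc) ** exp for dc in dists) for dj in dists]

def fuzzy_c_means(points, k, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    dim = len(points[0])
    # Assumption: deterministic start, the first k points seed the centers.
    centers = [list(p) for p in points[:k]]
    u = []
    for _ in range(max_iter):
        u = [memberships_for(p, centers, m) for p in points]
        new_centers = []
        for j in range(k):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[j])  # keep an empty cluster in place
                continue
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

def place(memberships):
    """Assign each dataset to the data center with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]

# Hypothetical dependency vectors: entry t is 1 if task t uses the dataset.
datasets = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 0, 0), (0, 0, 0, 1)]

# Offline stage (sketch): cluster the initial datasets into k = 2 data centers.
centers, memberships = fuzzy_c_means(datasets, k=2)
placement = place(memberships)

# Online stage (sketch, assumed): place a generated dataset against the
# existing cluster centers using the same membership rule.
generated = (0, 0, 1, 0)
new_center = place([memberships_for(generated, centers)])[0]

print(placement, new_center)
```

In this toy input, datasets that share tasks end up in the same data center, which is the property a dependency-based placement is meant to exploit. A real strategy would also have to account for data center capacity and dataset sizes, which the sketch ignores.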
