A Two-Stage Fuzzy C-Means Data Placement Strategy for Scientific Cloud Workflows


Abstract

Cloud computing technologies now make it possible to distribute massive-data applications such as scientific workflows, and they help process very large scientific datasets stored across distributed data centers. Processing massive data through scientific workflows is nevertheless costly in terms of data transmission, execution delay, and bandwidth. Data placement optimization techniques are therefore needed to reduce workflow execution and data transmission costs noticeably. Placing massive data volumes becomes a hard challenge whenever a workflow task requires datasets located in different data centers. This work proposes a data placement strategy for scientific cloud workflows based on the fuzzy c-means clustering technique. The strategy has two stages. The first, offline stage groups the initial datasets into k data centers and then regroups them using fuzzy c-means. The second, online stage runs after workflow execution and places the generated datasets in the data centers according to their dependencies, again using fuzzy c-means. The proposed two-stage strategy appears to be effective in reducing the overall amount of data placement compared with state-of-the-art strategies.
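
The abstract does not give the algorithmic details, so the following is only a minimal sketch of how fuzzy c-means could drive a dependency-based placement, not the authors' implementation. It assumes that each dataset is described by its row in a dependency matrix (the number of tasks it shares with every other dataset), that each cluster corresponds to one data center, and that a dataset generated later is simply sent to the nearest cluster center. The function names, the toy workflow, the farthest-point initialisation, and the nearest-center rule for the online stage are all hypothetical choices made for illustration.

```python
def dependency_matrix(task_inputs, n_datasets):
    """dep[i][j] = number of tasks that use both dataset i and dataset j."""
    dep = [[0] * n_datasets for _ in range(n_datasets)]
    for used in task_inputs:
        for i in used:
            for j in used:
                dep[i][j] += 1
    return dep


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fuzzy_c_means(points, k, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(list(far))
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1)).
        memberships = []
        for p in points:
            d = [max(distance(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(k))
                 for j in range(k)]
            )
        # Center update: mean of all points weighted by u_ij^m.
        new_centers = []
        for j in range(k):
            w = [u[j] ** m for u in memberships]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[axis] for wi, p in zip(w, points)) / total
                 for axis in range(len(points[0]))]
            )
        shift = max(distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def nearest_center(row, centers):
    """Assumed online rule: send a new dataset to the closest cluster center."""
    return min(range(len(centers)), key=lambda j: distance(row, centers[j]))


# Hypothetical workflow: each set lists the dataset indices one task reads.
task_inputs = [{0, 1, 2}, {0, 1}, {1, 2}, {3, 4, 5}, {3, 4}, {4, 5}]
dep = dependency_matrix(task_inputs, 6)

# Offline stage (sketch): cluster the initial datasets into k = 2 data centers.
centers, memberships = fuzzy_c_means(dep, k=2)
placement = [max(range(2), key=lambda j: row[j]) for row in memberships]
print(placement)

# Online stage (sketch): a generated dataset that shares one task with d0 and d1.
print(nearest_center([1, 1, 0, 0, 0, 0], centers))
```

Unlike hard k-means, each dataset here keeps a degree of membership in every cluster, so a dataset shared between two groups of tasks could in principle be weighed against both data centers before a final placement is chosen. A real strategy would also have to respect storage capacity and fixed-location datasets, which this sketch ignores.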

Bibliographic record

Source
IEEE International Conference on Fuzzy Systems, 2018, pp. 1-8 (8 pages)
Authors
Hamdi Kchaou; Zied Kechaou; Adel M. Alimi
Original format
PDF
Keywords
Data centers; Task analysis; Cloud computing; Data transfer; Clustering algorithms; Partitioning algorithms; Genetic algorithms

Similar literature

1. Fuzzy Theory-Based Data Placement for Scientific Workflows in Hybrid Cloud Environments [J]. Zheyi Chen, Xu Zhao, Bing Lin. Discrete Dynamics in Nature and Society, 2020, No. 4
2. A Time-Driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing [J]. Lin Bing, Zhu Fangning, Zhang Jianshan. IEEE Transactions on Industrial Informatics, 2019, No. 7
3. Security-aware intermediate data placement strategy in scientific cloud workflows [J]. Wei Liu, Su Peng, Wei Du. Knowledge and Information Systems, 2014, No. 2
4. A Two-Stage Fuzzy C-Means Data Placement Strategy for Scientific Cloud Workflows [C]. Hamdi Kchaou, Zied Kechaou, Adel M. Alimi. IEEE International Conference on Fuzzy Systems, 2018
5. Virtual Machine Consolidation in Cloud Data Centres using a Parameter-based Placement Strategy [D]. Mosa, Abdelkhalik. 2019
6. An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data [O]. Junsheng Huang, Baohua Mao, Yun Bai. 2020
7. Fuzzy Theory-Based Data Placement for Scientific Workflows in Hybrid Cloud Environments [O]. Zheyi Chen, Xu Zhao, Bing Lin. 2020
