A resource management and fault tolerance services in grid computing

HwaMin Lee; KwangSik Chung; SungHo Chin; JongHyuk Lee; DaeWon Lee; Seongbin Park; HeonChang Yu

Abstract

In grid computing, resource management and fault tolerance services are important issues. The availability of the resources selected for job execution is a primary factor that determines computing performance. In this paper, we propose a resource manager for optimal resource selection. Using a genetic algorithm, our resource manager automatically selects, from the candidate resources, the set of resources that achieves optimal performance. Typically, the probability of a failure is higher in grid computing than in traditional parallel computing, and the failure of a resource can be fatal to job execution. Therefore, a fault tolerance service is essential in computational grids. Grid services are also often expected to meet some minimum level of Quality of Service (QoS) in order to operate as desired. To address this issue, we also propose a fault tolerance service that satisfies QoS requirements. We extend the definition of failures beyond the conventional notion of failures in distributed systems in order to provide a fault tolerance service that deals with various types of resource failures, including process failures, processor failures, and network failures. We also design and implement a fault detector and a fault manager. The implementation and simulation results indicate that our approaches are promising in that (1) the resource manager finds the optimal set of resources that guarantees efficient job execution, (2) the fault detector detects the occurrence of resource failures, and (3) the fault manager guarantees that submitted jobs complete, and job migration improves the performance of job execution even if some failures occur.
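The abstract does not give the encoding, fitness function, or operators of the genetic algorithm, so the following is only a minimal sketch of the general idea of choosing a set of resources from a pool of candidates. The function name `select_resources`, the per-resource `speeds` and `availability` values, the fitness (sum of speed times availability), and all parameter defaults are assumptions made for illustration and are not taken from the paper.

```python
import random


def select_resources(speeds, availability, k, generations=50, pop_size=30, seed=0):
    """Pick k candidate resources with a simple genetic algorithm.

    Assumption: fitness is the sum of speed * availability over the chosen
    resources. The paper's actual fitness function is not stated in the abstract.
    """
    rng = random.Random(seed)
    n = len(speeds)

    def fitness(subset):
        return sum(speeds[i] * availability[i] for i in subset)

    def crossover(a, b):
        # The child draws k distinct resources from the union of both parents.
        pool = list(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, k)))

    def mutate(subset, rate=0.2):
        # With probability `rate`, swap one chosen resource for an unchosen one.
        chosen = list(subset)
        unused = [i for i in range(n) if i not in subset]
        if unused and rng.random() < rate:
            chosen[rng.randrange(k)] = rng.choice(unused)
        return tuple(sorted(chosen))

    # Each individual is a sorted tuple of k distinct resource indices.
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the elite
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Hypothetical candidate pool: relative speed and estimated availability.
    speeds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    availability = [0.9, 0.5, 0.8, 0.99, 0.2, 0.7]
    print(select_resources(speeds, availability, k=3))
```

Because the best half of each generation is carried over unchanged, the best selection found never gets worse from one generation to the next. Like any genetic algorithm, however, the sketch is a heuristic and does not by itself guarantee a globally optimal set.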

Bibliographic details

Source
Journal of Parallel and Distributed Computing | 2005, Issue 11 | pp. 1305-1317 | 13 pages
Authors
HwaMin Lee; KwangSik Chung; SungHo Chin; JongHyuk Lee; DaeWon Lee; Seongbin Park; HeonChang Yu
Author affiliation
Department of Computer Science Education, Korea University, 1, 5-Ka, Anam-Dong, Sungbuk-Ku, Seoul, Korea
Original format: PDF
Text language: English
Chinese Library Classification: Automation technology and equipment
Keywords
fault tolerance; resource manager; quality of service; migration; grid computing
