International Advanced Research Workshop on High Performance Computing

Autonomous Performance and Risk Management in Large Distributed Systems and Grids



Abstract

The DARPA UltraLog project has built and deployed a large-scale experimental distributed mobile agent system designed to improve distributed system performance and survivability. The system is designed to self-adjust to meet performance and risk goals and constraints in a changing environment. It is built on the Cougaar architecture of mobile agents, which perform their tasks from multiple points on the network. Performance and risk in an agent-based system raise the same issues as in any distributed system or grid, and these issues are further complicated by the autonomous and mobile nature of the agents in a society. The natural parallelism present in many large-scale problems makes them good candidates for an agent-based solution with autonomous self-healing capabilities. The large logistics applications used as a test bed for DARPA UltraLog testify to the ability of large agent-based systems to solve some of the world's most complex tasks in grid computing. Furthermore, the agent systems do this in a distributed and autonomous fashion, obtaining a solution speedup through parallel tasking in the agent environment and improving survivability and security through the mobile agent architecture. The design of a modern grid system or general distributed system must make effective use of resources to build a system that meets capacity, response time, risk, and cost goals. The same intractable problems that we encounter in distributed system design are now being confronted by real-time solvers in the pursuit of real-time management of system performance and risk. This real-time design problem (or redesign after an attack or failure) is complicated by the frequently conflicting goals of performance versus survivability and risk. In a survivable system we try to spread the agents among several remote servers so that no major part of the system is vulnerable to an attack on any single host.
From a performance perspective, we try to assign the agents to a few servers that are close together, or even to a single server, to minimize delays from remote messaging. The UltraLog system is designed to be a highly flexible and robust computing architecture that can meet performance and reliability requirements and withstand attacks. In this system, attackers will need considerable knowledge of the performance space if they are to have any hope of impacting the system. In UltraLog we have matched a flexible software architecture with a flexible hardware architecture to create a system that is survivable, testable, and meets performance demands in a distributed environment. In this paper we describe the techniques we have used to build tooling for real-time management of the performance and risk of large distributed agent systems. We also outline some of the steps in the solution process, and some of the basic state changes, or basic moves in the solution space, that are used to improve solver speed for the state optimization problem.
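The abstract frames agent placement as a trade-off: survivability favors spreading agents across hosts, performance favors co-locating them, and a solver improves a placement by applying basic moves in the solution space. The following is a minimal sketch of that idea under purely illustrative assumptions; it is not the UltraLog or Cougaar solver. The cost functions (message rate times inter-host latency for performance, sum of squared per-host agent counts for risk), the weights `w_perf` and `w_risk`, the two move types (relocate one agent, swap two agents), and all function names are hypothetical choices, and the search is plain greedy hill climbing.

```python
from itertools import combinations

# Hypothetical toy model, NOT UltraLog's or Cougaar's actual formulation:
# agents exchange messages at known rates; hosts are a known latency apart.


def perf_cost(assign, traffic, latency):
    """Messaging delay: sum of (message rate * inter-host latency) over agent pairs.
    Co-located agents have latency 0, so this favors packing agents together."""
    return sum(rate * latency[assign[a]][assign[b]]
               for (a, b), rate in traffic.items())


def risk_cost(assign, n_hosts):
    """Concentration risk (assumed metric): sum of squared agent counts per host.
    It is smallest when agents are spread evenly, so one lost host costs less."""
    counts = [0] * n_hosts
    for host in assign.values():
        counts[host] += 1
    return sum(c * c for c in counts)


def total_cost(assign, traffic, latency, w_perf, w_risk):
    """Weighted sum of the two conflicting goals; the weights are assumptions."""
    return (w_perf * perf_cost(assign, traffic, latency)
            + w_risk * risk_cost(assign, len(latency)))


def neighbours(assign, n_hosts):
    """Basic moves: relocate one agent, or swap the hosts of two agents."""
    for agent, host in assign.items():
        for new_host in range(n_hosts):
            if new_host != host:
                cand = dict(assign)
                cand[agent] = new_host
                yield cand
    for a, b in combinations(sorted(assign), 2):
        if assign[a] != assign[b]:
            cand = dict(assign)
            cand[a], cand[b] = assign[b], assign[a]
            yield cand


def hill_climb(assign, traffic, latency, w_perf=1, w_risk=1):
    """Greedy local search: apply the best improving move until none improves."""
    cost = total_cost(assign, traffic, latency, w_perf, w_risk)
    while True:
        best, best_cost = None, cost
        for cand in neighbours(assign, len(latency)):
            c = total_cost(cand, traffic, latency, w_perf, w_risk)
            if c < best_cost:
                best, best_cost = cand, c
        if best is None:  # no move lowers the cost: local optimum reached
            return assign, cost
        assign, cost = best, best_cost


if __name__ == "__main__":
    # Two hosts 10 time units apart; A-B and C-D talk heavily, A-C lightly.
    traffic = {("A", "B"): 5, ("C", "D"): 5, ("A", "C"): 1}
    latency = [[0, 10], [10, 0]]
    start = {"A": 0, "B": 0, "C": 0, "D": 0}
    # Performance only: everything stays on one host.
    print(hill_climb(start, traffic, latency, w_perf=1, w_risk=0))
    # → ({'A': 0, 'B': 0, 'C': 0, 'D': 0}, 0)
    # Performance plus risk: the two chatty pairs end up on different hosts.
    print(hill_climb(start, traffic, latency, w_perf=1, w_risk=10))
    # → ({'A': 1, 'B': 1, 'C': 0, 'D': 0}, 90)
```

With `w_risk=0` the search leaves every agent on one host, mirroring the performance-oriented strategy; raising `w_risk` pushes toward spreading, and the search settles on keeping each heavily communicating pair together while separating the pairs. Greedy hill climbing can stop at a local optimum; that is a limitation of this sketch, not a statement about how the UltraLog solver behaves.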
