
Scalable Relative Debugging



Abstract

Detecting and isolating bugs that arise only at high processor counts is challenging. Over a number of years, we have implemented a debugging method called "relative debugging" that supports debugging applications as they evolve or are ported to larger machines. It allows a user to compare the state of a suspect program against a reference version, even when the number of processors is increased. The key idea is to compare runtime data from the two programs in order to reason about the state of the suspect program. Although powerful, a naïve implementation of the comparison phase does not scale to large problems running on large machines. In this paper, we propose two solutions: a hash-based scheme and a direct point-to-point scheme. We describe the implementation of these techniques, present a case study, and evaluate their performance on 20K cores of a Cray XE6 system.
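To make the scaling problem in the comparison phase concrete, here is a minimal sketch that assumes both runs hold the same 1-D array in a contiguous block decomposition, the reference on `p_ref` processes and the suspect on `p_sus`. It illustrates one plausible reading of a direct point-to-point comparison, not the authors' implementation. Instead of gathering both arrays onto a single node, each pair of ranks whose blocks share indices compares only that overlap. The helper names `block_range`, `overlaps`, and `compare` and the block layout are hypothetical, and the message exchange is simulated by slicing.

```python
from typing import List, Sequence, Tuple


def block_range(n: int, nprocs: int, rank: int) -> Tuple[int, int]:
    """Half-open [lo, hi) slice owned by `rank` in an assumed 1-D block decomposition."""
    base, extra = divmod(n, nprocs)
    # The first `extra` ranks each hold one extra element.
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def overlaps(n: int, p_ref: int, p_sus: int) -> List[Tuple[int, int, int, int]]:
    """List (ref_rank, sus_rank, lo, hi) for every pair of blocks that share indices.

    Both decompositions are sorted, contiguous partitions of [0, n), so a
    two-pointer sweep finds every overlap in O(p_ref + p_sus) steps.
    """
    pairs = []
    r = s = 0
    while r < p_ref and s < p_sus:
        rlo, rhi = block_range(n, p_ref, r)
        slo, shi = block_range(n, p_sus, s)
        lo, hi = max(rlo, slo), min(rhi, shi)
        if lo < hi:  # empty blocks never produce a pair
            pairs.append((r, s, lo, hi))
        # Advance whichever block ends first (the reference block on a tie).
        if rhi <= shi:
            r += 1
        else:
            s += 1
    return pairs


def compare(ref: Sequence[float], sus: Sequence[float],
            p_ref: int, p_sus: int, tol: float = 1e-9) -> List[int]:
    """Return global indices where the suspect differs from the reference by more than tol."""
    if len(ref) != len(sus):
        raise ValueError("runs must hold arrays of the same global size")
    mismatches = []
    for r, s, lo, hi in overlaps(len(ref), p_ref, p_sus):
        # In a real parallel run, reference rank r would send ref[lo:hi]
        # directly to suspect rank s; here the exchange is just a slice.
        for i in range(lo, hi):
            if abs(ref[i] - sus[i]) > tol:
                mismatches.append(i)
    return mismatches


if __name__ == "__main__":
    reference = [float(i) for i in range(10)]
    suspect = list(reference)
    suspect[5] += 1.0  # inject a single discrepancy
    print(overlaps(10, 3, 4))
    print(compare(reference, suspect, p_ref=3, p_sus=4))  # [5]
```

With 10 elements on 3 reference and 4 suspect processes, `overlaps` yields six rank pairs, so each rank exchanges data only with the ranks whose blocks overlap its own rather than funnelling everything through one root. The hash-based scheme mentioned in the abstract is not modeled here.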
