Journal: Operating Systems Review

Hiding Communication Latency and Coherence Overhead in Software DSMs


Abstract

In this paper we propose the use of a PCI-based programmable protocol controller for hiding communication and coherence overheads in software DSMs. Our protocol controller provides three different types of overhead tolerance: a) moving basic communication and coherence tasks away from computation processors; b) prefetching of diffs; and c) generating and applying diffs with hardware assistance. We evaluate the isolated and combined impact of these features on the performance of TreadMarks. We also compare performance against two versions of the Shrimp-based AURC protocol. Using detailed execution-driven simulations of a 16-node network of workstations, we show that the greatest performance benefits provided by our protocol controller come from our hardware-supported diffs. Reducing the burden of communication and coherence transactions on the computation processor is also beneficial, but to a smaller extent. Prefetching is not always profitable. Our results show that our protocol controller can reduce running time by up to 50% for TreadMarks, which means that it can double the TreadMarks speedups. The overlapping implementation of TreadMarks performs as well as or better than AURC for 5 of our 6 applications. We conclude that the simple hardware support we propose allows for the implementation of high-performance software DSMs at low cost. Based on this conclusion, we are building the NCP_2 parallel system at COPPE/UFRJ.
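For readers unfamiliar with diffs in page-based software DSMs, the following is a minimal software sketch of the general twin-and-diff technique: a pristine copy of a page (the twin) is compared with the modified page, and only the changed runs are recorded and later applied to another copy. It is an illustration only, not the paper's hardware-assisted mechanism or the TreadMarks source; the names `make_diff` and `apply_diff`, the 4-byte word size, and the `(offset, bytes)` encoding are all assumptions made for this example.

```python
from typing import List, Tuple

WORD = 4  # assumed comparison granularity in bytes (hypothetical)

# A diff is a list of (byte offset, modified bytes) runs. This encoding is
# an assumption chosen for illustration.
Diff = List[Tuple[int, bytes]]


def make_diff(twin: bytes, page: bytes, word: int = WORD) -> Diff:
    """Compare a page with its twin word by word and collect changed runs."""
    if len(twin) != len(page):
        raise ValueError("twin and page must have the same length")
    diff: Diff = []
    run_start = None  # offset where the current run of changed words began
    for off in range(0, len(page), word):
        changed = twin[off:off + word] != page[off:off + word]
        if changed and run_start is None:
            run_start = off
        elif not changed and run_start is not None:
            diff.append((run_start, page[run_start:off]))
            run_start = None
    if run_start is not None:  # close a run that reaches the end of the page
        diff.append((run_start, page[run_start:]))
    return diff


def apply_diff(page: bytearray, diff: Diff) -> None:
    """Overwrite the recorded runs in another copy of the page, in place."""
    for off, data in diff:
        page[off:off + len(data)] = data


# Usage: modify two adjacent words of a 16-byte page, then update a stale copy.
twin = bytes(16)
page = bytearray(twin)
page[4:8] = b"\x01\x02\x03\x04"
page[8] = 9
diff = make_diff(twin, bytes(page))
stale = bytearray(16)
apply_diff(stale, diff)
```

Both functions scan or copy page data on the computation processor, which is the kind of work the abstract proposes to perform with hardware assistance.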
