Journal: IEEE Transactions on Parallel and Distributed Systems

Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer


Abstract

Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2, and XMT, appear to address irregular application requirements better than commodity clusters. However, research on massively multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures, due to issues such as the size of the machines, memory footprint, simulation speed, accuracy, and customization. At the same time, Shared Memory MultiProcessors (SMPs) with multicore processors have become an attractive platform for simulating large-scale systems. This paper introduces a cycle-level simulator of the massively multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques implemented to obtain high simulation speed while maintaining high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at runtime and includes a parametric network and memory model that takes into account contention and hot spotting. On a modern 48-core SMP host, the proposed infrastructure simulates a large set of irregular applications 500 to 2,000 times slower than real time when compared to a 128-processor XMT, with an accuracy error under 10 percent. Emulation is only 25 to 200 times slower than real time.
The paper also presents a case study in which the simulation infrastructure is used to identify bottlenecks in the current XMT architecture and to estimate the performance scaling of a possible multicore design with next-generation memory and network interconnect.
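To put the reported slowdown factors in concrete terms, the following minimal sketch converts a run time on the real machine into the range of host wall-clock time implied by the abstract's figures (500 to 2,000 times slower for simulation, 25 to 200 times slower for emulation). The function name `host_wall_clock_range` and the 1-second example workload are hypothetical and chosen only for illustration; nothing here reflects the simulator's actual interface.

```python
def host_wall_clock_range(real_seconds, slowdown_low, slowdown_high):
    """Return (best, worst) host wall-clock seconds for a workload that
    takes `real_seconds` on the real machine, given a slowdown range."""
    if real_seconds < 0 or slowdown_low <= 0 or slowdown_high < slowdown_low:
        raise ValueError("invalid run time or slowdown range")
    return real_seconds * slowdown_low, real_seconds * slowdown_high


# Slowdown ranges quoted in the abstract (48-core SMP host, 128-processor XMT).
SLOWDOWNS = {
    "simulation": (500, 2000),
    "emulation": (25, 200),
}

if __name__ == "__main__":
    real_seconds = 1.0  # hypothetical workload: 1 second on the real XMT
    for mode, (low, high) in SLOWDOWNS.items():
        best, worst = host_wall_clock_range(real_seconds, low, high)
        print(f"{mode}: {best:.0f} to {worst:.0f} s of host time")
```

Under these figures, each second of XMT execution would correspond to roughly 8 to 33 minutes of cycle-level simulation, or 25 to 200 seconds of emulation.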
