Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

Villa Oreste; Tumeo Antonino; Secchi Simone; Manzano Joseph B.

Abstract

Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2, and XMT, appear to address irregular application requirements better than commodity clusters. However, research on massively multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures, due to issues such as the size of the machines, memory footprint, simulation speed, accuracy, and customization. At the same time, Shared Memory MultiProcessors (SMPs) with multicore processors have become an attractive platform to simulate large-scale systems. This paper introduces a cycle-level simulator of the massively multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques implemented to obtain high simulation speed while maintaining high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at runtime and includes a parametric network and memory model that takes into account contention and hot spotting. On a modern 48-core SMP host, the proposed infrastructure simulates a large set of irregular applications 500 to 2,000 times slower than real time when compared to a 128-processor XMT, with an accuracy error under 10 percent. Emulation is only from 25 to 200 times slower than real time.
The paper also presents a case study, where the simulation infrastructure is used to identify bottlenecks in the current XMT architecture and to estimate the performance scaling of a possible multicore design with next-generation memory and network interconnect.
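
To put the reported slowdown factors in concrete terms, the following minimal Python sketch converts a native XMT runtime into the host wall-clock range implied by the abstract's figures. The function names and the `scheduling_rounds` model are illustrative assumptions, not part of the paper's simulator: the abstract only states that simulation speed stays constant up to the number of available host cores, and the round-counting model for larger configurations is a guess at how oversubscription might be reasoned about.

```python
import math

# Slowdown ranges quoted in the abstract
# (48-core SMP host, compared to a 128-processor XMT).
SIMULATION_SLOWDOWN = (500, 2000)
EMULATION_SLOWDOWN = (25, 200)


def host_time_range(native_seconds, slowdown_range):
    """Return (best, worst) host wall-clock seconds for a native runtime."""
    low, high = slowdown_range
    return native_seconds * low, native_seconds * high


def scheduling_rounds(simulated_processors, host_cores):
    """Hypothetical model (an assumption, not from the paper): each host core
    simulates one XMT processor at a time, so configurations larger than the
    host core count need additional rounds."""
    return math.ceil(simulated_processors / host_cores)


if __name__ == "__main__":
    # One second of native XMT execution, in simulated and emulated modes.
    print(host_time_range(1.0, SIMULATION_SLOWDOWN))
    print(host_time_range(1.0, EMULATION_SLOWDOWN))
    # Rounds needed under the assumed one-processor-per-core model.
    print(scheduling_rounds(48, 48), scheduling_rounds(128, 48))
```

Under these figures, one second of native execution corresponds to roughly 8 to 33 minutes of cycle-level simulation, or 25 to 200 seconds of emulation.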

Bibliographic details

Source
Parallel and Distributed Systems, IEEE Transactions on | 2012, Issue 12 | pp. 2266-2279 | 14 pages
Authors
Villa Oreste; Tumeo Antonino; Secchi Simone; Manzano Joseph B.
Author affiliation

Pacific Northwest National Laboratory, Richland

Original format: PDF
Text language: English
Keywords
Modeling of computer architecture; evaluation; integration and modeling; measurement; modeling; multithreaded processors; simulation of multiple-processor systems; system architectures;

Similar literature

1. Massively multithreaded maxflow for image segmentation on the Cray XMT-2 [J]. Shahid H. Bokhari, Ümit V. Çatalyürek, Metin N. Gurcan. Concurrency and Computation. 2014, Issue 18
2. CRAY XT5 SUPERCOMPUTER NAMED WORLD'S FASTEST SUPERCOMPUTER [J]. Desktop engineering. 2010, Issue 5
3. Proposed Supercomputer Will Be Six Billion Times Faster Than a Cray-1 [J]. Charles Murray. Design News. 2019, Issue Apra
4. Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT [C]. Secchi Simone, Tumeo Antonino, Villa Oreste. 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. 2011
5. Coupling Peridynamics with Finite-Elements for Fast, Stable and Accurate Simulations of Crack Propagation [D]. Lindsay, Payton E. 2017
6. Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2 [O]. Shahid H. Bokhari, Ümit V. Çatalyürek, Metin N. Gurcan
7. Implementing and Evaluating Multithreaded Triad Census Algorithms on the Cray XMT [O]. George Chin, Andres Marquez, Kristyn Maschhoff. 2013
8. Numerical simulation of groundwater flow and contaminant transport on the Cray T3D and C90 supercomputers [R]. Ashby, S. F., Bosl, W. J., Falgout, R. D. 1994
