
Atomic block formation for explicit data graph execution architectures

Abstract

Limits on power consumption, complexity, and on-chip latency have focused computer architects on power-efficient designs that exploit parallelism. One approach divides programs into atomic blocks of operations that execute semi-independently, which efficiently creates a large window of potentially concurrent operations.

This dissertation studies the intertwined roles of the compiler, architecture, and microarchitecture in achieving efficiency and high performance with a block-atomic architecture.

For such an architecture to achieve high performance, the compiler must form blocks effectively. It must create large blocks of instructions to amortize the per-block overhead, but control flow and restrictions on block contents limit its options. Block formation should therefore weigh execution frequency (for example, selecting control-flow paths that are frequently executed), block size, and the locality of computations, which can be exploited to reduce communication overheads.

This dissertation determines which characteristics of programs influence block formation and proposes techniques to generate effective blocks. The first contribution is a method for solving the phase-ordering problems inherent in block formation, mitigating the tension among block-enlarging optimizations (if-conversion, tail duplication, loop unrolling, and loop peeling) and scalar optimizations. Given these optimizations, analysis shows that the remaining obstacles to creating larger blocks are inherent in the control-flow structure of applications, and that any fixed block size entails a sizable amount of wasted space. To eliminate this overhead, this dissertation proposes an architectural implementation of variable-size blocks that allow the compiler to dramatically improve block efficiency.

We use these mechanisms to develop policies for block formation that achieve high performance on a range of applications and processor configurations. We find that the best policies differ significantly depending on the number of participating cores. Using machine learning, we discover generalized policies for particular hardware configurations and find that the best policy varies significantly across applications and with the number of parallel resources available in the microarchitecture. These results show that effective and efficient block-atomic execution is possible when the compiler and microarchitecture are designed cooperatively.
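The tradeoff this abstract describes can be illustrated with a small, hypothetical sketch. Large blocks amortize per-block overhead, control flow limits how large they can grow, and a fixed block size leaves slots unused. The Python below is not the dissertation's algorithm. It greedily merges basic blocks along a single control-flow path under an assumed 128-instruction cap, and it starts a new block at a cold basic block as a crude stand-in for a frequency-aware formation policy. It then counts empty instruction slots in two cases: every block occupies the full cap, or blocks are allocated in assumed 32-instruction chunks. The names `form_blocks`, `wasted_slots`, `BasicBlock`, `MAX_BLOCK`, `CHUNK`, and `min_freq` are all illustrative.

```python
"""Hypothetical sketch: greedy atomic-block formation along one
control-flow path, and the slots a fixed block size wastes compared with
variable-size blocks. All parameters are illustrative assumptions."""
from dataclasses import dataclass
from typing import List, Tuple

MAX_BLOCK = 128  # assumed maximum instructions per atomic block
CHUNK = 32       # assumed allocation granularity for variable-size blocks


@dataclass
class BasicBlock:
    name: str
    size: int    # number of instructions
    freq: float  # probability that execution reaches this block


def form_blocks(path: List[BasicBlock], min_freq: float = 0.5) -> List[Tuple[List[str], int]]:
    """Merge consecutive basic blocks into atomic blocks.

    A new atomic block starts when the next basic block would push the
    current one past MAX_BLOCK, or when the next basic block is cold
    (freq < min_freq) and so not worth merging into the current block."""
    blocks: List[Tuple[List[str], int]] = []
    current: List[str] = []
    used = 0
    for bb in path:
        if bb.size > MAX_BLOCK:
            raise ValueError(f"{bb.name} exceeds MAX_BLOCK")
        if current and (used + bb.size > MAX_BLOCK or bb.freq < min_freq):
            blocks.append((current, used))
            current, used = [], 0
        current.append(bb.name)
        used += bb.size
    if current:
        blocks.append((current, used))
    return blocks


def wasted_slots(sizes: List[int], variable: bool) -> int:
    """Count instruction slots that are allocated but left empty."""
    waste = 0
    for s in sizes:
        # Fixed format: every block occupies MAX_BLOCK slots.
        # Variable format: round up to the next multiple of CHUNK.
        alloc = -(-s // CHUNK) * CHUNK if variable else MAX_BLOCK
        waste += alloc - s
    return waste


if __name__ == "__main__":
    path = [
        BasicBlock("A", 40, 1.0),
        BasicBlock("B", 50, 0.9),
        BasicBlock("C", 60, 0.8),
        BasicBlock("D", 20, 0.3),
        BasicBlock("E", 30, 0.9),
    ]
    blocks = form_blocks(path)
    sizes = [used for _, used in blocks]
    print(blocks)
    print("fixed:", wasted_slots(sizes, variable=False))
    print("variable:", wasted_slots(sizes, variable=True))
```

On this sample path the greedy pass yields three blocks of 90, 60, and 50 instructions. Fixed-size allocation leaves 184 slots empty, while 32-instruction chunks leave 24. A real block former would also weigh predication cost, tail duplication, and operand locality, none of which this sketch models.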

Bibliographic details

  • Author: Maher, Bertrand Allen
  • Affiliation: The University of Texas at Austin
  • Degree-granting institution: The University of Texas at Austin
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 185
  • Original format: PDF
  • Language: English
