
Compile-time and run-time optimizations for enhancing locality and parallelism on multi-core and many-core systems.



Abstract

Current trends in computer architecture reflect the emergence of multiple processor cores on a chip. Modern multiple-core computer architectures, which include general-purpose multi-core architectures (from Intel, AMD, IBM, and Sun) and specialized parallel architectures such as the Cell Broadband Engine and Graphics Processing Units (GPUs), offer very high computational power per chip. A significant challenge in these systems is the effective, load-balanced utilization of the processor cores. The memory subsystem has always been a performance bottleneck in computer systems, and it is even more so with the emergence of processor subsystems containing multiple on-chip cores. Effectively managing the on-chip and off-chip memories and enhancing data reuse to maximize memory performance is another significant challenge in modern multiple-core architectures.

Our work addresses these challenges in multi-core and many-core systems through various compile-time and run-time optimization techniques. We provide effective automatic compiler support for managing on-chip and off-chip memory accesses, with the compiler making effective decisions on which elements to move into and out of on-chip memory, when and how to move them, and how to efficiently access the elements brought into on-chip memory. We develop an effective tiling approach for mapping computation in regular programs onto many-core systems such as GPUs. We also develop an automatic approach for compiler-assisted dynamic scheduling of computation to enhance load balancing for parallel tiled execution on multi-core systems.

There are various issues specific to the target architecture that need attention in order to maximize application performance on that architecture. First, the levels of parallelism available and the appropriate granularity of parallelism needed for the target architecture have to be considered while mapping the computation. Second, the memory access model may be inherent to the architecture, and optimizations have to be developed for that specific memory access model. We develop compile-time transformation approaches to address performance factors related to parallelism and data locality that are specific to the GPU architecture, and we build an end-to-end compiler framework for GPUs.
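The compiler-managed data movement and tiling described above can be made concrete with a small illustration. The C sketch below is not taken from the dissertation; the matrix-multiplication kernel, the tile size T, and the buffer names are assumptions chosen for illustration. It shows a tiled loop nest that stages each operand tile into small local buffers standing in for on-chip memory, i.e. the copy-in / compute / copy-out structure that such compiler support would generate automatically.

```c
/* Hypothetical sketch: tiled matrix multiplication with explicit staging of
 * each tile into small local buffers that stand in for an on-chip scratchpad.
 * The tile size T and the copy-in/compute structure illustrate the kinds of
 * decisions a compiler makes: which elements to bring on chip, when to move
 * them, and how they are accessed once resident. */
#include <stdio.h>

#define N 64          /* problem size (assumed) */
#define T 16          /* tile size, chosen to fit the on-chip buffer (assumed) */

static double A[N][N], B[N][N], C[N][N];

void tiled_matmul(void)
{
    static double a_buf[T][T], b_buf[T][T];   /* "on-chip" staging buffers */

    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T) {
                /* copy-in: move the A and B tiles needed next into the buffers */
                for (int i = 0; i < T; i++)
                    for (int k = 0; k < T; k++)
                        a_buf[i][k] = A[ii + i][kk + k];
                for (int k = 0; k < T; k++)
                    for (int j = 0; j < T; j++)
                        b_buf[k][j] = B[kk + k][jj + j];

                /* compute on the staged tiles; all inner accesses hit the buffers */
                for (int i = 0; i < T; i++)
                    for (int j = 0; j < T; j++) {
                        double acc = C[ii + i][jj + j];
                        for (int k = 0; k < T; k++)
                            acc += a_buf[i][k] * b_buf[k][j];
                        C[ii + i][jj + j] = acc;
                    }
            }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0; B[i][j] = 1.0; C[i][j] = 0.0;
        }
    tiled_matmul();
    printf("C[0][0] = %f\n", C[0][0]);   /* expect 64.0 for all-ones inputs */
    return 0;
}
```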
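The compiler-assisted dynamic scheduling of tiled computation can likewise be sketched minimally. In the sketch below, OpenMP's dynamic schedule serves only as a stand-in for the runtime scheduler the abstract describes, and the tile count and per-tile cost model are assumptions: tiles with uneven cost are handed to idle cores at run time rather than partitioned statically, which is the load-balancing effect being targeted.

```c
/* Hypothetical sketch of dynamic tile scheduling on a multi-core CPU.
 * OpenMP's runtime work queue stands in for a compiler-assisted dynamic
 * scheduler: tiles with uneven cost are dispatched to idle cores at run
 * time instead of being assigned to cores statically. Compile with an
 * OpenMP-enabled compiler (e.g. -fopenmp). */
#include <stdio.h>
#include <omp.h>

#define NTILES 256

/* Per-tile work with deliberately uneven cost (assumed workload shape). */
static double process_tile(int t)
{
    double acc = 0.0;
    long iters = 1000L * (t % 17 + 1);   /* cost varies by tile */
    for (long i = 0; i < iters; i++)
        acc += (double)((t + i) % 7);
    return acc;
}

int main(void)
{
    double total = 0.0;

    /* Each idle thread grabs the next ready tile from a shared queue. */
    #pragma omp parallel for schedule(dynamic, 1) reduction(+ : total)
    for (int t = 0; t < NTILES; t++)
        total += process_tile(t);

    printf("checksum = %f, threads = %d\n", total, omp_get_max_threads());
    return 0;
}
```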

Bibliographic details

  • Author: Baskaran, Muthu Manikandan
  • Author affiliation: The Ohio State University
  • Degree-granting institution: The Ohio State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pagination: 145 p.
  • Total pages: 145
  • Format: PDF
  • Language: English (eng)
  • Date added: 2022-08-17 11:38:24

