
Compile-time and run-time optimizations for enhancing locality and parallelism on multi-core and many-core systems.



Abstract

Current trends in computer architecture reflect the emergence of multiple processor cores on a chip. Modern multiple-core computer architectures, which include general-purpose multi-core architectures (from Intel, AMD, IBM, and Sun) and specialized parallel architectures such as the Cell Broadband Engine and Graphics Processing Units (GPUs), offer very high computational power per chip. A significant challenge in these systems is the effective, load-balanced utilization of the processor cores. The memory subsystem has always been a performance bottleneck in computer systems, and it is even more so with the emergence of processor subsystems containing multiple on-chip cores. Effectively managing the on-chip and off-chip memories and enhancing data reuse to maximize memory performance is another significant challenge in modern multiple-core architectures.

Our work addresses these challenges in multi-core and many-core systems through various compile-time and run-time optimization techniques. We provide effective automatic compiler support for managing on-chip and off-chip memory accesses, with the compiler making effective decisions on which elements to move into and out of on-chip memory, when and how to move them, and how to efficiently access the elements brought into on-chip memory. We develop an effective tiling approach for mapping computation in regular programs onto many-core systems such as GPUs. We also develop an automatic approach for compiler-assisted dynamic scheduling of computation to enhance load balancing for parallel tiled execution on multi-core systems.

There are various issues specific to the target architecture that need attention in order to maximize application performance on that architecture. First, the levels of parallelism available and the appropriate granularity of parallelism needed for the target architecture have to be considered while mapping the computation. Second, the memory access model may be inherent to the architecture, and optimizations have to be developed for that specific memory access model. We develop compile-time transformation approaches to address performance factors related to parallelism and data locality that are specific to the GPU architecture, and we build an end-to-end compiler framework for GPUs.
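The compiler-managed data movement and tiling described above can be made concrete with a small illustration. The C sketch below is not taken from the dissertation; the matrix-multiplication kernel, the tile size T, and the buffer names are assumptions chosen for illustration. It shows a tiled loop nest that stages each operand tile into small local buffers standing in for on-chip memory, i.e. the copy-in / compute / copy-out structure that such compiler support would generate automatically.

```c
/* Hypothetical sketch: tiled matrix multiplication with explicit staging of
 * each tile into small local buffers that stand in for an on-chip scratchpad.
 * The tile size T and the copy-in/compute structure illustrate the kinds of
 * decisions a compiler makes: which elements to bring on chip, when to move
 * them, and how they are accessed once resident. */
#include <stdio.h>

#define N 64          /* problem size (assumed) */
#define T 16          /* tile size, chosen to fit the on-chip buffer (assumed) */

static double A[N][N], B[N][N], C[N][N];

void tiled_matmul(void)
{
    static double a_buf[T][T], b_buf[T][T];   /* "on-chip" staging buffers */

    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T) {
                /* copy-in: move the A and B tiles needed next into the buffers */
                for (int i = 0; i < T; i++)
                    for (int k = 0; k < T; k++)
                        a_buf[i][k] = A[ii + i][kk + k];
                for (int k = 0; k < T; k++)
                    for (int j = 0; j < T; j++)
                        b_buf[k][j] = B[kk + k][jj + j];

                /* compute on the staged tiles; all inner accesses hit the buffers */
                for (int i = 0; i < T; i++)
                    for (int j = 0; j < T; j++) {
                        double acc = C[ii + i][jj + j];
                        for (int k = 0; k < T; k++)
                            acc += a_buf[i][k] * b_buf[k][j];
                        C[ii + i][jj + j] = acc;
                    }
            }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0; B[i][j] = 1.0; C[i][j] = 0.0;
        }
    tiled_matmul();
    printf("C[0][0] = %f\n", C[0][0]);   /* expect 64.0 for all-ones inputs */
    return 0;
}
```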
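The compiler-assisted dynamic scheduling of tiled computation can likewise be sketched minimally. In the sketch below, OpenMP's dynamic schedule serves only as a stand-in for the runtime scheduler the abstract describes, and the tile count and per-tile cost model are assumptions: tiles with uneven cost are handed to idle cores at run time rather than partitioned statically, which is the load-balancing effect being targeted.

```c
/* Hypothetical sketch of dynamic tile scheduling on a multi-core CPU.
 * OpenMP's runtime work queue stands in for a compiler-assisted dynamic
 * scheduler: tiles with uneven cost are dispatched to idle cores at run
 * time instead of being assigned to cores statically. Compile with an
 * OpenMP-enabled compiler (e.g. -fopenmp). */
#include <stdio.h>
#include <omp.h>

#define NTILES 256

/* Per-tile work with deliberately uneven cost (assumed workload shape). */
static double process_tile(int t)
{
    double acc = 0.0;
    long iters = 1000L * (t % 17 + 1);   /* cost varies by tile */
    for (long i = 0; i < iters; i++)
        acc += (double)((t + i) % 7);
    return acc;
}

int main(void)
{
    double total = 0.0;

    /* Each idle thread grabs the next ready tile from a shared queue. */
    #pragma omp parallel for schedule(dynamic, 1) reduction(+ : total)
    for (int t = 0; t < NTILES; t++)
        total += process_tile(t);

    printf("checksum = %f, threads = %d\n", total, omp_get_max_threads());
    return 0;
}
```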

Bibliographic details

  • Author: Baskaran, Muthu Manikandan
  • Author affiliation: The Ohio State University
  • Degree-granting institution: The Ohio State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pagination: 145 p.
  • Total pages: 145
  • Format: PDF
  • Language: English (eng)
  • Date added: 2022-08-17 11:38:24

