42nd Annual International Symposium on Computer Architecture (ISCA 2015)

CAWA: Coordinated warp scheduling and Cache Prioritization for critical warp acceleration of GPGPU workloads



Abstract

The ubiquity of graphics processing unit (GPU) architectures has made them efficient alternatives to chip multiprocessors for parallel workloads. GPUs achieve superior performance by using massive multi-threading and fast context-switching to hide pipeline stalls and memory access latency. However, recent characterization results have shown that general purpose GPU (GPGPU) applications commonly encounter long stall latencies that cannot be easily hidden even with a large number of concurrent threads/warps. This creates an execution-time disparity between parallel warps that hurts overall GPU performance: the warp criticality problem. To tackle the warp criticality problem, we propose a coordinated solution, criticality-aware warp acceleration (CAWA), that efficiently manages compute and memory resources to accelerate critical warp execution. Specifically, we design (1) an instruction-based and stall-based criticality predictor to identify the critical warp in a thread-block, (2) a criticality-aware warp scheduler that preferentially allocates more time resources to the critical warp, and (3) a criticality-aware cache reuse predictor that assists critical warp acceleration by retaining latency-critical and useful cache blocks in the L1 data cache. CAWA aims to eliminate the significant execution-time disparity and thereby improve resource utilization for GPGPU workloads. Our evaluation results show that, under the proposed coordinated scheduler and cache prioritization management scheme, the performance of the GPGPU workloads can be improved by 23%, while the state-of-the-art GTO and 2-level schedulers improve performance by 16% and −2%, respectively.
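The coordination between components (1) and (2) can be illustrated with a minimal sketch. The code below is not the paper's implementation; it assumes a hypothetical linear criticality score that combines a warp's remaining-instruction estimate and its accumulated stall cycles (the weights `w_inst` and `w_stall` are placeholders), and a scheduler that greedily issues the highest-scoring ready warp, breaking ties toward the older warp.

```python
from dataclasses import dataclass

@dataclass
class Warp:
    wid: int                 # warp id within the thread-block
    insts_remaining: int     # estimated dynamic instructions left to execute
    stall_cycles: int = 0    # stall cycles accumulated so far

def criticality(w: Warp, w_inst: float = 1.0, w_stall: float = 1.0) -> float:
    # Hypothetical score: a warp is more critical if it has more work left
    # and has stalled longer. The weights are assumptions for illustration.
    return w_inst * w.insts_remaining + w_stall * w.stall_cycles

def pick_critical(warps: list[Warp]) -> Warp:
    # Criticality-aware greedy pick; ties go to the older (lower-id) warp.
    return max(warps, key=lambda w: (criticality(w), -w.wid))

# Example: warp 1 has as much work left as warp 0 but has stalled more,
# so it is predicted critical and scheduled first.
warps = [Warp(0, 10, 0), Warp(1, 10, 5), Warp(2, 8, 0)]
print(pick_critical(warps).wid)
```

In the paper's full design this prediction additionally steers the L1 data cache replacement policy, so that blocks the critical warp is likely to reuse are retained.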
