42nd Annual International Symposium on Computer Architecture (ISCA 2015)

CAWA: Coordinated warp scheduling and Cache Prioritization for critical warp acceleration of GPGPU workloads



Abstract

The ubiquity of graphics processing unit (GPU) architectures has made them efficient alternatives to chip multiprocessors for parallel workloads. GPUs achieve superior performance by using massive multi-threading and fast context-switching to hide pipeline stalls and memory access latency. However, recent characterization results have shown that general purpose GPU (GPGPU) applications commonly encounter long stall latencies that cannot be easily hidden even with a large number of concurrent threads/warps. This creates an execution-time disparity between parallel warps that hurts overall GPU performance: the warp criticality problem. To tackle the warp criticality problem, we propose a coordinated solution, criticality-aware warp acceleration (CAWA), that efficiently manages compute and memory resources to accelerate critical warp execution. Specifically, we design (1) an instruction-based and stall-based criticality predictor to identify the critical warp in a thread-block, (2) a criticality-aware warp scheduler that preferentially allocates more time resources to the critical warp, and (3) a criticality-aware cache reuse predictor that assists critical warp acceleration by retaining latency-critical and useful cache blocks in the L1 data cache. CAWA aims to eliminate the significant execution-time disparity and thereby improve resource utilization for GPGPU workloads. Our evaluation results show that, under the proposed coordinated scheduler and cache prioritization management scheme, the performance of the GPGPU workloads can be improved by 23%, while the state-of-the-art GTO and 2-level schedulers improve performance by 16% and −2%, respectively.
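The coordination between components (1) and (2) can be illustrated with a minimal sketch. The code below is not the paper's implementation; it assumes a hypothetical linear criticality score that combines a warp's remaining-instruction estimate and its accumulated stall cycles (the weights `w_inst` and `w_stall` are placeholders), and a scheduler that greedily issues the highest-scoring ready warp, breaking ties toward the older warp.

```python
from dataclasses import dataclass

@dataclass
class Warp:
    wid: int                 # warp id within the thread-block
    insts_remaining: int     # estimated dynamic instructions left to execute
    stall_cycles: int = 0    # stall cycles accumulated so far

def criticality(w: Warp, w_inst: float = 1.0, w_stall: float = 1.0) -> float:
    # Hypothetical score: a warp is more critical if it has more work left
    # and has stalled longer. The weights are assumptions for illustration.
    return w_inst * w.insts_remaining + w_stall * w.stall_cycles

def pick_critical(warps: list[Warp]) -> Warp:
    # Criticality-aware greedy pick; ties go to the older (lower-id) warp.
    return max(warps, key=lambda w: (criticality(w), -w.wid))

# Example: warp 1 has as much work left as warp 0 but has stalled more,
# so it is predicted critical and scheduled first.
warps = [Warp(0, 10, 0), Warp(1, 10, 5), Warp(2, 8, 0)]
print(pick_critical(warps).wid)
```

In the paper's full design this prediction additionally steers the L1 data cache replacement policy, so that blocks the critical warp is likely to reuse are retained.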
