Journal: IEEE Transactions on Parallel and Distributed Systems

Autotuning GEMM Kernels for the Fermi GPU


Abstract

In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the crucial component of numerical software packages, such as LAPACK and ScaLAPACK, the general dense matrix multiplication routine is one of the more important workloads to be implemented on these devices. This paper presents a methodology for producing matrix multiplication kernels tuned for a specific architecture, through a canonical process of heuristic autotuning, based on generation of multiple code variants and selecting the fastest ones through benchmarking. The key contribution of this work is in the method for generating the search space; specifically, pruning it to a manageable size. Performance numbers match or exceed other available implementations.
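
The generate, prune, and benchmark loop described in the abstract can be illustrated with a small sketch. This is not the paper's generator or its pruning rules. The tile-size parameters, the 4x4 per-thread sub-tile, and the helper names (`generate_candidates`, `tiled_matmul`, `autotune`) are assumptions made for illustration, and a pure-Python blocked multiply stands in for a real GPU kernel so that the example runs anywhere.

```python
import itertools
import time

# Illustrative per-block limits (assumed here; a real tuner would query the device).
MAX_THREADS_PER_BLOCK = 1024
MAX_SHARED_BYTES = 48 * 1024
WARP_SIZE = 32


def generate_candidates():
    """Enumerate (tile_m, tile_n, tile_k) and prune variants that break the limits."""
    sizes = [8, 16, 32, 64, 128]
    kept = []
    for tm, tn, tk in itertools.product(sizes, sizes, [4, 8, 16]):
        threads = (tm // 4) * (tn // 4)   # assumption: each thread owns a 4x4 sub-tile
        shared = (tm * tk + tk * tn) * 8  # double-precision tiles of A and B
        if threads > MAX_THREADS_PER_BLOCK or shared > MAX_SHARED_BYTES:
            continue                      # would not fit on the device
        if threads % WARP_SIZE != 0:
            continue                      # would leave partially filled warps
        kept.append((tm, tn, tk))
    return kept


def tiled_matmul(a, b, n, tile):
    """Blocked n x n matrix multiply; a CPU stand-in for one generated kernel variant."""
    tm, tn, tk = tile
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tm):
        for j0 in range(0, n, tn):
            for k0 in range(0, n, tk):
                for i in range(i0, min(i0 + tm, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + tk, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + tn, n)):
                            row_c[j] += aik * row_b[j]
    return c


def autotune(n=16, repeats=2):
    """Benchmark every surviving candidate and return (fastest tile, its best time)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best_tile, best_time = None, float("inf")
    for tile in generate_candidates():
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_matmul(a, b, n, tile)
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best_tile, best_time = tile, elapsed
    return best_tile, best_time
```

Calling `autotune()` returns the fastest tile shape found on the current machine together with its best measured time. The result varies from run to run, which is why each candidate is timed more than once and only the minimum is kept. In a real GPU setting, the pruning step would encode the device's actual resource limits, and each surviving candidate would be a compiled kernel variant rather than a Python function.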
