Speeding Up Stencil Computations with Kernel Convolution

Abstract

A technique to speed up stencil computation is introduced. Computation and data reuse schemes are developed for its application to 1- and 3-dimensional stencils. The approach traverses the data domain fewer times than a state-of-the-art, straightforward iterative stencil implementation would. Performance results are shown for a variety of platforms, exemplifying how it can be straightforwardly applied with existing techniques and frameworks. The technique, named Aggregate Stencil-Loop Iteration (ASLI), works by applying a stencil obtained by the original stencil operator convolved with itself one or more times. This more complex operator creates new opportunities for in-register data reuse and increases the FLOPs-to-load ratio. The total number of FLOPs decreases for 1D but increases for 2D and 3D star-shaped stencils. In both scenarios, speed-up relative to the state-of-the-art is achieved. ASLI is relatively easy to implement and works synergistically with existing methods to optimize stencil computations.
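The core idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the 3-point coefficients, the periodic boundary handling, the naive FLOP count (one multiply per coefficient plus the adds), and all function names below are assumptions chosen for illustration. The sketch convolves a 1D stencil with itself, checks that one sweep of the aggregate stencil matches two sweeps of the original, and compares per-point FLOP counts for the 1D case and for a 2D 5-point star.

```python
def convolve(a, b):
    """Full discrete convolution of two 1D coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def apply_stencil(u, w):
    """One sweep of a centered 1D stencil over u with periodic boundaries."""
    n, r = len(u), len(w) // 2
    return [sum(w[k] * u[(i + k - r) % n] for k in range(len(w)))
            for i in range(n)]


def flops_per_point(num_points):
    """Naive cost of one output point: one multiply per coefficient plus adds."""
    return 2 * num_points - 1


def self_convolve_offsets(offsets):
    """Offsets touched by a stencil convolved with itself (any dimension).

    Assumes no coefficients cancel, which holds for all-positive weights.
    """
    return {tuple(p + q for p, q in zip(a, b)) for a in offsets for b in offsets}


# Hypothetical 3-point stencil; the values are illustrative only.
w = [0.25, 0.5, 0.25]
w2 = convolve(w, w)  # 5-point aggregate stencil

u = [float((7 * i) % 11) for i in range(32)]
two_sweeps = apply_stencil(apply_stencil(u, w), w)
one_sweep = apply_stencil(u, w2)
max_diff = max(abs(a - b) for a, b in zip(two_sweeps, one_sweep))

# 1D: two sweeps of 3 points versus one sweep of 5 points.
flops_1d_original = 2 * flops_per_point(len(w))   # 10
flops_1d_aggregate = flops_per_point(len(w2))     # 9

# 2D 5-point star: the aggregate stencil touches 13 points.
star2d = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}
agg2d = self_convolve_offsets(star2d)
flops_2d_original = 2 * flops_per_point(len(star2d))  # 18
flops_2d_aggregate = flops_per_point(len(agg2d))      # 25

print(max_diff, flops_1d_original, flops_1d_aggregate,
      flops_2d_original, flops_2d_aggregate)
```

Under this naive count, the aggregate stencil needs fewer FLOPs per point in 1D (9 versus 10) and more in 2D (25 versus 18), which is consistent with the trend the abstract describes. In both cases the data domain is traversed once instead of twice.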

Bibliographic details

Source
IEEE International Symposium on Computer Architecture and High Performance Computing | 2016 | pp. 76-83 | 8 pages
Authors
Guilherme C. Januario; Bryan S. Rosenburg; Yoonho Park; Michael Perrone; Jose Moreira; Tereza C.M.B. Carvalho
Original format: PDF
Keywords
Three-dimensional displays; Kernel; Convolution; Mathematical model; Aggregates; Registers; Electronic mail;

Similar literature

1. Using GPU's to Accelerate Stencil-based Computation Kernels for the Development of Large Scale Scientific Applications on Heterogeneous Systems [J]. Jian Tao, Marek Blazewicz, Steven R. Brandt. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2012, No. 8
2. Computation of Mellin convolution integrals with a logarithmic kernel: application to the third Appell function [J]. Chelo Ferreira, J. L. López, Ester Pérez Sinusía. Integral transforms and special functions. 2014, No. 7a9
3. FPGA-based architecture for the real-time computation of 2-D convolution with large kernel size [J]. Javier Toledo-Moreo F., Javier Martínez-Alvarez J., Garrigós-Guerrero J. Journal of systems architecture. 2012, No. 8
4. Speeding Up Stencil Computations with Kernel Convolution [C]. Guilherme C. Januario, Bryan S. Rosenburg, Yoonho Park. International Symposium on Computer Architecture and High Performance Computing. 2016
5. New Frontiers in Polar Coding: Large Kernels, Convolutional Decoding, and Deletion Channels [D]. Chaghooshi, Arman Fazeli. 2018
6. Systematic computational exploration of the parameter space of the multi-compartment model of the lobster pyloric pacemaker kernel suggests that the kernel can achieve functional activity under various parameter configurations [O]. Tomasz G Smolinski, Cristina Soto-Treviño, Pascale Rabbah. 2007
7. Convolution Kernel for Fast CPU/GPU Computation of 2D/3D Isotropic Gradients on a Square/Cubic Lattice [O]. Sebastien Leclaire, Maud El-Hachem, Marcelo Reggio. 2012