Journal: Parallel Processing Letters

HIGH PRECISION INTEGER MULTIPLICATION WITH A GPU USING STRASSEN'S ALGORITHM WITH MULTIPLE FFT SIZES


Abstract

We have improved our prior implementation of Strassen's algorithm for high-performance multiplication of very large integers on a general-purpose graphics processor (GPU). A combination of algorithmic and implementation optimizations results in a speedup of up to a factor of 13.9 over our previous work, running on an NVIDIA 295. We have also reoptimized the implementation for an NVIDIA 480, on which we obtain a speedup of up to a factor of 19 compared with a Core i7 processor core of the same technology generation. To provide a fairer chip-to-chip comparison, we also measured total GPU throughput on a set of multiplications relative to all of the cores on a multicore chip running in parallel. We find that the GTX 480 provides a factor of six higher throughput than all four cores/eight threads of the Core i7. This paper discusses how we adapted the algorithm to operate within the limitations of the GPU and how we dealt with other issues encountered during implementation, including details of the memory layout of our FFTs. Our earlier work applied Strassen's algorithm only to fixed-size segments of the operands and used Karatsuba's algorithm on top of it to handle different operand sizes. We are now able to apply Strassen's algorithm directly to operands ranging in size from 255K bits to 16,320K bits.
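
The abstract gives no implementation detail, so the following is only a minimal CPU sketch of the general idea behind FFT-based integer multiplication: split each operand into small digits, transform both digit sequences, multiply pointwise, transform back, and propagate carries. It is not the authors' GPU code. The modulus `998244353`, the 8-bit digit width, and the names `ntt` and `fft_multiply` are assumptions chosen for illustration, and a number-theoretic transform is used here in place of whatever transform the paper actually employs.

```python
# Illustrative sketch only: exact FFT-style multiplication of non-negative
# integers using a number-theoretic transform (NTT). All constants and names
# are assumptions for this example, not taken from the paper.

MOD = 998244353   # NTT-friendly prime: 119 * 2**23 + 1
ROOT = 3          # a primitive root modulo MOD


def ntt(a, invert=False):
    """In-place iterative radix-2 NTT; len(a) must be a power of two."""
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Principal root of unity of order `length` (inverted for the inverse).
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def fft_multiply(x, y):
    """Multiply two non-negative ints via NTT-based convolution of 8-bit digits.

    With 8-bit digits every convolution coefficient must stay below MOD,
    which limits the shorter operand to roughly 15,000 bytes in this sketch.
    """
    a = list(x.to_bytes((x.bit_length() + 7) // 8 or 1, "little"))
    b = list(y.to_bytes((y.bit_length() + 7) // 8 or 1, "little"))
    size = 1
    while size < len(a) + len(b):
        size <<= 1
    a += [0] * (size - len(a))
    b += [0] * (size - len(b))
    ntt(a)
    ntt(b)
    c = [p * q % MOD for p, q in zip(a, b)]   # pointwise product
    ntt(c, invert=True)                       # back to digit convolution
    # Carry propagation turns convolution coefficients into base-256 digits.
    out = bytearray()
    carry = 0
    for coeff in c:
        total = coeff + carry
        out.append(total & 0xFF)
        carry = total >> 8
    while carry:
        out.append(carry & 0xFF)
        carry >>= 8
    return int.from_bytes(out, "little")
```

Calling `fft_multiply(x, y)` should agree with Python's built-in `x * y` for operands within the stated size limit. A production implementation would likely use wider digits, a larger modulus or several moduli, and a layout tuned to the target hardware.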
