Fast multipole methods on graphics processors

Gumerov NA; Duraiswami R


Abstract

The fast multipole method (FMM) allows the rapid approximate evaluation of sums of radial basis functions. For a specified accuracy, epsilon, the method scales as O(N) in both time and memory, compared to the direct method with complexity O(N^2), which allows the solution of larger problems with given resources. Graphics processing units (GPUs) are increasingly viewed as data-parallel compute coprocessors that can provide significant computational performance at low price. We describe acceleration of the FMM using the data-parallel GPU architecture. The FMM has a complex hierarchical (adaptive) structure, which is not easily implemented on data-parallel processors. We describe strategies for parallelization of all components of the FMM, develop a model to explain the performance of the algorithm on the GPU architecture, and determine optimal settings for the FMM on the GPU. These optimal settings differ from those on conventional CPUs. Some innovations in the FMM algorithm, including the use of modified stencils, real polynomial basis functions for the Laplace kernel, and decompositions of the translation operators, are also described. We obtained accelerations of the Laplace kernel FMM on a single NVIDIA GeForce 8800 GTX GPU by factors in the range of 30-60 compared to a serial CPU FMM implementation. For a problem with a million sources, the summations involved are performed in approximately one second. This performance is equivalent to solving the same problem at a rate of 43 teraflops using straightforward summation. (c) 2008 Elsevier Inc. All rights reserved.
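For orientation, the O(N^2) direct method that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not code from the paper: it assumes the 3D Laplace kernel 1/r, and the function names `direct_laplace_sum` and `pairwise_evaluations` are hypothetical.

```python
import math

def direct_laplace_sum(sources, charges, targets):
    """Evaluate phi(y) = sum_i q_i / |y - x_i| at every target by brute force.

    Assumed kernel: 3D Laplace, 1/r. Cost is len(sources) * len(targets)
    kernel evaluations, which is the O(N^2) baseline the FMM avoids.
    """
    potentials = []
    for y in targets:
        total = 0.0
        for x, q in zip(sources, charges):
            r = math.dist(x, y)
            if r > 0.0:  # skip the singular self-interaction
                total += q / r
        potentials.append(total)
    return potentials

def pairwise_evaluations(n_sources, n_targets):
    """Number of kernel evaluations the direct method performs."""
    return n_sources * n_targets

if __name__ == "__main__":
    sources = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    charges = [1.0, 2.0]
    print(direct_laplace_sum(sources, charges, [(0.0, 0.0, 1.0)]))
    # A million sources evaluated at a million targets:
    print(pairwise_evaluations(10**6, 10**6))
```

With a million sources and a million targets the direct method needs 10^12 kernel evaluations, which is why the abstract expresses its one-second FMM result as an equivalent flop rate for straightforward summation.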


Bibliographic record

Source
Journal of Computational Physics | 2008, Issue 18 | 24 pages
Authors
Gumerov NA; Duraiswami R
Original format: PDF
Language: English
Chinese Library Classification: Applied physics
Keywords
N-BODY METHODS; PARTICLE SIMULATIONS; 3 DIMENSIONS; ALGORITHMS; GRAPE-6;


Similar literature

1. Fast multipole methods on graphics processors [J]. Gumerov NA, Duraiswami R. Journal of Computational Physics. 2008, Issue 18
2. Graphics processing unit (GPU) accelerated fast multipole BEM with level-skip M2L for 3D elasticity problems [J]. Yingjun Wang, Qifu Wang, Xiaowei Deng, Advances in Engineering Software. 2015, Issue apra
3. Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units [J]. Takahashi T., Cecka C., Fong W., International Journal for Numerical Methods in Engineering. 2012, Issue 1
4. Graphics processing unit accelerated Fast Multipole Method - Fast Fourier Transform [C]. Q. Nguyen, V. Dang, O. Kilic. IEEE International Symposium on Antennas and Propagation. 2013
5. Fast transforms based on structured matrices with applications to the fast multipole method [D]. Tang, Zhihui. 2004
6. Fast inverse scattering solutions using the distorted Born iterative method and the multilevel fast multipole algorithm [O]. Andrew J. Hesford, Weng C. Chew -1
7. Fast Multipole Methods on Graphics Processors [O]. Gumerov, Nail A., Duraiswami, Ramani. 2007
8. Analysis and Implementation of Particle-to-Particle (P2P) Graphics Processor Unit (GPU) Kernel for Black-Box Adaptive Fast Multipole Method [R]. Haney, R. H., Darve, E., Ansari, M. P., 2015
