Efficient parallel formulations of hierarchical methods and their applications.

Abstract

Hierarchical methods such as the Fast Multipole Method (FMM) and Barnes-Hut (BH) are used for rapid evaluation of potential (gravitational, electrostatic) fields in particle systems. They are also used for solving integral equations using boundary element methods. The linear systems arising from these methods are dense and are solved iteratively. Hierarchical methods reduce the complexity of the core matrix-vector product.

We have developed highly scalable parallel formulations of a hybrid FMM/BH method that are capable of handling arbitrarily irregular distributions. We apply these formulations to astrophysical simulations of Plummer and Gaussian galaxies. We have also used our parallel formulations to solve the integral form of the Laplace equation. We show that our parallel hierarchical mat-vecs yield high efficiency and overall performance even on relatively small problems. A problem containing approximately 200K nodes takes under a second to compute on 256 processors and yet yields over 85% efficiency. The efficiency and raw performance are expected to increase for bigger problems. For the 200K-node problem, our code delivers about 5 GFLOPS on a 256-processor T3D. This is impressive considering that the problem has floating-point divides and roots, and very little locality, which results in poor cache performance. A dense matrix-vector product of the same dimensions would require about 0.5 terabytes of memory and about 770 TeraFLOPS of computing speed. Clearly, if the loss in accuracy resulting from the use of hierarchical methods is acceptable, our code yields significant savings in time and memory.

We also study the convergence of a GMRES solver built around this mat-vec. We accelerate the convergence of the solver using three preconditioning techniques: diagonal scaling, block-diagonal preconditioning, and inner-outer preconditioning. We study the performance and parallel efficiency of these preconditioned solvers. Using this solver, we solve dense linear systems with hundreds of thousands of unknowns. Solving a 105K-unknown problem takes about 10 minutes on a 64-processor T3D. Until very recently, boundary element problems of this magnitude could not even be generated, let alone solved.
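
To make the idea of a hierarchical potential evaluation concrete, the following is a minimal serial sketch of a plain Barnes-Hut scheme in two dimensions. It is not the hybrid FMM/BH parallel formulation described in the abstract. The names `Node`, `build_tree`, `potential`, `direct_potential`, and the opening parameter `theta` are illustrative assumptions, as are the unit-square domain, the 1/r kernel with unit constants, the monopole-only (centre of mass) approximation, and the requirement that particles have distinct positions and positive mass.

```python
import math
import random


class Node:
    """Square quadtree cell that accumulates total mass and centre of mass."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # cell centre and half-width
        self.count = 0        # number of particles inside this cell
        self.mass = 0.0       # total mass inside this cell
        self.mx = 0.0         # mass-weighted sum of x coordinates
        self.my = 0.0         # mass-weighted sum of y coordinates
        self.body = None      # (x, y, m) when the cell is a single-particle leaf
        self.children = None  # four sub-cells once the cell is subdivided

    def _subdivide(self):
        h = self.half / 2
        self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]

    def _child_for(self, x, y):
        # Index matches the construction order used in _subdivide.
        return self.children[(1 if x >= self.cx else 0) + (2 if y >= self.cy else 0)]

    def insert(self, x, y, m):
        # Assumes distinct particle positions; coincident points would recurse forever.
        if self.count == 0:
            self.body = (x, y, m)
        else:
            if self.children is None:
                self._subdivide()
                old, self.body = self.body, None
                self._child_for(old[0], old[1]).insert(*old)
            self._child_for(x, y).insert(x, y, m)
        self.count += 1
        self.mass += m
        self.mx += m * x
        self.my += m * y


def build_tree(bodies):
    """Build a quadtree over the unit square from (x, y, mass) tuples."""
    root = Node(0.5, 0.5, 0.5)
    for b in bodies:
        root.insert(*b)
    return root


def potential(node, x, y, theta):
    """Approximate potential at (x, y); theta = 0 reproduces the direct sum."""
    if node.count == 0:
        return 0.0
    if node.children is None:
        bx, by, bm = node.body
        d = math.hypot(x - bx, y - by)
        return 0.0 if d == 0.0 else -bm / d  # skip self-interaction
    d = math.hypot(x - node.mx / node.mass, y - node.my / node.mass)
    if 2 * node.half < theta * d:
        # Cell is far enough away: replace its particles by their centre of mass.
        return -node.mass / d
    return sum(potential(c, x, y, theta) for c in node.children)


def direct_potential(bodies, x, y):
    """Reference O(n) sum per evaluation point, O(n^2) over all points."""
    total = 0.0
    for bx, by, bm in bodies:
        d = math.hypot(x - bx, y - by)
        if d > 0.0:
            total -= bm / d
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    bodies = [(rng.random(), rng.random(), 1.0) for _ in range(1000)]
    tree = build_tree(bodies)
    x, y, _ = bodies[0]
    print("direct     :", direct_potential(bodies, x, y))
    print("Barnes-Hut :", potential(tree, x, y, 0.5))
```

Larger values of `theta` accept more distant-cell approximations, which trades accuracy for speed; this is the loss in accuracy the abstract refers to when comparing against a dense matrix-vector product.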

Bibliographic record

  • Author: Grama, Ananth Y.
  • Author affiliation: University of Minnesota
  • Degree-granting institution: University of Minnesota
  • Subjects: Physics, Astronomy and Astrophysics; Computer Science; Engineering, Materials Science
  • Degree: Ph.D.
  • Year: 1996
  • Pages: 76 p.
  • Total pages: 76
  • Original format: PDF
  • Language of text: English
