首页> 外文期刊>IEEE Transactions on Parallel and Distributed Systems >Synthesizing efficient out-of-core programs for block recursive algorithms using block-cyclic data distributions
【24h】

Synthesizing efficient out-of-core programs for block recursive algorithms using block-cyclic data distributions

机译:使用块循环数据分布为块递归算法合成有效的核心程序

获取原文
获取原文并翻译 | 示例
           

摘要

In this paper, we present a framework for synthesizing I/O efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation which is based on tensor products and other matrix operations. The programs are optimized for the striped Vitter and Shriver's two-level memory model in which data can be distributed using various cyclic(B) distributions in contrast to the normally used physical track distribution cyclic(B/sub d/), where B/sub d/ is the physical disk block size. We first introduce tensor bases to capture the semantics of block-cyclic data distributions of out-of-core data and also data access patterns to out-of-core data. We then present program generation techniques for tensor products and matrix transposition. We accurately represent the number of parallel I/O operations required for the synthesized programs for tensor products and matrix transposition as a function of tensor bases and data distributions. We introduce an algorithm to determine the data distribution which optimizes the performance of the synthesized programs. Further, we formalize the procedure of synthesizing efficient out-of-core programs for tensor product formulas with various block-cyclic distributions as a dynamic programming problem. We demonstrate the effectiveness of our approach through several examples. We show that the choice of an appropriate data distribution can reduce the number of passes to access out-of-core data by as large as eight times for a tensor product and the dynamic programming approach can largely reduce the number of passes to access out-of-core data for the overall tensor product formulas.
机译:在本文中,我们提出了一个用于为块递归算法(例如快速傅里叶变换(FFT)和块矩阵转置算法)合成I / O高效核外程序的框架。我们的框架使用基于张量积和其他矩阵运算的代数表示。该程序针对条带化Vitter和Shriver的两级存储模型进行了优化,在该模型中,可以使用各种循环(B)分布来分配数据,而通常使用的物理磁道分布循环(B / sub d /) d /是物理磁盘块大小。我们首先引入张量基数,以捕获核心外数据的块循环数据分布的语义以及核心外数据的数据访问模式。然后,我们介绍张量积和矩阵转置的程序生成技术。我们准确地表示张量积和矩阵转置的合成程序所需的并行I / O操作数,这些函数是张量基数和数据分布的函数。我们介绍一种确定数据分布的算法,该算法可以优化综合程序的性能。此外,我们将合成具有各种块循环分布的张量积公式的有效核外程序的程序形式化为动态规划问题。我们通过几个例子证明了我们方法的有效性。我们表明,选择适当的数据分布可以将张量积的访问次数减少多达八倍,而动态编程方法可以在很大程度上减少访问外部数据的次数。总体张量积公式的核数据。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号