首页> 外文学位 >Matrix factorization using a block-recursive structure and block-recursive algorithms.
【24h】

Matrix factorization using a block-recursive structure and block-recursive algorithms.

机译:使用块递归结构和块递归算法进行矩阵分解。

获取原文
获取原文并翻译 | 示例

摘要

The divide-and-conquer paradigm yields algorithms that parallelize easily, a very important consideration in high-performance computing. However, high-performance computing also relies on local reuse of data in a memory hierarchy of registers, caches, main memory, and swapping disks.; I have worked with a sequential representation of quadtree matrices that is a divide-and-conquer data structure. This representation uses an indexing scheme, Morton ordering, that automatically blocks the elements of the matrix and promotes memory locality in recursive algorithms over the quadtree. This work focuses on using this representation for two important matrix algorithms, matrix multiplication and QR factorization.; Techniques for generating independent and local code were discovered while implementing a recursive matrix-matrix multiplication over the sequential quadtree matrix. For good memory locality the basic multiplication routine was written as two dual, mutually recursive functions with the recursive calls in each version precisely ordered to reuse data already in cache.; To avoid unnecessary work, minimal decorations annotate each submatrix of the matrix; these decorations are used to ignore zero blocks and to ellide the zero tests on blocks known to be dense. Finally, a strategy was developed for dispatching parallel processes in the multiplication algorithm.; These lessons were then used to develop an efficient QR factorization algorithm for quadtree matrices. It also uses recursive functions that localize data in blocks and that balance parallel processing.; The sequential quadtree matrix and multiplication functions are implemented in C because of its optimizing compilers. Since C optimizers favor iterative code and are deficient for recursive code, some base optimizations were done by hand. These hand optimizations could and should be done by some future optimizing compiler.; Results are mostly favorable; while the quadtree-matrix algorithms do not always perform at the same level as decades-old routines (i.e., BLAS and LAPACK), which are fine-tuned for traditional storage, the quadtree algorithms are competitive and even expose some flaws in these old routines.
机译:分而治之范式产生易于并行化的算法,这是高性能计算中非常重要的考虑因素。但是,高性能计算还依赖于寄存器,高速缓存,主内存和交换磁盘的内存层次结构中数据的本地重用。我处理了四叉树矩阵的顺序表示形式,它是一种分而治之的数据结构。此表示使用索引方案Morton排序,该方案会自动阻塞矩阵的元素并在四叉树上的递归算法中提升内存局部性。这项工作的重点是将这种表示形式用于两个重要的矩阵算法,即矩阵乘法和 QR 因式分解。在顺序四叉树矩阵上实现递归矩阵-矩阵乘法时,发现了用于生成独立代码和本地代码的技术。为了获得良好的内存局部性,基本的乘法例程被编写为两个双重的,相互递归的函数,每个版本中的递归调用都被精确地排序为重用缓存中已有的数据。为避免不必要的工作,最少的装饰会注释矩阵的每个子矩阵。这些修饰用于忽略零块,并消除对已知密集块的零测试。最后,开发了一种在乘法算法中调度并行进程的策略。然后,这些课程被用于为四叉树矩阵开发有效的 QR 分解算法。它还使用递归函数,这些递归函数将数据按块进行本地化并平衡并行处理。顺序四叉树矩阵和乘法函数由于其优化的编译器而在C中实现。由于C优化器偏爱迭代代码并且缺乏递归代码,因此一些基本的优化是手工完成的。这些手动优化可以并且应该由将来的一些优化编译器来完成。结果大部分是有利的;尽管四叉树矩阵算法的性能并不总是与几十年前的例程(即BLAS和LAPACK)的性能相同,后者已针对传统存储进行了微调,但四叉树算法具有竞争力,甚至暴露了这些旧例程中的某些缺陷。 。

著录项

  • 作者

    Frens, Jeremy David.;

  • 作者单位

    Indiana University.;

  • 授予单位 Indiana University.;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2002
  • 页码 151 p.
  • 总页数 151
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 自动化技术、计算机技术;
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号