
FPGA Implementation of QR Decomposition Algorithms Using High-Level Synthesis on Zynq SoC



Abstract

Matrix decomposition and computation are an important part of many signal processing, image processing, and communication systems. A better solution in terms of power, performance, and area can improve the performance of the whole system. Designing and testing a new idea is a major challenge under time constraints, so an improved implementation flow based on High-Level Synthesis (HLS) is discussed. This flow is used to implement QR decomposition algorithms. Three QR factorization techniques are discussed: Gram-Schmidt, Givens Rotation, and Householder Transformation. These algorithms are compared in terms of area, performance, and precision.

Each algorithm is implemented in two variants that differ in data type: a 32-bit floating-point implementation and a 16-bit fixed-point implementation. Results are presented for different designs using optimization techniques such as loop unrolling and pipelining. A scalable architecture is implemented for all the algorithms, which are compared for a 10 x 10 matrix. Results for a scaled-up 100 x 100 matrix architecture are also discussed for the Gram-Schmidt algorithm. Gram-Schmidt had the best performance overall. Using different optimizations, its performance was improved by a factor of 3 for the 10 x 10 matrix size and by a factor of up to 10 for the 100 x 100 matrix size. Givens Rotation was close in terms of performance, but the Householder Transformation was four times slower than the other two algorithms because of its high complexity. All floating-point implementations had nearly 100% precision, whereas the fixed-point implementations showed an average error of 3% to 5% for the 10 x 10 case.

All the algorithms were coded in C++ and synthesized with the Xilinx Vivado HLS 2016.4 tool. This generated an IP core, which was imported into Xilinx Vivado 2016.4 for implementation. The design targeted the ZedBoard, a Zynq-7020 Extensible Processing Platform (EPP) development kit, which has a Xilinx 7 series FPGA architecture and a dual-core ARM Cortex-A9 processor.
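
As an illustration of the kind of C++ kernel the abstract refers to, the following is a minimal sketch of a Gram-Schmidt QR factorization in its modified (column-by-column) form, using 32-bit floating point. It is not the thesis code. The function names `qr_gram_schmidt` and `max_reconstruction_error`, the fixed size `N = 4`, and the suggested placement of the HLS directives are assumptions made for this example. The directives are shown only as comments so that the sketch also builds with an ordinary host compiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical compile-time matrix size (the abstract compares 10 x 10 designs).
constexpr std::size_t N = 4;

// Modified Gram-Schmidt: factors a full-rank N x N matrix a into q * r,
// where q has orthonormal columns and r is upper triangular.
void qr_gram_schmidt(const float a[N][N], float q[N][N], float r[N][N]) {
    // Start from q = a and r = 0.
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            q[i][j] = a[i][j];
            r[i][j] = 0.0f;
        }
    }

    for (std::size_t k = 0; k < N; ++k) {
        // Euclidean norm of column k.
        float sum = 0.0f;
        for (std::size_t i = 0; i < N; ++i) {
            // Assumption: in an HLS flow, a directive such as
            // "#pragma HLS PIPELINE" or "#pragma HLS UNROLL" could be tried here.
            sum += q[i][k] * q[i][k];
        }
        const float norm = std::sqrt(sum);
        assert(norm > 0.0f);  // rank-deficient input is not handled in this sketch
        r[k][k] = norm;

        // Normalize column k.
        for (std::size_t i = 0; i < N; ++i) {
            q[i][k] /= norm;
        }

        // Remove the component along column k from every later column.
        for (std::size_t j = k + 1; j < N; ++j) {
            float dot = 0.0f;
            for (std::size_t i = 0; i < N; ++i) {
                dot += q[i][k] * q[i][j];
            }
            r[k][j] = dot;
            for (std::size_t i = 0; i < N; ++i) {
                q[i][j] -= dot * q[i][k];
            }
        }
    }
}

// Largest absolute entry of (q * r - a), a simple precision check.
float max_reconstruction_error(const float a[N][N], const float q[N][N],
                               const float r[N][N]) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < N; ++k) {
                acc += q[i][k] * r[k][j];
            }
            const float err = std::fabs(acc - a[i][j]);
            if (err > worst) {
                worst = err;
            }
        }
    }
    return worst;
}
```

Calling `qr_gram_schmidt` on a full-rank matrix and then `max_reconstruction_error` gives a rough measure of precision. Repeating the comparison with a 16-bit fixed-point type in place of `float` would be one way to explore the kind of floating-point versus fixed-point trade-off the abstract describes.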

Bibliographic record

  • Author

    Desai, Parth

  • Author affiliation

    Illinois Institute of Technology

  • Degree-granting institution: Illinois Institute of Technology
  • Subject: Computer engineering; Electrical engineering
  • Degree: M.S.
  • Year: 2017
  • Pages: 62 p.
  • Total pages: 62
  • Original format: PDF
  • Language of text: English
