FPGA Implementation of QR Decomposition Algorithms Using High-Level Synthesis on Zynq SoC

Abstract

Matrix decomposition and computation are an important part of many signal processing, image processing, and communication systems. A better solution in terms of power, performance, and area can improve the performance of the whole system. Designing and testing a new idea is challenging under tight time constraints, so an improved implementation flow based on High-Level Synthesis (HLS) is discussed. This flow is used to implement QR decomposition algorithms. Three QR factorization techniques are discussed: Gram-Schmidt, Givens Rotation, and Householder Transformation. The algorithms are compared in terms of area, performance, and precision.

Each algorithm is implemented in two variants that differ in data type: a 32-bit floating-point implementation and a 16-bit fixed-point implementation. Results are presented for different designs using optimization techniques such as loop unrolling and pipelining. A scalable architecture is implemented for all the algorithms, which are compared for a 10 x 10 matrix. Results for a scaled-up 100 x 100 matrix are also discussed for the Gram-Schmidt algorithm. Gram-Schmidt had the best performance overall. Using different optimizations, the performance of the Gram-Schmidt algorithm was improved by a factor of 3 for a 10 x 10 matrix and by a factor of up to 10 for a 100 x 100 matrix. Givens Rotation was close in performance, but the Householder Transformation was four times slower than the other two algorithms because of its higher complexity. For a 10 x 10 implementation, all floating-point implementations had nearly 100% precision, while the fixed-point implementations showed an average error of 3% to 5%.

All the algorithms were coded in C++ and synthesized with the Xilinx Vivado HLS 2016.4 tool. This generated an IP core, which was imported into Xilinx Vivado 2016.4 for implementation. The design targeted the ZedBoard, a Zynq-7020 Extensible Processing Platform (EPP) development kit, which has a Xilinx 7 series FPGA architecture and a dual-core ARM Cortex-A9 processor.
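As a rough illustration of the kind of kernel the abstract describes, the following C++ sketch factors a small matrix with Gram-Schmidt. It is not the thesis code: the modified Gram-Schmidt variant, the matrix size `N`, the function names, and the fixed-point format in `quantize_q16` are all assumptions chosen for the example. The helper `quantize_q16` only mimics rounding to a signed 16-bit fixed-point value, to show where the precision loss reported for the fixed-point designs could come from.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int N = 3;  // illustrative size only; the thesis reports 10 x 10 and 100 x 100

// Modified Gram-Schmidt (assumed variant): factor a full-rank N x N matrix A
// into Q (orthonormal columns) and R (upper triangular) so that A = Q * R.
void qr_gram_schmidt(const float A[N][N], float Q[N][N], float R[N][N]) {
    float V[N][N];  // working copy of A, orthogonalized column by column
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            V[i][j] = A[i][j];
            R[i][j] = 0.0f;
        }
    }
    for (int k = 0; k < N; ++k) {
        // Normalize the k-th working column to obtain q_k.
        float norm_sq = 0.0f;
        for (int i = 0; i < N; ++i) norm_sq += V[i][k] * V[i][k];
        assert(norm_sq > 0.0f);  // a zero column would mean A is rank deficient
        R[k][k] = std::sqrt(norm_sq);
        for (int i = 0; i < N; ++i) Q[i][k] = V[i][k] / R[k][k];

        // Subtract the q_k component from every remaining column.
        // Fixed-bound loops like these are the usual candidates for
        // pipelining or unrolling directives in an HLS flow.
        for (int j = k + 1; j < N; ++j) {
            float dot = 0.0f;
            for (int i = 0; i < N; ++i) dot += Q[i][k] * V[i][j];
            R[k][j] = dot;
            for (int i = 0; i < N; ++i) V[i][j] -= dot * Q[i][k];
        }
    }
}

// Largest absolute entry of A - Q * R, a simple precision check.
float max_reconstruction_error(const float A[N][N], const float Q[N][N],
                               const float R[N][N]) {
    float worst = 0.0f;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < N; ++k) sum += Q[i][k] * R[k][j];
            worst = std::fmax(worst, std::fabs(A[i][j] - sum));
        }
    }
    return worst;
}

// Hypothetical helper: round x to a signed 16-bit fixed-point value with
// frac_bits fractional bits (saturating), then convert back to float.
float quantize_q16(float x, int frac_bits) {
    const float scale = static_cast<float>(1 << frac_bits);
    float scaled = std::round(x * scale);
    if (scaled > 32767.0f) scaled = 32767.0f;
    if (scaled < -32768.0f) scaled = -32768.0f;
    return static_cast<float>(static_cast<std::int16_t>(scaled)) / scale;
}
```

Calling `qr_gram_schmidt` on a small full-rank matrix and then checking `max_reconstruction_error` gives a quick sanity test of the factorization. Constant loop bounds are used because they let a synthesis tool determine trip counts statically, which is generally what makes unrolling and pipelining practical.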

Bibliographic record

Author: Desai, Parth
Author affiliation: Illinois Institute of Technology
Degree-granting institution: Illinois Institute of Technology
Subject: Computer engineering; Electrical engineering
Degree: M.S.
Year: 2017
Pagination: 62 p.
Total pages: 62
Original format: PDF
Language of text: English

