QrnPro: New Processor Architecture for Accelerating Quran Applications

Abstract

Quran applications include image/video processing, voice recognition, data encryption/decryption, etc., all of which are based on data parallelism. These applications are characterized by structured, regular computations on large data sets. In this paper, a new processor architecture called QrnPro is proposed to accelerate Quran applications. QrnPro exploits the data parallelism found in Quran applications by adding vector processing to a VLIW architecture. QrnPro uses the VLIW architecture to process multiple independent scalar instructions concurrently on parallel execution units. Moreover, data parallelism is expressed by vector instructions and processed on the same parallel execution units of the VLIW architecture. This combination of VLIW and vector processing exploits resources efficiently even when the percentage of data parallelism is below 100%. An instruction memory of 256×128 bits stores the scalar/vector instructions of Quran applications as 128-bit VLIW instructions. A single register file (8 vectors × 16 elements × 32 bits, or equivalently 128×32-bit registers) stores both multi-scalar and vector elements. The control unit feeds the parallel execution units with the required operands (multi-scalar/vector elements), and the execution units can produce up to 4×32-bit results each clock cycle. Scalar/vector loads and stores move data from/to the QrnPro data memory (512×128-bit) at a rate of 128 bits (4×32-bit elements) per clock cycle. Finally, the writeback stage writes up to four results (4×32-bit) per clock cycle, coming from the memory system or from the execution units, into the QrnPro register file. The design of the proposed QrnPro is implemented in VHDL targeting the Xilinx Virtex-5 FPGA (XC5VLX110T-3FF1136 device), and its performance is evaluated.
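The register-file organization and the four-results-per-cycle datapath described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' VHDL design: the class `RegisterFile`, the function `vector_add`, and in particular the mapping of vector `v`, element `e` to scalar register `v * 16 + e` are hypothetical, and the cycle count assumes the execution units sustain four results in every cycle.

```python
# Behavioral sketch only; names and the index mapping are assumptions,
# not taken from the QrnPro design itself.
NUM_VECTORS = 8          # 8 vector registers (from the abstract)
ELEMS_PER_VECTOR = 16    # 16 elements per vector (from the abstract)
LANES = 4                # up to 4 x 32-bit results per clock cycle
MASK32 = 0xFFFFFFFF      # elements are 32 bits wide


class RegisterFile:
    """One storage array viewed as 8x16 vector elements or 128 scalars."""

    def __init__(self):
        self.regs = [0] * (NUM_VECTORS * ELEMS_PER_VECTOR)  # 128 x 32-bit

    def scalar_index(self, vec, elem):
        # Assumed mapping: vector registers occupy consecutive scalar slots.
        return vec * ELEMS_PER_VECTOR + elem

    def read_vector(self, vec):
        start = self.scalar_index(vec, 0)
        return self.regs[start:start + ELEMS_PER_VECTOR]

    def write_vector(self, vec, values):
        for elem, value in enumerate(values):
            self.regs[self.scalar_index(vec, elem)] = value & MASK32


def vector_add(rf, dst, src_a, src_b):
    """Add two vector registers, LANES elements per modeled clock cycle.

    Returns the number of modeled cycles (ideal throughput assumed).
    """
    cycles = 0
    for base in range(0, ELEMS_PER_VECTOR, LANES):
        for lane in range(LANES):
            elem = base + lane
            a = rf.regs[rf.scalar_index(src_a, elem)]
            b = rf.regs[rf.scalar_index(src_b, elem)]
            rf.regs[rf.scalar_index(dst, elem)] = (a + b) & MASK32
        cycles += 1
    return cycles


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write_vector(0, range(16))
    rf.write_vector(1, [10] * 16)
    print(vector_add(rf, 2, 0, 1), rf.read_vector(2))
```

Under these assumptions a 16-element vector operation occupies the four lanes for 16 / 4 = 4 cycles, which is the sense in which scalar and vector instructions share the same parallel execution units.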
