Journal of signal processing systems for signal, image, and video technology

Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy



Abstract

We present an architecture of decoupled processors with a memory hierarchy consisting only of scratch-pad memories and a main memory. This architecture exploits the more efficient prefetching of decoupled processors, which make use of the parallelism between address computation and application data processing that exists mainly in streaming applications. This benefit, combined with the ability of scratch-pad memories to store data with no conflict misses and low energy per access, contributes significantly to increasing the system's performance. The application code is split into two parallel programs: the first runs on the Access processor and computes the addresses of the data in the memory hierarchy; the second processes the application data and runs on the Execute processor, a processor with a limited address space covering just the register file addresses. Every transfer of any block in the memory hierarchy, up to the Execute processor's register file, is controlled by the Access processor and the DMA units. This strongly differentiates the architecture from traditional uniprocessors and from existing decoupled processors with cache memory hierarchies. The architecture is compared in performance with uniprocessor architectures with (a) scratch-pad and (b) cache memory hierarchies, and with (c) existing decoupled architectures, showing higher normalized performance. The reason for this gain is the efficiency of data transfer that the scratch-pad memory hierarchy provides, combined with the ability of the decoupled processors to eliminate memory latency by using memory management techniques for transferring data instead of fixed prefetching methods. Experimental results show that performance is increased by up to almost 2 times compared to uniprocessor architectures with scratch-pad memory and up to 3.7 times compared to those with cache. The proposed architecture achieves this performance without penalties in energy-delay product.
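
To make the access/execute split concrete, the following minimal C sketch shows how a streaming kernel (element-wise scaling) could be partitioned in the spirit described above. It is an illustration only, run sequentially on a host: the names (SPM_BLOCK, dma_fetch, dma_store, access_program, execute_program) are hypothetical and not taken from the paper, the DMA engines are emulated with plain copies, and in the actual architecture the two programs would run concurrently on the separate Access and Execute processors with double-buffered scratch-pad blocks.

/*
 * Conceptual sketch of the access/execute split, assuming a hypothetical
 * DMA interface.  The execute-side code never forms main-memory addresses;
 * it only sees the scratch-pad block handed to it by the access side.
 */
#include <stddef.h>
#include <stdio.h>

#define N          1024          /* elements in the input stream       */
#define SPM_BLOCK  64            /* scratch-pad block size (elements)  */

static float main_memory_in[N];  /* stands in for off-chip main memory */
static float main_memory_out[N];
static float spm[SPM_BLOCK];     /* stands in for one scratch-pad buffer */

/* Stand-in for a DMA transfer: main memory -> scratch-pad. */
static void dma_fetch(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) dst[i] = src[i];
}

/* Stand-in for a DMA transfer: scratch-pad -> main memory. */
static void dma_store(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) dst[i] = src[i];
}

/* Execute-side code: processes only the scratch-pad block it is given. */
static void execute_program(float *block, size_t n, float scale)
{
    for (size_t i = 0; i < n; i++) block[i] *= scale;
}

/* Access-side code: computes all main-memory addresses and drives the
 * (simulated) DMA transfers, block by block. */
static void access_program(float scale)
{
    for (size_t base = 0; base < N; base += SPM_BLOCK) {
        dma_fetch(&main_memory_in[base], spm, SPM_BLOCK);
        execute_program(spm, SPM_BLOCK, scale);   /* hand the block over */
        dma_store(spm, &main_memory_out[base], SPM_BLOCK);
    }
}

int main(void)
{
    for (size_t i = 0; i < N; i++) main_memory_in[i] = (float)i;
    access_program(2.0f);
    printf("out[10] = %.1f\n", main_memory_out[10]);  /* expect 20.0 */
    return 0;
}

The property the sketch preserves is the one the abstract emphasizes: address computation and block movement are owned entirely by the access side, so the execute side needs no address space beyond its local (register-file-like) storage.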
