
Data-parallel digital signal processors: Algorithm mapping, architecture scaling and workload adaptation.



Abstract

Emerging applications such as high definition television (HDTV), streaming video, image processing in embedded systems and signal processing in high-speed wireless communications are driving a need for high performance digital signal processors (DSPs) with real-time processing. Applications in this class exhibit significant data parallelism and finite precision, demand power efficiency, and need hundreds of arithmetic units in the DSP to meet real-time requirements. Data-parallel DSPs meet these requirements by employing clusters of functional units, enabling hundreds of computations every clock cycle. These DSPs exploit instruction level parallelism and subword parallelism within clusters, similar to a traditional VLIW (Very Long Instruction Word) DSP, and exploit data parallelism across clusters, similar to vector processors.

Stream processors are data-parallel DSPs that use a bandwidth hierarchy to support dataflow to hundreds of arithmetic units, and they are used to evaluate the contributions of this thesis. Different software realizations of the dataflow in an algorithm can change the performance of a stream processor by more than an order of magnitude. The thesis first presents the design of signal processing algorithms that map efficiently onto stream processors by parallelizing the algorithms and by re-ordering the flow of data. The design space for stream processors also exhibits trade-offs between the number of arithmetic units per cluster, the number of clusters and the clock frequency needed to meet the real-time requirements of a given application. This thesis provides a design space exploration tool for stream processors that finds configurations meeting real-time requirements while minimizing power consumption. The presented exploration methodology rapidly searches this design space at compile time and selects the number of adders, multipliers and clusters, and the clock frequency needed for real-time operation.

Finally, the thesis improves the power efficiency of the designed stream processor by adapting the compute resources to run-time variations in the workload. The thesis presents an adaptive multiplexer network that allows the number of active clusters to be varied during run-time by turning off unused clusters. Thus, by efficient mapping of algorithms, by exploring the architecture design space, and by compute resource adaptation, this thesis improves power efficiency in stream processors and enhances their suitability for high performance, power-aware signal processing applications.
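
To make the design space trade-off concrete, the following is a minimal illustrative sketch and not the exploration tool from the thesis. It exhaustively searches over adders per cluster, multipliers per cluster and cluster count, computes the clock frequency needed to meet a real-time rate, and keeps the lowest-power feasible point. Every function name, the cycle-count model, the capacitance weights and the assumption that supply voltage scales linearly with frequency are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import ceil


def required_frequency_hz(adders, multipliers, clusters,
                          adds_per_item, muls_per_item, items_per_second):
    """Clock frequency needed to sustain the real-time rate (toy model).

    Assumption: each cluster processes one data item at a time, and the
    cycles per item are bounded by the busier functional-unit type.
    """
    cycles_per_item = max(ceil(adds_per_item / adders),
                          ceil(muls_per_item / multipliers))
    return items_per_second * cycles_per_item / clusters


def relative_power(adders, multipliers, clusters, freq_hz, max_freq_hz):
    """Relative dynamic power, C * V^2 * f, in arbitrary units (toy model).

    Assumptions (hypothetical weights): a multiplier costs 3x an adder, the
    intra-cluster switch grows quadratically with units per cluster, and
    the normalized supply voltage is proportional to the clock frequency.
    """
    units = adders + multipliers
    cap_per_cluster = 1.0 * adders + 3.0 * multipliers + 0.1 * units ** 2
    capacitance = clusters * cap_per_cluster
    scale = freq_hz / max_freq_hz  # stands in for both V and f, normalized
    return capacitance * scale ** 3


def explore(adds_per_item, muls_per_item, items_per_second, max_freq_hz,
            max_adders=8, max_multipliers=8,
            cluster_options=(1, 2, 4, 8, 16, 32, 64)):
    """Return the lowest-power feasible configuration, or None if none fits."""
    best = None
    for a, m, c in product(range(1, max_adders + 1),
                           range(1, max_multipliers + 1),
                           cluster_options):
        f = required_frequency_hz(a, m, c, adds_per_item,
                                  muls_per_item, items_per_second)
        if f > max_freq_hz:
            continue  # cannot meet the real-time deadline at this size
        p = relative_power(a, m, c, f, max_freq_hz)
        if best is None or p < best["power"]:
            best = {"adders": a, "multipliers": m, "clusters": c,
                    "freq_hz": f, "power": p}
    return best


if __name__ == "__main__":
    # Hypothetical workload: 40 adds and 20 multiplies per item, 20 M items/s.
    print(explore(adds_per_item=40, muls_per_item=20,
                  items_per_second=2e7, max_freq_hz=1e9))
```

Under these assumptions, adding clusters lowers the required frequency and therefore the voltage, while adding units per cluster raises switch capacitance, so the search settles on a compromise. A real tool would replace both models with measured or compiler-derived data.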
