Journal: Microprocessors and Microsystems

A high performance, area efficient TTA-like vertex shader architecture with optimized floating point arithmetic unit for embedded graphics applications

Abstract

This paper proposes a fully programmable vertex shader based on the Transport Triggered Architecture (TTA), aiming at high efficiency in both performance and connectivity for embedded applications. At the architecture level, fine-grained data transport in the TTA datapath and multi-threading are adopted to exploit instruction-level and data-level parallelism, respectively, in graphics applications. Datapath connectivity is optimized mainly through TTA's native, architecturally visible bypassing and through hybrid result re-collection schemes. At the shader core level, a novel SIMD multi-functional dot-product unit and an area-efficient special function unit are introduced for floating-point processing. The proposed processor reaches a peak capacity of 1.5 GFLOPS and 125 Mvertices/s. Compared with a previous expanded VLIW vertex processor, it achieves a 17.6% reduction in hardware cost and a 1.3-fold improvement in the performance per logic cost ratio on real graphics benchmarks under a 0.18 µm CMOS process.
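The abstract relies on two general TTA ideas: a program consists only of data moves, and a result can be forwarded from one function unit straight into another without passing through the register file. The toy Python model below is a minimal sketch of those two ideas only. It is not the paper's datapath: the class `FunctionUnit`, the port names `operand`, `trigger` and `result`, the `unit.port` string syntax, and the helper `multiply_add_demo` are all hypothetical names chosen for illustration.

```python
import operator


class FunctionUnit:
    """Hypothetical function unit: one operand port, one trigger port, one result port."""

    def __init__(self, op):
        self.op = op          # binary operation performed by this unit
        self.operand = 0.0    # value latched by a move to the operand port
        self.result = 0.0     # value readable from the result port

    def write(self, port, value):
        if port == "operand":
            self.operand = value
        elif port == "trigger":
            # In a TTA, the move into the trigger port is what starts the operation.
            self.result = self.op(self.operand, value)
        else:
            raise ValueError(f"unknown port: {port}")


def read(src, units, regs):
    """Read a register ("a") or a function unit result port ("mul.result")."""
    if "." in src:
        unit, port = src.split(".")
        if port != "result":
            raise ValueError(f"only result ports can be read: {src}")
        return units[unit].result
    return regs[src]


def write(dst, value, units, regs):
    """Write a register ("out") or a function unit input port ("mul.trigger")."""
    if "." in dst:
        unit, port = dst.split(".")
        units[unit].write(port, value)
    else:
        regs[dst] = value


def run(moves, units, regs):
    """Execute (source, destination) moves in program order."""
    for src, dst in moves:
        write(dst, read(src, units, regs), units, regs)


def multiply_add_demo():
    """Compute a * b + c using moves only."""
    units = {"mul": FunctionUnit(operator.mul), "add": FunctionUnit(operator.add)}
    regs = {"a": 2.0, "b": 3.0, "c": 4.0}
    moves = [
        ("a", "mul.operand"),
        ("b", "mul.trigger"),            # triggers the multiply
        ("mul.result", "add.operand"),   # bypass: product never touches a register
        ("c", "add.trigger"),            # triggers the add
        ("add.result", "out"),
    ]
    run(moves, units, regs)
    return regs["out"]


if __name__ == "__main__":
    print(multiply_add_demo())  # 10.0
```

The third move is the point of the sketch: because the bypass is written explicitly in the program, the intermediate product needs no register file write or read, which is the kind of connectivity saving the abstract attributes to TTA. Timing, buses, and multi-threading are deliberately left out.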
