Foreign-language conference: Practical Experience with SMDS

Efficient SIMD code generation for runtime alignment and length conversion

Abstract

When generating code for today's multimedia extensions, one of the major challenges is dealing with memory alignment. While hand programming still yields the best-performing SIMD code, it is both time-consuming and error-prone. Compiler technology has greatly improved, and now includes techniques that simdize loops with misaligned accesses by automatically rearranging misaligned memory streams in registers. Current techniques are applicable to runtime alignments, but they aggressively reduce the alignment overhead only when all alignments are known at compile time. This paper presents two major enhancements to the state of the art, improving both performance and coverage. First, we propose a novel technique to simdize loops with runtime alignment nearly as efficiently as those with compile-time misalignment. Runtime alignment is pervasive in real applications, either because it is part of the algorithm or because the compiler cannot extract accurate alignment information from complex applications. Second, we incorporate length conversion operations, i.e., conversions between data of different sizes, into the alignment handling framework. Length conversions are pervasive in multimedia applications, where mixed integer types are often used, so supporting them can greatly improve the coverage of simdizable loops. Experimental results indicate that our runtime alignment technique achieves a 19% to 32% speedup increase over prior art for a benchmark stressing the impact of misaligned data. We also demonstrate speedup factors of up to 8.11 over sequential execution for real benchmarks.
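
To make the two ideas in the abstract concrete, the following is a minimal sketch in portable C, not the paper's code generation scheme. It models a 16-byte SIMD register with a plain struct and shows (a) how a misaligned stream can be rebuilt in registers from two aligned loads and a shift whose amount is known only at run time, and (b) a simple length conversion that widens 8-bit elements to 16-bit ones. The names `vec_t`, `vec16_t`, `load_aligned`, `shift_pair`, `load_runtime_aligned`, and `unpack_half`, as well as the 16-byte width, are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VLEN 16 /* assumed vector register width in bytes */

/* Hypothetical stand-ins for SIMD registers holding 8-bit and 16-bit lanes. */
typedef struct { uint8_t b[VLEN]; } vec_t;
typedef struct { uint16_t h[VLEN / 2]; } vec16_t;

/* Aligned load: reads the chunk-th VLEN-byte block of base.
   Models hardware that can only load from VLEN-aligned addresses. */
static vec_t load_aligned(const uint8_t *base, size_t chunk) {
    vec_t v;
    memcpy(v.b, base + chunk * VLEN, VLEN);
    return v;
}

/* Treat lo:hi as one 2*VLEN-byte value and extract VLEN bytes starting at
   byte `shift` (0 .. VLEN-1). Models a permute or shift-pair instruction
   whose shift amount is a runtime value. */
static vec_t shift_pair(vec_t lo, vec_t hi, size_t shift) {
    vec_t r;
    for (size_t i = 0; i < VLEN; i++) {
        size_t k = i + shift;
        r.b[i] = (k < VLEN) ? lo.b[k] : hi.b[k - VLEN];
    }
    return r;
}

/* Load VLEN bytes starting at byte offset `off`, where `off` is known only
   at run time. Note: this always touches the next aligned chunk, so the
   buffer must extend to the end of that chunk. */
static vec_t load_runtime_aligned(const uint8_t *buf, size_t off) {
    size_t chunk = off / VLEN; /* aligned block containing the first byte */
    size_t shift = off % VLEN; /* runtime misalignment within that block */
    vec_t lo = load_aligned(buf, chunk);
    vec_t hi = load_aligned(buf, chunk + 1);
    return shift_pair(lo, hi, shift);
}

/* Length conversion: zero-extend the low (high == 0) or high (high != 0)
   half of a byte vector to 16-bit lanes. One input register yields two
   output registers, one per half. */
static vec16_t unpack_half(vec_t v, int high) {
    vec16_t r;
    size_t start = high ? VLEN / 2 : 0;
    for (size_t i = 0; i < VLEN / 2; i++) {
        r.h[i] = v.b[start + i];
    }
    return r;
}
```

In a loop, the `hi` register of one iteration can be reused as the `lo` register of the next, so each iteration needs only one new aligned load. When the shift amount is a compile-time constant, a compiler can specialize `shift_pair`; the runtime case is harder because the amount must be computed and carried in a register.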
