Journal: Multimedia Tools and Applications

Optimizing image processing on multi-core CPUs with Intel parallel programming technologies


Abstract

The rapid advance of computer hardware and the popularity of multimedia applications have made multi-core processors with sub-word parallelism instructions a dominant market trend in desktop PCs as well as high-end mobile devices. This paper presents an efficient parallel implementation of the 2D convolution algorithm, which demands high computing performance, on multi-core desktop PCs. 2D convolution is a representative computation-intensive algorithm in image and signal processing applications and is accompanied by heavy memory access, although its computational complexity is relatively low. The purpose of this study is to explore the effectiveness of exploiting the Streaming SIMD (Single Instruction, Multiple Data) Extensions (SSE) technology and the TBB (Threading Building Blocks) run-time library on Intel multi-core processors. By combining them, we can take advantage of all the hardware features of a multi-core processor concurrently, for both data-level and task-level parallelism. For the performance evaluation, we implemented a 3 × 3 kernel-based convolution algorithm using SSE2 and TBB in different combinations and compared their processing speeds. The experimental results show that both technologies have a significant effect on performance and that the processing speed can be greatly improved when the two technologies are used at the same time; for example, speedups of 6.2, 6.1, and 1.4 times over the implementations using either technology alone are reported for the 256 × 256, 512 × 512, and 1024 × 1024 data sets, respectively.
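To make the combination of data-level and task-level parallelism concrete, here is a minimal C++ sketch of a 3 × 3 convolution. It is not the paper's implementation, and it rests on several assumptions: pixels and kernel weights are 16-bit integers (so one SSE2 register holds eight pixels), border pixels are left untouched, sums wrap modulo 2^16, and `std::thread` stands in for the TBB run-time library so that the example depends only on the standard library. All function names (`pixel3x3`, `conv3x3_scalar`, `conv3x3_sse2_rows`, `conv3x3_parallel`) are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics (x86 / x86-64 only)
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// One output pixel of the 3x3 weighted sum (no kernel flip), scalar code.
static std::int16_t pixel3x3(const std::int16_t* src, int w,
                             const std::int16_t* k, int x, int y) {
    int acc = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            acc += k[(dy + 1) * 3 + (dx + 1)] * src[(y + dy) * w + (x + dx)];
    return static_cast<std::int16_t>(acc);
}

// Scalar baseline over all interior pixels; borders are not written.
void conv3x3_scalar(const std::int16_t* src, std::int16_t* dst,
                    int w, int h, const std::int16_t* k) {
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x)
            dst[y * w + x] = pixel3x3(src, w, k, x, y);
}

// Data-level parallelism: rows [y0, y1), eight 16-bit pixels per iteration.
void conv3x3_sse2_rows(const std::int16_t* src, std::int16_t* dst,
                       int w, const std::int16_t* k, int y0, int y1) {
    for (int y = y0; y < y1; ++y) {
        int x = 1;
        for (; x + 8 <= w - 1; x += 8) {
            __m128i acc = _mm_setzero_si128();
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const std::int16_t* p = src + (y + dy) * w + (x + dx);
                    __m128i v = _mm_loadu_si128(
                        reinterpret_cast<const __m128i*>(p));
                    __m128i c = _mm_set1_epi16(k[(dy + 1) * 3 + (dx + 1)]);
                    // Keep the low 16 bits of each product, then accumulate.
                    acc = _mm_add_epi16(acc, _mm_mullo_epi16(v, c));
                }
            _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + y * w + x), acc);
        }
        for (; x < w - 1; ++x)  // scalar tail for the leftover columns
            dst[y * w + x] = pixel3x3(src, w, k, x, y);
    }
}

// Task-level parallelism: interior rows are split into contiguous strips,
// one per thread. Strips write disjoint rows, so no locking is needed.
void conv3x3_parallel(const std::int16_t* src, std::int16_t* dst,
                      int w, int h, const std::int16_t* k, int num_threads) {
    assert(num_threads >= 1);
    const int rows = h - 2;  // number of interior rows
    if (rows <= 0 || w < 3) return;
    num_threads = std::min(num_threads, rows);
    const int chunk = (rows + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        const int y0 = 1 + t * chunk;
        const int y1 = std::min(h - 1, y0 + chunk);
        if (y0 >= y1) break;
        workers.emplace_back(conv3x3_sse2_rows, src, dst, w, k, y0, y1);
    }
    for (auto& th : workers) th.join();
}
```

Calling `conv3x3_parallel(src, dst, w, h, k, 4)` on a row-major `w × h` buffer should produce the same interior pixels as `conv3x3_scalar`. In a TBB-based version, the manual strip loop would typically be replaced by a parallel loop over row ranges, which lets the scheduler balance the strips across cores.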
