Journal: Microprocessors and Microsystems

Performance Evaluation of Matrix-Matrix Multiplications Using Intel's Advanced Vector Extensions (AVX)


Abstract

Intel's Advanced Vector Extensions (AVX) is a single instruction, multiple data (SIMD) instruction set introduced with the second-generation Intel Core processor family, and it is supported by newer generations of Intel and AMD processors. AVX exploits SIMD computing units for fine-grained parallelism: its instructions process multiple data elements simultaneously and independently. Many application areas, such as signal processing, recognition, visual processing, scientific and engineering numerical computation, and physics, need the vector floating-point performance that AVX provides. Matrix-matrix multiplication is the core of many important algorithms in signal processing and in scientific and engineering numerical computation, so accelerating its implementation is important. It is also important to use compilers that can make good use of the new features of evolving processors, which requires a clear picture of how compilers perform on AVX code. In addition, choosing an appropriate programming method is essential for obtaining the best performance. This paper reports a performance evaluation of matrix-matrix multiplication in three forms (C = A·B, C = A·B^T, and C = A^T·B) using the AVX instruction set. Results obtained with inline assembly are compared with those obtained with intrinsic functions. A comparative study examines the effect of two widely used C++ compilers: the Intel C++ compiler (ICC) in Intel Parallel Studio XE 2016 and the Microsoft Visual Studio C++ compiler 2015 (MSVC++). The results are evaluated on an Intel Core i7 processor on a Broadwell system for large square matrices of different sizes.
The results show that the Intel compiler outperforms the MSVC++ compiler by factors of 1.34, 1.32, and 1.22 using inline assembly and by factors of 1.36, 1.19, and 1.25 using intrinsic functions for C = A·B, C = A·B^T, and C = A^T·B, respectively. Intrinsic functions outperform inline assembly by factors of 2.1, 2.13, and 2.18 with the Intel compiler and by factors of 2.08, 2.49, and 2.11 with the MSVC++ compiler for the same three forms, respectively. (C) 2016 Elsevier B.V. All rights reserved.
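
As an illustration of the intrinsic-function style the abstract refers to, the following is a minimal sketch of C = A·B with AVX intrinsics. It is not the kernel evaluated in the paper, whose code is not given here. The function name `matmul_ab`, the row-major `std::vector<float>` layout, the i-k-j loop order, and the use of unaligned loads are all assumptions made for this example. When the compiler does not enable AVX (no `__AVX__` macro), the sketch falls back to the scalar loop so that it still builds and runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>  // AVX intrinsics (_mm256_*)
#endif

// Hypothetical helper (not from the paper): C = A * B for row-major
// n x n single-precision matrices stored in flat vectors.
void matmul_ab(const std::vector<float>& a, const std::vector<float>& b,
               std::vector<float>& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    c.assign(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            std::size_t j = 0;
#if defined(__AVX__)
            // Broadcast A[i][k] into all 8 lanes of a 256-bit register.
            const __m256 va = _mm256_set1_ps(aik);
            // Update 8 consecutive elements of row i of C per iteration.
            for (; j + 8 <= n; j += 8) {
                __m256 vb = _mm256_loadu_ps(&b[k * n + j]);
                __m256 vc = _mm256_loadu_ps(&c[i * n + j]);
                vc = _mm256_add_ps(vc, _mm256_mul_ps(va, vb));
                _mm256_storeu_ps(&c[i * n + j], vc);
            }
#endif
            // Scalar tail (and full fallback when AVX is not enabled).
            for (; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

The i-k-j order is chosen here so that both B and C are read along rows, which suits contiguous 8-float loads. The transposed forms C = A·B^T and C = A^T·B access memory differently and would need their own loop arrangement. Building with AVX enabled typically requires a compiler option such as `-mavx` (GCC, Clang) or `/arch:AVX` (MSVC).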
