Performance Evaluation of Matrix-Matrix Multiplications Using Intel's Advanced Vector Extensions (AVX)

Hassan Somaia Awad; Hemeida A. M.; Mahmoud Mountasser M. M.

Abstract

Intel's Advanced Vector Extensions (AVX) is a single instruction, multiple data (SIMD) instruction set introduced with the second-generation Intel Core processor family and supported by recent generations of Intel and AMD processors. AVX exploits SIMD computing units for fine-grained parallelism: its instructions process multiple data elements simultaneously and independently. Many applications, including signal processing, recognition, visual processing, physics, and scientific and engineering numerical computing, need the vector floating-point performance that AVX provides. Matrix-matrix multiplication is the core of many important algorithms in these areas, so accelerating its implementation is important. It is equally important to use compilers that can make optimal use of the new features of evolving processors, which requires a clear picture of how the compiler affects the performance characteristics of AVX. Choosing an appropriate programming method is also essential for obtaining the best performance. This paper reports a performance evaluation of matrix-matrix multiplication in three forms (C = A·B, C = A·B^T, and C = A^T·B) using the AVX instruction set. The results are compared for inline assembly versus intrinsic functions. A comparative study examines the effect of two widely used C++ compilers: the Intel C++ compiler (ICC) in Intel Parallel Studio XE 2016 and the Microsoft Visual Studio C++ compiler 2015 (MSVC++). The results are evaluated on an Intel Core i7 processor in a Broadwell system for large square matrices of different sizes.
The results demonstrate that the Intel compiler outperforms the MSVC++ compiler by factors of 1.34, 1.32, and 1.22 using inline assembly and by factors of 1.36, 1.19, and 1.25 using intrinsic functions for C = A·B, C = A·B^T, and C = A^T·B, respectively. Intrinsic functions outperform inline assembly by factors of 2.1, 2.13, and 2.18 with the Intel compiler and by factors of 2.08, 2.49, and 2.11 with the MSVC++ compiler for C = A·B, C = A·B^T, and C = A^T·B, respectively. (C) 2016 Elsevier B.V. All rights reserved.
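
The abstract does not reproduce the kernels that were measured. As a rough illustration of the intrinsic-function style it refers to, the following is a minimal sketch of C = A·B with AVX intrinsics. It assumes row-major, double-precision square matrices and unaligned loads; the names `matmul_avx`, `matmul_ref`, and `AVX_TARGET` are hypothetical, and this is not the authors' implementation.

```cpp
#include <immintrin.h>  // AVX intrinsics: __m256d, _mm256_*_pd
#include <cassert>
#include <cstddef>

// Assumption: on GCC/Clang, enable AVX for one function without a global
// -mavx flag. MSVC accepts AVX intrinsics without a per-function attribute.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Hypothetical kernel: C = A * B for n x n row-major matrices of doubles.
// The i-k-j loop order keeps the accesses to B and C contiguous, so four
// doubles of a row fit one 256-bit register.
AVX_TARGET
void matmul_avx(const double* A, const double* B, double* C, std::size_t n) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) C[i * n + j] = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
            // Broadcast the scalar A[i][k] into all four lanes.
            const __m256d a = _mm256_set1_pd(A[i * n + k]);
            std::size_t j = 0;
            for (; j + 4 <= n; j += 4) {
                const __m256d b = _mm256_loadu_pd(&B[k * n + j]);
                __m256d c = _mm256_loadu_pd(&C[i * n + j]);
                c = _mm256_add_pd(c, _mm256_mul_pd(a, b));  // c += a * b
                _mm256_storeu_pd(&C[i * n + j], c);
            }
            // Scalar tail for n not divisible by 4.
            for (; j < n; ++j) C[i * n + j] += A[i * n + k] * B[k * n + j];
        }
    }
}

// Plain scalar reference used to check the vectorized kernel.
void matmul_ref(const double* A, const double* B, double* C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}
```

The same pattern would apply to the other two forms by changing which operand is indexed as transposed. An inline-assembly version would issue the corresponding AVX instructions directly instead of leaving register allocation and scheduling to the compiler.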

Bibliographic details

Source
Microprocessors and Microsystems | 2016, Issue 11 | pp. 369-374 | 6 pages
Authors
Hassan Somaia Awad; Hemeida A. M.; Mahmoud Mountasser M. M.
Author affiliations

Aswan Univ, Dept Elect Engn, Comp & Syst Sect, Aswan 81542, Egypt;

Aswan Univ, Fac Energy Engn, Dept Elect Engn, Aswan, Egypt;

Aswan Univ, Dept Elect Engn, Comp & Syst Sect, Aswan 81542, Egypt;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Advanced vector extension (AVX); Matrix-matrix multiplications; Intrinsic functions; Inline assembly; Intel C++ compiler; Microsoft VC++ compiler
