Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together

Abstract

Neural networks equipped with self-attention have parallelizable computation, lightweight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using a vector to measure pairwise dependency, but this requires expanding the alignment matrix into a tensor, which results in memory and computation bottlenecks. In this paper, we propose a novel attention mechanism called "Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token) and global (source2token) dependencies by a novel compatibility function composed of dot-product and additive attentions, 2) uses a tensor to represent the feature-wise alignment scores for better expressive power but only requires parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attentions, and applies a distinct positional mask to each head (subspace), so the memory and computation can be distributed to multiple heads, each with sequential information encoded independently. The experiments show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or competitive performance on nine NLP benchmarks with compelling memory- and time-efficiency.
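
The abstract describes combining scalar pairwise (token2token) dot-product scores with feature-wise global (source2token) additive scores, applying a distinct positional mask per head, and using only matrix multiplications so the n x n x d score tensor is never materialized. The NumPy sketch below illustrates that idea; the function and parameter names (mtsa_head, Wq, Wk, Wv, Ws), the tanh-based additive scoring, and the exact way the two normalized score sets are combined are illustrative assumptions, not the authors' formulation or released code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mtsa_head(x, Wq, Wk, Wv, Ws, mask):
    """One attention head combining pairwise and global dependencies (sketch).

    x    : (n, d_in) token representations
    W*   : (d_in, d_h) projection matrices (hypothetical names)
    mask : (n, n) additive positional mask, 0 where attention is allowed
           and a large negative value where it is blocked
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv                          # (n, d_h) each
    d_h = q.shape[-1]

    # token2token: scalar dot-product score per token pair, masked,
    # normalized over the attended (source) tokens
    p_t2t = softmax(q @ k.T / np.sqrt(d_h) + mask, axis=-1)   # (n, n)

    # source2token: additive-style multi-dimensional scores, one per feature
    # of every token, normalized over tokens for each feature independently
    p_s2t = softmax(np.tanh(x @ Ws), axis=0)                  # (n, d_h)

    # A single (n, n) @ (n, d_h) product combines the two weightings,
    # so only matrix multiplications are needed and the full
    # n x n x d_h score tensor is never stored.
    return p_t2t @ (p_s2t * v)                                # (n, d_h)

# Usage: two heads with distinct positional masks (forward / backward),
# concatenated as in ordinary multi-head attention.
n, d_in, d_h = 5, 16, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d_in))
fw = np.where(np.tril(np.ones((n, n))) > 0, 0.0, -1e9)  # current and earlier tokens
bw = np.where(np.triu(np.ones((n, n))) > 0, 0.0, -1e9)  # current and later tokens

heads = []
for m in (fw, bw):
    Wq, Wk, Wv, Ws = (0.1 * rng.standard_normal((d_in, d_h)) for _ in range(4))
    heads.append(mtsa_head(x, Wq, Wk, Wv, Ws, m))
out = np.concatenate(heads, axis=-1)                     # (n, 2 * d_h)
```

In this sketch, giving different heads forward and backward masks is what encodes sequential order without any RNN or CNN component, matching point 3 of the abstract.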
