Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together

Abstract

Neural networks equipped with self-attention have parallelizable computation, lightweight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using a vector to measure pairwise dependency, but this requires expanding the alignment matrix into a tensor, which results in memory and computation bottlenecks. In this paper, we propose a novel attention mechanism called "Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token) and global (source2token) dependencies by a novel compatibility function composed of dot-product and additive attentions, 2) uses a tensor to represent the feature-wise alignment scores for better expressive power but only requires parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attentions, and applies a distinct positional mask to each head (subspace), so the memory and computation can be distributed to multiple heads, each with sequential information encoded independently. The experiments show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or competitive performance on nine NLP benchmarks with compelling memory- and time-efficiency.
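
The abstract describes combining scalar pairwise (token2token) dot-product scores with feature-wise global (source2token) additive scores, applying a distinct positional mask per head, and using only matrix multiplications so the n x n x d score tensor is never materialized. The NumPy sketch below illustrates that idea; the function and parameter names (mtsa_head, Wq, Wk, Wv, Ws), the tanh-based additive scoring, and the exact way the two normalized score sets are combined are illustrative assumptions, not the authors' formulation or released code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mtsa_head(x, Wq, Wk, Wv, Ws, mask):
    """One attention head combining pairwise and global dependencies (sketch).

    x    : (n, d_in) token representations
    W*   : (d_in, d_h) projection matrices (hypothetical names)
    mask : (n, n) additive positional mask, 0 where attention is allowed
           and a large negative value where it is blocked
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv                          # (n, d_h) each
    d_h = q.shape[-1]

    # token2token: scalar dot-product score per token pair, masked,
    # normalized over the attended (source) tokens
    p_t2t = softmax(q @ k.T / np.sqrt(d_h) + mask, axis=-1)   # (n, n)

    # source2token: additive-style multi-dimensional scores, one per feature
    # of every token, normalized over tokens for each feature independently
    p_s2t = softmax(np.tanh(x @ Ws), axis=0)                  # (n, d_h)

    # A single (n, n) @ (n, d_h) product combines the two weightings,
    # so only matrix multiplications are needed and the full
    # n x n x d_h score tensor is never stored.
    return p_t2t @ (p_s2t * v)                                # (n, d_h)

# Usage: two heads with distinct positional masks (forward / backward),
# concatenated as in ordinary multi-head attention.
n, d_in, d_h = 5, 16, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d_in))
fw = np.where(np.tril(np.ones((n, n))) > 0, 0.0, -1e9)  # current and earlier tokens
bw = np.where(np.triu(np.ones((n, n))) > 0, 0.0, -1e9)  # current and later tokens

heads = []
for m in (fw, bw):
    Wq, Wk, Wv, Ws = (0.1 * rng.standard_normal((d_in, d_h)) for _ in range(4))
    heads.append(mtsa_head(x, Wq, Wk, Wv, Ws, m))
out = np.concatenate(heads, axis=-1)                     # (n, 2 * d_h)
```

In this sketch, giving different heads forward and backward masks is what encodes sequential order without any RNN or CNN component, matching point 3 of the abstract.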
