
Mask Attention Networks: Rethinking and Strengthen Transformer



Abstract

Transformer is an attention-based neural network which consists of two sublayers, namely, the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research has explored enhancing the two sublayers separately to improve the capability of Transformer for text representation. In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices. However, their static mask matrices limit the capability for localness modeling in text representation learning. We therefore introduce a new layer named Dynamic Mask Attention Network (DMAN) with a learnable mask matrix that is able to model localness adaptively. To incorporate the advantages of DMAN, SAN, and FFN, we propose a sequential layered structure that combines the three types of layers. Extensive experiments on various tasks, including neural machine translation and text summarization, demonstrate that our model outperforms the original Transformer.
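The abstract frames SAN and FFN as special cases of a mask attention network with static mask matrices, and DMAN as the variant with a learnable mask. The sketch below illustrates that unified view in PyTorch under stated assumptions: the `mask_attention` helper, the multiplicative placement of the mask on the attention weights, and the sigmoid parameterization of the DMAN mask are choices made for this illustration, not the paper's exact equations.

```python
# Minimal illustrative sketch of the Mask Attention Network (MAN) view.
# Assumption: the mask multiplies the normalized attention weights, which are
# then renormalized; the paper's precise formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mask_attention(q, k, v, mask):
    """Attention whose weights are modulated by a mask matrix.

    q, k, v: (batch, seq_len, d_model)
    mask:    (seq_len, seq_len), entries in [0, 1]
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (batch, L, L)
    weights = F.softmax(scores, dim=-1) * mask             # apply the mask
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return weights @ v


seq_len, d_model = 6, 8
x = torch.randn(2, seq_len, d_model)

# SAN as a MAN with a static all-ones mask: every token attends to every token.
san_mask = torch.ones(seq_len, seq_len)

# FFN as a MAN with a static identity mask: each token attends only to itself,
# so the attention step reduces to a position-wise transformation.
ffn_mask = torch.eye(seq_len)

# DMAN: a learnable mask (here a raw parameter matrix squashed into [0, 1];
# the paper's parameterization is more structured, so treat this as illustrative).
dman_mask = torch.sigmoid(nn.Parameter(torch.zeros(seq_len, seq_len)))

for name, m in [("SAN", san_mask), ("FFN", ffn_mask), ("DMAN", dman_mask)]:
    out = mask_attention(x, x, x, m)
    print(name, out.shape)  # each variant preserves the (batch, seq_len, d_model) shape
```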


