
Mask Attention Networks: Rethinking and Strengthen Transformer



Abstract

Transformer is an attention-based neural network which consists of two sublayers, namely, the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research has explored enhancing the two sublayers separately to improve the capability of Transformer for text representation. In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices. However, their static mask matrices limit the capability for localness modeling in text representation learning. We therefore introduce a new layer named Dynamic Mask Attention Network (DMAN) with a learnable mask matrix that is able to model localness adaptively. To incorporate the advantages of DMAN, SAN, and FFN, we propose a sequential layered structure that combines the three types of layers. Extensive experiments on various tasks, including neural machine translation and text summarization, demonstrate that our model outperforms the original Transformer.
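The abstract frames SAN and FFN as special cases of a mask attention network with static mask matrices, and DMAN as the variant with a learnable mask. The sketch below illustrates that unified view in PyTorch under stated assumptions: the `mask_attention` helper, the multiplicative placement of the mask on the attention weights, and the sigmoid parameterization of the DMAN mask are choices made for this illustration, not the paper's exact equations.

```python
# Minimal illustrative sketch of the Mask Attention Network (MAN) view.
# Assumption: the mask multiplies the normalized attention weights, which are
# then renormalized; the paper's precise formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mask_attention(q, k, v, mask):
    """Attention whose weights are modulated by a mask matrix.

    q, k, v: (batch, seq_len, d_model)
    mask:    (seq_len, seq_len), entries in [0, 1]
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (batch, L, L)
    weights = F.softmax(scores, dim=-1) * mask             # apply the mask
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return weights @ v


seq_len, d_model = 6, 8
x = torch.randn(2, seq_len, d_model)

# SAN as a MAN with a static all-ones mask: every token attends to every token.
san_mask = torch.ones(seq_len, seq_len)

# FFN as a MAN with a static identity mask: each token attends only to itself,
# so the attention step reduces to a position-wise transformation.
ffn_mask = torch.eye(seq_len)

# DMAN: a learnable mask (here a raw parameter matrix squashed into [0, 1];
# the paper's parameterization is more structured, so treat this as illustrative).
dman_mask = torch.sigmoid(nn.Parameter(torch.zeros(seq_len, seq_len)))

for name, m in [("SAN", san_mask), ("FFN", ffn_mask), ("DMAN", dman_mask)]:
    out = mask_attention(x, x, x, m)
    print(name, out.shape)  # each variant preserves the (batch, seq_len, d_model) shape
```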


