Venue: Annual Meeting of the Association for Computational Linguistics

Scalable Syntax-Aware Language Models Using Knowledge Distillation


Abstract

Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of training data. To answer this question, we introduce an efficient knowledge distillation (KD) technique that transfers knowledge from a syntactic language model trained on a small corpus to an LSTM language model, hence enabling the LSTM to develop a more structurally sensitive representation of the larger training data it learns from. On targeted syntactic evaluations, we find that, while sequential LSTMs perform much better than previously reported, our proposed technique substantially improves on this baseline, yielding a new state of the art. Our findings and analysis affirm the importance of structural biases, even in models that learn from large amounts of data.
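The abstract describes transferring knowledge from a syntactic teacher model (trained on a small parsed corpus) to a sequential LSTM student via knowledge distillation. As a minimal sketch of that general idea, the snippet below shows a token-level distillation objective for language modelling: a cross-entropy term on the observed next tokens interpolated with a KL term that pulls the student's next-token distribution towards the teacher's. The function name `distillation_loss`, the PyTorch framing, and the mixing weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of token-level knowledge distillation for language modelling,
# assuming a PyTorch setup; names and the mixing weight are illustrative and
# not the paper's exact objective or hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      target_ids: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Interpolate data cross-entropy with a teacher-matching KL term.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    target_ids:                     (batch, seq_len) observed next tokens
    alpha:                          weight on the data term (hypothetical knob)
    """
    vocab = student_logits.size(-1)
    flat_student = student_logits.reshape(-1, vocab)
    flat_teacher = teacher_logits.reshape(-1, vocab)

    # Standard language-modelling loss against the observed corpus tokens.
    ce = F.cross_entropy(flat_student, target_ids.reshape(-1))

    # KL(teacher || student) over next-token distributions; the teacher's
    # predictions serve as soft targets carrying its structural preferences.
    kl = F.kl_div(F.log_softmax(flat_student, dim=-1),
                  F.softmax(flat_teacher, dim=-1),
                  reduction="batchmean")

    return alpha * ce + (1.0 - alpha) * kl

# Example with random tensors standing in for the two models' outputs.
if __name__ == "__main__":
    batch, seq_len, vocab = 2, 8, 100
    student = torch.randn(batch, seq_len, vocab)
    teacher = torch.randn(batch, seq_len, vocab)
    targets = torch.randint(0, vocab, (batch, seq_len))
    print(distillation_loss(student, teacher, targets).item())
```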
