Venue: Annual Meeting of the Association for Computational Linguistics

Scalable Syntax-Aware Language Models Using Knowledge Distillation


Abstract

Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of training data. To answer this question, we introduce an efficient knowledge distillation (KD) technique that transfers knowledge from a syntactic language model trained on a small corpus to an LSTM language model, hence enabling the LSTM to develop a more structurally sensitive representation of the larger training data it learns from. On targeted syntactic evaluations, we find that, while sequential LSTMs perform much better than previously reported, our proposed technique substantially improves on this baseline, yielding a new state of the art. Our findings and analysis affirm the importance of structural biases, even in models that learn from large amounts of data.
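The abstract describes transferring knowledge from a syntactic teacher model (trained on a small parsed corpus) to a sequential LSTM student via knowledge distillation. As a minimal sketch of that general idea, the snippet below shows a token-level distillation objective for language modelling: a cross-entropy term on the observed next tokens interpolated with a KL term that pulls the student's next-token distribution towards the teacher's. The function name `distillation_loss`, the PyTorch framing, and the mixing weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of token-level knowledge distillation for language modelling,
# assuming a PyTorch setup; names and the mixing weight are illustrative and
# not the paper's exact objective or hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      target_ids: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Interpolate data cross-entropy with a teacher-matching KL term.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    target_ids:                     (batch, seq_len) observed next tokens
    alpha:                          weight on the data term (hypothetical knob)
    """
    vocab = student_logits.size(-1)
    flat_student = student_logits.reshape(-1, vocab)
    flat_teacher = teacher_logits.reshape(-1, vocab)

    # Standard language-modelling loss against the observed corpus tokens.
    ce = F.cross_entropy(flat_student, target_ids.reshape(-1))

    # KL(teacher || student) over next-token distributions; the teacher's
    # predictions serve as soft targets carrying its structural preferences.
    kl = F.kl_div(F.log_softmax(flat_student, dim=-1),
                  F.softmax(flat_teacher, dim=-1),
                  reduction="batchmean")

    return alpha * ce + (1.0 - alpha) * kl

# Example with random tensors standing in for the two models' outputs.
if __name__ == "__main__":
    batch, seq_len, vocab = 2, 8, 100
    student = torch.randn(batch, seq_len, vocab)
    teacher = torch.randn(batch, seq_len, vocab)
    targets = torch.randint(0, vocab, (batch, seq_len))
    print(distillation_loss(student, teacher, targets).item())
```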
