International Conference on Computational Linguistics

Semi-supervised Domain Adaptation for Dependency Parsing via Improved Contextualized Word Representations



Abstract

In recent years, parsing performance on in-domain texts has improved dramatically thanks to the rapid progress of deep neural network models. The major challenge for current parsing research is to improve parsing performance on out-of-domain texts that differ greatly from the in-domain training data, when only a small amount of labeled out-of-domain data is available. To address this problem, we propose to improve contextualized word representations via adversarial learning and a BERT fine-tuning process. Concretely, we apply adversarial learning to three representative semi-supervised domain adaptation methods, i.e., direct concatenation (CON), feature augmentation (FA), and domain embedding (DE), together with two useful strategies, i.e., fused target-domain word representations and orthogonality constraints, enabling the model to learn purer yet more effective domain-specific and domain-invariant representations. Simultaneously, we use large-scale unlabeled target-domain data to fine-tune BERT with only the language model loss, obtaining reliable contextualized word representations that benefit cross-domain dependency parsing. Experiments on a benchmark dataset show that our proposed adversarial approaches achieve consistent improvements, and that fine-tuning BERT further boosts parsing accuracy by a large margin. Our single model achieves the same state-of-the-art performance as the top system submitted to the NLPCC-2019 shared task, which uses ensemble models and BERT.
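
To make the shared/private setup concrete, the sketch below shows one common way to combine an adversarial domain classifier with an orthogonality penalty in PyTorch. This is a minimal illustration, not the authors' implementation: the module names, linear projections, dimensions, and loss weights are assumptions (the paper's encoders operate over contextualized word representations inside a parser), and gradient reversal is one standard way to realize the adversarial objective; the ||H_sh^T H_pr||_F^2 penalty is the usual form of an orthogonality constraint between domain-invariant and domain-specific subspaces.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients on the
    backward pass, so the shared encoder learns to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lambd * grad_out, None

class SharedPrivateEncoder(nn.Module):
    """Domain-invariant (shared) and domain-specific (private) projections
    over contextualized word vectors; all dimensions here are placeholders."""
    def __init__(self, in_dim=768, hid=256, n_domains=2):
        super().__init__()
        self.shared = nn.Linear(in_dim, hid)
        self.private = nn.Linear(in_dim, hid)
        self.domain_clf = nn.Linear(hid, n_domains)

    def forward(self, x, domain, lambd=1.0):
        h_sh = torch.tanh(self.shared(x))    # (batch, hid) domain-invariant
        h_pr = torch.tanh(self.private(x))   # (batch, hid) domain-specific
        # Adversarial loss: the classifier predicts the domain from shared
        # features, while gradient reversal pushes them to erase domain cues.
        logits = self.domain_clf(GradReverse.apply(h_sh, lambd))
        adv_loss = F.cross_entropy(logits, domain)
        # Orthogonality constraint ||H_sh^T H_pr||_F^2: keeps the private
        # subspace from duplicating what the shared subspace already encodes.
        ortho_loss = (h_sh.t() @ h_pr).pow(2).sum()
        return torch.cat([h_sh, h_pr], dim=-1), adv_loss, ortho_loss

# Toy usage: 8 source-domain and 8 target-domain word vectors.
enc = SharedPrivateEncoder()
x = torch.randn(16, 768)
domain = torch.tensor([0] * 8 + [1] * 8)
rep, adv, ortho = enc(x, domain)
loss = adv + 0.01 * ortho   # weights are placeholders; the parser loss is added here
loss.backward()
```

The concatenated output `rep` plays the role of the combined domain-specific plus domain-invariant representation that would feed the downstream parser; CON, FA, and DE differ in how source- and target-domain data share these components, not in the adversarial machinery itself.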
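The BERT fine-tuning step, continuing masked language model training on unlabeled target-domain text before plugging BERT into the parser, can be sketched with the HuggingFace transformers library. A minimal sketch, assuming one raw sentence per line in `target_domain.txt` and a Chinese BERT checkpoint (the NLPCC-2019 shared task data is Chinese); the file path, checkpoint name, and hyperparameters are placeholders, not the paper's settings. Using `BertForMaskedLM` means only the masked language model loss is optimized, matching the description above.

```python
from datasets import load_dataset
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

# Unlabeled target-domain text, one sentence per line (hypothetical path).
raw = load_dataset("text", data_files={"train": "target_domain.txt"})["train"]
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# Dynamic masking at the standard 15% rate; only the MLM loss is used.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-target-mlm",
                         per_device_train_batch_size=32,
                         num_train_epochs=3,
                         learning_rate=5e-5)
Trainer(model=model, args=args,
        train_dataset=tokenized, data_collator=collator).train()

# The adapted encoder then supplies contextualized word representations
# to the cross-domain dependency parser.
model.save_pretrained("bert-target-mlm/final")
```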
