Annual Meeting of the Association for Computational Linguistics

Cross-Lingual Training for Automatic Question Generation



Abstract

Automatic question generation (QG) is a challenging problem in natural language understanding. QG systems are typically built assuming access to a large number of training instances, where each instance is a question and its corresponding answer. For a new language, such training instances are hard to obtain, making the QG problem even more challenging. Motivated by this, we study the reuse of a large QG dataset available in a secondary language (e.g., English) to learn a QG model for a primary language of interest (e.g., Hindi). For the primary language, we assume access to a large amount of monolingual text but only a small QG dataset. We propose a cross-lingual QG model that uses the following training regime: (i) unsupervised pre-training of language models in both the primary and secondary languages, and (ii) joint supervised training for QG in both languages. We demonstrate the efficacy of our proposed approach using two different primary languages: Hindi and Chinese. We also create and release a new question answering dataset for Hindi consisting of 6,555 sentences.
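The abstract names the two training stages but not how they are scheduled. The Python sketch below shows one possible way to lay out such a schedule: stage (i) iterates over monolingual text in both languages, and stage (ii) mixes supervised QG examples from both languages. The `Step` record, the function names, and the choice to interleave primary- and secondary-language examples in stage (ii) are illustrative assumptions, not the authors' implementation; no model is actually trained here.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterable, Iterator

# Hypothetical record for one training step (names are illustrative only).
@dataclass(frozen=True)
class Step:
    phase: str      # "pretrain" (unsupervised) or "qg" (supervised)
    language: str   # "primary" or "secondary"
    batch: object   # a sentence (pretrain) or a (context, question) pair (qg)

_MISSING = object()  # sentinel so real items are never mistaken for padding

def pretrain_steps(primary_text: Iterable, secondary_text: Iterable) -> Iterator[Step]:
    """Stage (i): unsupervised LM pre-training on monolingual text in both languages."""
    for language, corpus in (("primary", primary_text), ("secondary", secondary_text)):
        for sentence in corpus:
            yield Step("pretrain", language, sentence)

def joint_qg_steps(primary_qg: Iterable, secondary_qg: Iterable) -> Iterator[Step]:
    """Stage (ii): joint supervised QG training in both languages.

    Assumption: examples are interleaved; once the small primary-language set
    is exhausted, the larger secondary-language set simply continues.
    """
    for p, s in zip_longest(primary_qg, secondary_qg, fillvalue=_MISSING):
        if p is not _MISSING:
            yield Step("qg", "primary", p)
        if s is not _MISSING:
            yield Step("qg", "secondary", s)

def training_schedule(primary_text, secondary_text, primary_qg, secondary_qg) -> Iterator[Step]:
    """Run stage (i) to completion, then stage (ii)."""
    yield from pretrain_steps(primary_text, secondary_text)
    yield from joint_qg_steps(primary_qg, secondary_qg)

# Toy usage with placeholder data: a small primary (e.g., Hindi) QG set
# and a larger secondary (e.g., English) QG set.
hindi_text = ["hi sentence 1", "hi sentence 2"]
english_text = ["en sentence 1"]
hindi_qg = [("context_hi", "question_hi")]
english_qg = [("context_en_1", "question_en_1"), ("context_en_2", "question_en_2")]

steps = list(training_schedule(hindi_text, english_text, hindi_qg, english_qg))
for step in steps:
    print(step.phase, step.language, step.batch)
```

Interleaving is only one option; a real system might instead weight or oversample the small primary-language set, which this sketch does not attempt.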
