Neural Duplicate Question Detection without Labeled Training Data

Abstract

Supervised training of neural models for duplicate question detection in community Question Answering (cQA) requires large amounts of labeled question pairs, which are costly to obtain. To minimize this cost, recent work has often used alternative methods, e.g., adversarial domain adaptation. In this work, we propose two novel methods: (1) the automatic generation of duplicate questions, and (2) weak supervision using the title and body of a question. We show that both can achieve improved performance even though they do not require any labeled data. We provide comprehensive comparisons of popular training strategies, which provide important insights on how to 'best' train models in different scenarios. We show that our proposed approaches are more effective in many cases because they can utilize larger amounts of unlabeled data from cQA forums. Finally, we show that our proposed approach for weak supervision with question title and body information is also an effective method to train cQA answer selection models without direct answer supervision.
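The weak-supervision idea in method (2) can be illustrated with a small data-preparation sketch. It treats a question's own title and body as a positive pair and pairs the title with the body of a different question as a negative. The abstract does not say how negatives are chosen or which loss is used, so the random negative sampling, the function name `build_weak_pairs`, the `num_negatives` parameter, and the toy questions below are all assumptions made for illustration, not the authors' implementation.

```python
import random


def build_weak_pairs(questions, num_negatives=1, seed=0):
    """Build (title, body, label) triples from unlabeled questions.

    label 1: the title paired with its own body (weak positive).
    label 0: the title paired with the body of another, randomly
             chosen question (assumed negative-sampling scheme).
    """
    rng = random.Random(seed)
    pairs = []
    for i, question in enumerate(questions):
        # Weak positive: a title and its own body describe the same problem.
        pairs.append((question["title"], question["body"], 1))

        # Assumed negatives: bodies drawn at random from other questions.
        others = [j for j in range(len(questions)) if j != i]
        for j in rng.sample(others, min(num_negatives, len(others))):
            pairs.append((question["title"], questions[j]["body"], 0))
    return pairs


# Hypothetical toy data standing in for unlabeled cQA forum posts.
questions = [
    {"title": "How do I undo a git commit?",
     "body": "I committed the wrong files and want to take the commit back."},
    {"title": "Why is my Python loop slow?",
     "body": "Iterating over a large list takes minutes to finish."},
    {"title": "How can I center a div in CSS?",
     "body": "My container stays on the left no matter what margin I set."},
]

pairs = build_weak_pairs(questions, num_negatives=1)
for title, body, label in pairs:
    print(label, "|", title, "|", body)
```

Triples of this kind could then be fed to any pairwise text-matching model; no manually labeled duplicate pairs are needed to produce them.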