Neural Duplicate Question Detection without Labeled Training Data


Abstract

Supervised training of neural models for duplicate question detection in community Question Answering (cQA) requires large amounts of labeled question pairs, which are costly to obtain. To minimize this cost, recent work has often used alternative methods, e.g., adversarial domain adaptation. In this work, we propose two novel methods: (1) the automatic generation of duplicate questions, and (2) weak supervision using the title and body of a question. We show that both can achieve improved performance even though they do not require any labeled data. We provide comprehensive comparisons of popular training strategies, which provide important insights on how to 'best' train models in different scenarios. We show that our proposed approaches are more effective in many cases because they can utilize larger amounts of unlabeled data from cQA forums. Finally, we show that our proposed approach for weak supervision with question title and body information is also an effective method to train cQA answer selection models without direct answer supervision.

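The weak supervision idea in the abstract, using a question's title and body as a training pair, can be illustrated with a minimal sketch. It is not the authors' implementation: the `title` and `body` field names, the `build_title_body_pairs` helper, and the use of one randomly sampled body as a negative example are all assumptions made for illustration, since the abstract does not say how negatives are chosen.

```python
import random
from typing import Dict, List, Tuple


def build_title_body_pairs(
    posts: List[Dict[str, str]], seed: int = 0
) -> List[Tuple[str, str, int]]:
    """Build (title, body, label) triples from unlabeled forum posts.

    Label 1 pairs a title with its own body (weak positive).
    Label 0 pairs a title with the body of a different, randomly
    chosen post (assumed negative; this sampling is an illustration,
    not something the abstract specifies).
    """
    rng = random.Random(seed)
    examples: List[Tuple[str, str, int]] = []
    for i, post in enumerate(posts):
        # Weak positive: the title and body of the same question.
        examples.append((post["title"], post["body"], 1))
        if len(posts) > 1:
            # Assumed negative: the body of any other post.
            other = rng.choice([k for k in range(len(posts)) if k != i])
            examples.append((post["title"], posts[other]["body"], 0))
    return examples


if __name__ == "__main__":
    # Hypothetical unlabeled posts, invented for this example.
    demo_posts = [
        {"title": "How do I undo a git commit?", "body": "I committed by mistake."},
        {"title": "Python list vs tuple?", "body": "When should I use each one?"},
    ]
    for title, body, label in build_title_body_pairs(demo_posts):
        print(label, "|", title, "|", body)
```

No manual duplicate annotation is needed to produce these triples, which is what lets this kind of training draw on large amounts of unlabeled cQA forum data.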

Bibliographic details

Source
International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing | 2019 | pp. 1607-1617 | 11 pages
Authors
Andreas Rueckle; Nafise Sadat Moosavi; Iryna Gurevych
Original format: PDF

