Journal: Information Processing & Management

A multi-cascaded model with data augmentation for enhanced paraphrase detection in short texts


Abstract

Paraphrase detection is an important task in text analytics with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer support help-desks. Deep models have been proposed for representing and classifying paraphrases. These models, however, require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augmentation strategy considers the notions of paraphrases and non-paraphrases as binary relations over the set of texts. Subsequently, it uses graph theoretic concepts to efficiently generate additional paraphrase and non-paraphrase pairs in a sound manner. Our multi-cascaded model employs three supervised feature learners (cascades) based on CNN and LSTM networks with and without soft-attention. The learned features, together with hand-crafted linguistic features, are then forwarded to a discriminator network for final classification. Our model is both wide and deep and provides greater robustness across clean and noisy short texts. We evaluate our approach on three benchmark datasets and show that it achieves comparable or state-of-the-art performance on all three.
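
The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of one way the idea could work, under two stated assumptions: that "paraphrase" is treated as a symmetric and transitive relation (texts linked by a chain of paraphrase labels form one group), and that a single non-paraphrase label between two groups is extended to every cross-group pair. The function name `augment_pairs` and its input format are hypothetical and are not taken from the paper.

```python
from itertools import combinations, product


def augment_pairs(paraphrases, non_paraphrases):
    """Infer extra labelled pairs from existing ones (illustrative sketch).

    Assumption: paraphrase is symmetric and transitive, so connected
    texts in the paraphrase graph all paraphrase one another.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Build connected components of the paraphrase graph.
    for a, b in paraphrases:
        union(a, b)
    # Register texts that appear only in non-paraphrase pairs.
    for a, b in non_paraphrases:
        find(a)
        find(b)

    groups = {}
    for text in list(parent):
        groups.setdefault(find(text), set()).add(text)

    # Every pair inside one component is an inferred paraphrase pair.
    positives = set()
    for members in groups.values():
        positives.update(frozenset(p) for p in combinations(sorted(members), 2))

    # Assumption: a non-paraphrase label between two components
    # extends to all cross-component pairs.
    negatives = set()
    for a, b in non_paraphrases:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # contradictory labels; skipped in this sketch
        for x, y in product(groups[ra], groups[rb]):
            negatives.add(frozenset((x, y)))
    return positives, negatives
```

For example, given the paraphrase labels ("a", "b") and ("b", "c") and the non-paraphrase label ("c", "d"), this sketch infers the additional paraphrase pair {a, c} and the additional non-paraphrase pairs {a, d} and {b, d}. Pairs are stored as `frozenset` objects so that order within a pair does not matter.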
