International Conference on Recent Advances in Natural Language Processing

Question Similarity in Community Question Answering: A Systematic Exploration of Preprocessing Methods and Models

Abstract

Community Question Answering forums are popular among Internet users, and a basic problem these users encounter is finding out whether their question has already been posed. To address this issue, NLP researchers have developed methods to automatically detect question similarity, which was one of the shared tasks in SemEval. The best-performing systems for this task used Syntactic Tree Kernels or the SoftCosine metric. However, it remains unclear why these methods seem to work, whether better preprocessing can improve their performance, and what kinds of errors they (and other methods) make. In this paper, we therefore systematically combine and compare these two approaches with the more traditional BM25 and translation-based models. Moreover, we analyze how preprocessing steps (lowercasing, punctuation suppression and stop-word removal) and word-meaning similarity derived from different distributions (word translation probabilities, Word2Vec, fastText and ELMo) affect performance on the task. We conduct an error analysis to gain insight into the performance differences between the system set-ups. The implementation is publicly available.
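To make two of the ingredients named above more concrete, here is a minimal sketch, not the authors' implementation, of the three switchable preprocessing steps and of the SoftCosine measure in its standard form, soft_cos(a, b) = aᵀSb / (√(aᵀSa) · √(bᵀSb)), where a and b are bag-of-words vectors and S holds word-to-word similarities. The paper derives such similarities from word translation probabilities, Word2Vec, fastText or ELMo; here they are replaced by a hand-written table. The function names `preprocess`, `soft_cosine` and `toy_sim`, the stop-word list and the synonym table are all hypothetical and chosen for illustration only.

```python
import math
import string
from collections import Counter

# Hypothetical, tiny stop-word list for illustration; not the list used in the paper.
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "how", "do", "i", "can"}

# Hypothetical word-similarity table standing in for embedding- or
# translation-based similarities (e.g. cosine between Word2Vec vectors).
SYNONYMS = {frozenset({"car", "automobile"}): 0.8}


def preprocess(text, lowercase=True, strip_punct=True, remove_stop=True):
    """Apply lowercasing, punctuation suppression and stop-word removal, each switchable."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        # Delete every ASCII punctuation character.
        text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    if remove_stop:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


def toy_sim(w1, w2):
    """Word similarity in [0, 1]; identical words always score 1."""
    if w1 == w2:
        return 1.0
    return SYNONYMS.get(frozenset({w1, w2}), 0.0)


def soft_cosine(q1, q2, sim):
    """Soft cosine between two token lists, using term frequencies as vector weights."""
    a, b = Counter(q1), Counter(q2)

    def inner(x, y):
        # Computes x^T S y, where S[wi][wj] = sim(wi, wj).
        return sum(xi * yj * sim(wi, wj)
                   for wi, xi in x.items() for wj, yj in y.items())

    denom = math.sqrt(inner(a, a)) * math.sqrt(inner(b, b))
    return inner(a, b) / denom if denom else 0.0
```

For example, `preprocess("How do I renew my car registration?")` yields `["renew", "my", "car", "registration"]`, and `preprocess("Can I renew an automobile registration?")` yields `["renew", "automobile", "registration"]`. Their soft cosine under `toy_sim` is 2.8 / (2√3) ≈ 0.808, while the ordinary cosine (an identity similarity matrix) is only 2 / (2√3) ≈ 0.577, because SoftCosine also credits the related pair "car"/"automobile". Note that S should be symmetric and positive semi-definite for the square roots to be well defined in general.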