Venue: International Conference on World Wide Web

Joint Question Clustering and Relevance Prediction for Open Domain Non-Factoid Question Answering



Abstract

Web searches are increasingly formulated as natural language questions rather than keyword queries. Retrieving answers to such questions requires some understanding of user expectations. An important step in this direction is to automatically infer the type of answer implied by the question, e.g., factoids, statements on a topic, instructions, or reviews. Answer Type taxonomies currently exist for factoid-style questions, but not for open-domain questions. Building taxonomies for non-factoid questions is harder because these questions can come from a very broad semantic space. A few attempts have been made to develop taxonomies for non-factoid questions, but these tend to be too narrow or domain-specific. In this paper, we address this problem by modeling the Answer Type as a latent variable that is learned in a data-driven fashion, allowing the model to adapt more readily to new domains and data sets. We propose approaches that detect the relevance of candidate answers to a user question by jointly 'clustering' questions according to the hidden variable and modeling relevance conditioned on this hidden variable. Specifically, we propose three new models that automatically learn question clusters and cluster-specific relevance models: (a) Logistic Regression Mixture (LRM), (b) Glocal Logistic Regression Mixture (G-LRM), and (c) Mixture Glocal Logistic Regression Mixture (MG-LRM). On a newsgroups dataset, all three models outperform a baseline relevance model that uses explicit Answer Type categories predicted by a supervised Answer Type classifier. On a blogs dataset, our models also outperform a baseline relevance model that uses no answer-type information.
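The core idea, relevance conditioned on a latent question cluster, can be illustrated with a generic mixture of logistic regressions: P(relevant | q, a) = Σ_k P(k | q) · σ(w_k · x(q, a)). The sketch below is a minimal illustration under stated assumptions, not the authors' LRM, G-LRM, or MG-LRM formulation: it assumes a softmax gate over hypothetical question features for P(k | q), one logistic regression per cluster over hypothetical question-answer pair features, and an EM-style posterior that softly re-assigns a question to clusters given an observed relevance label. All function names, features, and weights are illustrative.

```python
import math
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of equal-length weight and feature vectors."""
    if len(w) != len(x):
        raise ValueError("weight and feature vectors must have equal length")
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores: Sequence[float]) -> List[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_prior(q_feats: Sequence[float],
                  gate_weights: Sequence[Sequence[float]]) -> List[float]:
    """P(k | q): assumed softmax gate assigning the question to latent clusters."""
    return softmax([dot(v, q_feats) for v in gate_weights])


def mixture_relevance(q_feats: Sequence[float], qa_feats: Sequence[float],
                      gate_weights: Sequence[Sequence[float]],
                      expert_weights: Sequence[Sequence[float]]) -> float:
    """P(relevant | q, a) = sum_k P(k | q) * sigmoid(w_k . x(q, a))."""
    prior = cluster_prior(q_feats, gate_weights)
    return sum(p * sigmoid(dot(w, qa_feats))
               for p, w in zip(prior, expert_weights))


def cluster_posterior(q_feats: Sequence[float], qa_feats: Sequence[float],
                      label: int, gate_weights: Sequence[Sequence[float]],
                      expert_weights: Sequence[Sequence[float]]) -> List[float]:
    """P(k | q, a, y): E-step responsibilities for an observed label y in {0, 1}."""
    prior = cluster_prior(q_feats, gate_weights)
    joint = []
    for p, w in zip(prior, expert_weights):
        p_rel = sigmoid(dot(w, qa_feats))
        likelihood = p_rel if label == 1 else 1.0 - p_rel
        joint.append(p * likelihood)
    z = sum(joint)
    return [j / z for j in joint]


if __name__ == "__main__":
    # Toy setup: two hypothetical latent clusters and 2-d feature vectors.
    gates = [[2.0, 0.0], [0.0, 2.0]]
    experts = [[1.0, 0.0], [-1.0, 0.0]]
    q = [1.0, 0.0]
    qa = [2.0, 0.0]
    print("P(k | q):", cluster_prior(q, gates))
    print("P(rel | q, a):", mixture_relevance(q, qa, gates, experts))
    print("P(k | q, a, y=1):", cluster_posterior(q, qa, 1, gates, experts))
```

With all gate weights set to zero, the gate is uniform and the model reduces to an equal-weight average of the cluster-specific classifiers. The posterior function is the piece that would let training alternate between re-clustering questions and refitting each cluster's relevance model, which is one plausible reading of "jointly clustering" questions and predicting relevance.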
