Venue: IEEE International Conference on Data Engineering

DiSCern: A diversified citation recommendation system for scientific queries

Abstract

Performing a literature survey for scholarly activities has become a challenging and time-consuming task due to the rapid growth in the number of scientific articles. Automatic recommendation of high-quality citations for a given scientific query topic is therefore immensely valuable. The state of the art in citation recommendation suffers from three limitations. First, most existing approaches require either the full article or a seed set of citations (or both) as input, yet obtaining citation recommendations from just a set of keywords is extremely useful for many scientific purposes. Second, existing techniques aim to suggest prestigious, well-cited articles. However, many scientific purposes call for diversified citations on the query topic; for instance, diversity helps authors write survey papers on a topic and helps scholars get a broad view of its key problems. Third, keyword-based citation recommendation typically misses semantically correlated articles that do not use exactly the same keywords as the query. To the best of our knowledge, no citation recommendation system in the literature addresses these three limitations simultaneously. In this paper, we propose a novel citation recommendation system called DiSCern to address this research gap. Given a search query consisting of keyword(s) that describe the query topic, DiSCern finds relevant and diversified citations using only the citation graph and the keywords associated with the articles, without relying on latent information. To ensure that semantically correlated articles are also included in the results, DiSCern uses a novel keyword expansion step inspired by community finding in social network analysis.
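
The abstract does not describe how the keyword expansion works beyond its inspiration from community finding. As a hedged illustration only, the sketch below swaps in a generic technique: it builds a keyword co-occurrence graph from the articles' keyword lists, runs a simple deterministic label propagation to find keyword communities, and expands a query with every keyword in the same community as a query term. The function names (`cooccurrence_graph`, `label_propagation`, `expand_query`), the `max_rounds` parameter, and the toy keywords are hypothetical; this is not DiSCern's actual procedure.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def cooccurrence_graph(article_keywords: Iterable[List[str]]) -> Dict[str, Counter]:
    """Weighted keyword graph: edge weight = number of articles sharing both keywords."""
    graph: Dict[str, Counter] = defaultdict(Counter)
    for keywords in article_keywords:
        unique = sorted(set(keywords))
        for kw in unique:
            _ = graph[kw]  # register keywords that co-occur with nothing
        for a, b in combinations(unique, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return dict(graph)


def label_propagation(graph: Dict[str, Counter], max_rounds: int = 20) -> Dict[str, str]:
    """Hypothetical deterministic label propagation: each keyword adopts the label
    with the largest total edge weight among its neighbours (ties -> smallest label)."""
    labels = {kw: kw for kw in graph}
    for _ in range(max_rounds):
        changed = False
        for kw in sorted(graph):
            if not graph[kw]:
                continue  # isolated keyword keeps its own label
            votes: Counter = Counter()
            for nbr, weight in graph[kw].items():
                votes[labels[nbr]] += weight
            best = max(votes.values())
            new_label = min(lbl for lbl, w in votes.items() if w == best)
            if new_label != labels[kw]:
                labels[kw] = new_label
                changed = True
        if not changed:
            break
    return labels


def expand_query(query: List[str], labels: Dict[str, str]) -> Set[str]:
    """Return the query keywords plus every keyword sharing a community with one of them."""
    wanted = {labels[q] for q in query if q in labels}
    return set(query) | {kw for kw, lbl in labels.items() if lbl in wanted}


# Toy keyword lists, one per article (hypothetical data).
articles = [
    ["random walk", "pagerank", "ranking"],
    ["pagerank", "link analysis"],
    ["ranking", "link analysis", "pagerank"],
    ["svm", "kernel methods"],
    ["kernel methods", "classification"],
]
labels = label_propagation(cooccurrence_graph(articles))
print(sorted(expand_query(["random walk"], labels)))
# prints ['link analysis', 'pagerank', 'random walk', 'ranking']
```

On this toy data, a query for "random walk" also pulls in articles tagged only with "pagerank" or "link analysis", which is the kind of recall gain the expansion step is meant to provide. Visiting keywords in sorted order and breaking ties by the smallest label keeps the result reproducible; plain label propagation is randomized and can differ between runs.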

Our proposed approach primarily builds on the Vertex Reinforced Random Walk (VRRW) to balance prestige and diversity in the recommended citations. We demonstrate the efficacy of DiSCern empirically on two datasets: a large publication dataset of more than 1.7 million articles in the computer science domain and a dataset of more than 29,000 articles in the theoretical high-energy physics domain. The experimental results show that our proposed approach is efficient and outperforms state-of-the-art algorithms in terms of both relevance and diversity.
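
The abstract states only that DiSCern builds on a VRRW. To show how reinforcement trades prestige for diversity, the sketch below assumes the pointwise DivRank-style update rather than DiSCern's exact formulation: at each step the transition is p_T(u, v) = (1 - lam) * p*(v) + lam * p0(u, v) * pi_T(v) / D_T(u), where p0 is an "organic" walk on the citation graph with a self-loop, p*(v) is a teleport prior (for example, query relevance), and D_T(u) normalizes each row. The function name `vrrw_rank` and the parameters `prior`, `lam`, `alpha`, `max_iters`, and `tol` are hypothetical.

```python
from typing import Dict, List, Optional


def vrrw_rank(
    graph: Dict[str, List[str]],
    prior: Optional[Dict[str, float]] = None,
    lam: float = 0.9,
    alpha: float = 0.25,
    max_iters: int = 200,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Pointwise vertex-reinforced random walk in the style of DivRank (a sketch,
    not DiSCern's exact update). graph[u] lists the articles that u cites."""
    # Every article that cites or is cited becomes a node.
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    n = len(nodes)

    # Teleport distribution p*(v): uniform unless a query-relevance prior is given.
    if prior is None:
        prior = {v: 1.0 for v in nodes}
    total = sum(prior.get(v, 0.0) for v in nodes)
    teleport = {v: prior.get(v, 0.0) / total for v in nodes}

    # Organic transitions p0(u, v): stay with probability 1 - alpha, otherwise follow
    # one of u's citations uniformly; articles that cite nothing always stay.
    p0: Dict[str, Dict[str, float]] = {}
    for u in nodes:
        outs = graph.get(u, [])
        row: Dict[str, float] = {u: 1.0 - alpha if outs else 1.0}
        for v in outs:
            row[v] = row.get(v, 0.0) + alpha / len(outs)
        p0[u] = row

    pi = {v: 1.0 / n for v in nodes}
    for _ in range(max_iters):
        new = {v: (1.0 - lam) * teleport[v] for v in nodes}
        for u in nodes:
            # Reinforcement: weight each organic move by the target's current score,
            # renormalised by D(u) = sum_v p0(u, v) * pi(v).
            d = sum(w * pi[v] for v, w in p0[u].items())
            if d > 0.0:
                for v, w in p0[u].items():
                    new[v] += lam * pi[u] * w * pi[v] / d
        s = sum(new.values())
        new = {v: x / s for v, x in new.items()}
        converged = sum(abs(new[v] - pi[v]) for v in nodes) < tol
        pi = new
        if converged:
            break
    return pi


# Hypothetical toy citation graph: three articles all cite "hub".
citations = {"a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = vrrw_rank(citations)
top_k = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_k)
```

Because each move is multiplied by the target's current score, well-visited articles keep absorbing probability from their neighbours. This "rich get richer" effect is what DivRank-style methods rely on to suppress near-duplicate neighbours of a prestigious article, so the top of the ranking spreads across different regions of the citation graph instead of clustering around one hub.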