Conference: 2014 IEEE/ACM Joint Conference on Digital Libraries

Towards a stratified learning approach to predict future citation counts


Abstract

In this paper, we study the problem of predicting the future citation count of a scientific article a given time interval after its publication. To this end, we gather a dataset of more than 1.5 million scientific papers from the computer science domain and analyze it exhaustively. The analysis shows that the citation counts of articles over the years follow a diverse set of patterns; on closer inspection, we identify six broad categories of citation patterns. This observation motivates us to adopt a stratified learning approach to the prediction task, for which we propose a two-stage prediction model: in the first stage, the model maps a query paper into one of the six categories, and in the second stage, a regression module is run only on the subpopulation corresponding to that category to predict the future citation count of the query paper. Experimental results show that categorizing this large dataset during the training phase leads to a remarkable improvement (around 50%) over the well-known baseline system.
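The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier, regressor, or features the authors use, so everything here is an assumption made for illustration: a nearest-centroid classifier stands in for stage one, a one-variable least-squares line per category stands in for stage two, and the class name `StratifiedPredictor`, its methods, and the two toy categories "A" and "B" (the paper reports six) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # No variance in x: fall back to predicting the mean of y.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


class StratifiedPredictor:
    """Hypothetical two-stage model: classify first, then regress per category."""

    def fit(self, features, categories, early_counts, future_counts):
        # Stage 1 (assumed technique): one centroid per category.
        rows_by_cat = defaultdict(list)
        for feature, cat in zip(features, categories):
            rows_by_cat[cat].append(feature)
        self.centroids = {
            cat: [mean(col) for col in zip(*rows)]
            for cat, rows in rows_by_cat.items()
        }
        # Stage 2 (assumed technique): a separate regression for each
        # category, trained only on that category's papers.
        xs_by_cat, ys_by_cat = defaultdict(list), defaultdict(list)
        for cat, x, y in zip(categories, early_counts, future_counts):
            xs_by_cat[cat].append(x)
            ys_by_cat[cat].append(y)
        self.models = {
            cat: fit_line(xs_by_cat[cat], ys_by_cat[cat]) for cat in xs_by_cat
        }
        return self

    def classify(self, feature):
        # Pick the category whose centroid is closest (squared Euclidean).
        def dist(cat):
            return sum((a - b) ** 2 for a, b in zip(feature, self.centroids[cat]))
        return min(self.centroids, key=dist)

    def predict(self, feature, early_count):
        a, b = self.models[self.classify(feature)]
        return a + b * early_count


# Toy, made-up data: two categories with different growth behaviour.
features = [[1.0], [2.0], [10.0], [11.0]]
categories = ["A", "A", "B", "B"]
early_counts = [1.0, 2.0, 1.0, 3.0]
future_counts = [2.0, 4.0, 11.0, 13.0]

model = StratifiedPredictor().fit(features, categories, early_counts, future_counts)
pred_a = model.predict([1.2], 3.0)  # routed to category "A"
pred_b = model.predict([9.0], 5.0)  # routed to category "B"
```

The point of the design is that each regression sees only papers with a similar citation pattern, so a query paper is scored by a model fitted to its own subpopulation rather than to the whole dataset.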
