Exploiting citation networks for large-scale author name disambiguation

Christian Schulz; Amin Mazloumian; Alexander M Petersen; Orion Penner; Dirk Helbing

Abstract

We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.
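The pair-wise similarity and the two-step clustering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Paper` record, the linear weighting in `similarity`, the mean-similarity merge rule, and the thresholds `t_link` and `t_merge` are all assumptions chosen for illustration, since the abstract names only the four evidence types and the two stages.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Paper:
    """Hypothetical record for one paper carrying the ambiguous author name."""
    pid: str
    coauthors: set = field(default_factory=set)   # other authors on the paper
    references: set = field(default_factory=set)  # ids of papers it cites
    cited_by: set = field(default_factory=set)    # ids of papers citing it


def similarity(a, b, w_co=1.0, w_self=1.0, w_ref=1.0, w_cit=1.0):
    """Weighted count of shared evidence (the weights are assumed parameters)."""
    # A citation between two papers with the same author name counts as a self-citation.
    self_cites = int(a.pid in b.references) + int(b.pid in a.references)
    return (w_co * len(a.coauthors & b.coauthors)
            + w_self * self_cites
            + w_ref * len(a.references & b.references)
            + w_cit * len(a.cited_by & b.cited_by))


def cluster(papers, t_link=2.0, t_merge=0.5):
    # Step 1: link paper pairs with similarity >= t_link and take the
    # connected components, tracked here with a small union-find.
    parent = {p.pid: p.pid for p in papers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(papers, 2):
        if similarity(a, b) >= t_link:
            parent[find(a.pid)] = find(b.pid)

    groups = {}
    for p in papers:
        groups.setdefault(find(p.pid), []).append(p)
    clusters = list(groups.values())

    # Step 2: merge clusters whose mean cross-cluster similarity is >= t_merge,
    # repeating until no pair qualifies.
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            ci, cj = clusters[i], clusters[j]
            mean_sim = sum(similarity(a, b) for a in ci for b in cj) / (len(ci) * len(cj))
            if mean_sim >= t_merge:
                clusters[i] = ci + cj
                del clusters[j]
                merged = True
                break
    return [sorted(p.pid for p in c) for c in clusters]


# Toy data: five papers that all list the same ambiguous author name.
papers = [
    Paper("p1", {"A", "B"}, {"r1", "r2"}, {"p3"}),
    Paper("p2", {"A", "B"}, {"r1"}),
    Paper("p3", set(), {"p1"}),          # cites p1, shares no co-authors
    Paper("p4", {"X"}, {"r9"}),
    Paper("p5", {"X", "Y"}, {"r8", "r9"}),
]
print(cluster(papers))
```

With these toy inputs, step 1 links p1 with p2 and p4 with p5, and step 2 attaches p3 to the first cluster through its self-citation, giving two author clusters. In the paper the model parameters are tuned against the h-index based recall and the name-initials-based precision; the values used here are arbitrary.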

Bibliographic details

Source: EPJ Data Science, 2014, Issue 1, 14 pages
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
