Conference paper, Asian Conference on Intelligent Information and Database Systems

Vietnamese Author Name Disambiguation for Integrating Publications from Heterogeneous Sources

Abstract

Automatic integration of bibliographic data from various sources is a critical task in the field of digital libraries, and one of its most important challenges is author name disambiguation. In this paper, we apply a supervised learning approach and propose a set of features for training classifiers to disambiguate Vietnamese author names. To evaluate the effectiveness of the proposed feature set, we ran experiments with five supervised learning methods: Random Forest, Support Vector Machine (SVM), k-Nearest Neighbors (kNN), C4.5 (decision tree), and Bayes. The experimental dataset was collected from three online digital libraries: Microsoft Academic Search, ACM Digital Library, and IEEE Digital Library. Our experiments show that the kNN, Random Forest, and C4.5 classifiers outperform the others. The average accuracy is approximately 94.55% for kNN, 94.23% for Random Forest, 93.98% for C4.5, and 91.91% for SVM, while Bayes is lowest at 81.56%. Overall, the highest accuracy we achieved for author name disambiguation with the proposed feature set on the Vietnamese authors dataset was 98.39%.
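The abstract does not list the proposed features, so the following is only a minimal sketch of the general setup under stated assumptions. It assumes disambiguation is framed as pairwise binary classification (do two author mentions refer to the same person?). The `Record` fields, the four similarity features, and the toy training data are all hypothetical, and kNN is written by hand rather than taken from any particular library. One detail that matters for Vietnamese names is diacritic stripping: Unicode NFKD decomposes letters such as "ễ" or "ư" into a base letter plus combining marks, but "đ"/"Đ" have no decomposition and must be mapped explicitly.

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass, field
from math import dist


@dataclass
class Record:
    """Hypothetical publication record; field names are illustrative only."""
    author: str
    coauthors: frozenset = field(default_factory=frozenset)
    venue: str = ""
    title_words: frozenset = field(default_factory=frozenset)


def normalize_name(name: str) -> str:
    """Lowercase and strip Vietnamese diacritics so 'Nguyễn' matches 'Nguyen'."""
    # NFKD splits most accented letters into a base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # 'đ' has no decomposition, so map it explicitly after lowercasing.
    return " ".join(stripped.lower().replace("đ", "d").split())


def jaccard(a, b) -> float:
    """Jaccard similarity; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(r1: Record, r2: Record) -> tuple:
    """Hypothetical similarity features for a pair of author mentions."""
    co1 = {normalize_name(n) for n in r1.coauthors}
    co2 = {normalize_name(n) for n in r2.coauthors}
    return (
        1.0 if normalize_name(r1.author) == normalize_name(r2.author) else 0.0,
        jaccard(co1, co2),                                   # shared co-authors
        1.0 if r1.venue.lower() == r2.venue.lower() else 0.0,  # same venue
        jaccard(r1.title_words, r2.title_words),             # title word overlap
    )


def knn_predict(train, x, k=3) -> int:
    """Majority vote among the k training pairs closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy labelled pairs: 1 = same real-world author, 0 = different authors.
train = [
    ((1.0, 0.8, 1.0, 0.5), 1),
    ((1.0, 0.6, 0.0, 0.4), 1),
    ((1.0, 0.0, 0.0, 0.0), 0),
    ((1.0, 0.1, 0.0, 0.1), 0),
    ((0.0, 0.0, 0.0, 0.0), 0),
]

a = Record("Nguyễn Văn An", frozenset({"Trần Thị Bình"}), "Conf A",
           frozenset({"author", "disambiguation"}))
b = Record("Nguyen Van An", frozenset({"Tran Thi Binh"}), "conf a",
           frozenset({"author", "name", "disambiguation"}))
print(knn_predict(train, pair_features(a, b)))  # prints 1
```

An odd `k` avoids ties in the binary vote. In practice a library classifier (and the other models compared in the paper, such as Random Forest or C4.5) would be trained on the same kind of pairwise feature vectors.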