Journal: International Journal on Digital Libraries

On the combination of domain-specific heuristics for author name disambiguation: the nearest cluster method


Abstract

Author name disambiguation has been one of the hardest problems faced by digital libraries since their early days. Historically, supervised solutions have empirically outperformed those based on heuristics, but at the cost of relying on manually labeled training sets for the learning process. Moreover, most supervised solutions simply apply a generic machine learning technique and do not exploit specific knowledge about the problem. In this article, we follow similar reasoning, but in the opposite direction. Instead of extending an existing supervised solution, we propose a set of carefully designed heuristics and similarity functions, and apply supervision only to optimize their parameters for each particular dataset. As our experiments show, the result is a very effective, efficient, and practical author name disambiguation method that can be used in many different scenarios. In fact, we show that our method can beat state-of-the-art supervised methods in terms of effectiveness in many situations while being orders of magnitude faster. It can also run without any training information, using only default parameters, and still be very competitive with these supervised methods (beating several of them) and better than most existing unsupervised author name disambiguation solutions.
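
The abstract does not spell out the heuristics or similarity functions, so the following is only a rough illustration of the general idea of nearest-cluster assignment, not the method proposed in the article. It assumes each citation record carries a set of coauthor names and a set of title terms, compares a record to each existing cluster with a weighted Jaccard similarity, and either joins the most similar cluster or starts a new one. The names `Record`, `Cluster`, `similarity`, and `nearest_cluster`, the two evidence fields, and the values of `w_coauthor`, `w_title`, and `threshold` are all assumptions made for this sketch; they stand in for the kind of parameters that could run with defaults or be tuned per dataset when labeled data is available.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One citation record for an ambiguous author name (hypothetical schema)."""
    coauthors: frozenset
    title_terms: frozenset


@dataclass
class Cluster:
    """Records currently believed to belong to the same real-world author."""
    records: list = field(default_factory=list)

    def coauthors(self):
        # Union of coauthor names over all records in the cluster.
        return frozenset().union(*(r.coauthors for r in self.records))

    def title_terms(self):
        # Union of title terms over all records in the cluster.
        return frozenset().union(*(r.title_terms for r in self.records))


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(record, cluster, w_coauthor=0.6, w_title=0.4):
    """Weighted combination of two evidence sources (weights are assumed defaults)."""
    return (w_coauthor * jaccard(record.coauthors, cluster.coauthors())
            + w_title * jaccard(record.title_terms, cluster.title_terms()))


def nearest_cluster(records, threshold=0.2):
    """Assign each record to its most similar cluster, or open a new cluster."""
    clusters = []
    for record in records:
        best, best_score = None, 0.0
        for cluster in clusters:
            score = similarity(record, cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best.records.append(record)
        else:
            clusters.append(Cluster([record]))
    return clusters
```

In this sketch the result depends on the order in which records are processed, and the cluster profiles are recomputed on every comparison for simplicity rather than speed.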
