Frontiers in Research Metrics and Analytics

Deep Reference Mining from Scholarly Literature in the Arts and Humanities



Abstract

We consider the task of reference mining: the detection, extraction and classification of references within the full text of scholarly publications. Reference mining poses specific challenges, such as the need to capture the morphology of highly abbreviated words and the dependencies among the elements of a reference, both of which follow codified reference styles. The task is particularly difficult, and little explored, for the literature in the arts and humanities, where references are mostly given in footnotes. We apply a deep learning architecture to reference mining from the full text of scholarly publications. We explore and discuss three architectural components: word and character-level word embeddings, different prediction layers (Softmax and Conditional Random Fields), and multi-task versus single-task learning. Our best model uses both pre-trained word embeddings and character embeddings in a BiLSTM-CRF architecture. We test our solution on a dataset of annotated references from the historiography on Venice and, using a linear-chain CRF classifier as a baseline, show that this deep learning architecture improves on it by a considerable margin. Furthermore, multi-task learning performs almost on par with a single-task approach. We thus confirm that there are important gains to be had by adopting deep learning for the task of reference mining.
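The abstract contrasts two prediction layers, Softmax and Conditional Random Fields. As a minimal sketch of why that choice matters (not the authors' implementation), the Python below decodes one toy sequence both ways: a Softmax-style layer picks each token's tag independently, while a linear-chain CRF layer runs Viterbi decoding over per-token emission scores plus tag-to-tag transition scores. The tag set (`O`, `B-AUTHOR`, `I-AUTHOR`), all scores, and the function names `softmax_decode` and `viterbi_decode` are hypothetical; in a BiLSTM-CRF the emission scores would come from the BiLSTM and the transition scores would be learned rather than set by hand.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def softmax_decode(emissions: Matrix) -> List[int]:
    """Softmax-style prediction layer: pick each token's best tag independently.

    Softmax is monotonic, so the argmax over raw scores equals the argmax over
    the normalized probabilities.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(
    emissions: Matrix,
    transitions: Matrix,
    start: Optional[List[float]] = None,
    end: Optional[List[float]] = None,
) -> Tuple[List[int], float]:
    """Linear-chain CRF decoding: best tag sequence under emission + transition scores.

    emissions[t][j]   -- score of tag j at token t (e.g. produced by a BiLSTM)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    start = start if start is not None else [0.0] * n_tags
    end = end if end is not None else [0.0] * n_tags

    # score[j]: best score of any tag path over tokens 0..t that ends in tag j
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)

    final = [score[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=final.__getitem__)
    # Follow the backpointers from the last token back to the first.
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, final[last]


# Hypothetical tag set and hand-set scores, for illustration only.
TAGS = ["O", "B-AUTHOR", "I-AUTHOR"]
emissions = [[2.0, 1.5, 0.0],      # token 0 slightly prefers O
             [0.0, 0.0, 2.0]]      # token 1 prefers I-AUTHOR
transitions = [[0.0, 0.0, -10.0],  # O -> I-AUTHOR is heavily penalized
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, -10.0]          # a sequence should not start with I-AUTHOR

print([TAGS[k] for k in softmax_decode(emissions)])  # ['O', 'I-AUTHOR']
path, best = viterbi_decode(emissions, transitions, start=start)
print([TAGS[k] for k in path], best)                  # ['B-AUTHOR', 'I-AUTHOR'] 3.5
```

Because the transition scores penalize `O -> I-AUTHOR`, the CRF layer returns a well-formed span where independent per-token prediction does not. Modelling this kind of dependence among adjacent labels is the usual motivation for placing a CRF on top of a BiLSTM.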
