Journal: Knowledge-Based Systems

Biomedical-domain pre-trained language model for extractive summarization

Abstract

In recent years, deep neural networks have significantly outperformed traditional methods on the extractive summarization task. However, in biomedical extractive summarization, existing methods do not make good use of domain-aware external knowledge; furthermore, existing deep neural network models omit the structural features of the document. In this paper, we propose a novel model called BioBERTSum to better capture token-level and sentence-level contextual representations. It uses a domain-aware bidirectional language model pre-trained on large-scale biomedical corpora as its encoder, and further fine-tunes that language model for extractive summarization of single biomedical documents. In particular, we adopt a sentence position embedding mechanism, which enables the model to learn the position information of sentences and capture the structural features of the document. To the best of our knowledge, this is the first work to use a pre-trained language model and fine-tuning strategy for extractive summarization in the biomedical domain. Experiments on the PubMed dataset show that our proposed model outperforms the recent state-of-the-art (SOTA) model in terms of ROUGE-1/2/L. (C) 2020 Elsevier B.V. All rights reserved.
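The sentence position embedding idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the embeddings are learned or fixed, so the sinusoidal encoding below is an assumption, and the function names `sentence_position_embedding` and `add_sentence_positions` are hypothetical. It is not the BioBERTSum implementation, only one possible way to inject sentence order into sentence vectors.

```python
import math
from typing import List


def sentence_position_embedding(position: int, dim: int) -> List[float]:
    """Fixed sinusoidal encoding of a sentence index.

    Assumption: sinusoidal encoding is used here purely for illustration;
    the source abstract does not specify the embedding type.
    """
    vec = []
    for i in range(dim):
        # Each pair of dimensions (2k, 2k+1) shares one frequency.
        angle = position / (10000 ** (2 * (i // 2) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def add_sentence_positions(sentence_vectors: List[List[float]]) -> List[List[float]]:
    """Add a position embedding to each sentence vector, by sentence order."""
    out = []
    for pos, vec in enumerate(sentence_vectors):
        pe = sentence_position_embedding(pos, len(vec))
        out.append([v + p for v, p in zip(vec, pe)])
    return out


if __name__ == "__main__":
    # Two placeholder 4-dimensional sentence vectors (all zeros),
    # so the printed values are the position embeddings themselves.
    for row in add_sentence_positions([[0.0] * 4, [0.0] * 4]):
        print(row)
```

In a full extractive summarizer, vectors like these would be combined with the encoder's sentence representations before a classifier scores each sentence for inclusion in the summary; that step is outside the scope of this sketch.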
