Journal: JMIR Medical Informatics

Clinical Context–Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation


Abstract

Background: Automatic text summarization (ATS) enables users to retrieve meaningful evidence from large biomedical repositories to support complex clinical decisions. Deep neural and recurrent networks outperform traditional machine-learning techniques in natural language processing and computer vision; however, they have yet to be explored in the ATS domain, particularly for medical text summarization.

Objective: Traditional approaches to ATS for biomedical text suffer from fundamental issues such as an inability to capture clinical context, quality of evidence, and purpose-driven selection of passages for the summary. We aimed to circumvent these limitations by achieving precise, succinct, and coherent information extraction from credible published biomedical resources, and to construct a simplified summary containing the most informative content that can offer a review particular to clinical needs.

Methods: We introduce a novel framework, termed Biomed-Summarizer, that provides quality-aware Patient/Problem, Intervention, Comparison, and Outcome (PICO)-based intelligent and context-enabled summarization of biomedical text. Biomed-Summarizer integrates the prognosis quality recognition model with a clinical context–aware model to locate text sequences in the body of a biomedical article for use in the final summary. First, we developed a deep neural network binary classifier for quality recognition to acquire scientifically sound studies and filter out others. Second, we developed a bidirectional long short-term memory recurrent neural network as a clinical context–aware classifier, which was trained on semantically enriched features generated using a word-embedding tokenizer for identification of meaningful sentences representing PICO text sequences. Third, we calculated the similarity between query and PICO text sequences using Jaccard similarity with semantic enrichments, where the semantic enrichments are obtained using medical ontologies. Last, we generated a representative summary from the high-scoring PICO sequences aggregated by study type, publication credibility, and freshness score.

Results: Evaluation of the prognosis quality recognition model using a large dataset of biomedical literature related to intracranial aneurysm showed an accuracy of 95.41% (2562/2686) in terms of recognizing quality articles. The clinical context–aware multiclass classifier outperformed the traditional machine-learning algorithms, including support vector machine, gradient boosted tree, linear regression, K-nearest neighbor, and naïve Bayes, by achieving 93% (16127/17341) accuracy for classifying five categories: aim, population, intervention, results, and outcome. The semantic similarity algorithm achieved a significant Pearson correlation coefficient of 0.61 (0-1 scale) on the well-known BIOSSES dataset (with 100 sentence pairs) after semantic enrichment, representing an improvement of 8.9% over baseline Jaccard similarity. Finally, we found a highly positive correlation among the evaluations performed by three domain experts concerning different metrics, suggesting that the automated summarization is satisfactory.

Conclusions: By employing the proposed method, Biomed-Summarizer, high accuracy in ATS was achieved, enabling seamless curation of research evidence from the biomedical literature for use in clinical decision-making.
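The third Methods step, Jaccard similarity with semantic enrichment, can be illustrated with a minimal sketch. The abstract states only that enrichments come from medical ontologies; the tokenizer, the `TOY_CONCEPTS` lookup table, and the function names below are hypothetical stand-ins chosen for illustration, not the authors' implementation.

```python
import re

# Hypothetical stand-in for an ontology lookup: maps a surface term to
# shared concept labels. A real system would query a medical ontology.
TOY_CONCEPTS = {
    "rupture": {"hemorrhage"},
    "bleeding": {"hemorrhage"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def enrich(tokens, concepts):
    """Return the tokens plus every concept label linked to any of them."""
    enriched = set(tokens)
    for tok in tokens:
        enriched |= concepts.get(tok, set())
    return enriched


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def enriched_jaccard(query, sentence, concepts):
    """Jaccard similarity computed on the concept-enriched token sets."""
    return jaccard(
        enrich(tokenize(query), concepts),
        enrich(tokenize(sentence), concepts),
    )


query = "aneurysm rupture risk"
sentence = "risk of aneurysm bleeding"

baseline = jaccard(tokenize(query), tokenize(sentence))
enriched = enriched_jaccard(query, sentence, TOY_CONCEPTS)
print(baseline, enriched)
```

In this toy case, "rupture" and "bleeding" share no surface form, so plain Jaccard counts them as unrelated; mapping both to a common concept label adds one shared element and raises the score. The size of that effect in the reported results depends on the ontologies and preprocessing the authors used, which the abstract does not specify.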
