JMIR Medical Informatics

Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis



Abstract

Background: Semantic textual similarity (STS) is a natural language processing (NLP) task that involves assigning a similarity score to 2 snippets of text based on their meaning. This task is particularly difficult in the domain of clinical text, which often features specialized language and the frequent use of abbreviations.

Objective: We created an NLP system to predict similarity scores for sentence pairs as part of the Clinical Semantic Textual Similarity track in the 2019 n2c2/OHNLP Shared Task on Challenges in Natural Language Processing for Clinical Data. We subsequently sought to analyze the intermediary token vectors extracted from our models while processing a pair of clinical sentences, to identify where and how representations of semantic similarity are built in transformer models.

Methods: Given a clinical sentence pair, we take the average predicted similarity score across several independently fine-tuned transformers. In our model analysis, we investigated the relationship between the final model's loss and surface features of the sentence pairs, and assessed the decodability and representational similarity of the token vectors generated by each model.

Results: Our model achieved a correlation of 0.87 with the ground-truth similarity score, reaching 6th place out of 33 teams (the first-place score was 0.90). In detailed qualitative and quantitative analyses of the model's loss, we identified the system's failure to correctly model semantic similarity when both sentences in a pair contain details of medical prescriptions, as well as its general tendency to overpredict semantic similarity given significant token overlap. The token vector analysis revealed divergent representational strategies for predicting textual similarity between bidirectional encoder representations from transformers (BERT)-style models and XLNet.
We also found that a large amount of information relevant to predicting STS can be captured using a combination of a classification token and the cosine distance between sentence-pair representations in the first layer of a transformer model that did not produce the best predictions on the test set.

Conclusions: We designed and trained a system that uses state-of-the-art NLP models to achieve very competitive results on a new clinical STS data set. As our approach uses no hand-crafted rules, it serves as a strong deep learning baseline for this task. Our key contribution is a detailed analysis of the model's outputs and an investigation of the heuristic biases learned by transformer models; we suggest future improvements based on these findings. In our representational analysis, we explore how different transformer models converge or diverge in their representation of semantic signals as the tokens of the sentences are augmented by successive layers. This analysis sheds light on how these "black box" models integrate semantic similarity information in intermediate layers, and points to new research directions in model distillation and sentence embedding extraction for applications in clinical NLP.
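The ensembling step described in Methods, averaging the scores of several independently fine-tuned transformers, can be sketched as follows. This is a minimal illustration only: the lambda "models" below are hypothetical stand-ins for the paper's fine-tuned transformers, and the function name is invented for this sketch.

```python
def ensemble_similarity(sentence_pair, models):
    """Average the similarity scores predicted by several
    independently fine-tuned models for one sentence pair."""
    scores = [model(sentence_pair) for model in models]
    return sum(scores) / len(scores)

# Hypothetical stand-ins for fine-tuned transformer models,
# each mapping a sentence pair to a similarity score.
models = [lambda pair: 3.2, lambda pair: 3.6, lambda pair: 3.4]

pair = ("Take 1 tablet daily.", "Take one tablet every day.")
print(round(ensemble_similarity(pair, models), 2))  # -> 3.4
```

In practice each callable would wrap a fine-tuned transformer's regression head; averaging independent predictions is a standard way to reduce variance across fine-tuning runs.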
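The first-layer feature combination reported in Results (a classification token plus the cosine distance between sentence representations) might be assembled roughly as below. Mean pooling over token vectors and the toy low-dimensional vectors are assumptions for illustration; the abstract does not specify the paper's exact pooling or probe setup.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def first_layer_features(cls_vec, sent1_vecs, sent2_vecs):
    """Combine the classification-token vector with the cosine
    distance between mean-pooled first-layer token vectors of
    each sentence (mean pooling is an assumption here)."""
    mean1 = [sum(col) / len(sent1_vecs) for col in zip(*sent1_vecs)]
    mean2 = [sum(col) / len(sent2_vecs) for col in zip(*sent2_vecs)]
    cos_dist = 1.0 - cosine_similarity(mean1, mean2)
    return cls_vec + [cos_dist]

# Toy example: identical sentences yield a cosine distance near 0.
feats = first_layer_features([0.1, 0.2],
                             [[1.0, 2.0], [3.0, 4.0]],
                             [[1.0, 2.0], [3.0, 4.0]])
print(feats)
```

A feature vector like this could then be fed to a simple regressor to probe how much STS-relevant signal the first transformer layer already carries.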
