Journal: BMC Medical Informatics and Decision Making

Parsing clinical text: how good are the state-of-the-art parsers?


Abstract

Background: Parsing, which generates the syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain, including medicine. Although parsers developed for general English, such as the Stanford parser, have been applied to clinical text, there have been no formal evaluations or comparisons of their performance in the medical domain.

Methods: In this study, we investigated the performance of three state-of-the-art parsers (the Stanford parser, the Bikel parser, and the Charniak parser) using the following two datasets: (1) a Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank-based guideline; and (2) the MiPACQ Treebank, which was developed from pathology notes and clinical notes and contains 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using 10-fold cross validation. Finally, we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank.

Results: The original parsers achieved lower performance on clinical text (Bracketing F-measure in the range of 66.6%-70.3%) than on general English text. After re-training on the clinical Treebanks, all parsers achieved better performance. The Stanford parser performed best, reaching the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross validation. When the clinical Treebanks were combined with the Penn Treebank, the Charniak parser achieved the highest Bracketing F-measure of 73.53% on progress notes, and the Stanford parser reached the highest F-measure of 84.15% on the MiPACQ corpus.

Conclusions: Our study demonstrates that re-training using clinical Treebanks is critical for improving general English parsers' performance on clinical text, and that combining clinical and open-domain corpora might achieve optimal performance for parsing clinical text.
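
The Bracketing F-measure reported above compares the labeled constituents (brackets) of a parser's output tree with those of the gold-standard tree. The abstract does not say which scoring tool was used, so the following is only a minimal PARSEVAL-style sketch, not the authors' evaluation code. The function names `labeled_spans` and `bracket_f_measure` and the example sentence are hypothetical. It assumes Penn Treebank-style bracketed strings in which every bracket carries a label, and it leaves part-of-speech brackets out of the count.

```python
from collections import Counter


def labeled_spans(tree: str) -> Counter:
    """Collect (label, start, end) for each phrasal constituent of one tree.

    Assumes every opening bracket is followed by a label. Brackets that
    directly cover a single word (POS tags) are not counted.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []  # each entry: [label, start word index, has child bracket]
    pos = 0     # index of the next word to be read
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if stack:
                stack[-1][2] = True      # the parent contains a nested bracket
            stack.append([tokens[i + 1], pos, False])
            i += 2                       # skip the bracket and its label
        elif tok == ")":
            label, start, has_child = stack.pop()
            if has_child:                # keep phrasal nodes only
                spans[(label, start, pos)] += 1
            i += 1
        else:
            pos += 1                     # an ordinary word
            i += 1
    return spans


def bracket_f_measure(gold_trees, pred_trees):
    """Return (precision, recall, F-measure) pooled over all sentence pairs."""
    matched = gold_total = pred_total = 0
    for gold, pred in zip(gold_trees, pred_trees):
        g, p = labeled_spans(gold), labeled_spans(pred)
        matched += sum((g & p).values())  # brackets present in both trees
        gold_total += sum(g.values())
        pred_total += sum(p.values())
    precision = matched / pred_total if pred_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Illustrative (invented) gold and predicted trees for one clinical sentence.
gold = "(S (NP (DT the) (NN patient)) (VP (VBZ denies) (NP (NN chest) (NN pain))))"
pred = "(S (NP (DT the) (NN patient)) (VP (VBZ denies) (NP (NN chest)) (NP (NN pain))))"
print(bracket_f_measure([gold], [pred]))
```

In this example the predicted tree splits "chest pain" into two noun phrases, so three of its five brackets match the four gold brackets. Pooling counts over all sentences before dividing, as done here, gives a corpus-level score rather than an average of per-sentence scores.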
