NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction

Abstract

Incorporating domain knowledge is vital in building successful natural language processing (NLP) applications. Applying a tool across domains often results in poor performance because the tool does not account for domain-specific attributes. The clinical domain is challenging in this respect due to specialized medical terms and nomenclature, shorthand notation, fragmented text, and a variety of writing styles used by different medical units. Temporal resolution is an NLP task that, in general, is domain-agnostic because temporal information is represented using a limited lexicon. However, domain-specific aspects of temporal resolution are present in clinical texts. Here we explore parsing issues that arose when running our system, a tool built on Newswire text, on clinical notes in the THYME corpus. Many parsing issues were straightforward to correct; however, a few code changes resulted in a cascading series of parsing errors that had to be resolved before an improvement in performance was observed, revealing the complexity of temporal resolution and rule-based parsing. Our system now outperforms current state-of-the-art systems on the THYME corpus with little change in its performance on Newswire texts.
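
As a rough illustration of why rules written for newswire can miss clinical shorthand, the following minimal Python sketch applies a small rule set to two sentences. It is not the authors' system: the rule names, the patterns, the `extract` helper, and both sample sentences are hypothetical and chosen only for illustration.

```python
import re

MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")

# Hypothetical rules in the style of a newswire-oriented tagger.
NEWSWIRE_RULES = [
    ("DATE", re.compile(MONTHS + r"\s+\d{1,2},\s+\d{4}\b")),
    ("DURATION", re.compile(r"\b\d+\s+(?:day|week|month|year)s?\b")),
]

# Hypothetical additions for clinical shorthand (illustrative only).
CLINICAL_RULES = [
    ("FREQUENCY", re.compile(r"\bq\d+h\b", re.IGNORECASE)),  # "q12h": every 12 hours
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),    # "3/14/15"
]


def extract(text, rules):
    """Return (label, surface) pairs in text order.

    When two matches overlap, the one that starts first wins; ties are
    broken in favour of the longer match.
    """
    spans = []
    for label, pattern in rules:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label, m.group()))
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))

    kept, last_end = [], -1
    for start, end, label, surface in spans:
        if start >= last_end:  # skip anything overlapping a kept span
            kept.append((label, surface))
            last_end = end
    return kept


news = "The summit opened on March 3, 2015 and ran for 4 days."
note = "Pt seen 3/14/15, started vancomycin q12h for 2 weeks."

print(extract(news, NEWSWIRE_RULES))
print(extract(note, NEWSWIRE_RULES))                   # shorthand is missed
print(extract(note, NEWSWIRE_RULES + CLINICAL_RULES))  # shorthand is found
```

In this sketch the newswire rules recover only the duration from the clinical sentence, while the extended rule set also finds the slash-formatted date and the dosing frequency. The overlap handling in `extract` hints at why rule changes can cascade: each new pattern may compete with existing ones for the same text, so adding a rule can change which older rules fire.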
