Journal: Computer Speech and Language

Multi-domain evaluation framework for named entity recognition tools


Abstract

Extracting structured information from unstructured text is important for qualitative data analysis. Applying NLP techniques to qualitative data analysis accelerates the annotation process, enables large-scale analysis, and provides more insight into the text, which in turn can improve performance. The first step in gaining insight from text is Named Entity Recognition (NER). A significant challenge that directly affects NER performance is the domain diversity of qualitative data: text varies by domain in many respects, including taxonomy, length, formality, and format. In this paper we discuss and analyse the performance of state-of-the-art tools across domains to assess their robustness and reliability. To do so, we developed a standard, expandable, and flexible framework for analysing and testing tool performance using corpora that represent text from various domains. We performed an extensive analysis and comparison of the tools across these domains and from several perspectives. The resulting comparison gives a holistic picture of the state-of-the-art tools.
