
High-performance knowledge-based entity extraction.



Abstract

Human language records most of the information and knowledge produced by organizations and individuals. The machine-based process of analyzing information in natural language form is called natural language processing (NLP). Information extraction (IE) is the process of analyzing machine-readable text and identifying and collecting information about specified types of entities, events, and relationships.

Named entity extraction is an area of IE concerned specifically with recognizing and classifying proper names for persons, organizations, and locations in natural language text. Extant approaches to the design and implementation of named entity extraction systems include: (a) knowledge-engineering approaches, which rely on domain experts to hand-craft NLP rules that recognize and classify named entities; (b) supervised machine-learning approaches, in which a previously tagged corpus of named entities is used to train algorithms that incorporate statistical and probabilistic methods for NLP; and (c) hybrid approaches, which incorporate aspects of both (a) and (b).

Performance of IE systems is evaluated using the metrics of precision and recall, which measure the accuracy and completeness of the IE task, respectively. Previous research has shown that utilizing a large knowledge base of known entities has the potential to improve overall entity extraction precision and recall. Although existing methods typically incorporate dictionary-based features, these dictionaries have been limited in size and scope.

The problem addressed by this research was the design, implementation, and evaluation of a new high-performance knowledge-based hybrid processing approach, and associated algorithms, for named entity extraction. The approach combines rule-based natural language parsing with memory-based machine learning classification, facilitated by an extensive knowledge base of existing named entities. Measured on a standard test corpus, the hybrid approach implemented by this research improved precision and recall relative to existing methods, approaching human-level capability. The system design incorporated a parallel processing architecture with capabilities for managing a large knowledge base and providing high throughput potential for processing large collections of natural language text documents.
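To make the two evaluation metrics mentioned in the abstract concrete, the following is a minimal Python sketch of how precision and recall might be computed for an entity extractor. It is not taken from the dissertation: the function name `precision_recall`, the `(start, end, type)` span representation, the exact-match scoring rule, and the sample data are all illustrative assumptions.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of entity spans.

    Assumption: an entity is a (start, end, type) tuple, and a prediction
    counts as correct only if it matches a gold entity exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted entities that are correct (accuracy).
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold entities that were extracted (completeness).
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold annotations and system output for one document.
gold = {(0, 2, "PERSON"), (5, 8, "ORGANIZATION"), (10, 11, "LOCATION")}
predicted = {
    (0, 2, "PERSON"),
    (5, 8, "LOCATION"),    # right span, wrong type: counted as an error
    (10, 11, "LOCATION"),
    (13, 14, "PERSON"),    # spurious entity
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Under this exact-match rule, a misclassified span lowers both metrics: it is a wrong prediction and also a missed gold entity. Evaluations that award partial credit would score the same output differently.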

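Bibliographic details

  • Author

    Middleton, Anthony M.

  • Author affiliation

    Nova Southeastern University

  • Degree-granting institution: Nova Southeastern University
  • Subject: Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 317 p.
  • Total pages: 317
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Artificial intelligence theory; Automation technology, computer technology
  • Keywords

  • Date indexed: 2022-08-17 11:37:49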

