High-performance knowledge-based entity extraction.


Abstract

Human language records most of the information and knowledge produced by organizations and individuals. The machine-based process of analyzing information in natural language form is called natural language processing (NLP). Information extraction (IE) is the process of analyzing machine-readable text and identifying and collecting information about specified types of entities, events, and relationships.

Named entity extraction is an area of IE concerned specifically with recognizing and classifying proper names for persons, organizations, and locations in natural language. Extant approaches to the design and implementation of named entity extraction systems include: (a) knowledge-engineering approaches, which rely on domain experts to hand-craft NLP rules that recognize and classify named entities; (b) supervised machine-learning approaches, in which a previously tagged corpus of named entities is used to train algorithms that incorporate statistical and probabilistic methods for NLP; and (c) hybrid approaches, which incorporate aspects of both (a) and (b).

Performance of IE systems is evaluated using the metrics of precision and recall, which measure the accuracy and completeness of the IE task, respectively. Previous research has shown that a large knowledge base of known entities has the potential to improve overall entity extraction precision and recall. Although existing methods typically incorporate dictionary-based features, these dictionaries have been limited in size and scope.

The problem addressed by this research was the design, implementation, and evaluation of a new high-performance, knowledge-based hybrid processing approach and associated algorithms for named entity extraction. The approach combines rule-based natural language parsing with memory-based machine learning classification, facilitated by an extensive knowledge base of existing named entities.

The hybrid approach implemented in this research improved precision and recall relative to existing methods, approaching human-level capability as measured on a standard test corpus. The system design incorporated a parallel processing architecture capable of managing a large knowledge base and offering high throughput potential for processing large collections of natural language text documents.
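To make the two evaluation metrics named in the abstract concrete, the following is a minimal sketch of how precision and recall can be computed for extracted entities. It assumes exact-match scoring, where a predicted entity counts as correct only if its span and type both match a gold annotation; the abstract does not state which scoring scheme the dissertation used. The `Entity` tuple layout, the `precision_recall` function name, and the sample data are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: (start token, end token, entity type).
Entity = Tuple[int, int, str]


def precision_recall(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float]:
    """Return (precision, recall) under exact span-and-type matching."""
    # An entity is a true positive only if it appears in both sets unchanged.
    true_positives = len(gold & predicted)
    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only, not taken from the dissertation.
gold = {(0, 2, "PERSON"), (5, 8, "ORGANIZATION"), (10, 11, "LOCATION")}
predicted = {
    (0, 2, "PERSON"),      # correct
    (5, 8, "LOCATION"),    # right span, wrong type: counted as an error
    (10, 11, "LOCATION"),  # correct
    (13, 14, "PERSON"),    # spurious entity
}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four predictions are correct, so precision is 0.50, and two of the three gold entities are recovered, so recall is about 0.67. Partial-match or type-only scoring schemes would give different numbers for the same predictions.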


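Bibliographic record

Author: Middleton, Anthony M.
Author affiliation: Nova Southeastern University
Degree-granting institution: Nova Southeastern University
Discipline: Artificial Intelligence; Computer Science
Degree: Ph.D.
Year: 2009
Pages: 317 p.
Total pages: 317
Original format: PDF
Language of text: eng
Chinese Library Classification: Artificial intelligence theory; Automation technology, computer technology
Date added to database: 2022-08-17 11:37:49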

