XONTO: An Ontology-Based System for Semantic Information Extraction from PDF Documents


Abstract

Information extraction is of paramount importance in several real-world applications in the areas of business intelligence and competitive and military intelligence. Although several sophisticated and complex approaches have been proposed, they remain limited in many respects. This paper presents XONTO, a novel ontology-based system that enables semantic information extraction from unstructured PDF documents. XONTO is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules called descriptors. These rules represent patterns that enable the automatic recognition and extraction of ontology objects contained in PDF documents, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to be extracted and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.
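The idea of a self-describing ontology, where a class carries the descriptor rules that recognize its own instances, can be illustrated with a minimal Python sketch. It assumes descriptors are plain regular expressions whose named groups map onto class attributes; this is a hypothetical simplification, not XONTO's actual descriptor language (the keywords point to attribute grammars), and it ignores PDF layout entirely. The names `Descriptor`, `OntologyClass`, `populate`, the `Product` class, and the sample lines are all illustrative assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    # Hypothetical descriptor: a regex whose named groups map onto class attributes.
    pattern: re.Pattern


@dataclass
class OntologyClass:
    name: str
    attributes: list[str]
    # The class "describes itself": it carries the rules used to recognize its instances.
    descriptors: list[Descriptor] = field(default_factory=list)
    instances: list[dict] = field(default_factory=list)

    def populate(self, lines: list[str]) -> list[dict]:
        """Apply the descriptors to each text line; every match becomes a new instance."""
        for line in lines:
            for descriptor in self.descriptors:
                match = descriptor.pattern.search(line)
                if match:
                    groups = match.groupdict()
                    self.instances.append({a: groups.get(a) for a in self.attributes})
                    break  # at most one instance per line
        return self.instances


product = OntologyClass(
    name="Product",
    attributes=["name", "price"],
    descriptors=[
        # Running text, e.g. "The Widget costs 12.50 EUR."
        Descriptor(re.compile(r"The (?P<name>\w+) costs (?P<price>\d+(?:\.\d+)?) EUR")),
        # Table row whose cells are separated by two or more spaces, e.g. "Gadget    8.99"
        Descriptor(re.compile(r"^(?P<name>[A-Za-z][\w ]*?)\s{2,}(?P<price>\d+(?:\.\d+)?)$")),
    ],
)

# Hypothetical text lines, standing in for the output of a PDF text extractor.
sample_lines = [
    "Product catalogue",
    "The Widget costs 12.50 EUR.",
    "Name      Price",
    "Gadget    8.99",
    "Gizmo     15",
]

result = product.populate(sample_lines)
print(result)
# → [{'name': 'Widget', 'price': '12.50'}, {'name': 'Gadget', 'price': '8.99'}, {'name': 'Gizmo', 'price': '15'}]
```

Keeping the descriptors on the class, rather than in a separate extractor, mirrors the abstract's point that the ontology both defines the semantics of the data and carries the rules that populate it. A real PDF pipeline would likely also need spatial information (cell positions) to handle tables robustly, which flattened text lines discard.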

Bibliographic Details

Source: International Conference on Tools with Artificial Intelligence, 2008, 8 pages
Authors: Oro Ermelinda; Ruffolo Massimo
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Information Extraction; Knowledge representation and reasoning; PDF format; attribute grammars; ontology

Similar Literature

1. Ontology-Based Information Extraction from PDF Documents with XONTO [J]. Ermelinda Oro, Massimo Ruffolo, Domenico Sacca. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms, 2009, no. 5.
2. Semantic PDF Segmentation for Legacy Documents in Technical Documentation [J]. Jan Oevermann. Procedia Computer Science, 2018, no. 1.
3. Intelligent semantic concept mapping for semantic query rewriting/optimization in ontology-based information integration system [J]. Kwon J, Jeong D, Lee LS. International Journal of Software Engineering and Knowledge Engineering, 2004, no. 5.
4. XONTO: An Ontology-Based System for Semantic Information Extraction from PDF Documents [C]. Oro Ermelinda, Ruffolo Massimo. International Conference on Tools with Artificial Intelligence, 2008.
5. Automatic semantic header generator for PDF documents [D]. Xue, Furong. 2004.
6. iSMART: Ontology-based Semantic Query of CDA Documents [O]. Shengping Liu, Yuan Ni, Jing Mei. 2009.
7. Ontology-based Semantic Classification of Unstructured Documents [O]. Ching Kang Cheng, Xiao Shan Pan, Franz Kurfess. 2008.