Content enrichment with expressive document modelling to leverage the understanding of unstructured data

Ganesh Selvaraj; Karla Taboada; Eloy Gonzales; Habib Baluwala

Abstract

Most information in an enterprise is unstructured data, which is usually managed in a document database. A key challenge is to define a generalized data model for this unstructured data and for any information extracted from it by content enrichment algorithms. Incorporating provenance and temporal capabilities into such data models is more challenging still. Semantic databases use ontologies such as PROV-O to represent provenance information expressively, and relational databases use concepts such as Slowly Changing Dimensions (SCDs) to represent temporal information. In this paper, we present a document model with features inspired by Dublin Core, PROV-O, and temporal methodologies that generalizes information extracted from unstructured data by content enrichment algorithms. Provenance information enables comparison of enrichment models, supports reproducibility, and facilitates complex filtering of the enriched data. Temporal metadata helps in versioning documents and makes point-in-time and history queries convenient.
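The abstract does not give the paper's actual schema, so the following is only a minimal sketch of how the three ingredients it names might fit together in one document record. Every class, field, and function name here (`Provenance`, `Enrichment`, `DocumentVersion`, `DocumentStore`, `enrichments_by_agent`) is a hypothetical choice made for illustration: descriptive fields loosely follow Dublin Core, each enrichment carries PROV-O-style provenance, and versions carry a validity interval in the spirit of SCD Type 2.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    """PROV-O-inspired record of how an enrichment was produced (hypothetical fields)."""
    agent: str          # the enrichment model, cf. prov:wasAttributedTo
    agent_version: str  # model version, needed to compare or reproduce runs
    activity: str       # the enrichment step, cf. prov:wasGeneratedBy
    source_id: str      # the document it was derived from, cf. prov:wasDerivedFrom


@dataclass(frozen=True)
class Enrichment:
    kind: str           # e.g. "entity" or "topic"
    value: str
    provenance: Provenance


@dataclass(frozen=True)
class DocumentVersion:
    doc_id: str
    title: str          # Dublin Core-style descriptive metadata
    creator: str
    text: str
    enrichments: Tuple[Enrichment, ...]
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version


class DocumentStore:
    """In-memory stand-in for a document database (assumption, not the paper's system)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[DocumentVersion]] = {}

    def save(self, version: DocumentVersion) -> None:
        # Assumes versions of a document are saved in chronological order.
        history = self._versions.setdefault(version.doc_id, [])
        if history:
            # Close the previous version's validity interval.
            history[-1] = replace(history[-1], valid_to=version.valid_from)
        history.append(version)

    def as_of(self, doc_id: str, when: datetime) -> Optional[DocumentVersion]:
        """Point-in-time query: the version valid at `when`, if any."""
        for v in self._versions.get(doc_id, []):
            if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
                return v
        return None

    def history(self, doc_id: str) -> List[DocumentVersion]:
        """History query: all versions, oldest first."""
        return list(self._versions.get(doc_id, []))


def enrichments_by_agent(
    version: DocumentVersion, agent: str, agent_version: str
) -> List[Enrichment]:
    """Provenance-based filter: enrichments produced by one model version."""
    return [
        e for e in version.enrichments
        if e.provenance.agent == agent
        and e.provenance.agent_version == agent_version
    ]
```

Under these assumptions, saving a second version of a document closes the first version's validity interval, so `as_of` can answer point-in-time queries and `history` returns the full version chain. Filtering on `Provenance` is one way the outputs of two versions of the same enrichment model could be compared.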


Bibliographic details

Source: MATEC Web of Conferences, 2019, Issue 1, 7 pages
Authors: Ganesh Selvaraj; Karla Taboada; Eloy Gonzales; Habib Baluwala
Original format: PDF
Chinese Library Classification: General industrial technology

