Schema-independent scientific data cataloging framework



Abstract

Modern scientific experiments generate vast volumes of data that are hard to keep track of. Consequently, scientists find it difficult to reuse and share these data sets. We address this problem by developing a schema-independent data cataloging framework for efficient management of scientific data. The proposed solution consists of an agent, which automatically identifies new data products and extracts metadata from them, and a server, which indexes the metadata using a NoSQL database and provides a REST API for querying, sharing, and reusing the data sets. The novelty of our solution lies in its pluggable metadata extraction logic, extensible data product generation monitors, use of a NoSQL database, and ability to add new metadata fields dynamically. Using Apache Solr as the backend database enables the proposed solution to index and search data products much faster than a solution based on relational databases. For example, our Apache Solr-based implementation resolves full-text, substring, prefix, and suffix queries 91%-99% faster than a MySQL-based implementation.
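The combination of pluggable extraction and dynamically added metadata fields described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Catalog` class, the `stem_extractor` function, the field names, and the file names are all hypothetical, and a plain in-memory list stands in for the Apache Solr index and REST API.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical extractor signature: takes a file path, returns metadata fields.
Extractor = Callable[[Path], Dict[str, str]]


@dataclass
class Catalog:
    """In-memory stand-in for a schema-free metadata index (assumption)."""
    extractors: Dict[str, Extractor] = field(default_factory=dict)
    records: List[Dict[str, str]] = field(default_factory=list)

    def register(self, suffix: str, extractor: Extractor) -> None:
        # Pluggable extraction: one extractor per file suffix.
        self.extractors[suffix] = extractor

    def ingest(self, path: Path) -> Dict[str, str]:
        # Every record has a name; extractors may add arbitrary new fields,
        # so no fixed schema is declared anywhere.
        record = {"name": path.name}
        extractor = self.extractors.get(path.suffix)
        if extractor is not None:
            record.update(extractor(path))
        self.records.append(record)
        return record

    def search(self, field_name: str, term: str,
               mode: str = "substring") -> List[Dict[str, str]]:
        # Query modes named after those listed in the abstract.
        match = {
            "exact": lambda v: v == term,
            "prefix": lambda v: v.startswith(term),
            "suffix": lambda v: v.endswith(term),
            "substring": lambda v: term in v,
        }[mode]
        return [r for r in self.records
                if field_name in r and match(r[field_name])]


def stem_extractor(path: Path) -> Dict[str, str]:
    # Hypothetical extractor: derives metadata from the file name only.
    return {"experiment": path.stem, "format": path.suffix.lstrip(".")}


catalog = Catalog()
catalog.register(".out", stem_extractor)
catalog.ingest(Path("benzene.out"))   # gains "experiment" and "format" fields
catalog.ingest(Path("notes.txt"))     # no extractor registered: name only
print(catalog.search("experiment", "benz", mode="prefix"))
```

Records with different field sets coexist in the same index, which is the property that a fixed relational schema makes awkward. The linear scan here says nothing about performance; the speedups reported in the abstract come from the Solr index, which this sketch does not model.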


Bibliographic record

Source: Moratuwa Engineering Research Conference, 2015, 6 pages
Authors: Nakandala Supun; Withana Sachith Dhanushka; Kumarasiri Dinu; Jayawardena Hirantha; Dilum Bandara H.M.N.; Perera Srinath; Marru Suresh; Pamidighantam Sudhakar
Original format: PDF
Chinese Library Classification: power transmission and distribution engineering; power grids and power systems
Keywords: indexing; metadata catalog; scientific data management


Similar literature


1. Electronic Catalog of the Scientific and Technical Literature: Data Model and Architecture of Software Tools [J]. A. V. Shapkin, O. V. Fedorets, K. O. Malinina. Scientific & Technical Information Processing, 2011, No. 4.
2. Automatic Schema-Independent Linked Data Instance Matching System [J]. Khai Nguyen, Ryutaro Ichise. International Journal on Semantic Web and Information Systems, 2017, No. 1.
3. Schema-independent scientific data cataloging framework [C]. Nakandala Supun, Withana Sachith Dhanushka, Kumarasiri Dinu. Moratuwa Engineering Research Conference, 2015.
4. A scientific workflow framework for scientific data querying and processing [D]. Fei, Xubo. 2011.
5. Automatization and self-maintenance of the O-GlcNAcome catalog: a smart scientific database [O]. Florian Malard, Eugenia Wulff-Fuentes, Rex R Berendt. 2021.
6. Schema-Independent and Schema-Friendly Scientific Metadata Management [O]. Scott Jensen, Beth Plale. 2010.
7. Guidelines for Descriptive Cataloging of Reports: A Revision of COSATI (Committee on Scientific and Technical Information) Standard for Descriptive Cataloging of Government Scientific and Technical Reports [R]. 1985.
