Conference: Moratuwa Engineering Research Conference

Schema-independent scientific data cataloging framework

Abstract

Modern scientific experiments generate vast volumes of data that are hard to keep track of. Consequently, scientists find it difficult to reuse and share these data sets. We address this problem by developing a schema-independent data cataloging framework for efficient management of scientific data. The proposed solution consists of an agent, which automatically identifies new data products and extracts metadata from them, and a server, which indexes the metadata in a NoSQL database and provides a REST API for querying, sharing, and reusing the data sets. The novelty of our solution lies in its pluggable metadata extraction logic, extensible data product generation monitors, use of a NoSQL database, and ability to add new metadata fields dynamically. Using Apache Solr as the backend database enables the proposed solution to index and search data products much faster than a solution based on a relational database. For example, our Apache Solr-based implementation resolves full-text, substring, prefix, and suffix queries 91% to 99% faster than a MySQL-based implementation.
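The agent and server roles described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Catalog`, `Agent`, `register`, and `on_new_product` are hypothetical, extractors are assumed to be plain callables keyed by file extension, and an in-memory list of dictionaries stands in for the Apache Solr index and REST API. It only shows how pluggable extraction and schema-free records allow new metadata fields to appear without a schema change.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical plugin signature: an extractor maps a file path to metadata fields.
Extractor = Callable[[Path], Dict[str, str]]


@dataclass
class Catalog:
    """Schema-free in-memory index (a stand-in for the NoSQL backend)."""
    records: List[Dict[str, str]] = field(default_factory=list)

    def index(self, metadata: Dict[str, str]) -> None:
        # Records may carry any set of fields; nothing is declared up front.
        self.records.append(dict(metadata))

    def search(self, field_name: str, value: str,
               mode: str = "substring") -> List[Dict[str, str]]:
        # Supported match modes mirror the query types named in the abstract.
        match = {
            "exact": lambda s: s == value,
            "prefix": lambda s: s.startswith(value),
            "suffix": lambda s: s.endswith(value),
            "substring": lambda s: value in s,
        }[mode]
        return [r for r in self.records
                if field_name in r and match(r[field_name])]


@dataclass
class Agent:
    """Receives new data products and runs the matching extractor, if any."""
    catalog: Catalog
    extractors: Dict[str, Extractor] = field(default_factory=dict)

    def register(self, suffix: str, extractor: Extractor) -> None:
        self.extractors[suffix] = extractor

    def on_new_product(self, path: Path) -> None:
        metadata = {"name": path.name}
        extractor = self.extractors.get(path.suffix)
        if extractor is not None:
            metadata.update(extractor(path))
        self.catalog.index(metadata)


catalog = Catalog()
agent = Agent(catalog)
# Illustrative extractor: adds a "software" field the catalog has never seen.
agent.register(".out", lambda p: {"software": "hypothetical-simulator"})
agent.on_new_product(Path("run42.out"))
agent.on_new_product(Path("notes.txt"))
```

Here `catalog.search("name", "run", mode="prefix")` returns only the record for `run42.out`, and the `software` field becomes searchable as soon as one record carries it. A real deployment would replace the linear scan with an inverted index, which is where the reported speed-up over a relational database would come from.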
