Journal: Standards in Genomic Sciences

The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness


Abstract

Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. Scoring records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering in support of downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure, the ‘Metadata Coverage Index’ (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric, for example: to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address the further application of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
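
The definition above (the percentage of available fields actually filled) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a record is a plain dictionary, that a field counts as unfilled when it is missing, `None`, a blank string, or an empty collection, and that the database-level score pools filled fields over all records; the abstract specifies none of these details. The field names are invented for illustration and are not the actual GOLD or MIGS field names.

```python
from typing import Any, Dict, Iterable, List, Mapping


def is_filled(value: Any) -> bool:
    """Assumed rule: None, blank strings and empty collections are unfilled."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True  # numbers (including 0) and other objects count as filled


def record_mci(record: Mapping[str, Any], fields: Iterable[str]) -> float:
    """MCI of one record: percentage of the available fields that are filled."""
    field_list = list(fields)
    if not field_list:
        raise ValueError("at least one available field is required")
    filled = sum(1 for name in field_list if is_filled(record.get(name)))
    return 100.0 * filled / len(field_list)


def database_mci(records: List[Mapping[str, Any]], fields: Iterable[str]) -> float:
    """Assumed database-level MCI: filled fields pooled over all records."""
    field_list = list(fields)
    if not records:
        raise ValueError("at least one record is required")
    return sum(record_mci(r, field_list) for r in records) / len(records)


def field_fill_rates(
    records: List[Mapping[str, Any]], fields: Iterable[str]
) -> Dict[str, float]:
    """Per-field view: percentage of records in which each field is filled."""
    if not records:
        raise ValueError("at least one record is required")
    return {
        name: 100.0 * sum(1 for r in records if is_filled(r.get(name))) / len(records)
        for name in fields
    }


# Hypothetical field names and values, for illustration only.
FIELDS = ["organism", "isolation_site", "latitude", "longitude"]
RECORDS = [
    {"organism": "E. coli", "isolation_site": "", "latitude": None, "longitude": 12.5},
    {"organism": "B. subtilis", "isolation_site": "soil", "latitude": 0.0, "longitude": 0.0},
]

print(record_mci(RECORDS[0], FIELDS))     # 50.0
print(database_mci(RECORDS, FIELDS))      # 75.0
print(field_fill_rates(RECORDS, FIELDS))  # per-field percentages
```

Filtering or ranking records, one of the uses listed above, then reduces to sorting on `record_mci` or comparing it against a threshold. The per-field rates correspond to the "frequency with which fields in a particular record type are filled".
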
