Attribute-level versioning: A relational mechanism for version storage and retrieval.

Abstract

Data analysts today have at their disposal a seemingly endless supply of data repositories and, hence, of datasets from which to draw. New datasets become available daily, which makes choosing which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication in indexed collections, forcing analysts to choose a single value for each available attribute of an item in the collection. Analysts often discover two or more datasets containing information about the same entity. When combining these data and transforming them into a form usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute whose values differ. In the absence of professional intuition, this deconfliction is a source of considerable guesswork and speculation on the analyst's part. One must also consider what is lost when the alternative values are discarded. Are there meaningful relationships between the conflicting datasets? Does each dataset present a different but valid view of the entity, or are the alternative values erroneous? If the latter, which values are erroneous? Do the variances have historical significance? The analysis of modern datasets requires specialized algorithms, as well as storage and retrieval mechanisms, to identify, deconflict, and assimilate the variances in attribute values for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and to its relationships with other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve attribute versions without unnecessary complexity and without further alteration of the original or derived dataset schemas.
This paper presents technologies and innovations that assist data analysts in discovering meaning within their data and preserving all of the original data for every entity in the RDBMS.
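The abstract does not spell out the storage design. As a rough illustration only, the sketch below assumes one simple way to keep attribute versions in a relational store: a base table (hypothetical `entity`, with `name` and `city` columns) keeps one value per attribute as a conventional schema requires, while a separate side table (hypothetical `attribute_version`) preserves every value from every source dataset with its origin, so collisions can be identified and retrieved later without altering the base schema. The datasets, table names, and functions are invented for this example and are not taken from the dissertation.

```python
import sqlite3

# Hypothetical source datasets: each maps an entity id to its attribute values.
# Both describe the same entities but disagree on some attributes.
SOURCE_A = {
    "e1": {"name": "Acme Corp", "city": "Richmond"},
    "e2": {"name": "Globex", "city": "Norfolk"},
}
SOURCE_B = {
    "e1": {"name": "ACME Corporation", "city": "Richmond"},
    "e2": {"name": "Globex", "city": "Roanoke"},
}


def build_schema(conn: sqlite3.Connection) -> None:
    # Base table: one value per attribute, as a conventional relational schema requires.
    conn.execute("CREATE TABLE entity (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
    # Hypothetical side table holding every observed value of every attribute,
    # tagged with its source dataset. The base table's schema is not altered.
    conn.execute(
        """CREATE TABLE attribute_version (
               entity_id TEXT NOT NULL REFERENCES entity(id),
               attribute TEXT NOT NULL,
               value     TEXT,
               source    TEXT NOT NULL,
               PRIMARY KEY (entity_id, attribute, source)
           )"""
    )


def load(conn: sqlite3.Connection, sources: dict) -> None:
    # sources maps a source label to {entity_id: {attribute: value}}.
    for label, records in sources.items():
        for entity_id, attrs in records.items():
            # Naive deconfliction for the base row: the first source loaded wins.
            conn.execute(
                "INSERT OR IGNORE INTO entity (id, name, city) VALUES (?, ?, ?)",
                (entity_id, attrs["name"], attrs["city"]),
            )
            # Every value is kept as a version, including those that lost above.
            conn.executemany(
                "INSERT INTO attribute_version (entity_id, attribute, value, source) "
                "VALUES (?, ?, ?, ?)",
                [(entity_id, attr, value, label) for attr, value in attrs.items()],
            )
    conn.commit()


def conflicts(conn: sqlite3.Connection) -> list:
    # (entity_id, attribute, number of distinct values) where sources disagree.
    return conn.execute(
        """SELECT entity_id, attribute, COUNT(DISTINCT value)
           FROM attribute_version
           GROUP BY entity_id, attribute
           HAVING COUNT(DISTINCT value) > 1
           ORDER BY entity_id, attribute"""
    ).fetchall()


def versions(conn: sqlite3.Connection, entity_id: str, attribute: str) -> list:
    # All stored versions of one attribute of one entity, with their sources.
    return conn.execute(
        """SELECT source, value FROM attribute_version
           WHERE entity_id = ? AND attribute = ?
           ORDER BY source""",
        (entity_id, attribute),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    load(conn, {"A": SOURCE_A, "B": SOURCE_B})
    print(conn.execute("SELECT city FROM entity WHERE id = 'e2'").fetchone())  # ('Norfolk',)
    print(conflicts(conn))               # [('e1', 'name', 2), ('e2', 'city', 2)]
    print(versions(conn, "e2", "city"))  # [('A', 'Norfolk'), ('B', 'Roanoke')]
```

Keeping the alternative values in a separate table means the conflicts remain queryable after a single value has been chosen for the base row. The first-source-wins rule is only a placeholder for whatever deconfliction policy an analyst would actually apply.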

Bibliographic details

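  • Author: Bell, Charles A.
  • Affiliation: Virginia Commonwealth University
  • Degree-granting institution: Virginia Commonwealth University
  • Subjects: Computer Science; Engineering, General
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 367
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology; Basic engineering sciences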
