Journal: International Journal of Open Source Software & Processes

Efficient Algorithms for Cleaning and Indexing of Graph data



Abstract

Information extraction and analysis from very large graph data sets is expanding rapidly. Survey results indicate that 80% of researchers spend more than 40% of their project time on data cleaning, which signals a strong need for better cleaning methods. Because of the characteristics of big data, storage and retrieval is another major concern, and it is addressed by data indexing. Existing data cleaning techniques try to clean graph data based on information such as structural attributes and event log sequences. Cleaning graph data on a single piece of information alone does not improve computational performance. Labels can be inconsistent as well as nodes, so it is highly desirable to clean both. This paper addresses the issue by proposing a graph data cleaning algorithm that detects unstructured information together with inconsistent labeling, cleans the data by applying rules, and verifies the result by checking for data inconsistency. The authors also propose an indexing algorithm based on the CSS-tree to build efficient and scalable graph indexing on top of Hadoop.
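The detect, clean, and verify steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. The `Graph` type, the `CANONICAL_LABELS` rule table, and the two checks (non-canonical labels and edges that point at missing nodes) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical rule table: maps known label variants to one canonical label.
CANONICAL_LABELS = {"person": "Person", "PERSON": "Person", "org": "Organization"}


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> label
    edges: set = field(default_factory=set)    # (source id, target id) pairs


def normalize_label(label):
    """Apply the rules: trim whitespace, then map variants to the canonical form."""
    trimmed = label.strip()
    return CANONICAL_LABELS.get(trimmed, trimmed)


def find_issues(graph):
    """Detect inconsistent labels and edges that reference missing nodes."""
    issues = []
    for node_id, label in graph.nodes.items():
        if normalize_label(label) != label:
            issues.append(("label", node_id))
    for src, dst in graph.edges:
        if src not in graph.nodes or dst not in graph.nodes:
            issues.append(("dangling_edge", (src, dst)))
    return issues


def clean(graph):
    """Return a new graph with labels normalized and dangling edges dropped."""
    nodes = {n: normalize_label(label) for n, label in graph.nodes.items()}
    edges = {(s, d) for s, d in graph.edges if s in nodes and d in nodes}
    return Graph(nodes, edges)


if __name__ == "__main__":
    g = Graph(
        nodes={"a": "person", "b": "Person", "c": " org "},
        edges={("a", "b"), ("a", "z")},  # "z" is not a known node
    )
    print(find_issues(g))
    cleaned = clean(g)
    # Verification step: the cleaned graph should report no remaining issues.
    print(find_issues(cleaned))
```

Running `find_issues` again on the cleaned graph plays the role of the verification step: an empty result means the assumed rules resolved every inconsistency they were able to detect.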
