Conference: IEEE International Conference on Data Science and Advanced Analytics

From one star to three stars: Upgrading legacy open data using crowdsourcing


Abstract

Despite recent open data initiatives in many countries, a significant percentage of the published data is provided in non-machine-readable formats such as images rather than in machine-readable electronic formats, which restricts its usability. This paper describes the first unified framework for converting legacy open data in image format into a machine-readable, reusable format by using crowdsourcing. Crowd workers are asked not only to extract data from an image of a chart but also to reproduce the chart objects in spreadsheets. The properties of the reconstructed chart objects expose their data structures, including series names and values, which are useful for automatic processing by computer. Because crowdsourced results inherently contain errors, a quality control mechanism was developed that improves the accuracy of the extracted tables by aggregating the tables created by different workers for the same chart image and by utilizing the data structures obtained from the reproduced chart objects. Experimental results demonstrated that the proposed framework and mechanism are effective.
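
The abstract does not spell out how tables from different workers are aggregated, so the following is only an illustrative sketch of one simple possibility: a cell-wise majority vote. It assumes each worker's table has already been reduced to a mapping from (series name, category) to a value; the function name `aggregate_tables` and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def aggregate_tables(tables):
    """Cell-wise majority vote (an assumed scheme, not the paper's mechanism).

    Each table maps a (series_name, category) key to the value a worker
    entered for that cell. For every cell, the most frequent value wins.
    """
    votes = {}
    for table in tables:
        for key, value in table.items():
            votes.setdefault(key, Counter())[value] += 1
    # most_common(1) returns the single (value, count) pair with the top count.
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


# Hypothetical tables from three workers for the same chart image.
worker_tables = [
    {("Sales", "2010"): 120, ("Sales", "2011"): 135},
    {("Sales", "2010"): 120, ("Sales", "2011"): 153},  # transposed digits
    {("Sales", "2010"): 120, ("Sales", "2011"): 135},
]

aggregated = aggregate_tables(worker_tables)
```

Here the single transposition error in the second worker's table is outvoted by the other two workers. A real mechanism would also need to align tables whose rows or series are ordered or named differently, which is where the structure recovered from the reproduced chart objects would come in.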