Hybrid human-machine vision systems : image annotation using crowds, experts and machines

Abstract

The amount of digital image and video data keeps increasing at an ever-faster rate. While "big data" holds the promise of leading science to new discoveries, raw image data in itself is not of much use. In order to statistically analyze the data, it must be quantified and annotated. We argue that entirely automated methods are not accurate enough to annotate data in the short term. Crowdsourcing is an alternative that provides higher accuracy, but is too expensive to scale to millions of images. Instead, the solution is hybrid human-machine vision systems, where the work of both humans and machines is balanced to be as cost-effective and accurate as possible. With this goal in mind, we begin by categorizing different types of image annotations, and describe how nonexpert annotators can be trained to carry out challenging image annotation tasks. Having identified which types of annotations are appropriate for most tasks, including binary, confidence, pair-wise and continuous annotations, we present models for crowdsourcing annotations from hundreds of expert and nonexpert annotators (humans). By trading off the bias and expertise of multiple annotators, we show that it is possible to achieve high-quality annotations with very few labels. We show that the number of labels can be further reduced by actively choosing the best annotators to carry out most of the work. Finally, we study the problem of estimating the performance of automated classifiers (machines) used to annotate large datasets where few ground truth labels are available. Using a semisupervised model for classifier confidence scores, we show that it is possible to accurately estimate classifier performance with very few labels.

Bibliographic record

  • Author: Welinder Nils Peter Egon
  • Year: 2012
  • Original format: PDF
  • Language: English
