
Learning Visual Classifiers From Limited Labeled Images.


Abstract

Recognizing humans and their activities from images and video is one of the key goals of computer vision. While supervised learning algorithms like Support Vector Machines and Boosting have offered robust solutions, they require a large amount of labeled data for good performance. It is often difficult to acquire large labeled datasets due to the significant human effort involved in data annotation. However, it is considerably easier to collect unlabeled data due to the availability of inexpensive cameras and large public databases like Flickr and YouTube. In this dissertation, we develop efficient machine learning techniques for visual classification from a small amount of labeled training data by utilizing the structure in the testing data, labeled data in a different domain, and unlabeled data.

This dissertation has three main parts. In the first part of the dissertation, we consider how multiple noisy samples available during testing can be utilized to perform accurate visual classification. Such multiple samples are easily available in video-based recognition problems, which are commonly encountered in visual surveillance. Specifically, we study the problem of unconstrained human recognition from iris images. We develop a Sparse Representation-based selection and recognition scheme, which learns the underlying structure of clean images. This learned structure is utilized to develop a quality measure, and a quality-based fusion scheme is proposed to combine the varying evidence. Furthermore, we extend the method to incorporate privacy, an important requirement in practical biometric applications, without significantly affecting the recognition performance.

In the second part, we analyze the problem of utilizing labeled data in a different domain to aid visual classification. We consider the problem of shifts in acquisition conditions during training and testing, which is very common in iris biometrics. In particular, we study the sensor mismatch problem, where the training samples are acquired using a sensor much older than the one used for testing. We provide one of the first solutions to this problem, a kernel learning framework to adapt iris data collected from one sensor to another. Extensive evaluations on iris data from multiple sensors demonstrate that the proposed method leads to considerable improvement in cross-sensor recognition accuracy. Furthermore, since the proposed technique requires minimal changes to the iris recognition pipeline, it can easily be incorporated into existing iris recognition systems.

In the last part of the dissertation, we analyze how unlabeled data available during training can assist visual classification applications. Here, we consider still image-based vision applications involving humans, where explicit motion cues are not available. A human pose often conveys not only the configuration of the body parts, but also implicit predictive information about the ensuing motion. We propose a probabilistic framework to infer this dynamic information associated with a human pose, using unlabeled and unsegmented videos available during training. The inference problem is posed as a non-parametric density estimation problem on non-Euclidean manifolds. Since direct modeling is intractable, we develop a data-driven approach, estimating the density for the test sample under consideration. Statistical inference on the estimated density provides us with quantities of interest like the most probable future motion of the human and the amount of motion information.

Bibliographic details

  • Author

    Pillai, Jaishanker K.

  • Author affiliation

    University of Maryland, College Park.

  • Degree-granting institution: University of Maryland, College Park.
  • Discipline: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 153 p.
  • Total pages: 153
  • Original format: PDF
  • Language of text: eng
