
A bottom-up extraction of atomic feature vectors and action sequences for video representation.



Abstract

In this dissertation we aim to demonstrate novel applications of multi-object trackers for video representation. In our approach we first segment object tracks, extract features on these segments, and then use these features to build a custom vocabulary with which to annotate the segments. Similar to existing approaches to the problem of video, clip, and action-unit matching, we extract descriptors for a video and use a bag-of-words approach to label videos holistically. Unlike existing approaches, however, we make no assumptions about the dictionary size, object types, or video classes. Instead of annotating frames, we annotate sub-tracks, which provide some level of intrinsic semantics (conceptually similar to action units). We combine appearance-based and behavior-based features for each tracked object segment, incorporate appearance dynamics via temporal change, and learn the vocabulary via unsupervised clustering. In this work, we use crowdsourced annotations to evaluate each step of our approach, namely the tasks of track segmentation, icon selection, and descriptor clustering for dictionary building. For the evaluation of track segmentation we also needed to introduce a novel way to generate ground truth for temporal segmentation tasks. The contributions of this thesis are as follows:

1. Cluster analysis of visual data captured in small tracked windows (Chapter 2: Clustering Analysis).
2. Segmentation of data tracks into salient sub-tracks (Chapter 3: Track Segmentation), including a novel approach to extracting temporal segmentation ground truths from crowdsourced annotations.
3. Joint appearance-behavior feature extraction from sub-tracks (Chapter 4).
4. Automatic dictionary discovery and video sub-track annotation for ranked video matching using sub-tracks and a learned appearance-behavior dictionary (Chapter 5).
5. Ground truth collection and exploitation for temporal segmentation and iconic Poselet selection (Sections 3.3.1 and 4.2.1, respectively).

In each of these chapters we look at commonly used algorithms for the task, explore related work, and evaluate their performance against crowdsourced ground truths.
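As a rough illustration of the vocabulary-learning and annotation steps described above, the following Python sketch clusters joint appearance-behavior descriptors into a visual-word dictionary with k-means and builds a per-video bag-of-words histogram. The abstract does not specify the actual features, clustering algorithm, or the automatic dictionary-size selection the thesis uses; the fixed k, random descriptors, and function names below are illustrative assumptions only.

```python
# Minimal sketch of vocabulary learning and sub-track annotation.
# Assumption: k-means with a fixed k stands in for the thesis's
# unsupervised clustering with learned dictionary size.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(subtrack_descriptors: np.ndarray, k: int) -> KMeans:
    """Cluster joint appearance-behavior descriptors (one row per
    sub-track) into k visual-word centroids."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(subtrack_descriptors)

def annotate(vocab: KMeans, descriptors: np.ndarray) -> np.ndarray:
    """Label each sub-track with its nearest dictionary word."""
    return vocab.predict(descriptors)

def video_histogram(labels: np.ndarray, k: int) -> np.ndarray:
    """Bag-of-words representation: normalized word counts for one video."""
    hist = np.bincount(labels, minlength=k).astype(float)
    return hist / max(hist.sum(), 1.0)

# Hypothetical usage: 200 sub-tracks with 64-dim descriptors, 10-word dictionary.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 64))
vocab = build_vocabulary(descriptors, k=10)
h = video_histogram(annotate(vocab, descriptors), k=10)
# Videos can then be ranked by comparing histograms, e.g. with cosine similarity.
```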

Record details

  • Author: Burlick, Matthew.
  • Affiliation: Stevens Institute of Technology.
  • Degree grantor: Stevens Institute of Technology.
  • Subject: Computer Science; Multimedia Communications.
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 114 p.
  • Total pages: 114
  • Format: PDF
  • Language: English

