
Identifying latent attributes from video scenes using knowledge acquired from large collections of text documents.



Abstract

Peter Drucker, a well-known and influential writer and philosopher in the field of management theory and practice, once claimed that "the most important thing in communication is hearing what isn't said." It is not difficult to see that a similar idea also holds in the context of video scene understanding. In almost every non-trivial video scene, the most important elements, such as the motives and intentions of the actors, can never be seen or directly observed, yet identifying these latent attributes is crucial to our full understanding of the scene. That is to say, latent attributes matter.

In this work, we explore the task of identifying latent attributes in video scenes, focusing on the mental states of participant actors. We propose a novel approach to the problem based on the use of large text collections as background knowledge and minimal information about the videos, such as activity and actor types, as query context. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms as well as their distribution weights. We develop and test several largely unsupervised information extraction models that identify the mental state labels of human participants in video scenes given some contextual information about the scenes. We show that these models produce complementary information; their combination significantly outperforms the individual models and improves over several baseline methods on two different datasets. We present an extensive analysis of our models and close with a discussion of our findings, along with a roadmap for future research.
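The abstract describes the approach only at a high level. As a rough, illustrative sketch of the general idea of ranking candidate mental-state labels by their corpus-derived association with query context terms (such as the activity and actor types), one might use a simple PMI-style co-occurrence score; the function names, scoring choice, and toy data below are assumptions for illustration only, not the dissertation's actual models.

# Illustrative sketch only: rank candidate mental-state labels by their
# average PMI with query context terms, estimated from a text corpus used
# as background knowledge. Hypothetical names and toy data throughout.
from collections import Counter
from itertools import combinations
import math

def score_mental_states(corpus_sentences, context_terms, candidate_states):
    """Rank candidate mental-state terms by average pointwise mutual
    information (PMI) with the context terms, estimated sentence-wise."""
    term_counts = Counter()
    pair_counts = Counter()
    n = len(corpus_sentences)
    for sentence in corpus_sentences:
        tokens = set(sentence.lower().split())
        for t in tokens:
            term_counts[t] += 1
        for a, b in combinations(sorted(tokens), 2):
            pair_counts[(a, b)] += 1

    def pmi(a, b):
        a, b = sorted((a, b))
        joint = pair_counts[(a, b)]
        if joint == 0 or term_counts[a] == 0 or term_counts[b] == 0:
            return 0.0
        return math.log((joint * n) / (term_counts[a] * term_counts[b]))

    scores = {}
    for state in candidate_states:
        # Average association of the candidate state with all context terms.
        scores[state] = sum(pmi(state, c) for c in context_terms) / len(context_terms)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example query context: an "argument" activity with "parent" actors.
corpus = [
    "the parent was angry during the argument",
    "she felt frustrated and angry after the argument",
    "the children played happily in the park",
]
print(score_mental_states(corpus, ["argument", "parent"],
                          ["angry", "frustrated", "happy"]))

In this toy run, "angry" and "frustrated" score above "happy" because they co-occur with the context terms in the corpus sentences; the dissertation's models combine several such evidence sources rather than a single score.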

Bibliographic details

  • Author

    Tran, Anh Xuan

  • Author affiliation

    The University of Arizona

  • Degree-granting institution: The University of Arizona
  • Subjects: Artificial intelligence; Information science; Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pagination: 134 p.
  • Total pages: 134
  • Format: PDF
  • Language: eng
