Deep Visual-semantic for Crowded Video Understanding

Abstract

Visual-semantic features play a vital role in crowded video understanding. Convolutional Neural Networks (CNNs) have achieved a significant breakthrough in learning representations from images. However, learning visual-semantic features, and extracting them effectively for video analysis, remains a challenging task. In this study, we propose a novel visual-semantic method to capture both appearance and dynamic representations. In particular, we propose a spatial context method based on fractional Fisher vector (FV) encoding of CNN features, which can be regarded as our main contribution. In addition, to capture temporal context information, we also apply the fractional encoding method to dynamic images. Experimental results on the WWW crowded video dataset demonstrate that the proposed method outperforms the state of the art.
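
The abstract does not say how the dynamic images are computed. As a rough illustration only, the sketch below assumes one commonly used formulation, approximate rank pooling with the simplified weights alpha_t = 2t - T - 1, which collapses a clip of T frames into a single image that summarizes its temporal evolution. The function name `dynamic_image` and the flat-list frame layout are hypothetical choices for this example, not details from the paper, and the fractional encoding step is not shown because the abstract does not define it.

```python
def dynamic_image(frames):
    """Collapse a clip into one 'dynamic image' by approximate rank pooling.

    ASSUMPTION: this uses the simplified weights alpha_t = 2t - T - 1
    (t = 1..T); the paper's exact construction is not given in the abstract.

    frames: list of T frames, each a flat list of pixel values of equal length.
    Returns a flat list with the weighted sum of the frames.
    """
    T = len(frames)
    if T == 0:
        raise ValueError("need at least one frame")
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all frames must have the same number of pixels")

    out = [0.0] * n
    for t, frame in enumerate(frames, start=1):
        # Earlier frames get negative weight, later frames positive weight.
        alpha = 2 * t - T - 1
        for i, value in enumerate(frame):
            out[i] += alpha * value
    return out
```

Because the weights sum to zero, pixels that never change cancel out, so the result emphasizes regions where the appearance changes over time. In practice the output would typically be rescaled to a valid image range before being passed to a CNN.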

Bibliographic details

Source
SPIE Conference on Multispectral Image Processing and Pattern Recognition | 2017 | 1 v. (loose-leaf) | 5 pages
Authors
Chunhua Deng; Junwen Zhang
Original format: PDF
Chinese Library Classification: N-532
Keywords
Visual-semantic feature; crowded; spatial context; dynamic representation

Similar literature

1. Crowded Scene Understanding by Deeply Learned Volumetric Slices [J]. Jing Shao, Chen Change Loy, Kai Kang. IEEE Transactions on Circuits and Systems for Video Technology, 2017, No. 3
2. Capturing and Understanding Workers' Activities in Far-Field Surveillance Videos with Deep Action Recognition and Bayesian Nonparametric Learning [J]. Luo Xiaochun, Li Heng, Yang Xincong. Computer-Aided Civil and Infrastructure Engineering, 2019, No. 4
3. Automatic content understanding with cascaded spatial-temporal deep framework for capsule endoscopy videos [J]. Chen Honghan, Wu Xiao, Tao Gan. Neurocomputing, 2017, No. MARa15
4. Deep Visual-semantic for Crowded Video Understanding [C]. Chunhua Deng, Junwen Zhang. International Symposium on Multispectral Image Processing and Pattern Recognition, 2017
5. Deep Learning for Medical Video Analysis and Understanding [D]. Li, Ying. 2021
6. Combined 5G-Based Video Production and Distribution in a Crowded Stadium Event [O]. Ioannis P. Chochliouros, Anastasia S. Spiliopoulou, Pavlos Lazaridis
7. Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss [O]. Panagiotis Paraskevas Filntisis, Niki Efthymiou, Gerasimos Potamianos. 2020
