Journal of Visual Communication & Image Representation

Video action recognition based on visual rhythm representation



Abstract

Advances in video acquisition and storage technologies have created a great demand for the automatic recognition of actions. The use of cameras for security and surveillance purposes has applications in several scenarios, such as airports, parks, banks, stations, roads, hospitals, supermarkets, industrial facilities, stadiums, and schools. An inherent difficulty of the problem is the complexity of the scene under usual recording conditions, which may contain complex background and motion, multiple people in the scene, interactions with other actors or objects, and camera motion. The most recent databases are built primarily from recordings shared on YouTube and from movie snippets, settings in which these obstacles are not controlled. Another difficulty is the impact of the temporal dimension, since it expands the size of the data, increasing computational cost and storage space. In this work, we present a methodology for volume description using the Visual Rhythm (VR) representation. This technique reshapes the original video volume into an image, on which two-dimensional descriptors are computed. We investigated different strategies for constructing the representation by combining configurations in several image domains and traversing directions of the video frames. From this, we propose two feature extraction methods, Naive Visual Rhythm (Naive VR) and Visual Rhythm Trajectory Descriptor (VRTD). The first approach is the straightforward application of the technique to the original video volume, forming a holistic descriptor that treats action events as patterns and shapes in the visual rhythm image. The second variation focuses on the analysis of small neighborhoods obtained from the dense trajectory extraction process, which allows the algorithm to capture details missed by the global description. We tested our methods on eight public databases: one of hand gestures (SKIG), two in first person (DogCentric and JPL), and five in third person (Weizmann, KTH, MuHAVi, UCF11 and HMDB51). The results show that the developed techniques are able to extract motion elements along with shape and appearance information, achieving accuracy rates competitive with state-of-the-art action recognition approaches. (c) 2020 Elsevier Inc. All rights reserved.
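To make the construction of a Visual Rhythm image concrete, the sketch below (not the authors' code) shows the basic idea described in the abstract: each frame contributes one 1-D pixel line, and the lines are stacked over time into a single 2-D image on which ordinary 2-D descriptors can then be computed. The function name, the use of OpenCV/NumPy, and the choice of the central row or column as the sampled line are illustrative assumptions.

```python
import cv2
import numpy as np

def visual_rhythm(video_path: str, direction: str = "horizontal") -> np.ndarray:
    """Stack one pixel line per frame into a 2-D visual rhythm image."""
    cap = cv2.VideoCapture(video_path)
    slices = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if direction == "horizontal":
            slices.append(gray[gray.shape[0] // 2, :])   # central row of the frame
        else:
            slices.append(gray[:, gray.shape[1] // 2])   # central column of the frame
    cap.release()
    return np.stack(slices, axis=0)                      # one row per frame: time runs vertically

# Example (paths and descriptor choice are placeholders):
# rhythm = visual_rhythm("some_video.avi")
# hog = cv2.HOGDescriptor()
# features = hog.compute(cv2.resize(rhythm, (64, 128)))  # any 2-D descriptor can be applied
```

In this reading, the Naive VR method would describe the whole rhythm image holistically, while VRTD would instead describe small neighborhoods around dense trajectories; both operate on 2-D data rather than on the full video volume.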
