Electronic Letters on Computer Vision and Image Analysis (ELCVIA)

An Adaptive and Integrated Multimodal Sensing And Processing Framework For Long-Range Moving Object Detection And Classification


Abstract

In applications such as surveillance, inspection, and traffic monitoring, long-range detection and classification of targets (vehicles, humans, etc.) is a highly desired capability for a sensing system. A single modality can no longer provide the required performance, because large sensing distances lead to low resolutions, noisy sensor signals, and a variety of confounding environmental factors. Multimodal sensing and processing, on the other hand, can provide complementary information from heterogeneous sensor modalities, such as audio, visual, and range sensors. However, effective sensing mechanisms and systematic approaches for multimodal sensing and processing are still lacking. This thesis proposes a systematic framework for Adaptive and Integrated Multimodal Sensing and Processing (AIM-SP) that integrates novel multimodal long-range sensors, adaptive feature selection, and learning-based object detection and classification. Based on the AIM-SP framework, we make three unique contributions. First, we design a novel multimodal sensor system called Vision-Aided Automated Vibrometry (VAAV), consisting of a laser Doppler vibrometer (LDV) and a pair of pan-tilt-zoom (PTZ) cameras, which automatically obtains visual, range, and acoustic signatures for moving object detection at large distances. It provides closed-loop adaptive sensing: good surface points are determined and the laser beam of the LDV is quickly focused, based on target detection, surface selection, and distance measurements from the PTZ pair, together with acoustic signal feedback from the LDV. Second, multimodal data of vehicles on both local roads and highways, acquired from multiple sensing sources, are integrated and represented in a Multimodal Temporal Panorama (MTP) for easy alignment and fast labeling of the visual, audio, and range data. Detection accuracy is improved by using multiple modalities, and a visual reconstruction method is developed to remove occlusions, motion blur, and perspective distortions of moving vehicles, yielding scale- and perspective-invariant visual vehicle features. The MTP concept is not limited to visual and audio information; it applies to any modality that can be presented on the same time axis. Third, with various types of features extracted from the aligned multimodal samples, we study feature modality selection using two approaches: multi-branch sequential-based feature searching (MBSF) and boosting-based feature learning (BBFL). Our implementation uses three types of visual features: aspect ratio and size (ARS), histograms of oriented gradients (HOGs), and shape profile (SP), representing simple global scale features, statistical features, and global structure features, respectively. The audio features include short-time energy (STE); spectral features (SPECs), consisting of spectral energy, entropy, flux, and centroid; and perceptual features (PERCs), namely Mel-frequency cepstral coefficients (MFCCs). The effectiveness of multimodal feature selection is thoroughly studied through empirical experiments.
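To make the audio-feature side concrete, the following is a minimal NumPy sketch of the STE and SPEC features named above (short-time energy, spectral energy, entropy, flux, and centroid). The frame length, hop size, and epsilon guard are illustrative assumptions, not the thesis's actual settings.

```python
# Minimal sketch of STE and the SPEC feature group, using plain NumPy.
# Frame length, hop size, and eps are illustrative choices.
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    n = 1 + max(0, len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def audio_features(x, sr=16000, eps=1e-10):
    frames = frame_signal(np.asarray(x, dtype=float))
    ste = np.mean(frames ** 2, axis=1)                      # short-time energy
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2         # per-frame power spectrum
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    energy = spec.sum(axis=1)                               # spectral energy
    p = spec / (energy[:, None] + eps)                      # normalized distribution
    entropy = -np.sum(p * np.log2(p + eps), axis=1)         # spectral entropy
    mag = np.sqrt(spec)
    flux = np.r_[0.0, np.sqrt(((mag[1:] - mag[:-1]) ** 2).sum(axis=1))]  # spectral flux
    centroid = (freqs * spec).sum(axis=1) / (energy + eps)  # spectral centroid
    # The PERC group (MFCCs) is typically computed with a library,
    # e.g. librosa.feature.mfcc(y=x, sr=sr, n_mfcc=13).
    return np.column_stack([ste, energy, entropy, flux, centroid])
```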
The performance of MBSF and BBFL is compared on our own dataset, which contains over 3,000 samples of mainly four types of moving vehicles, namely sedans, pickup trucks, vans, and buses, under various conditions. From this dataset, a subset of 667 multimodal vehicle samples is made publicly available at: http://www.cse.ohio-state.edu/otcbvs-bench/. A number of important observations on the strengths and weaknesses of these features and their combinations are made as well.
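For intuition about the first selection approach, below is a simplified, single-branch sequential forward search over feature groups; the thesis's MBSF keeps multiple candidate branches alive, and the SVM classifier and 5-fold scoring here are illustrative stand-ins.

```python
# Simplified single-branch sequential forward search over feature groups,
# illustrating the idea behind MBSF. Classifier and scoring are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_select(groups, y, cv=5):
    """groups maps a feature-group name (e.g. 'HOG', 'STE') to an (n, d) array."""
    chosen, best_score, remaining = [], -np.inf, set(groups)
    while remaining:
        scores = {}
        for g in remaining:
            X = np.hstack([groups[k] for k in chosen + [g]])
            scores[g] = cross_val_score(SVC(), X, y, cv=cv).mean()
        g_best = max(scores, key=scores.get)
        if scores[g_best] <= best_score:   # adding another group no longer helps
            break
        chosen.append(g_best)
        best_score = scores[g_best]
        remaining.remove(g_best)
    return chosen, best_score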
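In the same spirit, a rough stand-in for the boosting-based approach: AdaBoost with decision stumps (scikit-learn's default weak learner) is trained on the concatenated features, and the learned importances are summed per modality group. This is a generic sketch, not the thesis's actual BBFL algorithm; the group names and data shapes are assumed.

```python
# Rough sketch in the spirit of BBFL: boosted stumps over concatenated
# features, with learned importances aggregated per feature group.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def group_importances(groups, y, n_estimators=100):
    names = list(groups)
    X = np.hstack([groups[k] for k in names])
    clf = AdaBoostClassifier(n_estimators=n_estimators).fit(X, y)
    out, start = {}, 0
    for k in names:
        d = groups[k].shape[1]
        out[k] = float(clf.feature_importances_[start:start + d].sum())
        start += d
    return out   # e.g. {'HOG': 0.41, 'MFCC': 0.22, ...} (illustrative numbers)
```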
