Source: IEEE Transactions on Neural Networks and Learning Systems
Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks



Abstract

The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively little effort has been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1K high-resolution images covering diverse real-world scenes with various viewpoints, poses, occlusions, illuminations, and backgrounds; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research; we systematically summarize 32 popular models and evaluate 18 of the 32 models on seven data sets containing a total of about 97K images; and 3) we propose a simple general architecture, called the deep depth-depurator network (D³Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth-map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D³Net exceeds the performance of all prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D³Net can be used to efficiently extract salient object masks from real scenes, enabling an effective background-changing application at a speed of 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D³Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.
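The gating idea behind the depth depurator unit (discarding low-quality depth maps so fusion falls back to RGB-only features) can be sketched as follows. This is a minimal, illustrative numpy sketch, not the paper's implementation: in D³Net the filtering is learned jointly with the network, whereas here a hand-crafted histogram-entropy score stands in as an assumed quality proxy, and `depurate`, `depth_quality_score`, and the `threshold` value are hypothetical names chosen for this example.

```python
import numpy as np

def depth_quality_score(depth: np.ndarray) -> float:
    """Crude proxy for depth-map quality, normalized to [0, 1].

    Assumption (not from the paper): a usable depth map tends to have a
    wide, non-degenerate value distribution, so we score it by the
    normalized entropy of a 32-bin histogram.
    """
    d = depth.astype(np.float64)
    if d.max() == d.min():          # constant map carries no depth cue
        return 0.0
    hist, _ = np.histogram(d, bins=32, range=(d.min(), d.max()))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * np.log2(p)).sum()
    return float(entropy / np.log2(32))

def depurate(rgb_feat: np.ndarray, depth: np.ndarray,
             threshold: float = 0.3) -> np.ndarray:
    """Gate the depth stream: if the depth map looks degenerate, drop it
    so the downstream branch sees RGB features only; otherwise stack the
    depth map as an extra channel for cross-modal fusion."""
    if depth_quality_score(depth) < threshold:
        return rgb_feat                      # depth discarded
    return np.concatenate([rgb_feat, depth[None]], axis=0)
```

For example, a flat (all-zero) depth map scores 0 and is dropped, while a well-spread depth map is kept and stacked as a fourth channel alongside the three RGB feature channels.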
