Research on a Fine-Grained Image Classification Method Based on Grad-CAM and B-CNN

Deng Shaowei (邓绍伟); Zhang Boquan (张伯泉)

Abstract

Fine-grained images are characterized by small inter-class differences and large intra-class differences. The differences between images lie mainly in subtle local regions, so localizing those regions and extracting representative features from them has become one of the main research problems in fine-grained image classification. This paper studies a fine-grained image classification method based on Grad-CAM and the bilinear convolutional neural network (B-CNN). The method uses Grad-CAM to locate the salient region of the original image, crops that region and feeds it to the bilinear CNN, and fuses global and local features to complete the classification. Experiments on three datasets, CUB-200-2011, Stanford Dogs and Stanford Cars, show that, compared with traditional models, the method locates the salient feature regions of an image more accurately and achieves better classification results.
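The pipeline named in the abstract (Grad-CAM localization, cropping, bilinear pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no thresholds, backbone, or fusion details, so the function names, the `threshold=0.5` cutoff, and the signed-square-root plus L2 normalization step (a common choice in B-CNN implementations) are all assumptions made for illustration. The sketch also assumes the convolutional activations and the gradients of the class score with respect to them have already been obtained from some network.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one conv layer.

    activations, gradients: arrays of shape (K, H, W), where gradients holds
    d(class score)/d(activations). Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))              # per-channel weight: spatially averaged gradient
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)                         # ReLU keeps only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def salient_box(cam, threshold=0.5):
    """Bounding box (top, bottom, left, right) of cells with cam >= threshold.

    Bottom and right are exclusive. The threshold is a hypothetical value.
    Coordinates are in feature-map cells and would need rescaling to pixels.
    """
    rows, cols = np.where(cam >= threshold)
    if rows.size == 0:                                 # nothing salient: fall back to the full map
        return 0, cam.shape[0], 0, cam.shape[1]
    return int(rows.min()), int(rows.max()) + 1, int(cols.min()), int(cols.max()) + 1

def bilinear_pool(feat_a, feat_b):
    """Bilinear pooling of two feature maps with the same spatial size.

    feat_a: (Ca, H, W), feat_b: (Cb, H, W). Returns a vector of length Ca * Cb.
    """
    ca, h, w = feat_a.shape
    cb = feat_b.shape[0]
    a = feat_a.reshape(ca, h * w)
    b = feat_b.reshape(cb, h * w)
    x = (a @ b.T).ravel()                              # outer products summed over all locations
    x = np.sign(x) * np.sqrt(np.abs(x))                # signed square root (assumed normalization)
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x
```

One plausible way to fuse global and local features, again an assumption rather than the paper's stated design, is to compute `bilinear_pool` once on features of the full image and once on features of the cropped region, then concatenate the two vectors before the classifier.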
