In this paper, we strive to propose a self-interpretable framework, termed PrimitiveTree, that incorporates deep visual primitives condensed from deep features with a conventional decision tree, bridging the gap between deep features extracted from deep neural networks (DNNs) and trees' transparent decision-making processes. Specifically, we utilize a codebook, which embeds the continuous deep features into a finite discrete space (deep visual primitives) to distill the most common semantic information. The decision tree adopts the spatial location information and the mapped primitives to present the decision-making process of the deep features in a tree hierarchy. Moreover, the trained interpretable PrimitiveTree can inversely explain the constituents of the deep features, highlighting the most critical and semantic-rich image patches attributing to the final predictions of the given DNN. Extensive experiments and visualization results validate the effectiveness and interpretability of our method.
展开▼