Post-hoc Concept-Based Explanations
Liam Wachter
Post-hoc Concept-Based Explanations
Intrinsically explainable
Build or modify the training procedure so that predictions come with explanations.
Post-hoc explanations
Explain predictions of a large class of already trained models, without modifying training.
Post-hoc Concept-Based Explanations
Feature-based
Also: prototype-based, example importance
Image taken from [JO21], generated with the method of [Sel+17]
The paper ...
introduces relevant post-hoc concept-based explanation (PHCBEx) methods.
collects properties for good PHCBEx methods.
introduces an algorithmic template that every PHCBEx method follows.
attributes each property to one of the steps.
discusses current approaches and points out directions for future work.
Concept Activation Vectors (CAV) [Kim+17]
Separate concept activations from non-concept activations at a layer $\bm{f}[:\ell]: \mathbb{R}^N\to\mathbb{R}^L$ of the neural network $\bm{f}: \mathbb{R}^N \to \mathbb{R}^M$
Human-defined concepts $c \in [C]$
Examples and counterexamples of each concept: $P_c, A_c \subset \mathbb{R}^N$
The CAR approach [CS22] is similar, but uses an RBF kernel instead of a linear separator, yielding concept regions rather than directions.
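A minimal sketch of the CAV construction, assuming layer-$\ell$ activations for concept examples and counterexamples are already extracted; file names and the classifier choice are illustrative, not prescribed by [Kim+17]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Layer-l activations f[:l](x) for concept examples P_c and counterexamples A_c,
# one row per example (shape: n_examples x L); file names are hypothetical.
acts_concept = np.load("acts_concept.npy")
acts_negative = np.load("acts_negative.npy")

X = np.vstack([acts_concept, acts_negative])
y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_negative))])

# Linear classifier separating concept from non-concept activations.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The CAV is the (normalized) normal vector of the separating hyperplane.
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
```

For CAR [CS22], the linear model would roughly be swapped for a kernel classifier such as sklearn.svm.SVC(kernel="rbf"), so concept membership becomes a region in latent space rather than a half-space.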
Interpretable Basis Decomposition (IBD) [Zho+18]
Only the second-to-last layer is used, as a fixed $\ell$
For class $k$ and sparseness constraint $s$, learn the CAV equivalent:
$$\bm{f}(\bm{x})[k] = \bm{w}_k^T \bm{f}[:\ell](\bm{x}) + b_k$$
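IBD then expresses $\bm{w}_k$ in terms of interpretable concept vectors (hence the "basis decomposition" entries in the later tables). A hedged sketch, assuming a dictionary of CAV-like concept vectors is already available; plain non-negative least squares stands in for the greedy, sparsity-constrained selection of [Zho+18]:

```python
import numpy as np
from scipy.optimize import nnls

w_k = np.load("w_k.npy")                # hypothetical: final-layer weights of class k, shape (L,)
concepts = np.load("concept_cavs.npy")  # hypothetical: concept vectors as rows, shape (C, L)

# Non-negative decomposition w_k ≈ sum_c s_c q_c over the concept dictionary;
# IBD selects concepts greedily under the sparseness constraint s, this only
# gives the flavour.
coeffs, residual = nnls(concepts.T, w_k)
top = np.argsort(coeffs)[::-1][:5]
print("most contributing concepts:", top, coeffs[top])
```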
Automated Concept-based Explanation (ACE) [Gho+19]
Completeness-aware Concept-Based Explanations (CAE) [Yeh+20]
Ensure that ACE selects a set of concepts that is predictive of the class
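A hedged sketch of the underlying predictiveness check: project activations onto the concept vectors and test how well the model's predictions can be recovered from those concept scores alone. Names are illustrative, and the actual ConceptSHAP completeness score of [Yeh+20] is defined differently in detail:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

acts = np.load("layer_acts.npy")          # hypothetical, shape (n, L)
preds = np.load("model_predictions.npy")  # hypothetical: the model's predicted classes
concepts = np.load("concept_cavs.npy")    # hypothetical, shape (C, L)

# Concept scores: projection of each activation onto each concept vector.
scores = acts @ concepts.T                # shape (n, C)

# If the concept set is (nearly) complete, a simple probe on the scores
# should reproduce the network's own predictions.
probe = LogisticRegression(max_iter=1000).fit(scores, preds)
print("recovery of model predictions:", probe.score(scores, preds))
```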
Concept Recursive Activation FacTorization (CRAFT) [Fel+23b]
Segment the images of class $k$ and embed the segments: $\bm{f}[:\ell](\mathcal{X}_k) = \mathrm{A} \in \mathbb{R}^{P\times L}$
Apply dictionary learning (non-negative matrix factorization) to obtain $\mathrm{U}$ and $\mathrm{W}$:
$$(\mathrm{U}, \mathrm{W})= \argmin_{\mathrm{U}\geq 0, \mathrm{W}\geq 0} \frac{1}{2}\|\mathrm{A} - \mathrm{U}\mathrm{W}^T\|_F^2$$
$\mathrm{W}$: CAV equivalents
$\mathrm{U}$: coefficients of each segment's activation in the concept basis.
Repeat for multiple layers $\ell$ to obtain hierarchical concepts.
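A minimal sketch of the factorization step using scikit-learn's NMF, which solves exactly the non-negative problem above; loading segment activations from a file is an assumption, and CRAFT's image crops and recursion over layers are omitted:

```python
import numpy as np
from sklearn.decomposition import NMF

# A: layer-l activations of P segments of class k, shape (P, L); NMF requires
# non-negative entries (e.g. post-ReLU activations).
A = np.load("segment_acts.npy")  # hypothetical file

n_concepts = 10  # illustrative choice
nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)
U = nmf.fit_transform(A)   # (P, n_concepts): concept coefficients per segment
Wt = nmf.components_       # (n_concepts, L): rows are the CAV equivalents, i.e. W^T
```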
Invertible Concept-based Explanations (ICE) [Zha+21]
In the interest of time: very similar to CRAFT, but restricted to a single fixed layer $\ell$
Multidimensional Concept Discovery (MCD) [VBS23]
Find approximate basis vectors for each segment cluster with PCA:
$\mathcal{C}_i \in \{\mathcal{C}_1, \ldots, \mathcal{C}_{N_c}\}$ (each with dimension $D_i$)
Identify $\mathcal{C}_{N_c + 1} = \mathrm{span}(\mathcal{C}_1, \ldots, \mathcal{C}_{N_c})^\bot$
$$
\bm{f}[:\ell](\bm{x}) = \sum_{i=1}^{N_c} \sum_{j=1}^{D_i} \alpha_{ij} \bm{c}_{ij}\quad \bm{f}[:\ell](\bm{x}) = \sum_{i=1}^{N_c} \sum_{j=1}^{D_i} \beta_{ij} \bm{w}_{ij}
$$
$\mathcal{C}_{N_c + 1}$ is the residual space and is used for a completeness score:
$$\eta(\mathcal{C}_i) = \frac{1 - \|\bm{w}^\bot\|_2^2}{\|\bm{w}\|_2^2}$$
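A rough sketch of the per-cluster PCA step, assuming segment activations and their cluster assignments already exist; the dual basis $\{\bm{w}_{ij}\}$ and the completeness score are not reproduced here:

```python
import numpy as np
from sklearn.decomposition import PCA

acts = np.load("segment_acts.npy")          # hypothetical, shape (n_segments, L)
clusters = np.load("segment_clusters.npy")  # hypothetical: cluster id per segment

concept_bases = []
for c in np.unique(clusters):
    members = acts[clusters == c]
    # Keep enough principal directions to cover most of the cluster's variance;
    # their span is the multidimensional concept C_i with dimension D_i.
    pca = PCA(n_components=0.9, svd_solver="full").fit(members)
    concept_bases.append(pca.components_)   # shape (D_i, L)
```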
Systematization of explanations
Properties of good explanations
- Coherency: Examples of a concept should be perceptually similar to each other and dissimilar to examples of other concepts.
- Meaningfulness: A concept should carry semantic significance on its own, making it recognizable without further context.
- Importance: The importance of a concept is defined as its relevance for the model's prediction.
- Completeness: A set of concepts is complete if it is sufficient to explain the model's prediction, i.e., no other class could be explained by this set of concepts.
- Compactness: The explanation should output a minimal set of concepts while still being complete.
Systematization of explanations
Tasks
- Candidate concept acquisition: A set of human-interpretable concepts is obtained that the explanation method can use to explain the model's predictions.
- Relevant concept selection: The set of concepts is reduced to a subset that is relevant for the predictions to be explained.
- Concept importance calculation: The contribution of each concept to the model's prediction is quantified by an importance score.
Candidate concept acquisition
Easy way, human-defined concepts: CAV, CAR, IBD
Others (ACE, CAE, CRAFT, ICE, MCD): use segmentation/dense areas of feature importance¹ and cluster the segments (sketched below)
Segmentation → Meaningfulness; Clustering → Coherency
¹ Feature importance and segmentation are closely related [Sel+17]
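A hedged sketch of the segmentation-and-clustering route (ACE-style), with SLIC superpixels and k-means; the embedding stand-in, file names, and all parameters are illustrative:

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def embed(patches):
    # Stand-in for f[:l]: in practice, run the patches through the trained
    # network up to layer l; here we flatten so the sketch runs end to end.
    return patches.reshape(len(patches), -1)

images = np.load("class_images.npy")  # hypothetical, shape (n, H, W, 3)

patches = []
for img in images:
    segments = slic(img, n_segments=15, compactness=20)
    for s in np.unique(segments):
        mask = segments == s
        patches.append(img * mask[..., None])  # keep only this segment

# Cluster the segments in latent space; each cluster is a candidate concept
# (segmentation -> meaningfulness, clustering -> coherency).
acts = embed(np.stack(patches))
candidate_concept_of_patch = KMeans(n_clusters=25, n_init=10).fit_predict(acts)
```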
Train a classifier to obtain CAVs
The choice of classifier determines the representation of concepts in latent space:
Representation | Method
Canonical basis | Exp. of [Bau+17]
Orthogonal directions | Exp. of [Gra+23]
Approx. orthogonal directions | ICE, CRAFT
Linear directions | ACE, IBD, CAE, CAV
Linear subspaces | MCD
Smooth regions | CAR
Relevant Concept Selection
The latent-space representations of the concept candidates present in the input are compared to the latent-space representations of the available concepts
This requires some notion of similarity
The choice of similarity influences completeness and compactness:
Similarity | Method
Angle | CAV, ACE, CAE
Distance | CAR
Basis decomposition | IBD, ICE, CRAFT, MCD
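A small illustration of the two simplest similarity notions, assuming a single activation vector and concept representations are at hand (illustrative only; CAR actually measures membership in a kernel-defined region, not a plain distance):

```python
import numpy as np

act = np.load("activation.npy")  # hypothetical: f[:l](x) for one input
cav = np.load("cav.npy")         # hypothetical: a concept direction

# Angle-based presence of the concept (CAV/ACE/CAE flavour).
cosine = act @ cav / (np.linalg.norm(act) * np.linalg.norm(cav))

# Distance-based presence (CAR flavour, simplified to a cluster centre).
centre = np.load("concept_centre.npy")  # hypothetical
distance = np.linalg.norm(act - centre)
```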
Importance scores
Importance Score | Method
TCAV | CAV, ACE, ICE
TCAR | CAR
ConceptSHAP | CAE
Total Sobol indices | CRAFT
Basis decomposition | IBD, MCD
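As an example of an importance score, a hedged sketch of TCAV: the fraction of inputs whose class-$k$ logit increases when the layer-$\ell$ activation is perturbed along the CAV. It assumes the network is available split as f_head $= \bm{f}[:\ell]$ and f_tail $= \bm{f}[\ell:]$ (PyTorch, illustrative):

```python
import torch

def tcav_score(f_head, f_tail, inputs, cav, class_k):
    """Fraction of inputs whose class-k logit has a positive directional
    derivative along the CAV at layer l (the TCAV score of [Kim+17])."""
    acts = f_head(inputs).detach().requires_grad_(True)  # f[:l](x)
    logits = f_tail(acts)[:, class_k]                     # f[l:] on the activations
    grads = torch.autograd.grad(logits.sum(), acts)[0]
    # Directional derivative of the logit along the concept direction.
    dir_deriv = grads.flatten(start_dim=1) @ cav
    return (dir_deriv > 0).float().mean().item()
```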
More in the paper.
Feel free to ask any questions.
References
[JO21] Hyungsik Jung and Youngrock Oh. “Towards better explanations of class activation mapping”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 1336–1344.
[Kim+17] Been Kim et al. “Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)”. In: Proc. of the International Conference on Machine Learning. Vol. 35. 2017, pp. 2668–2677.
[Sel+17] Ramprasaath Selvaraju et al. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization”. In: Proc. of the IEEE International Conference on Computer Vision (ICCV). 2017, pp. 618–626.
[Zho+18] Bolei Zhou et al. “Interpretable Basis Decomposition for Visual Explanation”. In: Proc. of the European Conference on Computer Vision. Vol. 15. 2018, pp. 122–138.
[CS22] Jonathan Crabbé and Mihaela van der Schaar. “Concept Activation Regions: A Generalized Framework For Concept-Based Explanations”. In: arXiv (2022).
[Gho+19] Amirata Ghorbani et al. “Towards Automatic Concept-based Explanations”. In: Advances in Neural Information Processing Systems. Vol. 32. 2019, pp. 9277–9286.
[Fel+23b] Thomas Fel et al. “Craft: Concept Recursive Activation Factorization for Explainability”. In: Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, pp. 2711–2721.
[Yeh+20] Chih-Kuan Yeh et al. “On Completeness-aware Concept-Based Explanations in Deep Neural Networks”. In: Advances in Neural Information Processing Systems 33 (2020), pp. 20554–20565.
[Zha+21] Ruihan Zhang et al. “Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors”. In: Proc. of the AAAI Conference on Artificial Intelligence. Vol. 35. 13. 2021, pp. 11682–11690.
[VBS23] Johanna Vielhaben, Stefan Blücher, and Nils Strodthoff. “Multidimensional concept discovery (MCD): A unifying framework with completeness guarantees”. In: arXiv (2023).
[Bau+17] David Bau et al. “Network Dissection: Quantifying Interpretability of Deep Visual Representations”. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 6541–6549.
[Gra+23] Mara Graziani et al. “Concept discovery and Dataset exploration with Singular Value Decomposition”. In: Proc. of the ICLR Workshop on Pitfalls of Limited Data and Computation for Trustworthy ML. 2023.