Post-hoc Concept-Based Explanations
Liam Wachter
Post-hoc Concept-Based Explanations
Intrinsically explainable
Build or modify the training procedure so that predictions come with explanations.
Post-hoc explanations
Explain predictions of a large class of already trained models, without modifying training.
Post-hoc Concept-Based Explanations
Feature-based
Also: prototype-based, example importance
Image taken from [JO21], generated with the method of [Sel+17]
The paper ...
introduces relevant post-hoc concept-based explanation (PHCBEx) methods.
collects properties for good PHCBEx methods.
introduces an algorithmic template that every PHCBEx method follows.
attributes each property to one of the steps.
discusses current approaches and points out directions for future work.
Concept Activation Vectors (CAV) [Kim+17]
Separate concept activations from non-concept activations at a layer $\bm{f}[:\ell]: \mathbb{R}^N\to\mathbb{R}^L$ of the neural network $\bm{f}: \mathbb{R}^N \to \mathbb{R}^M$
Human-defined concepts $c \in [C]$
Examples and counterexamples of each concept: $P_c, A_c \subset \mathbb{R}^N$
The CAR approach [CS22] is similar, but uses an RBF kernel instead of a linear separator, yielding concept regions rather than directions.
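A minimal sketch of the CAV construction, assuming layer-$\ell$ activations for concept examples and counterexamples are already extracted; file names and the classifier choice are illustrative, not prescribed by [Kim+17]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Layer-l activations f[:l](x) for concept examples P_c and counterexamples A_c,
# one row per example (shape: n_examples x L); file names are hypothetical.
acts_concept = np.load("acts_concept.npy")
acts_negative = np.load("acts_negative.npy")

X = np.vstack([acts_concept, acts_negative])
y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_negative))])

# Linear classifier separating concept from non-concept activations.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The CAV is the (normalized) normal vector of the separating hyperplane.
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
```

For CAR [CS22], the linear model would roughly be swapped for a kernel classifier such as sklearn.svm.SVC(kernel="rbf"), so concept membership becomes a region in latent space rather than a half-space.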
Interpretable Basis Decomposition (IBD) [Zho+18]
Only the second-to-last layer is used, as a fixed $\ell$
For class $k$ and sparseness constraint $s$, learn the CAV equivalent:
$$\bm{f}(\bm{x})[k] = \bm{w}_k^T \bm{f}[:\ell](\bm{x}) + b_k$$
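IBD then expresses $\bm{w}_k$ in terms of interpretable concept vectors (hence the "basis decomposition" entries in the later tables). A hedged sketch, assuming a dictionary of CAV-like concept vectors is already available; plain non-negative least squares stands in for the greedy, sparsity-constrained selection of [Zho+18]:

```python
import numpy as np
from scipy.optimize import nnls

w_k = np.load("w_k.npy")                # hypothetical: final-layer weights of class k, shape (L,)
concepts = np.load("concept_cavs.npy")  # hypothetical: concept vectors as rows, shape (C, L)

# Non-negative decomposition w_k ≈ sum_c s_c q_c over the concept dictionary;
# IBD selects concepts greedily under the sparseness constraint s, this only
# gives the flavour.
coeffs, residual = nnls(concepts.T, w_k)
top = np.argsort(coeffs)[::-1][:5]
print("most contributing concepts:", top, coeffs[top])
```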
Automated Concept-based Explanation (ACE) [Gho+19]
Completeness-aware Concept-Based Explanations (CAE) [Yeh+20]
Ensure that ACE selects a set of concepts that is predictive of the class
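A hedged sketch of the underlying predictiveness check: project activations onto the concept vectors and test how well the model's predictions can be recovered from those concept scores alone. Names are illustrative, and the actual ConceptSHAP completeness score of [Yeh+20] is defined differently in detail:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

acts = np.load("layer_acts.npy")          # hypothetical, shape (n, L)
preds = np.load("model_predictions.npy")  # hypothetical: the model's predicted classes
concepts = np.load("concept_cavs.npy")    # hypothetical, shape (C, L)

# Concept scores: projection of each activation onto each concept vector.
scores = acts @ concepts.T                # shape (n, C)

# If the concept set is (nearly) complete, a simple probe on the scores
# should reproduce the network's own predictions.
probe = LogisticRegression(max_iter=1000).fit(scores, preds)
print("recovery of model predictions:", probe.score(scores, preds))
```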
Concept Recursive Activation FacTorization (CRAFT) [Fel+23b]
Segment the images of class $k$ and embed the segments: $\bm{f}[:\ell](\mathcal{X}_k) = \mathrm{A} \in \mathbb{R}^{P\times L}$
Apply dictionary learning (non-negative matrix factorization) to obtain $\mathrm{U}$ and $\mathrm{W}$:
$$(\mathrm{U}, \mathrm{W})= \argmin_{\mathrm{U}\geq 0, \mathrm{W}\geq 0} \frac{1}{2}\|\mathrm{A} - \mathrm{U}\mathrm{W}^T\|_F^2$$
$\mathrm{W}$: CAV equivalents
$\mathrm{U}$: coefficients of each segment's activation in the concept basis.
Repeat for multiple layers $\ell$ to obtain hierarchical concepts.
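A minimal sketch of the factorization step using scikit-learn's NMF, which solves exactly the non-negative problem above; loading segment activations from a file is an assumption, and CRAFT's image crops and recursion over layers are omitted:

```python
import numpy as np
from sklearn.decomposition import NMF

# A: layer-l activations of P segments of class k, shape (P, L); NMF requires
# non-negative entries (e.g. post-ReLU activations).
A = np.load("segment_acts.npy")  # hypothetical file

n_concepts = 10  # illustrative choice
nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)
U = nmf.fit_transform(A)   # (P, n_concepts): concept coefficients per segment
Wt = nmf.components_       # (n_concepts, L): rows are the CAV equivalents, i.e. W^T
```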
Invertible Concept-based Explanations (ICE) [Zha+21]
In the interest of time: very similar to CRAFT, but restricted to a single fixed layer $\ell$
Multidimensional Concept Discovery (MCD) [VBS23]
Find approximate basis vectors for each segment cluster with PCA:
$\mathcal{C}_i \in \{\mathcal{C}_1, \ldots, \mathcal{C}_{N_c}\}$ (each with dimension $D_i$)
Identify $\mathcal{C}_{N_c + 1} = \mathrm{span}(\mathcal{C}_1, \ldots, \mathcal{C}_{N_c})^\bot$
$$
\bm{f}[:\ell](\bm{x}) = \sum_{i=1}^{N_c} \sum_{j=1}^{D_i} \alpha_{ij} \bm{c}_{ij}\quad \bm{f}[:\ell](\bm{x}) = \sum_{i=1}^{N_c} \sum_{j=1}^{D_i} \beta_{ij} \bm{w}_{ij}
$$
$\mathcal{C}_{N_c + 1}$ is the residual space and is used for a completeness score:
$$\eta(\mathcal{C}_i) = \frac{1 - \|\bm{w}^\bot\|_2^2}{\|\bm{w}\|_2^2}$$
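A rough sketch of the per-cluster PCA step, assuming segment activations and their cluster assignments already exist; the dual basis $\{\bm{w}_{ij}\}$ and the completeness score are not reproduced here:

```python
import numpy as np
from sklearn.decomposition import PCA

acts = np.load("segment_acts.npy")          # hypothetical, shape (n_segments, L)
clusters = np.load("segment_clusters.npy")  # hypothetical: cluster id per segment

concept_bases = []
for c in np.unique(clusters):
    members = acts[clusters == c]
    # Keep enough principal directions to cover most of the cluster's variance;
    # their span is the multidimensional concept C_i with dimension D_i.
    pca = PCA(n_components=0.9, svd_solver="full").fit(members)
    concept_bases.append(pca.components_)   # shape (D_i, L)
```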
Systematization of explanations
Properties of good explanations
- Coherency: Examples of a concept should be perceptually similar to each other and dissimilar to examples of other concepts.
- Meaningfulness: A concept should carry semantic significance on its own, making it recognizable without further context.
- Importance: The importance of a concept is defined as its relevance for the model's prediction.
- Completeness: A set of concepts is complete if it is sufficient to explain the model's prediction, i.e., no other class could be explained by this set of concepts.
- Compactness: The explanation should output a minimal set of concepts while still being complete.
Systematization of explanations
Tasks
- Candidate concept acquisition: A set of human-interpretable concepts is obtained that the explanation method can use to explain the model's predictions.
- Relevant concept selection: The set of concepts is reduced to a subset that is relevant for the predictions to be explained.
- Concept importance calculation: The contribution of each concept to the model's prediction is quantified by an importance score.
Candidate concept acquisition
Easy way, human-defined concepts: CAV, CAR, IBD
Others (ACE, CAE, CRAFT, ICE, MCD): use segmentation/dense areas of feature importance¹ and cluster the segments (sketched below)
Segmentation → Meaningfulness; Clustering → Coherency
¹ Feature importance and segmentation are closely related [Sel+17]
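A hedged sketch of the segmentation-and-clustering route (ACE-style), with SLIC superpixels and k-means; the embedding stand-in, file names, and all parameters are illustrative:

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def embed(patches):
    # Stand-in for f[:l]: in practice, run the patches through the trained
    # network up to layer l; here we flatten so the sketch runs end to end.
    return patches.reshape(len(patches), -1)

images = np.load("class_images.npy")  # hypothetical, shape (n, H, W, 3)

patches = []
for img in images:
    segments = slic(img, n_segments=15, compactness=20)
    for s in np.unique(segments):
        mask = segments == s
        patches.append(img * mask[..., None])  # keep only this segment

# Cluster the segments in latent space; each cluster is a candidate concept
# (segmentation -> meaningfulness, clustering -> coherency).
acts = embed(np.stack(patches))
candidate_concept_of_patch = KMeans(n_clusters=25, n_init=10).fit_predict(acts)
```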
Train a classifier to obtain CAVs
The choice of classifier determines the representation of concepts in latent space:
Representation | Method
Canonical basis | Exp. of [Bau+17]
Orthogonal directions | Exp. of [Gra+23]
Approx. orthogonal directions | ICE, CRAFT
Linear directions | ACE, IBD, CAE, CAV
Linear subspaces | MCD
Smooth regions | CAR
Relevant Concept Selection
The latent-space representations of the concept candidates present in the input are compared to the latent-space representations of the available concepts
This requires some notion of similarity
The choice of similarity influences completeness and compactness:
Similarity | Method
Angle | CAV, ACE, CAE
Distance | CAR
Basis decomposition | IBD, ICE, CRAFT, MCD
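A small illustration of the two simplest similarity notions, assuming a single activation vector and concept representations are at hand (illustrative only; CAR actually measures membership in a kernel-defined region, not a plain distance):

```python
import numpy as np

act = np.load("activation.npy")  # hypothetical: f[:l](x) for one input
cav = np.load("cav.npy")         # hypothetical: a concept direction

# Angle-based presence of the concept (CAV/ACE/CAE flavour).
cosine = act @ cav / (np.linalg.norm(act) * np.linalg.norm(cav))

# Distance-based presence (CAR flavour, simplified to a cluster centre).
centre = np.load("concept_centre.npy")  # hypothetical
distance = np.linalg.norm(act - centre)
```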
Importance scores
Importance Score | Method
TCAV | CAV, ACE, ICE
TCAR | CAR
ConceptSHAP | CAE
Total Sobol indices | CRAFT
Basis decomposition | IBD, MCD
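As an example of an importance score, a hedged sketch of TCAV: the fraction of inputs whose class-$k$ logit increases when the layer-$\ell$ activation is perturbed along the CAV. It assumes the network is available split as f_head $= \bm{f}[:\ell]$ and f_tail $= \bm{f}[\ell:]$ (PyTorch, illustrative):

```python
import torch

def tcav_score(f_head, f_tail, inputs, cav, class_k):
    """Fraction of inputs whose class-k logit has a positive directional
    derivative along the CAV at layer l (the TCAV score of [Kim+17])."""
    acts = f_head(inputs).detach().requires_grad_(True)  # f[:l](x)
    logits = f_tail(acts)[:, class_k]                     # f[l:] on the activations
    grads = torch.autograd.grad(logits.sum(), acts)[0]
    # Directional derivative of the logit along the concept direction.
    dir_deriv = grads.flatten(start_dim=1) @ cav
    return (dir_deriv > 0).float().mean().item()
```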
More in the paper.
Feel free to ask any questions.
References
[JO21] Hyungsik Jung and Youngrock Oh. “Towards better explanations of class activation mapping”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 1336–1344.
[Kim+17] Been Kim et al. “Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)”. In: Proc. of the International Conference on Machine Learning. Vol. 35. 2017, pp. 2668–2677.
[Sel+17] Ramprasaath Selvaraju et al. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization”. In: Proc. of the IEEE International Conference on Computer Vision (ICCV). 2017, pp. 618–626.
[Zho+18] Bolei Zhou et al. “Interpretable Basis Decomposition for Visual Explanation”. In: Proc. of the European Conference on Computer Vision. Vol. 15. 2018, pp. 122–138.
[CS22] Jonathan Crabbé and Mihaela van der Schaar. “Concept Activation Regions: A Generalized Framework For Concept-Based Explanations”. In: arXiv (2022).
[Gho+19] Amirata Ghorbani et al. “Towards Automatic Concept-based Explanations”. In: Advances in Neural Information Processing Systems. Vol. 32. 2019, pp. 9277–9286.
[Fel+23b] Thomas Fel et al. “Craft: Concept Recursive Activation Factorization for Explainability”. In: Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, pp. 2711–2721.
[Yeh+20] Chih-Kuan Yeh et al. “On Completeness-aware Concept-Based Explanations in Deep Neural Networks”. In: Advances in Neural Information Processing Systems 33 (2020), pp. 20554–20565.
[Zha+21] Ruihan Zhang et al. “Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors”. In: Proc. of the AAAI Conference on Artificial Intelligence. Vol. 35. 13. 2021, pp. 11682–11690.
[VBS23] Johanna Vielhaben, Stefan Blücher, and Nils Strodthoff. “Multidimensional concept discovery (MCD): A unifying framework with completeness guarantees”. In: arXiv (2023).
[Bau+17] David Bau et al. “Network Dissection: Quantifying Interpretability of Deep Visual Representations”. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 6541–6549.
[Gra+23] Mara Graziani et al. “Concept discovery and Dataset exploration with Singular Value Decomposition”. In: Proc. of the ICLR Workshop on Pitfalls of Limited Data and Computation for Trustworthy ML. 2023.