Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation

Guy Amir*, Shahaf Bassan, Guy Katz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The ability to interpret Machine Learning (ML) models is becoming increasingly essential. However, despite significant progress in the field, there remains a lack of rigorous characterization regarding the innate interpretability of different models. In an attempt to bridge this gap, recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models. In this setting, if explanations for a particular model can be obtained efficiently, the model is considered interpretable (since it can be explained “easily”). However, if generating explanations for an ML model is computationally intractable, the model is considered uninterpretable. Prior research identified two key factors that influence the complexity of interpreting an ML model: (i) the type of the model (e.g., neural networks, decision trees); and (ii) the form of explanation (e.g., contrastive explanations, Shapley values). In this work, we claim that a third important factor must also be considered in this analysis: the underlying distribution over which the explanation is obtained. Considering the underlying distribution is key to avoiding explanations that are socially misaligned, i.e., explanations that convey information that is biased and unhelpful to users. We demonstrate the significant influence of the underlying distribution on the resulting overall interpretation complexity in two settings: (i) prediction models paired with an external out-of-distribution (OOD) detector; and (ii) prediction models designed to inherently generate socially aligned explanations. Our findings prove that the expressiveness of the distribution can significantly influence the overall complexity of interpretation, and identify essential prerequisites that a model must possess to generate socially aligned explanations. We regard this work as a step towards a rigorous characterization of the complexity of generating explanations for ML models, and towards gaining a mathematical understanding of their interpretability.
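To make the setting above concrete, the following is a minimal, hypothetical sketch (not taken from the paper) of one explanation form mentioned in the abstract: a cardinally minimal contrastive explanation, found by brute force, where candidate perturbations must additionally be accepted by an external OOD detector. The classifier `f`, the detector `in_dist`, and the Boolean feature domain are illustrative assumptions; the point is only to show where the underlying distribution enters the search for an explanation.

```python
# Toy sketch (not the paper's construction): brute-force search for a cardinally
# minimal contrastive explanation of a binary classifier over Boolean features,
# where every candidate perturbation must also be accepted by an OOD detector.
# The classifier, detector, and input below are hypothetical placeholders.
from itertools import combinations, product

def minimal_contrastive_explanation(f, in_dist, x):
    """Smallest feature subset S such that some reassignment of S (keeping the
    rest of x fixed) yields an in-distribution input with a different label."""
    n = len(x)
    base = f(x)
    for k in range(1, n + 1):                 # try subsets in increasing size
        for S in combinations(range(n), k):
            for values in product([0, 1], repeat=k):
                x_prime = list(x)
                for i, v in zip(S, values):
                    x_prime[i] = v
                # The in-distribution constraint can rule out label-flipping
                # perturbations; this is where the underlying distribution
                # starts to shape the search.
                if in_dist(x_prime) and f(x_prime) != base:
                    return S
    return None                               # no in-distribution flip exists

# Hypothetical model: predicts 1 iff at least two of the three features are set.
f = lambda x: int(sum(x) >= 2)
# Hypothetical OOD detector: rejects the all-ones input as out-of-distribution.
in_dist = lambda x: x != [1, 1, 1]

print(minimal_contrastive_explanation(f, in_dist, [1, 1, 0]))  # -> (0,)
```

In this toy example, restricting the search to inputs accepted by the detector can only shrink the set of admissible perturbations, so a valid explanation may require a larger feature subset or may not exist at all; the paper's results concern how the expressiveness of such distribution constraints affects the computational complexity of producing explanations in general.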

Original language: English
Title of host publication: ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024, Proceedings
Editors: Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarin-Diz, Jose M. Alonso-Moral, Senen Barro, Fredrik Heintz
Publisher: IOS Press BV
Pages: 818-825
Number of pages: 8
ISBN (Electronic): 9781643685489
DOIs
State: Published - 16 Oct 2024
Event: 27th European Conference on Artificial Intelligence, ECAI 2024 - Santiago de Compostela, Spain
Duration: 19 Oct 2024 - 24 Oct 2024

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 392
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 27th European Conference on Artificial Intelligence, ECAI 2024
Country/Territory: Spain
City: Santiago de Compostela
Period: 19/10/24 - 24/10/24

Bibliographical note

Publisher Copyright:
© 2024 The Authors.
