ℵ-IPOMDP: Mitigating Deception in a Cognitive Hierarchy with Off-Policy Counterfactual Anomaly Detection

  • Nitay Alon*
  • Joseph M. Barnby
  • Stefan Sarkadi
  • Lion Schulz
  • Jeffrey S. Rosenschein
  • Peter Dayan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper recursive capabilities. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework called ℵ-IPOMDP, which augments the Bayesian inference of model-based RL agents with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize that they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and a zero-sum game. Our results demonstrate the ℵ-mechanism’s effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.

Original language: English
Article number: 14
Journal: Journal of Artificial Intelligence Research
Volume: 85
State: Published - 2026

Bibliographical note

Publisher Copyright:
© 2026 Copyright held by the owner/author(s).

Keywords

  • belief revision and update
  • multiagent systems
  • reasoning about actions and change
  • reinforcement learning
