Best-response multiagent learning in non-stationary environments

Michael Weinberg*, Jeffrey S. Rosenschein

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

82 Scopus citations

Abstract

This paper investigates a relatively new direction in Multiagent Reinforcement Learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in the sense of finding a best-response policy, rather than in reaching an equilibrium. We present the first learning algorithm that is provably optimal against restricted classes of non-stationary opponents. The algorithm infers an accurate model of the opponent's non-stationary strategy, and simultaneously creates a best-response policy against that strategy. Our learning algorithm works within the very general framework of n-player, general-sum stochastic games, and learns both the game structure and its associated optimal policy.
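The idea of modelling an opponent while best-responding to that model can be illustrated with a minimal sketch. This is not the algorithm from the paper: it assumes a two-player repeated matrix game rather than an n-player stochastic game, and it swaps in a simple exponentially discounted frequency count as the opponent model. The class name `BestResponseLearner`, the `decay` parameter, and the payoff layout are all hypothetical choices made for illustration.

```python
from typing import List


class BestResponseLearner:
    """Illustrative only: discounted opponent model plus greedy best response."""

    def __init__(self, payoff: List[List[float]], decay: float = 0.9) -> None:
        # payoff[a][b] is our reward when we play a and the opponent plays b.
        self.payoff = payoff
        # Hypothetical forgetting factor in (0, 1]; smaller values forget faster.
        self.decay = decay
        # Start from a uniform pseudo-count over the opponent's actions.
        self.weights = [1.0] * len(payoff[0])

    def observe(self, opp_action: int) -> None:
        # Discount old evidence so the model can follow a drifting strategy.
        self.weights = [w * self.decay for w in self.weights]
        self.weights[opp_action] += 1.0

    def opponent_model(self) -> List[float]:
        # Normalize the discounted counts into a probability estimate.
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def best_response(self) -> int:
        # Pick the action with the highest expected payoff under the model.
        model = self.opponent_model()
        expected = [sum(p * r for p, r in zip(model, row)) for row in self.payoff]
        return max(range(len(expected)), key=expected.__getitem__)
```

With a matching-pennies style payoff such as `[[1, -1], [-1, 1]]`, repeatedly calling `observe(0)` moves `best_response()` to action 0, and a later run of `observe(1)` calls moves it to action 1, because the discounting lets newer observations outweigh older ones.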

Original language: American English
Title of host publication: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004
Editors: N.R. Jennings, C. Sierra, L. Sonenberg, M. Tambe
Pages: 506-513
Number of pages: 8
State: Published - 2004
Event: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004 - New York, NY, United States
Duration: 19 Jul 2004 → 23 Jul 2004

Publication series

Name: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004
Volume: 2

Conference

Conference: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004
Country/Territory: United States
City: New York, NY
Period: 19/07/04 → 23/07/04
