An improved algorithm for solving communicating average reward Markov decision processes

Moshe Haviv*, Martin L. Puterman

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently, the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than that it is unichain, we recommend use of this algorithm instead of unichain policy iteration.
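To make the setting concrete, the following is a minimal sketch of average-reward policy iteration for a unichain MDP, the procedure the paper builds on. It is not the paper's modified improvement step: the paper's contribution is restricting improvement to unichain policies so that evaluation stays well-posed, whereas this sketch simply assumes each policy it evaluates is unichain (the evaluation system below is singular for multichain policies). The transition tensor `P`, reward matrix `r`, and the normalization `h[0] = 0` are illustrative conventions, not from the source.

```python
import numpy as np

def evaluate(P, r, policy):
    """Evaluate a (presumed unichain) stationary policy.

    P: (S, A, S) transition probabilities; r: (S, A) rewards;
    policy: (S,) action indices. Solves the average-reward evaluation
    equations  g*1 + (I - P_pi) h = r_pi  with the normalization h[0] = 0,
    returning the gain g and bias vector h.
    """
    n = P.shape[0]
    Pd = P[np.arange(n), policy]          # (S, S) transition matrix under policy
    rd = r[np.arange(n), policy]          # (S,) reward vector under policy
    # Unknowns are [g, h[1], ..., h[n-1]]; h[0] is fixed at 0.
    A = np.zeros((n, n))
    A[:, 0] = 1.0                         # column multiplying the gain g
    M = np.eye(n) - Pd
    A[:, 1:] = M[:, 1:]                   # drop the column for h[0]
    x = np.linalg.solve(A, rd)
    g = x[0]
    h = np.concatenate(([0.0], x[1:]))
    return g, h

def policy_iteration(P, r, policy, max_iter=100):
    """Plain unichain policy iteration from an initial unichain policy."""
    for _ in range(max_iter):
        g, h = evaluate(P, r, policy)
        # Improvement: maximize r(s,a) + sum_s' P(s,a,s') h(s') over actions.
        Q = r + P @ h                     # (S, A)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break                         # no improving action: optimal
        policy = new_policy
    return policy, g, h
```

For example, in a two-state communicating MDP where state 0 can stay (reward 1) or move to state 1 (reward 0), and state 1 can stay (reward 2) or move to state 0 (reward 0), starting from the unichain policy "move, stay" the iteration converges to that same policy with gain 2. Note that naive improvement can leave the unichain class (e.g. selecting "stay" in both states above yields two recurrent classes); avoiding exactly this is what the paper's modified improvement step addresses.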

Original language: English
Pages (from-to): 229-242
Number of pages: 14
Journal: Annals of Operations Research
Volume: 28
Issue number: 1
State: Published - Dec 1991
Externally published: Yes

Keywords

  • communicating classes
  • Markov decision processes
  • multichain policies
  • policy iteration
  • unichain policies

