Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes

Journal article: In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision-maker has a pure strategy σ that is ε-optimal in any n-stage problem, provided that n is large enough (this result was previously known only for behavior strategies, that is, strategies that use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem where the payoff is the expectation of the limit inferior of the time-average payoff.
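The two statements in the abstract can be written out as a rough sketch. The notation here is an assumption made for illustration and is not taken from the record: r_m denotes the stage-m payoff, E_σ the expectation under strategy σ from a fixed initial state, v_n the n-stage value, and v^* its limit.

```latex
% Assumed notation (not from the record): r_m is the payoff at stage m,
% \mathbb{E}_\sigma is the expectation under strategy \sigma from a fixed
% initial state, and v_n is the value of the n-stage problem.
\[
  v_n \;=\; \sup_{\sigma}\, \mathbb{E}_\sigma\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right],
  \qquad
  v^{*} \;=\; \lim_{n\to\infty} v_n .
\]
% First statement: a single pure strategy is \varepsilon-optimal in every
% sufficiently long finite problem.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma \text{ pure},\ \exists\, N,\ \forall n \ge N:
  \quad
  \mathbb{E}_\sigma\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right] \;\ge\; v_n - \varepsilon .
\]
% Second statement: the limit value is guaranteed up to \varepsilon when the
% payoff is the expected limit inferior of the time averages.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma:
  \quad
  \mathbb{E}_\sigma\!\left[\liminf_{n\to\infty}\frac{1}{n}\sum_{m=1}^{n} r_m\right] \;\ge\; v^{*} - \varepsilon .
\]
```

In the first statement the same strategy σ works for every horizon n ≥ N, which is what makes the value uniform. The second statement places the limit inferior inside the expectation, a stronger requirement than taking it outside.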

Author(s)

Xavier Venel, Bruno Ziliotto

Journal
  • SIAM Journal on Control and Optimization
Publication date
  • 2016
Keywords
  • Partial observation
  • Markov decision processes
  • Dynamic programming
  • Long-run average payoff
  • Uniform value
Pages
  • 1983-2008
Version
  • 1
Volume
  • 54