Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes
Journal article: In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, this shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage problem, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem where the payoff is the expectation of the inferior limit of the time average payoff.
Author(s)
Xavier Venel, Bruno Ziliotto
Journal
- SIAM Journal on Control and Optimization
Date of publication
- 2016
Keywords
- Partial observation
- Markov decision processes
- Dynamic programming
- Long-run average payoff
- Uniform value
Pages
- 1983-2008
URL of the HAL notice
Version
- 1
Volume
- 54