
Fixed points of the distributional Bellman operator.

Abstract: In distributional reinforcement learning, complete return distributions are taken into account, beyond the expected return of a policy. The return distribution for a fixed policy is given as the fixed point of an associated distributional Bellman operator (DBO). Existence and uniqueness of fixed points of DBOs are discussed, as well as their tail properties. Further, distributional dynamic programming algorithms are presented that approximate the unknown return distributions, together with error bounds in both the Wasserstein and Kolmogorov–Smirnov distances. For return distributions that have probability density functions, the algorithms yield approximations of these densities, with error bounds given in the supremum norm. The concept of quantile-spline discretizations is introduced for these algorithms; it shows promising results in simulation experiments, also in the presence of heavy tails.
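
For context, the fixed point referred to in the abstract solves the distributional Bellman equation. The following is a minimal sketch in standard notation (the symbols are generic and not necessarily those of the underlying papers): with discount factor \(\gamma \in [0,1)\), reward \(R\), policy \(\pi\) and transition kernel \(p\), the random return \(G^\pi\) satisfies

\[ G^\pi(x) \overset{d}{=} R(x, A) + \gamma\, G^\pi(X'), \qquad A \sim \pi(\cdot \mid x), \quad X' \sim p(\cdot \mid x, A), \]

where \(\overset{d}{=}\) denotes equality in distribution and \(G^\pi(X')\) on the right-hand side is drawn independently given \(X'\). The law of \(G^\pi(x)\) is then the fixed point of the associated DBO.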

The talk is based on two papers with Julian Gerstenberg and Denis Spiegel. 


Friday 25/10 at 10:30
Room 703, FING.

Zoom room