
A PDE-based Bellman Equation for Continuous-Time Reinforcement Learning

Yuhua Zhu
UCSD

Abstract:

In this paper, we address the problem of continuous-time reinforcement learning in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics remain unknown and we have access only to discrete-time information, how can we effectively perform policy evaluation? We first demonstrate that the solution to the commonly used discrete-time Bellman equation is only a first-order approximation to the true value function. We then introduce a higher-order, PDE-based Bellman equation called PhiBE and show that the solution to the i-th order PhiBE is an i-th order approximation to the true value function. Moreover, even the first-order PhiBE outperforms the Bellman equation in approximating the true value function when the system dynamics change slowly. We develop a numerical algorithm based on the Galerkin method to solve PhiBE from discrete-time trajectory data alone. Numerical experiments are provided to validate the theoretical guarantees.
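
For orientation, the standard setup these statements presuppose can be sketched as follows; the notation b, sigma, beta, r below is assumed here for illustration and is not given in the abstract. For dynamics and a discounted value function

\[
  dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad
  V(x) = \mathbb{E}\Big[ \int_0^\infty e^{-\beta t}\, r(X_t)\, dt \,\Big|\, X_0 = x \Big],
\]

the true value function solves the linear second-order PDE

\[
  \beta V(x) = r(x) + b(x)\cdot \nabla V(x)
             + \tfrac{1}{2}\,\mathrm{Tr}\big( \sigma\sigma^{\top}(x)\, \nabla^2 V(x) \big),
\]

whereas the discrete-time Bellman equation with step size \Delta t enforces only the one-step relation

\[
  V(x) = r(x)\,\Delta t + e^{-\beta \Delta t}\, \mathbb{E}\big[ V(X_{\Delta t}) \mid X_0 = x \big],
\]

whose solution agrees with the true V only to first order in \Delta t.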

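Below is a minimal sketch, in Python, of how a Galerkin solver of this kind might be organized when only discrete trajectory samples are available. Everything in it is an illustrative assumption rather than the paper's implementation: the polynomial basis, the one-step surrogates dx/dt and dx^2/dt for the drift and diffusion, the OU test problem, and all names (beta, dt, n_basis, ...).

    # Illustrative sketch (not the paper's code): first-order PhiBE-style
    # policy evaluation for a 1D diffusion via Galerkin projection onto a
    # polynomial basis, using only discrete trajectory samples.
    import numpy as np

    rng = np.random.default_rng(0)
    beta, dt, n_steps = 1.0, 0.01, 200_000   # assumed discount rate, step, sample size

    # Simulate discrete observations of a "hidden" OU process
    # dX = -X dt + 0.5 dW; the solver below only sees the samples x.
    x = np.zeros(n_steps)
    for k in range(n_steps - 1):
        x[k + 1] = x[k] - x[k] * dt + 0.5 * np.sqrt(dt) * rng.standard_normal()

    xs, dx = x[:-1], np.diff(x)
    rew = xs**2                               # illustrative running reward r(x) = x^2

    # Polynomial basis phi_j(x) = x^j with first and second derivatives.
    n_basis = 5
    P = np.vander(xs, n_basis, increasing=True)   # phi_j(x_k)
    dP = np.zeros_like(P)
    d2P = np.zeros_like(P)
    for j in range(1, n_basis):
        dP[:, j] = j * xs ** (j - 1)
        if j >= 2:
            d2P[:, j] = j * (j - 1) * xs ** (j - 2)

    # Galerkin system: project beta*V - b V' - (1/2) s V'' - r = 0 onto each
    # test function, replacing the unknown drift b and diffusion s by the
    # one-step surrogates dx/dt and dx^2/dt (E[dx|x] ~ b dt, E[dx^2|x] ~ s dt).
    Lphi = beta * P - (dx / dt)[:, None] * dP - 0.5 * (dx**2 / dt)[:, None] * d2P
    A = P.T @ Lphi / len(xs)
    c = P.T @ rew / len(xs)
    theta = np.linalg.solve(A, c)             # basis coefficients of V

    def V(z):
        return np.vander(np.atleast_1d(z), n_basis, increasing=True) @ theta

    print(V(np.array([-1.0, 0.0, 1.0])))      # approximate values at test points

The point the sketch illustrates is that only consecutive sample pairs (x_k, x_{k+1}) enter the linear system: no model of the drift or diffusion is needed, which matches the abstract's setting of unknown dynamics observed at discrete times.
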
Tuesday, February 6, 2024
11:00 AM, Zoom Only (Meeting ID 990 3560 4352)