Does Solving the HJB Equation Really Result in an Optimal Control Law?

In optimal control theory, the Hamilton-Jacobi-Bellman equation is a PDE that gives a necessary and sufficient condition for optimal control with respect to a cost function. In other words, if we can solve the HJB equation, then we find the optimal control law. In most examples detailing how the HJB equation is solved, the discussion stops as soon as the answer is found. This is usually fine, but I've always wondered if the answer we find is actually the optimal answer. Can we explore the optimality of HJB equation solutions using graphs?

In this blog post, I'll walk through the basic steps required to find the optimal control law with the HJB equation. After that, I'll play around with the optimal solution by graphing solutions that are perturbed by small amounts, showing graphically that the unperturbed solution is the one that minimizes the cost function.

Solving the HJB Equation

Suppose we define the following cost function:

$$ J(x(t),u(t),t) = h(x(T)) + \int_t^T g(x(\tau),u(\tau)) d\tau $$

Here, $g(x,u)$ is a (usually positive definite) function that describes the instantaneous cost that $J$ accrues at each time $\tau$ along the trajectory, and $h(x(T))$ describes the final cost at the terminal time $T$. Notice how, in this context, the cost function is a function of time $t$, and describes the future costs accrued from $t$ until $T$. This is a consequence of Bellman's principle of optimality. I'll spare you the long explanation, as a lot of other people have already covered it in many different contexts. The basic idea, though, is that if I want to reach a certain state in an optimal way, I should work backwards from that desired state to any other feasible state. A strategy that is optimal when built backwards like this must also be optimal when traversed forwards. The cost function is also a function of the current state $x(t)$.
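To make these pieces concrete, here's a minimal numerical sketch of the cost functional. The quadratic $g$ and $h$ below are placeholder choices of my own (not anything we've derived yet); the point is just how $J$ is assembled from a sampled trajectory:

```python
import numpy as np

def running_cost(x, u):
    # Placeholder quadratic instantaneous cost g(x, u) = x^2 + u^2
    return x**2 + u**2

def terminal_cost(x_T):
    # Placeholder terminal cost h(x(T)) = x(T)^2
    return x_T**2

def total_cost(t_grid, x_traj, u_traj):
    # J = h(x(T)) + integral of g(x, u) from t to T, via the trapezoid rule
    g_vals = running_cost(x_traj, u_traj)
    integral = np.sum(0.5 * (g_vals[1:] + g_vals[:-1]) * np.diff(t_grid))
    return terminal_cost(x_traj[-1]) + integral
```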

For a given starting state $x_0$, future states evolve according to some differential equation:

$$ \dot{x} = f(x,u) $$
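Given some feedback law $u(x)$, we can generate such a trajectory by integrating the dynamics forward in time. Here's a forward-Euler sketch; the dynamics $f$ and the proportional gain in the example are arbitrary placeholders, not the optimal law we're after:

```python
import numpy as np

def simulate(f, control_law, x0, t_grid):
    # Forward-Euler integration of x_dot = f(x, u) under a feedback law u = control_law(x)
    x_traj = np.zeros_like(t_grid)
    u_traj = np.zeros_like(t_grid)
    x_traj[0] = x0
    for k in range(len(t_grid) - 1):
        u_traj[k] = control_law(x_traj[k])
        dt = t_grid[k + 1] - t_grid[k]
        x_traj[k + 1] = x_traj[k] + dt * f(x_traj[k], u_traj[k])
    u_traj[-1] = control_law(x_traj[-1])
    return x_traj, u_traj

# Example: x_dot = x + u with an arbitrary (not optimal) proportional law u = -2x
t_grid = np.linspace(0.0, 1.0, 201)
x_traj, u_traj = simulate(lambda x, u: x + u, lambda x: -2.0 * x, x0=1.0, t_grid=t_grid)
print(total_cost(t_grid, x_traj, u_traj))  # total_cost from the snippet above
```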

Optimal control theory seeks a control law $u^*(x)$ such that the cost $J$ is minimized. Such a policy can be used to define an optimal cost function $J^*$:

$$ J^*(x,t)=\min_{u(\cdot)}\left\{ h(x(T)) + \int_t^T g(x(\tau),u(\tau))\,d\tau\right\} $$

Notice that this optimal cost function is only a function of the state and time, since we minimize over the control $u(\cdot)$. It can be shown that this optimal cost function must satisfy the Hamilton-Jacobi-Bellman equation:

$$ J^*_t(x,t) + \min_{u}\left\{ g(x,u) + J^*_x(x,t) \cdot f(x,u) \right\} = 0 $$

This is a PDE that relates the partial time derivative $J^*_t$ of the optimal cost with the partial state derivative $J^*_x$. This PDE has the following boundary condition:

$$ J^*(x(T),T) = h(x(T)) $$
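Before moving on, here's the usual informal argument for where this PDE comes from (a sketch, not a rigorous derivation): apply the principle of optimality over a short interval $\Delta t$ and Taylor-expand the cost-to-go,

$$ J^*(x,t) = \min_{u}\left\{ g(x,u)\,\Delta t + J^*\big(x + f(x,u)\,\Delta t,\; t + \Delta t\big) \right\} \approx \min_{u}\left\{ g(x,u)\,\Delta t + J^*(x,t) + J^*_x\, f(x,u)\,\Delta t + J^*_t\,\Delta t \right\}. $$

Cancelling $J^*(x,t)$ from both sides and dividing by $\Delta t$ recovers the HJB equation above.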

The optimal control law $u^*(t)$ is simply the control law that minimizes the term in the brackets:

$$ u^*(x,t)=\argmin_{u}\left\{g(x,u)+J^*_x(x,t)\cdot f(x,u)\right\} $$
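As a concrete (hypothetical) instance of this minimization: take $g(x,u) = x^2 + u^2$ and control-affine dynamics $f(x,u) = ax + bu$. The bracketed term is then quadratic in $u$, so setting its derivative with respect to $u$ to zero gives

$$ \frac{\partial}{\partial u}\left[ x^2 + u^2 + J^*_x\,(ax + bu) \right] = 2u + b\,J^*_x = 0 \quad\Longrightarrow\quad u^*(x,t) = -\tfrac{b}{2}\,J^*_x(x,t), $$

so the optimal control ends up being feedback on the gradient of the optimal cost.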

Note that in order to find this optimal control, we first need to find $J^*_x$, which generally requires finding $J^*(x,t)$, i.e., solving the HJB equation.
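Once we do have a candidate $J^*(x,t)$, one way to sanity-check it is to evaluate the left-hand side of the HJB equation numerically: approximate $J^*_t$ and $J^*_x$ with finite differences and do the inner minimization by brute force over a grid of control values. A rough sketch (everything is passed in as callables, and the helper name and step size are my own):

```python
def hjb_residual(J, g, f, x, t, u_grid, eps=1e-5):
    # Central-difference approximations of the candidate cost's partial derivatives
    J_t = (J(x, t + eps) - J(x, t - eps)) / (2 * eps)
    J_x = (J(x + eps, t) - J(x - eps, t)) / (2 * eps)
    # Brute-force inner minimization over a grid of candidate control values
    bracket = min(g(x, u) + J_x * f(x, u) for u in u_grid)
    return J_t + bracket
```

If the candidate really solves the equation, this residual should be (approximately) zero at every $(x, t)$ we sample.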

Solving for a Toy Example

Let us assume the following 1-dimensional, LTI state dynamics: