Derivation of the Hamilton-Jacobi-Bellman Equation

Recall the definition of the optimal cost-to-go function:

$$ J(t,x) = \min_{u(t)}\left( \phi(x(T)) + \int_t^T\mathcal{L}(\tau,x,u)d\tau \right) $$

Deriving the HJB equation involves defining the value function recursively. Take the expression inside the large parentheses and split the integral into two terms:

$$ \int_t^{t+dt}\mathcal{L}d\tau + \phi(x(T))+\int_{t+dt}^{T}\mathcal{L}d\tau $$

The first term can be approximated to first order as $\int_t^{t+dt}\mathcal{L}d\tau \approx \mathcal{L}dt$. The last two terms, once minimized over the remaining control, are the value function evaluated after the system has moved over a small time $dt$ (this is the principle of optimality). Since the value function is affected by both the change in position $dx$ and the change in time $dt$, we can expand it to first order and rewrite the expression as:

$$ \begin{aligned} \mathcal{L}dt + J(t+dt,x+dx) &= \mathcal{L}dt + J + \frac{\partial J}{\partial x}dx + \frac{\partial J}{\partial t}dt \\ &= \mathcal{L}dt + J + \frac{\partial J}{\partial x}\frac{dx}{dt}dt + \frac{\partial J}{\partial t}dt \\ &= \mathcal{L}dt + J + \frac{\partial J}{\partial x}fdt + \frac{\partial J}{\partial t}dt \end{aligned} $$

The last line follows from the dynamics $\dot{x} = f(t,x,u)$. Substituting this expression into the definition of the value function gives:

$$ J(t,x) = \min_u\left( \mathcal{L}dt + J(t,x) + \frac{\partial J}{\partial x}fdt + \frac{\partial J}{\partial t}dt\right) $$
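As a quick sanity check on the first-order expansion used in this step, the following Python sketch compares $J(t+dt, x+f\,dt)$ with $J + \frac{\partial J}{\partial x}f\,dt + \frac{\partial J}{\partial t}dt$ for a smooth test function. The function $J(t,x) = \tanh(T-t)\,x^2$, the dynamics $\dot{x} = u$, the horizon `T`, and the sample point are all illustrative assumptions and not part of the derivation. If the expansion is correct to first order, the error should shrink roughly fourfold when $dt$ is halved.

```python
import numpy as np

T = 1.0  # assumed horizon, chosen only for illustration

def J(t, x):
    # Hypothetical smooth test function standing in for the value function.
    return np.tanh(T - t) * x**2

def dJ_dt(t, x):
    # d/dt tanh(T - t) = -(1 - tanh^2(T - t))
    return -(1.0 - np.tanh(T - t) ** 2) * x**2

def dJ_dx(t, x):
    return 2.0 * np.tanh(T - t) * x

def f(t, x, u):
    # Assumed dynamics: x_dot = u
    return u

def expansion_error(t, x, u, dt):
    # Exact value after a small step versus the first-order expansion.
    exact = J(t + dt, x + f(t, x, u) * dt)
    first_order = J(t, x) + dJ_dx(t, x) * f(t, x, u) * dt + dJ_dt(t, x) * dt
    return abs(exact - first_order)

err_coarse = expansion_error(0.3, 1.0, -0.5, 1e-2)
err_fine = expansion_error(0.3, 1.0, -0.5, 5e-3)
ratio = err_coarse / err_fine  # expected to be close to 4 for an O(dt^2) remainder
print(err_coarse, err_fine, ratio)
```

The neglected terms are of order $dt^2$, which is why they vanish after dividing by $dt$ and letting $dt \to 0$.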

The $J(t,x)$ and $\frac{\partial J}{\partial t}dt$ terms can be brought outside of the minimization, since neither depends on $u$. The $J(t,x)$ terms on both sides then cancel, and dividing by $dt$ leaves the HJB equation:

$$ -\frac{\partial J(t,x)}{\partial t} = \min_{u(t)}\left(\mathcal{L}(t,x,u)+\frac{\partial J(t,x)}{\partial x}f(t,x,u)\right) $$
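To see the equation at work, consider a small illustrative problem that is not part of the derivation above: scalar dynamics $\dot{x} = u$, running cost $\mathcal{L} = x^2 + u^2$, and terminal cost $\phi = 0$. For this assumed problem the candidate $J(t,x) = \tanh(T-t)\,x^2$ satisfies $J(T,x) = 0$, and the minimizer of the right-hand side is $u = -\tanh(T-t)\,x$. The sketch below checks numerically that the HJB residual is close to zero, using central finite differences for the partial derivatives and a brute-force grid search over $u$ for the minimization. The horizon `T`, the step `h`, and the control grid are arbitrary choices.

```python
import numpy as np

T = 1.0  # assumed horizon

def J(t, x):
    # Candidate value function for the assumed problem
    # x_dot = u, L = x^2 + u^2, phi = 0.
    return np.tanh(T - t) * x**2

def hjb_residual(t, x, h=1e-5):
    # Central finite differences for the partial derivatives of J.
    dJ_dt = (J(t + h, x) - J(t - h, x)) / (2.0 * h)
    dJ_dx = (J(t, x + h) - J(t, x - h)) / (2.0 * h)
    # Brute-force minimization of L + (dJ/dx) * f over a grid of controls.
    u = np.linspace(-5.0, 5.0, 20001)
    rhs = np.min(x**2 + u**2 + dJ_dx * u)
    # Left-hand side minus right-hand side of the HJB equation.
    return -dJ_dt - rhs

residuals = [abs(hjb_residual(t, x))
             for t in (0.0, 0.3, 0.9)
             for x in (-2.0, 0.5, 1.0)]
print(max(residuals))
```

The residual is not exactly zero because of the finite-difference step and the finite resolution of the control grid, but it should be small compared with the individual terms in the equation.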