Setting Up The Problem

As with any optimal control problem, we seek to find a control function $\bm{u}^*(t)$ that minimizes some cost function $J(\bm{x},\bm{u})$ over time:

$$ J(\bm{x},\bm{u})=h(\bm{x}(T))+\int_0^Tg(\bm{x},\bm{u})\,dt $$
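To make this concrete, one common choice (used here purely as a running example, not something the problem requires) is a quadratic cost:

$$ J(\bm{x},\bm{u})=\bm{x}(T)^\top Q_f\,\bm{x}(T)+\int_0^T\left(\bm{x}^\top Q\,\bm{x}+\bm{u}^\top R\,\bm{u}\right)dt $$

Here $Q$, $R$, and $Q_f$ are weighting matrices we choose. The first term plays the role of $h(\bm{x}(T))$, penalizing where the trajectory ends up, and the integrand plays the role of the stage cost $g(\bm{x},\bm{u})$.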

Additionally, we say that the state of the system $\bm{x}$ evolves according to the following equation of motion:

$$ \dot{\bm{x}}=a(\bm{x},\bm{u}) $$
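Continuing the running example, the simplest case is a linear system (again an illustrative assumption, not a requirement of the method):

$$ \dot{\bm{x}}=A\bm{x}+B\bm{u} $$

so that $a(\bm{x},\bm{u})=A\bm{x}+B\bm{u}$.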

One way to solve this problem is using the calculus of variations. Much like ordinary calculus, the calculus of variations seeks the place where the change in $J(\cdot)$ is equal to zero. The difference is that ordinary calculus finds a single number at which a function is optimized, while the calculus of variations finds entire functions at which a functional is optimized.

The details of the calculus of variations warrant their own blog post, which may be written in the future. For now, I will just say that the calculus of variations results in a set of differential equations that are necessary conditions for an optimal control function $\bm{u}^*$. As with any differential equation, if we find a function $\bm{u}(t)$ that satisfies these differential equations, then it satisfies the necessary conditions for optimality, and we take it to be the optimal control function, $\bm{u} = \bm{u}^*$.

Pontryagin’s Minimum Principle

So what are these differential equations? First, we must define a special function called the Hamiltonian:

$$ H(\bm{x},\bm{u},\bm{p},t) = g(\bm{x},\bm{u})+\bm{p}(t)^\top a(\bm{x},\bm{u}) $$

Here, $g(\cdot)$ is the stage cost and $a(\cdot)$ is the equation of motion, as defined above. You may notice the introduction of a new function $\bm{p}(t)$. This is called the costate, and it comes from a common optimization technique known as Lagrange multipliers. In general, Lagrange multipliers are used in problems where we want to optimize a system that must satisfy certain constraints. In our case, what we want to optimize is the stage cost $g(\cdot)$, integrated over time, subject to the constraint that $\dot{\bm{x}} = a(\bm{x},\bm{u})$. At this point, we do not know what $\bm{p}(t)$ looks like over time; we only find the costate by solving the differential equations that express the conditions for optimality.
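For the linear-quadratic running example above, the Hamiltonian works out to

$$ H(\bm{x},\bm{u},\bm{p},t)=\bm{x}^\top Q\,\bm{x}+\bm{u}^\top R\,\bm{u}+\bm{p}^\top\left(A\bm{x}+B\bm{u}\right) $$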

These differential equations are known as Pontryagin’s minimum (PM) principle (sometimes called the maximum principle if we are maximizing an objective instead). The PM principle consists of two differential equations plus an algebraic optimality condition:

$$ \dot{\bm{x}}=\frac{\partial H}{\partial \bm{p}}=a(\bm{x},\bm{u}) \\ \dot{\bm{p}}=-\frac{\partial H}{\partial \bm{x}}=-\left(\frac{\partial a}{\partial \bm{x}}\right)^{\!\top}\bm{p}-\frac{\partial g}{\partial \bm{x}} \\ 0 = \frac{\partial H}{\partial \bm{u}} $$
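Applying these conditions to the linear-quadratic running example gives

$$ \dot{\bm{x}}=A\bm{x}+B\bm{u} \\ \dot{\bm{p}}=-2Q\bm{x}-A^\top\bm{p} \\ 0=2R\bm{u}+B^\top\bm{p} $$

and the last equation can be solved directly for the control, $\bm{u}=-\tfrac{1}{2}R^{-1}B^\top\bm{p}$, so the control is known as soon as the costate is known.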

The PM principle also has two boundary conditions. The first simply says that our state over time $\bm{x}(t)$ must start at some point $\bm{x}_0$ that we define:

$$ \bm{x}(0) = \bm{x}_0 $$

The second boundary condition is at the end of the trajectory, and relates the value of the costate $\bm{p}(T)$ to the value of the state $\bm{x}(T)$:

$$ \bm{p}(T)=\frac{\partial h}{\partial \bm{x}}\bigg\vert_{\bm{x}(T)} $$
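For the quadratic terminal cost $h=\bm{x}(T)^\top Q_f\,\bm{x}(T)$ used in the running example, this condition becomes $\bm{p}(T)=2Q_f\,\bm{x}(T)$.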

Notice that we have differential equations with two boundary conditions, one at the beginning of the trajectory, $t=0$, and one at the end, $t=T$. This structure is known as a two-point boundary value problem with mixed (split) boundary conditions. If our differential equations are linear, this isn’t so bad, and a solution can be found without much trouble. If, however, we are dealing with nonlinear differential equations (which is often the case for real-world systems), then a closed-form solution is generally not possible. We must therefore rely on numerical techniques to find solutions.
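As a small illustration of what such a numerical solve can look like, here is a minimal sketch that stacks the state and costate equations from the linear-quadratic running example into a single boundary value problem and hands it to SciPy's `solve_bvp`. The system matrices, cost weights, horizon, and initial state below are all placeholder assumptions chosen for the example.

```python
# Minimal sketch: solve the two-point BVP from the PM principle with SciPy.
# The dynamics, weights, horizon, and initial state are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_bvp

# Double-integrator dynamics x_dot = A x + B u (assumed example system)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# Quadratic cost weights: g = x^T Q x + u^T R u, h = x(T)^T Qf x(T)
Q  = np.diag([1.0, 0.1])
R  = np.array([[0.1]])
Qf = np.diag([5.0, 1.0])

x0 = np.array([1.0, 0.0])   # initial state
T  = 2.0                    # time horizon

def ode(t, y):
    # y stacks state and costate: y = [x1, x2, p1, p2], one column per mesh point
    x, p = y[:2], y[2:]
    u = -0.5 * np.linalg.solve(R, B.T @ p)   # u = -1/2 R^{-1} B^T p
    xdot = A @ x + B @ u                     # state equation
    pdot = -2.0 * Q @ x - A.T @ p            # costate equation
    return np.vstack([xdot, pdot])

def bc(ya, yb):
    # Split boundary conditions: x(0) = x0 and p(T) = 2 Qf x(T)
    return np.concatenate([ya[:2] - x0,
                           yb[2:] - 2.0 * Qf @ yb[:2]])

t = np.linspace(0.0, T, 50)
y_guess = np.zeros((4, t.size))
sol = solve_bvp(ode, bc, t, y_guess)

# Recover the optimal control along the solved trajectory from the costate
p_opt = sol.y[2:]
u_opt = -0.5 * np.linalg.solve(R, B.T @ p_opt)
print("converged:", sol.success)
```

Because `solve_bvp` adjusts the whole trajectory at once, it can handle the split boundary conditions directly: the state is pinned at $t=0$ while the costate is pinned at $t=T$.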

Numerical Solution to Optimal Control