In robotics, it is common to use a map of the environment as a tool for navigation and motion planning. Specifically, maps are very good at telling you where you are in the environment: this is called localization. The ability to localize, though, is completely dependent on the quality of the map. Humans could construct this map by hand, but you can imagine how much work it would take to measure out every detail. If the map is off by a little bit, how long will it take to correct that error and improve precision? What if something changes and you need to update the map? Since the robot will use the map anyway, there is the idea of using the robot's own sensing modality to build the map: this is called mapping.
One issue you may see is that in a realistic situation, the two workflows conflict: in order to create a map, the robot needs a precise estimate of its current location (how can you tell the size of a room if you aren't confident in how far you walked from one end to the other?). But in order to get a precise estimate of its current location, the robot must have a map to localize against. In this sense, Simultaneous Localization and Mapping (SLAM) is often described as a chicken-and-egg problem, where the solution of one process requires the end result of the other. How do you solve such a problem?
The answer is not too complicated: you must solve both problems at the same time -- hence the word "simultaneous". The idea is that, taken separately, mapping and localization are essentially the same process of estimating positions from noisy information. In localization, you estimate your current position by knowing the exact positions of landmarks on the map, and in mapping, you estimate the locations of landmarks given a precise estimate of your current location. SLAM leans into the mathy jungle of Bayesian estimation by recognizing that since the estimation process is the same, you can effectively lump all the positions you want to estimate (robot and landmarks) into the same vector. By iteratively refining the estimate of this vector, you solve both localization and mapping at the same time.
This blog post will focus on the most conceptually simple type of SLAM, called EKF SLAM. If you are not aware, the EKF (Extended Kalman Filter) is a way of estimating any variable that (1) follows a differential equation of motion through time, allowing you to predict the current value from previous values, and (2) is periodically observed, allowing you to correct your prediction with measurements. This post will not be an explanation of how EKF SLAM works, per se, as there are many works that already do this quite well (here is a video by Cyrill Stachniss, whose YouTube channel is a rich resource for any aspiring roboticist). Instead, I'll explore some questions that are often left unexplored when EKF SLAM is discussed, as well as show you my personal implementation of 2D EKF SLAM for robots with an odometry motion model.
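To make that two-step loop concrete, here is a minimal sketch of one generic EKF iteration in Python/NumPy. Everything in it is a placeholder of my choosing: `f` and `h` stand for the motion and measurement models, `F` and `H` for their Jacobians, and `R` and `Q` for the process and measurement noise covariances of whatever system you're estimating.

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, f, F, h, H, R, Q):
    """One EKF iteration: predict with the motion model, correct with a measurement."""
    # Predict: push the belief through the (possibly nonlinear) dynamics f
    mu_bar = f(mu, u)
    F_k = F(mu, u)                      # Jacobian of f at the current estimate
    Sigma_bar = F_k @ Sigma @ F_k.T + R

    # Correct: compare the predicted measurement h(mu_bar) against z
    H_k = H(mu_bar)                     # Jacobian of h at the predicted estimate
    S = H_k @ Sigma_bar @ H_k.T + Q     # innovation covariance
    K = Sigma_bar @ H_k.T @ np.linalg.inv(S)   # Kalman gain
    mu_new = mu_bar + K @ (z - h(mu_bar))
    Sigma_new = (np.eye(len(mu)) - K @ H_k) @ Sigma_bar
    return mu_new, Sigma_new
```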
First, let's look at the simple 1D case, i.e., robot movement along a straight line. It would look funny if the landmarks were on the robot's line of motion, so without loss of generality I'll put the landmarks and the robot at different y positions, and allow the robot to change its x position by moving left and right. Below is a picture of the setup with two landmarks:
Since the robot can only move left and right, its motion model is quite simple. Here are the discrete motion dynamics that describe how the x position of the robot $x_r$ changes under a commanded velocity $u$ over a time step of $dt$:
$$ x_r^+ = x_r + u\times dt $$
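In code, this prediction step is a one-line state update plus a covariance update. The sketch below assumes the full SLAM state is ordered $(x_r, x_{l1}, x_{l2})$, and the noise level `sigma_u` on the commanded velocity is an assumption of mine, not something fixed by the model:

```python
import numpy as np

def predict_1d(mu, Sigma, u, dt, sigma_u=0.1):
    """Prediction step for the 1D SLAM state [x_r, x_l1, x_l2].

    Only the robot moves, so only mu[0] changes and only the robot's
    variance grows. sigma_u is an assumed noise level on the velocity.
    """
    mu = mu.copy()
    mu[0] += u * dt                    # x_r+ = x_r + u*dt
    R = np.zeros_like(Sigma)
    R[0, 0] = (sigma_u * dt) ** 2      # process noise enters only the robot state
    Sigma = Sigma + R                  # the motion Jacobian is identity here
    return mu, Sigma
```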
The robot can only observe a landmark if it is within some sensing radius. Because this is a 1D simulation, I am only concerned with the x position of landmark $i$, denoted $x_{li}$. When a landmark is observed, the robot measures the relative distance between itself and the landmark:
$$ z_{li} = x_{li} - x_r $$
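Under the same assumed state layout $(x_r, x_{l1}, x_{l2})$, the measurement function and its Jacobian look like the following sketch. Since the model is linear, the Jacobian is exact:

```python
import numpy as np

def expected_measurement(mu, i):
    """Predicted measurement of landmark i given the state [x_r, x_l1, x_l2]."""
    return mu[i] - mu[0]               # z_li = x_li - x_r

def measurement_jacobian(i, n=3):
    """Jacobian of z_li with respect to the full state: -1 at the
    robot entry, +1 at landmark i (i = 1 or 2), and 0 elsewhere."""
    H = np.zeros((1, n))
    H[0, 0] = -1.0                     # dz/dx_r
    H[0, i] = 1.0                      # dz/dx_li
    return H
```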
Both the motion model and the measurement model are linear, so the EKF performs identically to the regular KF in this situation. This won't be the case for the 2D EKF SLAM in the next section, but for now the linearity keeps the problem simple enough to see the concepts at play without too much complication.
In this simplified SLAM problem, we wish to estimate the x positions of the robot and the two landmarks, so three variables in total. In the EKF algorithm, our belief takes the form of a multivariate Gaussian. This Gaussian is defined by its mean $\mathbf{\mu}$, where we actually estimate the x positions to be, and the covariance matrix $\mathbf{\Sigma}$, which captures the uncertainty in this belief (along the diagonal) as well as the correlations between the x positions (in the off-diagonal elements). In general, these are defined as:
$$ \mathbf{\mu} = \begin{pmatrix}\mu_{rx} \\ \mu_{1x} \\ \mu_{2x}\end{pmatrix}, \qquad \mathbf{\Sigma} = \begin{pmatrix}\Sigma_{rr} & \Sigma_{r1} & \Sigma_{r2} \\ \Sigma_{1r} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{2r} & \Sigma_{21} & \Sigma_{22}\end{pmatrix} $$
To be clear, $\Sigma_{ri}$ is the covariance between the robot's x position and the $i$th landmark's x position.
Initializing the EKF SLAM process is pretty simple. First, we provide an initial estimate of the locations of the robot and landmarks, $\mathbf{\mu}$. As is the case with regular EKF estimation, these starting estimates are not so critical. I chose $\mathbf{\mu} = \begin{pmatrix} 0 & 0 & 0 \end{pmatrix}^T$. There is a small but important point when choosing $\mathbf{\Sigma}$: since SLAM is concerned with building a map of the environment, the origin can be defined however we choose. We can take advantage of this freedom by declaring that the origin is precisely where the robot starts, so the robot's initial position is known with zero uncertainty. In terms of the multivariate Gaussian, $\Sigma_{rr}=0$. The landmarks are initialized with a large uncertainty, say $\Sigma_{ii}=100$.
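Putting the pieces together, here is a sketch of that initialization along with the correction step applied when a landmark comes into view. The measurement noise `sigma_z` is again an assumed value:

```python
import numpy as np

# Initialization: the robot's start defines the origin, so its variance is
# zero, while the landmark positions are essentially unknown.
mu = np.zeros(3)                        # [x_r, x_l1, x_l2]
Sigma = np.diag([0.0, 100.0, 100.0])

def correct_1d(mu, Sigma, z, i, sigma_z=0.05):
    """Correction step when landmark i (i = 1 or 2) is observed.

    z is the measured relative distance x_li - x_r, and sigma_z is an
    assumed measurement noise level.
    """
    H = np.zeros((1, 3))
    H[0, 0], H[0, i] = -1.0, 1.0                # Jacobian of z = x_li - x_r
    S = H @ Sigma @ H.T + sigma_z**2            # innovation covariance (1x1)
    K = Sigma @ H.T @ np.linalg.inv(S)          # Kalman gain (3x1)
    innovation = z - (mu[i] - mu[0])            # measured minus predicted
    mu = mu + (K * innovation).ravel()
    Sigma = (np.eye(3) - K @ H) @ Sigma
    return mu, Sigma
```

Note how the correction spreads information through $\mathbf{\Sigma}$: observing one landmark also tightens the robot's estimate, which in turn tightens future estimates of the other landmark.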
Below is a gif showing how the estimates and uncertainties change over time in the simulation. A line is drawn between the robot and any landmark that it is close enough to observe.
Here are some observations that one can make: