Sunday, 3 June 2012

# The A* pathfinding algorithm

Game development introduced me to programming when I was around 10, and I've loved it ever since. One of the first formal algorithms I learned before entering university was A* (pronounced A-star), and I really had a great time doing so. It is one of the most widely used pathfinding algorithms and it's likely the one you'd be introduced to first when approaching the subject of pathfinding. A pathfinding algorithm takes a start point (also referred to as a node) and a goal and attempts to make the shortest path between the two given possible obstacles blocking the way.

I've always thought the simplest example of pathfinding is a 2D grid in a game, but it can be used to find a path from A to B on any type of graph. For example a graph where vertices are airports and edges are flights, A* could be used to get the shortest trip between one airport and another.

## Properties

A* has the following properties:

• It is complete; it will always find a solution if it exists.
• It can use a heuristic to significantly speed up the process.
• It can have variable node to node movement costs. This can be used when some nodes are more difficult to traverse, for example you may move slower across rocky terrain or up hill.
• It can search in many different directions if desired.

## How it works

It works by maintaining two lists; the open list and the closed list. The open list initially contains the start node, it contains all nodes that are yet to be considered. If the open list becomes empty then there is no possible path. The closed list starts out empty and contains all nodes that have already been considered/visited.

The core loop of the algorithm selects a node from the open list with the lowest estimated cost to reach the goal. If the selected node is not the goal it puts all valid neighboring nodes into the open list and repeats the process.

Part of the magic in the algorithm is that all nodes that are created keep a reference to their parent. This means that from any node we can track backwards to get a path from the node and the start node.

## Node

A node has a positioning value (eg. x, y), a reference to its parent and three 'scores' associated with it. These scores are how A* determines which nodes to consider first.

### G score

The g score is the base score of the node and is simply the incremental cost of moving from the start node to this node.

$$g(n) = g(n.parent) + cost(n.parent, n)$$

$$cost(n_1, n_2) = \text{the movement cost from }n_1\text{ to }n_2$$

### H score - the heuristic

The heuristic is a computationally easy estimate of the distance between each node and the goal. This being computationally easy is very important as the H score will be calculated at least once for every node considered before the goal is reached. The implementation of the H score can vary depending on the properties of the graph that is being searched, here are the most common heuristics.

#### Manhattan distance

This is the simplest heuristic and is ideal for grids that allow 4-way movement (up, down, left, right).

$$h(n) = |n.x - goal.x| + |n.y - goal.y|$$

#### Diagonal distance (uniform cost)

This heuristic is used for 8-way movement (diagonals) when the cost of diagonal movement is the same as the cost of non-diagonal. Note that this is very inaccurate if the costs for diagonal and non-diagonal are not the same.

$$h(n) = c\cdot max(|n.x - goal.x|, |n.y - goal.y|)$$

$$c = \text{cost of movement}$$

#### Diagonal distance

This heuristic is used for 8-way movement when the cost of diagonal movement differs from the non-diagonal cost. Remember that the cost of diagonal distance doesn't need to be exact and is usually worth it to use a constant multiplier rather than the square root as the square root operation is quite expensive.

$$h(n) = c_d d_{min} + c_n ( d_{max} - d_{min})$$

$$d_{max} = max(|n.x - goal.x|, |n.y - goal.y|)$$

$$d_{min} = min(|n.x - goal.x|, |n.y - goal.y|)$$

$$c_n = \text{cost of non-diagonal movement}$$

$$c_d = \text{cost of diagonal movement} = c_n \sqrt{2} \approx c_n \cdot 1.414$$

#### Euclidean distance

This heuristic can be used when movement is possible at any angle, it can also be quite expensive and may be worth considering a less expensive function.

$$h(n) = \sqrt{(n.x - goal.x)^2 + (n.y - goal.y)^2}$$

You should avoid using the square root all together and calculate $h(n)^2$. Then make sure that $g(n)^2$ is used to ensure both are relative.

$$h(n)^2 = (n.x - goal.x)^2 + (n.y - goal.y)^2$$

### F score

The f score is simply the addition of g and h scores and represents the total cost of the path via the current node.

$$f(n) = g(n) + h(n)$$

## Pseudocode

This psuedocode allows for 8-way directional movement.

function A*(start, goal)
open_list = set containing start
closed_list = empty set
start.g = 0
start.f = start.g + heuristic(start, goal)
while open_list is not empty
current = open_list element with lowest f cost
if current = goal
return construct_path(goal) // path found
remove current from open_list
for each neighbor in neighbors(current)
if neighbor not in closed_list
if neighbor is not in open_list
else
openneighbor = neighbor in open_list
if neighbor.g < openneighbor.g
openneighbor.g = neighbor.g
openneighbor.parent = neighbor.parent
return false // no path exists

function neighbors(node)
neighbors = set of valid neighbors to node // check for obstacles here
for each neighbor in neighbors
if neighbor is diagonal
neighbor.g = node.g + diagonal_cost // eg. 1.414 (pythagoras)
else
neighbor.g = node.g + normal_cost // eg. 1
neighbor.parent = node
return neighbors

function construct_path(node)
path = set containing node
while node.parent exists
node = node.parent