capstone · intermediate · 1 week

Route Planner (A* on OpenStreetMap)

concepts: templates · concepts · std::priority_queue · std::variant · smart pointers
builds on: Containers, Iterators, and the STL Shape · Algorithms + Ranges · Templates from First Principles · Concepts + `requires`

The brief

Build an A* search that finds the shortest path between two points on a real-world map, using OpenStreetMap data. The deliverable is a command-line tool that takes route-planner <osm.xml> <start_lat,lon> <goal_lat,lon> on the command line and prints the sequence of nodes (and total distance) along the optimal path.

This capstone is the canonical second project of any data-structures course, redone in modern C++. By the end you'll have written a generic graph interface (using concepts), a heap-backed priority queue, a Haversine heuristic, and an output formatter — all with value-typed, move-aware, RAII-clean code.

Scope (1 weekend)

The deliverable has four parts:

OSM parser. Read the XML, extract nodes (lat/lon, id) and ways (sequences of node ids, with a highway tag).
Graph builder. Convert ways into a sparse adjacency list. Each edge carries the geodesic distance between its endpoints.
A search.* Use std::priority_queue over (f_score, node_id) pairs. Heuristic: Haversine distance from each candidate to the goal.
CLI driver. Parse arguments, run the search, print the path.

The full pipeline should run a 50,000-node city graph in under a second on modern hardware.

The shape

A generic graph interface — the Graph concept — separates the algorithm from any specific representation:

#include <concepts>
#include <vector>

// What does it mean to be a graph?
template <class G>
concept Graph = requires (const G& g, typename G::node_id n) {
  typename G::node_id;
  typename G::cost_type;

  { g.neighbors(n) } -> std::ranges::input_range;
  { g.heuristic(n, n) } -> std::convertible_to<typename G::cost_type>;
  { g.edge_cost(n, n) } -> std::convertible_to<typename G::cost_type>;
};

// The algorithm:
template <Graph G>
std::vector<typename G::node_id>
a_star(const G& graph,
       typename G::node_id start,
       typename G::node_id goal);

Anything matching Graph can be searched. The OSM parser builds one implementation; you might also write a synthetic-grid graph for tests.

A* refresher

The pseudocode:

open_set ← priority_queue of (f_score, node), seeded with (h(start), start)
came_from ← empty map
g_score[start] ← 0

while open_set not empty:
  (_, current) ← open_set.pop()
  if current == goal:
    return reconstruct_path(came_from, current)

  for neighbor in graph.neighbors(current):
    tentative_g ← g_score[current] + graph.edge_cost(current, neighbor)
    if tentative_g < g_score[neighbor]:
      came_from[neighbor] ← current
      g_score[neighbor] ← tentative_g
      f_score ← tentative_g + graph.heuristic(neighbor, goal)
      open_set.push((f_score, neighbor))

return no path

The trick to making this fast in C++: g_score and came_from are std::unordered_map<NodeId, …>. The priority queue stores std::pair<Cost, NodeId> in min-heap order (so std::greater).

Stale entries in the priority queue (a node re-pushed with a better score after an old entry is still in the queue) are handled by checking g_score[node] at pop time and skipping if the popped f_score doesn't match.

Implementation rubric

Your code should hit these targets:

Header-only a_star.h — pure template, takes a Graph, returns a std::vector<NodeId>. No knowledge of OSM.
osm_graph.h / .cpp — parses the XML, holds the adjacency list, implements the Graph concept.
haversine.h — pure function for great-circle distance. Easy to unit-test independently.
main.cpp — CLI, error handling, path-printing.
tests/ — at minimum: Haversine identities, a synthetic 4-node graph with known shortest path, and a small OSM sample fixture.

The build should be one cmake invocation. The whole thing should run in well under a second on a city-sized graph; if it doesn't, profile.

Stretch goals

In order of difficulty:

Bidirectional A*. Run two A* searches simultaneously, one from start, one from goal, meeting in the middle. Often 2-3× faster for long routes.
Multiple heuristics. Add std::variant<Haversine, Manhattan, Euclidean> for the heuristic, dispatched via std::visit.
Routes with constraints. "Avoid toll roads," "prefer cycling paths," etc. Use the way's tags to weight or filter edges.
Contraction hierarchies. A preprocessing pass that builds a shortcut graph; A* on the contracted graph is dramatically faster. This is how OSRM, GraphHopper, and Google Maps work in practice.

Test data

OpenStreetMap exports are available at geofabrik.de. Start small — a neighborhood-sized .osm.xml (a few MB) before tackling a whole city.

For a guaranteed-small fixture, the Karlsruhe sample bundled with many OSM-parsing libraries is a classic starting point: about 12,000 nodes, 4,000 ways. Pittsburgh's downtown is around 50,000 nodes; Manhattan is ~250,000.

What you'll have learned

By the end:

Designed a concept that separates an algorithm from a data structure (the Graph interface).
Used std::priority_queue with a custom comparator, and learned why "stale entry" handling matters for graph search.
Implemented the Haversine formula (lat/lon to meters) and understood why a monotonic, admissible heuristic is what makes A* optimal.
Parsed XML with one of the standard libraries (pugixml, tinyxml2), understanding the tradeoffs (DOM vs SAX, memory vs speed).
Profiled a real C++ program and learned where the hot spots actually are (usually parsing, then heap operations, then hash lookups).

Reference reading: Russell & Norvig's Artificial Intelligence: A Modern Approach on A* and admissible heuristics; the OSM "Map Features" wiki for tag semantics; Robert Geisberger's PhD thesis on contraction hierarchies if you tackle that stretch goal.

🛠️ Route Planner (A* on OpenStreetMap)