Route Planner (A* on OpenStreetMap)
The brief
Build an A* search that finds the shortest path between two points on a
real-world map, using OpenStreetMap data. The deliverable is a
command-line tool that takes route-planner <osm.xml> <start_lat,lon> <goal_lat,lon>
on the command line and prints the sequence of nodes (and total distance)
along the optimal path.
This capstone is the canonical second project of any data-structures course, redone in modern C++. By the end you’ll have written a generic graph interface (using concepts), a heap-backed priority queue, a Haversine heuristic, and an output formatter — all with value-typed, move-aware, RAII-clean code.
Scope (1 weekend)
The deliverable has four parts:
- OSM parser. Read the XML, extract
nodes (lat/lon, id) andways (sequences of node ids, with ahighwaytag). - Graph builder. Convert ways into a sparse adjacency list. Each edge carries the geodesic distance between its endpoints.
- A search.* Use
std::priority_queueover(f_score, node_id)pairs. Heuristic: Haversine distance from each candidate to the goal. - CLI driver. Parse arguments, run the search, print the path.
The full pipeline should run a 50,000-node city graph in under a second on modern hardware.
The shape
A generic graph interface — the Graph concept — separates the
algorithm from any specific representation:
#include <concepts>
#include <vector>
// What does it mean to be a graph?
template <class G>
concept Graph = requires (const G& g, typename G::node_id n) {
typename G::node_id;
typename G::cost_type;
{ g.neighbors(n) } -> std::ranges::input_range;
{ g.heuristic(n, n) } -> std::convertible_to<typename G::cost_type>;
{ g.edge_cost(n, n) } -> std::convertible_to<typename G::cost_type>;
};
// The algorithm:
template <Graph G>
std::vector<typename G::node_id>
a_star(const G& graph,
typename G::node_id start,
typename G::node_id goal);
Anything matching Graph can be searched. The OSM parser builds one
implementation; you might also write a synthetic-grid graph for tests.
A* refresher
The pseudocode:
open_set ← priority_queue of (f_score, node), seeded with (h(start), start)
came_from ← empty map
g_score[start] ← 0
while open_set not empty:
(_, current) ← open_set.pop()
if current == goal:
return reconstruct_path(came_from, current)
for neighbor in graph.neighbors(current):
tentative_g ← g_score[current] + graph.edge_cost(current, neighbor)
if tentative_g < g_score[neighbor]:
came_from[neighbor] ← current
g_score[neighbor] ← tentative_g
f_score ← tentative_g + graph.heuristic(neighbor, goal)
open_set.push((f_score, neighbor))
return no path
The trick to making this fast in C++: g_score and came_from are
std::unordered_map<NodeId, …>. The priority queue stores
std::pair<Cost, NodeId> in min-heap order (so std::greater).
Stale entries in the priority queue (a node re-pushed with a better
score after an old entry is still in the queue) are handled by checking
g_score[node] at pop time and skipping if the popped f_score
doesn’t match.
Implementation rubric
Your code should hit these targets:
- Header-only
a_star.h— pure template, takes aGraph, returns astd::vector<NodeId>. No knowledge of OSM. osm_graph.h/.cpp— parses the XML, holds the adjacency list, implements theGraphconcept.haversine.h— pure function for great-circle distance. Easy to unit-test independently.main.cpp— CLI, error handling, path-printing.tests/— at minimum: Haversine identities, a synthetic 4-node graph with known shortest path, and a small OSM sample fixture.
The build should be one cmake invocation. The whole thing should run
in well under a second on a city-sized graph; if it doesn’t, profile.
Stretch goals
In order of difficulty:
- Bidirectional A*. Run two A* searches simultaneously, one from start, one from goal, meeting in the middle. Often 2-3× faster for long routes.
- Multiple heuristics. Add
std::variant<Haversine, Manhattan, Euclidean>for the heuristic, dispatched viastd::visit. - Routes with constraints. “Avoid toll roads,” “prefer cycling
paths,” etc. Use the
way’s tags to weight or filter edges. - Contraction hierarchies. A preprocessing pass that builds a shortcut graph; A* on the contracted graph is dramatically faster. This is how OSRM, GraphHopper, and Google Maps work in practice.
Test data
OpenStreetMap exports are available at
geofabrik.de. Start small — a
neighborhood-sized .osm.xml (a few MB) before tackling a whole city.
For a guaranteed-small fixture, the Karlsruhe sample bundled with many OSM-parsing libraries is a classic starting point: about 12,000 nodes, 4,000 ways. Pittsburgh’s downtown is around 50,000 nodes; Manhattan is ~250,000.
What you’ll have learned
By the end:
- Designed a concept that separates an algorithm from a data structure (the Graph interface).
- Used
std::priority_queuewith a custom comparator, and learned why “stale entry” handling matters for graph search. - Implemented the Haversine formula (lat/lon to meters) and understood why a monotonic, admissible heuristic is what makes A* optimal.
- Parsed XML with one of the standard libraries (
pugixml,tinyxml2), understanding the tradeoffs (DOM vs SAX, memory vs speed). - Profiled a real C++ program and learned where the hot spots actually are (usually parsing, then heap operations, then hash lookups).
Reference reading: Russell & Norvig’s Artificial Intelligence: A Modern Approach on A* and admissible heuristics; the OSM “Map Features” wiki for tag semantics; Robert Geisberger’s PhD thesis on contraction hierarchies if you tackle that stretch goal.