Concurrency · #11 of 13

Threads, `std::async`, and Futures

Starting concurrent C++ safely — pick the right tool, RAII the rest

Why it matters

C++ has had threads in the standard library since 2011. By 2026 they've matured into something you can actually reach for without recoiling. The standard library gives you four levels of abstraction:

std::thread — the raw OS thread wrapper. Powerful, easy to use wrong (you must .join() before destruction).
std::jthread (C++20) — std::thread plus auto-join on destruction plus a cancellation token. The default tool now.
std::async — "I want this work to happen asynchronously; give me a future for the result." No thread management required.
std::future + std::promise — the value pipe that lets one thread hand a result to another.

This lesson is about picking the right one and using it safely. The next lesson (12) covers the low-level atomic primitives that everything above is built on, plus the memory model that makes them work.

`std::thread`: the OS thread, wrapped

std::thread constructs an OS thread that immediately starts running the function you give it.

#include <iostream>
#include <thread>

void worker(int n) {
  std::cout << "worker(" << n << ") on thread\n";
}

int main() {
  std::thread t(worker, 42);
  // … do other work …
  t.join();    // wait for t to finish before main exits
  return 0;
}

The one rule that bites everyone: a std::thread must be either joined or detached before it dies. Otherwise its destructor calls std::terminate. This is why std::jthread exists — it joins automatically.
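
The join-or-detach rule is easier to see with several threads at once. Here is a minimal sketch, assuming two hypothetical helpers (square_into and squares_in_threads, neither from the standard library): each thread writes to its own slot, and every thread is joined before the function returns.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: writes its result into a slot no other thread touches.
void square_into(int n, int& out) {
  out = n * n;
}

// Hypothetical helper: one std::thread per input, all joined before returning.
std::vector<int> squares_in_threads(const std::vector<int>& inputs) {
  std::vector<int> results(inputs.size());
  std::vector<std::thread> threads;
  threads.reserve(inputs.size());

  for (std::size_t i = 0; i < inputs.size(); ++i) {
    // std::ref is needed because std::thread copies its arguments by default.
    threads.emplace_back(square_into, inputs[i], std::ref(results[i]));
  }

  // Each thread must be joined (or detached) before its destructor runs.
  for (auto& t : threads) {
    t.join();
  }
  return results;
}
```

Note the weak spot: if anything throws between the first emplace_back and the join loop, a joinable std::thread gets destroyed and the program terminates. That gap is what std::jthread closes.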

`std::jthread`: the modern default

std::jthread was added in C++20 as "the std::thread you should actually use." It does three things std::thread doesn't:

Auto-joins in its destructor — no std::terminate ambush.
Accepts a std::stop_token as the first argument of the callable, letting the thread cooperatively check whether it's been asked to stop.
Has .request_stop() to signal that stop_token.

#include <iostream>
#include <thread>
#include <chrono>

void poll(std::stop_token stop) {
  using namespace std::chrono_literals;
  int ticks = 0;
  while (!stop.stop_requested()) {
    std::this_thread::sleep_for(50ms);
    ticks++;
  }
  std::cout << "stopped after " << ticks << " ticks\n";
}

int main() {
  std::jthread t(poll);
  std::this_thread::sleep_for(std::chrono::milliseconds(220));
  t.request_stop();
  // jthread destructor auto-joins. No call to .join() needed.
  return 0;
}

`jthread` + `stop_token` is the modern C++ pattern for cancellable background work. RAII handles the join.

expected output

stopped after 5 ticks
# (exact tick count depends on the scheduler; the stop request usually lands during the fifth sleep, so expect 4-5)

Or run locally

g++ -std=c++23 -O2 snippet.cpp && ./a.out

When you want to launch a thread, reach for std::jthread. Reach for std::thread only when you need to deliberately detach (rare) or when you're on a platform that hasn't shipped C++20.

Sharing data: don't, or do it with a mutex

The default rule: don't share mutable data between threads. If each thread has its own copy, there is no race. C++'s pass-by-value semantics make this natural — moving a std::vector into a thread's callable hands ownership to the new thread.

When you do need shared mutable state, the entry-level tool is std::mutex + std::lock_guard:

#include <mutex>

std::mutex m;
int shared_counter = 0;

void increment() {
  std::lock_guard<std::mutex> lock(m);     // RAII: acquires here, releases on scope exit
  shared_counter++;
}                                          // ← lock released

std::lock_guard is RAII over the mutex. You can't forget to unlock — the destructor does it. If increment throws halfway through, the lock still releases. This is the same pattern as std::unique_ptr (lesson 05) applied to a different resource.

For multiple mutexes, use std::scoped_lock (C++17): it locks all of them using a deadlock-avoidance algorithm, so two threads that name the same mutexes in different orders cannot deadlock each other.

`std::async`: I want a result, not a thread

Often you don't care which thread runs the work. You just want a result, eventually. std::async gives you that:

#include <iostream>
#include <future>
#include <thread>   // std::this_thread::sleep_for
#include <chrono>

int compute(int seed) {
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  return seed * seed;
}

int main() {
  // Launch the work; get a future for the result.
  std::future<int> f = std::async(std::launch::async, compute, 7);

  std::cout << "doing other work…\n";
  // … other work could happen here …

  // .get() blocks until the result is ready, then returns it.
  int result = f.get();
  std::cout << "result = " << result << "\n";
  return 0;
}

`std::async` launches work, hands you a future. `.get()` waits for the result. No explicit thread management.

expected output

doing other work…
result = 49

Or run locally

g++ -std=c++23 -O2 snippet.cpp && ./a.out

Always pass std::launch::async explicitly. The default policy (std::launch::async | std::launch::deferred) lets the implementation decide whether to run the task on another thread or defer it until .get() or .wait() is called. That is a surprisingly common source of "why is my parallel code not parallel?" bugs.
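
You can observe the difference between the two policies by comparing thread IDs. A small sketch, assuming a hypothetical helper named ran_on_calling_thread:

```cpp
#include <future>
#include <thread>

// Hypothetical helper: returns true if a task launched with `policy`
// ended up running on the thread that called this function.
bool ran_on_calling_thread(std::launch policy) {
  const std::thread::id caller = std::this_thread::get_id();
  std::future<std::thread::id> f =
      std::async(policy, [] { return std::this_thread::get_id(); });
  return f.get() == caller;  // a deferred task runs here, inside get()
}
```

With std::launch::deferred the task runs inside .get() on the caller's thread; with std::launch::async it runs on a separate thread. Leave the policy out and either may happen.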

Note: a std::future returned by std::async blocks in its destructor until the task finishes. That means calling std::async(...) without saving the future is effectively synchronous: the temporary's destructor waits at the end of the statement. The type system can't stop you from doing this (since C++20 std::async is marked [[nodiscard]], so compilers typically warn), and most people only learn it from being bitten once.
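
The blocking destructor is easy to demonstrate. In this sketch (discarded_future_blocks is a hypothetical name), the task has already finished by the time the very next line runs, even though nothing ever called .get():

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: reports whether the task was already finished
// by the time the std::async statement itself completed.
bool discarded_future_blocks() {
  std::atomic<bool> done{false};

  // The returned future is a temporary. Its destructor runs at the end of
  // this statement and waits for the task. The cast to void only silences
  // the [[nodiscard]] warning; it does not change the behavior.
  static_cast<void>(std::async(std::launch::async, [&done] {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    done = true;
  }));

  return done.load();
}
```

If you want the work to overlap with the caller, keep the future in a named variable that outlives the overlap.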

`std::future` + `std::promise`: the value pipe

If you need to hand a value from one thread to another by hand (not launch a function), use std::promise and std::future as a single-use pipe:

std::promise<int> p;
std::future<int>  f = p.get_future();

std::jthread producer([&p]() {
  // … expensive work …
  p.set_value(42);                // unblocks anything awaiting f
});

int x = f.get();                  // blocks until producer's set_value

A promise is the write end; a future is the read end. Each value is passed exactly once. If the promise is destroyed without set_value (or set_exception), the future's .get() throws a std::future_error whose code is std::future_errc::broken_promise.
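
The broken-promise case can be reproduced without any threads at all. A minimal sketch, with observes_broken_promise as a hypothetical helper name:

```cpp
#include <future>
#include <system_error>

// Hypothetical helper: returns true if get() reports broken_promise
// after the promise is destroyed without ever being fulfilled.
bool observes_broken_promise() {
  std::future<int> f;
  {
    std::promise<int> p;
    f = p.get_future();
  }  // p is destroyed here without set_value or set_exception

  try {
    f.get();
  } catch (const std::future_error& e) {
    return e.code() == std::future_errc::broken_promise;
  }
  return false;
}
```

In real code this usually means the producer thread exited early, for example on an error path that forgot to call set_exception.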

For multiple readers, use std::shared_future instead — multiple consumers can .get() the same value. For multiple writes, you want a condition variable or a channel-style primitive (the standard doesn't ship channels; we'll talk about them in the capstone).

Condition variables: wait until X

When one thread needs to wait for another to signal (not just compute a value), std::condition_variable is the right tool:

#include <mutex>
#include <condition_variable>

std::mutex m;
std::condition_variable cv;
bool ready = false;

void waiter() {
  std::unique_lock lock(m);
  cv.wait(lock, []{ return ready; });   // releases m while waiting, re-acquires on wake
  // … now ready == true, lock is held …
}

void signaller() {
  {
    std::lock_guard lock(m);
    ready = true;
  }
  cv.notify_one();
}

The predicate form (cv.wait(lock, pred)) is the only spelling worth using — it handles spurious wakeups for you (yes, wait can return without notify being called; the predicate keeps you correct).

`std::atomic` as a teaser

For one specific case — a single value shared across threads, with no other state to keep consistent — you don't need a mutex. You need std::atomic:

#include <atomic>

std::atomic<int> counter{0};

void increment() {
  counter++;                       // atomic — no race, no mutex needed
}

This typically compiles to a single atomic read-modify-write instruction or a short load/store retry loop, depending on the architecture. It's usually cheaper than locking a mutex. The cost: you can only modify one atomic at a time atomically. If you need "set A and B together," you're back to needing a mutex.

The full atomic story — including memory ordering, why memory_order_relaxed is faster than the default, and how the happens-before relation lets you reason about visibility — is the next lesson.

Tools you should know exist

A short list, with the one-line "what it's for":

| Tool | What it's for |
|---|---|
| std::jthread | A cancellable thread that auto-joins. The default. |
| std::async | "Run this and give me a future." |
| std::future / std::promise | One-shot value pipe between threads. |
| std::shared_future | Multiple readers of the same one-shot value. |
| std::packaged_task | A callable wrapped to publish its result into a future. |
| std::mutex / std::lock_guard | Mutual exclusion, RAII-locked. |
| std::scoped_lock | Multiple mutexes at once, deadlock-free. |
| std::shared_mutex / std::shared_lock | Read-write lock. |
| std::condition_variable | Wait until a predicate becomes true. |
| std::atomic<T> | Single-value lock-free access. |
| std::barrier (C++20) | N threads sync at a meeting point. |
| std::latch (C++20) | Single-use countdown gate. |
| std::counting_semaphore (C++20) | Generalized semaphore. |
| std::stop_token (C++20) | Cooperative cancellation. |

That's the toolbox. You'll need a small handful of these regularly; the rest are there when you need them.

The patterns

A few common shapes worth recognizing:

Fork-join parallelism: split work N ways, launch N futures, gather N results.

std::vector<std::future<int>> jobs;
for (auto& chunk : chunks)
  jobs.push_back(std::async(std::launch::async, process, chunk));

int total = 0;
for (auto& j : jobs) total += j.get();
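
A fuller, self-contained version of the same shape might look like this. It is a sketch under assumptions: parallel_sum is a hypothetical helper and the chunking scheme is just one reasonable choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical fork-join sum: split `data` into up to `parts` chunks,
// sum each chunk in its own task, then add the partial sums.
long long parallel_sum(const std::vector<int>& data, std::size_t parts) {
  if (parts == 0) parts = 1;
  const std::size_t chunk = (data.size() + parts - 1) / parts;  // ceiling division
  std::vector<std::future<long long>> jobs;

  for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
    const std::size_t end = std::min(begin + chunk, data.size());
    // Each task reads a disjoint range of read-only data, so no locking is needed.
    jobs.push_back(std::async(std::launch::async, [&data, begin, end] {
      const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
      const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
      return std::accumulate(first, last, 0LL);
    }));
  }

  long long total = 0;
  for (auto& j : jobs) total += j.get();  // the join step: wait for every task
  return total;
}
```

Capturing data by reference is safe here only because every future is consumed before the function returns.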

Producer-consumer: a thread-safe queue + condition variable. Producers push, consumers pop and process. The capstone C4 builds this.

Pipeline: each stage runs on its own thread, with bounded queues between stages. Common in audio, video, and data-ingestion code.

Thread pool: N worker threads pull tasks from a shared queue. The right abstraction for "I have lots of small jobs." The standard library doesn't ship one yet (std::execution, planned for C++26, is aimed at this space).

When concurrency is the wrong answer

Concurrency adds correctness problems. If your code is fast enough single-threaded, leave it single-threaded. Three common red flags:

You're parallelizing something memory-bound, not CPU-bound. N threads on a memory-bound loop rarely scale, and can even go slower, because they compete for the same memory bandwidth and evict each other's cache lines.
You're using threads for I/O. Async I/O (epoll, io_uring, libuv-style event loops) is almost always better than blocking N threads on N file descriptors.
The problem is GUI responsiveness. Use a single background thread for the work and a message queue back to the UI thread. Don't sprinkle threads everywhere.

The rule of thumb: add concurrency for measured CPU-bound speedups. Profile first.

Key takeaways

Default to std::jthread for thread launching. It auto-joins and has a cooperative stop_token. Plain std::thread is the older, more dangerous version.
Default to std::async(std::launch::async, fn, args...) when you want a result without managing the thread.
Pass std::launch::async explicitly. Without it, the implementation may defer execution until .get().
std::future from std::async blocks on destruction. A fire-and-forget call accidentally becomes synchronous.
Use std::lock_guard / std::scoped_lock for mutex locking; never call .lock() / .unlock() by hand.
std::atomic<T> is the right tool for a single shared value, not for shared-state coherence. For "modify two things together," you want a mutex.
Most concurrency bugs are race conditions and lifetime mistakes. Both are about what a thread sees and when — which is lesson 12.

What's next

Lesson 12 — atomics and the C++ memory model. The lesson most courses get wrong because it requires explaining what a CPU can reorder, what the compiler can reorder, and what memory_order actually guarantees. The capstone C4 (Concurrent Web Crawler) then exercises everything from this phase.
