Threads, `std::async`, and Futures
Starting concurrent C++ safely — pick the right tool, RAII the rest
Why it matters
C++ has had threads in the standard library since 2011. By 2026 they’ve matured into something you can actually reach for without recoiling. The standard library gives you four levels of abstraction:
- std::thread: the raw OS thread wrapper. Powerful, easy to use wrong (you must .join() before destruction).
- std::jthread (C++20): std::thread plus auto-join on destruction plus a cancellation token. The default tool now.
- std::async: “I want this work to happen asynchronously; give me a future for the result.” No thread management required.
- std::future + std::promise: the value pipe that lets one thread hand a result to another.
This lesson is about picking the right one and using it safely. The next lesson (12) covers the low-level atomic primitives that everything above is built on, plus the memory model that makes them work.
std::thread: the OS thread, wrapped
std::thread constructs an OS thread that immediately starts running
the function you give it.
#include <iostream>
#include <thread>
void worker(int n) {
    std::cout << "worker(" << n << ") on thread\n";
}
int main() {
    std::thread t(worker, 42);
    // … do other work …
    t.join(); // wait for t to finish before main exits
    return 0;
}
The one rule that bites everyone: a std::thread must be either
joined or detached before it dies. Otherwise its destructor calls
std::terminate. This is why std::jthread exists — it joins
automatically.
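The join-or-detach rule can be observed through `joinable()`. The sketch below is a minimal illustration, and `joinable_before_and_after` is a hypothetical helper name, not something from the lesson's code:

```cpp
#include <thread>
#include <utility>

// Hypothetical helper: reports joinable() before and after join().
std::pair<bool, bool> joinable_before_and_after() {
    std::thread t([] { /* trivial work */ });
    // While joinable() is true, destroying t would call std::terminate.
    bool before = t.joinable();
    t.join();                   // wait for the thread to finish
    bool after = t.joinable();  // false: t no longer owns a thread
    return {before, after};
}
```

A thread that has been joined (or detached) is no longer joinable, which is the state its destructor requires.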
std::jthread: the modern default
std::jthread was added in C++20 as “the std::thread you should
actually use.” It does three things std::thread doesn’t:
- Auto-joins in its destructor: no std::terminate ambush.
- Accepts a std::stop_token as the first argument of the callable, letting the thread cooperatively check whether it’s been asked to stop.
- Has .request_stop() to signal that stop_token.
#include <iostream>
#include <thread>
#include <chrono>
void poll(std::stop_token stop) {
    using namespace std::chrono_literals;
    int ticks = 0;
    while (!stop.stop_requested()) {
        std::this_thread::sleep_for(50ms);
        ticks++;
    }
    std::cout << "stopped after " << ticks << " ticks\n";
}
int main() {
    std::jthread t(poll);
    std::this_thread::sleep_for(std::chrono::milliseconds(220));
    t.request_stop();
    // jthread destructor auto-joins. No call to .join() needed.
    return 0;
}
`jthread` + `stop_token` is the modern C++ pattern for cancellable background work. RAII handles the join.
Output (the exact tick count depends on the scheduler and timer granularity; expect 4 or 5):
stopped after 4 ticks
To run locally:
g++ -std=c++23 -O2 snippet.cpp && ./a.out
When you want to launch a thread, reach for std::jthread. Reach for
std::thread only when you need to deliberately detach (rare) or when
you’re on a platform that hasn’t shipped C++20.
Sharing data: don’t, or do it with a mutex
The default rule: don’t share mutable data between threads. If each
thread has its own copy, there is no race. C++’s pass-by-value semantics
make this natural — moving a std::vector into a thread’s callable hands
ownership to the new thread.
When you do need shared mutable state, the entry-level tool is
std::mutex + std::lock_guard:
#include <mutex>
std::mutex m;
int shared_counter = 0;
void increment() {
    std::lock_guard<std::mutex> lock(m); // RAII: acquires here, releases on scope exit
    shared_counter++;
} // ← lock released
std::lock_guard is RAII over the mutex. You can’t forget to unlock —
the destructor does it. If increment throws halfway through, the lock
still releases. This is the same pattern as std::unique_ptr (lesson 05)
applied to a different resource.
For multiple mutexes, use std::scoped_lock (C++17): it locks them all
using a deadlock-avoidance algorithm, so it does not matter in which
order different threads name the mutexes.
std::async: I want a result, not a thread
Often you don’t care which thread runs the work. You just want a result,
eventually. std::async gives you that:
#include <iostream>
#include <future>
#include <chrono>
#include <thread> // std::this_thread::sleep_for
int compute(int seed) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return seed * seed;
}
int main() {
    // Launch the work; get a future for the result.
    std::future<int> f = std::async(std::launch::async, compute, 7);
    std::cout << "doing other work…\n";
    // … other work could happen here …
    // .get() blocks until the result is ready, then returns it.
    int result = f.get();
    std::cout << "result = " << result << "\n";
    return 0;
}
`std::async` launches work, hands you a future. `.get()` waits for the result. No explicit thread management.
Output:
doing other work…
result = 49
To run locally:
g++ -std=c++23 -O2 snippet.cpp && ./a.out
Always pass std::launch::async explicitly. The default policy
(std::launch::async | std::launch::deferred) lets the implementation decide
whether to actually run async or defer until .get(), a surprisingly common
source of “why is my parallel code not parallel?” bugs.
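To see what deferral means in practice, compare thread ids under the two policies. This is a minimal sketch with hypothetical helper names. A deferred task runs on whichever thread calls `.get()`, while an async task runs on a different thread:

```cpp
#include <future>
#include <thread>

// True if a deferred task executed on the thread that called get().
bool deferred_runs_on_caller() {
    auto f = std::async(std::launch::deferred,
                        [] { return std::this_thread::get_id(); });
    return f.get() == std::this_thread::get_id();
}

// True if an async task executed on some other thread.
bool async_runs_elsewhere() {
    auto f = std::async(std::launch::async,
                        [] { return std::this_thread::get_id(); });
    return f.get() != std::this_thread::get_id();
}
```

With the default policy, either behavior is allowed, which is why spelling out the policy removes the guesswork.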
Note: the returned std::future will block in its destructor if it
came from std::async with the std::launch::async policy. That means
calling std::async(...) without saving the future is effectively
synchronous: the temporary’s destructor waits for the task to finish.
This is also the kind of thing concepts can’t catch and you only learn
from being bitten once.
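The blocking destructor is easy to measure. In this sketch (the helper name is hypothetical) two 50 ms tasks are launched "fire-and-forget" and the elapsed time comes out at roughly the sum of both, not the maximum:

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: elapsed milliseconds for two discarded async calls.
long long discarded_futures_elapsed_ms() {
    using clock = std::chrono::steady_clock;
    auto nap = [] { std::this_thread::sleep_for(std::chrono::milliseconds(50)); };
    const auto start = clock::now();
    // The (void) casts only silence the "discarded result" warning that
    // many standard libraries emit for std::async.
    (void)std::async(std::launch::async, nap);  // temporary future: destructor waits
    (void)std::async(std::launch::async, nap);  // starts only after the first finished
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               clock::now() - start).count();
}
```

Storing both futures in named variables before calling `.get()` would let the two naps overlap.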
std::future + std::promise: the value pipe
If you need to hand a value from one thread to another by hand (not
launch a function), use std::promise and std::future as a single-use
pipe:
#include <future>
#include <thread>
std::promise<int> p;
std::future<int> f = p.get_future();
std::jthread producer([&p]() {
    // … expensive work …
    p.set_value(42); // unblocks anything awaiting f
});
int x = f.get(); // blocks until producer's set_value
A promise is the write end; a future is the read end. Each value is
passed exactly once. If the promise is destroyed without set_value,
the future’s .get() throws a std::future_error whose code is
std::future_errc::broken_promise.
For multiple readers, use std::shared_future instead — multiple
consumers can .get() the same value. For multiple writes, you want a
condition variable or a channel-style primitive (the standard doesn’t
ship channels; we’ll talk about them in the capstone).
Condition variables: wait until X
When one thread needs to wait for another to signal (not just compute
a value), std::condition_variable is the right tool:
#include <mutex>
#include <condition_variable>
std::mutex m;
std::condition_variable cv;
bool ready = false;
void waiter() {
    std::unique_lock lock(m);
    cv.wait(lock, []{ return ready; }); // releases m while waiting, re-acquires on wake
    // … now ready == true, lock is held …
}
void signaller() {
    {
        std::lock_guard lock(m);
        ready = true;
    }
    cv.notify_one();
}
The predicate form (cv.wait(lock, pred)) is the only spelling worth
using — it handles spurious wakeups for you (yes, wait can return
without notify being called; the predicate keeps you correct).
std::atomic as a teaser
For one specific case — a single value shared across threads, with no
other state to keep consistent — you don’t need a mutex. You need
std::atomic:
#include <atomic>
std::atomic<int> counter{0};
void increment() {
    counter++; // atomic: no race, no mutex needed
}
On mainstream hardware this typically compiles to a single atomic
read-modify-write instruction or a short retry loop, depending on the
architecture. It’s usually faster than locking a mutex. The cost:
atomicity covers one object at a time. If you need “set A and B
together,” you’re back to needing a mutex.
The full atomic story (memory ordering, why memory_order_relaxed can be
faster than the default, and how the happens-before relation lets you
reason about visibility) is the next lesson.
Tools you should know exist
A short list, with the one-line “what it’s for”:
| Tool | What it’s for |
|---|---|
| std::jthread | A cancellable thread that auto-joins. The default. |
| std::async | “Run this and give me a future.” |
| std::future / std::promise | One-shot value pipe between threads. |
| std::shared_future | Multiple readers of the same one-shot value. |
| std::packaged_task | A callable wrapped to publish its result into a future. |
| std::mutex / std::lock_guard | Mutual exclusion, RAII-locked. |
| std::scoped_lock | Multiple mutexes at once, deadlock-free. |
| std::shared_mutex / std::shared_lock | Read-write lock. |
| std::condition_variable | Wait until a predicate becomes true. |
| std::atomic<T> | Single-value lock-free access. |
| std::barrier (C++20) | N threads sync at a meeting point. |
| std::latch (C++20) | Single-use countdown gate. |
| std::counting_semaphore (C++20) | Generalized semaphore. |
| std::stop_token (C++20) | Cooperative cancellation. |
That’s the toolbox. You’ll need a small handful of these regularly; the rest are there when you need them.
The patterns
A few common shapes worth recognizing:
Fork-join parallelism: split work N ways, launch N futures, gather N results.
std::vector<std::future<int>> jobs;
for (auto& chunk : chunks)
    jobs.push_back(std::async(std::launch::async, process, chunk));
int total = 0;
for (auto& j : jobs) total += j.get();
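A self-contained version of the fork-join shape might look like the sketch below. The function `parallel_sum` and its chunking scheme are assumptions for illustration, not the lesson's `process`/`chunks` code:

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sums `data` by splitting it into up to `parts`
// contiguous chunks, one std::async task per chunk.
long parallel_sum(const std::vector<int>& data, std::size_t parts) {
    if (parts == 0) parts = 1;
    const std::size_t chunk = (data.size() + parts - 1) / parts;  // ceiling division
    std::vector<std::future<long>> jobs;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        // Fork: each task reads its own disjoint range of the shared, unmodified vector.
        jobs.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0L);
        }));
    }
    long total = 0;
    for (auto& j : jobs) total += j.get();  // join: gather every partial sum
    return total;
}
```

The tasks only read `data`, and the caller keeps it alive until every `.get()` has returned, so no locking is needed.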
Producer-consumer: a thread-safe queue + condition variable. Producers push, consumers pop and process. The capstone C4 builds this.
Pipeline: each stage runs on its own thread, with bounded queues between stages. Common in audio, video, and data-ingestion code.
Thread pool: N worker threads pull tasks from a shared queue. The
right abstraction for “I have lots of small jobs.” The standard library
doesn’t ship one yet (the std::execution framework adopted for C++26 is
aimed at this space).
When concurrency is the wrong answer
Concurrency adds correctness problems. If your code is fast enough single-threaded, leave it single-threaded. Three common red flags:
- You’re parallelizing something memory-bound, not CPU-bound. N threads on a memory-bound loop often gain little and can even run slower, because they compete for the same memory bandwidth and cache.
- You’re using threads for I/O. Async I/O (epoll, io_uring, libuv-style event loops) is almost always better than blocking N threads on N file descriptors.
- The problem is GUI responsiveness. Use a single background thread for the work and a message queue back to the UI thread. Don’t sprinkle threads everywhere.
The rule of thumb: add concurrency for measured CPU-bound speedups. Profile first.
Key takeaways
- Default to std::jthread for thread launching. It auto-joins and has a cooperative stop_token. Plain std::thread is the older, more dangerous version.
- Default to std::async(std::launch::async, fn, args...) when you want a result without managing the thread.
- Pass std::launch::async explicitly. Without it, the implementation may defer execution until .get().
- std::future from std::async blocks on destruction. A fire-and-forget call accidentally becomes synchronous.
- Use std::lock_guard / std::scoped_lock for mutex locking; never call .lock() / .unlock() by hand.
- std::atomic<T> is the right tool for a single shared value, not for shared-state coherence. For “modify two things together,” you want a mutex.
- Most concurrency bugs are race conditions and lifetime mistakes. Both are about what a thread sees and when, which is lesson 12.
What’s next
Lesson 12 — atomics and the C++ memory model. The lesson most courses
get wrong because it requires explaining what a CPU can reorder, what
the compiler can reorder, and what memory_order actually guarantees.
The capstone C4 (Concurrent Web Crawler) then exercises everything from
this phase.