capstone · intermediate · 1 week

Custom Allocator

concepts: placement new · alignment · std::pmr · profiling · Rule of Five
builds on: Stack, Heap, Lifetimes, and Scope · RAII + Smart Pointers · Move Semantics + Rule of Zero/Five

The brief

Write a slab allocator that beats new / delete for fixed-size, high-churn allocations on a real workload. Plug it into a std::vector through std::pmr. Then profile it against the system allocator and convince yourself the abstraction really is zero-cost.

This is the canonical first capstone of the language. Once you've done it, the C++ standard library stops being a black box: every container, every smart pointer, every algorithm is "this, but more general."

Scope (1 weekend)

Build a SlabAllocator<T> with these guarantees:

Single allocation pool. On construction, reserve one contiguous buffer big enough for N Ts. Track which slots are free via a bitmap or a free-list embedded in the unused slots themselves.
allocate() returns an aligned T*. Use alignof(T) and std::byte storage. Verify alignment with assert(reinterpret_cast<uintptr_t>(p) % alignof(T) == 0);.
deallocate(T*) returns the slot to the free list. Verify the pointer is inside the buffer and isn't already free.
Out-of-pool requests fall back — either return nullptr, throw std::bad_alloc, or chain to the system allocator. Document which and why.

That's the core. The interesting bits — std::pmr integration, profiling, the Rule of Five for the allocator itself — are below.

The shape

A header-only skeleton, just enough to compile:

#pragma once
#include <cstddef>
#include <new>
#include <vector>

template <class T, std::size_t N>
class SlabAllocator {
  alignas(T) std::byte storage[N * sizeof(T)];
  // … free-list bookkeeping …

 public:
  using value_type = T;

  SlabAllocator() noexcept;
  ~SlabAllocator();

  // Rule of Five — what should each of these do for an allocator?
  SlabAllocator(const SlabAllocator&) = delete;
  SlabAllocator& operator=(const SlabAllocator&) = delete;
  SlabAllocator(SlabAllocator&&) noexcept;
  SlabAllocator& operator=(SlabAllocator&&) noexcept;

  [[nodiscard]] T* allocate(std::size_t n);
  void             deallocate(T* p, std::size_t n) noexcept;
};

That's the deliverable's surface. Filling it in is the project.

`std::pmr` integration

The C++17 polymorphic-memory-resource library is the standard's answer to "how do I plug a custom allocator into a container without making the allocator part of the container's type?" Wrap your slab in a std::pmr::memory_resource:

class SlabResource : public std::pmr::memory_resource {
  // hold one SlabAllocator<std::byte, BIG_N>
 protected:
  void* do_allocate(std::size_t bytes, std::size_t align) override;
  void  do_deallocate(void* p, std::size_t bytes, std::size_t align) override;
  bool  do_is_equal(const memory_resource& other) const noexcept override;
};

Then any std::pmr::vector<T>, std::pmr::map<K, V>, etc. can be constructed with your resource and use the slab.

Profiling rubric

You'll only know the slab is faster by measuring. Write a tiny benchmark:

Allocate and deallocate 10 million Ts in a loop, alternating with the system allocator.
Use std::chrono::steady_clock to wall-clock both runs.
Use perf stat ./bench to compare cycles, cache-misses, page-faults.

The slab should win at least 3× for fixed-size allocations on a hot loop. If it doesn't, your bottleneck is somewhere else — find it.

Stretch goals

In order of difficulty:

Concurrent slab. Make allocate and deallocate thread-safe with a per-bin lock. Then benchmark against tcmalloc or mimalloc on a multi-threaded workload.
Variable-size bins. Hold several slabs sized for different Ts and dispatch based on the requested size. This is what production allocators (jemalloc, mimalloc) do.
Garbage-collected slab. Add a gc() method that walks live shared_ptrs and reclaims orphaned slots. Defeats the purpose of C++ idiomatically, but you'll learn what GC actually costs.
No-allocation-mode. Add a build flag that traps any call to ::operator new outside the slab. Useful for embedded targets.

What you'll have learned

By the end of this capstone you'll have:

Internalized alignment as a hard constraint, not advice.
Written a non-trivial Rule of Five type (the allocator itself — what does it mean to move-construct an allocator that owns a buffer?).
Used std::pmr to integrate with the standard containers without changing their type signatures.
Measured the cost of new / delete against a hand-rolled allocator on a real workload — and now have a number, not a feeling, for how much the system allocator costs.
Built the kind of thing the standard library is built out of.

Reference reading: P. Halpern's N3525: Polymorphic Allocators; Howard Hinnant's writeup on short_alloc; Bloomberg's bsl talks on arena-allocator design.

🛠️ Custom Allocator