Custom Allocator
The brief
Write a slab allocator that beats new / delete for fixed-size,
high-churn allocations on a real workload. Plug it into a std::vector
through std::pmr. Then profile it against the system allocator and
convince yourself the abstraction really is zero-cost.
This is the canonical first capstone of the language. Once you’ve done it, the C++ standard library stops being a black box: every container, every smart pointer, every algorithm is “this, but more general.”
Scope (1 weekend)
Build a SlabAllocator<T> with these guarantees:
- Single allocation pool. On construction, reserve one contiguous
buffer big enough for
NTs. Track which slots are free via a bitmap or a free-list embedded in the unused slots themselves. allocate()returns an alignedT*. Usealignof(T)andstd::bytestorage. Verify alignment withassert(reinterpret_cast<uintptr_t>(p) % alignof(T) == 0);.deallocate(T*)returns the slot to the free list. Verify the pointer is inside the buffer and isn’t already free.- Out-of-pool requests fall back — either return
nullptr, throwstd::bad_alloc, or chain to the system allocator. Document which and why.
That’s the core. The interesting bits — std::pmr integration, profiling,
the Rule of Five for the allocator itself — are below.
The shape
A header-only skeleton, just enough to compile:
#pragma once
#include <cstddef>
#include <new>
#include <vector>
template <class T, std::size_t N>
class SlabAllocator {
alignas(T) std::byte storage[N * sizeof(T)];
// … free-list bookkeeping …
public:
using value_type = T;
SlabAllocator() noexcept;
~SlabAllocator();
// Rule of Five — what should each of these do for an allocator?
SlabAllocator(const SlabAllocator&) = delete;
SlabAllocator& operator=(const SlabAllocator&) = delete;
SlabAllocator(SlabAllocator&&) noexcept;
SlabAllocator& operator=(SlabAllocator&&) noexcept;
[[nodiscard]] T* allocate(std::size_t n);
void deallocate(T* p, std::size_t n) noexcept;
};
That’s the deliverable’s surface. Filling it in is the project.
std::pmr integration
The C++17 polymorphic-memory-resource library is the standard’s answer to
“how do I plug a custom allocator into a container without making the
allocator part of the container’s type?” Wrap your slab in a
std::pmr::memory_resource:
class SlabResource : public std::pmr::memory_resource {
// hold one SlabAllocator<std::byte, BIG_N>
protected:
void* do_allocate(std::size_t bytes, std::size_t align) override;
void do_deallocate(void* p, std::size_t bytes, std::size_t align) override;
bool do_is_equal(const memory_resource& other) const noexcept override;
};
Then any std::pmr::vector<T>, std::pmr::map<K, V>, etc. can be
constructed with your resource and use the slab.
Profiling rubric
You’ll only know the slab is faster by measuring. Write a tiny benchmark:
- Allocate and deallocate 10 million
Ts in a loop, alternating with the system allocator. - Use
std::chrono::steady_clockto wall-clock both runs. - Use
perf stat ./benchto comparecycles,cache-misses,page-faults.
The slab should win at least 3× for fixed-size allocations on a hot loop. If it doesn’t, your bottleneck is somewhere else — find it.
Stretch goals
In order of difficulty:
- Concurrent slab. Make
allocateanddeallocatethread-safe with a per-bin lock. Then benchmark againsttcmallocormimallocon a multi-threaded workload. - Variable-size bins. Hold several slabs sized for different
Ts and dispatch based on the requested size. This is what production allocators (jemalloc, mimalloc) do. - Garbage-collected slab. Add a
gc()method that walks liveshared_ptrs and reclaims orphaned slots. Defeats the purpose of C++ idiomatically, but you’ll learn what GC actually costs. - No-allocation-mode. Add a build flag that traps any call to
::operator newoutside the slab. Useful for embedded targets.
What you’ll have learned
By the end of this capstone you’ll have:
- Internalized alignment as a hard constraint, not advice.
- Written a non-trivial Rule of Five type (the allocator itself — what does it mean to move-construct an allocator that owns a buffer?).
- Used
std::pmrto integrate with the standard containers without changing their type signatures. - Measured the cost of
new/deleteagainst a hand-rolled allocator on a real workload — and now have a number, not a feeling, for how much the system allocator costs. - Built the kind of thing the standard library is built out of.
Reference reading: P. Halpern’s N3525: Polymorphic Allocators; Howard
Hinnant’s writeup on short_alloc; Bloomberg’s bsl talks on
arena-allocator design.