Allocator Optimization
Lesson, slides, and applied problem sets.
Block Splitting and Allocator Optimization
When you allocate 50 bytes from a 1000-byte free block, 950 bytes are wasted. Block splitting solves this. This module also covers advanced topics: debugging with GDB, mmap for large allocations, and thread safety.
The Problem: Internal Fragmentation
Request: malloc(50)
Found: [Block: 1000 bytes free]
Result: [Block: 1000 bytes USED, but only 50 needed]
950 bytes wasted inside the block!
This is internal fragmentation - waste inside allocated blocks.
The Solution: Split the Block
Before:
[Header | 1000 bytes free ]
After malloc(50) with splitting:
[Header | 50 used ][New Header | ~926 bytes free ]
Now future allocations can use the remaining space.
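The code in the rest of this module assumes a block header and an 8-byte alignment helper roughly like the sketch below. The field names and sizes are assumptions for illustration; your own definitions may differ.
#include <stddef.h>
#include <stdint.h>

// Assumed header layout (matches the fields used throughout: size, free, magic, next).
typedef struct block_header {
    size_t size;                  // usable payload size in bytes
    int free;                     // 1 if the block is available
    uint32_t magic;               // BLOCK_MAGIC, used to detect corruption
    struct block_header *next;    // next block in the list
} block_header_t;

// Round a requested size up to the next multiple of 8.
static inline size_t align8(size_t n) {
    return (n + 7) & ~(size_t)7;
}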
When to Split
Only split if the remainder is useful:
#define BLOCK_MAGIC 0xDEADBEEF
#define MIN_BLOCK_SIZE (sizeof(block_header_t) + 16)

// A block is worth splitting only if the leftover can hold a header plus a minimal payload.
int can_split(block_header_t *block, size_t size) {
    size_t aligned = align8(size);
    if (block->size < aligned) return 0;   // block can't even satisfy the request
    size_t remaining = block->size - aligned;
    return remaining >= MIN_BLOCK_SIZE;    // leftover must form a usable block
}
Don't split if remaining space is tiny - you'd just create an unusable fragment.
Splitting Implementation
block_header_t *split_block(block_header_t *block, size_t size) {
    size_t aligned = align8(size);
    if (!can_split(block, size)) {
        return NULL; // No split
    }
    // Find where the new block starts: just past the allocated payload
    block_header_t *new_block = (block_header_t*)(
        (char*)block + sizeof(block_header_t) + aligned
    );
    // Initialize new block with magic for validation
    new_block->size = block->size - aligned - sizeof(block_header_t);
    new_block->free = 1;
    new_block->magic = BLOCK_MAGIC;
    new_block->next = block->next;
    // Update original block
    block->size = aligned;
    block->next = new_block;
    return new_block;
}
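As a rough usage sketch, here is where splitting sits in a first-fit allocation path. find_free_block and request_more_memory are assumed helpers from earlier modules, not defined here.
// Sketch: how split_block plugs into a first-fit malloc path (helper names assumed).
void *my_malloc(size_t size) {
    block_header_t *block = find_free_block(size);
    if (!block) {
        block = request_more_memory(size);   // e.g. extend the heap with sbrk
        if (!block) return NULL;
    }
    split_block(block, size);                // splits only if can_split() says the remainder is usable
    block->free = 0;
    return (void *)(block + 1);              // payload begins right after the header
}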
Debugging Allocators with GDB
Memory allocator bugs are notoriously hard to find. GDB is essential:
Basic Commands
# Compile with debug symbols
gcc -g -O0 my_malloc.c test.c -o test
# Start gdb
gdb ./test
Inspecting Block Headers
# Break when entering malloc
(gdb) break my_malloc
# Run until breakpoint
(gdb) run
# Print block header
(gdb) print *block
$1 = {size = 104, free = 1, magic = 3735928559, next = 0x0}
# Verify magic (0xDEADBEEF = 3735928559)
(gdb) print/x block->magic
$2 = 0xdeadbeef
# Walk the free list
(gdb) set $b = free_list_head
(gdb) while $b != 0
> print *$b
> set $b = $b->next
> end
Catching Corruption
# Watch for corruption of a block's magic field (replace 1 with the watchpoint number GDB reports)
(gdb) watch block->magic
(gdb) condition 1 block->magic != 0xDEADBEEF
# Break on invalid free (ptr is my_free's parameter; the header sits just before the payload)
(gdb) break my_free if ((block_header_t*)ptr - 1)->magic != 0xDEADBEEF
Common Debugging Patterns
- Use-after-free: set freed memory to 0xFEEDFACE, break if it is accessed again
- Double-free: check whether the block is already marked free
- Buffer overflow: write sentinel patterns at block boundaries
A sketch of the first two checks follows.
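This is a minimal sketch of the checks above inside my_free, assuming the header layout shown earlier; using assert and a 0xFEEDFACE fill are illustrative choices, not the lesson's required design.
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Sketch: defensive checks inside my_free (header layout assumed from earlier).
void my_free(void *ptr) {
    if (!ptr) return;
    block_header_t *block = (block_header_t *)ptr - 1;
    assert(block->magic == BLOCK_MAGIC);    // corrupted header or invalid pointer
    assert(!block->free);                   // double-free: block is already free
    // Poison the payload so use-after-free reads stand out in GDB.
    uint32_t *p = (uint32_t *)ptr;
    for (size_t i = 0; (i + 1) * sizeof(uint32_t) <= block->size; i++) {
        p[i] = 0xFEEDFACE;
    }
    block->free = 1;
}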
Using Your Allocator in Real Programs
Once the allocator works, you can load it into existing binaries without recompiling them.
Build a Shared Library
Linux:
clang -O0 -g -W -Wall -Wextra -shared -fPIC malloc.c -o malloc.so
macOS:
clang -O0 -g -W -Wall -Wextra -dynamiclib malloc.c -o malloc.dylib
Note: sbrk is deprecated on recent macOS versions, so behavior can be unreliable. For learning, it's fine; for production, use mmap/vm_allocate-style approaches instead.
Inject into a Process
Linux (bash/zsh):
export LD_PRELOAD=/absolute/path/malloc.so
macOS:
export DYLD_INSERT_LIBRARIES=/absolute/path/malloc.dylib
Now run a binary (e.g., ls) and it will use your allocator. On macOS, System Integrity Protection strips the DYLD_ variables for protected system binaries, so test with a program you built yourself.
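For a one-off run without exporting the variable (Linux shown; the macOS form is analogous with DYLD_INSERT_LIBRARIES):
LD_PRELOAD=/absolute/path/malloc.so ls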
mmap for Large Allocations
Real allocators use mmap for large allocations instead of sbrk:
#include <sys/mman.h>
#define MMAP_THRESHOLD (128 * 1024) // 128KB
void *allocate_large(size_t size) {
    size_t total = sizeof(block_header_t) + size;
    void *ptr = mmap(NULL, total,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS,
                     -1, 0);
    if (ptr == MAP_FAILED) return NULL;
    block_header_t *block = ptr;
    block->size = size;           // recorded so free can recover the mapping length
    block->free = 0;
    block->magic = BLOCK_MAGIC;
    block->next = NULL;           // mmap'd blocks are not linked into the heap list
    return (void*)(block + 1);    // payload starts right after the header
}

void free_large(void *ptr, size_t size) {
    block_header_t *block = ((block_header_t*)ptr) - 1;
    munmap(block, sizeof(block_header_t) + size);   // size could also be read back from block->size
}
Why mmap for Large Blocks?
- Returns memory to OS: munmap immediately frees the pages
- No fragmentation: each allocation is independent
- Security: fresh pages are zero-initialized
- Virtual address space: doesn't grow the heap
Hybrid Strategy
void *my_malloc(size_t size) {
    if (size >= MMAP_THRESHOLD) {
        return allocate_large(size);
    }
    // Use sbrk-based allocator for small blocks
    return allocate_from_heap(size);
}
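The free path then has to know which blocks came from mmap. One common approach, sketched here, is an extra flag in the header; the mmapped field is an assumption, not part of the header used so far.
// Sketch: dispatching free, assuming the header gains an 'mmapped' flag set by allocate_large.
void my_free(void *ptr) {
    if (!ptr) return;
    block_header_t *block = (block_header_t *)ptr - 1;
    if (block->mmapped) {
        munmap(block, sizeof(block_header_t) + block->size);   // hand the pages back to the OS
    } else {
        block->free = 1;   // return the block to the sbrk-managed free list
    }
}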
Thread Safety
Our allocator is NOT thread-safe. Here's why and how to fix it:
The Race Condition
// Thread 1 // Thread 2
block = find_free_block(100); block = find_free_block(100);
// Both find same block!
block->free = 0; block->free = 0;
return payload(block); return payload(block);
// BOTH return the SAME memory!
Solution 1: Global Lock (Simple but Slow)
#include <pthread.h>

static pthread_mutex_t malloc_lock = PTHREAD_MUTEX_INITIALIZER;

void *my_malloc(size_t size) {
    pthread_mutex_lock(&malloc_lock);
    void *result = my_malloc_internal(size);
    pthread_mutex_unlock(&malloc_lock);
    return result;
}

void my_free(void *ptr) {
    pthread_mutex_lock(&malloc_lock);
    my_free_internal(ptr);
    pthread_mutex_unlock(&malloc_lock);
}
Solution 2: Per-Thread Arenas (Fast but Complex)
Production allocators (jemalloc, tcmalloc) use thread-local arenas:
__thread arena_t *thread_arena = NULL;   // one arena per thread (__thread is a GCC/Clang extension; C11 offers _Thread_local)

void *my_malloc(size_t size) {
    if (!thread_arena) {
        thread_arena = create_arena();   // lazily create this thread's arena
    }
    return arena_alloc(thread_arena, size);
}
Benefits:
- No lock contention between threads
- Cache-friendly (each thread has local data)
- Scales with CPU cores
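To make "arena" concrete, here is a minimal sketch of what a per-thread arena might hold; the names and fields are assumptions for illustration, not a production design.
typedef struct arena {
    block_header_t *free_list;   // this thread's private free list
    void *heap_start;            // start of the memory region this arena manages
    size_t heap_size;            // bytes currently owned by the arena
} arena_t;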
Solution 3: Lock-Free Structures (Expert Level)
Using atomic operations and CAS (Compare-And-Swap):
#include <stdatomic.h>

_Atomic(block_header_t *) free_list_head = NULL;   // atomic pointer, not a pointer to an atomic struct

void add_to_free_list(block_header_t *block) {
    block_header_t *expected;
    do {
        expected = atomic_load(&free_list_head);   // snapshot the current head
        block->next = expected;                    // link the new block in front of it
    } while (!atomic_compare_exchange_weak(
        &free_list_head, &expected, block));       // retry if another thread moved the head
}
This sketch only covers pushing onto the free list; a correct lock-free pop also has to handle the ABA problem, which is a large part of why this approach is so hard to get right.
Trade-offs Summary
Block Splitting:
- Pros: Reduces internal fragmentation
- Cons: More blocks, slight overhead
mmap for Large Blocks:
- Pros: Returns memory to OS, no fragmentation
- Cons: System call overhead, not for small allocations
Thread Safety:
- Global lock: Simple but contention bottleneck
- Per-thread arenas: Fast but memory overhead
- Lock-free: Fastest but very complex to implement correctly
Minimum Block Size
Choosing MIN_BLOCK_SIZE wisely:
- Too small: splits leave fragments too tiny to ever satisfy a request
- Too large: blocks that could usefully be split are handed out whole, wasting space
16 bytes is a reasonable minimum payload - it can hold two pointers on a 64-bit system, or a small amount of data.
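As a worked example (assuming the 24-byte header sketched earlier, so MIN_BLOCK_SIZE is 40): a 1000-byte free block asked for malloc(900) has 1000 - 904 = 96 bytes left after alignment, enough for a new header plus 16 bytes of payload, so it splits; asked for malloc(970), only 1000 - 976 = 24 bytes remain, so the whole block is handed out instead.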
Module Items
Aligned Memory Allocation
Implement allocation with power-of-two alignment guarantees
Block Splitting
Split oversized blocks to reduce internal fragmentation
Block Coalescing
Merge adjacent free blocks to reduce external fragmentation
Debug the Allocator
Find and fix 4 bugs in a broken memory allocator using GDB
Optimization Quiz
Test understanding of block optimization