Why Random UUIDs (v4) Kill Database Performance: A Deep Dive into B-tree Index Issues

Mar 30, 2025

Akash Ashok

I am a software engineer

How Random UUIDs Damage B-tree Performance in Databases

UUID v4 primary keys can severely degrade SQL database performance due to how they interact with B-tree indexes. This article explains:

How B-tree indexes store and organize data
Why UUID v4’s randomness causes specific problems for B-trees
How these problems manifest in read and write operations
Why sequential IDs solve these B-tree performance issues

Understanding B-tree Indexes

Modern databases rely on B-tree indexes for efficient data retrieval. A B-tree is a self-balancing tree data structure that:

Maintains sorted data: Records are stored in sorted order by key
Organizes data in pages: Fixed-size chunks (typically 4-16KB)
Balances automatically: Stays roughly the same depth for all records
Enables efficient searches: Allows O(log n) lookups by traversing from root to leaf

The Page-Based Structure of B-trees

B-trees organize data in pages (sometimes called nodes):

Root Page: The topmost entry point into the tree, contains keys and values
Internal Pages: Contain keys, values, and pointers to child pages
Leaf Pages: Store keys and values, with no child pointers
Physical Adjacency: Pages are stored as physical blocks on disk

Note: Many database implementations actually use B+ tree variants, where data records are stored only in leaf nodes, and internal nodes contain only keys and pointers. The performance principles discussed in this article apply to both B-trees and B+ trees.

Each page is typically a fixed size (4KB, 8KB, or 16KB) and contains:

Sorted key values
Associated data records or pointers to records
Pointers to child pages (except in leaf pages for B+ trees)
Page metadata

This structure is optimized for keys that are inserted in a predictable, usually sequential, order.

How UUID v4 Disrupts B-tree Efficiency

UUID v4 generates completely random values:

550e8400-e29b-41d4-a716-446655440000  ← completely random
6ba7b810-9dad-11d1-80b4-00c04fd430c8  ← no relation to previous ID
f47ac10b-58cc-4372-a567-0e02b2c3d479  ← inserted later but could sort anywhere

This randomness fundamentally conflicts with B-tree optimization assumptions in two critical ways:

1. Read Pattern Problems: Cache Inefficiency

UUID v4 creates problematic access patterns during reads, as demonstrated in this animation:

When using UUID v4 in a B-tree:

Random Page Access: Each lookup requires navigating to randomly distributed pages
Poor Buffer Cache Utilization: Database buffer pool becomes ineffective
- With sequential IDs, recently accessed pages likely contain upcoming records
- With UUID v4, each query likely needs different pages, causing cache thrashing
High Cache Miss Rate: Few lookups benefit from cached data
Excess Physical I/O: Each cache miss triggers a disk read
No Locality of Reference: Related records created together aren’t stored together

2. Write Pattern Problems: The Page Split Cascade

The most severe impact of UUID v4 on B-trees occurs during write operations:

B-tree Page Split Mechanics

When a B-tree page becomes full, it undergoes a “page split”:

A new page is allocated
The full page’s contents are divided roughly in half
Half the records move to the new page
Parent pages must be updated with new key ranges and pointers
Parent pages may also split in a cascading reaction

With sequential IDs, page splits occur predictably at the right edge of the tree. With UUID v4, they occur randomly throughout the tree:

The UUID v4 Page Split Problem and Write Amplification

Random Insertion Points: Each new UUID v4 targets a random leaf page
Uniform Distribution: Over time, virtually all pages reach capacity
Epidemic of Page Splits: Pages continually split throughout the tree
- Sequential IDs: ~10-20 page splits per million records
- UUID v4: 5,000-10,000+ page splits per million records (500× more)
Cascading Splits: One split often triggers others up the tree
Write Amplification: A single logical write operation triggers multiple physical writes
- Original data write
- Page split operations (new page allocation)
- Updates to parent pages
- WAL (Write-Ahead Log) entries for all changes
- Metadata updates
Index Fragmentation: Tree becomes increasingly imbalanced and fragmented

Each page split causes:

Multiple random disk reads and writes
WAL (Write-Ahead Log) operations
Lock contention
Tree rebalancing
Buffer pool pollution
Increased write amplification (5-10× more physical I/O per logical write)

Comparison of B-tree Behavior

Aspect	With Sequential IDs	With UUID v4
Insertion Pattern	Append to rightmost page	Random throughout tree
Page Utilization	90-100% (efficient)	50-70% (wasteful)
Page Splits	Rare, predictable	Frequent, unpredictable
Buffer Cache Hits	High (spatial locality)	Low (random access)
Disk I/O	Minimal, sequential	Excessive, random
Write Amplification	Low (1-2× per write)	High (5-10× per write)
Index Size	Compact	2-5× larger due to fragmentation
Insertion Performance	Fast, consistent	3-10× slower

How Sequential IDs Fix B-tree Problems

In contrast to random identifiers, sequential IDs (such as UUID v7 or auto-increment values) create B-tree-friendly access patterns:

018f0f3d-1c3b-7c3b-9c3b-1c3b7c3b9c3b  ← timestamp prefix (Mar 2023)
018f0f3d-1c3b-7c3b-9c3b-1c3b7c3b9c3c  ← sequential; just +1 in last position
018f0f3e-2a7d-8c4f-a32e-5b6c9d8f3e7a  ← later timestamp, sorts after previous IDs

The sequential nature ensures records are inserted in order, creating efficient B-tree patterns:

With sequential IDs in B-trees:

Efficient Append Pattern: Inserts happen primarily at the rightmost edge
Optimized Page Splits: Occur predictably and infrequently
High Cache Hit Rate: Related records remain in buffer cache
Minimized I/O: Fewer page reads/writes required
Compact Storage: Higher page utilization (90-100% vs 50-70%)

Comparing ID Types for B-tree Performance

Feature	Random IDs (e.g., UUID v4)	Sequential IDs (e.g., UUID v7)
Structure	Randomly generated bits	Time-ordered component + some randomness
B-tree Insertion Pattern	Random (worst case)	Sequential (optimal)
Page Split Frequency	Extremely high	Minimal
Cache Locality	Poor	Excellent

Conclusion

The impact of random IDs like UUID v4 on B-tree performance goes beyond theoretical concerns—it fundamentally conflicts with how B-trees are optimized to work. The random nature creates a perfect storm of problems: excessive page splits, write amplification, poor cache utilization, increased I/O, and index fragmentation.

Write amplification is particularly damaging because each logical insert operation can trigger 5-10× more physical I/O operations, severely limiting database throughput and potentially shortening the lifespan of SSDs. This cascading effect explains why database performance can degrade so dramatically with random primary keys.

By choosing sequential identifiers, you align your primary key generation with the underlying assumptions and optimizations of B-tree indexes, minimizing both read and write amplification.

For most database applications using B-tree indexes, sequential IDs provide the optimal balance between uniqueness guarantees and performance, ensuring your database operates efficiently as it scales.