
Cache Fundamentals

Modern processors are dramatically faster than main memory. Without caches, the CPU would spend most of its time waiting for data. The memory hierarchy bridges this speed gap with storage levels that get smaller and faster the closer they sit to the CPU.

Approximate sizes, latencies, and bandwidths (2024 numbers):

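| Level | Size | Latency | Bandwidth |
|---|---|---|---|
| Register | ~1 KB | ~0.3 ns | |
| L1 Cache | 32–64 KB | ~1 ns | ~1 TB/s |
| L2 Cache | 256 KB–1 MB | ~3–5 ns | ~500 GB/s |
| L3 Cache | 4–64 MB | ~10–20 ns | ~200 GB/s |
| DRAM | 8–256 GB | ~50–100 ns | ~50 GB/s |
| SSD | TBs | ~10–100 μs | ~7 GB/s |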

The key insight: an L1 access (~1 ns) is roughly 50–100x faster than a DRAM access (~50–100 ns). Caches exploit temporal locality (recently accessed data is likely to be accessed again) and spatial locality (data near a recent access is likely to be accessed soon).

When the CPU reads address A:

  1. Hit: A is in the cache → return data immediately
  2. Miss: A is not in the cache → fetch from next level, store in cache, then return
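
The hit/miss decision above can be sketched in a few lines of Python. This is a toy model, not how hardware is built: the class name `SimpleCache`, the dictionary standing in for the next level, and the unlimited capacity (no eviction) are all assumptions made for illustration.

```python
class SimpleCache:
    """Toy cache with unlimited capacity: models only the hit/miss decision."""

    def __init__(self, next_level):
        self.next_level = next_level  # dict standing in for the next memory level
        self.data = {}                # address -> cached value
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.data:
            # 1. Hit: the address is cached, return the data immediately.
            self.hits += 1
            return self.data[addr]
        # 2. Miss: fetch from the next level, store in the cache, then return.
        self.misses += 1
        value = self.next_level[addr]
        self.data[addr] = value
        return value


memory = {0x10: 42, 0x20: 7}   # hypothetical contents of the next level
cache = SimpleCache(memory)
first = cache.read(0x10)       # first access misses and fills the cache
second = cache.read(0x10)      # second access hits
print(first, second, cache.hits, cache.misses)  # prints: 42 42 1 1
```

The second read of the same address is a hit, which is temporal locality at work in its simplest form.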

The cache is organized into cache lines (typically 64 bytes). Even if you only need 4 bytes, the entire 64-byte line is fetched — this is how spatial locality is exploited.
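
To make the effect of line granularity concrete, the sketch below counts how many 64-byte lines are fetched for two access patterns over 1024 four-byte integers. The helper `count_line_fetches` is hypothetical, and it assumes a cache large enough that no line is ever evicted.

```python
LINE_SIZE = 64  # bytes per cache line (the typical value quoted above)


def count_line_fetches(addresses, line_size=LINE_SIZE):
    """Count line fetches for a sequence of byte addresses.

    Assumes an initially empty cache that never evicts, so each distinct
    line is fetched exactly once.
    """
    cached_lines = set()
    fetches = 0
    for addr in addresses:
        line = addr // line_size      # which line this byte address falls in
        if line not in cached_lines:
            cached_lines.add(line)
            fetches += 1
    return fetches


# 1024 four-byte integers stored back to back: 16 integers share each line.
sequential = [i * 4 for i in range(1024)]
# 1024 four-byte integers spaced 64 bytes apart: one integer per line.
strided = [i * 64 for i in range(1024)]

seq_fetches = count_line_fetches(sequential)
strided_fetches = count_line_fetches(strided)
print(seq_fetches, strided_fetches)  # prints: 64 1024
```

Both loops read the same number of integers, but the contiguous layout needs 16x fewer line fetches because each fetched line serves 16 consecutive reads.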

When multiple cores each have their own L1/L2 caches but share memory, a fundamental problem arises: if Core 0 writes to address A and Core 1 has a cached copy of A, Core 1’s copy is now stale.

This is the cache coherence problem, and solving it requires protocols like MSI and MESI (covered in Section 2).
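
The stale-copy scenario can be reproduced with a small simulation. This is a deliberately broken model, with assumptions chosen for illustration: each core's private cache is a plain dictionary, writes go straight through to shared memory, and no invalidation or update is ever sent to the other core. It does not implement MSI or MESI.

```python
memory = {"A": 0}   # shared main memory
core0_cache = {}    # Core 0's private cache
core1_cache = {}    # Core 1's private cache


def read(cache, addr):
    """Read through a private cache, filling it from memory on a miss."""
    if addr not in cache:
        cache[addr] = memory[addr]
    return cache[addr]


def write(cache, addr, value):
    """Write-through to memory, with NO notification to other caches."""
    cache[addr] = value
    memory[addr] = value


read(core1_cache, "A")             # Core 1 caches A = 0
write(core0_cache, "A", 1)         # Core 0 writes A = 1
stale = read(core1_cache, "A")     # Core 1 hits its own old copy
fresh = memory["A"]                # memory already holds the new value
print(stale, fresh)                # prints: 0 1
```

Core 1 keeps hitting in its own cache and never sees the new value, even though memory is up to date. A coherence protocol exists to close exactly this gap.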

  • Cache Organization — how address bits map to cache lines, sets, and ways
  • Cache Operations — read/write policies, eviction, and write-back vs write-through