
Modern Coherence Systems

The MSI/MESI protocols we studied use snooping: every cache watches a shared bus for coherence messages. This has a fundamental scalability problem — the bus becomes a bottleneck as core count increases.

Snooping (shared bus):

Core 0   Core 1   Core 2   Core 3
  |        |        |        |
  └────────┴────────┴────────┘
           Shared Bus
  • Every coherence message is broadcast to all caches
  • Each cache must check if the message is relevant
  • Bus bandwidth is shared: O(N) listeners per message
  • Works well for 2–8 cores
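
The broadcast cost described above can be made concrete with a toy model. This is a minimal sketch, not a real protocol implementation: the class `SnoopingBus`, its counter `snoop_lookups`, and the helper `lookups_for_one_write` are hypothetical names, and the model ignores MSI/MESI states, so every miss and every write is assumed to go out on the bus.

```python
class SnoopingBus:
    """Toy shared bus: every message is seen by every attached cache."""

    def __init__(self, num_cores):
        # One set of cached line addresses per core (hypothetical model).
        self.caches = [set() for _ in range(num_cores)]
        # Number of tag checks performed by listening caches.
        self.snoop_lookups = 0

    def read(self, core, addr):
        if addr in self.caches[core]:
            return  # local hit, no bus traffic
        # Read miss: the request is broadcast and all other caches check it.
        self.snoop_lookups += len(self.caches) - 1
        self.caches[core].add(addr)

    def write(self, core, addr):
        # The invalidation is broadcast; every other cache checks its tags
        # and drops the line if it holds a copy.
        for other, lines in enumerate(self.caches):
            if other != core:
                self.snoop_lookups += 1
                lines.discard(addr)
        self.caches[core].add(addr)


def lookups_for_one_write(num_cores):
    """Tag checks triggered by a single write to a line one other core holds."""
    bus = SnoopingBus(num_cores)
    bus.read(0, 0x40)
    before = bus.snoop_lookups
    bus.write(1, 0x40)
    return bus.snoop_lookups - before
```

In this model a single write costs N - 1 tag checks regardless of how many caches actually hold the line, which is the O(N) listener cost listed above.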

Directory-based (NoC):

Core 0   Core 1   Core 2   Core 3
  |        |        |        |
  └────────┴────────┴────────┘
         Directory / NoC
      (tracks who has what)
  • A directory records which caches hold each line
  • Coherence messages are sent only to relevant caches (point-to-point)
  • Uses a Network-on-Chip (NoC) instead of a shared bus
  • Scales to dozens or hundreds of cores
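
The directory idea can be sketched in the same spirit. The class `Directory` and its fields are hypothetical names for illustration, and the sketch assumes a simple sharer set per line with no coherence states and no network model.

```python
from collections import defaultdict


class Directory:
    """Toy directory: records which cores hold each cache line."""

    def __init__(self):
        # line address -> set of core ids that hold a copy
        self.sharers = defaultdict(set)
        # Point-to-point invalidations sent so far.
        self.messages_sent = 0

    def read(self, core, addr):
        # Record the reader as a sharer of the line.
        self.sharers[addr].add(core)

    def write(self, core, addr):
        # Invalidate only the caches the directory knows hold the line.
        targets = self.sharers[addr] - {core}
        self.messages_sent += len(targets)
        # The writer becomes the sole holder.
        self.sharers[addr] = {core}
        return sorted(targets)
```

For example, if cores 0 and 2 have read a line and core 1 then writes it, `write` returns `[0, 2]`: two messages are sent no matter how many cores the system has.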
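
| Aspect           | Snooping                       | Directory                        |
|------------------|--------------------------------|----------------------------------|
| Latency          | Lower (broadcast is immediate) | Higher (lookup directory first)  |
| Bandwidth        | O(N) per operation             | O(1) per operation               |
| Scalability      | Poor (8–16 cores max)          | Good (100s of cores)             |
| Complexity       | Simpler                        | More complex                     |
| Storage overhead | None                           | Directory entries per cache line |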
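
To get a feel for the storage overhead row, here is a rough estimate under stated assumptions: a full bit-vector directory (one presence bit per core for every line), 64-byte cache lines, and 2 state bits per entry. None of these parameters come from a specific processor, and the function name `full_map_overhead` is hypothetical.

```python
def full_map_overhead(num_cores, line_bytes=64, state_bits=2):
    """Directory bits per line as a fraction of the data bits in that line.

    Assumes a full bit-vector entry: one presence bit per core plus a few
    state bits. Real designs often use more compact formats.
    """
    entry_bits = num_cores + state_bits
    return entry_bits / (line_bytes * 8)


for cores in (8, 64, 256):
    print(f"{cores:>3} cores: {full_map_overhead(cores):.1%} overhead")
```

Under these assumptions the entry grows linearly with core count, which is one reason large systems tend to use coarser or limited-pointer sharer encodings rather than a full bit vector.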
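
| Processor            | Cores  | Protocol                                    |
|----------------------|--------|---------------------------------------------|
| Intel Core (desktop) | 4–24   | Ring bus with snoop filter                  |
| Intel Xeon (server)  | 28–60  | Mesh interconnect, directory-based          |
| AMD EPYC             | 64–128 | Infinity Fabric, directory-based            |
| Apple M-series       | 8–24   | Custom interconnect                         |
| ARM Neoverse         | 64–128 | CMN (Coherent Mesh Network), directory      |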

Modern designs often combine the two approaches: snooping within a cluster of cores, and a directory-based protocol between clusters.
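
A hybrid scheme of this kind might be modeled as follows. This is an illustrative sketch only: the class `HybridCoherence`, the fixed cluster size, and the two counters are assumptions made for the example, not a description of any shipping design.

```python
class HybridCoherence:
    """Toy hybrid: snoop inside a cluster, directory between clusters."""

    def __init__(self, cores_per_cluster):
        self.cores_per_cluster = cores_per_cluster
        # Directory level: line address -> set of cluster ids holding it.
        self.cluster_sharers = {}
        self.inter_cluster_msgs = 0    # point-to-point directory messages
        self.intra_cluster_snoops = 0  # tag checks from local broadcasts

    def cluster_of(self, core):
        return core // self.cores_per_cluster

    def read(self, core, addr):
        # Track sharing at cluster granularity only.
        self.cluster_sharers.setdefault(addr, set()).add(self.cluster_of(core))

    def write(self, core, addr):
        home = self.cluster_of(core)
        sharers = self.cluster_sharers.get(addr, set())
        # Broadcast within the writer's own cluster (snooping).
        self.intra_cluster_snoops += self.cores_per_cluster - 1
        # One directory message per remote sharing cluster, which then
        # broadcasts the invalidation to its own cores.
        for _cluster in sharers - {home}:
            self.inter_cluster_msgs += 1
            self.intra_cluster_snoops += self.cores_per_cluster
        self.cluster_sharers[addr] = {home}
```

With 4 cores per cluster, if cores 0 and 5 have read a line and core 1 writes it, the model records one inter-cluster message and seven snoop lookups; clusters that never touched the line see no traffic at all.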