TSO & Relaxed Models

TSO relaxes exactly one ordering: Store → Load. A store can be delayed (sitting in the store buffer) while subsequent loads execute.

Each core has a store buffer — a FIFO queue of pending writes:

  1. When the CPU executes a store, it goes into the store buffer (fast, no memory access)
  2. The store buffer drains to the cache/memory asynchronously
  3. Loads check the store buffer first (store-buffer forwarding), then the cache

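This means a core sees its own stores immediately, but other cores see them only after the buffer drains. To those other cores, a later load can appear to have executed before the earlier store: that is the Store→Load reorder.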
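
The three steps above can be sketched as a toy model. This is an illustrative simulation, not how any real core is implemented: the names Memory, Core, store, drain_one and load are made up for this sketch, and unwritten addresses are assumed to read as 0.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Shared memory for the toy model; addresses never written read as 0 (assumption).
using Memory = std::map<std::string, int>;

// One core with a private FIFO store buffer (hypothetical model, not cycle-accurate).
struct Core {
    Memory& mem;                                     // memory shared by all cores
    std::deque<std::pair<std::string, int>> buffer;  // pending stores, oldest first

    explicit Core(Memory& shared) : mem(shared) {}

    // Step 1: a store only enters the local buffer.
    void store(const std::string& addr, int value) {
        buffer.emplace_back(addr, value);
    }

    // Step 2: the oldest pending store becomes visible to every core.
    void drain_one() {
        if (buffer.empty()) return;
        mem[buffer.front().first] = buffer.front().second;
        buffer.pop_front();
    }

    // Step 3: a load checks the buffer (newest entry first), then shared memory.
    int load(const std::string& addr) const {
        for (auto it = buffer.rbegin(); it != buffer.rend(); ++it) {
            if (it->first == addr) return it->second;  // store-buffer forwarding
        }
        auto found = mem.find(addr);
        return found == mem.end() ? 0 : found->second;
    }
};
```

With two cores c0 and c1 built on the same Memory, c0.store("x", 1) makes c0.load("x") return 1 through forwarding, while c1.load("x") keeps returning 0 until c0.drain_one() runs.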

Preserved      Reason
Load → Load    Loads are not reordered with each other
Load → Store   A load completes before a subsequent store
Store → Store  Store buffer is FIFO, so stores drain in order

Relaxed        Reason
Store → Load   Store sits in buffer; subsequent load executes from cache

x86 processors implement TSO (the x86-TSO model, formalized by academic researchers from Intel's and AMD's documented behavior). This is why most concurrent programs “just work” on x86: TSO is close enough to SC that only subtle patterns break.

The one dangerous pattern:

Core 0:           Core 1:
  x = 1             y = 1
  r0 = y            r1 = x
Under TSO (x and y initially 0): r0=0 AND r1=0 IS possible!

Both stores can be in their respective store buffers when the loads execute.

ARM and RISC-V allow all four reorderings by default, for accesses to different addresses with no dependency between them:

Reordering     ARM      x86
Store → Load   Allowed  Allowed
Store → Store  Allowed  Forbidden
Load → Store   Allowed  Forbidden
Load → Load    Allowed  Forbidden

ARM’s big.LITTLE designs and RISC-V’s diverse implementations benefit from maximum flexibility. The hardware can:

  • Reorder loads to hide memory latency
  • Coalesce or reorder stores for efficiency
  • Speculate more aggressively

The tradeoff: programmers must request ordering explicitly, with fence instructions or ordered (acquire/release) accesses, whenever it matters.

For comparison, the reorderings allowed under sequential consistency:

Sequential Consistency (theoretical ideal)

No reordering allowed. All operations appear to execute in program order. Simplest to reason about, but most restrictive for hardware optimization.

Reordering     Allowed?
Load → Load    No (preserved)
Load → Store   No (preserved)
Store → Store  No (preserved)
Store → Load   No (preserved)

Example — Store Buffer (Store→Load reorder):
Core 0:           Core 1:
  x = 1             y = 1
  r0 = y            r1 = x

SC:      r0=0, r1=0 is IMPOSSIBLE
TSO:     r0=0, r1=0 is POSSIBLE (both stores buffered)
Relaxed: r0=0, r1=0 is POSSIBLE
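
The SC and TSO verdicts above can be checked mechanically by enumerating every interleaving of the litmus test under a simplified model. The assumptions of this sketch: x and y start at 0, SC is modelled as stores reaching memory immediately, TSO as a one-entry store buffer per core that may drain at any later point, and the fully relaxed case is not modelled. State, explore and outcomes are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Snapshot of the two-core litmus test (toy model, assumed for illustration).
struct State {
    int mem[2] = {0, 0};                // mem[0] is x, mem[1] is y
    int pc[2] = {0, 0};                 // per core: 0 = store next, 1 = load next, 2 = done
    bool buffered[2] = {false, false};  // core i's store is still in its store buffer
    int reg[2] = {-1, -1};              // r0 and r1
};

using Outcomes = std::set<std::pair<int, int>>;  // set of (r0, r1) results

// Depth-first search over every possible next step.
void explore(State s, bool tso, Outcomes& out) {
    if (s.pc[0] == 2 && s.pc[1] == 2) {
        out.emplace(s.reg[0], s.reg[1]);  // both loads done; record the registers
        return;
    }
    for (int core = 0; core < 2; ++core) {
        int mine = core, other = 1 - core;  // core 0 stores x and loads y; core 1 the reverse
        if (s.pc[core] == 0) {              // execute the store
            State n = s;
            if (tso) n.buffered[core] = true;  // TSO: park it in the store buffer
            else n.mem[mine] = 1;              // SC: visible immediately
            n.pc[core] = 1;
            explore(n, tso, out);
        } else if (s.pc[core] == 1) {       // execute the load of the other variable
            State n = s;
            n.reg[core] = n.mem[other];     // different address, so no forwarding applies
            n.pc[core] = 2;
            explore(n, tso, out);
        }
        if (s.buffered[core]) {             // TSO only: drain the buffered store
            State n = s;
            n.mem[mine] = 1;
            n.buffered[core] = false;
            explore(n, tso, out);
        }
    }
}

// All reachable (r0, r1) pairs under SC (tso = false) or TSO (tso = true).
Outcomes outcomes(bool tso) {
    Outcomes out;
    explore(State{}, tso, out);
    return out;
}
```

Under these assumptions, outcomes(false) contains (0,1), (1,0) and (1,1), and outcomes(true) contains the same three plus (0,0), the result that only store buffering makes reachable.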

To enforce ordering on relaxed architectures, use fence (barrier) instructions:

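Architecture  Full fence    Store fence  Load fence
x86           MFENCE        SFENCE       LFENCE
ARM           DMB ISH       DMB ISHST    DMB ISHLD
RISC-V        fence rw, rw  fence w, w   fence r, r

On x86, ordinary loads and stores already have the ordering that SFENCE and LFENCE provide, so those two matter mainly for non-temporal stores and write-combining memory. MFENCE is the one that restores Store → Load ordering.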
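
In portable C++ a full fence is written as std::atomic_thread_fence(std::memory_order_seq_cst), which compilers typically lower to one of the full-fence instructions above (the exact instruction chosen is up to the compiler). A minimal sketch of the store-buffer pattern with such a fence on each side follows; core0, core1 and the globals are illustrative names, and each function is meant to run on its own thread.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0}, y{0};  // shared variables, initially 0
int r0 = -1, r1 = -1;         // results; read them only after both threads have finished

// Meant to run on the first thread.
void core0() {
    x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence between store and load
    r0 = y.load(std::memory_order_relaxed);
}

// Meant to run concurrently on a second thread.
void core1() {
    y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence between store and load
    r1 = x.load(std::memory_order_relaxed);
}
```

With both fences in place the C++ memory model rules out r0 == 0 together with r1 == 0. Removing either fence makes that outcome possible again, on x86 as well as on ARM and RISC-V. To try it, run core0 and core1 on two std::thread objects and join both before reading r0 and r1.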

Modern programming models use acquire and release rather than raw fences:

  • Acquire (on load): no subsequent memory operation can be reordered before this load
  • Release (on store): no preceding memory operation can be reordered after this store
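// C++ example: flag is a std::atomic<int>, initially 0
// Producer thread:
flag.store(1, std::memory_order_release); // writes made before this store are published with it
// Consumer thread:
while (flag.load(std::memory_order_acquire) == 0) {} // once this reads 1, later reads see the producer's prior writes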

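This maps directly to hardware:

  • x86: acquire loads and release stores need no extra instructions (TSO provides both orderings naturally)
  • ARM (AArch64): acquire uses LDAR, release uses STLR (dedicated load-acquire and store-release instructions)
  • RISC-V: acquire/release are built from fences around plain accesses (fence r, rw after an acquire load, fence rw, w before a release store) or from the .aq/.rl bits on atomic instructions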
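
A slightly fuller sketch of the flag pattern, including the ordinary data the flag protects, is shown below. The names payload, ready, producer and consumer are hypothetical, and the value 42 is arbitrary. The source is the same on every architecture; only the instructions the compiler emits for the release store and acquire load differ, as listed above.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;            // ordinary, non-atomic data
std::atomic<int> ready{0};  // flag that publishes the payload

// Producer thread: write the data first, then release-store the flag.
void producer() {
    payload = 42;
    ready.store(1, std::memory_order_release);  // the payload write cannot move after this store
}

// Consumer thread: acquire-load the flag until it is set, then read the data.
int consumer() {
    while (ready.load(std::memory_order_acquire) == 0) {
        // spin until the producer publishes
    }
    return payload;  // the acquire load that read 1 makes the payload write visible here
}
```

Replacing both orderings with std::memory_order_relaxed would make the read of payload a data race, because nothing would then order it after the producer's write.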