Stable Learning

This is a collection of learning materials maintained by the lab. Each topic is structured as a progressive reading path with interactive visualizations. Pick a topic below and start from the beginning, or jump to any section.

Reinforcement Learning

From MDPs to modern policy optimization for LLM alignment.

Prerequisites: calculus, basic probability, familiarity with neural networks.

1. Action Chain & Rewards: MDPs, states, actions, rewards, and the discount factor.

2. Policy Gradient: REINFORCE, the score function estimator, and variance reduction.

3. PPO: Proximal Policy Optimization, clipped surrogate objectives, and GAE.

4. GRPO: Group Relative Policy Optimization, which removes the critic for LLM alignment.

Cache Coherence & Consistency

How multicore CPUs keep caches correct, from hardware protocols to memory fences.

Prerequisites: basic computer architecture (registers and memory). Familiarity with assembly helps but is not required.

1. Cache Fundamentals: Memory hierarchy, cache organization (lines, sets, ways), and read/write policies.

2. Coherence Protocols: The problem of stale data and how snooping protocols (MSI, MESI, MOESI) solve it.

3. Consistency Models: Sequential consistency, TSO (x86), relaxed models (ARM, RISC-V), and why ordering matters.

4. Modern Systems: Directory-based protocols, NUMA, memory fences, and acquire-release semantics.