The Clipped Surrogate

How Clipping Works

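The PPO objective takes the minimum of the unclipped and clipped surrogate terms, where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$ is the probability ratio between the new and old policies and $\hat{A}_t$ is the advantage estimate: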

$$
L^{CLIP}(\theta) = \mathbb{E}_t \left[ \min \left( r_t(\theta) \hat{A}_t, \; \text{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon) \hat{A}_t \right) \right]
$$
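As a rough illustration of the objective above, the following sketch evaluates it on a small batch in plain Python. The function names `clip` and `clipped_surrogate`, and the sample ratios and advantages, are hypothetical and chosen only for this example. The expectation is approximated by a simple average over the batch.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def clipped_surrogate(ratios, advantages, eps=0.2):
    """Average of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the batch."""
    terms = []
    for r, adv in zip(ratios, advantages):
        unclipped = r * adv
        clipped = clip(r, 1.0 - eps, 1.0 + eps) * adv
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)


# Hypothetical batch: probability ratios and advantage estimates.
ratios = [0.5, 1.0, 1.5]
advantages = [1.0, -1.0, 2.0]
objective = clipped_surrogate(ratios, advantages)
```

In a real training loop this quantity would be computed on tensors and maximized by gradient ascent (or its negative minimized). The scalar version here is only meant to make the arithmetic easy to follow.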

Let’s break this down for both signs of the advantage $\hat{A}_t$:

When the advantage is positive ($\hat{A}_t > 0$), we want to increase $\pi_\theta(a_t \mid s_t)$, which increases $r_t$. But the clip caps the benefit at $r_t = 1 + \epsilon$, so the objective stops growing beyond that point:

$$
L_t = \min(r_t \hat{A}_t, (1+\epsilon) \hat{A}_t) = \begin{cases} r_t \hat{A}_t & \text{if } r_t \leq 1+\epsilon \\ (1+\epsilon)\hat{A}_t & \text{if } r_t > 1+\epsilon \end{cases}
$$
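To make the positive-advantage case concrete, this sketch sweeps a few ratios and records the per-sample term. The helper `ppo_term` and the values $\hat{A}_t = 2$, $\epsilon = 0.2$ are assumptions picked for illustration.

```python
def ppo_term(r, adv, eps=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
    return min(r * adv, clipped_r * adv)


adv = 2.0  # hypothetical positive advantage
eps = 0.2

# (ratio, term) pairs; the term grows with r until r reaches 1 + eps.
terms = [(r, ppo_term(r, adv, eps)) for r in (0.5, 1.0, 1.2, 1.5, 3.0)]

# The plateau value (1 + eps) * A that the term cannot exceed.
cap = (1.0 + eps) * adv
```

With these numbers the term rises linearly with $r_t$ and then stays at the cap for every ratio at or above $1 + \epsilon$, so pushing the ratio higher earns nothing extra.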

When the advantage is negative ($\hat{A}_t < 0$), we want to decrease $\pi_\theta(a_t \mid s_t)$, which decreases $r_t$. The clip removes any further gain once $r_t$ falls below $1 - \epsilon$, which prevents over-correction:

$$
L_t = \min(r_t \hat{A}_t, (1-\epsilon) \hat{A}_t) = \begin{cases} r_t \hat{A}_t & \text{if } r_t \geq 1-\epsilon \\ (1-\epsilon)\hat{A}_t & \text{if } r_t < 1-\epsilon \end{cases}
$$
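One way to see why the clip prevents over-correction is to look at the slope of the per-sample term with respect to $r_t$. The sketch below estimates that slope with a central finite difference; the helpers `ppo_term` and `slope`, the step size `h`, and the value $\hat{A}_t = -1$ are assumptions made for this example, not part of any particular library.

```python
def ppo_term(r, adv, eps=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
    return min(r * adv, clipped_r * adv)


def slope(r, adv, eps=0.2, h=1e-6):
    """Central finite-difference estimate of d(term)/dr at r."""
    return (ppo_term(r + h, adv, eps) - ppo_term(r - h, adv, eps)) / (2 * h)


adv = -1.0  # hypothetical negative advantage

# Slope at ratios below, inside, and above the interval [1 - eps, 1 + eps].
slopes = {r: slope(r, adv) for r in (0.5, 0.7, 0.9, 1.1, 1.5)}
```

Under these assumptions the slope is zero for ratios below $1 - \epsilon$, so there is no incentive to shrink the probability further. It equals $\hat{A}_t$ everywhere else, including above $1 + \epsilon$, where a move in the wrong direction is still fully penalized.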

The value of $\epsilon$ sets the width of the clipping region $[1-\epsilon, 1+\epsilon]$, and the sign of the advantage determines which side of it is active:

$\epsilon$	Effect
0.1	Very conservative — slow but stable
0.2	Standard (OpenAI default)
0.3	More aggressive — faster but riskier
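To get a feel for the trade-off in the table, the sketch below counts how many ratios in a made-up batch fall outside $[1-\epsilon, 1+\epsilon]$ for each setting. The function `fraction_outside` and the ratio values are hypothetical. Note also that a ratio outside the interval is only actually clipped when the advantage has the matching sign, so this is an upper bound on the clipped share.

```python
def fraction_outside(ratios, eps):
    """Share of ratios lying outside the interval [1 - eps, 1 + eps]."""
    outside = [r for r in ratios if r < 1.0 - eps or r > 1.0 + eps]
    return len(outside) / len(ratios)


# Hypothetical probability ratios after a few gradient steps.
ratios = [0.75, 0.85, 0.95, 1.0, 1.05, 1.15, 1.25, 1.35]

# One entry per epsilon from the table.
fractions = {eps: fraction_outside(ratios, eps) for eps in (0.1, 0.2, 0.3)}
```

For this batch a smaller $\epsilon$ leaves more samples outside the interval, which means fewer samples contribute gradient and the policy moves more cautiously.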

In practice, $\epsilon = 0.2$ works well across many tasks, including RLHF for language models.