States and Actions
State Space
A state captures everything the agent needs to make a decision. In practice, states can be:
- Discrete — e.g., grid positions in a maze, board configurations in a game
- Continuous — e.g., joint angles and velocities of a robot, pixel values in an image
The key requirement is the Markov property: the state must contain enough information that, given the current state and action, what happens next does not depend on the earlier history.
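A minimal sketch of the two kinds of state and of why the Markov property constrains what goes into the state. The names `GridState`, `JointState`, and `step_joint` are hypothetical, and the dynamics assume unit inertia with no friction or gravity, purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridState:
    """Discrete state: a cell in a grid maze."""
    row: int
    col: int


@dataclass(frozen=True)
class JointState:
    """Continuous state: angle and angular velocity of a single joint."""
    angle: float     # radians
    velocity: float  # radians per second


def step_joint(state: JointState, torque: float, dt: float = 0.1) -> JointState:
    """Compute the next state from the current state and action only.

    Illustrative dynamics (assumed): unit inertia, no friction, no gravity.
    """
    velocity = state.velocity + torque * dt
    angle = state.angle + velocity * dt
    return JointState(angle, velocity)


s0 = JointState(angle=0.0, velocity=1.0)
s1 = step_joint(s0, torque=0.0)
```

In this sketch the velocity has to be part of the state: with the angle alone, the next angle could not be predicted without looking at earlier timesteps, so the state would not be Markov.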
Action Space
The action space defines what the agent can do:
- Discrete actions — e.g., move left/right/up/down, select a token from vocabulary
- Continuous actions — e.g., torque applied to a joint, steering angle
For language models, the action space is the vocabulary — at each step, the model selects a token.
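As a rough illustration, the three kinds of action space mentioned above could be represented as follows. The vocabulary, the torque limits, and the helper names are assumptions made for this sketch, and uniform sampling stands in for a real policy.

```python
import random

# Discrete action space: a finite set of choices.
GRID_ACTIONS = ["left", "right", "up", "down"]

# For a language model, the discrete actions are token ids into a vocabulary.
VOCAB = ["<eos>", "the", "cat", "sat"]  # toy vocabulary, illustration only

# Continuous action space: any real value within bounds.
TORQUE_MIN, TORQUE_MAX = -2.0, 2.0  # hypothetical limits


def sample_discrete(actions, rng):
    """Pick one action uniformly at random from a finite set."""
    return rng.choice(actions)


def sample_torque(rng):
    """Draw a torque uniformly from the allowed interval."""
    return rng.uniform(TORQUE_MIN, TORQUE_MAX)


rng = random.Random(0)
move = sample_discrete(GRID_ACTIONS, rng)
token_id = rng.randrange(len(VOCAB))
torque = sample_torque(rng)
```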
Policy
The policy $\pi_\theta$ maps states to a distribution over actions. It’s parameterized by $\theta$ (e.g., the weights of a neural network).
- Deterministic policy: $a = \pi_\theta(s)$
- Stochastic policy: $a \sim \pi_\theta(\cdot \mid s)$
In deep RL, the policy is typically a neural network. For language models, the policy is the model — it outputs a probability distribution over the next token given the context.
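The sketch below shows one way a parameterized policy over discrete actions might look, using a toy linear model with a softmax in place of a neural network. The functions `policy_probs`, `act_deterministic`, and `act_stochastic` are hypothetical names, and taking the argmax of the distribution is only one possible way to obtain a deterministic policy.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(theta, state):
    """Toy linear policy: theta holds one weight vector per action."""
    logits = [sum(w * x for w, x in zip(weights, state)) for weights in theta]
    return softmax(logits)


def act_deterministic(theta, state):
    """Deterministic choice: always return the most probable action."""
    probs = policy_probs(theta, state)
    return max(range(len(probs)), key=probs.__getitem__)


def act_stochastic(theta, state, rng):
    """Stochastic choice: sample an action according to its probability."""
    probs = policy_probs(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 actions, 2 state features
state = [2.0, 0.0]
probs = policy_probs(theta, state)
greedy = act_deterministic(theta, state)
sampled = act_stochastic(theta, state, random.Random(0))
```

For a language model the structure is the same, with the context as the state, the vocabulary as the action set, and the network's output logits in place of the linear scores.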
The Agent-Environment Loop
At each timestep $t$:
- Agent observes state $s_t$
- Agent samples action $a_t$ from its policy
- Environment transitions to state $s_{t+1}$
- Agent receives reward $r_t$
This loop repeats until a terminal state is reached (an episodic task) or continues indefinitely (a continuing task).
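The loop can be sketched in a few lines for the episodic case. The `Corridor` environment, its `reset`/`step` interface, the reward of 1.0 at the goal, and the `max_steps` safety cap are all assumptions made for this example rather than part of any particular library.

```python
import random


class Corridor:
    """Toy episodic environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action (-1 or +1); return (next_state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def random_policy(state, rng):
    """Placeholder stochastic policy that ignores the state."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=1000):
    """Run one episode; return (total_reward, number_of_steps)."""
    state = env.reset()                         # agent observes the state
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state, rng)             # agent samples an action
        state, reward, done = env.step(action)  # environment transitions
        total_reward += reward                  # agent receives the reward
        if done:                                # terminal state ends the episode
            return total_reward, t + 1
    return total_reward, max_steps


total_reward, steps = run_episode(Corridor(), random_policy, random.Random(0))
```

A continuing task would drop the `done` check and run without a terminal state; the cap on steps here only guards against an unlucky random walk.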