A Scalar Reward refers to a numerical value assigned to an action or outcome to indicate its desirability or quality. In the context of reinforcement learning, a scalar reward is given to an agent after it takes an action in a particular state, reflecting the benefit (or cost) of that action. The goal of the agent is typically to maximize the cumulative reward over time.
Here’s a breakdown:
- Scalar: This term indicates that the reward is a single number (as opposed to a vector or matrix).
- Reward: In reinforcement learning, after an agent takes an action in a certain state, it receives a reward from the environment. This reward provides feedback on the quality of the action taken.
For example, in a game where an agent is trying to collect coins, picking up a coin might result in a scalar reward of +1. Conversely, if the agent hits an obstacle, it might receive a reward of -1.
The scalar reward helps the agent learn which actions are beneficial and which are not. Over time, by trying different actions and observing the rewards, the agent can learn a policy that maximizes its expected cumulative reward.
« Back to Glossary Index