What is Reinforcement Learning?

“Reinforcement Learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.” — Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow - Aurélien Géron.

👨‍💻 Reinforcement learning is a type of machine learning in which the computer learns by taking actions in an environment and receiving rewards or penalties for those actions.

Think of it like a child learning how to play a video game. The child tries different actions in the game, such as jumping or shooting, and receives points or loses lives as a result. Over time, the child learns which actions lead to success and which actions lead to failure, and adjusts their behavior accordingly.

In the same way, a reinforcement learning algorithm takes actions in an environment (often a simulated one) and receives rewards or penalties based on the outcomes of those actions. The algorithm uses this feedback to learn which actions lead to the highest rewards and adjusts its behavior accordingly.
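To make this trial-and-error loop concrete, here is a minimal sketch in Python. It assumes the simplest possible setting: three actions, each of which pays a reward of 1 with some fixed probability. The function name `run_bandit`, the reward probabilities, and the exploration rate `epsilon` are all made up for illustration. The agent mostly picks the action it currently believes is best, but occasionally tries a random one so it can discover better options.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by trial and error with an epsilon-greedy rule."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    values = [0.0] * n_actions   # running estimate of each action's average reward
    counts = [0] * n_actions     # how often each action has been tried

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment answers with a reward of 1 (success) or 0 (failure).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental mean: nudge the estimate toward the observed reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts

# Hypothetical success probabilities for three actions.
values, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action, [round(v, 2) for v in values])
```

Like the child in the video game, the agent is never told which action is best. It only sees the rewards that follow its own choices, and its estimates gradually come to favor the action that pays off most often.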

🗝️ Reinforcement learning is used in many real-world applications, such as game playing, robotics, and autonomous vehicles. The key to success with reinforcement learning is to carefully design the reward function, which determines what the algorithm should be trying to optimize, and to make sure the algorithm has enough time to learn and explore the environment.
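A small sketch can show why the reward function matters so much. It compares two hypothetical reward designs for the same task, reaching a goal, and adds up the total reward an agent would collect on a short path and on a long one. The helper names, the path lengths, and the step cost of 0.1 are assumptions chosen for illustration. So is the optional discount factor `gamma`, which is left at 1 here so rewards are simply summed.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of rewards, each weighted by gamma ** t for time step t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def goal_only(n_steps):
    # Reward design A: +1 on reaching the goal, 0 on every other step.
    return [0.0] * (n_steps - 1) + [1.0]

def goal_with_step_cost(n_steps, step_cost=0.1):
    # Reward design B: same goal reward, minus a small cost on every step.
    return [-step_cost] * (n_steps - 1) + [1.0 - step_cost]

short_path, long_path = 3, 10  # hypothetical episode lengths

# Design A gives both paths the same total, so speed is not rewarded.
a_short = discounted_return(goal_only(short_path))
a_long = discounted_return(goal_only(long_path))

# Design B makes the shorter path strictly better.
b_short = discounted_return(goal_with_step_cost(short_path))
b_long = discounted_return(goal_with_step_cost(long_path))

print("goal only:     ", a_short, a_long)
print("with step cost:", round(b_short, 2), round(b_long, 2))
```

Under design A the agent has no reason to prefer the fast solution, because both paths earn the same total. Under design B it does. Both designs say "reach the goal", but they ask the algorithm to optimize different things.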

“Reinforcement learning is a subfield of machine learning where the machine “lives” in an environment and is capable of perceiving the state of that environment as a vector of features. The machine can execute actions in every state. Different actions bring different rewards and could also move the machine to another state of the environment.” — The Hundred-Page Machine Learning Book - Andriy Burkov.
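The ideas in this quote (states, actions, rewards, and moves between states) can be sketched with a toy example. The environment below is a made-up five-cell corridor whose state is a one-feature vector holding the agent's position. The learning rule is tabular Q-learning, one common algorithm picked here purely for illustration. All names and parameter values are assumptions, not taken from either book.

```python
import random
from collections import defaultdict

N_CELLS = 5          # hypothetical corridor: cells 0..4, goal in the last cell
ACTIONS = (-1, +1)   # move left or move right

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    (position,) = state                      # the state is a 1-feature vector
    position = min(max(position + action, 0), N_CELLS - 1)
    done = position == N_CELLS - 1
    reward = 1.0 if done else -0.1           # goal pays +1, every other move costs 0.1
    return (position,), reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                   # q[(state, action)] -> estimated value
    for _ in range(episodes):
        state, done = (0,), False
        while not done:
            # Mostly act greedily, sometimes explore a random action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
# The learned policy: the preferred action in each non-goal cell.
policy = {pos: max(ACTIONS, key=lambda a: q[((pos,), a)]) for pos in range(N_CELLS - 1)}
print(policy)
```

The resulting `policy` dictionary is a policy in the sense used throughout this post: it maps each situation (a cell in the corridor) to the action the agent has learned to prefer there.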