
Reinforcement Learning Simplified

  • Writer: Vyas Anirudh
  • Apr 23, 2023
  • 3 min read

I am going to give my best attempt at explaining the basics (theory & code) of Reinforcement Learning (RL) in very simple terms. This is a complex topic, so be ready to get your brain muscles working 💪


I am going to use the game of Chess to illustrate the core concepts.




Imagine that you just bought a new chess board and are very excited to challenge your friend, but you have no clue how to play the game. However, your friend is kind and patient and is willing to teach you the rules as you move forward in the game.



Core Concepts 🌟

Now, let me add some technical terms here so that you can formulate this game into a reinforcement learning problem.

  • The chess game, the chess board and your friend together make up the environment

  • The person playing the game, in this case You, is called the agent

  • When you make a move, i.e., move a piece on the chess board, that is called an action

  • Your brain (which is helping you decide what action to take) is called the policy

  • Once you finish the move, your friend will offer some kind of advice/feedback on the way you played. However, this friend of yours only knows how to communicate in integers. So, they will say “1” when they think your move is a good one, “-1” when they think your move is a bad one, and “0” if it is a neutral move. These numbers “1”, “-1” and “0” are known as the reward

  • Let’s say you take a picture of the chess board after a certain number of turns have taken place. That picture, which is basically a description of the game at that moment, represents the current state of the game

  • An episode is the entirety of the game from start to finish, or, in certain video games, the stretch of play until the character dies or reaches a certain goal

From here onwards I am only going to use the terms mentioned above in bold font. Hopefully you have now understood the core concepts of Reinforcement Learning (RL). If not, make sure to read the above points once again, as it is crucial to understand these terms. Any problem that you think can be solved using RL will first need to be formulated as an RL problem; only then will you be able to use the right algorithms and set up your solution accurately.
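To make these terms concrete, here is a tiny sketch of how they might map to code. ChessEnvironment and RandomPolicy are hypothetical names I am making up purely for illustration; they are not part of any real chess library.

import random

class ChessEnvironment:
    """The environment: the board, the game rules, and the friend giving feedback."""
    def reset(self):
        return "initial board position"  # the starting state (our "picture")

    def step(self, action):
        new_state = "board after " + action  # the state after the move
        reward = random.choice([-1, 0, 1])   # the friend's feedback: -1, 0 or +1
        done = False                         # becomes True once the episode ends
        return new_state, reward, done

class RandomPolicy:
    """The policy: the agent's "brain" that picks an action given a state."""
    def choose_action(self, state):
        return random.choice(["e4", "d4", "Nf3"])  # an action: a move to play

env = ChessEnvironment()
policy = RandomPolicy()
state = env.reset()
action = policy.choose_action(state)    # the agent acts...
state, reward, done = env.step(action)  # ...and the environment responds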


A sample run 💡


Below I am going to describe a few steps that simulate a few rounds of the game, to make sure you “reinforce” (pun intended) your own understanding of the concepts.


  1. Initial state of the game (picture)

  2. Agent moves Queen to d3 (a square on the board)

  3. Environment provides reward -1 and gives the new State

  4. Agent moves Queen back to d1

  5. Environment provides reward 0 and gives the new State

  6. Agent moves pawn to e4

  7. Environment provides reward +1 and gives the new State; now the Agent knows that this is a good move because it received a reward of +1

  8. Agent moves knight to c3

  9. Environment provides reward +1 and gives the new State

  10. And so on…


As the agent keeps playing, it starts to learn the rules based on the feedback provided by the environment. This training loop is repeated until the episode ends. There are various RL algorithms that can be used to update the agent’s policy. If this piques your interest, then try out this really amazing course!
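That loop can be written down in just a few lines. Here is a minimal sketch, assuming hypothetical env and policy objects with the same reset/step/choose_action interface as the sketch above, plus an update method where a real RL algorithm would do its learning:

def run_episode(env, policy):
    state = env.reset()                       # 1. get the initial state
    done = False
    total_reward = 0
    while not done:                           # repeat until the episode ends
        action = policy.choose_action(state)  # the agent picks a move
        next_state, reward, done = env.step(action)  # the environment answers
        total_reward += reward                       # with a reward and new state
        policy.update(state, action, reward, next_state)  # learning would happen here
        state = next_state
    return total_reward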



Python Implementation 👩‍💻🧑‍💻

  • This is based on the example provided here



import gym

# Create the environment (this snippet follows the classic gym API,
# i.e. gym versions before 0.26)
env = gym.make("LunarLander-v2")
observation = env.reset()
for _ in range(200):
    action = env.action_space.sample()  # sample a random action
    observation, reward, done, info = env.step(action)  # execute the action
    if done:
        print("Reward:", reward)
        observation = env.reset()
env.close()
  • This is a very basic implementation where the agent takes random actions without any learning.

  • The high-level framework, however, remains the same; algorithms such as Q-learning, Deep Q-learning, PPO and more would be applied here to teach the agent how to take the right actions and reach the desired goal. A sketch of one such algorithm follows below.
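As an illustration of what that next step could look like (my own sketch, not taken from the example linked above), here is tabular Q-learning on FrozenLake-v1, a small discrete environment bundled with gym. The hyperparameter values are arbitrary choices, and the code assumes the same classic gym API (four return values from step, i.e. gym versions before 0.26) as the snippet above:

import gym
import numpy as np

env = gym.make("FrozenLake-v1")
# One row per state, one column per action; starts at zero and is learned
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit the table, occasionally explore
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, done, info = env.step(action)
        # Q-learning update: nudge Q(s, a) toward reward + discounted best future value
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state
env.close()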

