Motivation: This project is an attempt to get hands-on experience with the workings of Q-learning + Planner and, at the same time, create something novel. We define a new domain called the "Bank World" domain. In this domain, a bank sits at the center of a 10x10 grid, and multiple gems and agents are scattered around the grid. The objective is to collect the gems and bring them to the bank as efficiently as possible.
The Planner uses Manhattan distance to calculate the distance from an agent to each gem.
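As a rough illustration of the distance calculation, the sketch below computes the Manhattan distance between two grid cells and ranks gems by their distance from an agent. The names `manhattan_distance` and `gem_distances`, the `(row, col)` cell representation, and the nearest-first ordering are assumptions made for this example, not the project's actual code.

```python
from typing import List, Tuple

# A grid cell as (row, col); this representation is an assumption for illustration.
Cell = Tuple[int, int]


def manhattan_distance(a: Cell, b: Cell) -> int:
    """Return |row difference| + |column difference| between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def gem_distances(agent: Cell, gems: List[Cell]) -> List[Tuple[Cell, int]]:
    """Pair each gem with its Manhattan distance from the agent, nearest first.

    Hypothetical helper: sorting by distance is one plausible way a planner
    could use these values, not necessarily what this project does.
    """
    pairs = [(gem, manhattan_distance(agent, gem)) for gem in gems]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    agent = (2, 2)
    gems = [(5, 5), (2, 3), (0, 0)]
    for gem, dist in gem_distances(agent, gems):
        print(f"gem at {gem}: distance {dist}")
```

Manhattan distance suits a grid in which agents move only up, down, left, or right, since it equals the minimum number of such moves between two cells when nothing blocks the path.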