Policy-search-in-a-markov-decision-process by deerishi

This code evaluates an optimal policy in a Markov Decision Process (MDP). It uses a 3x3 grid world in which the goal state at (3,3) has a reward of 10 and every non-terminal state has a reward of -1.
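To make the setup concrete, here is a minimal value-iteration sketch for a grid world of this shape. It is not the repository's code. The discount factor of 0.9, the deterministic moves, the rule that bumping into a wall leaves the agent in place, and all names (`value_iteration`, `step`, `reward`) are assumptions made for illustration.

```python
# Illustrative sketch only; none of these names come from the repository.
# Assumptions: deterministic moves, discount factor 0.9, hitting a wall
# leaves the agent where it is, and the goal state is terminal.
GAMMA = 0.9
SIZE = 3
GOAL = (3, 3)
# Rows and columns are 1-indexed; "down" increases the row number.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def reward(state):
    """Reward of 10 at the goal, -1 in every other state."""
    return 10.0 if state == GOAL else -1.0


def step(state, action):
    """Return the next state; moves that leave the grid keep the agent in place."""
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    if 1 <= row <= SIZE and 1 <= col <= SIZE:
        return (row, col)
    return state


def value_iteration(tol=1e-6):
    """Sweep the Bellman optimality update until values change by less than tol."""
    states = [(r, c) for r in range(1, SIZE + 1) for c in range(1, SIZE + 1)]
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == GOAL:
                new_value = reward(s)  # terminal: no future reward
            else:
                best_next = max(values[step(s, a)] for a in ACTIONS)
                new_value = reward(s) + GAMMA * best_next
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:
            break
    # Greedy policy: pick the action leading to the highest-valued neighbour.
    policy = {
        s: max(ACTIONS, key=lambda a: values[step(s, a)])
        for s in states
        if s != GOAL
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for row in range(1, SIZE + 1):
        print([round(values[(row, col)], 3) for col in range(1, SIZE + 1)])
```

Under these assumptions the values fall off with distance from the goal, and the greedy policy moves each state toward (3,3). A stochastic transition model or a different discount factor would change the numbers but not the structure of the update.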

Markov Decision Process, Value Iteration