What is the value in the top-left state after performing another step of value iteration?
  • We take the max over the expected values of the four possible actions:
    • left: 0.8(0+0.9(-10)) + 0.1(0+0.9(-10)) + 0.1(-100+0.9(0)) = -18.1
    • up: 0.8(0+0.9(-10)) + 0.1(0+0.9(-10)) + 0.1(-100+0.9(0)) = -18.1
    • down: 0.1(0+0.9(-10)) + 0.1(-100+0.9(0)) + 0.8(-100+0.9(0)) = -90.9
    • right: 0.1(0+0.9(-10)) + 0.1(-100+0.9(0)) + 0.8(-100+0.9(0)) = -90.9
  • Therefore the new value is -18.1.
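
The backup above can be checked with a short Python sketch. It assumes a discount factor of 0.9 and reads each term as probability × (reward + 0.9 × current value of the next state), with the numbers copied directly from the four lines above. The names `OUTCOMES` and `q_value` are made up for this illustration and do not come from the original problem.

```python
GAMMA = 0.9  # discount factor, assumed from the 0.9 multiplier in each term

# For each action: (probability, immediate reward, current value of next state),
# copied term by term from the worked solution above.
OUTCOMES = {
    "left":  [(0.8, 0, -10), (0.1, 0, -10), (0.1, -100, 0)],
    "up":    [(0.8, 0, -10), (0.1, 0, -10), (0.1, -100, 0)],
    "down":  [(0.1, 0, -10), (0.1, -100, 0), (0.8, -100, 0)],
    "right": [(0.1, 0, -10), (0.1, -100, 0), (0.8, -100, 0)],
}


def q_value(outcomes, gamma=GAMMA):
    """Expected return of one action: sum of p * (r + gamma * v)."""
    return sum(p * (r + gamma * v) for p, r, v in outcomes)


# One value-iteration step for this state: evaluate every action, keep the best.
q = {action: q_value(outcomes) for action, outcomes in OUTCOMES.items()}
new_value = max(q.values())

for action, value in q.items():
    print(f"{action}: {value:.1f}")
print(f"new value: {new_value:.1f}")
```

Both left and up reach the maximum, so either is a greedy action for this state at this step.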
