AIspace

What would the optimal policy be if the punishment in the top-left square was changed to each of the following values: 5, 0.0001, 0?

For a punishment of 5:
pol

For a punishment of 0.0001:
pol

For a punishment of 0:
pol