The computational, behavioral, and neural correlates of human reinforcement learning are well understood for decision making with discrete choices. However, little is known about more general decision making in a continuous choice space. Here, we designed an fMRI experiment in which subjects searched for a hidden target in a two-dimensional space, guided by binary feedback. Subjects received a monetary reward each time they selected a point close enough to the hidden target, which was randomly set after 10 searches. A “reward zone” centered on the hidden target shrank only after each rewarded trial, guiding subjects toward the target. We propose two computational models to account for individual subjects’ search behavior:
Figure 1. Experiment design and a representative example of search behavior
Figure 2. Bayesian update of posterior distribution provided with positive or negative feedback
Figure 3. Model comparison analysis: Comparison between the Maximum a Posteriori (MAP) model and the Maximum Information Gain (MIG) model
The maximum a posteriori (MAP) model, which assumes Bayesian updating of the expected reward probability followed by greedy selection (full exploitation), and the maximum information gain (MIG) model, which selects the choice that maximizes the reduction in uncertainty of the posterior expected reward probability (full exploration).
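The contrast between the two models can be illustrated with a minimal sketch. It is not the fitted models from this study: the discretized unit-square grid, the noise-free feedback rule, the fixed `radius`, and all function names are assumptions made for illustration. Under noise-free binary feedback, the expected reduction in posterior entropy for a candidate choice equals the binary entropy of its predicted reward probability, which is the quantity the MIG sketch maximizes.

```python
import numpy as np

def make_grid(n=21):
    """Candidate points (and candidate target locations) on the unit square."""
    axis = np.linspace(0.0, 1.0, n)
    xx, yy = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel()], axis=1)  # shape (n*n, 2)

def in_zone(points, centers, radius):
    """Boolean matrix: entry [i, j] is True if points[i] lies within radius of centers[j]."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist <= radius

def update_posterior(prior, grid, choice, rewarded, radius):
    """Bayesian update over target locations after one binary feedback (assumed noise-free)."""
    hit = in_zone(grid, choice[None, :], radius)[:, 0]
    likelihood = hit if rewarded else ~hit
    post = prior * likelihood
    total = post.sum()
    # Fall back to the prior if the feedback is inconsistent with every hypothesis.
    return post / total if total > 0 else prior

def reward_probability(posterior, grid, radius):
    """Expected reward probability of each candidate choice under the posterior."""
    return in_zone(grid, grid, radius).astype(float) @ posterior

def choose_map(posterior, grid, radius):
    """Greedy choice: maximize expected reward probability (full exploitation)."""
    return grid[np.argmax(reward_probability(posterior, grid, radius))]

def choose_mig(posterior, grid, radius):
    """Maximize expected information gain, here the binary entropy of the outcome (full exploration)."""
    p = np.clip(reward_probability(posterior, grid, radius), 1e-12, 1 - 1e-12)
    gain = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return grid[np.argmax(gain)]
```

In this sketch the MAP rule picks the point most likely to be rewarded, whereas the MIG rule picks the point whose outcome is most uncertain (predicted reward probability closest to 0.5). The shrinking reward zone could be mimicked by passing a smaller `radius` after each rewarded trial.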
Figure 4. Directed versus random exploration
In preliminary fMRI results, the model-based reward prediction error strongly modulated activity in the ventral striatum, consistent with previous related studies, and activity in the anterior hippocampus, reflecting memory-guided decision making. We also discuss the cortical substrates of arbitration between exploitation and exploration, as predicted by the proposed computational models.
Figure 5. Preliminary model-based fMRI analysis