Neural computations underlying human reinforcement learning in a continuous space

The computational, behavioral and neural correlates of human reinforcement learning are well understood for decision making with discrete choices. However, little is known about the more general case of decision making in a continuous choice space. Here, we designed an fMRI experiment in which subjects searched for a hidden target in a 2-dimensional space given binary feedback. Subjects received a monetary reward each time they selected a point close enough to the hidden target, whose location was randomly set anew after every 10 searches. A “reward zone” centered on the hidden target shrank after each rewarded trial (and only after rewarded trials), guiding subjects toward the target. We propose two computational models to account for individual subjects’ search behavior:
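To make the task structure concrete, here is a minimal Python sketch of the search environment. The class name `HiddenTargetTask` and every numeric detail are hypothetical assumptions, not taken from the experiment: the space is the unit square, the reward zone is a Euclidean disk with initial radius 0.3 that shrinks by a factor of 0.8 after each rewarded search, and after every 10 searches a new target is drawn uniformly at random and the zone is reset to its initial size.

```python
import math
import random


class HiddenTargetTask:
    """Hypothetical simulator of the 2-D hidden-target search task.

    Assumed details (not from the source): unit square, Euclidean distance,
    initial radius 0.3, shrink factor 0.8 per rewarded search, and a new
    uniformly random target (with a reset reward zone) every 10 searches.
    """

    def __init__(self, radius0=0.3, shrink=0.8, searches_per_target=10, seed=None):
        self.radius0 = radius0
        self.shrink = shrink
        self.searches_per_target = searches_per_target
        self.rng = random.Random(seed)
        self._new_target()

    def _new_target(self):
        # Place a hidden target uniformly at random and reset the reward zone.
        self.target = (self.rng.random(), self.rng.random())
        self.radius = self.radius0
        self.searches = 0

    def search(self, x, y):
        """Return binary feedback: 1 if (x, y) lies inside the reward zone, else 0."""
        dist = math.hypot(x - self.target[0], y - self.target[1])
        reward = 1 if dist <= self.radius else 0
        if reward:
            # The reward zone shrinks only after a rewarded search.
            self.radius *= self.shrink
        self.searches += 1
        if self.searches == self.searches_per_target:
            self._new_target()
        return reward
```

Because the zone shrinks only when a search is rewarded, the 0/1 feedback sequence carries progressively finer information about where the target is, which is what a model of the subject's beliefs can exploit.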

Figure 1. Experiment design and a representative example of search behavior

Figure 2. Bayesian update of posterior distribution provided with positive or negative feedback

Figure 3. Model comparison analysis: Comparison between the Maximum a Posteriori (MAP) model and the Maximum Information Gain (MIG) model

(1) the maximum a posteriori (MAP) model, which assumes Bayesian updating of the expected reward probability followed by greedy selection (full exploitation), and (2) the maximum information gain (MIG) model, which selects the choice that maximally reduces the uncertainty of the posterior expected reward probability (full exploration).
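One possible reading of the two models, sketched in Python under stated assumptions: the subject holds a posterior over the target location on a discrete grid, feedback is deterministic (reward if and only if the chosen point lies within the current radius of the target), and the radius is known. The MAP rule greedily picks the point with the highest posterior reward probability; the MIG rule picks the point with the largest expected reduction in posterior entropy. The grid discretization and all function names are illustrative assumptions, not the authors' implementation.

```python
import math


def make_grid(n):
    """Candidate points: centres of an n x n grid of cells on the unit square."""
    return [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]


def _consistent(belief, grid, choice, radius, reward):
    """Unnormalised posterior: zero out target locations inconsistent with the feedback."""
    return [p if (math.dist(t, choice) <= radius) == bool(reward) else 0.0
            for p, t in zip(belief, grid)]


def update(belief, grid, choice, radius, reward):
    """Bayes' rule for deterministic binary feedback (the outcome must be possible)."""
    kept = _consistent(belief, grid, choice, radius, reward)
    z = sum(kept)
    return [p / z for p in kept]


def reward_prob(belief, grid, choice, radius):
    """Posterior probability that searching at `choice` is rewarded."""
    return sum(p for p, t in zip(belief, grid) if math.dist(t, choice) <= radius)


def entropy(belief):
    """Shannon entropy of the belief, in bits."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


def info_gain(belief, grid, choice, radius):
    """Expected reduction in posterior entropy from searching at `choice`."""
    expected_h = 0.0
    for reward in (1, 0):
        kept = _consistent(belief, grid, choice, radius, reward)
        mass = sum(kept)  # probability of this outcome under the current belief
        if mass > 0:
            expected_h += mass * entropy([p / mass for p in kept])
    return entropy(belief) - expected_h


def map_choice(belief, grid, radius):
    """Greedy (exploitative) rule: maximise the posterior reward probability."""
    return max(grid, key=lambda c: reward_prob(belief, grid, c, radius))


def mig_choice(belief, grid, radius):
    """Exploratory rule: maximise the expected information gain."""
    return max(grid, key=lambda c: info_gain(belief, grid, c, radius))


# Usage: uniform prior on an 11 x 11 grid, then one rewarded search at the centre.
grid = make_grid(11)
prior = [1.0 / len(grid)] * len(grid)
posterior = update(prior, grid, (0.5, 0.5), 0.3, reward=1)
exploit = map_choice(posterior, grid, 0.3)
explore = mig_choice(posterior, grid, 0.3)
```

Under deterministic binary feedback the expected information gain reduces to the binary entropy of the predicted reward probability p, so MIG favors points where p is near 0.5 while MAP favors points where p is largest. Once the posterior is concentrated inside the reward zone, MAP keeps choosing a near-certain reward (information gain close to zero), whereas MIG probes a point whose reward zone would split the remaining candidate locations roughly in half.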


Figure 4. Directed versus random exploration

In preliminary fMRI analyses, we found that the model-based reward prediction error strongly modulated activity in the ventral striatum, consistent with previous related studies, as well as activity in the anterior hippocampus, reflecting memory-guided decision making. We also discuss the cortical substrates of the arbitration between exploitation and exploration predicted by the proposed computational models.
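A model-based reward prediction error can be sketched as delta_t = r_t - P(reward | c_t): the binary outcome minus the reward probability the model assigned to the chosen point before feedback. This standard definition is offered as an assumption, since the source does not specify how the regressor was computed. The sketch also assumes a uniform prior over target locations on a discrete grid, deterministic feedback, a reward-zone radius known on each trial, and a single target block; the function name `model_based_rpe` is hypothetical.

```python
import math


def model_based_rpe(grid, choices, rewards, radii):
    """Hypothetical trial-wise RPE regressor: delta_t = r_t - P(reward | c_t).

    Assumes a uniform prior over target locations on `grid`, deterministic
    binary feedback, a reward-zone radius known on every trial, and a single
    target (no reset within the trial sequence).
    """
    belief = [1.0 / len(grid)] * len(grid)
    deltas = []
    for choice, reward, radius in zip(choices, rewards, radii):
        inside = [math.dist(t, choice) <= radius for t in grid]
        # Model prediction before feedback: posterior mass inside the reward zone.
        predicted = sum(p for p, hit in zip(belief, inside) if hit)
        deltas.append(reward - predicted)
        # Bayesian update: keep only target locations consistent with the feedback.
        kept = [p if hit == bool(reward) else 0.0 for p, hit in zip(belief, inside)]
        z = sum(kept)
        belief = [p / z for p in kept]
    return deltas


# Usage: two rewarded searches at the centre of an 11 x 11 grid; the zone
# shrinks from 0.3 to 0.24 after the first reward.
grid = [((i + 0.5) / 11, (j + 0.5) / 11) for i in range(11) for j in range(11)]
deltas = model_based_rpe(grid, [(0.5, 0.5), (0.5, 0.5)], [1, 1], [0.3, 0.24])
```

A reward at a location the posterior considered unlikely yields a large positive delta, an unrewarded search yields a negative one, and a reward the posterior already predicted with near certainty yields a delta close to zero.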

Figure 5. Preliminary model-based fMRI analysis