Abstract
Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain’s reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience.
Introduction
Adolescents are highly sensitive to reward (Andersen et al., 1997, Brenhouse et al., 2008, Galván et al., 2006, Somerville and Casey, 2010, van Duijvenvoorde et al., 2014), which has been linked to the emergence of maladaptive behaviors (Brenhouse and Andersen, 2011, Galván, 2013, Spear, 2000). It has been suggested that this reward sensitivity may also be adaptive by promoting learning and exploration, which are critical for transitioning to independence (Casey, 2015, Spear, 2000). However, evidence for enhanced learning in adolescence and associated neural mechanisms have remained elusive. We sought to test the hypothesis that adolescents would be better than adults at learning from reinforcement and that this benefit would be related to enhanced activity in brain regions that support learning and memory, particularly the striatum and the hippocampus.
Advances in understanding neural mechanisms of reinforcement learning in adults have leveraged computational reinforcement learning models to quantify trial-by-trial learning signals in the brain (Daw et al., 2005, Daw et al., 2011, O’Doherty et al., 2003). Such models highlight the important role of prediction errors (PEs), which reflect the extent to which reinforcement received on a given trial deviates from what is expected. By reflecting trial-by-trial deviations between predictions and outcomes, prediction errors provide a learning signal that updates subsequent behavior. fMRI studies in adults and adolescents have shown that prediction errors correlate with blood-oxygen-level-dependent (BOLD) activity in the striatum (e.g., Christakou et al., 2013, Cohen et al., 2010, Hare et al., 2008, O’Doherty et al., 2003, van den Bos et al., 2012). Despite some reports of enhanced striatal activity in adolescents, reports of developmental differences in prediction error-related striatal activity are mixed (Christakou et al., 2013, Cohen et al., 2010, van den Bos et al., 2012), and so far, none have shown a link between enhanced striatal BOLD activity in adolescents and enhanced learning. This suggests that, to the extent that adolescents’ reward sensitivity could be related to benefits for learning, these may be accounted for by other brain systems.
A natural candidate brain region for supporting reinforcement learning in adolescence is the hippocampus, known for its role in long-term episodic memory (e.g., Davachi, 2006, Gabrieli, 1998, Squire et al., 2004). The hippocampus also contributes to reward-related behaviors, including reinforcement learning, reward-guided motivation, and value-based decision making. Studies in adults show that the hippocampus and the striatum interact cooperatively to support both episodic encoding and reinforcement learning (Adcock et al., 2006, Bunzeck et al., 2010, Wimmer and Shohamy, 2012). These findings suggest that reward sensitivity in adolescence could be related to enhanced hippocampal activity, to better reinforcement learning, and to better episodic memory for rewarding events. But, so far, the role of the hippocampus in reinforcement learning in adolescence has not been studied.
We used a learning task in combination with fMRI and reinforcement learning models to address this gap. We hypothesized that, compared to adults, (1) adolescents would be better at learning from reinforcing outcomes; (2) adolescents would show a greater relation between reinforcement learning and episodic memory for rewarding events during learning; and (3) these differences in learning would be related to enhanced activity in the hippocampus and stronger coupling between the hippocampus and the striatum.
Participants learned incrementally, based on trial-by-trial reinforcement, to associate cues with outcomes (Figure 1A). The association between cues and outcomes was probabilistic, requiring continual use of reinforcement to update choices. Reinforcement was simply the word “correct” or “incorrect”; no monetary incentives were used, to avoid confounds related to the motivational significance of monetary reward across age groups. To test episodic memory for reinforcement events, we included in each outcome a unique picture of an object that was incidental to the reinforcement itself (Figure 1B). This design allowed us to measure (1) incremental learning based on trial-by-trial reinforcement, (2) episodic memory for positive versus negative reinforcement events, and (3) the role of the hippocampus and the striatum in both forms of learning.
Figure 1. Behavioral Task to Assess Trial-by-Trial Incremental Learning and Episodic Memory. (A) Learning phase: on each trial, a centrally presented cue appeared below two targets. Participants pressed a button to predict which flower a butterfly would land on and received probabilistic reinforcement along with a trial-unique picture of a commonplace object. (B) Memory test: participants saw a picture of an object, judged whether the picture was “old” or “new,” and then rated their level of confidence in that choice.
Results
Enhanced Reinforcement Learning in Adolescents
We tested whether adolescents (n = 41, 13–17 years old) differed from adults (n = 31, 20–30 years old) at learning from reinforcements, comparing (1) overall performance and (2) estimated learning rates from the reinforcement learning model. Learning performance was quantified as the percent of trials for which participants responded with the outcome most often associated with a given cue (e.g., Poldrack et al., 2001, Shohamy et al., 2004). A repeated measures (RM)-ANOVA (block × group) revealed that both age groups showed significant learning, but, consistent with our prediction, adolescents’ learning exceeded that of adults (Figure 2A; main effect of block: F(3, 210) = 20.2, p < 0.001; block × group interaction: F(3, 210) = 4.04, p = 0.008). Similar results were found for optimal choice by trial (mixed-effect regression, main effect of trial: z = 7.13, p < 0.001; group × trial interaction: z = −2.97, p = 0.003), and we also found a better fit of the interaction model (χ² = 8.2, p = 0.004) after penalizing for model complexity (Akaike, 1974).
Figure 2. Behavioral Results: Adolescents Differ from Adults in Reinforcement Learning and in Association between Reinforcement Learning and Episodic Memory. (A) Learning accuracy. Both groups learned over time, but adolescents’ learning exceeded adults’. Points reflect mean optimal choice for 24 or 30 (fMRI) trials; error bars show ±1 SEM. (B) Learning rate parameter estimates from a reinforcement learning model. Adolescents had a lower learning rate than adults, reflecting more incremental updating of choice based on reinforcement. Error bars show ±1 SEM. (C) Memory accuracy (d′) for trial-unique pictures that had been presented during reinforcement events in the learning task. Memory accuracy was computed separately by presented reinforcement to determine whether adolescents differed in their memory for positive and negative events. Adolescents and adults had better memory for images that accompanied positive, rather than negative, reinforcement. Error bars show ±1 SEM between participants. ∗∗∗p < 0.001, ∗p < 0.05. (D) The relationship between trial-by-trial reinforcement learning signals and later episodic memory for the reinforcement event. Only adolescents showed a reliable relationship between the magnitude of prediction error learning signals and likelihood of remembering episodic details of the reinforcement event. Lines show association between level of prediction error and the predicted probability from the fitted model for memory accuracy. Error bars around the fitted line show ±1 SEM.
To further characterize trial-by-trial responses, we applied a standard reinforcement learning model to each participant’s choice data (Equation 1 in Supplemental Experimental Procedures). We chose to fit a canonical model, which represents a standard class of models used extensively in studies of brain correlates of reward prediction errors in adults (see Daw et al., 2011). We estimated a learning rate parameter for each participant (α), which reflects the extent to which feedback on each trial is used to update later choices. Here, a lower learning rate is better because the probabilistic associations between cues and targets are fixed; a lower learning rate suggests that learning is guided by accumulating evidence over a greater number of trials rather than shifting behavior based on the outcome of any single trial (e.g., Daw, 2011).
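To make the role of the learning rate concrete, the following Python sketch simulates a generic delta-rule learner with a softmax choice rule. It is not necessarily Equation 1 of the Supplemental Experimental Procedures, which is not reproduced here. The function names, the inverse temperature `beta`, the 0/1 coding of “incorrect”/“correct,” and the 80% reinforcement probability are all illustrative assumptions. The sketch shows that, when contingencies are fixed, a lower α yields steadier value estimates.

```python
import math
import random


def softmax_prob(q_chosen, q_other, beta):
    """Probability of choosing the option with value q_chosen (two options)."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))


def simulate_learner(alpha, beta=5.0, p_reward=0.8, n_trials=2000, seed=0):
    """Simulate one cue with two targets; target 0 is reinforced with p_reward.

    Returns the prediction errors and the trajectory of the value of target 0.
    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial expected value of each target
    deltas, q0_trace = [], []
    for _ in range(n_trials):
        p_choose_0 = softmax_prob(q[0], q[1], beta)
        choice = 0 if rng.random() < p_choose_0 else 1
        p_correct = p_reward if choice == 0 else 1.0 - p_reward
        reward = 1.0 if rng.random() < p_correct else 0.0  # "correct" = 1
        delta = reward - q[choice]  # prediction error
        q[choice] += alpha * delta  # incremental value update
        deltas.append(delta)
        q0_trace.append(q[0])
    return deltas, q0_trace


def late_variance(trace, burn_in=500):
    """Variance of the value estimate after an initial learning period."""
    tail = trace[burn_in:]
    mean = sum(tail) / len(tail)
    return sum((x - mean) ** 2 for x in tail) / len(tail)


slow_deltas, slow = simulate_learner(alpha=0.05)
fast_deltas, fast = simulate_learner(alpha=0.6)
print(late_variance(slow), late_variance(fast))
```

In this toy setting, the high-α learner's value estimate swings with each outcome, whereas the low-α learner averages over many trials.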
Importantly, the model provided a good fit to the observed behavior across both groups (one-sample t test against a null model, t(71) = −39.70, p < 0.001, Akaike’s Information Criterion [AIC] used to penalize model complexity), and the model fits did not differ between them (independent samples t test, t(70) = 1.35, p = 0.2). Consistent with their overall better learning, adolescents had a lower learning rate than adults (t(70) = −3.0, p = 0.004; Figure 2B), indicating more incremental learning. Moreover, across groups, there was a significant negative correlation between learning rate and improved performance on the task (r(70) = −0.43, p < 0.001; Figure S1A), indicating that lower learning rates were indeed related to better performance. Reaction times decreased over time for both groups, with no differences between them, suggesting that differences in learning are not due to general differences in responses to task demands (Figure S1B).
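As a rough illustration of an AIC-based comparison against a null model, the Python sketch below scores a short choice sequence under a delta-rule model with softmax choice and under a null model that picks either target with probability 0.5. The likelihood function, the null model, the toy data, and the parameter values are assumptions made for illustration and are not taken from the study. AIC is computed as 2k − 2 ln L, where k is the number of free parameters.

```python
import math


def rl_log_likelihood(choices, rewards, alpha, beta):
    """Log likelihood of a choice sequence under a delta-rule + softmax model.

    choices: 0/1 target chosen on each trial; rewards: 1 (correct) or 0.
    """
    q = [0.5, 0.5]
    log_lik = 0.0
    for choice, reward in zip(choices, rewards):
        other = 1 - choice
        p_choice = 1.0 / (1.0 + math.exp(-beta * (q[choice] - q[other])))
        log_lik += math.log(p_choice)
        q[choice] += alpha * (reward - q[choice])  # update after the outcome
    return log_lik


def aic(log_lik, n_params):
    """Akaike's Information Criterion: lower values indicate a better fit."""
    return 2 * n_params - 2 * log_lik


# Toy data (hypothetical): a participant who mostly picks the reinforced target.
choices = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
rewards = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1]

# Two free parameters (alpha, beta) for the learning model, none for the null.
aic_rl = aic(rl_log_likelihood(choices, rewards, alpha=0.2, beta=5.0), n_params=2)
aic_null = aic(len(choices) * math.log(0.5), n_params=0)
print(aic_rl, aic_null)
```

In practice the parameters would be estimated per participant rather than fixed by hand, and the per-participant AIC differences could then be tested against zero.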
Memory Positivity Bias in Adolescents and Adults
We first assessed episodic memory for the trial-unique objects that were presented during learning, separating trials by whether subjects had been shown positive (“correct”) versus negative (“incorrect”) outcomes. We found a significant effect of reinforcement (RM-ANOVA, F(1, 70) = 24.6, p < 0.001; no effect of group, F(1, 70) = 1.6, p = 0.2; no interaction, F(1, 70) = 1.2, p = 0.3; Figure 2C; Supplemental Information; Table S1), indicating that both groups showed a “positivity bias,” that is, better memory for positive, rather than negative, reinforcement events.
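For readers unfamiliar with d′, the following Python sketch computes it as z(hit rate) − z(false-alarm rate) from old/new recognition counts and derives a positivity bias as the difference between d′ for positive and negative events. The counts are invented, and both the log-linear correction and the use of a shared false-alarm rate for the two valences are assumptions rather than details taken from the study.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count) keeps the rates away
    from 0 and 1; this correction is an assumption, not taken from the study.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant, split by reinforcement valence.
# "New" pictures carry no valence, so the same false-alarm counts serve both.
d_positive = d_prime(hits=40, misses=20, false_alarms=10, correct_rejections=50)
d_negative = d_prime(hits=30, misses=30, false_alarms=10, correct_rejections=50)
positivity_bias = d_positive - d_negative
print(d_positive, d_negative, positivity_bias)
```

A positive `positivity_bias` indicates better recognition of pictures that accompanied “correct” feedback.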
Trial-by-Trial Prediction Errors Are Associated with Episodic Memory in Adolescents, but Not in Adults
We next tested whether reinforcement learning measures were related to episodic memory using model-derived estimates of trial-by-trial prediction errors (δ) (Equation 1 in Supplemental Experimental Procedures). Prediction errors provide an estimate of how surprising each trial’s outcome was, which we used as a within-participant regressor for both behavioral and brain imaging analysis.
We found that prediction errors were related to memory accuracy and that this effect significantly interacted with group (mixed-effect regression interaction: PE × group, z = 2.4, p = 0.02; no main effect of PE, p = 0.2; or group, p = 0.7). This interaction reflected a significant relationship between prediction error and memory among the adolescents (z = 5.2, p < 0.001; Figure 2D), but not the adults (z = 1.3, p = 0.2). Thus, in adolescents, but not adults, episodic memory for outcomes was related to prediction errors. A similar effect was found for the relationship between reinforcement learning and the positivity bias in episodic memory across participants (Figures S1C and S1D).
Prediction Error Signals in the Hippocampus in Adolescents
A subset of 25 adolescents and 22 adults underwent fMRI while performing the learning task (behavioral effects in the fMRI group were the same as in the full behavioral sample; see Figures S1E–S1I). To interrogate the brain systems underlying differences in behavior between groups, we regressed prediction errors against BOLD activity within each participant and compared the groups in regions of a priori interest in the hippocampus and the striatum (for whole brain results, see Table S2; for analyses of value in the ventromedial prefrontal cortex [vmPFC], see Supplemental Information).
We found that prediction errors were correlated with BOLD activity in the striatum in both groups, with no significant differences between them (see Figure S2D; Table S2). In the hippocampus, by contrast, adolescents had significantly greater prediction error-related BOLD activity than adults (Figure 3C; Figure S2C).
Figure 3. Greater Prediction Error-Related Activation in the Hippocampus in Adolescents. (A) Adolescents (n = 25) showed significant activation bilaterally in the hippocampus in two clusters (left: family-wise error (FWE)-p < 0.01, z = 4.15, peak [−16, −8, −20]; right: FWE-p < 0.03, z = 3.23, peak [24, −20, −12]). (B) Adults (n = 22) did not show above threshold activation in the hippocampus. (C) Direct comparisons between groups within the hippocampus showed significantly greater activation in the left hippocampus in the adolescents than in the adults (FWE-p < 0.03, z = 3.54, peak [−16, −8, −22]). See Figures S2A–S2C.
Given the behavioral link between reinforcements and memory in the adolescents, we investigated whether episodic memory was related to functional connectivity between the hippocampus and the striatum. We used a psychophysiological interaction (PPI) analysis with the time series from a hippocampal seed as the physiological variable (Figure 4A) and reinforcement valence of the outcome event (correct > incorrect) as the psychological variable. We found significant connectivity between the hippocampus and the putamen in adolescents (but not adults) that was greater for correct than incorrect outcomes (z = 2.68, family-wise error (FWE)-p < 0.01, 155 voxels, peak [−16, 10, −6]; Figure 4B). We then extracted the interaction value for each participant from the PPI and correlated this measure of learning-related connectivity with an independent within-participant behavioral measure of memory positivity bias (Figure 4C). We found a significant correlation between connectivity during learning and the extent to which memories for positive reinforcement events were enhanced for the adolescents (r = 0.62, p < 0.001), but not the adults (r = 0.05, p = 0.84), and a significant difference in the correlations between the groups (comparison of Fisher z transformed correlation coefficients z = 2.16, p = 0.03).
Figure 4. Functional Connectivity during Learning Relates to Memory Positivity Bias in Adolescents. (A) Time series within the hippocampus showed functional coupling with the putamen for the contrast of correct > incorrect presented reinforcement. (B) Interaction between the physiological and psychological regressors in adolescents (limited to a hypothesis-driven search within the bilateral striatum; z = 2.68, FWE-p < 0.01, peak [−16, 10, −6]). (C) Result of the interaction term from the PPI was extracted for each participant and correlated with behavioral memory bias. There was an association between learning-related connectivity and the enhancement of memory for positive reinforcement in the adolescents, but not in the adults.
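The group comparison of correlations can be checked with the standard Fisher r-to-z test for two independent correlations, sketched below in Python. The sample sizes are the fMRI subsample sizes reported above (25 adolescents, 22 adults). That this exact formula and these sample sizes were used for the comparison is an assumption, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Two-tailed test of the difference between two independent Pearson r values."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


z, p = compare_correlations(r1=0.62, n1=25, r2=0.05, n2=22)
print(round(z, 2), round(p, 2))
```

Under these assumptions the sketch gives z ≈ 2.16 and p ≈ 0.03, in line with the reported values.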
Discussion
The negative implications of reward sensitivity in adolescents have been well documented, but much less is known about the possible adaptive side for learning. Our results show that adolescents were better at learning from outcomes, outperforming adults. We also found that in adolescents, but not adults, trial-by-trial reinforcement learning is related to episodic memory for reinforcement events, such that memory was better for surprisingly positive versus negative outcomes. These behavioral benefits were related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. Finally, only in the adolescents, functional connectivity between these learning systems was related to the extent of bias toward better memory for positive reinforcement events.
This is the first demonstration of a role for the hippocampus in reinforcement learning in adolescents. Our results imply that, as adolescents navigate through new life experiences, learning from reinforcement is linked to how episodic memories are shaped and to the extent to which they are biased toward encoding more of the good than the bad. This feature of learning is important to consider in relation to decision making because it speaks to the sorts of biases that adolescents may encounter when they draw on prior experience to inform current decisions.
It is important to note that the adolescents in our study were not better at all types of learning; rather, the benefits were selective to reinforcement-based updating and reward-related memory. Overall, episodic memory in the adolescents was not better than in adults, and there were no differences between the groups in memory for just positive or just negative learning events. Instead, the groups differed specifically in the strength of the interaction between these two forms of learning.
These findings suggest that, in adolescents, there is less differentiation between different forms of learning and their neural substrates when compared with adults. One possible interpretation of this finding is that it may be related, in part, to the known delay in development of prefrontal control mechanisms in adolescence (for review, see Somerville and Casey, 2010). Although it is not known precisely how the arbitration between different learning and decision systems takes place in the adult brain, it has been suggested that the prefrontal cortex may play an important role (Daw et al., 2005, Poldrack and Packard, 2003). Indeed, an influential model of adolescent decision making posits a dynamic imbalance between appetitive motivational brain systems, including the striatum, and inhibitory control systems in the prefrontal cortex (Galván, 2013, Somerville and Casey, 2010). Our findings extend this framework and show that the striatum may be just one learning system, along with the hippocampus, that has relatively greater influence during adolescence. Specifically, our findings suggest that the functional development of midbrain dopaminergic reward systems and their connectivity with the striatum and the hippocampus in adolescence is positioned to affect both strengthening of reward-guided habits and actions, as well as episodic memory for motivational events. Future studies will need to assess the role of control and flexibility to identify whether prefrontal systems regulate the interactions between the striatum and the hippocampus.
The current study aimed to evaluate the link between reinforcement learning and episodic memory by concurrently presenting incidental trial-unique stimuli with reinforcement. An important direction for future research will be to determine whether these findings extend to goal-directed episodic encoding. In adults, striatal activity has been shown to relate to goal-directed modulation of episodic memory (Han et al., 2010). Prior work in adolescents has shown greater sensitivity to reward-predictive cues in the striatum (Galván et al., 2006). Together with the current findings, this suggests that goal-directed cue processing in adolescents may elicit greater cooperation between the hippocampus and the striatum and better goal-directed encoding. This possibility remains to be tested.
Another important question is how subregions of the striatum contribute to learning and interact with the hippocampus. We found prediction error-related BOLD activity in the ventral striatum, as has been shown repeatedly (e.g., Bartra et al., 2013, Clithero and Rangel, 2014). This region is connected with the hippocampus (e.g., Haber and Knutson, 2010) and interacts with it functionally (e.g., Kahn and Shohamy, 2013). However, our functional connectivity analysis revealed activity in a separate region in the putamen that correlated with the hippocampus. Research in adults identified a similar region displaying connectivity with the hippocampus during cue-value learning (Wimmer et al., 2014). While functional connectivity in BOLD data does not necessarily reflect anatomical connectivity, these findings raise important questions for future work about the interacting circuits supporting reinforcement learning and episodic memory in adolescence.
Our findings are generally consistent with studies of episodic memory in development (Ghetti et al., 2010, Ofen et al., 2012). Previous research has shown that adolescents have adult-like recognition memory (Ghetti et al., 2010), whereas younger children have worse episodic encoding (Ghetti et al., 2010) and retrieval (DeMaster et al., 2014, Lloyd et al., 2009). Findings regarding developmental changes in the hippocampus have been mixed. Studies of item recognition report no differences in the hippocampus during development (Ofen et al., 2007). But other studies indicate that changes in the hippocampus do continue into adolescence (Daugherty et al., 2016, Lee et al., 2014) and are related to differences in associative memory performance in adolescents (Ghetti et al., 2010, DeMaster et al., 2014).
Many new experiences occur during adolescence. Some work suggests that, at least when looking back from adulthood, adolescence is a time in which particularly powerful and positive memories are formed (Haque and Hasking, 2010, Rubin and Berntsen, 2003, Thomsen et al., 2011). Of course, adolescence is also a time when psychopathology may begin to emerge (Casey et al., 2015, Ernst et al., 2009, Padmanabhan and Luna, 2014). Both perspectives emphasize the importance of learning from experiences during this time of development. The heightened sensitivity of striatal learning systems may put reward-seeking actions into overdrive but can also confer a benefit in learning from predictable, but variable, outcomes, as we show here. Our findings demonstrate that this reinforcement sensitivity has implications for what kinds of memories are formed in adolescence and how these memories drive behavior.
Experimental Procedures
All recruitment, screening, consent and assent, and testing procedures were approved by the University of California, Los Angeles, Institutional Review Board (IRB) and Columbia University IRB. For descriptions of experimental materials and procedures, see Supplemental Experimental Procedures.