Psychology and Aging
© 2020 American Psychological Association
2020, Vol. 35, No. 8, 1090–1104
ISSN: 0882-7974
http://dx.doi.org/10.1037/pag0000567
Age-Dependent Statistical Learning Trajectories Reveal Differences in
Information Weighting
Steffen A. Herff
École Polytechnique Fédérale de Lausanne; Western Sydney University; and Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore
Shanshan Zhen and Rongjun Yu
National University of Singapore
Kat R. Agres
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
National University of Singapore and Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore
This document is copyrighted by the American Psychological Association or one of its allied publishers.
Statistical learning (SL) is the ability to generate predictions based on probabilistic dependencies in the
environment, an ability that is present throughout life. The effect of aging on SL is still unclear. Here,
we explore statistical learning in healthy adults (40 younger and 40 older). The novel paradigm tracks
learning trajectories and shows age-related differences in overall performance, yet similarities in learning
rates. Bayesian models reveal further differences between younger and older adults in dealing with
uncertainty in this probabilistic SL task. We test computational models of 3 different learning strategies,
(a) Win-Stay, Lose-Shift, (b) Delta Rule Learning, and (c) Information Weights, to explore whether they
capture age-related differences in performance and learning in the present task. A likely candidate
mechanism emerges in the form of age-dependent differences in information weights, in which young
adults more readily change their behavior, but also show disproportionally strong reactions toward
erroneous predictions. With lower but more balanced information weights, older adults show slower
behavioral adaptation but eventually arrive at more stable and accurate representations of the underlying
transitional probability matrix.
Keywords: statistical learning, cognitive assessment, continuous paradigm, age-related differences,
information weights
Supplemental materials: http://dx.doi.org/10.1037/pag0000567.supp
Statistical learning (SL) describes the ability to generate predictions based on probabilistic dependencies in the environment. The majority of SL research focuses on early childhood development or young adults (see Krogh, Vlach, & Johnson, 2013; Daltrozzo & Conway, 2014 and Saffran & Kirkham, 2018 for reviews). This approach makes intuitive sense, as SL is already present in infancy (Roseberry, Richie, Hirsh-Pasek, Golinkoff, & Shipley, 2011; Saffran, Aslin, & Newport, 1996). SL in older adults, however, has received far less scientific attention. Considering the worldwide increase in life expectancy and age of retirement (WHO, 2015,
This article was published Online First August 13, 2020.
Steffen A. Herff, Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne; Music Cognition and Action Group, The MARCS Institute for Brain, Behaviour and Development, Western Sydney University; and Department of Social and Cognitive Computing, Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore. Shanshan Zhen, Department of Psychology, National University of Singapore. Rongjun Yu, Department of Psychology and NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore. Kat R. Agres, Yong Siew Toh Conservatory of Music, National University of Singapore; and Department of Social and Cognitive Computing, Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore.
We thank Lauren Fairley, Jon Prince, and Estefanía Cano for constructive comments on a draft, and Arihant Singhai, Ren Jie Tay, Bo Yuan, and Jing Wen Chai for their support during data collection. We thank Feng Lei for advice on the choice of cognitive assessment tests and organizing training on administering the tests. The study was supported by the Singapore Ministry of Education (MOE2016-T2-1-015) awarded to Rongjun Yu.
Steffen A. Herff developed the paradigm and designed, coded, as well as prepared the experiment. Kat R. Agres helped develop the experimental design and paradigm. Data collection was performed or supervised by Steffen A. Herff and Shanshan Zhen. Data were analyzed and interpreted by Steffen A. Herff. The manuscript was written by Steffen A. Herff with Shanshan Zhen, Rongjun Yu, and Kat R. Agres providing comments. The project idea and collaboration were initiated by Kat R. Agres and Rongjun Yu provided lab space and equipment. All authors approved the final version of this article. We have no known conflict of interest to disclose. We archived a preprint of the present work, which can be accessed at https://psyarxiv.com/kuy6p; Herff, Zhen, Yu, and Agres (2019).
Correspondence concerning this article should be addressed to Steffen A. Herff, Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne, INN. 115, 1015 Lausanne, Switzerland. E-mail: [email protected]
2017), it is important to further our understanding of learning in older adults. SL can be considered the outcome of a mechanism that extracts probabilistic information. Despite the overwhelming evidence for SL in humans, the fundamental mechanisms or learning strategies that allow humans to extract such probabilistic information are not yet fully understood (Krogh et al., 2013; Saffran & Kirkham, 2018). Here, we investigate age-related differences in SL ability and mechanisms in older and younger adults. The main contribution of the present study is showing age-related differences in SL and identifying a candidate learning mechanism of SL that can capture the age-related differences.

SL in Older Adults

There is a growing body of evidence suggesting that older adults employ different strategies for learning, response selection, and decision-making, compared to younger adults (Hinault, Lemaire, & Touron, 2017; Löckenhoff & Carstensen, 2007; Mata, von Helversen, & Rieskamp, 2010; Nassar et al., 2016; Schirda, Valentine, Aldao, & Prakash, 2016). For example, younger participants seem more proficient in combining multiple mnemonic strategies compared to older participants (Hinault et al., 2017). Furthermore, compared to younger adults, older participants seem to treat positive information preferentially compared to negative information, such as when recalling information about physicians or health plans (Löckenhoff & Carstensen, 2007). Older adults also appear to utilize uncertainty of information to a lesser extent than younger participants, and younger adults show large behavioral adjustments to relatively minor prediction errors (Nassar et al., 2016). Taken together, this body of literature suggests that older adults utilize different strategies to extract information to make decisions or predictions compared to young adults. SL tasks lend themselves to investigating these differences as they test behavioral outcomes that are the result of extracting statistical regularities from the environment.

Prior research on age-related differences in SL has yielded conflicting evidence depending on the precise paradigms (e.g., deterministic vs. probabilistic) and measures (e.g., RTs vs. accuracy) deployed. This, combined with a general uncertainty about the mechanisms behind SL, leaves questions about age-related shifts in information extraction strategies largely unanswered. SL of deterministic sequences (e.g., “B” always follows “A”) is remarkably similar across age. Cherry and Stadler (1995) presented participants with four circles on a screen that flashed in a deterministic sequence, and participants predicted the next lit circle via speeded button-press. Both younger and older adults performed at ceiling in terms of accuracy, and both age groups improved RTs with each sequence repetition. Although older adults showed overall slower RTs, learning rates were comparable across the two groups. Other studies also reported little evidence of age-related differences in learning of deterministic sequences (Daltrozzo & Conway, 2014; Frensch & Miner, 1994; D. V. Howard & Howard, 1989, 1992; Salthouse, McGuthry, & Hambrick, 1999). However, age-related differences emerge when sequences are probabilistic, that is, governed by an underlying transitional probability (TP) matrix (e.g., “B” is most likely to follow “A,” and “C” is less likely to follow “A”). Curran (1997) also presented four circles to participants; however, the sequences of flashing circles switched back and forth between predetermined sequences and random sequences. The result is a TP matrix whereby each circle has a given probability to be followed by another circle. Importantly, this probability is not 100%, based on the precise sequences used, as well as the interspliced random sequences. Reaction time measures revealed reduced learning in older adults compared to younger adults. Studies that provide support for better SL in younger adults on probabilistic SL tasks tend to utilize tasks where reaction time (RT) is the primary measure of performance (Curran, 1997; Feeney, Howard, & Howard, 2002; D. V. Howard et al., 2004; J. H. Howard & Howard, 1997). This observation is important because differences in RT do not necessarily reflect learning performance (Aizenstein et al., 2006). Indeed, RT effects may be the result of a strategic change across age in terms of a speed–accuracy trade-off (Forstmann et al., 2011; Salthouse, 1979). As a result, paradigms that require participants to respond both quickly and accurately have limited applicability for the present goal of investigating SL ability and its mechanisms, even if they do produce age-related differences (Curran, 1997). The present study also focuses on a probabilistic task, but it is concerned with age-related differences in predictive decision-making performance, rather than the speed with which learned responses are made.

To test accuracy rather than speed, Palmer, Hutson, and Mattys (2018) presented participants with a continuous auditory stream of an artificial language with no interruptions. Accordingly, the only way of extracting individual words was by tracking the underlying transitional probabilities between syllables. This is because transitional probabilities within words are much higher than across word boundaries. After exposure to the continuous stream, participants differentiated between words from the artificial language, nonwords (that did not exist in the artificial language), and part-words (foil words generated by combining syllables across word boundaries). Although no age-related differences occurred in differentiating words from nonwords, younger participants outperformed older participants when differentiating words from part-words (Palmer et al., 2018; Palmer & Mattys, 2016). This result suggests that older adults may utilize different strategies than younger adults to extract statistical information. However, such differences may also be due to an age-related decline in cognitive function (e.g., memory). To account for possible differences in cognitive function, we also collect cognitive assessment data from the participants in our experiment.

To capture potential differences in learning strategies, we analyze learning trajectories between groups of younger and older adults. Rather than analyzing overall SL performance alone, we focus on individuals’ learning trajectories (slopes), as previous work suggests this measure provides valuable insight into individuals’ cognitive capacities and the time course of learning novel information (Kaufman et al., 2010; Misyak, Christiansen, & Tomblin, 2010; Siegelman, Bogaerts, Christiansen, & Frost, 2017). Learning trajectories are of particular interest for the present study because the time course of information integration may more accurately characterize age-related differences than the absolute performance. Analyzing learning trajectories can also be especially informative when the underlying TP matrix contains transitions where the most likely next event is by far the most probable one (high-certainty state), as well as transitions where the most likely next event is less obvious (low-certainty state; Shafir, Reich, Tsur, Erev, & Lotem, 2008). This is because a TP matrix with various different states of uncertainty allows for more precise observation
of participants’ information integration, which in turn allows computational models to provide a far more detailed investigation of differences in participants’ learning strategies.

Statistical Learning Mechanisms

In the present work, we consider three learning mechanisms: (a) Win-Stay, Lose-Shift, (b) Delta Rule Learning based on probability spectrum, and (c) Information Weights. Investigating these three learning mechanisms and exploring whether they can explain age-related differences in SL constitutes the main focus of the present work. The choice of these three mechanisms was predominantly guided by their conceptual simplicity and presence in the literature, as well as ease of implementation.

Win-Stay, Lose-Shift

The first learning mechanism we consider captures whether participants, when forming a prediction, predominantly rely on the outcome from their last response when faced with the same decision. Such Win-Stay, Lose-Shift strategies are commonly observed in decision-making tasks (Nowak & Sigmund, 1993; Worthy, Hawthorne, & Otto, 2013), and previous research indicates age-related differences in decision-making in terms of Win-Stay, Lose-Shift usage. For example, older participants tended to rely more strongly on a Win-Stay, Lose-Shift mechanism in a probabilistic inference task (Mata et al., 2010). Computational modeling of Win-Stay, Lose-Shift shows that, in theory, it is an effective strategy for language learning (Matsen & Nowak, 2004). Participants could theoretically utilize a Win-Stay, Lose-Shift strategy to solve an SL task. However, for any SL paradigm, predominantly relying on such a strategy would be potentially problematic. This is because SL tasks are often designed with the assumption that participants continuously sample information from the environment to extract statistical regularities, rather than only when they are prompted to respond and only from the last time they provided a response. Relying on a Win-Stay, Lose-Shift mechanism would suggest that participants deploy a simple response heuristic to achieve statistical learning without extracting the full underlying set of transitional probabilities. This is because Win-Stay, Lose-Shift does not rely on extracting statistical properties; instead, it relies exclusively on memory of the last relevant response. In the case that we observe the Win-Stay, Lose-Shift strategy in the present task, a stronger reliance on this strategy is hypothesized in older participants (Mata et al., 2010).

Delta-Rule Learning Based on Probability Spectrum

The second mechanism we consider aims to capture whether participants’ responses are predominantly driven by two factors. First, the distance between the currently perceived transitional probabilities and the new estimated probabilities after receiving new information (e.g., feedback). Second, the absolute values of the estimated real probability. In other words, it measures whether participants more strongly adjust their predictions when the perceived probabilities are further away from the true transitional probabilities (e.g., Δ = .5 with P_Perceived = .25 and P_Real = .75 vs. Δ = .25 with P_Perceived = .5, P_Real = .75), and whether this adjustment differs depending on whether it occurs toward the low or high end of the probability spectrum (e.g., Δ = .2 with P_Perceived = .3, P_Real = .5 vs. Δ = .2 with P_Perceived = .55, P_Real = .75). Adjusting behavior based on the observed discrepancy between a prediction and the observed reality is an intuitive and well-established learning mechanism (Rescorla & Wagner, 1972). This learning mechanism can be described as ‘delta-rule’ learning, because response probabilities change by a proportion of the prediction error (here, the difference between the predicted probability of a transition and the real probability of a transition; Greve, Cooper, Kaula, Anderson, & Henson, 2017). In a probabilistic SL task, effective use of delta-rule learning relies on an estimate of one’s own perception of the transitional probabilities, as well as the true underlying probabilities. Prior literature suggests that younger adults, compared to older adults, show relatively large adjustments even to small prediction errors (Nassar et al., 2016). Furthermore, the present implementation of delta-rule learning also allows the mechanism to vary depending on whether the true probability is likely or unlikely. This implementation decision is motivated by findings in the literature suggesting that SL may differ between age groups and as a function of task complexity (Curran, 1997; Feeney et al., 2002; D. V. Howard et al., 2004; J. H. Howard & Howard, 1997; Palmer & Mattys, 2016). Given that we cannot assume that learning a less likely transition (e.g., P_Real = .5) is as difficult as learning a more likely transition (P_Real = .75), we need this additional mechanism to capture potential age-related differences.

Observing delta-rule learning in the present probabilistic SL task would provide evidence that participants are extracting the underlying statistical regularities. If delta-rule learning describes participants’ behavior in the present task, we predict younger adults to adjust their behavior more rapidly than older adults, based on previous findings (Nassar et al., 2016). Furthermore, if task difficulty differs as a function of transitional probability, we expect to see stronger delta-rule learning in more probable transitions. If this is the case, we also hypothesize an interaction with age, whereby older participants’ delta-rule learning decreases to a greater extent compared to younger adults as transitional certainty increases. This is because the increased task complexity may function as a greater obstacle to the older adults compared to the younger adults. It is worth noting that estimating the delta required for delta-rule learning can be difficult, particularly in a probabilistic task.

Information Weights

In the present work, we propose a learning mechanism that is more parsimonious than the Delta-Rule model, as it does not rely on estimating delta, and yet would be effective in extracting statistical regularities and revealing age-related differences: Information Weights. The model simply assesses the weights (change in response probabilities) that younger and older adults attach to positive (e.g., “B” follows “A”) and negative (e.g., “B” does not follow “A”) observations. Effectively, this cognitive model simplifies the mechanism behind statistical learning to a continuous sampling of information with a “positive” weight that reflects increasing the likelihood of making a particular choice when the specific transition is observed in the sequence, and a “negative” weight that reflects decreasing the likelihood of making the particular choice when the specific transition is not observed. As a
result, with only two parameters (“positive” and “negative” weight) that differ between individuals or groups, the model may be able to explain differences in response behavior. This mechanism can be understood as a generalization of Thorndike’s law of effect (Thorndike, 1898). A model of this learning mechanism allows the comparison of participants in regard to how willing they are to update their response probabilities. It also allows an assessment of whether participants rely more on positive or negative information. This is an interesting perspective that the previous delta-rule model cannot capture, but may be an important consideration in the context of age-related differences in SL. Prior research suggests general age-related differences in the processing of positive (prediction was fulfilled) and negative (prediction was not fulfilled) feedback, with older participants showing a tendency to rely more on positive feedback compared to younger adults (Eppinger & Kray, 2011; Ferdinand & Kray, 2013). These studies provide explicit feedback, however, and do not take place within the framework of SL. If participants’ response patterns can be modeled through Information Weights and not through Win-Stay, Lose-Shift, then this would suggest that participants are extracting the underlying statistical regularities. We predict younger adults to show higher information weights than older adults. Furthermore, based on the literature reviewed above, we predict older adults to have a larger positive-to-negative weight ratio when compared to younger adults.

SL, Cognitive Function, and the Present Paradigm

An additional consideration when looking at SL differences in older adults is the possibility that poorer performance may reflect an age-related decline in cognitive function (e.g., memory). A decline in cognitive function could lead to lower performance due to task-specific requirements (e.g., auditory memory) or because of a direct influence of cognitive function on SL. Much effort has been made to investigate the relationship between SL and cognitive function (Feldman, Kerr, & Streissguth, 1995; Kaufman et al., 2010; Siegelman, Bogaerts, & Frost, 2017; Siegelman & Frost, 2015). However, despite SL’s crucial involvement across sensory modalities (Creel, Newport, & Aslin, 2004; Kirkham, Slemmer, & Johnson, 2002; Moldwin, Schwartz, & Sussman, 2017), research attempting to link SL to traditional cognitive assessments has yielded limited evidence for a direct link between SL and cognitive function (e.g., r from −.06 to .19 in Feldman et al., 1995; Kaufman et al., 2010; Siegelman et al., 2017). In addition, previous attempts to use SL as a measure of individual aptitude or to link it to various established measures of cognitive function have been plagued by a plethora of difficulties (Siegelman et al., 2017). These include low test–retest reliability (r = .44 in Kaufman et al., 2010), and low performance in the participants (21–47% of participants at chance level, see Siegelman et al., 2017 for a review). Consequently, the low correlations with measures of cognitive function could either be a product of the aforementioned methodological issues or indeed indicative that SL is mostly independent of other cognitive skills. To capture cognitive function as a possible covariate and test its contribution to the question of whether SL is directly influenced by cognitive function, we administer a battery of cognitive assessments, further described in the method section.

The Present Paradigm

Based on Siegelman, Bogaerts, and Frost’s (2017) criticism of existing SL paradigms, a new auditory SL paradigm that focuses on learning trajectories was developed (Herff, Nur, Lee, Lee, & Agres, 2019; Herff & Prince, 2020). In this task, participants listened to a continuous stream of four different sounds and were occasionally prompted to indicate the most likely next sound. The paradigm showed high test–retest reliability in older adults (r = .84), and correlated well with measures of cognitive function (r = .56). Furthermore, the task satisfies the needs outlined in the previous section: It is probabilistic, measures accuracy, tracks learning trajectories, and the TP matrix can be adjusted to contain low- and high-certainty transitions. The auditory domain is also a promising target to measure SL ability and link it to cognitive ability. This is because the auditory domain specializes in processing stimuli that unfold in time (Pérez-González & Malmierca, 2014) and relies heavily on extracting statistical information from the environment (Agres, Abdallah, & Pearce, 2018; Barascud, Pearce, Griffiths, Friston, & Chait, 2016; Sohoglu & Chait, 2016). However, similar to previous SL paradigms, many participants performed at chance level, and a relatively small sample size was used (n = 27; Herff, Nur, et al., 2019). The authors suggested deploying more trials and modifying the task to be multimodal. Consequently, we use the SL paradigm of Herff, Nur, et al. (2019) to capture learning trajectories, incorporating more trials (150 instead of 50), a multimodal implementation (auditory-visual), and a new TP matrix that accommodates low- and high-certainty states. Further details of the paradigm are described in the method section.

Aim and Motivation

In summary, the present study investigates age-dependent differences in SL. We utilize a continuous, multimodal, probabilistic paradigm to reveal SL trajectories in younger and older adults. The probabilistic TP matrix governing the task contains low- and high-certainty transitions to help us identify potential learning strategies that capture SL and potential age-related differences. Predominantly, we explore whether three mechanisms of learning can describe SL and potential age-related differences, specifically, (a) Win-Stay, Lose-Shift, (b) Delta Rule Learning based on probability spectrum, and (c) Information Weights. To account for the potentially moderating effect of age-related differences in cognitive function, we also explore whether traditional cognitive assessments correlate with SL performance in this task and can explain response differences between the age groups.

Method

General Procedure

After providing informed consent, participants took part in a cognitive assessment (~30 min), followed by the SL paradigm (~45 min). The present data collection was part of a large EEG project collaboration between the Agency for Science, Technology and Research (A*STAR) and the National University of Singapore (NUS). Analysis of the collected EEG data will be reported elsewhere.
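
As a rough illustration of the three candidate mechanisms named in the Aim and Motivation section, the following Python sketch shows one way each update could be written. It is a minimal sketch under stated assumptions, not the models fitted in this study: the function names, the `learning_rate`, `w_pos`, and `w_neg` parameters, the random shift after an unconfirmed prediction, and the renormalization step are all hypothetical choices made for illustration only.

```python
import random

N_STATES = 4  # four tones/circles, as in the paradigm


def wsls_choice(last_choice, last_correct, rng):
    """Win-Stay, Lose-Shift (illustrative): repeat the last response given in
    this state if it was confirmed; otherwise shift to another option.
    Assumption: the shift is uniformly random over the remaining options."""
    if last_choice is None:
        return rng.randrange(N_STATES)
    if last_correct:
        return last_choice
    return rng.choice([s for s in range(N_STATES) if s != last_choice])


def delta_rule_update(p_perceived, p_real, learning_rate):
    """Delta rule (illustrative): move the perceived probability toward the
    real one by a proportion of the prediction error, delta = p_real - p_perceived.
    Assumption: a single fixed learning_rate in [0, 1]."""
    return p_perceived + learning_rate * (p_real - p_perceived)


def information_weights_update(probs, observed_next, w_pos, w_neg):
    """Information Weights (illustrative): raise the response probability of the
    transition that was observed by w_pos and lower the others by w_neg.
    Assumption: values are clipped at zero and renormalized to sum to 1."""
    updated = [
        p + w_pos if s == observed_next else max(p - w_neg, 0.0)
        for s, p in enumerate(probs)
    ]
    total = sum(updated)
    return [p / total for p in updated]


if __name__ == "__main__":
    rng = random.Random(0)
    print(wsls_choice(last_choice=2, last_correct=True, rng=rng))
    print(delta_rule_update(p_perceived=0.25, p_real=0.75, learning_rate=0.5))
    print(information_weights_update([0.25] * 4, observed_next=1,
                                     w_pos=0.10, w_neg=0.02))
```

In this sketch, `delta_rule_update(0.25, 0.75, 0.5)` moves a perceived probability of .25 halfway toward a real probability of .75, matching the Δ = .5 example given in the introduction. The Information Weights function needs only the two weights and the observed transition, which mirrors the point that this mechanism does not require an estimate of delta.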
Participants Figure 1) are considered high-certainty states, as the most likely
next state is evident with a 75% transitional probability. The other
Data from 40 younger adults were recorded from the student two states (“B,” “C,” blue in Figure 1) are low-certainty states, as
population at the National University of Singapore (Mage ⫽ 21.4 the most likely next state is less evident with only a 50% transi-
SDage ⫽ 2.7); 40 older adults (defined as 60 ⫹ years old) were tional probability. For example, the most likely state after A is B,
recruited from the community (Mage ⫽ 66.7, SDage ⫽ 4.2). Par- with a probability of 75%. The most likely state after “B” is “D,”
ticipants were required to have normal or corrected-to-normal with a probability of only 50%. The probability of repeating a state
hearing, be literate in English, be able to provide informed consent, is zero, thus a response indicating repetition is considered a rule
and be able to travel to the study site independently. Participation violation. Cumulative Rule Violations (CRV) as well as Cumula-
was reimbursed with SGD 40. The study was IRB approved tive High Probability Pathway Choices (CHPC) are the two mea-
(S-17–372). surements of SL performance used here. CRV refers to the number
of rule violations (responses indicating a repetition; red arrows in
Stimuli and Equipment Figure 1) accumulated up to a given trial. CHPC refers to the
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Statistical learning paradigm. The present study deployed a number of high-probability responses (responses correctly identi-
This document is copyrighted by the American Psychological Association or one of its allied publishers.
continuous SL paradigm designed to capture learning trajectories fying the most likely next state; green arrows in Figure 1) accu-
(Herff, Nur, et al., 2019). A long series of states was generated mulated up to a given trial. Thus, good performance is indicated by
whereby each state could be one of four options. The four states high CHPC and low CRV (Herff, Nur, et al., 2019). Note that
were differentiated through sound (sine waves at 165 Hz (E3), 220 while likely correlated, the two measures assess different aspects
Hz (A3), 294 Hz (D4), 392 Hz (G4), each 500ms in duration). of SL, as CHPC captures whether participants learn the most likely
Participants heard this long series of four possible states, and the next event, and CRV assesses whether participants learn to exclude
series paused every 7.5 to 11.5 s (15–23 tones), at which point the impossible outcomes. In other words, on a given trial, participants
participants were prompted to indicate which tone they thought may avoid a rule violation but still not pick the high-probability
would occur next. The number of tones between stopping points choice—there are two other low-probability choices (black arrows
was variable to avoid potential expectancy effects of when the next in Figure 1). Similarly, participants may not make the correct high
interruption would occur. After a response, the sequence would probability choice, but that does not necessarily mean that they
continue. The sequence was instantiated in both the auditory and chose a rule violation (repetition).
visual modality. Four horizontally aligned circles on the screen
were associated with the four sounds (in order of lowest to highest Cognitive Assessment
pitch, left to right). For each tone, a circle flashed as the respective
sound was played. After each stop in the sequence, participants A battery of cognitive tests was administered to prevent any
indicated their response by clicking on the circle that they thought confounding of group differences in general cognitive ability with
would occur next (four alternative forced-choice). The response SL learning trajectories. Furthermore, prior research has shown
window was not timed. Participants did not receive explicit feed- conflicting evidence as to whether SL is predicted by cognitive
back, however, since the sequence continued after each response, ability (Feldman et al., 1995; Herff, Nur, et al., 2019; Kaufman et
feedback was implicitly provided by the following state and al., 2010; Siegelman et al., 2017). Though not the main focus of
whether or not it matched the participants’ prediction. In total, 150 this study, we hope that collecting cognitive ability data in addition
responses (trials) per participant were collected. to the present SL paradigm may also contribute to the debate. The
Transitional probability matrix. The TP matrix governing selection of cognitive tests was informed by consulting a clinician
the four states can be seen in Figure 1. The overall probability of specialized in working with auditory learning tasks in an older
each state is identical (25%). Two states (“A,” “D,” purple in adult population (see Feng et al., 2017; Tan et al., 2018 for studies
utilizing the same battery of tests). The tests aim to provide an
overview of cognitive ability that may be relevant to auditory
learning tasks in general.
Specifically, the battery of cognitive tests deployed here com-
prises the Rey Auditory Verbal Learning Test (RAVLT; Rey,
1958), Digit Span task (backward and forward), Verbal Fluency
task (see Randolph, Braun, Goldberg, & Chase, 1993), Symbol
Digit Modality Test (Smith, 1982) in written (DSW) and verbal
(DSV) form, and Color Trails Test (D’Elia, Satz, Uchiyama, &
White, 1996). All assessors were formally trained and the tests were administered as described in the Neuropsychological Assessments Training Manual for Assessors (Yu, 2018). A short summary of each test follows below.
Figure 1. Schematic representation of the TP matrix. The two main measures of SL performance used here are Cumulative Rule Violations (CRV, accumulation of responses associated with a red arrow) and Cumulative High Probability Choices (CHPC, accumulation of responses associated with a green arrow). Because the most likely next state is clearer (75%, purple) in states “A” and “D” compared to states “B” and “C” (50%, blue), states A and D are considered high-certainty states, and states B and C are considered low-certainty states. See the online article for the color version of this figure.
RAVLT. The test comprises multiple parts. In part one, participants listen to a list of 15 words (List-A) and then attempt to recall them. This procedure is repeated five times, and the number of correct recalls is counted after each iteration. In the models, this is coded as RAVLT1 to RAVLT5. In the second part, the participant listens to a different 15-item word list (List-B), and the number of
correctly recalled items is coded as RAVLTB. Afterward, participants are asked to recall the items from List-A again, and the number of correctly recalled items is coded as RAVLTRECA. After a delay, filled with the Digit Span Test and Color Trails Test (see below), the RAVLT assesses delayed recall by requiring participants to recall the items of List-A once more. The number of correctly recalled items is coded as RAVLTDelayedRecall. In the third part of the RAVLT, participants listen to a list of 50 items, 15 of which were in List-B, and the participants aim to identify words that have been presented before. The number of correctly recognized words is coded in the models as RAVLTRecognition. The RAVLT assesses verbal memory in terms of recognition as well as recall.
Digit span task. This task consists of two parts. In the first part, participants are asked to listen to short sequences of numbers and repeat them verbally. The task consists of two items for each sequence length. If neither of the two sequences of a given length is repeated correctly, the task stops, and the total number of correctly recalled strings is coded as DigitSpanFWD. Afterward, the same task is repeated with different numbers. This time, however, participants are required to repeat the numbers backward. The number of correctly recalled sequences is coded as DigitSpanBWD. The Digit Span tasks assess working memory capacity.
Color trails test. The test consists of two parts. In part one, participants connect numbered circles in ascending order on a sheet of paper. In the second part, participants connect numbers and letters, by alternating between numbers (in ascending order) and letters (in alphabetic order). The test assesses visual attention and task switching capability. Time to completion is measured separately for the two parts, and both are included in the models, referred to as ColorTrail1 and ColorTrail2.
Verbal fluency task. This task requires participants to name as many animals as possible in 60 s. The number of different animal names is coded as SemanticFluencyAnimals in the models. The test assesses linguistic storage and retrieval.
Symbol digit modality test. In the first part, participants are provided with a visual key that links the numbers 1 to 9 to nine different visual symbols. Participants then have 90 s to transcribe a list of symbols as their matching number. The number of correctly linked symbols is coded as DigitSymbolWritten. In the second part, participants are provided with a new response sheet and repeat the task; however, this time they speak the number aloud, rather than writing it on the sheet. The number of correctly linked symbols in the second part is coded as DigitSymbolVerbal. The tests assess association memory, divided attention, and visual scanning.

Results
The results are structured in three parts. First, we report overall SL performance in both age groups and how they differ between low- and high-certainty states. Then, we attempt to model the results through three learning mechanisms: (a) Win-Stay, Lose-Shift, (b) Delta Rule Learning based on probability spectrum, and (c) Information Weights. Finally, we explore the relationship between the battery of cognitive assessments and SL performance.

Statistical Learning, Age, and Certainty
A total of 12,000 responses were collected, evenly distributed across the four states (A = 25.57%, B = 25.92%, C = 24.02%, D = 24.48%). We used a simulation-based approach to assess chance and ideal performance (see Supplement S0). 95% CIs were calculated around simulated guessing participants and simulated ideal Bayesian learners. The results are summarized in Table 1, and Figure 2 depicts overall learning trajectories.

Table 1
SL Performance Summary

Age group        N    More than chance in CHPC    Less than chance in CRV    Ideal performance range
Younger Adults   40   36                          38                         15
Older Adults     40   32                          32                         7

Note. SL = statistical learning. More than chance in Cumulative High Probability Pathway Choices (CHPC), and less than chance in Cumulative Rule Violations (CRV) indicate successful learning of the transitional probability (TP) matrix.

A generalized Bayesian mixed-effects model predicted the responses that lie on the high-probability pathway. The model was provided with fixed effects for Trial (1–150, representing the learning trajectory over the course of the experiment), Age (younger adults vs. older adults), Certainty (low-certainty state vs. high-certainty state), as well as all interactions. The model was also provided with random effects for Participant and the precise Sequence presented. Further information about the models can be found in Supplement S1. We report coefficient estimates (β), estimated error (EE) in the coefficients, as well as evidence (Odds) ratios for the individual hypotheses (a given coefficient being larger or smaller than zero). For convenience, we mark with “*” the effects that can be considered ‘significant’ at an α = .05 level. This corresponds to odds ratios ≥ 19 (odds 95/5 = 19; Milne & Herff, 2020).
Trial (β = .14, EE = .05, Odds(β > 0) = 579.65*) predicted the probability of high-probability pathway responses, indicating that learning took place. Age (β = −.31, EE = .09, Odds(β < 0) > 9999*) also carried predictive value, with younger adults overall being more likely to produce high-probability pathway responses. The low-certainty states led to overall fewer high-probability pathway responses (β = −.61, EE = .07, Odds(β < 0) > 9999*), indicating that participants were able to discern the differences between states in the TPs. The LowCertainty × Trial interaction (β = −.31, EE = .07, Odds(β < 0) > 9999*) predicted reduced high-probability pathway responses in low-certainty states as the experiment progresses. This can be seen in Figure 3 in the positive slope for the high-certainty states, and the negative slope for the low-certainty states. The Trial × Certainty × Age interaction (β = .15, EE = .05, Odds(β > 0) = 733.69*) showed that (as the experiment progresses) younger adults’ likelihood to produce high-probability pathway responses decreases more strongly in the low-certainty states compared to older adults. Figure 3 depicts this finding: the blue line (low-certainty state) has a steeper slope for younger adults (left panel) than older adults. Importantly, the Trial × Age interaction did not carry predictive value (β = −.03, EE = .03, Odds(β < 0) =
[Figure 2 panels: Younger Adults − CHPC and Older Adults − CHPC (top row), Younger Adults − CRV and Older Adults − CRV (bottom row). x-axis: Trial (0 to 150); y-axes: CHPC (0 to 90) and CRV (0 to 50). Legend: Chance, Chance−95%CI, Ideal−95%CI.]
Figure 2. Overall performance in the SL task. The top row shows Cumulative High-Probability Choices
(CHPC). The bottom row shows Cumulative Rule Violations (CRV). The left column shows data from younger
adults, and the right column shows data from older adults. Each thin line represents one participant. The bold
solid lines represent chance performance. Above chance in CHPC, and below chance in CRV, indicates good
performance. The dotted line shows a theoretical ideal performer. The gray bands represent 95% CIs around
chance and ideal performance.
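The chance band in Figure 2 can be approximated with a small simulation. The sketch below is an illustration only, not the authors' procedure from Supplement S0: it assumes that a guessing participant picks one of the four circles uniformly at random on each of the 150 trials, so that each guess matches the high-probability state with probability 1/4. The function names, the number of simulated guessers, and the percentile-based interval are assumptions made for this example.

```python
import random

N_TRIALS = 150      # responses per participant (from the paradigm)
N_OPTIONS = 4       # four-alternative forced choice
N_SIMULATED = 5000  # number of simulated guessers (assumption)


def simulate_guesser_chpc(rng, n_trials=N_TRIALS, n_options=N_OPTIONS):
    """Final cumulative high-probability choices of one uniform guesser."""
    # A uniform guess hits the single high-probability option with p = 1/n_options.
    return sum(rng.randrange(n_options) == 0 for _ in range(n_trials))


def chance_interval(n_simulated=N_SIMULATED, seed=1):
    """Mean and central 95% percentile interval of final CHPC under guessing."""
    rng = random.Random(seed)
    finals = sorted(simulate_guesser_chpc(rng) for _ in range(n_simulated))
    lower = finals[int(0.025 * n_simulated)]
    upper = finals[int(0.975 * n_simulated) - 1]
    mean = sum(finals) / n_simulated
    return mean, lower, upper


mean_chpc, lower_chpc, upper_chpc = chance_interval()
print(mean_chpc, lower_chpc, upper_chpc)
```

Under this assumption the expected final count is 150 × 0.25 = 37.5, and the same reasoning applies to rule violations, because a uniform guess repeats the current state with probability 1/4.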
3.88). This means that learning trajectories in high-certainty states were comparable between the two age groups, as shown in Figure 3 (the red lines, depicting high-certainty states, have similar slopes across age groups).
For CRVs, we combined the data from low- and high-certainty states, as both have 0% TPs of repeating states. Age (β = .34, EE = .14, Odds(β > 0) = 136.40*) predicted the probability of rule violations, with older adults (M = 0.0972, SD = 0.2963) on average showing more rule violations than younger adults (M = 0.0463, SD = 0.2102). Neither Trial (β = −.03, EE = .03, Odds(β > 0) = 5.61) nor the Trial × Age interaction (β = .1, EE = .04, Odds(β < 0) = 1.60) showed an effect. This is most likely because of the small number of rule violations (see Supplement S1 for a summary and the risk ratios of the SL, age, and certainty models). Due to the overall smaller degree of variability in the CRV data, the following models focus on CHPC.

Statistical Learning Mechanisms
To further explore the cognitive basis of age-related differences in SL, we tested three cognitive models. In particular, we hoped to reveal a mechanism that captures the age-related differences in SL of low- and high-certainty states (see Figure 3).
Model 1: Win-Stay, Lose-Shift Strategy. The first model assessed whether participants predominantly used the outcome from their previous response to the same state when forming a prediction. We did not find evidence for the LastPredHPP × LastPredCorrect interaction, suggesting participants did not predominantly rely on the information of their last prediction (β = −.05, EE = .15, Odds(β < 0) = 1.80). Low evidence for the LastPredHPP × LastPredCorrect × OlderAdult interaction shows that this behavior also did not differ between age groups (β = −.25, EE = .20, Odds(β < 0) = 8.88), and therefore does not explain the age-dependent behavior toward low-certainty states (see Supplement S3.1 for the full model).
Model 2: Delta Rule Learning Based on Probability Spectrum. Both age groups deployed a learning mechanism whereby they adjusted their behavior more strongly for larger errors, as captured by strong evidence for the ActualMinusResponseProbs coefficient (β = 2.20, EE = .18, Odds(β > 0) > 9999*). The ActualMinusResponseProbs × OlderAdult interaction term reveals that this adjustment was larger in the young adults than the older adults (β = −.71, EE = .24, Odds(β > 0) = 733.69*). Evidence for the ActualMinusResponseProbs × StateSpecificReponseProbs interaction shows that both groups also adjusted their behavior depending on where in the probability spectrum the incongruence between believed and real probability occurs, with stronger behavioral changes toward the higher end (β = .43, EE = .24, Odds(β > 0) = 25.47*). However, we found no evidence that this incongruency mechanism differs between the age groups in the ActualMinusResponseProbs × StateSpecificReponseProbs × OlderAdult interaction term (β = −.34, EE = .32, Odds(β < 0) = 5.84). As a result, this model does not explain the age-dependent differences in low-certainty responses shown in Figure 3 either (see Supplement S3.2 for the full model).

[Figure 3 panels: Younger Adults (left) and Older Adults (right). x-axis: Trial (1 to 150); y-axis: Probability of High Probability Pathway Response (0.2 to 0.8). Legend: Certainty (High, Low).]
Figure 3. Effects of age and certainty state on SL as measured by the probability of producing a response compatible with the high-probability pathway. Both age groups show clear learning trajectories. Younger adults show a higher intercept at the beginning of the experiment compared to older participants. Learning trajectories (slopes) are comparable between the two age groups on high-certainty states (red lines). Interestingly, both groups appear to underestimate the probability of the most likely response in the low-certainty states (blue lines). This is particularly pronounced in the younger adults, who, for low-certainty states, produced increasingly fewer responses over the course of the experiment that lie on the high-probability pathway. The bands indicate 95% CIs. See the online article for the color version of this figure.

Model 3: Information weights. The third model is a parsimonious explanation and simply assesses the weights that younger and older adults attach to positive (e.g., “B” follows “A”) and negative (e.g., “B” does not follow “A”) observations. Because the Bayesian models provide slope coefficients of behavioral change in both age groups at two different transitional probabilities for the high-probability pathway, we have two equations for each age group, each with two unknowns. As a result, we can use Gaussian elimination (see Supplement S3.3) to obtain the weights of older adults (PositiveWeightOlderAdult = .27, NegativeWeightOlderAdult = −.37) and younger adults (PositiveWeightYoungerAdult = .45, NegativeWeightYoungerAdult = −.79) attached to the continued sampling of positive and negative observations in a simplified decision-making model. The resulting weights are seen in Figure 4.
To obtain the distribution of weights in Figure 4, the information weights for both groups were calculated after each iteration of the Bayesian model. Since the model ran on 10,000 iterations, with 1,000 warmups on four cores, Figure 4 uses the data of a total of 36,000 posterior samples. A Hotelling T² test using 10,000 permutations shows a significant difference between the distribution of information weights in older adults and that of younger adults (T²(2, 71997) = 112447.7, p < .0001). Further support was found by calculating Kullback-Leibler divergence on the probability density functions of younger and older adults’ information weights. The divergence across age groups (D_KL(PDF_OlderAdults || PDF_YoungerAdults) = 3.1054) is substantially larger compared to the Kullback-Leibler divergence distribution obtained from 10,000 random permutations of the Age group vector (D_KL-Mean(PDF_GroupA || PDF_GroupB) = .00012, D_KL-SD(PDF_GroupA || PDF_GroupB) = .00008). In summary, we found strong support that the younger and older adult cohorts operate on different information weights. This can also be seen in Figure 5.

SL and Cognitive Ability
Figure 6 provides an overview of the magnitudes of the correlation values between SL as measured by CHPC and CRV by the end of the experiment, and all cognitive assessments conducted. The dendrogram is the result of hierarchical clustering of these magnitudes. Supplement S2 contains the full correlation matrix.
Figure 6 shows that SL and most cognitive assessments tend to be clustered in two distinct groups of measurement. This, combined with the overall low correlations (all r < .33, see Supplement S2), points toward SL being distinct from the construct targeted by most cognitive assessment tests. However, this does not exclude the possibility that there are individual cognitive assessments that relate to SL. To address this as well as the small participants-to-predictors ratio, a stepwise regression (both-ways, ΔBIC penalty term) was performed to reveal the best predictors for CHPC and CRV. For CHPC, RAVLT1 was the only remaining predictor,
[Figure 4 plot: x-axis: Positive Weight (0.00 to 1.00); y-axis: Negative Weight (0.0 to −1.5). Legend: Age (Older Adults, Younger Adults).]
Figure 4. Estimated weights distribution to positive and negative observations in both age groups. Positive
weights indicate predicted change toward providing a given answer, after observing a transition suggesting this
answer (the positive number indicates that the probability increases). Negative weights indicate predicted change
away from providing a given answer, after observing a transition which suggests that this is not the answer (the
negative number indicates that the probability decreases). Both groups show clear signs of learning by using both
positive and negative observations. This is indicated by the nonzero weights on both axes for both groups, and
by the fact that in both groups, positive weights all fall within the range of positive numbers (increase in
probability to provide the response), and negative weights all fall within the range of negative numbers (decrease
in probability to provide the response). Younger adults show larger sways in their predictions as shown by the
larger weights on both axes compared to older adults. Although both younger and older adults weight negative
observations more strongly than positive, this is substantially more pronounced in the younger adults group.
See the online article for the color version of this figure.
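The link between the reported slope coefficients and the weights in Figure 4 can be illustrated with a short worked example. The sketch below is a reconstruction under stated assumptions, not the procedure of Supplement S3.3: it assumes that the slope at a state whose most likely transition has probability p equals p · w_pos + (1 − p) · w_neg, that younger adults are the reference level, and that the four slopes are sums of the reported Trial, Trial × Age, LowCertainty × Trial, and Trial × Certainty × Age coefficients. The function and variable names are hypothetical.

```python
def solve_weights(p_high, slope_high, p_low, slope_low):
    """Solve p*w_pos + (1-p)*w_neg = slope at two probabilities for (w_pos, w_neg)."""
    # Cramer's rule on the 2x2 system (equivalent to Gaussian elimination here).
    det = p_high * (1 - p_low) - p_low * (1 - p_high)
    w_pos = (slope_high * (1 - p_low) - slope_low * (1 - p_high)) / det
    w_neg = (p_high * slope_low - p_low * slope_high) / det
    return w_pos, w_neg


# Coefficients as reported in the text; treating younger adults as the
# reference level is an assumption of this sketch.
trial, trial_x_age = 0.14, -0.03
low_x_trial, three_way = -0.31, 0.15

# High-certainty states: p = .75; low-certainty states: p = .50.
younger = solve_weights(0.75, trial, 0.50, trial + low_x_trial)
older = solve_weights(
    0.75, trial + trial_x_age,
    0.50, trial + trial_x_age + low_x_trial + three_way,
)
print(younger, older)
```

Under these assumptions the solution comes out at about .45 and −.79 for younger adults and about .27 and −.37 for older adults, in line with the weights reported for Model 3.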
and for CRV, the DigitSymbolWritten test was the only surviving predictor.
Consequently, we deployed linear Bayesian mixed effects models predicting CRV and CHPC scores. The models were provided with a fixed factor for Age, Trial, as well as the RAVLT1 and DigitSymbolWritten scores. All interaction terms were fully parameterized, with the exception of interaction terms between RAVLT1 and DigitSymbolWritten, as they are of no interest to the present design. We found that for both cognitive assessments, larger scores predicted steeper statistical learning trajectories (Trial × RAVLT1: β = 1.01, EE = .09, Odds(β > 0) > 9999*; Trial × DigitSymbolWritten: β = .74, EE = .10, Odds(β > 0) > 9999*). Furthermore, the Trial × RAVLT1 × OlderAdult (β = .78, EE = .12, Odds(β > 0) > 9999*) as well as Trial × DSW × OlderAdult (β = .40, EE = .15, Odds(β > 0) = 231.26*) interaction terms showed that these effects are stronger in older adults compared to younger. This can also be seen in Figure 7 in the larger difference between the two colored lines in older adults compared to younger adults (see Supplement S2 for the full models).

[Figure 5 plot: density (0.0 to 1.0) against Kullback-Leibler divergence on a log-scaled x-axis (10^-5 to 10^0).]
Figure 5. Kullback-Leibler divergence between the probability density functions of the information weights of younger and older adults. The dotted red line indicates the Kullback-Leibler divergence observed between the information weights of the younger and older adults in the present study. The distribution in black can be used to assess divergence values that could occur by chance. The large distance on a log-scale between the dotted red line and the chance distributions supports that younger and older adults deploy different information weights. The distribution was obtained by 10,000 iterations of calculating the divergence after shuffling the Age group vector. The x-axis is log scaled. See the online article for the color version of this figure.

Discussion
We investigated differences in SL trajectories between younger and older adults, and the extent to which both groups learned the underlying statistical structure in the task. Both age groups showed similar learning trajectories of the most likely next event when the transition was likely (high certainty). When it came to dealing with less certain transitional probabilities, learning trajectories diverged between age groups. To explain these findings, we tested three cognitive models. We found that younger and older adults utilize similar strategies, but younger adults are more willing to change their behavior by placing strong weight on negative observations during their decision-making process. In addition, scores on some traditional cognitive assessments were found to mediate performance. This effect was stronger in older adults but did not explain the age-related response pattern in the present task and may only be indicative of task-specific demands. The main contribution of this study is demonstrating age-related differences in statistical learning which can be modeled through systematic shifts in information weights when sampling information.

SL Performance and Age
Many SL paradigms suffer from overall low performance (Siegelman et al., 2017). Following previous suggestions (Herff, Nur, et al., 2019), we deployed a large number of trials and multimodal stimuli and found clear signs of learning in the majority of participants in both age groups. Overall, more young adults learned the most likely next event (CHPC) and approximated ideal performance compared to older adults. This is in line with previous studies that also showed an age-related decline in SL of probabilistic stimuli (Curran, 1997; Feeney et al., 2002; J. H. Howard & Howard, 1997). Furthermore, more older adults failed to learn that immediate state repetitions (CRV) were impossible. This is an interesting observation that requires further exploration in the future, as presently CRV did not provide enough variability to be effectively modeled and the analysis focused on CHPC instead. Trial-wise analysis of CHPC revealed that younger adults show more high-probability responses initially, but the learning trajectories over time are comparable between the groups. This could be indicative of a more conservative strategy deployed by older adults initially, such as a stronger ‘prior’ inclination toward equiprobable responses in the beginning. Participants’ behavior in light of likely and unlikely transitions reveals further insight.
Certainty states. In any probabilistic scenario, it is important to consider that the relationship between the probability of an outcome and an individual’s predictions or decisions may not be a linear one. Indeed, examples of where observed probabilities and the resulting predictions or decisions have a distinctly nonlinear relationship are well-documented (see Barberis, 2013 for a review). In the present paradigm, we were able to observe younger and older adults’ behavior when dealing with likely or unlikely transitions, as the present paradigm contains both low- and high-certainty states. Here, we consider that low and high certainty may function as a proxy for task difficulty. This is because it is possible that probabilities of unlikely transitions are more difficult to extract, compared to likely transitions. Indeed, the present results suggest that learning of likely transitions is faster compared to unlikely transitions. However, if transitional probabilities only impacted task difficulty, then the prior literature would suggest that younger adults should outperform older adults on low-certainty states (Curran, 1997; Feeney et al., 2002; D. V. Howard et al., 2004; J. H. Howard & Howard, 1997; Palmer & Mattys, 2016). We did not observe this pattern in the present data. As a result, we conclude that transitional likelihood cannot be used as a direct proxy for task difficulty. That is not to say that difficulty does not vary with transitional likelihood, as the current design cannot exclude this possibility. However, the present results strongly indicate that transitional likelihood has other profound impacts on learning, beyond a potential impact on task difficulty.
Within the high-certainty states, learning trajectories between the two age groups were not significantly different from one another. However, when faced with less certain transitional probabilities, the response pattern in older adults stayed relatively constant and close to the actual underlying transitional probabilities throughout the experiment. Conversely, younger adults showed an initial strong tendency toward the most likely event, followed by a rapid decay in their likelihood of responding with the next most likely state (see Figure 3). At first glance, this observation is somewhat startling. Why should younger adults drift away from the true underlying probability, when they were perfectly capable of identifying it in the beginning? A key difference between the likely and unlikely transitions is that for unlikely transitions, a correct prediction (e.g., “B” will follow “A”) may often not be realized, even though it may be the most likely transition (e.g., of all options “B” is the most likely one to follow “A”). If younger adults showed a strong adverse reaction (e.g.,
[Figure 6 heatmap with dendrogram. Rows and columns: RAVLTRECA, RAVLTDelayedRecall, RAVLT4, RAVLT3, RAVLT5, RAVLTRecognition, RAVLT1, RAVLT2, DigiSymbolVerbal, DigiSymbolWritten, ColourTrail2, ColourTrail1, SemanticFluencyAnimals, RAVLTB, CHPC, CRV, DigitSpanBWD, DigitSpanFWD. Color key: Value (0.2 to 1).]
Figure 6. Hierarchical clustering of the magnitudes of the correlation coefficients of SL and all cognitive
assessments. Even though both digit span tests are clustered the closest to Cumulative Rule Violations (CRV)
and Cumulative High-Probability Choices (CHPC), a stepwise regression revealed that RAVLT1 and DigitSym-
bolWritten carry the most predictive value for SL. The black lines highlight the cells related to CRV and CHPC.
RAVLT = Rey Auditory Verbal Learning Test; BWD = backward; FWD = forward. See the online article for
the color version of this figure.
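How correlation magnitudes can be turned into a merge order for a dendrogram may be easier to see in code. The sketch below is an illustration under assumptions, not the analysis behind Figure 6: the text does not state the distance measure or linkage, so the example uses 1 − |r| as distance and average linkage, and the three-measure correlation matrix is made up for the example.

```python
def average_linkage(labels, abs_corr):
    """Agglomerative clustering on distance 1 - |r|; returns the merge order."""
    clusters = [(name,) for name in labels]
    index = {name: i for i, name in enumerate(labels)}

    def distance(a, b):
        # Average pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(1 - abs_corr[index[x]][index[y]] for x, y in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical |r| values, for illustration only (not data from the study).
labels = ["TestA", "TestB", "TestC"]
abs_corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
merge_order = average_linkage(labels, abs_corr)
print(merge_order)
```

In this toy matrix the two strongly correlated measures merge first and the weakly related one joins last, which is the kind of structure a dendrogram over correlation magnitudes makes visible.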
frustration) to negative observations (“B” did not follow “A,” despite the perception that “B” is the most likely), then this would conceptually capture this pattern of results. We will return to this point with a more formal explanation when discussing the results of the Information Weights model. This is because the way participants react to various degrees of certainty can be indicative of the underlying learning mechanisms used. Here, we tested three cognitive models to further explore the learning mechanisms underlying SL, as well as age-related differences.
Learning mechanisms. We found evidence that both age groups draw information from the continuous sequence, rather than only from the last time they provided a response specific to the current state. That is, neither age group utilizes a Win-Stay, Lose-Shift Strategy. This is an important observation, as reliance on Win-Stay, Lose-Shift would have suggested that participants were not extracting the full underlying TP matrix, but instead only relying on a simple response heuristic to perform the SL task. Theoretically, many probabilistic SL tasks that provide explicit or implicit feedback could be performed reasonably well with a Win-Stay, Lose-Shift strategy (e.g., Matsen & Nowak, 2004). If an SL task can theoretically be achieved with strategies that do not extract the underlying statistical dependencies, then additional steps need to be taken when interpreting the ability to perform an SL task as evidence that the learners extracted the underlying TP matrix. The fact that neither group utilized a Win-Stay, Lose-Shift strategy lends strength to the present paradigm and the results.
Delta-rule learning describes the results within both age groups well. By adjusting their behavior more strongly the further their own beliefs differ from the actual underlying probabilities, participants were better able to perform the task. As predicted, younger adults do this to a greater extent than older adults. The second hypothesis about delta-rule learning was also confirmed: participants were more willing to adjust their behavior at the higher end of the probability spectrum. Based on present results, delta-rule learning could be a crucial mechanism involved in statistical
[Figure 7 panels: Younger Adults and Older Adults for RAVLT1 (left pair) and for DSW (right pair). x-axis: Trial (1 to 150); y-axis: CHPC (0 to 100). Legend: High, Low.]
Figure 7. Marginal effects plots of Age, SL, and cognitive assessment scores. Both age groups show higher
predicted Cumulative High-Probability Choices (CHPC) values with high RAVLT1 (2.14) and high DSW (2.64,
red lines) compared to low RAVLT1 (−2.35) and low DSW scores (−2.08, blue lines). The larger distances
between the red and the blue lines in older adults compared to the younger adults visualize the three-way
interaction. The bands represent 95% CIs. RAVLT = Rey Auditory Verbal Learning Test; DSW = symbol digit
modality test in written modality. See the online article for the color version of this figure.
learning, and it could explain some of the age-related differences observed. However, the hypothesized interaction in delta-rule learning between transitional likelihood and age was not observed. As a result, age-related differences in delta-rule learning cannot explain the age-related differences observed in the low-certainty responses discussed in the previous section. The information weights model, on the other hand, can.

There is ample evidence that correct predictions are intimately tied with internally generated rewards (Fiser, Berkes, Orbán, & Lengyel, 2010), which increase the probability of the same prediction in the future, similar to a Bayesian observer. However, the decrease in probability caused by a negative observation (“B” does not follow “A”) may not be identical to the increase in probability caused by a positive observation (“B” does follow “A”). With the data collected here, we were able to calculate the weights that younger and older adults attach to positive and negative transitional observations. We find that information weights offer a parsimonious mechanistic description that captures the present results well. As hypothesized, younger adults attached larger weights to both types of observations compared to older adults, which could explain why younger adults initially show faster behavioral changes. Most importantly, younger adults strongly weight the information of negative observations over positive ones when it comes to formulating future predictions. Older adults also rely on negative information more than on positive, but to a substantially lesser extent than younger participants. As a result, our second hypothesis about information weights is supported. It is important to note that older participants here appear close to equiweighting for positive and negative observations.

Overweighting negative observations, as younger participants did, appears sensible from an evolutionary perspective, as it allows rapid discarding of impossible or unlikely (and therefore unreliable) outcomes. However, it would also lead to a greater shift away from the true underlying transitional probabilities. The decrease over time of high-probability choices in low-certainty states in the present study could be an example of this possibility. Interestingly, the lower but more balanced weights in older adults, in the long run, would yield more accurate yet slower behavioral changes. This fits the general observation that older adults weight accuracy over speed (Forstmann et al., 2011; Salthouse, 1979).

The information weights perspective also integrates well with previous findings. Nassar et al. (2016) found large behavioral adjustments to relatively minor prediction errors in younger adults, but not older adults. This observation could be well-described by younger adults placing larger weights on negative observations, as the present information weights model revealed.

A potential explanation for the age-related shift in information weights may be provided by socioemotional selectivity theory. Socioemotional selectivity theory posits that goal-directed behavior is strongly influenced by an individual’s perspective on time (Carstensen, 1992, 1995; Carstensen, Fung, & Charles, 2003). Specifically, when time is perceived as open-ended, expensive and potentially risky long-term goals are considered. However, when time is perceived as being limited, greater importance is put on the present, for example, by prioritizing emotional wellbeing and stability. As age progresses, individuals tend to perceive time as passing faster, often paired with an increasing confrontation with one’s own mortality. In the current study, the balanced information weights utilized by the older adults result in a slower change of behavior that is much less prone to dramatic shifts in behavior, and would eventually arrive at a stable homeostatic state with a response distribution that mirrors the precise underlying transitional
probabilities. As behavioral change and inaccurate decisions are more expensive for individuals that perceive time as “running out,” the balanced approach to weighting incoming information would be the most rational choice for older individuals, rather than deploying a weighting that prioritizes quick yet imprecise adaptation (as embraced by the younger adults).

Establishing individuals’ information weights could be a useful tool for customizing and optimizing learning. Specifically, it seems that as age progresses, positive observations (“B” follows “A”) become more important for learning than negative observations. This finding could be relevant for an aging workforce that is required to adapt and learn new skills (WHO, 2015, 2017). However, it is important to note that the present study cannot distinguish “age” from “education” because the between-subjects nature of the design means that age-related differences may be a feature of aging, or a result of different upbringings between the two generations. In both cases, the information weights perspective may be useful as awareness of information weights may help in understanding the judgments made by oneself and others. It is apparent in the present results that different information weights can lead to different decisions at different timepoints. As a result, expressing decisions as a function of the information weights may help in bridging opposing judgments. Furthermore, exploring an information weights perspective could also be useful in deepening our understanding of mental disorders that can be understood as information filters (e.g., depression, see Gaddy & Ingram, 2014). For example, depression could be characterized as elevated negative, or reduced positive, information weights. This question represents a promising area for future research.

SL and cognitive ability. To test whether results could also be explained by differences in cognitive function, we collected a battery of cognitive assessments from both age groups. Across younger and older adults, we found evidence that higher cognitive assessment scores predict steeper learning trajectories. Importantly, this effect was exacerbated in older adults. Specifically, whereas older adults with high cognitive assessment scores show similar SL performance compared to young adults with high cognitive assessment scores, older adults with low cognitive assessment scores show lower SL performance compared to younger adults with matched scores. A possible explanation could be that low cognitive assessment scores in older adults may be indicative of age-related cognitive decline that affects various functions in the brain, whereas low scores in younger adults are less likely to be indicative of functional impairments. The multimodal paradigm may have reduced the cognitive demand of the task, but it also required visual-audio coordination, which in turn might pose a challenge, as perceptual processes decline with age. If this is the case, then this could also be a contributing factor explaining why the cognitive assessment scores were stronger predictors of SL in older adults.

Of the large number of cognitive tests deployed, the two most promising predictors of SL were the RAVLT 1 as well as the Digit Symbol (written) Modality test. This makes intuitive sense, as the Digit Symbol Modality test was designed to capture associative learning, and the RAVLT tests auditory memory. Clustering based on the correlation magnitudes and overall low correlations (r = .33) further suggest that SL ability and traditional cognitive assessments most likely target different underlying constructs and any predictive information observed may be due to task-specific requirements (e.g., auditory tracking) instead of a direct influence of cognitive ability on SL (Feldman et al., 1995; Herff, Nur, et al., 2019; Kaufman et al., 2010; Siegelman et al., 2017). This is in line with previous literature that suggests that SL and general cognitive function are largely independent (Feldman et al., 1995; Kaufman et al., 2010; Siegelman et al., 2017). Based on the present results, it seems unlikely that differences in cognitive function are what drive the age-related differences in SL observed here.

Conclusion

The paradigm deployed here tracked learning trajectories and revealed differences between younger and older adults in SL when it comes to dealing with uncertainty. A possible explanation was found in the form of age-dependent differences in information weighting, in which younger adults more readily adjust their behavior, but also weight negative observations (e.g., “B” does not follow “A”) more strongly than positive observations (e.g., “B” does follow “A”) compared to older adults. The weights deployed by younger adults favor rapid behavioral adaptation, whereas the weights used by older adults favor more precise behavioral adaptation over time. We hope that future research using this paradigm will provide precise estimates of individuals’ information weighting of positive and negative predictive outcomes.

References

Agres, K., Abdallah, S., & Pearce, M. (2018). Information-theoretic properties of auditory sequences dynamically influence expectation and memory. Cognitive Science, 42, 43–76. http://dx.doi.org/10.1111/cogs.12477
Aizenstein, H. J., Butters, M. A., Clark, K. A., Figurski, J. L., Stenger, V. A., Nebes, R. D., . . . Carter, C. S. (2006). Prefrontal and striatal activation in elderly subjects during concurrent implicit and explicit sequence learning. Neurobiology of Aging, 27, 741–751. http://dx.doi.org/10.1016/j.neurobiolaging.2005.03.017
Barascud, N., Pearce, M. T., Griffiths, T. D., Friston, K. J., & Chait, M. (2016). Brain responses in humans reveal ideal observer-like sensitivity to complex acoustic patterns. Proceedings of the National Academy of Sciences of the United States of America, 113(5), E616–E625. http://dx.doi.org/10.1073/pnas.1508523113
Barberis, N. C. (2013). Thirty years of prospect theory in economics: A review and assessment. The Journal of Economic Perspectives, 27, 173–196. http://dx.doi.org/10.1257/jep.27.1.173
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278. http://dx.doi.org/10.1016/j.jml.2012.11.001
Bürkner, P. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80, 1–28. http://dx.doi.org/10.18637/jss.v080.i01
Bürkner, P. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. http://dx.doi.org/10.32614/RJ-2018-017
Carstensen, L. L. (1992). Social and emotional patterns in adulthood: Support for socioemotional selectivity theory. Psychology and Aging, 7, 331–338. http://dx.doi.org/10.1037/0882-7974.7.3.331
Carstensen, L. L. (1995). Evidence for a life-span theory of socioemotional selectivity. Current Directions in Psychological Science, 4, 151–156. http://dx.doi.org/10.1111/1467-8721.ep11512261
Carstensen, L. L., Fung, H. H., & Charles, S. T. (2003). Socioemotional selectivity theory and the regulation of emotion in the second half of life.
Motivation and Emotion, 27, 103–123. http://dx.doi.org/10.1023/A:1024569803230
Cherry, K. E., & Stadler, M. A. (1995). Implicit learning of a nonverbal sequence in younger and older adults. Psychology and Aging, 10, 379–394. http://dx.doi.org/10.1037/0882-7974.10.3.379
Creel, S. C., Newport, E. L., & Aslin, R. N. (2004). Distant melodies: Statistical learning of nonadjacent dependencies in tone sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 1119–1130. http://dx.doi.org/10.1037/0278-7393.30.5.1119
Curran, T. (1997). Effects of aging on implicit sequence learning: Accounting for sequence structure and explicit knowledge. Psychological Research, 60(1–2), 24–41. http://dx.doi.org/10.1007/BF00419678
Daltrozzo, J., & Conway, C. M. (2014). Neurocognitive mechanisms of statistical-sequential learning: What do event-related potentials tell us? Frontiers in Human Neuroscience, 8, 437. http://dx.doi.org/10.3389/fnhum.2014.00437
D’Elia, L., Satz, P., Uchiyama, C. L., & White, T. (1996). Color Trails Test: CTT. Odessa, FL: Psychological Assessment Resources.
Eppinger, B., & Kray, J. (2011). To choose or to avoid: Age differences in learning from positive and negative feedback. Journal of Cognitive Neuroscience, 23, 41–52. http://dx.doi.org/10.1162/jocn.2009.21364
Feeney, J. J., Howard, J. H., Jr., & Howard, D. V. (2002). Implicit learning of higher order sequences in middle age. Psychology and Aging, 17, 351–355. http://dx.doi.org/10.1037/0882-7974.17.2.351
Feldman, J., Kerr, B., & Streissguth, A. P. (1995). Correlational analyses of procedural and declarative learning performance. Intelligence, 20, 87–114. http://dx.doi.org/10.1016/0160-2896(95)90007-1
Feng, L., Lim, W.-S., Chong, M.-S., Lee, T.-S., Gao, Q., Nyunt, M. S., . . . Ng, T.-P. (2017). Depressive symptoms increase the risk of mild neurocognitive disorders among elderly Chinese. The Journal of Nutrition, Health & Aging, 21, 161–164. http://dx.doi.org/10.1007/s12603-016-0765-3
Ferdinand, N. K., & Kray, J. (2013). Age-related changes in processing positive and negative feedback: Is there a positivity effect for older adults? Biological Psychology, 94, 235–241. http://dx.doi.org/10.1016/j.biopsycho.2013.07.006
Fiser, J., Berkes, P., Orbán, G., & Lengyel, M. (2010). Statistically optimal perception and learning: From behavior to neural representations. Trends in Cognitive Sciences, 14, 119–130. http://dx.doi.org/10.1016/j.tics.2010.01.003
Forstmann, B. U., Tittgemeyer, M., Wagenmakers, E. J., Derrfuss, J., Imperati, D., & Brown, S. (2011). The speed-accuracy tradeoff in the elderly brain: A structural model-based approach. The Journal of Neuroscience, 31, 17242–17249. http://dx.doi.org/10.1523/JNEUROSCI.0309-11.2011
Frensch, P. A., & Miner, C. S. (1994). Effects of presentation rate and individual differences in short-term memory capacity on an indirect measure of serial learning. Memory & Cognition, 22, 95–110. http://dx.doi.org/10.3758/BF03202765
Gaddy, M. A., & Ingram, R. E. (2014). A meta-analytic review of mood-congruent implicit memory in depressed mood. Clinical Psychology Review, 34, 402–416. http://dx.doi.org/10.1016/j.cpr.2014.06.001
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. London, England: Chapman and Hall/CRC. http://dx.doi.org/10.1201/b16018
Greve, A., Cooper, E., Kaula, A., Anderson, M. C., & Henson, R. (2017). Does prediction error drive one-shot declarative learning? Journal of Memory and Language, 94, 149–165. http://dx.doi.org/10.1016/j.jml.2016.11.001
Herff, S. A., Nur, A., Lee, J., Lee, T., & Agres, K. (2019, July). Statistical learning ability as a measure of cognitive function. Paper presented at the 41st Annual Conference of the Cognitive Science Society, Montreal, Canada, 24–27 July. http://dx.doi.org/10.31234/osf.io/u4ry6
Herff, S. A., & Prince, J. B. (2020, May). Learning, mood, and music: Depression, anxiety, and stress reflect processing biases in positive and negative chord sequences. Poster presented at Brain. Cognition. Emotion. Music., University of Kent Canterbury, Canterbury, England. http://dx.doi.org/10.17605/OSF.IO/EQ9JU
Herff, S. A., Zhen, S., Yu, R., & Agres, K. R. (2019). Age-dependent statistical learning trajectories reveal differences in information weighting. PsyArXiv. http://dx.doi.org/10.31234/osf.io/kuy6p
Hinault, T., Lemaire, P., & Touron, D. (2017). Strategy combination during execution of memory strategies in young and older adults. Memory, 25, 619–625. http://dx.doi.org/10.1080/09658211.2016.1200626
Howard, D. V., & Howard, J. H., Jr. (1989). Age differences in learning serial patterns: Direct versus indirect measures. Psychology and Aging, 4, 357–364. http://dx.doi.org/10.1037/0882-7974.4.3.357
Howard, D. V., & Howard, J. H., Jr. (1992). Adult age differences in the rate of learning serial patterns: Evidence from direct and indirect tests. Psychology and Aging, 7, 232–241. http://dx.doi.org/10.1037/0882-7974.7.2.232
Howard, D. V., Howard, J. H., Jr., Japikse, K., DiYanni, C., Thompson, A., & Somberg, R. (2004). Implicit sequence learning: Effects of level of structure, adult age, and extended practice. Psychology and Aging, 19, 79–92. http://dx.doi.org/10.1037/0882-7974.19.1.79
Howard, J. H., Jr., & Howard, D. V. (1997). Age differences in implicit learning of higher order dependencies in serial patterns. Psychology and Aging, 12, 634–656. http://dx.doi.org/10.1037/0882-7974.12.4.634
Kaufman, S. B., DeYoung, C. G., Gray, J. R., Jiménez, L., Brown, J., & Mackintosh, N. (2010). Implicit learning as an ability. Cognition, 116, 321–340. http://dx.doi.org/10.1016/j.cognition.2010.05.011
Kirkham, N. Z., Slemmer, J. A., & Johnson, S. P. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition, 83(2), B35–B42. http://dx.doi.org/10.1016/S0010-0277(02)00004-5
Krogh, L., Vlach, H. A., & Johnson, S. P. (2013). Statistical learning across development: Flexible yet constrained. Frontiers in Psychology, 3, 598. http://dx.doi.org/10.3389/fpsyg.2012.00598
Löckenhoff, C. E., & Carstensen, L. L. (2007). Aging, emotion, and health-related decision strategies: Motivational manipulations can reduce age differences. Psychology and Aging, 22, 134–146. http://dx.doi.org/10.1037/0882-7974.22.1.134
Mata, R., von Helversen, B., & Rieskamp, J. (2010). Learning to choose: Cognitive aging and strategy selection learning in decision making. Psychology and Aging, 25, 299–309. http://dx.doi.org/10.1037/a0018923
Matsen, F. A., & Nowak, M. A. (2004). Win-stay, lose-shift in language learning from peers. Proceedings of the National Academy of Sciences of the United States of America, 101, 18053–18057. http://dx.doi.org/10.1073/pnas.0406608102
Milne, A. J., & Herff, S. A. (2020). The perceptual relevance of balance, evenness, and entropy in musical rhythms. Cognition, 203, 104233. http://dx.doi.org/10.1016/j.cognition.2020.104233
Misyak, J. B., Christiansen, M. H., & Tomblin, J. B. (2010). On-line individual differences in statistical learning predict language processing. Frontiers in Psychology, 1, 31. http://dx.doi.org/10.3389/fpsyg.2010.00031
Moldwin, T., Schwartz, O., & Sussman, E. S. (2017). Statistical learning of melodic patterns influences the brain’s response to wrong notes. Journal of Cognitive Neuroscience, 29, 2114–2122. http://dx.doi.org/10.1162/jocn_a_01181
Nassar, M. R., Bruckner, R., Gold, J. I., Li, S. C., Heekeren, H. R., & Eppinger, B. (2016). Age differences in learning emerge from an insufficient representation of uncertainty in older adults. Nature Communications, 7, 11609. http://dx.doi.org/10.1038/ncomms11609
Nowak, M., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature, 364, 56–58. http://dx.doi.org/10.1038/364056a0
Palmer, S. D., Hutson, J., & Mattys, S. L. (2018). Statistical learning for speech segmentation: Age-related changes and underlying mechanisms. Psychology and Aging, 33, 1035–1044. http://dx.doi.org/10.1037/pag0000292
Palmer, S. D., & Mattys, S. L. (2016). Speech segmentation by statistical learning is supported by domain-general processes within working memory. The Quarterly Journal of Experimental Psychology, 69, 2390–2401. http://dx.doi.org/10.1080/17470218.2015.1112825
Pérez-González, D., & Malmierca, M. S. (2014). Adaptation in the auditory system: An overview. Frontiers in Integrative Neuroscience, 8, 19. http://dx.doi.org/10.3389/fnint.2014.00019
Randolph, C., Braun, A. R., Goldberg, T. E., & Chase, T. N. (1993). Semantic fluency in Alzheimer’s, Parkinson’s, and Huntington’s disease: Dissociation of storage and retrieval failures. Neuropsychology, 7, 82–88. http://dx.doi.org/10.1037/0894-4105.7.1.82
Rescorla, R., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical conditioning II: Current research and theory, 2, 64–99.
Rey, A. (1958). L’examen clinique en psychologie [The psychological examination]. Paris: Presses Universitaires de France.
Roseberry, S., Richie, R., Hirsh-Pasek, K., Golinkoff, R. M., & Shipley, T. F. (2011). Babies catch a break: 7- to 9-month-olds track statistical probabilities in continuous dynamic events. Psychological Science, 22, 1422–1424. http://dx.doi.org/10.1177/0956797611422074
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. http://dx.doi.org/10.1126/science.274.5294.1926
Saffran, J. R., & Kirkham, N. Z. (2018). Infant statistical learning. Annual Review of Psychology, 69, 181–203. http://dx.doi.org/10.1146/annurev-psych-122216-011805
Salthouse, T. A. (1979). Adult age and the speed-accuracy trade-off. Ergonomics, 22, 811–821. http://dx.doi.org/10.1080/00140137908924659
Salthouse, T. A., McGuthry, K. E., & Hambrick, D. Z. (1999). A framework for analyzing and interpreting differential aging patterns: Application to three measures of implicit learning. Aging, Neuropsychology, and Cognition, 6, 1–18. http://dx.doi.org/10.1076/anec.6.1.1.789
Schirda, B., Valentine, T. R., Aldao, A., & Prakash, R. S. (2016). Age-related differences in emotion regulation strategies: Examining the role of contextual factors. Developmental Psychology, 52, 1370–1380. http://dx.doi.org/10.1037/dev0000194
Shafir, S., Reich, T., Tsur, E., Erev, I., & Lotem, A. (2008). Perceptual accuracy and conflicting effects of certainty on risk-taking behaviour. Nature, 453, 917–920. http://dx.doi.org/10.1038/nature06841
Siegelman, N., Bogaerts, L., Christiansen, M. H., & Frost, R. (2017). Towards a theory of individual differences in statistical learning. Philosophical Transactions of the Royal Society B: Biological Sciences, 372, 20160059. http://dx.doi.org/10.1098/rstb.2016.0059
Siegelman, N., Bogaerts, L., & Frost, R. (2017). Measuring individual differences in statistical learning: Current pitfalls and possible solutions. Behavior Research Methods, 49, 418–432. http://dx.doi.org/10.3758/s13428-016-0719-z
Siegelman, N., & Frost, R. (2015). Statistical learning as an individual ability: Theoretical perspectives and empirical evidence. Journal of Memory and Language, 81, 105–120. http://dx.doi.org/10.1016/j.jml.2015.02.001
Smith, A. (1982). Symbol digit modalities test. Los Angeles, CA: Western Psychological Services.
Sohoglu, E., & Chait, M. (2016). Detecting and representing predictable structure during auditory scene analysis. eLife, 5, e19113. http://dx.doi.org/10.7554/eLife.19113
Tan, J., Tsakok, F. H. M., Ow, E. K., Lanskey, B., Lim, K. S. D., Goh, L. G., . . . Feng, L. (2018). Study protocol for a randomized controlled trial of choral singing intervention to prevent cognitive decline in at-risk older adults living in the community. Frontiers in Aging Neuroscience, 10, 195. http://dx.doi.org/10.3389/fnagi.2018.00195
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. The Psychological Review: Monograph Supplements, 2(4), i–109. http://dx.doi.org/10.1037/10780-000
WHO. (2015). World report on ageing and health. Retrieved from https://www.who.int/ageing/events/world-report-2015-launch/en/
WHO. (2017). Amendments to the staff regulations and staff rules. Retrieved from https://apps.who.int/gb/ebwha/pdf_files/EB141/B141_11-en.pdf
Worthy, D. A., Hawthorne, M. J., & Otto, A. R. (2013). Heterogeneity of strategy use in the Iowa gambling task: A comparison of win-stay/lose-shift and reinforcement learning models. Psychonomic Bulletin & Review, 20, 364–371. http://dx.doi.org/10.3758/s13423-012-0324-9
Yu, C. H. (2018). Neuropsychological assessments training manual for assessors (T. Y. Qian & S. J. Ching, Eds.; Version 3.1, Approved by K. E. Heok, L. Feng Ed.). Singapore: Yong Loo Lin School of Medicine’s Department of Psychological Medicine.

Received January 24, 2020
Revision received June 29, 2020
Accepted July 5, 2020