Current advances in digital cognitive assessment for preclinical Alzheimer's disease
Fredrik Öhman
Jason Hassenstab
David Berron
Michael Schöll
Kathryn V. Papp

Summary

Digital cognitive assessments for preclinical Alzheimer’s are advancing with greater tech use among older adults. Studies show promising validity versus traditional tests, suggesting a future role in remote screening and monitoring.

2021


Keywords clinical assessment; clinical trials; cognition; computerized assessment; digital cognitive biomarkers; home‐based assessment; preclinical Alzheimer's disease; smartphone‐based assessment

Abstract

There is a pressing need to capture and track subtle cognitive change at the preclinical stage of Alzheimer's disease (AD) rapidly, cost-effectively, and with high sensitivity. Concurrently, the landscape of digital cognitive assessment is rapidly evolving as technology advances, tech adoption among older adults increases, and external events (i.e., COVID-19) necessitate remote digital assessment. Here, we provide a snapshot review of the current state of digital cognitive assessment for preclinical AD, including different device platforms/assessment approaches, levels of validation, and implementation challenges. We focus on articles, grants, and recent conference proceedings specifically querying the relationship between digital cognitive assessments and established biomarkers for preclinical AD (e.g., amyloid beta and tau) in clinically normal (CN) individuals. Several digital assessments were identified across platforms (e.g., digital pens, smartphones). Digital assessments varied by intended setting (e.g., remote vs. in-clinic), level of supervision (e.g., self vs. supervised), and device origin (personal vs. study-provided). At least 11 publications characterize digital cognitive assessment against AD biomarkers among CN individuals. The first available data demonstrate promising validity of this approach against both conventional assessment methods (moderate to large effect sizes) and relevant biomarkers (predominantly weak to moderate effect sizes). We discuss levels of validation and issues relating to usability, data quality, data protection, and attrition. While still in its infancy, digital cognitive assessment, especially when administered remotely, will undoubtedly play a major future role in screening for and tracking preclinical AD.

1. INTRODUCTION

In Alzheimer's disease (AD), the major pathophysiological processes (accumulation of amyloid beta protein [Aβ] into plaques, assumed to be followed by the aggregation of hyperphosphorylated tau protein [p-tau] into neurofibrillary tangles [NFT]) begin years to decades prior to the clinical dementia syndrome. During this preclinical phase, cognitive functions are largely unaffected, but as neuropathological burden increases over time, subtle cognitive decrements emerge. The preclinical phase offers a promising window for preventing decline, making it immensely important to capture the subtle changes in cognition that occur during this phase.

1.1. Associations between paper‐and‐pencil cognitive measures and AD biomarkers in preclinical AD

In clinically normal (CN) individuals, abnormal levels of Aβ (Aβ+), as measured in cerebrospinal fluid (CSF) (e.g., levels of Aβ42 or the Aβ42/40 ratio) or with positron emission tomography (PET) neuroimaging, are considered indicative of an early AD pathological process. During this preclinical stage of the disease continuum, the cross-sectional association between Aβ and cognitive deficits is generally weak or nonsignificant. However, CN individuals with higher Aβ burden (Aβ+) exhibit faster cognitive decline and often progress to a clinical stage faster than those with lower levels of Aβ. This cognitive decline is subtle and detectable only over several years. In one study, Aβ+ CN participants declined at an average rate of −0.42 z-score units per 18 months, while another study showed a decline of between −0.07 and −0.15 z-score units per year. The strongest association between AD biomarkers and cognitive decline is for memory function, but there are also reports of decline across other cognitive domains, including executive and visuospatial functions.

The second major pathogenic process in preclinical AD is tau aggregation into NFT, also measurable in CSF (e.g., p-tau) and with PET neuroimaging. Compared to Aβ, tau has been considered more closely linked to cognitive impairment during the AD process. In CN individuals, tau burden has been associated with memory impairment and longitudinal cognitive decline. Because tau PET is newer than Aβ PET, less is known about the longitudinal relationship between tau and cognition, including later clinical progression. Generally, those with higher tau burden are at greater risk for longitudinal cognitive decline; importantly, however, this decline is severalfold faster in Aβ+ CN individuals.

1.2. Paper‐and‐pencil versus digitized cognitive assessment

The relationship between paper‐and‐pencil measures of cognition and AD biomarkers among CN older adults is complex, but observed correlations are generally relatively weak, particularly cross‐sectionally. Longitudinally, these relationships are more consistently observed and of greater magnitude, with CN older adults with elevated biomarker levels exhibiting cognitive decline. Weak relationships between cognition and AD biomarkers may be partially attributable to the limitations of paper‐and‐pencil assessments, most of which were designed to detect frank impairment in clinical populations as opposed to being designed to detect subtle preclinical impairment. Furthermore, normal fluctuations in cognitive performance, practice effects, and cognitive reserve may obscure the detection of subtle cognitive decline.

The use of digital technology to assess cognition has the potential to mitigate some of the limitations of current paper-and-pencil assessments. For example, mobile devices enable more frequent testing, resulting in more reliable and informative longitudinal data, and are more accessible and cost-effective thanks to self-administration. Computerized measures that automatically generate alternative forms may help minimize practice and version effects. Artificial intelligence (AI) methods such as deep learning enable faster, novel, and potentially more sensitive analysis of cognitive data.

Digital assessments also pose new challenges. Many studies using remote assessments struggle to maintain participant engagement. Digital storage and sharing of cognitive data raise questions related to data privacy, particularly when devices may collect additional identifiable personal data (e.g., voice recordings). Unsupervised digital assessments require systems to ensure that the individual assigned to a remote assessment is the individual taking that assessment. Rapidly developing technologies and operating systems pose challenges to selecting and maintaining a single version of a digital assessment over time. Finally, while secular trends suggest that older adults are increasingly familiar and comfortable with new technology, a substantial subset of older adults may be excluded from research with digital assessments due to lack of familiarity, technical skills, or access.

Digital technology has not yet replaced paper‐and‐pencil assessments, particularly not in clinical trials, because multiple questions remain unanswered: Does digital technology capture cognitive information analogous to gold‐standard paper‐and‐pencil measures? Is there a fundamental difference between capturing data with a rater versus a device? How reliable and feasible is digital technology? These questions are just beginning to be addressed in a more widespread fashion as use of digital technology is rapidly evolving, for instance, in research on preclinical AD.

HIGHLIGHTS

Digital assessments for preclinical Alzheimer's disease (AD) vary by intended setting (e.g., remote vs. in‐clinic), level of supervision (e.g., self vs. supervised), and device origin (personal vs. study‐provided).

At least 11 articles characterize digital cognitive assessment in biomarker‐defined preclinical AD, but the literature generally remains nascent, particularly for remote and novel assessments.

Multiple digital assessment instruments exhibit predominantly weak to moderate relationships with AD biomarkers in preclinical groups. More work is needed to establish their diagnostic potential in preclinical disease stages.

Potential benefits and challenges are discussed within the framework of future implementation in clinical trials, including recommendations for future studies.

RESEARCH IN CONTEXT

Systematic Review: The authors reviewed the literature from sources such as PubMed, Scopus, and PsycINFO, as well as grant and clinical trial databases and conference presentations. Publications reporting on novel digital cognitive assessment methods in cognitively healthy individuals characterized as having preclinical Alzheimer's disease (AD) based on established biomarker evidence were appropriately cited and discussed.

Interpretation: We included and discussed different platforms and approaches that enable both on-site and remotely administered digital assessment to identify early cognitive impairment and decline in preclinical AD. Their sensitivity to AD biomarkers was predominantly weak to moderate. Several promising newly developed assessment instruments, currently under evaluation, were identified. Our findings have implications for the use of these instruments to enrich clinical trials with relevant participants.

Future Directions: This article emphasizes the potential of novel assessment instruments to advance cognitive assessment in the early identification of preclinical AD. Before they are fully implemented in clinical practice and in screening for clinical trials, however, further research is needed to establish concrete associations between assessment outcomes and established biomarkers sensitive to the earliest signs of AD pathology. Finally, we recommend conducting feasibility studies to investigate potential barriers to future implementation.

1.3. Organization of results

In this context, our objectives were to systematically review the current landscape of digital cognitive tests for use in preclinical AD and to describe the extent of validation of these digital cognitive tests against (1) gold-standard cognitive tests and test composites (paper-and-pencil measures) and (2) biomarkers of Aβ and tau pathology. Furthermore, we critically discuss the potential and pitfalls of digital cognitive assessments in the context of implementation in clinical trials and provide an outlook for the future of digital cognitive assessment. Our goal, however, was not to give an exhaustive overview of mobile technology or computerized testing for use in older populations in general. Additionally, we do not address the separate field of passive monitoring to infer cognition using sensors and wearables.

We first describe the current understanding of the associations between cognitive performance on conventional paper‐and‐pencil measures and AD biomarkers. Subsequently, we discuss digital assessments organized into three groups based on technological platform and/or setting: (1) primarily in‐clinic computerized and tablet‐based, (2) primarily unsupervised environment and smartphone‐ or tablet‐based, and (3) novel data collection systems and analysis procedures (e.g., digital pen, eye‐tracking, and language analysis; novel methods for data analysis, e.g., using AI approaches).

For each digital assessment, validation is discussed in terms of (1) biomarker validation and (2) paper-and-pencil validation.

Biomarker validation involved relating digital measures to established markers of Aβ and tau pathology (e.g., PET or CSF measures). Paper-and-pencil validation involved comparing digital measures to conventional measures such as relevant global cognitive composites (e.g., Preclinical Alzheimer Cognitive Composite [PACC]) or domain-specific test composites.

2. METHODS

2.1. Search strategies

From January 2020 to December 2020, we searched three electronic databases (PubMed, Scopus, PsycINFO) for relevant publications (using search terms such as digital, mobile, smartphone, tablet, Alzheimer's, preclinical, amyloid) and two online registers (ClinicalTrials.gov and the National Institutes of Health [NIH] research portfolio) for relevant trials and awarded grants. A second search using the names of digital tests and companies identified in the first search was then performed. We also searched the proceedings of two conferences for relevant preliminary results: the Clinical Trials on Alzheimer's Disease conference (CTAD) 2020 and the Alzheimer's Association International Conference (AAIC) 2020.

2.2. Inclusion and exclusion criteria

Published articles, ongoing studies, and clinical trials using digital cognitive assessment were selected if they involved individuals identified as having preclinical AD. Preclinical AD was defined based on biomarker evidence of Aβ plaque pathology (cortical Aβ PET ligand binding or low CSF Aβ42) and/or NFT pathology (elevated CSF p-tau or cortical tau PET ligand binding). Using the National Institute on Aging and Alzheimer's Association (NIA-AA) Research Framework revised guidelines, we defined preclinical AD as corresponding to the earliest stages in the numeric clinical staging (stage 1–2). We excluded studies that only included participants meeting criteria for clinical diagnoses, such as mild cognitive impairment (MCI) or dementia.

2.3. Procedures

A total of 469 articles were screened using the web‐app Rayyan, of which 458 were excluded due to failure to meet inclusion criteria, and 11 were included in the review. Grant applications were screened from the NIH research portfolio, but no additional study was included. Since this initial literature search, two additional newly published articles were included. Preliminary results from seven conference presentations have also been included, specifically from CTAD 2020 and AAIC 2020. The resulting relatively small, heterogeneous, and methodologically inconsistent body of literature limited our review's methodology. Therefore, we performed a qualitative synthesis rather than a meta‐analysis.

3. RESULTS

3.1. Primarily in‐clinic computerized and tablet‐based cognitive assessment

An established area of digital development in cognitive testing is adapting traditional cognitive measures onto computerized platforms such as Pearson's Q-interactive for the Wechsler Adult Intelligence Scale or the Montreal Cognitive Assessment (MoCA) Electronic Test. Furthermore, clinical trial data management companies such as Medavante and Clinical Ink have adapted traditional cognitive measures to be administered as electronic clinical outcome assessments. Automatic scoring and recording mitigate common error sources, but these systems, by definition, do not reimagine neuropsychological testing. A number of computerized cognitive tests have been developed to detect cognitive decline. These include stand-alone apps and programs as well as web-based apps that can be completed either on personal computers (PC) or tablets. Some of these tests consist of digitized versions of traditional paper-and-pencil neuropsychological tests, while others involve newly developed tests designed to be completed without the active participation or presence of an examiner; these include, for example, Savonix, BrainCheck, Cogniciti, Mindmore, BAC, NIH-Toolbox, CANTAB, and Cogstate, among others. These vary in their approach, degree of commercialization, security and regulatory readiness, and degree of "gamification." They also differ in their respective target populations and clinical indications. Here, we focus on the systems and platforms specifically or mainly designed to detect the earliest cognitive decline in AD. See Table 1 for an overview of the validation of these types of cognitive assessment instruments. Figure 1 shows a selection of primarily in-clinic computerized and tablet-based cognitive assessments.

TABLE 1. Validation of primarily in‐clinic computerized and tablet‐based cognitive assessment

Table 1

Abbreviations: Aβ, amyloid beta; β, standardized β coefficient; CANTAB, Cambridge Neuropsychological Test Automated Battery; CN, clinically normal; Cogstate CBB, Cogstate Brief Battery; Cogstate CPAL, Cogstate Continuous Paired Associate Learning; d, Cohen's d; NIHTB-CB, National Institutes of Health Toolbox Cognition Battery; PACC, Preclinical Alzheimer Cognitive Composite; PiB, Pittsburgh compound B positron emission tomography; r, Pearson correlation coefficient; ρ, Spearman rank correlation.

Note: Only published articles are included in this table.

FIGURE 1.

Fig 1

A, Cogstate One Back tests. Copyright© 2020 Cogstate. All rights reserved. Used with Cogstate's permission. B, CANTAB Spatial Span and Paired Associates Learning. Copyright Cambridge Cognition. All rights reserved. C, NIH‐Toolbox Pattern Comparison Processing Speed Test Age 7+ v2.1. Used with permission NIH Toolbox, © 2020 National Institutes of Health and Northwestern University

3.1.1. Cogstate digital cognitive testing system

Cogstate is a commercial company based in Australia. A founding principle behind the Cogstate Brief Battery (CBB) was to mitigate the effects of language and culture on cognitive assessment. Therefore, its measures of response time, working memory, and continuous visual memory are completed using the universal stimulus set of common playing cards. Additional tasks that do not use playing cards are also available (e.g., a paired associate learning task and a maze learning task). This test battery was initially developed in the early 2000s for PC (where participants would respond via keystrokes) but is now available for tablets. A second founding principle behind Cogstate tasks is more reliable measurement of change over time through randomized alternative versions that reduce confounding practice effects. The Cogstate system was initially designed to be administered by an examiner, but there have been recent efforts toward remote administration; additionally, once logged into the platform, the tasks are easy to progress through independently. Recently, the CBB has been made available for unsupervised testing using a web browser. A recent report from the Healthy Brain Project in Australia showed high acceptability and usability for this unsupervised cognitive testing in a non-clinical sample. The authors observed low rates of missing data, and the psychometric characteristics of the CBB were similar to those collected from supervised testing.

A more recent iteration of Cogstate tasks is the C3 (Computerized Cognitive Composite), which includes the CBB in addition to two measures potentially sensitive to changes in early AD based on evidence from the cognitive neuroscience literature: the Behavioral Pattern Separation–Object Version (BPS-O) and the Face-Name Associative Memory Test (FNAME). Behavioral versions of the FNAME and a modified version of the BPS-O were selected for inclusion in the C3 because they have been shown to be sensitive to activity in the medial temporal lobes in individuals at risk for AD based on biomarkers.

In a large sample of older adults (n = 4486), C3 performance was shown to be moderately correlated with cognitive performance on a composite of paper-and-pencil measures (PACC). A smaller study similarly showed this correlation between the C3 and paper-and-pencil measures. It also showed that the Cogstate C3 battery's memory tasks were best at identifying individuals with subtle cognitive impairment, as defined by PACC performance. Combined, these findings suggest that these computerized tasks are valid measures of cognitive function and may be used for further study of cognitive decline in preclinical AD.

The Cogstate test batteries are used in several ongoing studies and clinical trials, for example, the Wisconsin Registry for Alzheimer's Prevention (WRAP), Alzheimer's Disease Neuroimaging Initiative 3 (ADNI3), Cognitive Health in Ageing Register: Cohort Study, and the Dominantly Inherited Alzheimer Network‐Trials Unit (DIAN‐TU). The C3 is currently being used in the Anti‐Amyloid Treatment in Asymptomatic Alzheimer's Disease (A4) study and the Study to Protect Brain Health Through Lifestyle Intervention to Reduce Risk.

Preclinical AD biomarker validation

Screening data from the A4 study showed that, among a large sample of CN elderly, elevated Aβ as assessed with [18F]florbetapir-PET was associated with slightly worse C3 performance. Other observational studies have not shown cross-sectional associations between CBB performance and Aβ status in preclinical AD, but some studies have demonstrated that Aβ+ individuals decline on the CBB over time. For example, in the Australian Imaging, Biomarkers and Lifestyle (AIBL) study, decline in episodic and working memory over 36 months was associated with higher baseline Aβ burden in CN participants. Researchers from the Mayo Clinic Study of Aging used similar methods in a population-based sample and, in contrast, did not find any significant association between Aβ and CBB decline. In another study from AIBL, performance on a continuous paired associate learning task (CPAL) within the Cogstate battery was compared between Aβ+ and Aβ– CN participants. Over 36 months, Aβ– participants' task performance improved, whereas Aβ+ participants showed no practice effect. In CN individuals, this absence of benefit from repeated exposure over time was associated with higher Aβ burden.

3.1.2. The computerized National Institutes of Health Toolbox Cognition Battery (NIH‐TB)

The National Institutes of Health (NIH) Toolbox Cognition Battery (NIH TB-CB) was designed as an easily accessible and low-cost means to provide researchers with standard and brief cognitive measures for various settings. Development of the NIH TB-CB was a large-scale effort across government funding, scientists (250+), and institutions (80+). It consists of seven established neuropsychological tests, selected and adapted to a digital platform by an expert panel. The NIH TB-CB tests assess a range of cognitive domains (attention and executive functions, language, processing speed, working memory, and episodic memory). It was released in 2012 for PC, and a tablet version is now also available that has been validated against standard neuropsychological measures, as well as against established cognitive composites for use in preclinical AD. To ensure valid results, an examiner is still required to administer the app; however, some tests have recently been implemented for remote administration via screen sharing in a web browser.

The NIH TB-CB is currently implemented in several clinical trials and longitudinal studies of aging and early AD, for example, the Risk Reduction for Alzheimer's Disease study, the Comparative Effectiveness Dementia & Alzheimer's Registry, and the ongoing project Advancing Reliable Measurement in Alzheimer's Disease and Cognitive Aging (ARMADA). The latter, ARMADA, is a large NIH-funded multi-site project in the United States aiming to validate the NIH Toolbox in several demographically diverse CN and clinical cohorts, including previously underrepresented demographic groups. ARMADA's additional goals are to further facilitate the use of the NIH TB-CB in aging research through the formation of a consortium with the National Alzheimer's Coordinating Center and in collaboration with researchers from other existing cohorts.

Preclinical AD biomarker validation

A handful of studies have examined the NIH TB-CB in aging and dementia populations; however, few published studies have related the NIH TB-CB to preclinical AD biomarkers. A recent study in 118 CN older adults did not find an association between neuroimaging markers of Aβ and any of the NIH TB-CB cognitive tasks. However, it did find a weak association between measures of processing speed and executive functions and tau pathology in higher Braak-stage regions.

3.1.3. The Cambridge Neuropsychological Test Automated Battery (CANTAB)

The Cambridge Neuropsychological Test Automated Battery (CANTAB) is intended as a language‐independent and culturally neutral cognitive assessment tool initially developed by the University of Cambridge in the 1980s, but now commercially provided by the company Cambridge Cognition. CANTAB has been used in a wide range of clinical settings and clinical trials, including aging studies. CANTAB mostly uses non‐verbal stimuli, and it includes measures of working memory, planning, attention, and visual episodic memory. Administration of CANTAB was initially on PC but is now available through CANTAB mobile (tablet‐based). Additionally, CANTAB offers an online platform for recruitment by pre‐screening patients using their cognitive assessment instruments.

Preclinical AD biomarker validation

In the Dallas Lifespan Brain Study, CN individuals underwent Aβ PET with [18F]florbetapir and completed the CANTAB Verbal Recognition Test, which includes measures of memory recall and recognition. In this test, participants are shown a sequence of words on a touchscreen. Subsequently, the participant is asked to recall the words, and the task ends with a recognition task. The researchers found that in relatively younger adults (ages 30 to 55), higher Aβ was moderately associated with diminished memory recall and recognition, whereas the effect weakened as people aged and amyloid levels increased.

3.2. Remotely administered tablet‐ and smartphone‐based cognitive assessment

Demographic survey trends in the United States from 2019 indicated that 77% of Americans aged 50+ own smartphones, with that number climbing annually. Similar numbers are reported from European countries. Simultaneously, there has been an increase in smartphone-based apps designed for cognitive assessment in older populations. The appeal and implications of smartphone-based cognitive assessment for detection and tracking in preclinical AD are obvious. It is highly scalable, allowing for remote assessment in a much larger population compared to samples acquired through in-clinic and supervised assessment. It allows for more frequent assessment with potentially more sensitive cognitive paradigms. With mobile technology, cognitive assessment can be performed in a familiar environment and may thus increase the ecological validity (i.e., the generalizability to real-life settings) of the task. Having a participant complete tasks on their own phone (as opposed to a study-issued device) may be more reflective of their cognition in everyday life. Improved ecological validity of smartphone-based assessment is timely, as researchers and regulators emphasize the importance of demonstrating the clinical meaningfulness of cognitive change in a preclinical AD population. Furthermore, the participant being in a familiar environment during cognitive assessments may reduce the risk of the "white-coat effect" (participants underperforming on tasks in a medical environment). Remote and mobile tracking of cognitive functioning provides an extra opportunity for individuals to track their own cognitive health over time, potentially leading to increased commitment to their well-being. Finally, for those willing to participate in demanding clinical trials, reducing in-clinic visits through remote testing may mitigate the overall participant burden and encourage those in more remote areas to participate.

However, despite the potential of smartphone‐based assessment, multiple issues remain, including challenges related to (1) feasibility (e.g., older adults’ openness to completing smartphone assessments, compliance, attrition, privacy issues), (2) validity (e.g., ensuring alignment between smartphone‐based vs. gold standard cognitive assessment data, guaranteeing the identity of the examinee), and (3) reliability (e.g., variability between hardware and operating systems, diminished control over the test‐taking environment).

Given the recent rapid expansion of interest in this area, we focus on observed themes for smartphone-based instruments that are in early (but varying) stages of development. Identified themes include (1) improving reliability of assessment through ambulatory/momentary testing, (2) using mobile and serial assessment to identify subtle decrements in learning and practice effects, (3) targeting cognitive processes more specific to decline in preclinical AD, and (4) harnessing the potential of big-data collection. Validity data in relation to in-clinic cognitive assessment and AD biomarkers are discussed where available. See Figure 2 for selected examples of smartphone-based assessment applications. Table 2 displays the validation of remotely administered tablet- and smartphone-based cognitive assessments.

FIGURE 2.

Fig 2

A, Ambulatory Research in Cognition (ARC) Symbols Test, Grids Test, and Prices Test. Used with permission from J. Hassenstab. B, neotiv Objects‐in‐Rooms Recall test. Used with permission from neotiv GmbH. C, Boston Remote Assessment for Neurocognitive Health (BRANCH). Used with permission from K. V. Papp

TABLE 2. Validation of remotely administered tablet‐ and smartphone‐based cognitive assessment and other novel types of cognitive assessment

Table 2

Abbreviations: Aβ, amyloid beta; AUC, area under the curve; β, beta interaction effect; CN, clinically normal; CSF, cerebrospinal fluid; d, Cohen's d; ORCA-LLT, Online Repeatable Cognitive Assessment-Language Learning Task; PiB, Pittsburgh compound B positron emission tomography; r, Pearson correlation coefficient; ρ, Spearman rank correlation; VPC, Visual Paired Comparison.

Note: Only published articles are included in this table.

3.2.1. Feasibility of using mobile devices to capture cognitive function

While retention in longitudinal study designs is especially challenging for studies using remotely administered testing, adherence in short studies is promising. In a recent study, 1594 CN subjects (ages 40 to 65) completed a testing session using a web-based version of four playing card tasks within the Cogstate battery. High adherence to instructions and low rates of missing data (1.9%) were observed, indicating high acceptability. Error rates were consistently low across tests and did not vary with the self-reported environment (e.g., with others present or in a public space). Another recent study investigated adherence over 36 days using a smartphone-based app: 35 CN participants (ages 40 to 59) completed very short daily cognitive tasks, of whom 80% completed all tasks and 88% were still active at the end of the study. More problematically, a recent report from eight digital health studies in the United States (providing study-app usage data from > 100,000 participants) describes substantial participant attrition (e.g., participants losing engagement over time), limiting the generalizability of the data obtained. Monetary compensation improved retention and, boding well for preclinical AD studies, older age was associated with longer study participation. However, participants in trials that included in-clinic visits had the highest compliance, suggesting that attrition in fully remote longitudinal studies remains a significant challenge.

3.2.2. Improving reliability: ambulatory/momentary cognitive assessment

The premise behind ambulatory/momentary cognitive assessment is that single‐timepoint assessments fail to capture the endemic variability in human cognitive performance impacted by a host of factors, including mood, stress, or time of day. Capturing the most representative sample of an individual's cognition at a given interval is one promising approach to improving the sensitivity of measurement by reducing variability and increasing reliability. Using a “burst” design, a more reliable composite measure of cognitive performance is derived by averaging performance over multiple assessment timepoints administered in short succession (e.g., four assessments per day for 7 days).
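As a rough illustration of why this works (a simulated sketch with arbitrary variance parameters, not data or code from any reviewed study), averaging repeated brief sessions shrinks occasion-to-occasion noise roughly in proportion to the number of sessions, so a burst composite tracks the stable between-person signal far better than any single assessment:

```python
import numpy as np

rng = np.random.default_rng(0)

n_participants, n_sessions = 200, 28  # e.g., 4 assessments/day for 7 days
true_ability = rng.normal(0.0, 1.0, n_participants)  # stable between-person signal
occasion_noise = rng.normal(0.0, 1.5, (n_participants, n_sessions))  # within-person noise

scores = true_ability[:, None] + occasion_noise  # observed score at each session

single_timepoint = scores[:, 0]        # conventional one-shot assessment
burst_composite = scores.mean(axis=1)  # burst design: average over all sessions

# Averaging divides the noise variance by n_sessions, so the burst composite
# correlates with the latent ability (~0.96 here) far better than a single
# session does (~0.55).
print(np.corrcoef(true_ability, single_timepoint)[0, 1])
print(np.corrcoef(true_ability, burst_composite)[0, 1])
```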

Sliwinski et al. developed the brief smartphone‐based app Mobile Monitoring of Cognitive Change (M2C2) aimed at capturing cognition more frequently in an uncontrolled and naturalistic setting. In a younger (age 25 to 65) but highly diverse (9% White) sample, they showed that brief smartphone‐based cognitive assessments of perceptual speed and working memory in an uncontrolled environment were correlated with in‐clinic cognitive performance. The proportion of total variance in performance attributable to differences between people (accounting for within‐person variance across each test session and number of test sessions) was high, illustrating the excellent level of reliability achieved using a burst design.

Similarly, Hassenstab designed the Ambulatory Research in Cognition (ARC) app for use in the DIAN study. In contrast with previous studies that have relied on study-provided devices, participants download the app onto their own devices and indicate the days and times they are available to be tested. Participants subsequently receive notifications to take ARC, which lasts a few minutes, four times per day for 1 week. ARC evaluates spatial working memory (Grids Test), processing speed (Symbols Test), and associative memory (Prices Test). Preliminary results suggest that ARC is reliable, correlated with in-clinic cognitive measures and AD biomarkers, and well liked by participants. Further work is required to determine (1) whether ambulatory cognitive data are more strongly related to AD biomarker burden in CN older adults than conventional in-clinic assessments and (2) whether these data represent a more reliable measure of cognitive and clinical progression than conventional in-clinic assessments.

3.2.3. Using mobile and serial assessment to identify subtle decrements in learning and practice effects

A diminished practice effect, that is, a lack of the characteristic improvement in performance on retesting, has been suggested as a subtle indicator of cognitive change prior to overt decline. Mobile technology allows for much more frequent serial assessment. For example, a recent study provided iPads to 94 participants to take home and complete a challenging associative memory task requiring memory for face-name pairs (FNAME) monthly for 1 year. The study found an association between diminished learning and greater amyloid and tau PET burden among CN participants, with Aβ+/Aβ– group differences in memory performance emerging by the fourth exposure.

The Boston Remote Assessment for Neurocognitive Health (BRANCH), a web-based battery including a version of FNAME and other memory tasks, was designed to move learning paradigms from study-provided tablets to smartphones and to reduce the time interval for serial assessment (e.g., from months to days). Its tasks focus on cognitive processes supported by the medial temporal lobes and thus well suited to characterize AD-related memory changes. BRANCH primarily consists of measures of associative memory, pattern separation, and semantically facilitated learning and recall. BRANCH also uses paradigms and stimuli relevant to everyday cognitive tasks.

In a similar vein, the Online Repeatable Cognitive Assessment‐Language Learning Test (ORCA‐LLT) developed by Dr. Lim at Monash University in Australia, asks participants to learn the English word equivalents of 50 Chinese characters for 25 minutes daily over 6 days. The task is web‐based and completed on a participant's own device in their home. They found that learning curves were diminished in 38 Aβ+ versus 42 Aβ– CN older adults, and the magnitude of this difference was very large.

The assessment of learning curves over short time intervals using smartphones may serve as a cost-effective screening tool to enrich samples for AD biomarker positivity prior to expensive assays. For example, a clinical study found that lower practice effects over 1 week were associated with nearly 14-fold higher odds of being Aβ+ on a composite measure using [18F]flutemetamol. Future work with larger CN samples and further optimized learning paradigms may show similar discriminability of learning curves for AD biomarker positivity in a preclinical sample. Capturing learning curves over short intervals using remote smartphone-based assessment may provide a more rapid means of assessing whether a novel treatment has beneficial effects on cognition. This could assist in more rapidly discontinuing futile treatment trials or, more importantly, trials with deleterious effects on cognition. However, how repeated measures of short-term learning curves can be used to track cognitive progression remains unexplored. Methods to establish this relationship are in development but will require validation studies to overcome logistical and technical challenges.
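As a sketch of how such a learning-curve analysis could be set up (the inputs and model here are illustrative assumptions, not the pipeline used in the studies cited above), one can estimate a least-squares learning slope for each participant across repeated sessions and then relate those slopes to amyloid status:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learning_slopes(scores: np.ndarray) -> np.ndarray:
    """Per-participant learning slope.

    scores[i, j] is participant i's accuracy on session j
    (e.g., six daily sessions of a word-learning task).
    """
    sessions = np.arange(scores.shape[1])
    # Fit all participants at once; rows of scores.T correspond to sessions.
    slopes, _intercepts = np.polyfit(sessions, scores.T, deg=1)
    return slopes

def slope_vs_abeta(scores: np.ndarray, abeta: np.ndarray) -> LogisticRegression:
    """Logistic model of Aβ status (0/1) on learning slope.

    Mirrors, in spirit only, the reported odds-ratio analyses: flatter
    learning curves should carry higher odds of Aβ positivity.
    """
    x = learning_slopes(scores).reshape(-1, 1)
    return LogisticRegression().fit(x, abeta)  # coef_ = log-odds per unit slope
```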

3.2.4. Targeting relevant cognitive functions

While there is significant heterogeneity in the nature and progression of cognitive decline within AD, the availability of AD biomarkers and the adoption of findings from the cognitive neuroscience literature have allowed researchers to home in on cognitive processes potentially more sensitive and specific to AD. For example, researchers from the Otto-von-Guericke University in Magdeburg and the German Center for Neurodegenerative Diseases (DZNE) have been involved in the development of a digital platform including a mobile app for smartphones and tablets (neotiv), which consists of memory tests focused on object and scene mnemonic discrimination, pattern completion, face-name association, and complex scene recognition. The object and scene mnemonic discrimination paradigm was designed to capture memory function associated with an object-based (anterior-temporal) and a spatial (posterior-medial) memory system. While functional magnetic resonance imaging (fMRI) studies have shown that both memory systems are activated in individuals performing the task, age-related and performance-dependent changes in functional activity have been observed in the anterior temporal lobe in older adults. Furthermore, two studies in biomarker-characterized individuals revealed that object mnemonic discrimination performance was associated with measures of tau pathology (i.e., anterior temporal tau-PET binding and CSF p-tau levels), while there was evidence for an association of performance in the scene mnemonic discrimination task with Aβ-PET signal in posterior-medial brain regions. The complex scene recognition task has been shown to rely on a wider episodic memory network, and task performance was associated with CSF total tau levels. All of the above tests have been implemented in a digital platform for unsupervised testing using smartphones and tablets. Recently, relationships between these tests and biomarkers of tau pathology, as well as strong relationships with in-clinic neuropsychological assessments (i.e., Alzheimer's Disease Assessment Scale-Cognitive subscale delayed word recall, PACC), have been demonstrated.

Participants download the app onto their own mobile devices and undergo a short introduction and training session as well as a short vision screening. Participants subsequently receive notifications to complete tests according to a predefined study schedule to acquire longitudinal trajectories with high frequency. To minimize practice effects, stimulus material has been piloted in large scale web‐based behavioral assessments, and parallel test sets with matched task difficulty have been created. The neotiv platform is currently included in several AD cohort studies (e.g., DELCODE, BioFINDER‐2, and WRAP).

3.2.5. Harnessing the potential of “big data”: citizen science projects

Citizen science is a concept in which the general public is involved in collaborative projects, for example, by collecting their own data for use in research; it can be a way to gather large amounts of data on individuals at risk of developing AD. One such citizen science project is The Many Brains Project, with its research platform TestMyBrain.org. The Many Brains Project has yielded some of the largest samples in cognition research, with more than 2.5 million people tested since 2008; however, it is not specific to AD. Another initiative, the Models of Patient Engagement for Alzheimer's Disease study, is an EU-funded international effort aiming to identify individuals with early AD hidden in their communities and traditionally not found in memory clinic settings. Through web-based cognitive screening instruments, individuals from the public with a heightened risk of AD are identified and invited to a memory clinic for a full diagnostic evaluation.

At the University of Oxford's Big Data Institute, researchers have developed the smartphone app Mezurio, included in several European studies (e.g., PREVENT, GameChanger, the Remote Assessment of Disease and Relapse in Alzheimer's Disease study, BioFINDER-2). One of these, GameChanger, is a citizen science project with more than 16,000 participants from the general UK population completing remote, frequent cognitive assessments with the Mezurio app. Through this project, healthy volunteers can perform tests on their smartphones, thus providing population norms for different age and demographic groups. Mezurio is installed on a smartphone and has several game-like tests examining episodic memory (Gallery Game and Story Time), connected language (Story Time), and executive function (Tilt Task), including multiple recall tasks and delays of up to several days. In a recent study investigating the feasibility of Mezurio in middle-aged participants, the participants demonstrated high compliance, indicating that the app may be suitable for longitudinal follow-up of cognition.

In Germany, Berron and colleagues from the DZNE developed a Germany-wide citizen science project ("Exploring memory together") focused on the feasibility of unsupervised digital assessments in the general population. Besides demographic factors that affect task performance, there are several factors in everyday life (e.g., taking the test in the evening compared to during the day) that could contribute to performance on remote unsupervised cognitive assessments. Preliminary results from more than 1700 participants (ages 18 to 89) identified important factors that need to be considered in future remote studies, including time of day, the time between learning and retrieval, and (for one task) the screen size of the mobile device. They concluded that investigating memory function using remote and unsupervised assessments is feasible in an adult population.

3.3. Novel data collection systems and analysis procedures

Other promising assessment instruments under evaluation in different studies on preclinical AD are analyses of spoken language, eye movements, spatial navigation performance, and digital pen stroke data. Some of these tasks require stand-alone equipment (e.g., an eye-tracker or digital pen), while others can use existing platforms or devices (e.g., device-embedded cameras, such as those in a personal laptop or the front-facing camera of a smartphone). Figure 3 exemplifies some of these assessment instruments. Table 2 displays the validation of novel types of assessment instruments. Some instruments, such as commercial-grade eye-tracking cameras or digital pens, are still not widely accessible, hampering their implementation. Predominantly passive monitoring of cognition, such as speech recording and eye-movement tracking, may prove less stressful and time-consuming than conventional cognitive tests. Participants complete a task of objective cognition (e.g., drawing a clock or describing a picture) while subtle aspects of their performance are recorded (e.g., pen strokes, eye movements, language). The result is a large quantity of performance data, which must then be reduced or synthesized to glean relevant performance features. Using machine learning (ML) or deep learning, researchers have started to investigate whether automated analysis and classification of test performance according to specific criteria (e.g., biomarker or clinical status) can aid in sensitive screening for preclinical AD. In a clinical context, ML can be used as a clinical decision support system, building prediction models that achieve high accuracy in clinical diagnosis and selecting patients for clinical trials at the early stages of dementia development.
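To make the classification step concrete, the sketch below shows the general shape of such a pipeline in scikit-learn; everything here is a placeholder (simulated features and labels, an arbitrary classifier), not code or data from the reviewed studies. The methodological point is that performance should be estimated with cross-validation rather than on the training data, a particular concern with small biomarker cohorts:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Hypothetical inputs: one row per participant, columns are features distilled
# from a digital task (e.g., pen-stroke latencies, pause counts, gaze metrics).
X = rng.normal(size=(300, 20))    # placeholder feature matrix
y = rng.integers(0, 2, size=300)  # placeholder biomarker status (0 = Aβ–, 1 = Aβ+)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validated AUC; with these random placeholders it will hover
# near the chance level of 0.5, which is exactly the honest baseline.
auc_scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(auc_scores.mean())
```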

FIGURE 3.

Fig 3

A, Sea Hero Quest Wayfinding and Path integration. Used with permission from M. Hornberger. B, Digital Maze Test from survey perspective and landmarks from a first‐person perspective. Used with permission from D. Head. C, Data and analysis process for digital Clock Drawing Test (dCDT), from data collection, the artificial intelligence (AI) analysis steps, and the machine learning (ML) analysis and reporting. Used with permission from Digital Cognition Technologies

3.3.1. Spoken language analysis and automated language processing

New technical developments have provided further insight into language deficits in preclinical AD, and several newly developed analysis instruments are now available. For these instruments, speech production is usually recorded during spontaneous speech, verbal fluency tasks, or picture description, typically using the Cookie Theft picture from the Boston Diagnostic Aphasia Examination. Speech is typically recorded using either a stand-alone audio recorder or an embedded microphone and thereafter transcribed and analyzed using computer software. For example, using a picture task, researchers can analyze the use of verbs and nouns in spontaneous speech; the complexity of sentences, as represented by grammatical complexity and verb usage; the diversity of words; and the flow of speech, such as repetitions and pauses.
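A minimal sketch of what computing a few of these features might look like, assuming a transcript with word-level timestamps (the Word record and the pause threshold are hypothetical conventions for illustration, not the format of any specific tool named below):

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # word onset, seconds
    end: float    # word offset, seconds

def speech_features(words: list[Word], pause_threshold: float = 0.5) -> dict:
    """Lexical diversity, pauses, and speech rate from a timestamped transcript."""
    tokens = [w.text.lower() for w in words]
    type_token_ratio = len(set(tokens)) / len(tokens)  # diversity of words
    # Count a pause whenever the gap between consecutive words exceeds the threshold.
    gaps = [nxt.start - cur.end for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g > pause_threshold]
    speech_rate = len(tokens) / (words[-1].end - words[0].start)  # words per second
    return {
        "type_token_ratio": type_token_ratio,
        "n_pauses": len(pauses),
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
        "speech_rate_wps": speech_rate,
    }
```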

Transcription of speech to text, a vital part of speech analysis, is time-consuming when performed manually, but there has been increasing effort to apply machine- and deep-learning technology to detect cognitive impairment in AD. For example, researchers from the European research project Dem@care and the EIT Digital project ELEMENT demonstrated that automated analysis of verbal fluency could distinguish healthy aging from clinical AD, and that vocal analysis using smartphone apps could automatically differentiate between clinical AD groups. In the latter study, speech was recorded while participants performed short verbal cognitive tasks, including verbal fluency, picture description, counting down, and a free speech task. The recordings were then used to train automatic ML-based classifiers for detecting MCI and AD. Whether these methods can identify early cognitive impairment and decline in preclinical AD remains an open question.

Intensive development of these methods is ongoing at several research sites. For example, researchers are exploring subtle speech-related cognitive decline in early AD through the European Deep Speech Analysis (DeepSpA) project. The DeepSpA project uses telecommunication-based assessment instruments for early screening and monitoring in clinical trials, using remote semiautomated analysis methods. In the United States, researchers in the Framingham Heart Study are recording and analyzing speech obtained during neuropsychological assessments (from 8800+ neuropsychological examinations of 5376+ participants). Similarly, in Sweden, researchers are recording speech (1000+ participants) during neuropsychological assessments in the Gothenburg H70 Birth Cohort Studies. The studies mentioned above are conducted in cooperation with the company ki elements, a spin-off of the German Research Center for Artificial Intelligence. Another company, Canada-based Winterlight Labs, has developed a tablet-based app to identify cognitive impairment using spoken language. The app is currently being evaluated in Winterlight's Healthy Aging Study, an ongoing longitudinal normative study.

Preclinical AD biomarker validation

While most researchers have investigated MCI patients, researchers from the Netherlands-based Subjective Cognitive Impairment Cohort recorded spontaneous speech in CN individuals performing three open-ended tasks (e.g., describing an abstract painting). After manual transcription, the researchers extracted linguistic parameters using fully automated, freely available software (T-Scan). On conventional neuropsychological tests, the participants performed within the normal range, regardless of Aβ status (assessed with either CSF Aβ1-42 or [18F]florbetapir-PET). Interestingly, a modest correlation was seen between abnormal Aβ and subtle speech changes (fewer specific words).

3.3.2. Eye‐tracking

In AD research, commercial-grade eye-tracking cameras have been shown to detect abnormal eye movements in clinical groups. These high-frame-rate cameras collect a wealth of data on eye movement behavior, including saccades (rapid shifts of gaze between fixation points) and fixations (periods when the eyes rest on an area of interest). Eye movements can be recorded within specific tasks, for example, reading a text or performing a memory test. For example, Peltsch et al. measured the ability to inhibit unwanted eye movements within a task using visual stimuli. These data can, in turn, be analyzed automatically using commercial software or inspected manually by researchers. However, eye-tracking devices remain expensive and are not widely available in clinical settings. A solution may be the use of device-embedded cameras (e.g., in a laptop or tablet) to capture eye movements during tasks such as memory tests. Bott et al. from the company Neurotrack Technologies showed that device-embedded cameras (i.e., in a PC), which are low cost and highly scalable, can feasibly capture valid eye-movement data of sufficient quality. In this study, eye movements were recorded in CN participants during a visual recognition memory task. They observed a modest association between eye movements and cognitive performance on a paper-and-pencil composite. Interestingly, both device-embedded cameras and commercial-grade eye-tracking cameras yielded robust data of sufficient quality. This suggests that device-embedded eye-tracking methods may be useful for further study of AD-related cognitive decline in CN individuals. Beyond accuracy of performance, eye trackers yield data on additional eye movement behaviors, opening opportunities for new types of potentially meaningful outcomes.
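Turning raw gaze samples into fixations and saccades is a standard preprocessing step for such analyses. One common approach is dispersion-threshold identification (I-DT); the sketch below is a generic, minimal implementation with illustrative thresholds, not the algorithm of any specific product discussed above:

```python
def detect_fixations(xs, ys, ts, max_dispersion=1.0, min_duration=0.1):
    """Dispersion-threshold (I-DT) fixation detection.

    xs, ys: gaze coordinates (e.g., degrees of visual angle); ts: timestamps (s).
    Returns one (start_t, end_t, centroid_x, centroid_y) tuple per fixation;
    samples between fixations correspond to saccades.
    """
    fixations, i, n = [], 0, len(ts)
    while i < n:
        j = i
        # Grow the window while its bounding-box dispersion stays small.
        while j + 1 < n and (
            (max(xs[i:j + 2]) - min(xs[i:j + 2]))
            + (max(ys[i:j + 2]) - min(ys[i:j + 2]))
        ) <= max_dispersion:
            j += 1
        if ts[j] - ts[i] >= min_duration:  # long enough to count as a fixation
            wx, wy = xs[i:j + 1], ys[i:j + 1]
            fixations.append((ts[i], ts[j], sum(wx) / len(wx), sum(wy) / len(wy)))
            i = j + 1
        else:
            i += 1  # window too short: slide forward one sample
    return fixations
```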

3.3.3. Digital pen

Digital pens look like regular pens but have an embedded camera and sensors that capture position and pen stroke data with high spatial and temporal resolution. Outcomes include time in the air and on the writing surface, velocity, and pressure. This results in the collection of hundreds or thousands of datapoints and variables, in contrast with traditional paper-and-pencil measures wherein point estimates of reaction time and accuracy are the primary outcomes. Big-data techniques such as ML can then be applied to these datasets to extract relevant signal. For example, Digital Cognition Technologies captured data from thousands of individuals completing the standard clock drawing test. They subsequently developed the digital Clock Drawing Test (dCDT), which features an extensive ML-based scoring system that describes performance outcomes related to information processing, simple motor functioning, and reasoning (among many others). This approach allows researchers to capture an individual's inefficiencies in completing a cognitive task despite overall intact performance, and thus to collect and analyze much more subtle aspects of behavior systematically.
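As an illustration of how such low-level outcomes can be derived, the sketch below computes time in air, time on the surface, and mean on-surface velocity from a timestamped sample stream; the PenSample layout is an assumed format for illustration, not any vendor's actual data schema:

```python
import math
from typing import NamedTuple

class PenSample(NamedTuple):
    x: float         # pen position, mm
    y: float
    t: float         # timestamp, seconds
    pressure: float  # 0 while the pen hovers above the surface

def stroke_kinematics(samples: list[PenSample]) -> dict:
    """Aggregate air/surface time and mean writing velocity from raw samples."""
    time_on_surface = time_in_air = surface_distance = 0.0
    for cur, nxt in zip(samples, samples[1:]):
        dt = nxt.t - cur.t
        if cur.pressure > 0:  # pen touching the surface: accumulate path length
            time_on_surface += dt
            surface_distance += math.hypot(nxt.x - cur.x, nxt.y - cur.y)
        else:                 # pen lifted: "time in air"
            time_in_air += dt
    return {
        "time_on_surface_s": time_on_surface,
        "time_in_air_s": time_in_air,
        "mean_velocity_mm_s": surface_distance / time_on_surface if time_on_surface else 0.0,
    }
```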

Preclinical AD biomarker validation

Preliminary results from a study of older adults found that worse performance on the dCDT, particularly on a visuospatial reasoning subscore, was associated with greater Aβ burden on PET and discriminated better between those with high versus low Aβ than standard neuropsychological tests included in a multi-domain cognitive composite.

3.3.4. Virtual reality and spatial navigation

In virtual reality (VR)-based tests, participants perform tasks of varying complexity in computer-generated environments. These tasks are traditionally presented on computer screens (e.g., laptops or tablets) with which participants interact using a joystick, keyboard, touch screen, or VR head-mounted display. The Four Mountains Test is an example of a VR-based test, available for use on iPad. It measures spatial function by alternating viewpoints and textures of the four mountains' topographical layout within a computer-generated landscape. The clinical usefulness of the test has been demonstrated in clinical studies; however, its relation to preclinical AD biomarkers is still unknown. Another example is a VR path integration task developed by researchers from Cambridge University. In this task, participants are asked to explore virtual open arena environments. Wearing a professional-grade VR headset, the participants are then asked to walk back to specific locations. In a clinical study, the task was superior to other cognitive assessments in differentiating MCI from CN and was correlated with CSF biomarkers (total tau and Aβ). This task is currently being evaluated in a preclinical population with biomarker data.

The launch of the online mobile game Sea Hero Quest generated great interest, and as of today, > 4.3 million people have played it. Deutsche Telekom collaborated with scientists from University College London and the University of East Anglia to create this mobile game. The idea is to gather data to create population norms from several countries, enabling the development of easily administered spatial navigation tasks to detect AD. Preliminary results suggest that Sea Hero Quest performance is comparable to real-world navigation experiments, suggesting that it captures more than mere video gaming skill.

Preclinical AD biomarker validation

In a recent study, performance on the Sea Hero Quest mobile game was found to discriminate healthy aging from individuals genetically at risk for AD. The researchers examined Sea Hero Quest performance in a smaller apolipoprotein E (APOE)-genotyped cohort. Despite having no clinically detectable cognitive deficits, individuals genetically at risk performed worse on spatial navigation, and wayfinding performance discriminated between APOE ε4 carriers and non-carriers.

In another study, participants from the Knight Alzheimer's Disease Research Center underwent a virtual maze task measuring spatial navigation. The virtual maze was created using commercial software and is presented on a laptop, with participants navigating via joystick. The mazes consist of a series of interconnected hallways with several landmarks. The findings indicated that Aβ positivity (abnormal CSF Aβ42) was associated with lower wayfinding performance. For inclusion in future studies, the spatial navigation task has been made available for remote use through a web-based interface.

4. DISCUSSION

In this systematic review, several digital assessments were identified on multiple delivery platforms (e.g., tablets, smartphones, and external hardware), intended to measure cognition or behaviors in preclinical AD (see Figure 4 for an overview of assessments and their platforms). These assessment instruments varied by intended setting (e.g., remote vs. in‐clinic), level of supervision (e.g., self vs. supervised), and device origin (personal vs. study‐provided). Studies validating assessment instruments for more established platforms (e.g., PC, tablet) are more common than those developed for more novel platforms (e.g., smartphone). However, many of these newly developed tests are currently being evaluated in several biomarker studies.

FIGURE 4.

Fig 4

Overview of cognitive tests and their platforms. BRANCH, Boston Remote Assessment for Neurocognitive Health; ORCA-LLT, Online Repeatable Cognitive Assessment-Language Learning Test; NIH-TB, National Institutes of Health Toolbox; CANTAB, Cambridge Neuropsychological Test Automated Battery; ARC, Ambulatory Research in Cognition; M2C2, Mobile Monitoring of Cognitive Change; dCDT, digital Clock Drawing Test. *Available for use through a web browser

4.1. Validation with preclinical AD biomarkers

A critical part of early detection of preclinical AD in CN individuals is the ability of cognitive tests to identify evidence of subtle cognitive impairment or decline over time. Primarily in-clinic administered tests have demonstrated cross-sectional relationships to preclinical AD biomarkers on par with traditional neuropsychological assessment, with weak to moderate effect sizes. Longitudinal studies using conventional paper-and-pencil assessments are mixed, but most demonstrate subtle declines in preclinical phases of AD. Remotely administered tests, on the other hand, have been less explored, but work from several preclinical biomarker studies is underway. A handful of validation studies in the literature and preliminary results of a smartphone-based memory test show a relationship to tau pathology. In a small but promising study using a remotely administered web-based assessment of learning, learning curves in Aβ+ individuals were significantly slower than those in Aβ– individuals, warranting further study of this approach. Other novel assessment instruments, including speech analysis, eye-tracking, and VR, have demonstrated potential usefulness for further study against relevant preclinical AD biomarkers. Interestingly, preliminary results using a digital clock drawing test have shown high sensitivity to changes among CN individuals with positive AD biomarkers.

Future longitudinal studies should include serial biomarker data to explore validity against biomarker change over time. Such studies also need to investigate the ability of digital assessments to detect clinical progression, that is, from preclinical AD to MCI and dementia, which will require extensive longitudinal follow-up of CN individuals.
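
One standard analysis pattern for such longitudinal validation is a linear mixed-effects model with a biomarker-by-time interaction, where a significant interaction indicates that biomarker-positive participants decline faster on the digital measure. The sketch below fits this model on simulated data with the statsmodels formula API; it illustrates the analysis pattern, not any specific study's model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for pid in range(200):
    abeta = int(pid < 60)                            # ~30% biomarker-positive (assumed)
    intercept = rng.normal(0, 0.5)
    slope = -0.10 * abeta + rng.normal(0, 0.05)      # extra annual decline if Aβ+
    for year in range(5):
        score = intercept + slope * year + rng.normal(0, 0.2)
        rows.append({"id": pid, "year": year, "abeta": abeta, "score": score})

data = pd.DataFrame(rows)

# Random-intercept, random-slope model; the year:abeta interaction tests
# whether Aβ+ participants decline faster than Aβ– participants.
model = smf.mixedlm("score ~ year * abeta", data, groups=data["id"], re_formula="~year")
result = model.fit()
print(result.summary())
```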

4.2. Validation with established cognitive composites

In addition to validation using preclinical AD biomarkers, an alternative means of validation is to compare digital assessments against conventional cognitive measures used in large‐scale studies. This type of validation can supplement biomarker studies or provide important preliminary data before undertaking more costly biomarker studies. A handful of assessment instruments, including a tablet‐based test, an eye‐tracking assessment, and a smartphone app, have been validated against relevant global cognitive composites, indicating some validity for further study of biomarkers in preclinical AD. However, because the correlation between conventional composites and preclinical AD is already weak, paper‐and‐pencil validation alone is not sufficient to claim that a test is suitable as a measure of subtle cognitive change in preclinical AD.
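
As an illustration of this kind of convergent validation, the sketch below z-scores three conventional test scores into a PACC-like composite and correlates it with a digital measure using Spearman's rank correlation, a statistic several of the reviewed studies report. All data and component tests are simulated placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 120  # hypothetical CN participants

# Simulated conventional test scores (columns) sharing a latent cognitive factor.
latent = rng.normal(0, 1, n)
conventional = np.column_stack([
    latent + rng.normal(0, 0.8, n),   # e.g., a list-recall score
    latent + rng.normal(0, 0.8, n),   # e.g., a paragraph-recall score
    latent + rng.normal(0, 0.8, n),   # e.g., a digit-symbol score
])

# PACC-style composite: z-score each component against the sample, then average.
z = (conventional - conventional.mean(axis=0)) / conventional.std(axis=0, ddof=1)
composite = z.mean(axis=1)

# A digital measure that partially tracks the same latent factor.
digital_measure = latent + rng.normal(0, 1.0, n)

rho, p = stats.spearmanr(digital_measure, composite)
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")
```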

4.3. Potential of digital cognitive assessment instruments in different settings

The contexts in which new technology is used impose different requirements on a test's capabilities. The requirements are higher, for instance, if test results serve as outcomes in a clinical study than if they are used to select participants for inclusion into studies. Tablet‐based tests, similar to traditional cognitive test batteries, have already been implemented in clinical trials; they are primarily intended to be administered with the help of a trained rater. Unsupervised and remotely administered tests have not yet shown sufficient robustness to be used in this context, and concerns remain regarding reliability, adherence, privacy, and user identification.

The different digital assessment instruments discussed in this review enable different uses. Supervised digital assessment instruments could provide robust outcomes in clinical trials, with benefits such as automatic recording of responses and automated scoring, making it easier to follow study protocols, reducing the risk of error, and increasing inter‐rater reliability. Remotely administered tests could serve as a cost‐effective pre‐screen before more expensive and invasive examinations, such as lumbar puncture and brain imaging, are recommended. In clinical trials, mobile devices could be used to identify individuals at greatest risk of cognitive decline, who are most likely to benefit from a specific intervention. Close follow‐up of people's cognitive function from their home environment may also enable high‐quality evaluation of interventions.
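
To make the pre-screening argument concrete, the following back-of-the-envelope sketch computes how many confirmatory PET scans are needed per enrolled Aβ+ participant, with and without a remote cognitive pre-screen. The prevalence, sensitivity, and specificity figures are illustrative assumptions, not values from the reviewed studies.

```python
def scans_per_case(prevalence, sensitivity=None, specificity=None):
    """Confirmatory PET scans needed per enrolled Aβ+ participant.

    Without a pre-screen, everyone is scanned; with one, only those who
    screen positive proceed to PET.
    """
    if sensitivity is None:  # no pre-screen: scan everyone
        return 1 / prevalence
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return (true_pos + false_pos) / true_pos

# Illustrative assumptions: 30% Aβ+ prevalence among screened CN adults;
# a remote pre-screen with 80% sensitivity and 70% specificity.
print(f"no pre-screen:  {scans_per_case(0.30):.1f} scans per Aβ+ enrollee")
print(f"with pre-screen: {scans_per_case(0.30, 0.80, 0.70):.1f} scans per Aβ+ enrollee")
```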

4.4. Importance of data security, privacy, and adherence

With the increasing digitization of cognitive testing, issues of data security and privacy have been raised by regulatory authorities. Pharmaceutical companies have also emphasized the importance of these issues. One such consideration is data storage and transmission between servers, which becomes crucial when data are stored and processed on servers that are not under the direct control of the study. When commercial companies are involved, questions can also arise regarding data ownership and conflicts of interest.
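
One common safeguard before data leave the study's direct control is pseudonymization: replacing direct identifiers with keyed hashes so the mapping can be reversed only with a secret held by the study. The sketch below shows this pattern using only Python's standard library; the record fields and secret handling are simplified placeholders (in practice the key belongs in a secrets manager, and transport should additionally be encrypted, e.g., via TLS).

```python
import hashlib
import hmac
import json

STUDY_SECRET = b"rotate-me-and-store-in-a-vault"  # placeholder; never shipped with the data

def pseudonymize_id(participant_id: str) -> str:
    """Keyed hash of a participant ID: stable within the study, but not
    reversible without the study-held secret."""
    return hmac.new(STUDY_SECRET, participant_id.encode(), hashlib.sha256).hexdigest()[:16]

def prepare_for_upload(record: dict) -> str:
    """Strip direct identifiers and replace the ID before transmission."""
    safe = {k: v for k, v in record.items() if k not in {"name", "email", "participant_id"}}
    safe["pid"] = pseudonymize_id(record["participant_id"])
    return json.dumps(safe)

# Hypothetical assessment record with direct identifiers attached.
record = {"participant_id": "SE-0042", "name": "Jane Doe", "email": "jane@example.org",
          "task": "symbols", "median_rt_ms": 912, "accuracy": 0.94}
print(prepare_for_upload(record))
```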

Data protection in the United States is governed by several laws enacted at the federal and state levels (e.g., the Patient Safety and Quality Improvement Act and the Health Information Technology for Economic and Clinical Health Act). In the European Union (EU), the General Data Protection Regulation (GDPR) governs data storage and processing in EU countries and affects scientific cooperation between countries inside and outside the EU. Technological development places increased demands on developers and researchers to familiarize themselves with regulatory issues, especially now that new types of personal data are gathered to a greater extent and across national borders.

Finally, an important and necessary focus is to ensure that data captured remotely, in an uncontrolled environment, are reliable and accurately reflect an individual's cognitive functioning. Adherence is also important here: although there is increasing evidence that unsupervised testing is feasible, large longitudinal health studies indicate significant problems with participant attrition. Work remains to ensure valid and reliable results for participants performing unsupervised testing in large clinical trials.
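
Adherence and attrition can be quantified directly from assessment logs. The sketch below computes each participant's completion rate and end-of-study retention from made-up log entries for a hypothetical 28-day, one-assessment-per-day protocol.

```python
from collections import defaultdict

STUDY_DAYS = 28  # hypothetical protocol: one scheduled assessment per day

# Made-up log: (participant, study_day) for each completed assessment.
log = ([("p1", d) for d in range(1, 29)] +
       [("p2", d) for d in range(1, 15)] +
       [("p3", d) for d in list(range(1, 10)) + list(range(22, 29))])

completed = defaultdict(set)
for pid, day in log:
    completed[pid].add(day)

for pid, days in sorted(completed.items()):
    print(f"{pid}: adherence {len(days) / STUDY_DAYS:.0%}")

# Retention: participants with any activity in the final week (days 22-28).
active_last_week = [p for p, days in completed.items() if any(d >= 22 for d in days)]
print(f"retained at end of study: {len(active_last_week)}/{len(completed)}")
```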

5. CONCLUSION

This review highlights the wealth of digital assessment instruments currently being evaluated in preclinical populations. Digital technology can be used to assess the subtle cognitive decline that defines biomarker‐confirmed preclinical AD. Potential benefits include increased sensitivity and reliability, and digital assessment could add value for individuals through increased accessibility, engagement, and reduced participant burden. Digital assessments may also have implications for clinical trials by optimizing screening, facilitating case finding, and providing more sensitive clinical outcomes. Several promising tests are currently in development and undergoing validation, but work remains before many of them can be considered alongside conventional in‐clinic cognitive assessments. We have only begun to understand the reliability and validity of cognitive assessments obtained in naturalistic environments, an understanding that is required before cognitive testing outside research centers can begin on a large scale. Finally, more feasibility studies investigating potential barriers to implementation are needed, including the challenges of adherence, privacy, and data security.

INTRODUCTION

Alzheimer's disease (AD) involves specific brain changes that begin years or even decades before a person shows signs of dementia. These changes include the buildup of amyloid beta protein (Aβ) into plaques, often followed by the clumping of tau protein (p-tau) into neurofibrillary tangles. During this early, preclinical phase, thinking abilities remain mostly unaffected. However, as the burden of these brain changes increases over time, subtle declines in cognitive function start to emerge. This preclinical period offers a critical opportunity for interventions aimed at preventing further decline, making it essential to identify these subtle cognitive changes.

1.1. Associations between paper‐and‐pencil cognitive measures and AD biomarkers in preclinical AD

For individuals who are clinically normal, abnormal levels of Aβ are considered an early sign of AD pathology. These levels can be measured in cerebrospinal fluid (CSF), such as Aβ42 or the Aβ42/40 ratio, or through positron emission tomography (PET) neuroimaging. In this preclinical stage, the immediate relationship between Aβ levels and cognitive deficits is generally weak or not significant. However, clinically normal individuals with higher Aβ burden show a faster rate of cognitive decline and often progress to a clinical stage more quickly than those with lower Aβ levels. This cognitive decline is subtle and typically becomes detectable only after several years. For instance, one study found that Aβ-positive, clinically normal participants declined at an average rate of −0.42 z-score units over 18 months, while another study reported a decline of −0.07 to −0.15 z-score units per year. The strongest link between AD biomarkers and cognitive decline is consistently observed in memory function, though declines have also been reported in other cognitive areas, including executive function and visuospatial skills.

The second major pathological process in preclinical AD is the accumulation of tau protein into neurofibrillary tangles, which can also be measured in CSF (e.g., p-tau) and with PET neuroimaging. Tau is often considered more directly linked to cognitive impairment during the AD process than Aβ. In clinically normal individuals, a higher tau burden has been associated with memory problems and ongoing cognitive decline. Since tau PET imaging is a newer technique compared to Aβ PET, less is known about the long-term relationship between tau and cognition, including progression to clinical stages. Generally, individuals with higher tau levels face a greater risk of long-term cognitive decline. Importantly, however, this decline is several times faster in Aβ-positive, clinically normal individuals.

1.2. Paper‐and‐pencil versus digitized cognitive assessment

The relationship between traditional paper-and-pencil cognitive measures and AD biomarkers in older adults who are clinically normal is complex. Observed correlations are generally weak, especially when measured at a single point in time. Over longer periods, these relationships are more consistently found and are stronger, with clinically normal older adults showing elevated biomarker levels experiencing cognitive decline. The weak links between cognition and AD biomarkers may partly stem from the limitations of paper-and-pencil assessments. Most of these tests were designed to detect obvious impairment in clinical populations, rather than subtle changes in the preclinical phase. Additionally, normal fluctuations in cognitive performance, the effects of repeated practice, and individual differences in cognitive reserve can make it harder to detect subtle cognitive decline.

The use of digital technology for cognitive assessment offers the potential to overcome some of the limitations of current paper-and-pencil tests. For example, mobile devices allow for more frequent testing, which can result in more reliable and informative longitudinal data. They are also more accessible and cost-effective because individuals can administer the tests themselves. Computerized measures that automatically generate different versions of tests can help minimize practice effects and the impact of using different test forms. Artificial intelligence (AI) methods, such as deep learning, enable faster, new, and potentially more sensitive analysis of cognitive data.

However, digital assessments also present new challenges. Many studies using remote assessments struggle to keep participants engaged over time. Digital storage and sharing of cognitive data raise concerns about data privacy, especially when devices may collect additional identifiable personal information, such as voice recordings. Unsupervised digital assessments require systems to confirm that the person assigned to a remote assessment is indeed the one taking it. The rapid development of technologies and operating systems makes it challenging to select and maintain a single version of a digital assessment over time. Finally, while trends suggest that older adults are increasingly familiar and comfortable with new technology, a significant portion of the population may still be excluded from research using digital assessments due to a lack of familiarity, technical skills, or access.

Digital technology has not yet replaced paper-and-pencil assessments, particularly in clinical trials, because several questions remain unanswered: Do digital technologies capture cognitive information in a way that is comparable to gold-standard paper-and-pencil measures? Is there a fundamental difference between collecting data with a human rater versus a device? How reliable and practical is digital technology? These questions are only now beginning to be widely addressed as the use of digital technology, for instance in preclinical AD research, rapidly evolves.

1.3. Organization of results

This systematic review aims to identify the current digital cognitive tests used in preclinical AD and describe how well these tests have been validated. Validation is assessed in relation to: (1) gold-standard cognitive tests and test composites (paper-and-pencil measures), and (2) biomarkers of Aβ and tau pathology. Furthermore, the review critically discusses the potential benefits and drawbacks of digital cognitive assessments for use in clinical trials and offers a future outlook for digital cognitive assessment. The goal of this review is not to provide an exhaustive overview of all mobile technology or computer testing for general use in older populations, nor does it address the separate field of passive monitoring of cognition using sensors and wearables.

The review first discusses the current understanding of how cognitive performance on conventional paper-and-pencil measures relates to AD biomarkers. Subsequently, it examines digital assessments, categorizing them into three groups based on technology and setting: (1) primarily in-clinic computerized and tablet-based assessments, (2) primarily unsupervised, smartphone- or tablet-based assessments, and (3) novel data collection systems and analysis procedures (e.g., digital pens, eye-tracking, language analysis, and new data analysis methods like AI).

For each digital assessment discussed, validation is presented in terms of (1) biomarker validation and (2) paper-and-pencil validation. Paper-and-pencil validation involves comparing digital measures to conventional measures, such as relevant global cognitive composites (e.g., the Preclinical Alzheimer Cognitive Composite [PACC]) or domain-specific test composites.

METHODS

2.1. Search strategies

Between January and December 2020, electronic databases (PubMed, Scopus, PsycINFO) were searched for relevant publications using terms such as "digital," "mobile," "smartphone," "tablet," "Alzheimer's," "preclinical," and "amyloid." Two online registers (ClinicalTrials.gov and National Institutes of Health [NIH] research portfolio) were also searched for relevant trials and awarded grants. A second search was conducted using the names of digital tests and companies identified in the initial search. Additionally, two conferences, Clinical Trials on Alzheimer's Disease conference (CTAD) 2020 and Alzheimer's Association International Conference (AAIC) 2020, were searched for preliminary results.

2.2. Inclusion and exclusion criteria

Published articles, ongoing studies, and clinical trials using digital cognitive assessment were included if they involved individuals identified with preclinical AD. Preclinical AD was defined by biomarker evidence of Aβ plaque pathology (via cortical Aβ PET ligand binding or low CSF Aβ42) and/or neurofibrillary tangle (NFT) pathology (elevated CSF p-tau or cortical tau PET ligand binding). Following the National Institute on Aging and Alzheimer's Association (NIA-AA) Research Framework revised guidelines, preclinical AD corresponded to the earliest stages in the numeric clinical staging (stage 1–2). Studies that only included participants meeting criteria for clinical diagnoses, such as mild cognitive impairment (MCI) or dementia, were excluded.

2.3. Procedures

A total of 469 articles were screened using the web-app Rayyan. Of these, 458 were excluded for not meeting the inclusion criteria, and 11 were included in the review. Grant applications from the NIH research portfolio were screened, but no additional studies were included from this source. Since the initial literature search, two newly published articles were added. Preliminary results from seven conference presentations, specifically from CTAD 2020 and AAIC 2020, were also included. The resulting body of literature was relatively small, varied, and methodologically inconsistent, which limited the review's methodology. Therefore, a qualitative synthesis was performed instead of a meta-analysis.

RESULTS

3.1. Primarily in‐clinic computerized and tablet‐based cognitive assessment

A well-established area of digital development in cognitive testing involves adapting traditional cognitive measures to computerized platforms, such as Pearson's Q-interactive for the Wechsler Adult Intelligence Scale or the Montreal Cognitive Assessment (MoCA) Electronic Test. Furthermore, clinical trial data management companies, like Medavante and Clinical Ink, have adapted traditional cognitive measures for electronic clinical outcome assessments. Automated scoring and recording reduce common sources of error, but these systems, by their nature, do not fundamentally change neuropsychological testing. Various computerized cognitive tests have been developed to detect cognitive decline. These can include standalone applications and programs, as well as web-based applications that can be completed on personal computers (PCs) or tablets. Some tests are digitized versions of traditional paper-and-pencil neuropsychological tests, while others are newly developed tests designed for self-completion without the active presence of an examiner. Examples include Savonix, BrainCheck, Cogniciti, Mindmore, BAC, NIH-Toolbox, CANTAB, and Cogstate, among others. These differ in their approach, commercialization, security, regulatory readiness, and degree of "gamification." They also vary in their target populations and clinical indications. The focus here is on systems and platforms specifically or mainly designed to detect the earliest cognitive decline in AD. Table 1 provides an overview of the validation of these types of cognitive assessment instruments. Figure 1 illustrates a selection of primarily in-clinic computerized and tablet-based cognitive assessments.

TABLE 1. Validation of primarily in‐clinic computerized and tablet‐based cognitive assessment Abbreviations: Aβ, amyloid beta; β, standardized β coefficients; CN, clinically normal; Cogstate CBB, Cogstate Brief Battery; Cogstate CPAL, Cogstate Continuous Paired Associate Learning; d, Cohen's d; NIHTB‐CB, National Institutes of Health Toolbox Cognition Battery; CANTAB, Cambridge Neuropsychological Test Automated Battery; PACC, Preclinical Alzheimer Cognitive Composite; PiB, Pittsburgh compound B positron emission tomography; ρ r, Spearman correlation; r, correlation coefficient. Note: Only published articles are included in this table.

FIGURE 1. A, Cogstate One Back tests. Copyright© 2020 Cogstate. All rights reserved. Used with Cogstate's permission. B, CANTAB Spatial Span and Paired Associates Learning. Copyright Cambridge Cognition. All rights reserved. C, NIH‐Toolbox Pattern Comparison Processing Speed Test Age 7+ v2.1. Used with permission NIH Toolbox, © 2020 National Institutes of Health and Northwestern University

3.1.1. Cogstate digital cognitive testing system

Cogstate is a commercial company based in Australia. A core principle behind the Cogstate Brief Battery (CBB) was to minimize the impact of language and culture on cognitive assessment. Thus, its measures of response time, working memory, and continuous visual memory use a universal set of common playing cards as stimuli. Other non-card tasks, such as a paired associative learning task and a maze learning task, are also available. This test battery was initially developed in the early 2000s for PCs, where participants responded via keystrokes, but it is now available for tablets. Another founding principle of Cogstate tasks is to provide more reliable measurement of change over time by using randomized alternative versions to reduce confounding practice effects. The Cogstate system was originally designed for administration by an examiner, but recent efforts have focused on remote administration. Once logged into the platform, the tasks are easy to navigate independently. The CBB has recently become available for unsupervised testing via a web browser. A recent report from the Healthy Brain Project in Australia indicated high acceptability and usability for this unsupervised cognitive testing in a non-clinical sample, observing low rates of missing data and psychometric characteristics similar to those from supervised testing.

A more recent version of Cogstate tasks is the C3 (Computerized Cognitive Composite), which includes the CBB along with two measures potentially sensitive to early AD changes, based on evidence from cognitive neuroscience: the Behavioral Pattern Separation–Object Version (BPS-O) and The Face-Name Associative Memory Test (FNAME). Behavioral versions of the FNAME and a modified BPS-O were chosen for inclusion in the C3 because they have shown sensitivity to activity in the medial temporal lobes in individuals at risk for AD based on biomarkers. In a large sample of older adults (n = 4486), C3 performance showed a moderate correlation with cognitive performance on a composite of paper-and-pencil measures (PACC). A smaller study similarly demonstrated this correlation between the C3 and paper-and-pencil measures. It also found that the Cogstate C3 battery's memory tasks were most effective at identifying subtle cognitive impairment, as defined by PACC performance. Together, these findings suggest that these computerized tasks are valid measures of cognitive function and may be used for further study of cognitive decline in preclinical AD. The Cogstate test batteries are currently used in several ongoing studies and clinical trials, including the Wisconsin Registry for Alzheimer's Prevention (WRAP), Alzheimer's Disease Neuroimaging Initiative 3 (ADNI3), Cognitive Health in Ageing Register: Cohort Study, and the Dominantly Inherited Alzheimer Network–Trials Unit (DIAN-TU). The C3 is also currently being used in the Anti-Amyloid Treatment in Asymptomatic Alzheimer's Disease (A4) study and the Study to Protect Brain Health Through Lifestyle Intervention to Reduce Risk.

Regarding preclinical AD biomarker validation, screening data from the A4 study showed that, among a large sample of clinically normal elderly individuals, elevated Aβ as assessed with [18F]florbetapir-PET was associated with slightly worse C3 performance. Other observational studies have not shown cross-sectional associations between CBB performance and Aβ status in preclinical AD, but some studies have demonstrated that Aβ-positive individuals decline on CBB over time. For example, in the Australian Imaging, Biomarkers and Lifestyle (AIBL) study, a decline in episodic and working memory over 36 months was associated with a higher baseline Aβ burden in clinically normal participants. Researchers from the Mayo Clinic Study on Aging used similar methods in a population-based sample but, in contrast, did not find any significant association between Aβ and CBB decline. In another AIBL study, performance on a continuous paired associative learning task (CPAL) within the Cogstate battery was explored for Aβ-positive and Aβ-negative clinically normal individuals. Over 36 months, Aβ-negative individuals showed improved task performance, whereas Aβ-positive individuals showed no practice effect. In clinically normal individuals, the absence of benefit from repeated exposure over time was associated with a higher Aβ burden.

3.1.2. The computerized National Institutes of Health Toolbox Cognition Battery (NIH‐TB)

The National Institutes of Health (NIH) Toolbox Cognitive Battery (TB-CB) was designed as an easily accessible and low-cost tool to provide researchers with standardized, brief cognitive measures for various settings. Its development was a large-scale collaboration involving government funding, over 250 scientists, and more than 80 institutions. It comprises seven established neuropsychological tests, selected and adapted for a digital platform by an expert panel. The NIH TB-CB tests assess a range of cognitive domains, including attention and executive functions, language, processing speed, working memory, and episodic memory. Released in 2012 for PCs, a tablet version is now also available and has been validated against standard neuropsychological measures and established cognitive composites for use in preclinical AD. An examiner is still required to administer the app to ensure valid results, though some tests have recently been adapted for remote administration via screen sharing in a web browser.

The NIH TB-CB is currently used in several clinical trials and longitudinal studies focused on aging and early AD, such as the Risk Reduction for Alzheimer's Disease study, the Comparative Effectiveness Dementia & Alzheimer's Registry, and the ongoing project Advancing Reliable Measurement in Alzheimer's Disease and Cognitive Aging (ARMADA). The ARMADA study, an NIH-funded multi-site project in the United States, aims to validate the NIH Toolbox in diverse clinically normal and clinical cohorts, including previously underrepresented demographic groups. ARMADA also seeks to further facilitate the use of NIH TB-CB in aging research through a consortium with the National Alzheimer's Coordinating Center and collaboration with researchers from other existing cohorts.

Regarding preclinical AD biomarker validation, a few studies examine the NIH TB-CB in aging and dementia populations. However, there is currently a limited number of published studies on the NIH TB-CB and preclinical AD biomarkers. A recent study involving 118 clinically normal older adults did not find an association between AD neuroimaging markers of Aβ and any of the NIH TB-CB cognitive tasks. However, it did find a weak association between measures of processing speed and executive functions and higher Braak regions of tau pathology.

3.1.3. The Cambridge Neuropsychological Test Automated Battery (CANTAB)

The Cambridge Neuropsychological Test Automated Battery (CANTAB) is designed as a language-independent and culturally neutral cognitive assessment tool. It was initially developed by the University of Cambridge in the 1980s and is now commercially provided by Cambridge Cognition. CANTAB has been used in a wide range of clinical settings and clinical trials, including aging studies. CANTAB primarily uses non-verbal stimuli and includes measures of working memory, planning, attention, and visual episodic memory. While initially administered on PCs, it is now available through CANTAB mobile (tablet-based). Additionally, CANTAB offers an online platform for patient recruitment through pre-screening using its cognitive assessment instruments.

Regarding preclinical AD biomarker validation, in the Dallas Lifespan Brain Study, clinically normal individuals underwent Aβ PET with [18F]florbetapir and completed the CANTAB Verbal Recognition Test, which measures memory recall and recognition. In this test, participants view a sequence of words on a touchscreen, then are asked to recall them, and the task concludes with a recognition task. Researchers found that in relatively younger adults (30 to 55 years old), higher Aβ levels were moderately associated with diminished memory recall and recognition. However, this effect weakened as individuals aged and amyloid levels increased.

3.2. Remotely administered tablet‐ and smartphone‐based cognitive assessment

Demographic survey trends in the United States from 2019 showed that 77% of Americans aged 50 and older own smartphones, with this number increasing annually. Similar figures are reported from European countries. Concurrently, there has been a rise in smartphone-based applications designed for cognitive assessment in older populations. The appeal and implications of smartphone-based cognitive assessment for detection and tracking in preclinical AD are clear. It is highly scalable, allowing for remote assessment in a much larger population compared to samples acquired through in-clinic and supervised assessments. It permits more frequent assessment with potentially more sensitive cognitive paradigms. With mobile technology, cognitive assessment can be performed in a familiar environment, which may increase the ecological validity (i.e., generalizability to real-life settings) of the task. Having a participant complete tasks on their own phone, as opposed to a study-issued device, may better reflect their cognition in everyday life. The improved ecological validity of smartphone-based assessment is timely, as researchers and regulators emphasize the importance of demonstrating the clinical meaningfulness of cognitive change in a preclinical AD population. Furthermore, participants being in a familiar environment during cognitive assessments may reduce the risk of the "white-coat effect," where participants underperform in a medical setting. Remote and mobile tracking of cognitive functioning provides an additional opportunity for individuals to track their own cognitive health over time, potentially leading to increased commitment to their well-being. Finally, for those willing to participate in demanding clinical trials, reducing in-clinic visits through remote testing may ease the overall participant burden and encourage individuals in more remote areas to participate.

However, despite the potential of smartphone-based assessment, several issues remain, including challenges related to: (1) feasibility (e.g., older adults’ openness to smartphone assessments, compliance, attrition, privacy issues), (2) validity (e.g., ensuring alignment between smartphone-based and gold-standard cognitive assessment data, guaranteeing the identity of the examinee), and (3) reliability (e.g., variability between hardware and operating systems, diminished control over the test-taking environment).

Given the recent rapid expansion of interest in this area, the focus is on observed themes for smartphone-based instruments that are in early (but varying) stages of development. Identified themes include: (1) improving assessment reliability through ambulatory/momentary testing, (2) using mobile and serial assessment to identify subtle declines in learning and practice effects, (3) targeting cognitive processes more specific to decline in preclinical AD, and (4) harnessing the potential of big-data collection. Validation data concerning in-clinic cognitive assessment and AD biomarkers are discussed where available. Figure 2 provides selected examples of smartphone-based assessment applications. Table 2 displays the validation of remotely administered tablet- and smartphone-based cognitive assessments.

FIGURE 2. A, Ambulatory Research in Cognition (ARC) Symbols Test, Grids Test, and Prices Test. Used with permission from J. Hassenstab. B, neotiv Objects‐in‐Rooms Recall test. Used with permission from neotiv GmbH. C, Boston Remote Assessment for Neurocognitive Health (BRANCH). Used with permission from K. V. Papp

TABLE 2. Validation of remotely administered tablet‐ and smartphone‐based cognitive assessment and other novel types of cognitive assessment Abbrevations: Aβ, amyloid beta; AUC, area under the curve; β, beta interaction effect; CN, clinically normal; CSF, cerebrospinal fluid; d, Cohen's d; ORCA‐LLT, Online Repeatable Cognitive Assessment‐Language Learning Task; PiB, Pittsburgh Compound B PET; ρ r, Spearman's rank correlation; p r, Pearsons correlation coefficient; VPC, Visual Paired Association. Note: Only published articles are included in this table.

3.2.1. Feasibility of using mobile devices to capture cognitive function

While participant retention in longitudinal studies is particularly challenging for remotely administered testing, adherence in short studies shows promise. In a recent study, 1594 clinically normal individuals (aged 40 to 65) completed a testing session using a web-based version of four playing card tasks from the Cogstate battery. High adherence to instructions and low rates of missing data (1.9%) were observed, indicating good acceptability. Error rates remained consistently low across tests and did not vary based on the self-reported environment (e.g., with others present or in a public space). Another recent study investigated adherence over 36 days using a smartphone-based app. Thirty-five clinically normal participants (aged 40 to 59) completed very short daily cognitive tasks; 80% completed all tasks, and 88% remained active participants at the end of the study. More problematic, a recent report from eight digital health studies in the United States, providing app usage data from over 100,000 participants, described significant participant attrition (i.e., participants losing engagement over time), which affects the generalizability of the data obtained. Monetary compensation improved retention, and, promisingly for preclinical AD studies, older age was associated with longer study participation. However, participants in trials that included in-clinic visits showed the highest compliance, suggesting that attrition in fully remote longitudinal studies remains a significant challenge.

3.2.2. Improving reliability: ambulatory/momentary cognitive assessment

The concept behind ambulatory/momentary cognitive assessment is that single-timepoint assessments do not fully capture the natural variability in human cognitive performance, which is influenced by factors like mood, stress, or time of day. Capturing the most representative sample of an individual's cognition at a given interval is one promising approach to improve measurement sensitivity by reducing variability and increasing reliability. Using a "burst" design, a more reliable composite measure of cognitive performance is derived by averaging performance across multiple assessment timepoints administered in short succession (e.g., four assessments per day for 7 days).

Sliwinski et al. developed the brief smartphone-based app Mobile Monitoring of Cognitive Change (M2C2) to capture cognition more frequently in an uncontrolled, naturalistic setting. In a younger (aged 25 to 65), yet highly diverse (9% White) sample, they demonstrated that brief smartphone-based cognitive assessments of perceptual speed and working memory, conducted in an uncontrolled environment, correlated with in-clinic cognitive performance. The high proportion of total variance in performance attributable to differences between people (after accounting for within-person variance across each test session and number of test sessions) illustrated the excellent level of reliability achieved using a burst design.

Similarly, Hassenstab designed the Ambulatory Research in Cognition app (ARC) for use in the DIAN study. Unlike previous studies that relied on study-provided devices, participants download the app onto their own devices and indicate their preferred days and times for testing. Participants then receive notifications to take ARC, which lasts a few minutes, four times a day for one week. ARC evaluates working spatial memory (Grids Test), processing speed (Symbols Test), and associative memory (Prices Test). Preliminary results suggest that ARC is reliable, correlates with in-clinic cognitive measures and AD biomarkers, and is well-received by participants. Further research is needed to determine whether ambulatory cognitive data are (1) more strongly related to AD-biomarker burden in clinically normal older adults compared to conventional in-clinic assessments and (2) whether these data represent a more reliable measure of cognitive and clinical progression than conventional in-clinic assessments.

3.2.3. Using mobile and serial assessment to identify subtle decrements in learning and practice effects

A reduced practice effect, meaning a lack of the typical improvement in performance upon retesting, has been suggested as a subtle indicator of cognitive change before obvious decline. Mobile technology enables much more frequent serial assessment. For example, a recent study provided iPads to 94 participants to take home and complete a challenging associative memory task requiring memory for face-name pairs (FNAME) monthly for one year. The study found an association between diminished learning and greater amyloid and tau PET burden among clinically normal individuals, with Aβ-positive versus Aβ-negative group differences in memory performance becoming apparent by the fourth exposure.

Research using a web-based version of FNAME and other memory tasks, known as the Boston Remote Assessment for Neurocognitive Health (BRANCH), was designed to shift learning paradigms from study-provided tablets to smartphones and to shorten the interval for serial assessment (e.g., from months to days). These tasks focus on cognitive processes supported by the medial temporal lobes, which are thus best suited to characterize AD-related memory changes. BRANCH primarily includes measures of associative memory, pattern separation, and semantically facilitated learning and recall. BRANCH also uses paradigms and stimuli relevant to everyday cognitive tasks.

In a similar vein, the Online Repeatable Cognitive Assessment–Language Learning Test (ORCA-LLT), developed by Dr. Lim at Monash University in Australia, asks participants to learn the English word equivalents of 50 Chinese characters for 25 minutes daily over six days. The task is web-based and completed on a participant's own device at home. The study found that learning curves were diminished in 38 Aβ-positive versus 42 Aβ-negative clinically normal older adults, and the magnitude of this difference was very large.

Assessing learning curves over short time intervals using smartphones may serve as a cost-effective screening tool to enrich study samples for AD biomarker positivity before expensive assays are performed. For example, one clinical study found that lower practice effects over one week were associated with nearly 14 times higher odds of being Aβ-positive on a composite measure using [18F]flutemetamol. Future work with larger clinically normal samples and further optimized learning paradigms may demonstrate similar discriminatory properties of learning curves for AD biomarker positivity in a preclinical sample. Capturing learning curves over short intervals using remote smartphone-based assessment may provide a faster way to determine if a new treatment has beneficial cognitive effects. This could help in more quickly discontinuing ineffective or, more importantly, harmful treatment trials. However, how repeated measures of short-term learning curves can be used to track cognitive progression remains unexplored. Methods to establish this relationship are under development but will require validation studies to overcome logistical and technical challenges.

3.2.4. Targeting relevant cognitive functions

While there is significant variability in the nature and progression of cognitive decline within AD, the availability of AD biomarkers and the adoption of findings from cognitive neuroscience have allowed researchers to focus on cognitive processes potentially more sensitive and specific to AD. For example, researchers from Otto-von-Guericke University in Magdeburg and the German Center for Neurodegenerative Diseases (DZNE) have developed a digital platform, including a mobile app for smartphones and tablets (neotiv). This app consists of memory tests focused on object and scene mnemonic discrimination, pattern completion, face-name association, and complex scene recognition. The object and scene mnemonic discrimination paradigm was designed to capture memory function associated with an object-based (anterior-temporal) and a spatial (posterior-medial) memory system. While functional magnetic resonance imaging (fMRI) studies have shown that both memory systems were activated in individuals performing the task, age-related and performance-dependent changes in functional activity have been observed in the anterior temporal lobe in older adults. Furthermore, two studies in biomarker-characterized individuals revealed that object mnemonic discrimination performance was associated with measures of tau pathology (i.e., anterior temporal tau-PET binding and CSF p-tau levels), while there was evidence for an association of performance in the scene mnemonic discrimination task with Aβ-PET signal in posterior-medial brain regions. The complex scene recognition task has been shown to rely on a wider episodic memory network, and task performance was associated with CSF total-tau levels. All of the above tests have been implemented in a digital platform for unsupervised testing using smartphones and tablets. Recently, relationships between these tests and biomarkers for tau pathology, as well as strong relationships with in-clinic neuropsychological assessments (i.e., Alzheimer's Disease Assessment Scale–Cognitive subscale delayed word recall, PACC), have been demonstrated.

Participants download the app onto their own mobile devices and complete a short introduction and training session, as well as a brief vision screening. Participants then receive notifications to complete tests according to a predefined study schedule to acquire high-frequency longitudinal trajectories. To minimize practice effects, stimulus material has been piloted in large-scale web-based behavioral assessments, and parallel test sets with matched task difficulty have been created. The neotiv platform is currently included in several AD cohort studies (e.g., DELCODE, BioFINDER-2, and WRAP).

3.2.5. Harnessing the potential of “big data”: citizen science projects

Citizen science involves the general public in collaborative projects, for instance, by collecting their own data for research. This approach can be a way to gather large amounts of data on individuals at risk of developing AD. One such citizen science project is The Many Brains Project, with its research platform TestMyBrain.org. The Many Brains Project has generated some of the largest samples in cognition research, with over 2.5 million people tested since 2008; however, this study is not specific to AD. Another initiative, the Models of Patient Engagement for Alzheimer's Disease study, is an EU-funded international project aiming to identify individuals with early AD who might be "hidden" in their communities and typically not found in memory clinic settings. Through web-based cognitive screening instruments, individuals from the public with a heightened risk of AD are identified and invited to a memory clinic for a full diagnostic evaluation.

At Oxford University’s Big Data Institute, researchers have developed the smartphone app Mezurio, which is included in several European studies (e.g., PREVENT, GameChanger, Remote Assessment of Disease and Relapse in Alzheimer's Disease study, BioFINDER-2). One of these, GameChanger, is a citizen science project with over 16,000 participants from the general UK population completing remote, frequent cognitive assessments with the Mezurio app. Through this project, healthy volunteers can perform tests on their smartphones, thereby providing population norms for different age and demographic groups. Mezurio is installed on a smartphone and features several game-like tests examining episodic memory (Gallery Game and Story Time), connected language (Story Time), and executive function (Tilt Task), including multiple recall tasks and longer delays of up to several days. In a recent study investigating Mezurio's feasibility in middle-aged participants, high compliance was demonstrated, indicating that this app may be suitable for longitudinal cognitive follow-up.

In Germany, Dr. Berron et al. from the German Center for Neurodegenerative Diseases (DZNE) developed a Germany-wide citizen science project ("Exploring memory together") focused on the feasibility of unsupervised digital assessments in the general population. Besides demographic factors that affect task performance, several everyday factors (e.g., taking the test in the evening compared to during the day) could contribute to performance on remote unsupervised cognitive assessments. Preliminary results from over 1700 participants (aged 18 to 89) identified important factors to consider in future remote studies, including time of day, the time between learning and retrieval, and (for one task) the screen size of the mobile device. Researchers concluded that investigating memory function using remote and unsupervised assessments is feasible in an adult population.

3.3. Novel data collection systems and analysis procedures

Other promising assessment instruments under evaluation in various preclinical AD studies involve the analysis of spoken language, eye movements, spatial navigation performance, and digital pen stroke data. Some of these tasks require standalone equipment (e.g., eye-tracker, digital pen), while others can utilize existing platforms or devices (e.g., device-embedded cameras in a personal laptop or a smartphone's front-facing camera). Figure 3 illustrates some of these assessment instruments. Table 2 displays the validation of novel types of assessment instruments. Some instruments, such as commercial-grade eye-tracking cameras or digital pens, are still not widely accessible, hindering their widespread implementation. Predominantly passive monitoring of cognition, such as speech recording and eye movement tracking, may prove less stressful and time-consuming than conventional cognitive tests. Participants complete an objective cognitive task (e.g., drawing a clock or describing a picture) while subtle aspects of their performance are recorded (e.g., pen strokes, eye movements, language). This results in a large quantity of performance data, which must then be reduced or synthesized to extract relevant features. Using machine learning (ML) or deep learning, researchers have begun investigating whether automated analyses and classification of test performance according to specific criteria (e.g., biomarker or clinical status) can aid in sensitive screening for preclinical AD. In a clinical context, ML can serve as a clinical decision support system, building prediction models that achieve high accuracy in clinical diagnosis and selecting patients for clinical trials at the early stages of dementia development.

FIGURE 3. A, Sea Hero Quest Wayfinding and Path integration. Used with permission from M. Hornberger. B, Digital Maze Test from survey perspective and landmarks from a first-person perspective. Used with permission from D. Head. C, Data and analysis process for digital Clock Drawing Test (dCDT), from data collection, the artificial intelligence (AI) analysis steps, and the machine learning (ML) analysis and reporting. Used with permission from Digital Cognition Technologies

3.3.1. Spoken language analysis and automated language processing

New technical developments have provided further insight into language deficits in preclinical AD, and several newly developed analysis instruments are now available. For these instruments, speech production is usually recorded during spontaneous speech, verbal fluency tasks, or when describing a picture (typically using the Cookie Theft picture from the Boston Diagnostic Aphasia Examination). Speech is generally recorded using either a standalone audio recorder or an embedded microphone, then transcribed and analyzed using computer software. For example, using a picture task, researchers can analyze speech elements such as verbs and nouns in spontaneous speech, sentence complexity (represented by grammatical complexity and verb usage), word diversity, and the flow of speaking (e.g., speech repetitions and pauses).

Transcribing speech to text, a crucial part of speech analysis, is inherently time-consuming. However, there has been increasing effort to apply machine learning and deep learning technology to detect cognitive impairment in AD. For instance, researchers from the European research project Dem@care and the EIT Digital project ELEMENT demonstrated that automated analysis of verbal fluency could distinguish between healthy aging and clinical AD. They also showed that vocal analysis using smartphone apps could automatically differentiate between different clinical AD groups. In the latter case, speech was recorded while performing short verbal cognitive tasks, including verbal fluency, picture description, counting down, and a free speech task. This data was then used to train automatic classifiers for detecting MCI and AD based on ML methods. The question remains whether these methods can identify early cognitive impairment and decline in preclinical AD.

Intensive development of these methods is ongoing at several research sites. For example, researchers are exploring subtle speech-related cognitive decline in early AD through the European Deep Speech Analysis (DeepSpA) project. The DeepSpA project uses telecommunication-based assessment instruments for early screening and monitoring in clinical trials, employing remote semi-automated analysis methods. In the United States, researchers in the Framingham Health Study are recording and analyzing speech obtained during neuropsychological assessments (from over 8800 examinations of more than 5376 participants). Similarly, in Sweden, researchers are recording speech from over 1000 participants during neuropsychological assessments in the Gothenburg H70 Birth Cohort Studies. The studies mentioned above are in cooperation with the company ki elements, a spin-off of the German Research Center for AI. Another company, Canadian Winterlight Labs, has developed a tablet-based app to identify cognitive impairment using spoken language. The app is currently being evaluated in Winterlight's Healthy Aging Study, an ongoing longitudinal normative study.

Regarding preclinical AD biomarker validation, most researchers have investigated MCI patients. However, researchers from the Netherlands-based Subjective Cognitive Impairment Cohort recorded spontaneous speech in clinically normal individuals performing three open-ended tasks (e.g., describing an abstract painting). After manual transcription, linguistic parameters were extracted using a fully automated, freely available software (T-Scan). Using conventional neuropsychological tests, participants performed within the normal range, regardless of their Aβ status (assessed using either CSF Aβ1-42 or [18F]florbetapir-PET). Interestingly, a modest correlation was observed between abnormal Aβ and subtle speech changes (fewer specific words).

3.3.2. Eye‐tracking

In AD research, commercial-grade eye-tracking cameras have demonstrated their ability to detect abnormal eye movements in clinical groups. These high-frame-rate cameras collect extensive data on eye movement behavior, including saccades (rapid, simultaneous eye movements) and fixations (when the eyes focus on specific areas). Eye movements can be recorded within specific tasks, such as reading a text or performing a memory test. For example, Peltsch et al. measured the ability to inhibit unwanted eye movements within a task using visual stimuli. This data can then be analyzed automatically using commercial software or manually inspected by researchers. However, eye-tracking devices are currently expensive and not widely available in clinical settings. A potential solution may involve using device-embedded cameras (e.g., in a laptop or tablet) to capture eye movements during tasks, such as memory tests. Bott et al. from Neurotrack Technologies showed that low-cost, highly scalable device-embedded cameras (i.e., in a PC) are feasible for capturing valid eye-movement data of sufficient quality. In their study, eye movements were recorded in clinically normal participants during a visual recognition memory task. They observed a modest association between eye movements and cognitive performance on a paper-and-pencil composite. Importantly, both device-embedded cameras and commercial-grade eye-tracking cameras yielded robust data of sufficient quality. This suggests that device-embedded eye-tracking methods may be useful for further study of AD-related cognitive decline in clinically normal individuals. Beyond accuracy of performance, eye trackers provide data on additional eye movement behaviors, opening opportunities for new types of potentially meaningful outcomes.

3.3.3. Digital pen

Digital pens resemble regular pens but contain an embedded camera and sensors that capture position and pen stroke data with high spatial and temporal resolution. Outcomes include time the pen is in the air or on the surface, velocity, and pressure. This allows for the collection of hundreds or thousands of data points and variables, in contrast to traditional paper-and-pencil measures where reaction time and accuracy are the primary outcomes. Big-data techniques like machine learning (ML) can then be applied to these datasets to extract relevant signals. For instance, Digital Cognition Technologies collected data from thousands of individuals completing the standard clock drawing test. They subsequently developed the digital Clock Drawing Test (dCDT), which features an extensive scoring system based on ML techniques that describes performance outcomes related to information processing, simple motor functioning, and reasoning, among many others. This approach enables researchers to capture an individual's inefficiencies in completing a cognitive task despite overall intact performance, offering the potential to systematically collect and analyze much more subtle aspects of behavior.

Regarding preclinical AD biomarker validation, preliminary results from a study of older adults found that worse performance on dCDT, particularly on a visuospatial reasoning subscore, was associated with greater Aβ burden on PET. This performance also showed better discrimination between those with high versus low Aβ compared to standard neuropsychological tests included in a multi-domain cognitive composite.

3.3.4. Virtual reality and spatial navigation

In virtual reality (VR)-based tests, participants perform tasks of varying complexity within computer-generated environments. These tasks are traditionally presented on computer screens (e.g., laptops or tablets) with which participants interact using a joystick, keyboard, touch screen, or VR head-mounted display. The Four Mountains Test is an example of a VR-based test, available for use on iPad. It measures spatial function by alternating viewpoints and textures of a topographical layout of four mountains within a computer-generated landscape. The clinical usefulness of this test has been demonstrated in clinical studies; however, its relationship to preclinical AD biomarkers is still unknown. Another example is a VR path integration task, developed by researchers from Cambridge University. In this task, participants explore virtual open arena environments. Using a professional-grade VR headset, participants are then asked to walk back to specific locations. In a clinical study, this task was superior to other cognitive assessments in differentiating MCI from clinically normal individuals and correlated with CSF biomarkers (total tau and Aβ). This task is currently being evaluated in a preclinical population with biomarker data.

The launch of the online mobile game Sea Hero Quest generated significant interest, and to date, over 4.3 million people have played it. Deutsche Telekom collaborated with scientists from University College London and the University of East Anglia to create this mobile game. The aim is to gather data to create population norms from several countries, enabling the development of easily administered spatial navigation tasks to detect AD. Preliminary results suggest that Sea Hero Quest is comparable to real-world navigation experiments, indicating that it captures more than just video gaming skills.

Regarding preclinical AD biomarker validation, a recent study found that performance on the Sea Hero Quest mobile game could distinguish individuals genetically at risk for AD from those aging normally. Researchers examined Sea Hero Quest performance in a smaller cohort genotyped for apolipoprotein E (APOE). Despite having no clinically detectable cognitive deficits, genetically at-risk individuals performed worse on spatial navigation, and wayfinding performance differentiated APOE ε4 carriers from non-carriers.

In another study, participants from the Knight Alzheimer's Disease Research Center completed a virtual maze task measuring spatial navigation. The maze was created using commercial software and presented on a laptop, with participants navigating via joystick. The mazes consist of interconnected hallways with several landmarks. The findings indicated that Aβ positivity (CSF Aβ42+) was associated with worse wayfinding performance. For inclusion in future studies, the spatial navigation task has been made available for remote use through a web-based interface.
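
Wayfinding in maze tasks is often summarized as path efficiency, the shortest possible route length divided by the route actually traveled. The sketch below computes this under that assumption, which may differ from the study's exact metric.

```python
import math

# Path efficiency for a wayfinding trial: shortest possible route length
# divided by the length of the path actually walked. 1.0 = optimal route;
# values near 0 = heavy wandering. This metric is an assumption, not
# necessarily the outcome used in the study described above.

def path_length(points):
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def path_efficiency(traveled, shortest):
    walked = path_length(traveled)
    return path_length(shortest) / walked if walked else 0.0

# Example: a detour down one extra hallway halves efficiency.
shortest = [(0, 0), (10, 0), (10, 10)]                    # 20 m
traveled = [(0, 0), (10, 0), (20, 0), (10, 0), (10, 10)]  # 40 m
print(path_efficiency(traveled, shortest))  # 0.5
```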

4. DISCUSSION

This systematic review highlights the abundance of digital assessment instruments currently being evaluated in preclinical populations. Digital technology can be used to assess the subtle cognitive decline that characterizes biomarker-confirmed preclinical AD. Potential benefits include increased sensitivity and reliability; digital assessment could also add value for individuals through increased accessibility, engagement, and reduced participant burden. Digital assessments may have implications for clinical trials by optimizing screening, facilitating case finding, and providing more sensitive clinical outcomes. Several promising tests are under development and undergoing validation, but further work is needed before many of them can be considered comparable to conventional in-clinic cognitive assessments. Work to establish the reliability and validity of cognitive assessments obtained in naturalistic environments has begun; such evidence is required before widespread cognitive testing outside research centers can be initiated. Lastly, more feasibility studies investigating potential barriers to implementation are needed, including challenges related to adherence, privacy, and data security.

FIGURE 4. Overview of cognitive tests and their platforms. BRANCH, Boston Remote Assessment for Neurocognitive Health; ORCA‐LLT, Online Repeatable Cognitive Assessment‐Language Learning Test; NIH‐TB, National Institutes of Health Toolbox; CANTAB, Cambridge Neuropsychological Test Automated Battery; ARC, Ambulatory Research in Cognition; M2C2, Mobile Monitoring of Cognitive Change; dCDT, digital Clock Drawing Test. *Available for use through a web browser

4.2. Validation with established cognitive composites

In addition to validation using preclinical AD biomarkers, an alternative means of validation involves comparing digital assessments against conventional cognitive measures used in large-scale studies. This type of validation can supplement biomarker studies or provide important preliminary data before proceeding to more costly biomarker studies. A few assessment instruments, including a tablet-based test, an eye-tracking assessment, and a smartphone app, have been validated against relevant global cognitive composites, indicating some validity for further study of biomarkers in preclinical AD. However, because the correlation between conventional composites and preclinical AD biomarkers is itself weak, validation against paper-and-pencil measures alone is not sufficient to establish that a test can capture subtle cognitive change in preclinical AD.
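
A minimal sketch of this validation step follows, assuming the conventional tests are standardized to z-scores and averaged into a composite (in the style of the PACC) before being correlated with the digital measure; all values are placeholders.

```python
from statistics import mean, stdev

# Validate a digital score against a conventional composite: z-score each
# paper-and-pencil test, average the z-scores per participant, then compute
# the Pearson correlation with the digital measure. Values are placeholders.

def zscores(raw):
    m, s = mean(raw), stdev(raw)
    return [(x - m) / s for x in raw]

def pearson_r(a, b):
    za, zb = zscores(a), zscores(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)

# One list per conventional test; one entry per participant (n = 6).
conventional = [
    [28, 25, 30, 22, 27, 26],    # e.g., a delayed recall score
    [55, 49, 60, 41, 52, 50],    # e.g., a processing speed score
]
digital = [0.8, 0.4, 1.1, -0.2, 0.6, 0.5]

composite = [mean(zs) for zs in zip(*(zscores(t) for t in conventional))]
print(pearson_r(composite, digital))
```

Standardizing before averaging keeps tests with larger raw-score ranges from dominating the composite.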

4.3. Potential of digital cognitive assessment instruments in different settings

The contexts in which new technology is used impose different requirements on a test's capabilities. For instance, the requirements are higher if the test results serve as outcomes in a clinical study than if they are used simply for participant selection into studies. Tablet-based tests, similar to traditional cognitive test batteries, have already been implemented in clinical trials. They are primarily intended for administration with the help of a trained rater. Unsupervised and remotely administered tests have not yet demonstrated sufficient robustness for use in this context, and concerns remain regarding reliability, adherence, privacy, and user identification.

The various digital assessment instruments discussed in this review enable different uses. Supervised digital assessment instruments could provide robust outcomes in clinical trials, offering benefits such as automatic recording of responses and scoring, making it easier to follow study protocols, reducing the risk of error, and increasing inter-rater reliability. Remotely administered tests could serve as a cost-effective pre-screening tool before more expensive and invasive examinations, such as lumbar puncture and brain imaging, are recommended. In clinical trials, mobile devices could be used to identify individuals at the greatest risk of cognitive decline, who are most likely to benefit from a specific intervention. Close follow-up of cognitive function from a person's home environment may also enable high-quality evaluation of interventions.

4.4. Importance of data security, privacy, and adherence

As cognitive testing becomes increasingly digitized, regulatory authorities have raised concerns about data security and privacy, and pharmaceutical companies have likewise emphasized the importance of these issues. One consideration is the storage and transmission of data between servers, which becomes critical when data are stored and processed on servers not under the study's direct control. When commercial companies are involved, questions can also arise regarding data ownership and conflicts of interest.

Data protection in the United States is governed by several laws enacted at federal and state levels (e.g., the Patient Safety and Quality Improvement Act and the Health Information Technology for Economic and Clinical Health Act). In the European Union (EU), the General Data Protection Regulation (GDPR) governs data storage and processing in EU countries, affecting scientific cooperation between countries inside and outside the EU. Technological development places increasing demands on developers and researchers to familiarize themselves with regulatory issues, especially as new types of personal data are collected to a greater extent and across country borders.

Finally, a significant and necessary focus is ensuring that data captured remotely, in an uncontrolled environment, are reliable and accurately reflect an individual's cognitive functioning. Here, adherence also comes into play. Although there is growing evidence that unsupervised testing is feasible, large longitudinal digital health studies also report significant participant attrition. Work remains to ensure valid and reliable results when participants perform unsupervised testing in large clinical trials.

5. CONCLUSION

This review highlights the wide range of digital assessment instruments currently being evaluated in preclinical populations. Digital technology can be used to assess the subtle cognitive decline that defines biomarker-confirmed preclinical AD. Potential benefits include increased sensitivity and reliability; digital assessment could also provide value for individuals through increased accessibility, engagement, and reduced participant burden. Digital assessments may have implications for clinical trials by optimizing screening, facilitating case finding, and providing more sensitive clinical outcomes. Several promising tests are under development and undergoing validation, but further work is needed before many of them can stand alongside conventional in-clinic cognitive assessments. Work to establish the reliability and validity of cognitive assessments obtained in naturalistic environments has begun; such evidence is required before large-scale cognitive testing outside research centers can begin. Lastly, more feasibility studies investigating potential implementation barriers are needed, including challenges related to adherence, privacy, and data security.

Eye-Tracking

In AD research, commercial-grade eye-tracking cameras have been shown to detect abnormal eye movements in clinical groups. These high frame rate cameras collect extensive data on eye movement behavior, including saccades (simultaneous eye movements) and fixation (where eyes focus). Eye movements can be recorded within specific tasks, such as reading a text or performing a memory test. For example, Peltsch and colleagues measured the ability to inhibit unwanted eye movements during a task using visual stimuli. This data can then be analyzed automatically using commercial software or manually by researchers. However, eye-tracking devices are currently expensive and not widely available in clinical settings. A potential solution may be the use of device-embedded cameras (e.g., in a laptop or tablet) to capture eye movements during tasks like memory tests. Bott and colleagues from Neurotrack Technologies demonstrated that low-cost, highly scalable device-embedded cameras (e.g., in a personal computer) are feasible for capturing valid eye-movement data of sufficient quality. In their study, eye movements were recorded in healthy participants during a visual recognition memory task. They observed a modest association between eye movements and cognitive performance on a paper-and-pencil composite. Importantly, both device-embedded cameras and commercial-grade eye-tracking cameras yielded robust data of sufficient quality. This suggests that device-embedded eye-tracking methods may be useful for further study of AD-related cognitive decline in healthy individuals. Beyond accuracy of performance, eye trackers provide data on additional eye movement behaviors, opening opportunities for new types of potentially meaningful outcomes.

Digital Pen

Digital pens resemble regular pens but contain an embedded camera and sensors that capture position and pen stroke data with high spatial and temporal precision. The data collected includes time spent with the pen in the air and on the surface, velocity, and pressure. This results in hundreds or thousands of data points and variables, in contrast to traditional paper-and-pencil measures where reaction time and accuracy are typically the main outcomes. Big-data techniques like machine learning (ML) can then be applied to these datasets to extract relevant signals. For example, Digital Cognition Technologies collected data from thousands of individuals completing the standard clock drawing test. They subsequently developed the digital Clock Drawing Test (dCDT), which features an extensive scoring system based on ML techniques that describes performance outcomes related to information processing, simple motor functioning, and reasoning, among many others. This approach allows researchers to capture an individual's inefficiencies in completing a cognitive task, even if their overall performance appears intact, which has the potential to systematically collect and analyze much more subtle aspects of behavior.

Preclinical AD Biomarker Validation

Preliminary results from a study of older adults showed that worse performance on the digital Clock Drawing Test (dCDT), particularly on a visuospatial reasoning subscore, was associated with a greater amyloid beta burden on PET scans. The dCDT also showed better discrimination between individuals with high versus low amyloid beta levels compared to standard neuropsychological tests included in a multi-domain cognitive composite.

Virtual Reality and Spatial Navigation

In virtual reality (VR)-based tests, participants perform tasks of varying complexity within computer-generated environments. These tasks are typically presented on computer screens (e.g., laptops or tablets) with which participants interact using a joystick, keyboard, touchscreen, or a VR head-mounted display. The Four Mountains Test is an example of a VR-based test available for use on an iPad. It measures spatial function by alternating viewpoints and textures of the four mountains' topographical layout within a computer-generated landscape. The clinical utility of this test has been demonstrated in clinical studies; however, its relationship to preclinical AD biomarkers is still unknown. Another example is a VR path integration task, developed by researchers from Cambridge University. In this task, participants are asked to explore virtual open arena environments. Using a professional-grade VR headset, participants are then asked to walk back to specific locations. In a clinical study, this task was superior to other cognitive assessments in differentiating mild cognitive impairment (MCI) from healthy individuals and correlated with cerebrospinal fluid biomarkers (total tau and amyloid beta). This task is currently being evaluated in a preclinical population with biomarker data.

The launch of the online mobile game Sea Hero Quest generated significant interest, with over 4.3 million people having played it to date. Deutsche Telekom collaborated with scientists from University College London and University of East Anglia to create this mobile game. The aim is to gather data to create population norms from several countries, enabling the development of easily administered spatial navigation tasks for AD detection. Preliminary results suggest that Sea Hero Quest is comparable to real-world navigation experiments, indicating that it captures more than just video gaming skills.

Preclinical AD Biomarker Validation

In a recent study, performance on the Sea Hero Quest mobile game was found to distinguish healthy aging from individuals genetically at risk for AD. Researchers used Sea Hero Quest performance in a smaller cohort genotyped for apolipoprotein E (APOE). Despite having no clinically detectable cognitive deficits, genetically at-risk individuals performed worse on spatial navigation. Wayfinding performance was able to differentiate between APOE carriers and non-carriers.

In another study, participants from the Knight Alzheimer's Disease Research Center completed a virtual maze task measuring spatial navigation. This virtual maze was created using commercial software and presented on a laptop, with participants navigating using a joystick. The mazes consist of a series of interconnected hallways with several landmarks. Their findings indicated that amyloid beta-positive status (cerebrospinal fluid Aβ42+) was associated with lower wayfinding performance. For inclusion in future studies, the spatial navigation task has been made available for remote use through a web-based interface.

Discussion

This systematic review identified an abundance of digital assessment instruments currently being evaluated in preclinical populations, spanning platforms from computers, tablets, and smartphones to external hardware such as digital pens and eye-trackers. These instruments differ in their intended setting (remote vs. in-clinic), level of supervision (self-administered vs. supervised), and device origin (personal vs. study-provided). Validation studies are more plentiful for established platforms (e.g., computers, tablets) than for newer ones (e.g., smartphones), although many recently developed tests are actively being evaluated in biomarker studies.

A crucial requirement for early detection is a test's ability to capture subtle cognitive impairment, or decline over time, in preclinical AD. Primarily in-clinic tests have shown cross-sectional associations with preclinical AD biomarkers comparable to traditional neuropsychological assessments, with predominantly weak to moderate effects. Remotely administered tests are less thoroughly explored, but several preclinical biomarker studies are under way: preliminary results link smartphone-based memory tests to tau pathology, and a small but promising remote web-based learning assessment found markedly slower learning curves in individuals with elevated amyloid beta. Novel instruments, including speech analysis, eye-tracking, virtual reality tasks, and the digital clock drawing test, have likewise shown potential against preclinical AD biomarkers. Future longitudinal studies should examine whether these measures track biomarker change over time and detect clinical progression from preclinical AD to MCI and dementia.

Validation with Established Cognitive Composites

In addition to validation using preclinical AD biomarkers, an alternative method of validation involves comparing digital assessments against conventional cognitive measures used in large-scale studies. This type of validation can supplement biomarker studies or provide important preliminary data before embarking on more costly biomarker studies. A few assessment tools, including a tablet-based test, an eye-tracking assessment, and a smartphone app, have been validated against relevant global cognitive composites, suggesting sufficient validity to warrant further biomarker studies in preclinical AD. However, given that the correlation between conventional cognitive composites and preclinical AD biomarkers is already weak, validation against paper-and-pencil tests alone is not sufficient to claim that a test can measure the subtle cognitive changes of preclinical AD.
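For context, a global cognitive composite of the kind referred to here is typically formed by standardizing each component test against a reference sample and averaging the resulting z-scores. A minimal sketch of that construction follows; the component tests and reference values are made-up assumptions.

```python
import numpy as np

def composite_z(component_scores, ref_means, ref_sds):
    """Average of per-test z-scores, each standardized against a
    reference sample's mean and standard deviation."""
    z = ((np.asarray(component_scores) - np.asarray(ref_means))
         / np.asarray(ref_sds))
    return float(z.mean())

# Hypothetical participant: word-list recall, digit-symbol, and a
# brief global screen, standardized against assumed reference norms.
print(composite_z(component_scores=[22, 41, 28],
                  ref_means=[25, 45, 28.5],
                  ref_sds=[5, 10, 1.5]))  # about -0.44
```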

Potential of Digital Cognitive Assessment Instruments in Different Settings

The environments in which new technology is used impose different requirements on a test's capabilities. For instance, requirements are higher if test results are primary outcomes in a clinical study than if they are used simply for participant selection. Tablet-based tests, similar to traditional cognitive test batteries, have already been implemented in clinical trials. They are primarily designed to be administered with the help of a trained evaluator. Unsupervised and remotely administered tests have not yet shown sufficient robustness for this context, and concerns remain regarding their reliability, participant adherence, privacy, and user identification.

The various digital assessment instruments discussed in this review enable different uses. Supervised digital assessment instruments could provide robust outcomes in clinical trials, offering benefits such as automatic recording of responses and scoring, making it easier to follow study protocols, reducing the risk of error, and increasing consistency between evaluators. Remotely administered tests could serve as a cost-effective pre-screening tool before more expensive and invasive examinations, such as lumbar punctures and brain imaging, are recommended. In clinical trials, mobile devices could be used to identify individuals at the greatest risk of cognitive decline, who are most likely to benefit from a specific intervention. Close monitoring of cognitive function from a person's home environment may also enable high-quality evaluation of interventions.
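The pre-screening use case above can be made concrete with simple arithmetic. Assuming, purely for illustration, a 30% Aβ+ prevalence among clinically normal volunteers and a digital pre-screen with 80% sensitivity and 70% specificity, the sketch below computes the positive predictive value among screen-positives and the fraction of confirmatory PET scans avoided relative to scanning everyone.

```python
# Illustrative screening arithmetic with assumed parameters.
prevalence = 0.30    # assumed Abeta+ rate among clinically normal volunteers
sensitivity = 0.80   # assumed: P(screen+ | Abeta+)
specificity = 0.70   # assumed: P(screen- | Abeta-)

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * (1 - specificity)
screen_pos = true_pos + false_pos

ppv = true_pos / screen_pos   # Abeta+ rate among those sent on to PET
scans_saved = 1 - screen_pos  # fraction of confirmatory scans avoided

print(f"PPV among screen-positives: {ppv:.0%}")       # ~53% vs. 30% baseline
print(f"Confirmatory scans avoided: {scans_saved:.0%}")  # ~55% fewer scans
```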

Importance of Data Security, Privacy, and Adherence

As cognitive testing becomes increasingly digitized, regulatory authorities have raised concerns about data security and privacy, and pharmaceutical companies have likewise emphasized the importance of these issues. One such consideration is how data are stored and transmitted between servers, which becomes especially important when data are processed on servers not directly controlled by the study. When commercial companies are involved, questions can also arise regarding data ownership and conflicts of interest.

Data protection in the United States is governed by several federal and state laws (e.g., the Patient Safety and Quality Improvement Act and the Health Information Technology for Economic and Clinical Health Act). In the European Union (EU), the General Data Protection Regulation (GDPR) governs data storage and processing in EU countries and affects scientific cooperation between countries both inside and outside the EU. Technological development places increasing demands on developers and researchers to become familiar with regulatory issues, especially now that new types of personal data are collected more extensively and across national borders.

Finally, an important and necessary focus is ensuring that data captured remotely, in an uncontrolled environment, are reliable and accurately reflect an individual's cognitive functioning. Here, adherence becomes central: although there is increasing evidence that unsupervised testing is feasible, large long-term digital health studies also report substantial participant attrition. Work remains to ensure valid and reliable results from participants performing unsupervised testing in large clinical trials.
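Adherence in remote protocols is often summarized as the proportion of scheduled sessions each participant completes, with a pre-specified threshold defining usable data. The sketch below shows that bookkeeping with hypothetical session counts and an assumed inclusion threshold.

```python
# Hypothetical adherence summary for a remote testing protocol.
scheduled = 28  # e.g., four brief sessions per day over seven days
completed = {"p01": 26, "p02": 9, "p03": 22}  # made-up participant counts

THRESHOLD = 0.60  # assumed minimum adherence for inclusion in analysis
for pid, done in completed.items():
    rate = done / scheduled
    status = "include" if rate >= THRESHOLD else "exclude"
    print(f"{pid}: {rate:.0%} adherence -> {status}")
```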

Conclusion

This review highlights the wide array of digital assessment tools currently being evaluated in populations at risk for Alzheimer's disease. Digital technology offers a means to assess the subtle cognitive decline that defines biomarker-confirmed preclinical AD. Potential benefits include increased sensitivity and reliability, as well as value to individuals through greater accessibility, engagement, and reduced participant burden. Digital assessments may also benefit clinical trials by optimizing screening, facilitating case identification, and providing more sensitive clinical outcomes. Several promising tests are currently in development and undergoing validation, but further work is needed before many of them can be considered equivalent to conventional in-clinic cognitive assessments. Researchers have begun to characterize the reliability and validity of cognitive assessments obtained in naturalistic environments, a prerequisite for large-scale cognitive testing outside of research centers. Lastly, more feasibility studies are needed to investigate potential barriers to implementation, including adherence, privacy, and data security.

Open Article as PDF

Abstract

There is a pressing need to capture and track subtle cognitive change at the preclinical stage of Alzheimer's disease (AD) rapidly, cost-effectively, and with high sensitivity. Concurrently, the landscape of digital cognitive assessment is rapidly evolving as technology advances, older adult tech-adoption increases, and external events (i.e., COVID-19) necessitate remote digital assessment. Here, we provide a snapshot review of the current state of digital cognitive assessment for preclinical AD including different device platforms/assessment approaches, levels of validation, and implementation challenges. We focus on articles, grants, and recent conference proceedings specifically querying the relationship between digital cognitive assessments and established biomarkers for preclinical AD (e.g., amyloid beta and tau) in clinically normal (CN) individuals. Several digital assessments were identified across platforms (e.g., digital pens, smartphones). Digital assessments varied by intended setting (e.g., remote vs. in-clinic), level of supervision (e.g., self vs. supervised), and device origin (personal vs. study-provided). At least 11 publications characterize digital cognitive assessment against AD biomarkers among CN. First available data demonstrate promising validity of this approach against both conventional assessment methods (moderate to large effect sizes) and relevant biomarkers (predominantly weak to moderate effect sizes). We discuss levels of validation and issues relating to usability, data quality, data protection, and attrition. While still in its infancy, digital cognitive assessment, especially when administered remotely, will undoubtedly play a major future role in screening for and tracking preclinical AD.

Introduction

Alzheimer's disease (AD) involves major changes in the brain, like the buildup of amyloid beta (Aβ) protein into plaques and tau protein into tangles. These changes can start years or even decades before a person shows signs of memory loss or thinking problems. During this early, "preclinical" stage, a person's cognitive functions, such as memory and thinking skills, are mostly unaffected. However, as these brain changes grow over time, subtle declines in cognitive abilities may start to appear. Catching these small changes early in the preclinical phase is very important, as this period offers a promising chance to prevent the disease from worsening.

Associations Between Traditional Cognitive Tests and AD Biomarkers

For individuals who do not show clinical symptoms, abnormal levels of Aβ, detected in cerebrospinal fluid (CSF) or through brain scans (PET imaging), indicate the start of an early AD process. In this preclinical stage, the immediate link between Aβ levels and cognitive problems is usually weak or unclear. However, individuals with higher Aβ levels tend to experience a faster decline in cognitive function over time and often progress to a clinical stage more quickly than those with lower Aβ levels. This cognitive decline is subtle and may only be noticed over several years. For instance, some studies have shown small but consistent declines in cognitive scores over 18 months or annually in those with higher Aβ. The strongest connection between AD biomarkers and cognitive decline is typically seen in memory function, though declines in other areas like executive function and visual-spatial skills have also been reported.

Tau protein aggregation is the second major brain change in preclinical AD, also measurable in CSF and with PET imaging. Tau is generally considered more directly related to cognitive problems during the AD process than Aβ. In clinically normal individuals, higher tau levels have been linked to memory difficulties and ongoing cognitive decline. Since tau PET imaging is newer than Aβ PET, less is understood about the long-term relationship between tau and cognition, including later progression to clinical AD. Generally, individuals with higher tau levels face a greater risk of long-term cognitive decline. Importantly, this decline is several times faster in individuals who also have high Aβ levels.

Traditional vs. Digital Cognitive Assessment

The relationship between traditional paper-and-pencil cognitive tests and AD biomarkers in older adults who are clinically normal is complicated, with observed links often being weak, especially when measured at a single point in time. Over longer periods, these relationships are more consistently observed and are stronger, showing that clinically normal older adults with high biomarker levels do experience cognitive decline. The weak links between cognition and AD biomarkers might partly be due to the limitations of paper-and-pencil tests. Most of these tests were designed to identify clear cognitive impairment in patients, not subtle preclinical changes. Also, normal changes in cognitive performance, improvements from repeated testing, and cognitive reserve (the brain's ability to cope with damage) can make it harder to detect subtle cognitive decline.

Digital technology offers a way to address some of the limitations of current paper-and-pencil assessments. For example, mobile devices allow for more frequent testing, which provides more reliable and useful long-term data. They are also more accessible and affordable because they can be used without supervision. Computerized tests that automatically create different versions of the same test can help minimize the effects of practice or simply using the same test repeatedly. Furthermore, artificial intelligence (AI) methods, such as deep learning, enable faster, new, and potentially more sensitive analysis of cognitive data.

However, digital assessments also present new challenges. Many studies using remote assessments find it hard to keep participants engaged. Storing and sharing digital cognitive data raise concerns about data privacy, especially if devices collect additional personal information, like voice recordings. Digital assessments performed without supervision require systems to confirm that the person taking the test is indeed the correct participant. The rapid development of technologies and operating systems makes it difficult to choose and maintain a single version of a digital assessment over time. Lastly, while older adults are becoming more comfortable with new technology, a significant number may still be excluded from research using digital assessments due to a lack of familiarity, technical skills, or access.

Despite their potential, digital technology has not yet fully replaced paper-and-pencil assessments, particularly in clinical trials. Several questions remain unanswered: Does digital technology capture cognitive information similar to traditional "gold-standard" paper-and-pencil measures? Is there a fundamental difference between gathering data with a human examiner versus a device? How reliable and practical is digital technology? These questions are just beginning to be explored more widely as the use of digital technology quickly expands, for example, in research on preclinical AD.

Review Objectives

This document aims to systematically review the current state of digital cognitive tests for use in preclinical AD. It describes how these digital tests are validated against two key standards: (1) traditional paper-and-pencil cognitive tests and (2) biomarkers related to amyloid beta and tau pathology. Furthermore, the document critically discusses the benefits and drawbacks of using digital cognitive assessments in clinical trials and provides a future outlook for this field. The goal is not to offer a complete overview of mobile or computer testing for older adults in general, nor does it cover passive monitoring of cognition using sensors and wearable devices.

The content first describes the current understanding of how performance on traditional paper-and-pencil tests relates to AD biomarkers. Then, digital assessments are discussed, organized into three groups based on their technology and setting: (1) primarily in-clinic computerized and tablet-based tests, (2) primarily unsupervised smartphone- or tablet-based tests, and (3) new data collection and analysis methods (like digital pens, eye-tracking, and language analysis, including AI approaches). For each digital assessment, its validation against biomarkers and against paper-and-pencil measures is reviewed. Paper-and-pencil validation involves comparing digital measures to standard cognitive composites or domain-specific test scores.

Methods

Researchers searched three electronic databases and two online registries from January to December 2020 for relevant publications and trials. A second search was conducted using specific names of digital tests and companies identified in the first search. Two major conferences from 2020 were also reviewed for preliminary results. Studies using digital cognitive assessment were included if they involved individuals identified with preclinical AD. Preclinical AD was defined by biomarker evidence of amyloid plaque pathology (from PET scans or CSF Aβ42 levels) and/or tau pathology (elevated CSF p-tau or tau PET scans), corresponding to the earliest stages of the disease. Studies that only included participants with clinical diagnoses like mild cognitive impairment (MCI) or dementia were excluded.

A total of 469 articles were initially screened, with 458 being excluded for not meeting the criteria, leaving 11 for review. Two additional newly published articles and preliminary results from seven conference presentations were also included. The resulting body of literature was small, varied, and inconsistent in its methods, leading to a qualitative summary rather than a statistical meta-analysis.

Results

In-Clinic Computer and Tablet-Based Tests

A common approach in digital cognitive testing involves adapting traditional cognitive measures for computerized platforms. Examples include electronic versions of standard tests like the Wechsler Adult Intelligence Scale or the Montreal Cognitive Assessment (MoCA). Clinical trial companies have also adapted traditional tests for electronic use. These systems simplify scoring and recording, reducing common errors, but they do not fundamentally change how neuropsychological tests are designed. Several computerized cognitive tests have been developed to detect cognitive decline. These can be stand-alone applications or web-based programs usable on computers or tablets. Some are digital versions of traditional paper-and-pencil tests, while others are new tests designed to be completed without a human examiner. Examples include Savonix, BrainCheck, Cogniciti, Mindmore, BAC, NIH-Toolbox, CANTAB, and Cogstate. These vary in their approach, commercial status, security, and game-like features. They also target different populations and conditions. This section focuses on systems designed specifically to detect the earliest cognitive decline in AD.

The Cogstate digital cognitive testing system, developed in Australia, aims to reduce the impact of language and culture on cognitive assessment by using universal stimuli like playing cards. It measures response time, working memory, and visual memory. Cogstate also uses randomized alternative versions of tests to make changes over time more reliable by reducing the effects of practice. While initially designed for an examiner to administer, recent efforts have focused on remote administration, with tasks easy for individuals to complete independently. A recent study found high acceptance and usability for unsupervised Cogstate testing in a non-clinical group, showing low rates of missing data and similar test characteristics to supervised testing. A newer version of Cogstate tasks, the C3, includes additional measures sensitive to early AD changes. In a large group of older adults, C3 performance showed a moderate link to scores on a composite of paper-and-pencil measures, suggesting these computerized tasks are valid for studying cognitive decline in preclinical AD. Cogstate tests are used in several ongoing studies and clinical trials, including those for Alzheimer's prevention. Regarding preclinical AD biomarker validation, initial findings suggest a link between higher Aβ levels and slightly worse C3 performance in a large sample of clinically normal older adults. While some studies have not found a cross-sectional link between Cogstate tests and Aβ status, some long-term studies have shown that individuals with higher Aβ decline on these tests over time. For instance, an Australian study found declines in episodic and working memory over 36 months were linked to higher baseline Aβ levels. The absence of improvement from repeated testing in clinically normal individuals has also been linked to higher Aβ burden.

The computerized National Institutes of Health Toolbox Cognition Battery (NIH-TB) was created to provide researchers with easily accessible and affordable standard cognitive measures for various settings. It was a large project involving many scientists and institutions. The NIH TB-CB includes seven established neuropsychological tests adapted for a digital platform, assessing areas like attention, executive functions, language, processing speed, and memory. It was released in 2012 for computers and is now available on tablets, validated against standard neuropsychological measures and cognitive composites used in preclinical AD. An examiner is still generally needed for administration, though some tests can now be done remotely. The NIH TB-CB is used in several clinical trials and long-term studies focused on aging and early AD. One large ongoing project, ARMADA, aims to further validate the NIH Toolbox in diverse groups and make it more widely available for aging research. For preclinical AD biomarker validation, few studies have examined NIH TB-CB with these biomarkers. One study in clinically normal older adults did not find a link between AD brain imaging markers of Aβ and any NIH TB-CB tasks. However, it did find a weak link between measures of processing speed and executive functions and higher levels of tau pathology.

The Cambridge Neuropsychological Test Automated Battery (CANTAB) is a cognitive assessment tool designed to be independent of language and culture. Developed in the 1980s, it is now commercially available and used widely in clinical settings and trials, including aging studies. CANTAB uses mostly non-verbal tasks and assesses working memory, planning, attention, and visual episodic memory. It was initially for computers but is now available on tablets. CANTAB also offers an online platform for pre-screening patients. In one study of clinically normal individuals, higher Aβ levels were moderately linked to reduced memory recall and recognition on a CANTAB test, especially in younger adults. This effect became less pronounced as people aged and amyloid levels increased.

Remote Tablet and Smartphone-Based Tests

Recent trends show that most Americans aged 50 and older own smartphones, and this number is growing, mirroring similar trends in Europe. At the same time, more smartphone apps are being designed for cognitive assessment in older populations. The appeal of smartphone-based cognitive assessment for detecting and tracking preclinical AD is clear: it can reach many more people remotely than in-clinic assessments. It also allows for more frequent testing with potentially more sensitive cognitive tasks. Mobile technology enables cognitive assessment in a familiar environment, which may make the test results more relevant to real-life situations. Having participants use their own phones might better reflect their daily cognitive function. Improved real-world relevance is important as researchers and regulators emphasize showing how cognitive changes in preclinical AD are meaningful in a clinical context. Additionally, taking tests at home may reduce "white-coat effects" (where participants perform worse in a medical setting) and encourage individuals to track their own cognitive health, potentially increasing their commitment to well-being. Finally, for those in demanding clinical trials, remote testing can reduce the need for in-clinic visits, lessening the burden on participants and encouraging those in remote areas to join.

Despite the potential of smartphone-based assessment, several issues persist. These include challenges related to how practical they are (e.g., older adults' willingness to use smartphones for assessments, how consistently they use them, dropout rates, and privacy concerns). Validity is also a challenge (e.g., ensuring smartphone data matches gold-standard cognitive assessment data and confirming the identity of the person taking the test). Finally, reliability is a concern (e.g., variations between different hardware and operating systems, and reduced control over the testing environment).

Given the recent rapid growth in this area, observed trends for smartphone-based instruments, which are in early stages of development, include: (1) improving assessment reliability through frequent, momentary testing; (2) using mobile and repeated assessments to identify subtle drops in learning and practice effects; (3) focusing on cognitive processes more specific to decline in preclinical AD; and (4) leveraging the potential of large data collection. Validation data related to in-clinic cognitive assessment and AD biomarkers are discussed when available.

The feasibility of using mobile devices to capture cognitive function appears promising for short studies, despite the challenge of retaining participants in long-term remote studies. One study of over 1,500 clinically normal individuals who used a web-based version of Cogstate tasks showed high adherence and low rates of missing data, indicating good acceptance. Another study tracking 35 clinically normal participants daily for 36 days using a smartphone app found high completion rates. However, a report from eight digital health studies with over 100,000 participants noted significant participant dropout over time. Monetary incentives improved retention, and older participants tended to stay in studies longer, which is a good sign for preclinical AD research. Still, studies with in-clinic visits had the highest compliance, suggesting that long-term fully remote studies face significant challenges with participant retention.

Improving reliability through ambulatory/momentary cognitive assessment rests on the idea that single-timepoint assessments fail to capture the natural variability in human cognitive performance, which is affected by factors like mood, stress, or time of day. A promising approach to improve measurement sensitivity by reducing variability and increasing reliability is to capture a more representative sample of an individual's cognition by averaging performance over multiple assessments taken in quick succession. One smartphone app, Mobile Monitoring of Cognitive Change (M2C2), aims to capture cognition more frequently in uncontrolled, natural settings. It showed that short smartphone-based cognitive assessments of perceptual speed and working memory correlated with in-clinic performance, with a high proportion of performance differences attributed to individual variations, indicating excellent reliability with this approach. Similarly, the Ambulatory Research in Cognition (ARC) app, used in a major Alzheimer's study, allows participants to download the app onto their own devices and receive notifications to take short tests multiple times a day for a week. Early results suggest ARC is reliable, correlates with in-clinic cognitive measures and AD biomarkers, and is well-liked by participants. Further research is needed to determine if this frequent mobile data relates more strongly to AD biomarker burden and provides a more reliable measure of cognitive and clinical progression than traditional in-clinic assessments.

Using mobile and serial assessment to identify subtle decrements in learning and practice effects is a promising area. A reduced practice effect (less improvement upon retesting) has been suggested as a subtle sign of cognitive change before clear decline. Mobile technology allows for much more frequent repeated assessments. For example, one study gave iPads to participants to complete a challenging associative memory task monthly for a year. They found that less learning was linked to greater amyloid and tau PET burden in clinically normal individuals, with differences in memory performance appearing by the fourth test. Other work, using web-based memory tasks (Boston Remote Assessment for Neurocognitive Health – BRANCH), aims to move these learning tasks to smartphones and reduce the time between assessments to days. These tasks focus on cognitive processes supported by brain regions critical for AD-related memory changes. Similarly, the Online Repeatable Cognitive Assessment-Language Learning Test (ORCA-LLT) asks participants to learn new words daily over several days on their own devices. This study found that learning curves were significantly slower in clinically normal individuals with higher Aβ levels compared to those with lower levels. Assessing learning curves over short periods using smartphones could be a cost-effective screening tool to identify individuals likely to have positive AD biomarkers before more expensive tests. Such an approach might also quickly show if a new treatment benefits cognition, potentially allowing for quicker decisions in clinical trials. However, how these short-term learning curves can track cognitive progression over time still needs to be explored, requiring further validation studies.

Targeting relevant cognitive functions has become possible with AD biomarkers and insights from cognitive neuroscience. Researchers have focused on cognitive processes that might be more sensitive and specific to AD. For instance, a digital platform and mobile app called neotiv, developed by researchers in Germany, offers memory tests focusing on object and scene discrimination, pattern completion, and face-name association. These tests are designed to capture memory functions linked to specific brain regions affected by AD. Studies have shown links between performance on these tasks and measures of tau pathology and Aβ-PET signals. All these tests are available on a digital platform for unsupervised testing on smartphones and tablets. Participants download the app, undergo a brief training, and then complete tests according to a study schedule to collect frequent, long-term data. To minimize practice effects, new test sets with similar difficulty are created. The neotiv platform is currently used in several AD cohort studies.

Harnessing the potential of "big data" through citizen science projects involves the public in collaborative research, often by collecting their own data. This can generate large datasets on individuals at risk of AD. The Many Brains Project, with its TestMyBrain.org platform, has collected data from millions of people, though not specifically for AD. Another initiative, the Models of Patient Engagement for Alzheimer's Disease study, aims to find individuals with early AD who are not typically seen in memory clinics by using web-based cognitive screening. Researchers at Oxford University's Big Data Institute developed the Mezurio smartphone app, which is part of several European studies. One project, GameChanger, uses Mezurio for citizen science, with over 16,000 UK participants completing frequent remote cognitive assessments. This provides population norms for different age and demographic groups. Mezurio offers game-like tests for episodic memory, language, and executive function. A recent study found high compliance in middle-aged participants, suggesting the app's suitability for long-term cognitive follow-up. In Germany, a citizen science project called "Exploring memory together" focused on the feasibility of unsupervised digital assessments in the general population. Preliminary results from over 1,700 participants showed that factors like time of day, time between learning and recall, and mobile device screen size can affect performance on remote unsupervised cognitive assessments. This suggests that studying memory function remotely in adults is feasible, but these factors need consideration.

New Data Collection and Analysis Methods

Other promising assessment tools for preclinical AD studies include analysis of spoken language, eye movements, spatial navigation performance, and data from digital pens. Some of these methods require specialized equipment (like eye-trackers or digital pens), while others can use existing devices (like embedded cameras in laptops or smartphones). These instruments, such as commercial eye-tracking cameras or digital pens, are not yet widely accessible, which limits their broad use. Passive monitoring of cognition, like speech recording and eye-tracking, might be less stressful and time-consuming than traditional cognitive tests. Participants may complete a cognitive task (e.g., drawing a clock or describing a picture) while subtle aspects of their performance (e.g., pen strokes, eye movements, language patterns) are recorded. This generates a large amount of data that must then be analyzed to find relevant performance features. Researchers are using machine learning (ML) or deep learning to automatically analyze and classify test performance based on criteria like biomarker or clinical status, aiming for sensitive screening in preclinical AD. In a clinical setting, ML can support decisions by creating prediction models for accurate diagnoses and selecting patients for clinical trials in the early stages of dementia.

Spoken language analysis and automated language processing have gained new insights into language difficulties in preclinical AD, with several new analysis tools now available. Speech is typically recorded during spontaneous conversation, verbal fluency tasks, or picture descriptions, often using a standard "Cookie Theft" picture. The recordings are then transcribed and analyzed using computer software to examine aspects like the use of verbs and nouns, sentence complexity, word diversity, and speech flow (e.g., repetitions and pauses). Transcribing speech manually is time-consuming, but increasing efforts are applying machine and deep learning to detect cognitive impairment in AD. For example, automated analysis of verbal fluency has distinguished healthy aging from clinical AD, and voice analysis using smartphone apps has differentiated between clinical AD groups. These methods are being developed to identify early cognitive impairment and decline in preclinical AD. Intensive development is ongoing at several research sites, exploring subtle speech-related cognitive decline in early AD through projects using remote semi-automated analysis. In the US and Sweden, researchers are recording and analyzing speech collected during neuropsychological assessments from large participant groups. A Canadian company has developed a tablet app to identify cognitive impairment using spoken language, which is currently being evaluated. Regarding preclinical AD biomarker validation, most research has focused on MCI patients. However, one study recorded spontaneous speech in clinically normal individuals and found a modest link between abnormal Aβ levels and subtle speech changes (fewer specific words), even when participants performed normally on traditional cognitive tests.

Eye-tracking using commercial-grade cameras has shown the ability to detect abnormal eye movements in clinical AD groups. These cameras capture a wealth of data on eye movement behavior, including rapid eye movements (saccades) and fixed gazes (fixations). Eye movements can be recorded during specific tasks like reading or memory tests. This data can then be analyzed automatically or manually. However, eye-tracking devices are currently expensive and not widely available in clinical settings. A potential solution is to use cameras embedded in devices like laptops or tablets to capture eye movements during tasks. One study showed that low-cost, device-embedded cameras could capture valid, high-quality eye-movement data during a visual recognition memory task in clinically normal participants. They found a modest link between eye movements and cognitive performance on a traditional paper-and-pencil composite. This suggests that device-embedded eye-tracking methods could be useful for further study of AD-related cognitive decline in clinically normal individuals. Beyond accuracy, eye trackers provide additional eye movement data, opening new possibilities for meaningful outcomes.

Digital pens look like regular pens but contain an embedded camera and sensors that capture pen position and stroke data with high detail. This provides information on how long the pen is in the air or on the surface, its speed, and pressure. This results in hundreds or thousands of data points, unlike traditional paper-and-pencil measures which usually only provide reaction time and accuracy. Big-data techniques like machine learning can then be used to analyze these large datasets and extract important signals. For example, Digital Cognition Technologies developed a digital Clock Drawing Test (dCDT) based on data from thousands of individuals completing the standard clock drawing test. The dCDT uses extensive ML-based scoring to describe performance related to information processing, motor function, and reasoning. This approach allows researchers to capture subtle inefficiencies in completing a cognitive task, even when overall performance seems normal, providing a systematic way to analyze minute behavioral details. Preliminary results from a study of older adults found that worse performance on the dCDT, especially on a visual-spatial reasoning subscore, was linked to higher Aβ burden on PET scans. The dCDT also showed better ability to distinguish between individuals with high versus low Aβ compared to standard neuropsychological tests.

Virtual reality (VR) and spatial navigation tests involve participants performing tasks in computer-generated environments. These tasks are typically presented on computer screens with which participants interact using a joystick, keyboard, touchscreen, or VR headset. The Four Mountains Test is a VR-based test available on an iPad, which measures spatial function by showing different views of a computer-generated landscape. Its clinical usefulness has been shown in studies, but its link to preclinical AD biomarkers is still unknown. Another example is a VR path integration task where participants explore virtual open environments using a professional VR headset and are asked to navigate back to specific locations. In a clinical study, this task was better than other cognitive assessments at differentiating MCI from clinically normal individuals and correlated with CSF biomarkers (total tau and Aβ). This task is currently being evaluated in a preclinical population with biomarker data. The mobile game Sea Hero Quest has gained widespread interest, with over 4.3 million players. This game was created to collect data and establish population norms from various countries, enabling the development of easily administered spatial navigation tasks to detect AD. Early results suggest Sea Hero Quest is comparable to real-world navigation experiments, indicating it measures more than just video gaming skills. For preclinical AD biomarker validation, one study found that performance on Sea Hero Quest could distinguish healthy aging from individuals at genetic risk for AD. Despite having no clinically detectable cognitive problems, individuals at genetic risk performed worse on spatial navigation, and wayfinding performance could differentiate between those with and without a specific genetic risk factor. In another study, participants underwent a virtual maze task measuring spatial navigation. Findings indicated that Aβ positivity was linked to lower wayfinding performance. This spatial navigation task has been made available for remote web-based use for future studies.

Discussion

This review identified numerous digital assessment instruments currently being evaluated in preclinical populations, utilizing various platforms like tablets, smartphones, and external hardware. These assessment tools differ in their intended settings (e.g., remote vs. in-clinic), level of supervision (e.g., self-administered vs. supervised), and device ownership (personal vs. study-provided). Studies validating assessment instruments for more established platforms (e.g., computers, tablets) are more common than those for newer ones (e.g., smartphones). However, many newly developed tests are actively being evaluated in biomarker studies.

A crucial aspect of early detection in preclinical AD is a cognitive test's ability to identify subtle cognitive impairment or decline over time. Primarily in-clinic administered tests have shown a cross-sectional link to preclinical AD biomarkers, similar to traditional neuropsychological assessments, with weak to moderate effects. Long-term studies using conventional paper-and-pencil assessments have yielded mixed results, but most indicate subtle declines during the preclinical phases of AD. Remotely administered tests have been less explored, but several preclinical biomarker studies are underway. A few validation studies and preliminary results for smartphone-based memory tests show a relationship to tau pathology. A small but promising study using a remote web-based learning assessment found significantly slower learning curves in individuals with higher Aβ levels, warranting further investigation. Other novel assessment instruments, including speech analysis, eye-tracking, and VR, have shown potential for studying relevant preclinical AD biomarkers. Notably, preliminary results from a digital clock drawing test have demonstrated high sensitivity to changes in clinically normal individuals with positive AD biomarkers. Future long-term studies should include ongoing biomarker data and explore the validity of these tests for tracking changes in biomarkers over time. They also need to investigate the tests' ability to detect clinical progression from preclinical AD to MCI and dementia, which will require extensive long-term studies of clinically normal individuals.

An alternative way to validate digital assessments, besides using preclinical AD biomarkers, is to compare them against conventional cognitive measures used in large-scale studies. This type of validation can support biomarker studies or provide important initial data before costly biomarker studies are performed. Some assessment instruments, including a tablet-based test, an eye-tracking assessment, and a smartphone app, have been validated against relevant global cognitive composites, suggesting their potential for further biomarker studies in preclinical AD. However, because the correlation between conventional composites and preclinical AD is already weak, validation against paper-and-pencil tests alone is not enough to claim that a test is suitable for measuring subtle cognitive changes in preclinical AD.

The contexts in which new technology is used impose different requirements on a test's capabilities. For instance, the requirements are higher if test results are used as outcomes in a clinical study compared to their use for selecting participants for studies. Tablet-based tests, similar to traditional cognitive test batteries, have already been implemented in clinical trials. They are primarily designed to be administered with the help of a trained examiner. Unsupervised and remotely administered tests have not yet shown enough robustness for use in this context, and concerns remain regarding their reliability, how consistently participants use them, privacy, and user identification. The various digital assessment instruments discussed in this review enable different uses. Supervised digital assessment instruments could provide reliable outcomes in clinical trials, offering benefits such as automatic recording of responses and scoring, which makes it easier to follow study protocols, reduces the risk of error, and increases consistency between different examiners. Remotely administered tests could serve as a cost-effective initial screening before more expensive and invasive examinations, such as spinal taps or brain imaging, are recommended. In clinical trials, mobile devices could help identify individuals at the highest risk of cognitive decline who are most likely to benefit from a specific treatment. Close monitoring of a person's cognitive function from their home environment may also allow for high-quality evaluation of treatments.

As cognitive testing becomes increasingly digitized, regulatory authorities have raised concerns about data security and privacy. Pharmaceutical companies have also emphasized the importance of these issues. One consideration is how data is stored and transferred between servers, which is crucial when data is stored and processed on servers not directly controlled by the study. When commercial companies are involved, questions about data ownership and potential conflicts of interest may arise. Data protection laws, such as those at federal and state levels in the United States and the General Data Protection Regulation (GDPR) in the European Union, govern data storage and processing across borders. Technological advancements place increasing demands on developers and researchers to understand regulatory issues, especially as new types of personal data are collected more extensively and across different countries. Finally, a significant and necessary focus is ensuring that data captured remotely, in an uncontrolled environment, is reliable and accurately reflects an individual's cognitive functioning. This also highlights the importance of adherence, and while there is growing evidence that unsupervised testing can be done, large long-term health studies also indicate significant problems with participant dropout. More work is needed to ensure valid and reliable results for participants performing unsupervised testing in large clinical trials.

Conclusion

This review highlights the wide range of digital assessment instruments currently being evaluated in preclinical populations. Digital technology can be used to assess the subtle cognitive decline that characterizes preclinical AD confirmed by biomarkers. Potential benefits include increased sensitivity and reliability, and it could also offer value to individuals through improved accessibility, engagement, and reduced participant burden. Digital assessments may impact clinical trials by optimizing screening, making it easier to find suitable cases, and providing more sensitive clinical outcomes. Several promising tests are currently under development and undergoing validation. However, more work is needed before many of these can be considered equivalent to conventional in-clinic cognitive assessments. Understanding the reliability and validity of cognitive assessments obtained in natural environments has begun, which is necessary before large-scale cognitive testing can occur outside research centers. Lastly, more studies are needed to investigate potential barriers to implementation, including challenges related to participant adherence, privacy, and data security.

Open Article as PDF

Abstract

There is a pressing need to capture and track subtle cognitive change at the preclinical stage of Alzheimer's disease (AD) rapidly, cost-effectively, and with high sensitivity. Concurrently, the landscape of digital cognitive assessment is rapidly evolving as technology advances, older adult tech-adoption increases, and external events (i.e., COVID-19) necessitate remote digital assessment. Here, we provide a snapshot review of the current state of digital cognitive assessment for preclinical AD including different device platforms/assessment approaches, levels of validation, and implementation challenges. We focus on articles, grants, and recent conference proceedings specifically querying the relationship between digital cognitive assessments and established biomarkers for preclinical AD (e.g., amyloid beta and tau) in clinically normal (CN) individuals. Several digital assessments were identified across platforms (e.g., digital pens, smartphones). Digital assessments varied by intended setting (e.g., remote vs. in-clinic), level of supervision (e.g., self vs. supervised), and device origin (personal vs. study-provided). At least 11 publications characterize digital cognitive assessment against AD biomarkers among CN. First available data demonstrate promising validity of this approach against both conventional assessment methods (moderate to large effect sizes) and relevant biomarkers (predominantly weak to moderate effect sizes). We discuss levels of validation and issues relating to usability, data quality, data protection, and attrition. While still in its infancy, digital cognitive assessment, especially when administered remotely, will undoubtedly play a major future role in screening for and tracking preclinical AD.

INTRODUCTION

Alzheimer's disease, or AD, causes problems with memory and thinking. These changes, like the buildup of certain proteins in the brain, can start many years before a person shows any clear signs of memory loss. This early time, before symptoms appear, is called the preclinical stage. During this stage, thinking skills are mostly fine, but as the brain disease grows, very small changes in thinking can begin to show. Finding these small changes early is very important, as it offers a chance to try and stop the disease from getting worse.

Associations between paper-and-pencil cognitive measures and AD biomarkers in preclinical AD

In healthy-looking adults, unusual levels of a protein called amyloid beta (Aβ) are seen as an early sign of AD. This can be found by looking at fluid from the brain or by special brain scans. At this early stage, the link between Aβ levels and thinking problems is usually small or hard to see. However, people with more Aβ tend to show a faster decline in thinking skills over time and often move to a later stage of the disease sooner than those with less Aβ. This decline is very slight and can only be noticed over many years. The strongest link between these early AD signs and thinking decline is in memory, but other thinking skills like planning and visual understanding can also be affected.

Another main change in early AD is the buildup of a protein called tau. Like Aβ, tau can be measured in brain fluid or with special scans. Tau has been thought to be more closely linked to thinking problems as AD progresses. In healthy-looking people, higher tau levels have been linked to memory issues and a decline in thinking over time. While less is known about tau compared to Aβ, people with more tau generally have a higher risk of thinking decline. However, this decline is much faster in people who also have high Aβ levels.

Paper-and-pencil versus digitized cognitive assessment

The connection between traditional paper-and-pencil thinking tests and early AD signs is complex. Any links found are often weak, especially when looked at only at one point in time. When studies follow people over time, these links are seen more clearly, with healthy-looking older adults who have high levels of disease markers showing a decline in thinking. The weak links might be partly because paper-and-pencil tests were made to find clear problems in people who already had symptoms, not small changes in early stages. Also, normal changes in how well people think, getting better with practice, and the brain's ability to cope can hide these small declines.

Using digital tools for thinking tests can help with some of the problems of paper-and-pencil tests. For example, mobile phones allow testing more often, which gives more trustworthy information over time. They are also easier to use and cheaper because people can do the tests themselves. Computer tests can automatically create new versions, which helps stop people from getting better just by practicing. Smart computer programs (AI) can also help analyze the test data in new and more sensitive ways.

However, digital tests also bring new challenges. Many studies using remote testing find it hard to keep people interested. Storing and sharing digital thinking data raise questions about keeping personal information safe, especially if devices collect extra private details. When no one is watching, there needs to be a way to make sure the right person is taking the test. Fast-changing technology also makes it hard to use the same version of a digital test over time. Lastly, while more older adults are comfortable with new technology, a good number of people might be left out of studies using digital tests because they don't know how to use them or don't have access.

Digital technology has not yet replaced paper-and-pencil tests, especially in medical studies. This is because many questions remain: Can digital technology gather the same kind of thinking information as the best paper-and-pencil tests? Is it truly different to collect data with a person watching versus a device? How trustworthy and practical is digital technology? These questions are just starting to be answered as digital technology is rapidly used more in research on early AD.

METHODS

This review looked for studies published from January to December 2020 that used digital thinking tests for people with early Alzheimer's disease. The search involved looking at scientific papers and ongoing research trials. Studies were included if they looked at people with early signs of Alzheimer's in their brain, as shown by special tests. Studies that only included people with clear memory problems were not included.

Over 400 studies were checked, but only a small number met the rules to be included in the review. Because there were not many studies, and they were all quite different, the review provides a summary of what was found rather than a deep, number-based analysis.

RESULTS

Primarily in-clinic computerized and tablet-based cognitive assessment

One common way digital tests are used is by putting traditional thinking tests onto computers or tablets. These tests might include well-known ones like the Montreal Cognitive Assessment (MoCA), now available electronically. While these digital tests can help reduce mistakes by scoring automatically, they often do not change the basic way thinking is tested. Many computer tests have been made to find small changes in thinking. Some are digital versions of old paper tests, while others are new tests designed to be done without a tester present. Examples include systems like Cogstate and the NIH-Toolbox. These differ in how they work, how available they are, and whether they feel like games. They also target different groups of people. The focus here is on systems made to find the very first signs of thinking decline in AD.

Cogstate digital cognitive testing system The Cogstate system was created to make thinking tests less affected by language or culture. It uses playing cards for tasks about reaction time, working memory, and visual memory. It was first made for computers and now works on tablets. A key idea behind Cogstate tests is to get more reliable results over time by using different versions of tests to prevent people from just getting better with practice. While an examiner first gave the tests, efforts are now made for people to do them on their own. Studies have shown that people can use these tests well on their own, even without a tester. Newer versions of Cogstate tests include tasks sensitive to early AD changes. Studies show these computer tasks are good at measuring thinking and can be used to study thinking decline in early AD.

Preclinical AD biomarker validation In a large group of healthy older adults, those with high amyloid levels on brain scans showed slightly worse results on Cogstate tests. Other studies at a single point in time did not find a strong link between Cogstate test results and amyloid status. However, some studies have shown that people with high amyloid do decline on these tests over time. For example, in one study, people with high amyloid showed a decline in memory skills over three years. In another study, healthy people with high amyloid did not improve with practice on a learning task, unlike those with low amyloid. Not showing improvement with practice was linked to higher amyloid levels.

The computerized National Institutes of Health Toolbox Cognition Battery (NIH-TB) The NIH-TB was created to be an easy and cheap way for researchers to use standard, short thinking tests. Experts chose and adapted seven known brain tests for a digital platform. These tests check many thinking skills like attention, speed, memory, and language. It was released for computers and is now on tablets. To get correct results, a tester is usually still needed, but some tests can now be done remotely using a computer. The NIH-TB is used in several ongoing studies about aging and early AD.

Preclinical AD biomarker validation While some studies have looked at NIH-TB in older adults and people with dementia, few have focused on NIH-TB and early AD brain signs. One study in healthy older adults found no link between early AD signs and any of the NIH-TB tests. However, they did find a small link between how fast people processed information and how well they used their executive functions, and higher levels of tau protein in the brain.

The Cambridge Neuropsychological Test Automated Battery (CANTAB) CANTAB was designed to be a thinking test that works the same way for people from different cultures and who speak different languages. It mostly uses pictures and shapes instead of words. CANTAB measures working memory, planning, attention, and visual memory. It was first used on computers and is now available on tablets. CANTAB also has an online platform for finding people for studies.

Preclinical AD biomarker validation

In one study, CN adults underwent amyloid PET imaging and completed a CANTAB memory test. In younger adults (ages 30 to 55), higher amyloid burden was modestly associated with worse memory recall; this association weakened with increasing age and amyloid burden.

Remotely administered tablet- and smartphone-based cognitive assessment

Surveys indicate that a majority of older adults in the United States and Europe now own smartphones, and ownership continues to grow. In parallel, a growing number of smartphone applications target cognitive assessment in older adults. Smartphone-based assessment is attractive for early AD detection for several reasons: large populations can be reached, and frequent testing can be completed at home, potentially yielding measures more representative of everyday cognition. Testing on one's own device, rather than a study-provided one, may further improve ecological validity, and testing at home may reduce the anxiety associated with clinic-based assessment. Remote testing also enables individuals to monitor their own brain health. Finally, for participants in demanding clinical studies, fewer clinic visits reduce burden, which may encourage enrollment from geographically distant populations.

Despite this promise, substantial challenges remain, including adherence, data privacy, and validity. Additional concerns include establishing that smartphone measures agree with gold-standard in-clinic tests, verifying that the intended person is taking the test, and handling heterogeneity across devices and software.

Given the rapid growth of this field, the focus here is on emerging approaches to smartphone-based assessment still under development: improving reliability through frequent brief assessments across the day, using serial testing to detect subtle decrements in learning, targeting cognitive processes more specific to early AD, and harnessing large-scale data collection. Evidence on agreement with in-clinic tests and AD biomarkers is discussed where available.

Feasibility of using mobile devices to capture cognitive function

Although retaining participants in long remote studies is difficult, adherence over shorter periods appears promising. One recent study reported high compliance and few missing data points for a web-based adaptation of several cognitive tasks, indicating good usability, and another found that most participants completed brief daily smartphone tasks for more than a month. More concerning, a report pooling several digital health studies found substantial attrition over time, which can limit the generalizability of findings. Compensation improved retention, and older participants tended to remain enrolled longer, which is encouraging for early AD research. However, retention was greatest when remote testing was combined with in-clinic visits, suggesting that sustaining engagement in fully remote longitudinal studies remains a major challenge.

Improving reliability: ambulatory/momentary cognitive assessment

The rationale for frequent, brief assessments across the day is that a single test session does not capture typical performance, which fluctuates with mood, stress, and time of day. Averaging many short assessments completed over several days yields a more sensitive and reliable estimate of a person's cognition. Studies have shown that brief smartphone-based assessments completed in everyday environments correlate with in-clinic performance, and that scores aggregated across frequent brief assessments are highly reliable.
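
To make the aggregation logic concrete, the sketch below applies the Spearman-Brown prophecy formula to show how the reliability of an averaged score grows with the number of brief sessions. The single-session reliability of 0.55 is an illustrative assumption, not a value from any study cited here.

```python
# Minimal sketch: Spearman-Brown prophecy formula for the reliability of a
# score averaged over k brief ambulatory sessions. All numbers are
# illustrative assumptions, not values from any cited study.

def spearman_brown(single_session_reliability: float, k: int) -> float:
    """Reliability of the mean of k parallel measurements."""
    r = single_session_reliability
    return k * r / (1 + (k - 1) * r)

if __name__ == "__main__":
    r1 = 0.55  # assumed reliability of one brief (1-2 minute) mobile test
    for k in (1, 4, 8, 16):
        print(f"k={k:2d} sessions -> reliability {spearman_brown(r1, k):.2f}")
```

Under these assumed values, averaging eight brief sessions lifts reliability from 0.55 to about 0.91, which is the formal version of the argument made above.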

Using mobile and serial assessment to identify subtle decrements in learning and practice effects

Diminished practice effects, that is, the absence of the expected improvement when a test is repeated, have been proposed as a subtle marker of cognitive change preceding overt decline. Mobile technology permits far more frequent testing. In one study, participants took home iPads and completed a challenging memory task monthly for a year; diminished learning was associated with greater amyloid and tau burden on PET in CN individuals, with group differences in memory emerging by the fourth administration.

A related web-based memory paradigm was designed to move learning assessments from study-provided tablets to participants' own smartphones and to compress the retest interval from months to days. These tasks target memory processes affected early in AD, such as associative memory for pairs of items. Measuring learning curves over short intervals by smartphone could offer an inexpensive way to identify individuals likely to harbor AD pathology before costlier biomarker procedures: in one study, for example, diminished improvement over one week was associated with substantially higher odds of being Aβ+. Future studies may establish whether learning curves can discriminate biomarker-positive individuals, and rapid remote assessment of learning could help determine more quickly whether a candidate treatment is working. How such short-interval learning measures track longer-term cognitive decline, however, remains to be explored.
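
As a minimal illustration of how a short-interval learning curve can be quantified, the sketch below fits a per-participant slope across daily sessions; the scores and the seven-day schedule are invented for illustration.

```python
# Minimal sketch of quantifying a short-term learning curve: fit a linear
# slope to a participant's accuracy across daily sessions of the same
# (parallel-form) memory task. Data below are made up for illustration.
import numpy as np

def learning_slope(scores):
    """Least-squares slope of accuracy vs. session number (gain per session)."""
    sessions = np.arange(len(scores))
    slope, _intercept = np.polyfit(sessions, scores, deg=1)
    return slope

typical_learner = [0.52, 0.61, 0.68, 0.72, 0.75, 0.77, 0.78]  # 7 daily sessions
flat_learner    = [0.53, 0.55, 0.54, 0.56, 0.55, 0.57, 0.56]

print(f"typical slope: {learning_slope(typical_learner):+.3f} per session")
print(f"flat slope:    {learning_slope(flat_learner):+.3f} per session")
# A flattened slope (diminished learning) is the candidate marker that the
# studies above relate to amyloid status.
```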

Targeting relevant cognitive functions

Although cognitive decline in AD is heterogeneous, biomarker studies and cognitive neuroscience have helped researchers target processes that may be more specific to AD. For example, one group developed a mobile app (neotiv) with memory tests probing mnemonic discrimination of similar objects and scenes, pattern recall, face-name association, and complex scene recognition. These tests are delivered through a digital platform for self-administration on smartphones and tablets. Recently, performance on these tasks has been associated with markers of tau accumulation and has shown strong associations with in-clinic cognitive tests.

Participants download the app to their own devices and complete a brief training. They then receive reminders to complete tests on a study-specific schedule to track change over time. To limit practice effects, parallel test versions of equivalent difficulty have been developed. The neotiv platform is currently used in several AD studies.

Harnessing the potential of "big data": citizen science projects

In citizen science, members of the general public contribute to research, for example by collecting their own data, offering a way to gather information at scale on individuals potentially at risk for AD. One such effort, The Many Brains Project, has tested millions of people, although it is not specific to AD. Another project aims to identify individuals with early AD who rarely present to memory clinics by offering online cognitive self-checks.

At Oxford University, researchers developed the smartphone app Mezurio, which is used in several European studies. One of these, GameChanger, is a citizen science project in which more than 16,000 people across the UK complete frequent remote cognitive tests with the Mezurio app, helping to establish age-stratified normative data. Mezurio includes gamified tests of memory and other cognitive functions, and a recent study found that middle-aged participants used the app consistently, suggesting its suitability for longitudinal tracking of cognition.

In Germany, researchers launched a project examining how well members of the general public perform unsupervised digital tests. They found that factors such as time of day and the screen size of the mobile device should be accounted for in future remote studies, and concluded that remote, unsupervised assessment of memory function is feasible in adults.

Novel data collection systems and analysis procedures

Other emerging approaches to assessment in early AD include analysis of spoken language, eye movements, spatial navigation, and digital pen data. Some require dedicated equipment, while others can use a phone or laptop camera. Observing how people speak or move their eyes can be less stressful than traditional testing: a person might draw a clock or describe a picture while fine-grained features of their behavior (pen strokes, eye movements, word choice) are recorded. The resulting high-dimensional data can be analyzed with machine learning (ML) or deep learning to identify informative features, helping to detect early signs of AD, support clinical decision-making, and select patients for trials at early stages of the disease.

Spoken language analysis and automated language processing

New technology is improving the characterization of language changes in early AD. Speech is typically recorded during natural conversation, verbal fluency (word-listing) tasks, or picture description, then transcribed and analyzed by software. Researchers can quantify, for example, the proportions of verbs and nouns, syntactic complexity, lexical diversity, and speech fluency (pauses and repetitions).
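
A minimal sketch of this kind of feature extraction is shown below, computing token counts, lexical diversity (type-token ratio), and a filler-word rate from an already-transcribed sample; real pipelines add part-of-speech tagging, syntactic parsing, and timing information from forced alignment.

```python
# Minimal sketch of lexical features from a speech transcript, assuming the
# audio has already been transcribed. The sample sentence is invented.
import re

def lexical_features(transcript: str) -> dict:
    tokens = re.findall(r"[a-zA-Z']+", transcript.lower())
    types = set(tokens)
    fillers = {"uh", "um", "er"}  # assumed filler inventory for illustration
    return {
        "n_tokens": len(tokens),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "filler_rate": sum(t in fillers for t in tokens) / max(len(tokens), 1),
    }

sample = "um the the boy is uh reaching for the cookie jar on the shelf"
print(lexical_features(sample))
```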

Manual transcription is time-consuming, but increasing effort is being directed at automated, ML-based detection of cognitive impairment in AD. For example, researchers have shown that automated analysis of verbal fluency tasks and of voice recordings from smartphone apps can discriminate healthy aging from different stages of AD. These models are trained to detect change from recordings of brief speech tasks. Whether such methods can detect the very subtle decrements of preclinical AD remains an open question.

Substantial work on these methods is under way across research groups. A European project is examining subtle speech-related cognitive decrements in early AD using remote, semi-automated analysis; in the United States and Sweden, researchers are recording and analyzing speech from thousands of participants during cognitive testing; and a Canadian company has developed a tablet app that detects cognitive impairment from spoken language.

Preclinical AD biomarker validation

While most work has involved people with mild cognitive impairment, one Dutch study recorded CN individuals describing abstract paintings. After transcription, language features were extracted computationally. Although participants performed normally on standard cognitive tests regardless of amyloid status, a weak association emerged between abnormal amyloid levels and subtle speech changes, such as reduced lexical specificity.

Eye-tracking

In AD research, dedicated eye-tracking cameras can detect abnormal eye movements in people with AD, capturing rich data on saccades and fixations. Eye movements can be recorded during tasks such as reading or memory tests and then analyzed computationally or by researchers. Dedicated eye-tracking devices, however, are expensive and not widely available in clinics. One potential solution is to use the cameras already embedded in devices such as laptops and tablets. One study demonstrated that device-embedded cameras, which are inexpensive and scalable, can capture eye-movement data of adequate quality: CN participants' eye movements were recorded during a visual memory task, and a weak association was found between eye-movement measures and performance on a paper-and-pencil cognitive test. This suggests that camera-based eye-tracking may be useful for studying AD-related cognitive decline in CN individuals. Beyond task accuracy, eye trackers yield data on additional oculomotor behaviors, opening avenues for novel outcome measures.
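
One standard way to turn raw gaze samples into fixations is a dispersion-threshold (I-DT) algorithm, sketched below under assumed pixel units, an assumed threshold, and a fixed sampling rate; production eye-tracking pipelines are considerably more elaborate.

```python
# Minimal sketch of dispersion-threshold (I-DT) fixation detection on raw
# gaze samples such as those recoverable from a device-embedded camera.
# The threshold, window length, and gaze stream are illustrative assumptions.

def _dispersion(window):
    xs, ys = zip(*window)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, max_dispersion=30.0, min_samples=6):
    """samples: list of (x, y) gaze points at a fixed sampling rate (pixels).
    Returns (start, end) index pairs for windows classified as fixations."""
    fixations = []
    start = 0
    while start + min_samples <= len(samples):
        end = start + min_samples
        if _dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the gaze stays within the dispersion limit.
            while end < len(samples) and _dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            fixations.append((start, end - 1))
            start = end
        else:
            start += 1
    return fixations

gaze = [(100, 100), (101, 99), (100, 102), (99, 101), (100, 100), (101, 101),
        (400, 300), (402, 298), (401, 301), (400, 299), (399, 300), (401, 300)]
print(detect_fixations(gaze))  # two fixations: (0, 5) and (6, 11)
```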

Digital pen

Digital pens resemble ordinary pens but contain a small camera and sensors that record pen position and movement with high spatial and temporal precision. They capture features such as in-air time, on-surface time, velocity, and pressure, generating hundreds to thousands of data points where traditional paper tests yield only summary time or accuracy scores. Machine learning can then extract informative features from these rich datasets. For example, one company collected data from thousands of people completing a standard clock-drawing test and developed a digital clock-drawing test that uses machine learning to score performance along multiple dimensions, including processing speed, simple motor function, and problem-solving. This approach can reveal subtle difficulties during a task even when overall performance appears intact, allowing fine-grained behavior to be captured and analyzed in a standardized way.
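
The sketch below illustrates, under an assumed sample format and units, how a few such kinematic features (on-surface time, in-air time, mean stroke velocity) can be derived from raw pen samples.

```python
# Minimal sketch of kinematic features from digital-pen samples, assuming
# each sample carries position (mm), a timestamp (s), and an on-surface flag.
# The sample stream below is invented for illustration.
import math

def pen_features(samples):
    """samples: list of (x, y, t, on_surface) tuples ordered by time."""
    on_surface_time = in_air_time = path_length = 0.0
    for (x0, y0, t0, down0), (x1, y1, t1, _down1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if down0:
            on_surface_time += dt
            path_length += math.hypot(x1 - x0, y1 - y0)
        else:
            in_air_time += dt
    mean_velocity = path_length / on_surface_time if on_surface_time else 0.0
    return {"on_surface_s": on_surface_time, "in_air_s": in_air_time,
            "mean_velocity_mm_per_s": mean_velocity}

strokes = [(0, 0, 0.00, True), (5, 0, 0.05, True), (5, 4, 0.10, True),
           (5, 4, 0.10, False), (9, 4, 0.30, False), (9, 4, 0.30, True),
           (12, 8, 0.40, True)]
print(pen_features(strokes))
```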

Preclinical AD biomarker validation

Preliminary results from a study of older adults found that worse performance on the digital clock-drawing test, particularly on visuospatial components, was associated with greater amyloid burden on PET, and that the digital test discriminated Aβ+ from Aβ− individuals better than standard cognitive tests.

Virtual reality and spatial navigation

In virtual reality (VR) assessments, participants complete tasks in computer-generated environments, typically presented on screens or through VR headsets. One example is the Four Mountains Test, available on iPad, which measures spatial memory by presenting different viewpoints of mountains in a computer-generated landscape. The test has shown utility in clinical studies, but its relationship to preclinical AD biomarkers is not yet known. Another VR task, developed by researchers at Cambridge University, asks participants to explore virtual open arenas and then return to specific locations while wearing a VR headset. In one study, this task outperformed other cognitive tests in discriminating mild cognitive impairment from healthy controls and was associated with CSF markers of AD. The task is now being evaluated in individuals with preclinical AD.

The online mobile game "Sea Hero Quest" has been played by more than 4 million people. The game was designed to collect data and establish country-specific norms, supporting the development of accessible spatial navigation tasks for detecting AD. Early results suggest that Sea Hero Quest performance corresponds to real-world navigation, indicating that it measures more than video-gaming skill.

Preclinical AD biomarker validation

In a recent study, performance on the Sea Hero Quest game discriminated healthy aging from genetic risk for AD. Although genetically at-risk individuals showed no overt cognitive impairment, they performed worse on spatial navigation tasks, and navigation performance distinguished carriers of the risk gene from noncarriers.

In another study, individuals with abnormal amyloid levels in CSF performed worse on a virtual maze task measuring spatial navigation, administered on a laptop with a joystick. This spatial navigation task has since been made available for remote online use in future studies.
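
Scoring in tasks of this kind often reduces to a simple geometric quantity. As a minimal sketch, the snippet below computes a path-integration (homing) error as the distance between the true start location and the participant's response; the coordinates are made up.

```python
# Minimal sketch of scoring a path-integration trial: the participant travels
# an outbound path, then tries to return to the start; the error is the
# distance between the response location and the true start location.
import math

def homing_error(true_start, response):
    return math.dist(true_start, response)  # Euclidean distance (Python >= 3.8)

true_start = (0.0, 0.0)
responses = [(0.4, -0.3), (2.1, 1.7)]  # two illustrative trials, in metres
print([f"{homing_error(true_start, r):.2f} m" for r in responses])
```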

DISCUSSION

This review shows that a wide range of digital assessment tools is under evaluation for preclinical AD. These tools can capture the subtle cognitive decrements that emerge in biomarker-positive individuals, with potential gains in sensitivity and reliability. For participants, they promise greater accessibility, engagement, and lower burden; for clinical trials, they could streamline recruitment and provide more sensitive outcome measures. Many promising instruments are still under development and validation, and further work is needed before they can be deployed as broadly as traditional in-clinic cognitive tests. Researchers are beginning to characterize the reliability and validity of cognitive data collected in everyday settings, an understanding that is a prerequisite for widespread use outside research centers. Finally, more studies are needed on implementation challenges such as adherence, privacy, and data security.

Validation with established cognitive composites

Besides validation against preclinical AD biomarkers, digital tests can be benchmarked against the standardized cognitive composites used in large studies. Such comparisons can support biomarker studies or provide important preliminary evidence. Several digital instruments, including tablet-based batteries, eye-tracking measures, and a smartphone app, have been validated against these conventional measures, supporting their further study against biomarkers in preclinical AD. However, because the association between conventional tests and preclinical AD biomarkers is itself weak, validation against paper-and-pencil tests alone is insufficient to establish that an instrument can detect subtle cognitive change in preclinical AD.
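
A minimal sketch of such a convergent-validity check is shown below: conventional test scores are z-scored and averaged into a composite, which is then correlated with a digital measure. The arrays are invented, and real analyses would use study-specific composites and adjust for covariates.

```python
# Minimal sketch of a convergent-validity check: build a z-score composite
# from conventional tests and correlate it with a digital measure.
import numpy as np

conventional = np.array([[28, 12, 45], [25, 9, 39], [29, 14, 50],
                         [24, 8, 35], [27, 11, 44]], dtype=float)  # 5 people x 3 tests
z = (conventional - conventional.mean(axis=0)) / conventional.std(axis=0)
composite = z.mean(axis=1)                          # per-person cognitive composite

digital = np.array([0.61, 0.42, 0.70, 0.35, 0.58])  # digital test scores (invented)
r = np.corrcoef(composite, digital)[0, 1]
print(f"Pearson r between composite and digital score: {r:.2f}")
```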

Potential of digital cognitive assessment instruments in different settings

The intended use of a new technology determines the requirements it must meet. Tests used as outcome measures of drug efficacy in clinical trials, for example, must be more precise than tests used only to screen potential participants. Tablet-based batteries, like traditional test batteries, are already used in clinical trials, typically administered with the assistance of trained staff. Unsupervised, remotely administered tests have not yet been shown to be robust enough for this purpose, and concerns persist regarding reliability, adherence, privacy, and verification of the test-taker's identity.

The digital cognitive assessments discussed in this review lend themselves to different uses. Supervised digital testing could provide robust endpoints in clinical trials, with advantages such as automated response capture and scoring, easier protocol compliance, fewer administration errors, and greater consistency across examiners. Remotely administered tests could serve as an inexpensive pre-screen before costlier and more invasive procedures such as lumbar puncture and PET imaging. In clinical trials, mobile devices could help identify individuals at highest risk of cognitive decline who stand to benefit most from a given treatment, and dense at-home monitoring of cognition could strengthen the evaluation of treatment effects.
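
The arithmetic behind the pre-screening argument is straightforward. The sketch below computes the expected proportion of amyloid-positive individuals among screen-positives (the positive predictive value) under assumed sensitivity, specificity, and base-rate values, none of which come from the studies reviewed here.

```python
# Minimal sketch of why a cheap remote test can enrich recruitment: compute
# the expected Abeta+ rate among screen-positives (PPV) under assumed values.

def ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

base_rate = 0.30  # assumed Abeta+ prevalence among CN older adults
print(f"unscreened hit rate: {base_rate:.0%}")
print(f"after remote screen: {ppv(0.75, 0.70, base_rate):.0%}")
```

Under these assumptions, screening roughly doubles the proportion of amyloid-positive individuals referred for confirmatory PET or CSF testing, which is the economic case for remote pre-screening.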

Importance of data security, privacy, and adherence

With the growing use of digital cognitive assessment, regulatory bodies have raised concerns about data security and privacy, and pharmaceutical companies have likewise stressed their importance. One concern is how data are stored and transmitted between systems, particularly when data reside on servers not directly controlled by the study. When private companies are involved, questions of data ownership and conflicts of interest can also arise.

In the United States, laws such as the Health Insurance Portability and Accountability Act (HIPAA) protect health data privacy. In Europe, the General Data Protection Regulation (GDPR) governs data storage and processing, with implications for international scientific collaboration. As the technology matures, developers and researchers must become conversant with these regulations, especially as novel types of personal data are increasingly collected across national borders.

Finally, it is essential to ensure that data collected remotely and without supervision are trustworthy and truly reflect a person's cognition, which is where adherence becomes central. Although evidence is accumulating that unsupervised testing is feasible, large longitudinal health studies also show substantial attrition. More work is needed to establish the validity and reliability of unsupervised testing in large clinical studies.
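
As a minimal sketch of how adherence might be monitored in such a study, the snippet below computes each participant's completion rate against an assumed schedule and flags likely dropouts; the schedule, counts, and threshold are all illustrative.

```python
# Minimal sketch of an adherence metric for unsupervised remote testing: the
# share of scheduled sessions each participant completed, plus a dropout flag.
scheduled = 28  # assumed design: one brief session per day for four weeks
completed_sessions = {"p01": 27, "p02": 19, "p03": 6}  # invented counts

for pid, done in completed_sessions.items():
    rate = done / scheduled
    status = "dropout risk" if rate < 0.5 else "adherent"
    print(f"{pid}: {rate:.0%} of sessions completed ({status})")
```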

CONCLUSION

In conclusion, numerous digital cognitive assessments are under evaluation for preclinical AD. Digital technology can detect the subtle cognitive changes that accompany early AD pathology, with potential gains in sensitivity and reliability, and with benefits for participants in accessibility, engagement, and burden. For clinical trials, digital assessment could improve screening, case-finding, and outcome measurement. Several promising instruments are in active development and validation, but further work is required before many can be considered equivalent to traditional in-clinic testing. Understanding the reliability and validity of assessments completed in everyday settings, together with solutions to the challenges of adherence, privacy, and data security, will determine how widely these tools can be deployed beyond research centers.
