Cognitive measures used in adults with multiple sclerosis: A systematic review

ABSTRACT Cognitive problems are common in people with Multiple Sclerosis (MS), and researchers and clinicians have used a vast array of measures to assess cognition. Our aim was to systematically identify the cognitive measures routinely used in MS research and outline their different uses. Previous recommendations of cognitive measures for use in MS have relied on expert consensus approaches. We believe this systematic review is a starting point for an evidence-based approach to recommending cognitive tests for use with people with MS. We systematically searched electronic databases using relevant search terms for studies that assessed cognitive functioning in MS (last search in February 2020). Of the 11,854 abstracts retrieved, 2563 remained after title and abstract review, and data were extracted from 1526 studies. These studies included 316,053 people with MS and used 5665 measures of cognition. Substitutional style tests, serial addition tests, and word list learning tests were the most commonly used individual tests, and the Brief Repeatable Battery of Neuropsychological Tests was the most commonly used battery. Some of the most frequently used measures were potentially inappropriate because they measure domains of cognition not typically affected in MS, or because of issues with sensitivity. Further research is needed to ascertain the psychometric properties and acceptability of measures for people with MS.

of disability from the onset, and Secondary Progressive MS (SPMS), in which the majority of people with RRMS eventually develop slowly progressive disability (Lublin et al., 2014). Although there is no cure, several disease-modifying treatments are available. In addition, symptoms of MS can be managed with medical treatments, cognitive rehabilitation, physiotherapeutic treatments, and lifestyle changes, among other approaches (Samkoff & Goodman, 2011).
One of the common symptoms of MS is cognitive impairment, which is reported to occur in 43-70% of people with MS (DiGiuseppe et al., 2018; Rao et al., 1991). Cognitive impairment can have an adverse effect on many aspects of an individual's life, including the ability to work and socialize (Rao, Leo, Ellington, et al., 1991), treatment adherence (Bruce et al., 2010), and overall quality of life (Glanz et al., 2010). The predominantly impaired cognitive domains are memory and learning (including immediate and delayed recall and recognition), complex attention including information processing speed, and executive functions (Chiaravalloti & DeLuca, 2008; Ferreira, 2010; Prakash et al., 2008). Recent research has also identified deficits of prospective memory in people with MS, which are associated with other impaired domains including attention and executive functions (Rouleau et al., 2018). Social cognition, including Theory of Mind and emotion recognition, may also be impaired, although research to date has been limited (Cotter et al., 2016). Cognitive impairment can worsen with progression of MS (Achiron, 2005) and is unlikely to remit (Amato et al., 2006). Cognition is associated with physical and psychological symptoms in MS (Amato et al., 2019), and poor performance on cognitive assessments is frequently found to be significantly associated with the common symptoms of fatigue and depression (Hansen & Lautenbacher, 2017).
Poor performance on tests of processing speed and visual and verbal memory is associated with difficulties carrying out everyday life tasks (Goverover & DeLuca, 2015); in particular, slowed information processing speed is associated with poor outcomes in activities of daily living and quality of life (Costa et al., 2017). Executive function is considered a higher-order domain, involving subdomains of decision-making and planning, and thus impairment in this domain may have a negative impact on vocational and other activities for people with MS (Drew et al., 2008). A recent recommendation has supported domain-specific management of cognitive problems as opposed to more generalized approaches (DeLuca et al., 2020), and novel and improved methods of rehabilitation require appropriate outcome measures to determine their efficacy. Valid and reliable cognitive tests can additionally be used to identify cognitive issues at an early stage of MS and to assess the progression of cognitive impairment.
Standard neurological reviews may not identify specific details of cognitive functioning and their significance (Romero et al., 2015); appropriate, standardized cognitive measurement tools for use with people with MS are therefore required to detect changes in cognitive function accurately. A recent narrative review identified a need to incorporate neuropsychological assessment into routine care for people with MS (Sumowski et al., 2018). However, given the high prevalence of cognitive impairment in MS and limited resources in clinics, it may not always be feasible to conduct a full neuropsychological assessment for all patients at routine clinic visits. There is therefore a need for both comprehensive test batteries and shorter tests that reduce the resources needed, including the level of expertise required to administer them and interpret the results. Tests also need to be acceptable to clinicians and patients.
A survey of North American clinical neuropsychologists (Rabin et al., 2005) reported that the most frequently used measures across different clinical populations were the Wechsler Adult Intelligence Scale (WAIS), the Wechsler Memory Scale (WMS), the Trail Making Test (TMT), and the California Verbal Learning Test (CVLT). However, as this survey did not specify which tests were used with which neuropsychological conditions, it is not possible to determine whether this reflects what psychologists are using with people with MS in particular. A more recent UK-wide survey (Klein et al., 2018), which specifically focused on cognitive assessments for people with MS, found that the measures most commonly used by healthcare professionals working clinically with people with MS (including neuropsychologists, neurologists, and occupational therapists) were the Montreal Cognitive Assessment (MoCA) and the Addenbrooke's Cognitive Examination-Revised (ACE-R). Generic measures may assess cognitive domains not commonly impacted in MS, such as orientation. Conversely, they may omit measures of commonly impacted domains; the MoCA, for example, lacks a measure of information processing speed. These studies show that generic measures are widely used in clinical practice, and it is unclear why some measures are chosen over others.
In addition to generic measures that are not specific to MS (such as the MoCA), a number of MS-specific batteries of cognitive measures have been developed, and both comprehensive and brief screening batteries exist. The Brief Repeatable Battery of Neuropsychological Tests (BRB-N) was one of the first batteries developed specifically for use with people with MS, comprising a selection of tasks from a longer battery that were shown to be sensitive to cognitive impairment (Rao et al., 1991). An expert panel then developed the Minimal Assessment of Cognitive Function in MS (MACFIMS; Benedict et al., 2003), a comprehensive battery of tests that expanded on the BRB-N to include tests with better established validity. Although the 90-minute administration time may make it impractical to use in some research and routine MS clinical practice, an abbreviated screening version, the aMACFIMS, is available. A recommendation for an assessment with a shorter administration time of 15 minutes emerged from an expert committee rating individual cognitive tests (Langdon et al., 2011); however, the full methodology and results have not been published. More recently, recommendations have promoted the use of the proprietary Symbol Digit Modalities Test (SDMT) as a brief, sensitive measure of processing speed impairment, one of the most predominantly impaired domains in people with MS (Kalb et al., 2018). The original administration of the SDMT consists of both written and verbal versions of the same task; as dexterity problems can affect the results of the written task, the verbal administration is thought to be most appropriate, is frequently used in MS populations (Benedict et al., 2017), and has been found to be valid and reliable (Jaywant et al., 2018).
The recommendations for cognitive measures, screens, and batteries discussed above have used consensus approaches. These relied on expert panels comprising neuropsychologists, neurologists, clinical psychologists, researchers, and Patient and Public Involvement representatives to make recommendations, covering both full batteries (MACFIMS; Benedict et al., 2003) and briefer assessments (Kalb et al., 2018; Langdon et al., 2011). All approaches stated the need for full neuropsychological batteries to be administered for further insight where needed, but highlighted the necessity of screening tools at a minimum, to promote routine assessment.
Following on from consensus recommendations, it would be useful to know which cognitive assessments are being used in research and clinical practice with people with MS. This might influence future consensus recommendations. There are ongoing concerns about whether MS-specific batteries and tests (i) are able to evaluate the range of cognitive domains impaired in people with MS, particularly executive function, processing speed, and memory (Hansen & Lautenbacher, 2017), and (ii) have adequate ecological validity (Korakas & Tsolaki, 2016). Furthermore, in some healthcare and research settings, the use of proprietary measures may not be practical due to their financial implications.
The aim of this review was to systematically compile and describe the neuropsychological measures that have been used in research with people with MS, and to examine the modality of administration and contexts in which they have been used. This review is the first step in an evidence-based approach to selecting measures for use in cognitive assessment with people with MS and complements existing consensus-based approaches.

Methods
The study protocol was prospectively registered with the PROSPERO International Prospective Register of Systematic Reviews (registration number CRD42018103384).

Search strategy and selection criteria
The systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA; Moher et al., 2009). We searched MEDLINE, EMBASE, CINAHL, PsycINFO, and Web of Science for studies published in English from inception to February 2020, using relevant keywords including, but not limited to: Multiple Sclerosis, Cogniti*, Neuropsych*, Screen, Assessment, Test, Outcome. The search strategy was developed for all databases by one reviewer (HE) and independently reviewed by four members of the team (RdN, GT, AD and NE). The database searches were limited to adult, human studies and texts in the English language.
Studies were included if they used psychometric measures to assess cognitive functioning in MS, or to screen participants into a study. We included measures administered face-to-face, electronically, or via telephone. The target population was adults (aged 18 and over) diagnosed with any form of MS. We included individual measures, screening tests, and batteries of cognition; measures of single and multiple domains of cognition; and both subjective and objective measures of cognition. There were no constraints on the setting in which the cognitive assessment took place.
Initial title and abstract screening were completed by one reviewer (HE), and full-texts were divided between two reviewers for data extraction given the volume of the literature (HE & CA). Any disagreements were resolved by discussion and arbitration with a third reviewer (GT, RdN, AD or NE). Studies were excluded based on the hierarchy shown in Figure 1.
A formal quality assessment was not conducted, due to the absence of a single assessment tool appropriate for all studies and the impracticality of undertaking this given the considerable number of studies included in this review. As we aimed to capture the breadth of measures being used and their frequency of use in a variety of contexts, including a range of research methodologies, the quality of the studies would have been difficult to compare, and a formal assessment would therefore have been of limited use. However, we did capture pertinent issues, including whether studies specified the names of the measures used or how the measures were administered.
Despite restricting the searches to adult and English-language papers, some papers that did not meet these criteria were still present in the retrieved citations and were therefore removed during the iterative title and abstract review and full-text review. Where two or more publications reported results from the same dataset, only the most recent study was included for data extraction.

Data extraction and analysis
Our main interest was to identify the measures of cognition being used with people with MS, including how each measure was used (e.g., as a screening measure, or as an independent or dependent variable in a study) and the modality in which it was administered (e.g., in-person, electronically). We also extracted information about the participants' type of MS, the country in which the study took place (or whether it was a multicentre study), how many respondents completed the measure, and whether it was a patient- or proxy-reported measure. For the purpose of analysis, the different measures were grouped into a number of classes so that we could identify which measures were used most often, regardless of the specific names and ascriptions given. Examples of the classes used are serial addition tests, in which respondents add up numbers as they are presented (e.g., the Paced Auditory Serial Addition Test (PASAT)), and substitutional tests, in which respondents are presented with an array of stimuli (usually digits, symbols, or letters) and asked to match them correctly with similar stimuli using a key (e.g., the SDMT). The classes, named examples, and a description of each class are given in Table 1. The classes were determined from how the assessments were described conceptually in the relevant literature, or as described within the study. Once all individual assessments used in the studies were classified, the total number and frequency of use were calculated for each class and for each multi-domain test or battery.
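As a minimal illustration, the classification and tallying step described above can be sketched as follows. The name-to-class mapping here is purely hypothetical shorthand; the actual classes and their assignments are those given in Table 1.

```python
from collections import Counter

# Hypothetical mapping from reported test names to conceptual classes
# (illustrative only; the real mapping follows Table 1).
TEST_CLASSES = {
    "PASAT": "serial addition",
    "Paced Auditory Serial Addition Test": "serial addition",
    "SDMT": "substitutional",
    "Symbol Digit Modalities Test": "substitutional",
    "COWAT": "verbal fluency",
    "F-A-S": "verbal fluency",
    "CVLT": "word list learning",
}

def classify(test_name: str) -> str:
    """Map a reported test name to its conceptual class; unknown names are flagged."""
    return TEST_CLASSES.get(test_name, "unclassified")

def class_frequencies(reported_tests: list[str]) -> Counter:
    """Tally how often each class of test appears across extracted study records."""
    return Counter(classify(name) for name in reported_tests)

# Toy example: five test uses extracted from hypothetical study records
usage = class_frequencies(["PASAT", "SDMT", "COWAT", "SDMT", "CVLT"])
```

Grouping by conceptual class rather than reported name is what allows differently labelled variants (e.g., "SDMT" and "Symbol Digit Modalities Test") to be counted together.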

Results
The searches produced a total of 11,854 papers to be screened, from which data from 1526 studies were extracted for analysis. The PRISMA flowchart is shown in Figure 1. Full-text review was conducted by two reviewers on a randomly chosen subsample of manuscripts (n=50) to ensure uniformity of data extraction; qualitatively, we found a high level of agreement.
The studies were conducted in 52 countries (50% in Europe and 39% in North America) and included 29 multicentre studies. Together, the included studies collected data from 316,053 people with MS. All phenotypes of MS were reflected in the studies: most studies (66%) used a mixed sample of people with MS, 29% enrolled people with RRMS, and only 4% specifically enrolled people with progressive forms of MS.
There was a total of 5665 cognitive measures used. Of these, 93% (n=5260) were "objective" measures (for example, memory or processing speed tests administered by a researcher or clinician) and 7% (n=370) were "subjective" (for example, self-report or proxy questionnaires). Thirty-five studies did not specify the names of the measures they used, or did not describe the measures or procedures in sufficient detail for us to classify them. However, some of these unnamed measures were reported as objective or subjective and were included in the overall counts. Approximately 18% (n=912) were named batteries or screening tests of cognitive function consisting of two or more individual assessments (for example, the MoCA is a brief screening test but includes a trail making task, a verbal fluency task, and a word list learning task, among others).

Individual tests of cognition
There were 504 different individual measures used; however, many of these overlapped considerably. For example, verbal fluency tests that used a variety of stimuli, such as different letters of the alphabet, were given different names, including the Controlled Oral Word Association Test (COWAT), the F-A-S test, and letter fluency tests.
The most frequently used measures were substitution-based tests (12% of all individual tests), such as the SDMT, and serial addition tests (11%), such as the PASAT, followed by tests that require participants to learn a list of words and recall them immediately and/or after a delay (7.8%), and verbal fluency tests (7.5%), such as the COWAT, in which participants are asked to name as many words as possible belonging to a certain category (such as animals) or beginning with a specific letter of the alphabet. The domains measured by the most frequently used measures correspond to those commonly impaired in MS, including information processing speed, attention, executive function, and aspects of memory including working and short-term memory.

Multi-domain tests and batteries
Measures were operationally defined as a named collection of tests if they were a named screening test (e.g., MoCA) or a battery (e.g., BRB-N) that included one or more measures considered individual tests. Sixty-one different multi-domain tests and batteries were used (see Table 2). The Multiple Sclerosis Functional Composite (MSFC) includes only one neuropsychological measure (the PASAT) and was therefore counted as an individual measure rather than a battery. The MoCA was frequently used to screen participants into studies (37%) or to report on cognitive functioning or between-group differences (38%). However, 25% of the studies using the MMSE used it as a dependent or independent variable. Consensus recommendations have advised the use of the MACFIMS (recommended in 2003) and the BICAMS (recommended in 2011) as cognitive batteries for people with MS. The BRB-N was developed in 1990 and was used 10 times before the recommendation of the MACFIMS in 2003. Since then, the BRB-N has remained popular, with 217 studies using it, compared with 72 studies using the MACFIMS. Even since the publication of the BICAMS, the BRB-N has remained popular, used in 151 studies compared with 62 studies using the BICAMS.

Subjective measures of cognition
A breakdown of the subjective, patient- (or proxy-) reported measures and their frequency of use is shown in Table 3. There was a total of 370 uses of named subjective measures across the studies: 87% were completed by the people with MS themselves, and 11% were proxy reported (8% carer/family reported and 3% clinician reported). Eighteen studies did not specify who completed the subjective measure of cognition. The most frequently used subjective measures were the Multiple Sclerosis Neuropsychological Questionnaire (MSNQ), the Perceived Deficits Questionnaire (PDQ), and the Fatigue Scale for Motor and Cognitive Functions (FSMC). Twenty-nine studies used a self-report measure that was not reported by name, or that was not a formally named measure (for example, a Likert scale or visual analogue scale of cognitive functioning not attributed to a specific measure).

Uses of cognitive measures
Cognitive measures were used for myriad reasons, which we grouped into four categories (see Table 4). Many of the measures served two or more purposes within a study (for example, to screen participants into the study and also as a dependent variable). Cognitive measures were predominantly used to report on cognitive functioning or between-group differences in people with MS, and as dependent variables. Twenty-seven studies used measures in ways that did not fall into these predefined categories, and these were recorded separately: twelve used the cognitive measures within a cognitive rehabilitation training programme for people with MS, and other uses included matching healthy controls with people with MS, power calculations, and deriving the minimal clinically important difference. Some studies did not report on the cognitive measures they administered.

Administration of cognitive measures
Where available, we extracted data on how the cognitive measures were administered. Approximately half of the 5670 measures used in the studies were not described in enough detail to determine how they were administered. Of the remaining 2680 measures, the majority (87%) were administered in person by a healthcare professional, technician, or researcher, and 11% used a computerized administration. The remaining modalities were postal and telephone administrations. The 53 measures completed by post were all subjective, self- or proxy-reported measures; however, the 12 telephone-administered measures were a mix of subjective and objective measures, including verbal fluency tests, word list learning, and digit span tests.

Discussion
We identified a wide range of individual cognitive tasks, screening tests, and batteries covering a number of cognitive domains, and despite recommendations for the use of specific measures in research with people with MS, their use is still far from ubiquitous. Previous recommendations have focused on minimum cognitive screening for all people with MS, given the high prevalence of impairment (Kalb et al., 2018; Korakas & Tsolaki, 2016). Assessment should be able to identify impairment at an early stage, as well as become part of routine monitoring. It is therefore important for assessments to be relevant, available, accessible, and acceptable to both people with MS and their clinicians.
Only 34% of the studies that used a neuropsychological multi-domain test or battery used one specifically developed for people with MS. Two of the most frequently used multi-domain tests are screening assessments that are not specific to MS, the MMSE and the MoCA, which have been criticized for floor and ceiling effects and for not being sensitive or specific enough to fully identify and reflect changes in cognitive impairment in people with MS (Beatty & Goodkin, 1990). The MMSE does not include measures of processing speed or executive functioning and is not considered an adequate assessment for use in MS clinical practice (Tobin, 2019), particularly due to its inability to detect mild cognitive impairment.
Consensus-based recommendations for the use of cognitive tests in MS have stated the need for brief screening tests, with comprehensive neuropsychological assessment available where needed. Both quick screening tests of cognition (the MMSE, BICAMS, and MoCA, which take less than 30 minutes to complete) and longer assessments were reflected in the results, with eight batteries taking between 30 and 100 minutes to administer. While longer assessments may not be practical, given the high prevalence of fatigue in MS (Hadjimichael et al., 2008) and the limited time available in clinics, the MS-specific batteries are important for capturing a full neuropsychological evaluation of cognitive impairment.
The BICAMS has been translated into a number of languages since being recommended; however, given its recency, the extent of these validations is limited, and this lack of developed translations and linguistic validation may explain why uptake of this battery has been limited. Conversely, the availability of the BRB-N in European countries, shown by the range of published translations and validations, is likely to make it more appealing to researchers and perhaps explains its high use in Europe (76% of uses of the BRB-N).
An expert consensus recently recommended the use of the SDMT as a minimum in people with MS (Kalb et al., 2018), and the high volume of use of this measure suggests that it is both acceptable and an adequate measure of processing speed for people with MS. The SDMT has been shown to have good validity and reliability (Benedict et al., 2017), and it measures one of the most frequently affected cognitive domains, information processing speed. There is some limited evidence that SDMT results are related to factors affected by cognitive impairment, including employment status and activities of daily living (ADLs) in people with MS (Benedict et al., 2005), although the extent of this association has yet to be systematically reviewed. Among the studies included in our review, several substitution-style tests with different names were used (for example, the Symbol Search Test, the Digit Symbol Coding Test, and the Coding Test), which may have been chosen over the SDMT because of cost or licensing requirements rather than psychometric properties. The omission of information about the administration of substitution-style tasks in many of the included studies also demonstrates the importance of reporting pertinent details about the outcome measures chosen. The use of variants and different administrations of these measures could call the generalizability of individual study findings into question unless cross-validation has been considered. Norming data for the SDMT would be of limited value for similar tasks or different administrations, and so researchers and clinicians may be making invalid comparisons to a normative sample.
The high usage of serial addition measures in research is unsurprising, given the inclusion of the PASAT in the MSFC. However, more recent endorsements have moved away from this measure, owing to its potential for inducing anxiety (Aupperle et al., 2002), stress, and frustration (Locke et al., 2011). Performance on this task may be affected by IQ and mathematical ability, which cannot be corrected for without further, extensive data collection (Wills & Leathem, 2004). It is also likely that using the PASAT would lead to a high proportion of missing data due to participants refusing to undergo or complete the task. Refusal to complete this task has been shown to be associated with higher levels of cognitive impairment (Cortés-Martínez et al., 2019), so using the PASAT could lead to significant missing data from some of the most cognitively impaired participants. These studies affirm the importance of choosing measures that are acceptable and accessible to people with MS.
Subjective measures of cognition made up approximately 7% of the measures in the review. Research has found that self-report measures of cognitive impairment in people with MS are more strongly related to depression than to performance on objective cognitive measures; however, proxy-reported measures tend to correlate more highly with objective cognitive measures and may be valuable in assessing cognitive functioning (Benedict et al., 2004). Despite this, only 11% of subjective measures were proxy-reported, and it is possible that researchers and clinicians do not trust subjective reports of cognition. However, subjective measures can overcome many of the disadvantages of objective measures. Self- and proxy-reported measures can be less onerous for the participant and administrator, have fewer variables to control, may be more acceptable to the participant, and could reflect a global aspect of cognition distinct from that assessed in other ways. Just as mood may affect responses to subjective measures, fatigue and dexterity issues may affect performance on objective measures of cognition.
Just under half of the studies gave no administration details for the assessments used (for example, verbal, written, or electronic), and 35 of the cognitive measures were not described well enough to ascertain their name or class; replication of these studies would be challenging. There was also considerable difficulty in classifying many of the measures due to the lack of information given in the papers, and this may have affected the frequencies reported for the classes and batteries. Both visual acuity and motor dexterity can be impaired in MS (Lau et al., 1998), and cognitive assessments often involve the visual processing of stimuli, with participants giving motor (for example, written) or verbal responses. Written administrations of assessments can be affected by manual dexterity issues in MS, and while verbal administrations are often used to overcome this, such as the oral administration of the SDMT, studies often did not specify which was used. While some studies excluded participants with problems with vision and dexterity, many did not report how such issues were addressed during the assessments.
Our findings on the frequency of use of tests should be treated with caution. They are based on simple counts, and the selection criteria for this review, which excluded translation and validation papers, may have meant that some papers were not counted here. Another limitation of this review is the missing information on the measures extracted for each study. Some aspects of the use of cognitive measures in research were outside the scope of this review, such as the domains the studies purported to measure. Future search strategies could include specific cognitive domains and the names of measures, although this may inflate the number of articles to be screened.
A formal quality assessment of individual studies was not used as an inclusion criterion because we wanted to capture the breadth and context of cognitive measures. An unexpected finding was that, for many studies, we were unable to ascertain which measures were used or how they were administered; for example, data regarding the types of electronic devices used, or where respondents completed the measure, were missing in many primary studies. Devising a quality assessment checklist for future similar reviews may add further insight into these replicability issues. A further limitation of this review is that we only included papers published in English. Research conducted and published in other languages may have provided pertinent information on measures that are popular globally.
It was clear from the results of the systematic review that many cognitive measures are being used in research with people with MS, although some, such as substitution and serial addition tasks, are prominent. The measures used ranged from short, individual screening tests to lengthy batteries requiring trained staff to administer, as well as subjective measures that could be completed by people with MS or a proxy. Cognitive measures were administered in a variety of modalities. One of the impacts of COVID-19 has been a shift towards telemedicine, and for neuropsychology to respond to this shift, research is needed on how cognitive assessments perform in remote administrations. We believe that this shift will be long-lasting, beyond the COVID-19 pandemic, and remote assessments are already being implemented in clinics (e.g., www.neuroms.org). Evidence of the validity and reliability of computerized cognitive tests and batteries for people with MS is emerging (Wojcik et al., 2019), but this does not guarantee equivalence between all potential modalities (for example, self-administered vs. technician-administered; keyboard and mouse vs. touchscreen), and new normative data may be required (Bauer et al., 2012). We recommend that researchers and clinicians ensure that the measures used in all settings are appropriate for assessing cognitive impairment in MS, and are valid and reliable.
More research is needed to determine the acceptability of MS-specific tests from clinicians' and patients' perspectives, which may increase the use of and response to specific tests. Computerized adaptations especially require further evidence of their psychometric properties, as well as acceptability to patients. Systematic reviews are also needed to examine the association between the cognitive tests and functional outcomes, to improve clinicians' and patients' engagement with these tools.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by a grant from the UK Multiple Sclerosis Society (grant number 70).