Pain and Physical Functioning in Neuropathic Pain: A Systematic Review of Psychometric Properties of Various Outcome Measures

Abstract
Introduction: A range of outcome measures across various domains are used to evaluate change following an intervention in clinical trials on chronic neuropathic pain (NeP). However, to capture a real change in the variable of interest, the psychometric properties of a particular measure should demonstrate appropriate methodological quality. Various outcome measures in the domains of pain and physical functioning have been used in the literature for NeP, for which individual properties (eg, reliability/validity) have been reported. To date, there is no definitive synthesis of evidence on the psychometric properties of those outcome measures; thus, the aim of this systematic review was to evaluate the methodological quality [COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) guidelines] of studies that evaluated psychometric properties of pain and physical functioning outcome measures used for NeP.
Methods: Specific MeSH terms/keywords related to 3 areas (pain and/or physical functioning, psychometric properties, and NeP) were used to retrieve relevant English-language studies from key electronic databases (MEDLINE (Ovid), CINAHL (EBSCO), Scopus, AMED, and Web of Science) from database inception to July 2012. Article retrieval/screening and quality analysis (COSMIN) were carried out by 2 independent reviewers.
Results: Twenty-four pain and thirty-seven physical functioning outcome measures were identified, with studies varying in methodological quality from poor to excellent.
Conclusion: Although a variety of pain and physical functioning outcome measures have been reported in the literature, few have demonstrated methodologically strong psychometric properties. Thus, future research is required to further investigate the psychometric properties of existing pain and physical functioning outcome measures used for clinical and research purposes.

INTRODUCTION
Neuropathic pain (NeP) is defined by the International Association for the Study of Pain's Neuropathic Pain Special Interest Group (NeuPSIG) as "pain arising as a direct consequence of a lesion or disease affecting the somatosensory system". 1 A range of assessment guidelines for NeP clinical trials and clinical practice have been developed by the Initiative on Methods, Measurement and Pain Assessment in Clinical Trials (IMMPACT), 2 the European Federation of Neurological Societies (EFNS), 3 and the NeuPSIG. 4 These guidelines advocate a range of measures for assessing the core domains of pain, quality of life, mood, sleep, and functional capacity (physical, cognitive, emotional, and social). Nevertheless, many different outcome measures are available for each of these domains. 2 To evaluate the applicability of these measures, a systematic review of the psychometric properties of outcome measures used in published trials may provide a useful basis for selecting the best measurement instrument for a specific purpose. 5,6 Individual assessment of the psychometric properties of available outcome measures is important; 7,8 as part of this, in reviewing the evidence, it is important to assess the methodological quality of the studies that investigated those psychometric properties. 9 While in clinical practice the adoption of outcome measures will depend on feasibility of use (speed, ease of use, and limited need for overly sophisticated instruments), 10 emphasis should also be given to measures that have been shown to be reliable, valid, and responsive/interpretable for a given population.
Pain remains a leading cause of disability at the individual level and is associated with functional losses as well as mood disturbances. 11 Thus, this systematic review focuses on evaluating the psychometric properties of outcome measures used in the domains of pain and physical functioning in NeP. A number of outcome measures have been used to measure pain intensity and physical function in NeP trials; 5,7,8,12 however, there is limited conclusive evidence on their psychometric properties. Reliable and valid outcome measures allow patients' pain and physical functioning outcomes to be evaluated more accurately, enabling better management, including the earliest appropriate management to minimize the risk of comorbidities and disabilities.
Existing evidence on the psychometric properties of pain and physical functioning outcome measures used in NeP trials has not previously been systematically reviewed. The aim of this review was to systematically search the literature and identify gaps in the evaluated psychometric properties (reliability, validity, responsiveness, and interpretability) of outcome measures for "pain and physical functioning," as recommended by the IMMPACT guidelines, in the NeP population. The findings may assist in outlining effective intervention strategies for patients with NeP. The objectives of this systematic review were as follows: (1) to systematically review and identify the types of psychometric properties established for outcome measures quantifying pain and physical functioning in NeP populations; and (2) to evaluate the methodological quality of the included studies investigating those psychometric properties, in accordance with the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist and its 4-point rating scale.

METHODS

Information Sources
A systematic search was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The following electronic databases were searched: Ovid MEDLINE, CINAHL, Scopus, AMED, and Web of Science (WOS), from database inception to July 31, 2012. Search alerts were activated in the available databases to monitor new publications in the field after the original search.

Search Strategy
Keywords and MeSH headings in 3 broader areas (pain and/or physical functioning outcome measures, psychometric properties, and NeP) were used to develop the search strategy (Table 1). Several strategies were used to compile a comprehensive list of keywords/MeSH terms/subject headings representing each area. For outcome measures, all pain and physical functioning outcome measures that had been used in clinical trials of NeP were chosen. For psychometric properties, we chose the standardized terminology used by the COSMIN framework. 6 For NeP, MeSH terms/keywords indexed for neuropathy, neuralgia, and neurodynia were used. Terms within each theme were combined with OR, and themes were combined with AND. The search strategy was adapted for each database as necessary.
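The combination rule described above (OR within a theme, AND across themes) can be sketched in Python. This is a minimal illustration, not the review's actual search string: the term lists are a hypothetical subset taken from the text (the full strategy is in Table 1), and the output uses generic Boolean syntax that would still need adapting to each database's field tags and truncation operators.

```python
# Hypothetical, abbreviated term lists; the review's full lists are in Table 1.
THEMES = {
    "outcome_measures": ["pain", "physical function"],
    "psychometric_properties": ["reliability", "validity", "responsiveness"],
    "neuropathic_pain": ["neuropathy", "neuralgia", "neurodynia"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(themes: dict[str, list[str]]) -> str:
    """Combine terms with OR within each theme, then join the themes with AND."""
    groups = [
        "(" + " OR ".join(quote(t) for t in terms) + ")"
        for terms in themes.values()
    ]
    return " AND ".join(groups)


if __name__ == "__main__":
    print(build_query(THEMES))
```

Because every theme is ANDed in, a record must match at least one term from each of the three areas to be retrieved.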

Study Selection
Articles identified in the search underwent a series of screening steps. First, duplicate articles were removed. Two reviewers (PM and LC) independently screened titles and abstracts for potential eligibility. Full-text articles of all potentially eligible abstracts were retrieved for application of the eligibility criteria. Disagreements between the reviewers regarding inclusion of individual studies were discussed at a consensus meeting and, when unresolved, were resolved through discussion with other reviewers (PH, CC, and GDB). Reference lists of the selected papers were also searched for relevant articles.

Eligibility Criteria
Cross-sectional and longitudinal cohort studies that included at least 1 assessment of a psychometric property of a pain or functional outcome measure in a NeP population (NeP as defined by the Clinical Resource Efficiency Support Team, CREST) 13 were included. The search strategy revealed 2 distinct categories of instruments: those intended for screening or diagnosis, and those developed to measure outcomes. As the focus of this review was the psychometric properties of tools used to measure changes in pain or functional outcomes over time, screening and diagnostic tools were excluded. Case reports, editorials, and reviews were also excluded. Only articles published in English and conducted in humans were selected.

Data Extraction and Synthesis
Data extraction was carried out systematically by independent reviewers (PM and LC/PH/CC/GDB), with an equal number of articles randomly distributed among the team members. Each member extracted data from the allotted articles, which were then checked for accuracy; consensus meetings and opinions from other reviewers were used to resolve any disagreements. The following data were collected and tabulated from each included article: study reference, participant characteristics, outcome measures studied, and type of psychometric properties tested (reliability and/or validity) (Table 2). A further summary of the identified outcome measures, with their published psychometric properties and COSMIN grading, was synthesized (Tables S1 & S2). Results from studies of excellent and good methodological quality based on COSMIN criteria (as stated in Table S3) were used to formulate recommendations for acceptable psychometric property scores (for definitions of acceptable, good, and excellent scores, see Table S3).

Methodological Quality of Individual Studies Reporting on Psychometric Properties
Whereas a variety of tools are available to measure the methodological quality of studies that report on scale development and assess psychometric properties, the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist (Table 3) was used in this review. It consists of 10 boxes (A to J): internal consistency (Box A), reliability (Box B), measurement error (Box C), content validity (Box D), structural validity (Box E), hypotheses testing (Box F), cross-cultural validity (Box G), criterion validity (Box H), responsiveness (Box I), and interpretability (Box J), with 5 to 18 items per box concerning methodological standards for how each measurement property should be assessed. According to COSMIN guidelines, the methodological quality of a study is considered adequate if all items in a box (A to J) are rated adequate. For this, each item was scored on a 4-point rating scale ("poor," "fair," "good," or "excellent"). The primary investigator (PM) scored all articles independently, and the results were discussed with each relevant team member until consensus was reached. Methodological quality was determined using the "lowest rating score" 6 achieved by any item for the relevant psychometric property. Therefore, if any item for a property scored "poor," the methodological quality for that property was rated "poor" overall, irrespective of the scores achieved on other items. Disagreements regarding COSMIN scoring were resolved by discussion between reviewers. Reviewers were not blinded to the journal affiliation or authors of the included articles.

RESULTS

Figure 1 illustrates the study selection process. The search yielded 10,913 articles. After duplicate removal and title and abstract screening, 80 articles were identified and retrieved as potentially eligible for the review.
While checking the eligibility of full-text articles, a further 16 articles were excluded: 2 were editorials, 2 were commentaries, 5 were based on cancer pain, 3 were PhD publications, and for the remaining 4, full texts were not available. Thus, a total of 64 articles satisfied the eligibility criteria and were included in this review.
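As a simple check of the selection arithmetic reported above, the full-text exclusions can be tallied. The counts are those given in the text; the category labels are our own shorthand.

```python
# Full-text exclusions as reported in the text (labels are our shorthand).
full_text_exclusions = {
    "editorial": 2,
    "commentary": 2,
    "cancer pain": 5,
    "PhD publication": 3,
    "full text unavailable": 4,
}

retrieved_full_text = 80
excluded = sum(full_text_exclusions.values())  # 2 + 2 + 5 + 3 + 4
included = retrieved_full_text - excluded

print(f"excluded = {excluded}, included = {included}")  # excluded = 16, included = 64
```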

Characteristics of Included Studies
In total, 64 studies reporting on 61 different outcome measures were identified. The included studies evaluated the psychometric properties of pain outcome measures (n = 24) and physical function outcome measures (n = 37) (Table 2). Of the 24 pain intensity outcome measures, 15 (63%) were patient-reported/self-reported measures and the other 9 (37%) were therapist/clinician-completed measures. Of the 37 physical function outcome measures, 17 (46%) were patient-reported/self-reported measures, that is, symptomatic assessment (subjective); 9 (24%) were performance-based measures; and the remaining 11 (30%) were therapist-completed measures, that is, symptoms and signs (subjective and objective testing). The synthesis of results per outcome measure, their published psychometric properties, and quality assessment scores for studies are detailed in Tables S1 & S2. Data on the characteristics of the study and sample populations were extracted using the interpretability and generalizability boxes of the COSMIN checklist. Information on sample size and gender distribution is reported in Table 2.

Pain Intensity Outcome Measures
Pain domain outcomes (Tables 2 and S1) included the following: Brief Pain Inventory Scale for Diabetic Peripheral Neuropathy; 15 Complex Regional Pain Syndrome Severity Score; 16 Diabetes Symptom Checklist Type-2; 17 Foot Function Index (pain subscale); 18 Italian Neuropathic Pain Symptom Inventory; 19 McGill Pain Questionnaire; 20 modified Toronto Clinical Neuropathy  70,71 Step Activity Monitor; 72 Step Activity Monitor (4-min walk test); 73 Sheehan Disability Scale; 74 Sollerman Hand function test; 59 Turkish version of the Boston Questionnaire; 75 Ulnar Neuropathy at the Elbow Questionnaire; 76 12-Item Multiple Sclerosis Walking Scale; 77 Walking Stairs Questionnaire; 68 Work stimulation tasks (knob turn, Linear motion, and Lever arm); 78 and Zoster Impact Questionnaire. 45

Methodological Quality of Studies Evaluating Psychometric Properties of Pain Intensity and Physical Functioning Outcome Measures
Reliability. Most of the instruments included in our review were not tested for all psychometric properties listed on the COSMIN checklist. Forty-four of the 64 studies (69%) assessed various forms of reliability (internal consistency, inter-rater reliability, intrarater reliability, test-retest reliability, and measurement error), and their methodological quality on COSMIN was mixed (excellent/good/fair/poor) (Tables S1 & S2). The key reliability results showed that the BPI-DPN and the SF-MPQ2 have excellent internal consistency (α > 0.90). The mTCNS has good internal consistency (α = 0.81 to 0.90), inter-rater reliability, and intrarater reliability (ICC or κ = 0.81 to 0.90). The hot and cold pain thresholds on the QST have good inter-rater and test-retest reliability (ICC or κ = 0.81 to 0.90). The Spanish NPSI has excellent internal consistency (α > 0.90) with good test-retest reliability (ICC or κ = 0.81 to 0.90).
Measurement error was the least reported form of reliability; the TRNDSI showed good test-retest reliability (ICC or κ = 0.81 to 0.90) and measurement error (see Table S1). The studies reporting these excellent and good psychometric property scores were also rated good/excellent on the COSMIN checklist (according to the COSMIN criteria stated in Table S3).
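Internal consistency values such as α > 0.90 refer to Cronbach's alpha, α = k/(k − 1) × (1 − Σσᵢ²/σ_T²), where k is the number of items, σᵢ² is the variance of item i, and σ_T² is the variance of respondents' total scores. The sketch below computes alpha from raw item scores and maps it onto the bands quoted in this review (excellent > 0.90, good 0.81 to 0.90, moderate 0.71 to 0.80). It is illustrative only: the handling of values that fall between the quoted bands (for example 0.805) is our assumption, the lower bands defined in Table S3 are not reproduced, and the example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha from k item-score lists (one score per respondent, same order).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("each item needs the same number (>= 2) of respondent scores")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


def alpha_band(alpha: float) -> str:
    """Map alpha onto the bands quoted in the text; boundary handling is assumed."""
    if alpha > 0.90:
        return "excellent"
    if alpha > 0.80:
        return "good"
    if alpha > 0.70:
        return "moderate"
    return "below moderate"  # lower bands are defined in Table S3, not reproduced here


if __name__ == "__main__":
    # Hypothetical scores: 2 items answered by 3 respondents.
    a = cronbach_alpha([[1, 2, 3], [1, 3, 2]])
    print(round(a, 3), alpha_band(a))  # 0.667 below moderate
```

Sample variances are used for both the items and the totals; because alpha is a ratio of variances, the choice of denominator cancels as long as it is applied consistently.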

Validity. Validity was the most frequently tested psychometric property, assessed in 49 of 64 studies (77%) as face/content validity, structural validity, construct validity, criterion/concurrent validity, convergent validity, discriminative validity, hypothesis testing, and responsiveness. As with reliability, the methodological quality of this evidence on COSMIN was mixed (excellent/good/fair/poor) (Tables S1 and S2). The key validity results showed that the NPSI, the SALSA, and the UNEQ have excellent content validity: neither patients nor experts raised concerns about the wording of the questionnaires, so no further modifications were advised. The UENS has the best criterion validity, followed by the HAP and the mNDS. Approximately one-third of the studies (18/49, 37%) evaluated responsiveness. The NPS has excellent responsiveness, followed by the 0 to 10 PI-NRS and the ODSS. The studies providing this evidence were also of excellent/good methodological quality on the COSMIN checklist (according to the COSMIN criteria stated in Table S3).

DISCUSSION
To our knowledge, this is the first systematic review to evaluate the evidence for the psychometric properties of pain and physical functioning outcome measures used to assess NeP conditions, and to appraise the methodological quality of the studies investigating those properties. A total of 61 different outcome measures were identified in the domains of pain and physical functioning. While most studies reported good/excellent reliability and validity for the scales they examined, only a few were rated "excellent to good" for methodological quality. Our review identified acceptable reliability and validity (for a few key properties) for the mTCNS, the TRNDSI, the 0 to 10 PI-NRS, the NPS, the QST, the SALSA, the Spanish NPSI, the ODSS, the SF-MPQ2, the UNEQ, the UENS, the HAP, the mNDS, the NDS, and the BPI-DPN.
The studies investigating reliability varied in methodological quality on the COSMIN checklist from "poor" to "excellent." However, most studies shared similar methodological shortcomings; in this review, smaller sample sizes were associated with the majority of inconsistent results. According to COSMIN guidelines, 6 a sample size of ≥ 100 is considered adequate/excellent, given the need for precision in the overall estimates; these estimates are based on a power of 0.80. 25,79 A sample size of 50 provides a power of 0.70 (at a significance level of 0.05), whereas a sample size of 100 provides a power of 0.94. 25 In the current systematic review, many outcome measures appear promising for different aspects of reliability and validity (according to the COSMIN criteria stated in Table S3): the FFI, the NTSS-6, the AMHFQ, the DASH, the HAP, the ISS, the MHQ, the PEM, the SDS, the TBQ, the UNEQ, and the Walk-12 have published internal consistency grades ranging from "moderate" (α = 0.71 to 0.80) to "excellent" (α > 0.90). However, when the methodological quality of these studies was evaluated on COSMIN, they were graded "poor/fair" because of small sample sizes. These findings are consistent with those of a recent systematic review of outcome measures in neck pain, in which smaller sample sizes frequently led to poorer results. 80 This review therefore recommends future research with larger samples (n ≥ 100, as recommended by COSMIN) to improve the quality of evidence for these measures.
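To illustrate how statistical power grows with sample size, the sketch below approximates the power of a two-sided test that a Pearson correlation differs from zero, using Fisher's z-transformation. This is an assumed stand-in, not the calculation behind the figures quoted above: the review does not state the test or effect size used, and the correlation of 0.35 in the example is hypothetical, so the printed values should not be read as reproducing the cited 0.70 and 0.94.

```python
import math
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of H0: rho = 0 using Fisher's z.

    Under rho = r, atanh(sample r) * sqrt(n - 3) is roughly normal with
    mean atanh(r) * sqrt(n - 3) and unit variance.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    shift = math.atanh(r) * math.sqrt(n - 3)
    # Probability that the test statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    ASSUMED_R = 0.35  # hypothetical effect size, not taken from the review
    for n in (30, 50, 100, 150):
        print(n, round(correlation_power(ASSUMED_R, n), 2))
```

For any fixed non-zero effect size, power increases monotonically with n, which is the intuition behind COSMIN's preference for samples of at least 100.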
Validity was the most frequently evaluated psychometric property in both the pain and physical functioning domains. Most of these studies had unsatisfactory (poor/fair) ratings on COSMIN, mainly because of small sample sizes, hypotheses that were not formulated, and expected directions/magnitudes of correlations that were not stated in advance. Other common shortcomings were a lack of reporting of missing items and of the methods used to handle missing data. Although these 2 items did not contribute to the overall "poor" grading on COSMIN, studies of "good" methodological quality would be expected to report this information, as a high number of missing items can introduce bias.
A further notable finding of this review was that responsiveness was the least frequently studied psychometric property for the included pain and physical functioning outcome measures. Of the 18 studies that reported on responsiveness, only those on 3 scales (the NPS, the 0 to 10 PI-NRS, and the ODSS) demonstrated satisfactory methodological quality on COSMIN. The remaining studies were graded "fair to poor," and the shortcomings described above (small sample sizes, failure to report missing items, vagueness about how missing data were handled, poorly formulated hypotheses, etc.) contributed equally to the inconsistent results for this property.
In the current systematic review, a few measures were identified with promising psychometric properties for key variables, supported by "excellent to good" methodological quality on the COSMIN checklist: the mTCNS (good internal consistency, inter-rater and intrarater reliability, and criterion validity); the TRNDSI and the ZBPI (good test-retest reliability); the NPSI (excellent face/content validity); the 0 to 10 PI-NRS (good responsiveness); the QST-pain threshold (good intrarater and test-retest reliability); the NPS (excellent responsiveness); and the SALSA (excellent internal consistency and content validity). Future use of these measures can be recommended on the basis of these proven properties; however, their remaining psychometric properties must also be established.
We also identified instruments that showed good methodological quality on COSMIN for a few psychometric properties but lacked good-quality evidence for others: the TCSS (good construct validity, but poor inter- and intrarater reliability); the Short-form MPQ-2 (excellent internal consistency, but fair construct validity and responsiveness); the HAP (good criterion validity, but poor internal consistency and responsiveness, and fair hypothesis testing); the ODSS (good responsiveness, but fair inter-rater and intrarater reliability and construct validity); the UNEQ (excellent content validity, but fair test-retest reliability and poor internal consistency, construct validity, and responsiveness); the TBQ (good construct validity, but fair test-retest reliability and poor internal consistency); the UENS (excellent criterion validity, but poor inter-rater reliability and responsiveness); and the BPI-DPN (excellent internal consistency and discriminative validity, but fair construct validity and poor criterion validity). As study methodology may influence results for psychometric properties, further evaluation of these properties in studies of better methodological quality is recommended.

LIMITATIONS
Firstly, "neuropathic pain conditions" is an umbrella term covering a range of conditions such as diabetic neuropathy, trigeminal neuralgia, and postherpetic neuralgia. 81 For the search strategy, MeSH terms/keywords indexed for neuropathy, neuralgia, and neurodynia were used to be as inclusive as possible. Each condition could have been searched separately, and such an approach may have reduced the chance of missing studies.
Secondly, psychometric properties such as reliability, validity, and responsiveness are subclassified into various forms, for example internal consistency, inter-rater/test-retest reliability, content validity, minimal important difference, and standard error of measurement. 82 For the current search strategy, keywords in 3 broader areas (reliability, validity, and/or responsiveness) were used rather than the individual subclassified keywords. However, as these broader terms are the most commonly used to denote the various forms of psychometric properties, it is anticipated that most relevant studies would have been retrieved.
Lastly, the multidisciplinary, international, consensus-based COSMIN guidelines were followed for rating the quality of the included studies of psychometric properties. The COSMIN checklist has well-developed data extraction forms with detailed instructions for completion. Its 4-point rating scale classifies each assessment of a measurement property as "excellent," "good," "fair," or "poor," based on the scores of the items in the corresponding COSMIN box. The methodological quality of a study is considered adequate if all items in a box (A to J) are rated adequate. However, frequently not all items in a box are rated adequate, and it is not feasible to provide an overall definitive grade for each psychometric property; thus, no conclusions about the methodological quality of the studies can be drawn purely from COSMIN findings.
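The "lowest rating score" (worst score counts) rule used in this review can be sketched as follows. Ratings are modeled as an ordered enumeration so that taking the minimum over a box's item ratings yields the box's overall rating. The item names in the usage example are hypothetical placeholders, not COSMIN's actual item wording.

```python
from enum import IntEnum


class Rating(IntEnum):
    """COSMIN 4-point item ratings; the integer order makes min() return the worst."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4


def box_rating(item_ratings: dict[str, Rating]) -> Rating:
    """Overall rating for one COSMIN box: the lowest item rating ("worst score counts")."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    return min(item_ratings.values())


if __name__ == "__main__":
    # Hypothetical item ratings for a reliability box (Box B); item names are placeholders.
    reliability_box = {
        "adequate sample size": Rating.POOR,
        "missing items reported": Rating.GOOD,
        "stable patients between occasions": Rating.EXCELLENT,
    }
    print(box_rating(reliability_box).name)  # POOR
```

A single "poor" item, such as an inadequate sample size, therefore caps the whole property at "poor," which is why small samples dominated the low ratings reported in this review.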

CONCLUSION
In this review, we evaluated the evidence for the psychometric properties of 61 unique outcome measures used to assess pain and physical functioning in trials of NeP conditions. We have presented extensive data on the psychometric properties of these outcome measures and recommend the use of the mTCNS, the TRNDSI, the ZBPI, the NPSI, the 0 to 10 PI-NRS, the QST-pain threshold, and the NPS to detect changes in pain intensity and physical function. However, important information on the methodological quality of most studies reporting these psychometric properties is lacking or of poor quality. As NeP is a multidisabling condition with significant associated morbidity, the use of pain and physical functioning measures supported by quality evidence is a key recommendation for future NeP intervention studies. Despite the use of these measures in many studies of NeP, the methodological quality of the evidence for most of them is not strong enough to recommend their use on the basis of their psychometric properties. Thus, good-quality future research is required to further investigate the psychometric properties of the identified outcome measures used for clinical and research purposes.

ACKNOWLEDGMENTS

for Health Activity and Rehabilitation Research, School of Physiotherapy, University of Otago for his suggestions, invaluable constant assistance, and constructive feedback on drafts of the manuscript. The findings of the study were presented as a poster at the 8th Congress of the European Federation of IASP Chapters (EFIC 2013) in Florence, Italy.

SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article.