Development of the Social Participation Restrictions Questionnaire (SPaRQ) through consultation with adults with hearing loss, researchers, and clinicians: a content evaluation study.

Abstract Objective: This research aimed to evaluate the content of the Social Participation Restrictions Questionnaire (SPaRQ) in terms of its relevance, clarity, comprehensiveness, acceptability to adults with hearing loss, and responsiveness. Design: Cognitive interviews and a subject matter expert survey were conducted. The interview data were analysed using thematic analysis and a taxonomy of questionnaire clarity problems. Descriptive statistics were calculated for the survey data. Study sample: Fourteen adults with hearing loss participated in the cognitive interviews. Twenty clinicians and academics completed the subject matter expert survey. Results: The majority of the SPaRQ content was found to be relevant, clear, comprehensive, and acceptable. However, an important clarity problem was identified: many adults with hearing loss struggled to switch from answering positively worded items (e.g. “I can attend social gatherings”) to answering negatively worded items (e.g. “I feel isolated”). Several subject matter experts found responsiveness difficult to assess. The SPaRQ was amended where necessary. Conclusion: Few hearing-specific questionnaires have undergone content evaluation. This study highlights the value of content evaluation as a means of identifying important flaws and improving the quality of a measure. The next stage of this research is a psychometric evaluation of the measure.


Introduction
Participation restrictions have been defined as the difficulties an individual experiences with involvement in life situations (World Health Organization 2001). These situations include family relationships, friendships, recreation, community life, education, and employment (Danermark et al. 2013). Numerous studies have demonstrated that participation restrictions are one of the major negative consequences of hearing loss (Vas et al. 2017). Therefore, one of the main aims of auditory rehabilitation is to reduce participation restrictions in individuals with hearing loss (Boothroyd 2007; Ferguson et al. 2017; Ferguson et al. in press).
In order to evaluate the impact of auditory rehabilitation on participation restrictions, it is necessary to have a valid, hearing-specific outcome measure for this construct. However, participation restrictions are recognised as being one of the most difficult constructs to measure (Salter et al. 2005; Whiteneck and Dijkers 2009). Much of this difficulty stems from the broad and inconsistent conceptualisation of the construct (Heinemann et al. 2010). The World Health Organization's (2001) definition of participation restrictions does not readily lend itself to measurement, as "life situations" could refer to practically any situation between birth and death (Dijkers 2010). There is little agreement in the literature concerning the domains (e.g. communication, community life) that should be included in a participation restrictions measure. Furthermore, it has proven difficult to distinguish participation restrictions from similar constructs, such as quality of life, activity, and social support (Whiteneck and Dijkers 2009; Eyssen et al. 2011).
It is clear that, in order to develop a valid measure of participation restrictions, it is first necessary to develop a strong conceptual foundation for that measure. This can be achieved by following best practice recommendations from the questionnaire development literature (e.g. Brod et al. 2009; Mokkink et al. 2012; Reeve et al. 2013). Specifically, it is recommended that questionnaire developers conceptualise the target construct through an in-depth literature review and qualitative research with key stakeholders (e.g. patients and clinicians). The findings are used to generate a clear definition and conceptual model of the target construct. The conceptual model comprises domains and subdomains that serve as the basis of the subscales and items of the measure. Ideally, the words and phrases used by the patients should be incorporated in the items (Haynes et al. 1995; Rattray and Jones 2007; Brod et al. 2009).
Once a prototype of the measure has been created, it is recommended that it is thoroughly reviewed by key stakeholders to identify and rectify any flaws in its design (McGartland Rubio et al. 2003; Brod et al. 2009). For instance, the measure could omit important content or include unimportant content, which degrades the quality of the data collected via that measure, as well as the quality of the clinical inferences drawn from those data (Haynes et al. 1995). Furthermore, aspects of the questionnaire could be difficult to understand, especially abstract expressions and technical terms. Such problems are difficult to detect without stakeholder feedback, as some respondents answer items that they do not understand out of a sense of politeness or duty, whilst others answer items "mindlessly" without realising that they have misunderstood them (Collins 2003). Additionally, respondents could differ in their interpretations of the questionnaire. For example, respondents can have different interpretations of seemingly unambiguous terms such as "Always" and "Never" (Aronson and Ferner 2006). Finally, the questionnaire could include terms that are offensive or off-putting to particular cultural groups (Boynton et al. 2004). Therefore, stakeholder feedback is vital to ensuring that valuable resources are not wasted by utilising a questionnaire that is inherently flawed in a large quantitative study (McGartland Rubio et al. 2003). Despite this, qualitative research with key stakeholders has seldom been used in the development of hearing-specific questionnaires.
This research set out to develop a high-quality, hearing-specific measure of participation restrictions in accordance with best practice recommendations (Brod et al. 2009; Mokkink et al. 2012). This questionnaire, entitled the Social Participation Restrictions Questionnaire (SPaRQ), was specifically designed to be a standardised, self-administered, patient-reported outcome measure (PROM) for use in research and practice with adults with hearing loss. The development of the initial prototype of the SPaRQ began with the conceptualisation of the target construct: hearing-related participation restrictions. This was achieved by reviewing:
1. The findings of semi-structured interviews with adults with hearing loss, researchers, and clinicians (Heffernan et al. 2016).
2. Extant PROMs that were identified in two published systematic reviews (Seekins et al. 2012; Granberg et al. 2014).
3. The International Classification of Functioning, Disability, and Health (ICF) Core Sets for Hearing Loss (Danermark et al. 2013).
Subsequently, hearing-related participation restrictions were defined as the difficulties an individual with hearing loss experiences with authentic involvement in social situations. The term "authentic involvement" was used because adults with hearing loss can appear to participate in social situations without being truly engaged, such as by pretending to follow a conversation (Heffernan et al. 2016). The term "social situations" was used because the conceptualisation process demonstrated that hearing-related participation restrictions primarily occur in the social arena. Furthermore, "social situations" is more precise and measurable than "life situations." A conceptual model of hearing-related participation restrictions was also developed, which contained three domains:
1. Behaviour: problems with performing actions in a social context due to hearing loss (e.g. difficulty with group discussions).
2. Emotion: negative feelings experienced in a social context due to hearing loss (e.g. feeling isolated at get-togethers).
3. Identity: negative social attributes perceived as stemming from hearing loss (e.g. being seen as unfriendly).
Each domain contained a range of subdomains (Supplementary material 1). Forty-nine items were generated to represent these subdomains, using the words and phrases of patients where possible. This included 26 behaviour items, 15 emotion items, and 11 identity items. The number of items associated with a domain was in proportion to the relevance of that domain to the target construct (Clark and Watson 1995). The behaviour items were positively worded, whereas the emotion and identity items were negatively worded (see Figure 1). The behaviour items were accompanied by an 11-point self-efficacy response scale, whilst the emotion and identity items were accompanied by an 11-point agree/disagree response scale (Rattray and Jones 2007; Sheer 2014). This first iteration of the questionnaire was entitled the SPaRQ-49.
Once the SPaRQ-49 had been created, it was important to thoroughly evaluate its content in order to identify and rectify any flaws that could diminish its quality. Therefore, the first aim of this study was to evaluate the content of the SPaRQ-49 in terms of the following criteria:
1. Relevance: representative of hearing-related participation restrictions.
2. Clarity: easy to understand and interpreted as the questionnaire developers intended.
3. Comprehensiveness: captures all of the important aspects of hearing-related participation restrictions.
4. Acceptability: inoffensive and not in any way intrusive to adults with hearing loss.
5. Responsiveness: sensitive to clinically relevant changes in hearing-related participation restrictions.
The second aim was to improve the content of the SPaRQ-49 by making any necessary amendments, such as introducing new items.

Design
Content evaluation (i.e. pre-testing or content validation) is an essential component of PROM development (Brod et al. 2009). It facilitates the assessment of two important measurement properties: (1) content validity, or the relevance and comprehensiveness of the content of the PROM and (2) respondent burden, or the degree to which the PROM poses a challenge for respondents in terms of length, complexity, and literacy demands (Reeve et al. 2013). Content evaluation typically involves key stakeholders appraising every element of a questionnaire (e.g. items, response scale) against specific criteria (e.g. relevance, clarity) (Haynes et al. 1995; Brod et al. 2009; Reeve et al. 2013). The PROM can then be amended before it undergoes psychometric evaluation.
In this study, two prominent content evaluation techniques were used. First, adults with hearing loss (AHLs) participated in cognitive interviews. These are individual, semi-structured interviews that uncover respondents' thought processes when completing a questionnaire. For example, they reveal how respondents interpret the wording of the items or how they decide which response category to select (Conrad and Blair 1996; Drennan 2003). Cognitive interviews can be retrospective (i.e. conducted immediately after respondents have completed the questionnaire) or concurrent (i.e. conducted whilst respondents are completing the questionnaire) (Drennan 2003). Retrospective interviews were used because they examine whether respondents can follow the instructions and successfully complete a self-administered questionnaire (Willis 2004). Second, a panel of subject matter experts (SMEs), who had relevant clinical or academic qualifications and experience, completed a survey in which they evaluated the relevance, clarity, comprehensiveness, and responsiveness of the SPaRQ-49 (Haynes et al. 1995; Grant and Davis 1997; McGartland Rubio et al. 2003).

Adults with hearing loss
The inclusion criteria were self-reported: (1) hearing loss, (2) aged 18 years or older, (3) good written and spoken English language ability, and (4) normal or corrected-to-normal vision. The exclusion criteria were self-reported: (1) cognitive decline or dementia that would necessitate assistance in completing a questionnaire and (2) profound hearing loss.
A convenience sampling strategy was used (Patton 1990). Potential participants were sought from the NIHR Nottingham Biomedical Research Centre (BRC) participant database. In total, 22 potential participants were contacted via post, of whom 14 participated in the study (see Table 1). Recruitment ceased when the research team determined that data saturation had been reached. This was the point at which no new themes or problems with the SPaRQ-49 were identified through an examination of field notes and preliminary data analysis (Leidy and Vernon 2008). The majority of AHLs had gradual-onset hearing loss and all owned hearing aids. Two individuals provided reasons for not participating, which were work commitments and health problems.

Subject matter experts
The inclusion criteria for the SMEs were identical to inclusion criteria 2-4 for the AHLs. A purposeful sampling strategy was used (Patton 1990; Grant and Davis 1997). Specifically, clinicians and academics who had expertise in adult aural rehabilitation and/or outcome measurement, as demonstrated by their academic qualifications, clinical qualifications, or publication history, were recruited from the professional network of the research team. It is recommended that SME panels have approximately 6-20 participants (McGartland Rubio et al. 2003). In this study, 29 potential participants were contacted via email, of whom 20 participated in the study (see Table 2).

Pilot testing
A pilot cognitive interview was conducted by the lead author (EH) with a NIHR Nottingham BRC Patient and Public Involvement (PPI) representative who had hearing loss. He made several valuable suggestions regarding study design. In particular, he advised that certain cognitive interview techniques (e.g. open-ended questions) would be less artificial and intrusive to AHLs than others (e.g. observation, thinking-aloud). Two NIHR Nottingham BRC researchers, who were not involved in the study, completed a pilot SME survey and suggested some minor alterations.

Cognitive interviews
Each participant attended the NIHR Nottingham BRC for their study session, which lasted approximately 2 h. Written informed consent was obtained prior to the start of each session. The participants self-administered the SPaRQ-49, which took approximately 30 min. They were then interviewed by EH, who had formal training in and experience of interviewing, including interviewing AHLs (see Pearson et al. 2012; Heffernan et al. 2016). The interview schedule was flexible, yet its core content remained the same across each interview (Supplementary material 2). The interviews lasted 45 min on average and were audio-recorded and transcribed verbatim. The participants then completed a demographics questionnaire and the Davis et al. (2007) Strand 2 Screening Questionnaire for hearing loss. The participants were offered an honorarium of £10 and their travel expenses were reimbursed.

Subject matter expert survey
The SMEs completed an online survey, which took approximately 1 h and 30 min. They answered a series of closed-ended and open-ended questions in which they evaluated the proposed factor structure, response scales, comprehensiveness, and responsiveness of the SPaRQ-49. They also rated the relevance and clarity of each SPaRQ-49 item using the following scale: 1 = "Does not fulfil criterion," 2 = "Major revisions needed," 3 = "Minor revisions needed," 4 = "Fulfils criterion" (Haynes et al. 1995; Grant and Davis 1997; McGartland Rubio et al. 2003). There was a mixture of optional and mandatory questions. Completion of the survey served as informed consent. The SMEs were offered an honorarium of £10.
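The item-level analysis of these ratings reduces to simple descriptive statistics over the 1-4 scale. A minimal sketch follows; the ratings shown are hypothetical (the actual analysis was conducted in SPSS), and the `summarise` helper is purely illustrative:

```python
from statistics import mean, median, mode

# Hypothetical ratings from 20 SMEs for one item on the 1-4 scale
# (1 = "Does not fulfil criterion" ... 4 = "Fulfils criterion").
ratings = [4, 4, 3, 4, 4, 4, 3, 4, 4, 4, 2, 4, 4, 3, 4, 4, 4, 4, 3, 4]

def summarise(ratings):
    """Return the per-item descriptive statistics reported in the study:
    mean, median, and modal rating."""
    return {
        "mean": round(mean(ratings), 2),
        "median": median(ratings),
        "mode": mode(ratings),
    }

print(summarise(ratings))  # → {'mean': 3.7, 'median': 4.0, 'mode': 4}
```

An item whose median and mode are 4 would, on this scheme, be read as fulfilling the criterion for most raters, with the mean indicating how far dissenting ratings pull it down.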

Data analysis
IBM SPSS Statistics (SPSS Inc., Chicago, IL) for Windows Version 22.0 and QSR International's NVivo 10 Software were used to organise and analyse the data. Anonymised identification codes were assigned to each AHL (e.g. AHL1) and SME (e.g. SME1).
For the SME survey data, descriptive statistics and frequencies were calculated for the closed-ended questions. The written comments from the open-text boxes were summarised and reported. The cognitive interview data were analysed by EH using Braun and Clarke's (2006) thematic analysis procedure. The analysis was deductive, as the themes (i.e. relevance, clarity, comprehensiveness, and acceptability) were derived from the content evaluation literature, where it has been recommended that these criteria be examined (Brod et al. 2009; Mokkink et al. 2012). Deductive (i.e. theoretical) thematic analysis is a "top-down" approach that is based on a pre-existing framework or the researcher's analytical interests. This contrasts with inductive thematic analysis, which is a "bottom-up," data-driven process. The deductive approach was selected because it is suited to answering a specific research question, whereas the inductive approach is suited to exploring the data to develop further research questions (Braun and Clarke 2006).
Within the relevance, comprehensiveness, and acceptability themes, the data were coded inductively. Within the clarity theme, there were inductive and deductive codes. The deductive codes came from Conrad and Blair's (1996) taxonomy of problems encountered by questionnaire respondents. According to the taxonomy, there are three stages of responding to an item:
1. Understanding: deciding what information is being requested and recognising how this information should be provided.
2. Performance: producing the information needed to respond through mental operations (e.g. computation, evaluation).
3. Response formatting: mapping the information produced in the performance stage onto the response scale.
The taxonomy also lists several types of problems that can occur in each of the three response stages:
1. Lexical: problems with knowing the meanings of words.
2. Inclusion/exclusion: problems with deciding whether or not particular concepts are within the scope of the item.
3. Logical: problems with negation, repetition, complementarity, contradictions, and tautologies.
4. Computational: information processing problems that do not fall into one of the other problem categories, such as complicated syntax or mental arithmetic.
5. Temporal: problems relating to the time period or frequencies specified in the questions. This problem type was not applicable to the SPaRQ-49.
Therefore, in terms of coding, if a participant did not recognise a medical term used in an item, this would be coded as a "lexical-understanding problem." To enhance the rigour of this analysis, a peer assessment was completed (Yardley 2008). Specifically, EH and a second researcher, who was not otherwise involved in the study, independently applied the taxonomy to seven interview extracts that had proven challenging to code. For example, it was difficult to determine whether certain extracts described a problem in the "performance" or "response formatting" stage. EH and the second researcher then met to compare their coding. In the majority of cases, their coding matched. Any discrepancies were discussed and an agreement was made regarding which codes should be applied. In addition, the preliminary results were discussed with the research team to ensure that the analysis was not limited to the viewpoint or preconceptions of EH.
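This coding scheme is, in effect, a cross of problem types and response stages. The sketch below mirrors the labels from Conrad and Blair's (1996) taxonomy, but the `code_problem` helper itself is purely illustrative of how a code such as "lexical-understanding" is formed:

```python
# Response stages and problem types from the taxonomy; "temporal" is
# omitted because it was not applicable to the SPaRQ-49.
STAGES = ["understanding", "performance", "response formatting"]
PROBLEM_TYPES = ["lexical", "inclusion/exclusion", "logical", "computational"]

def code_problem(problem_type, stage):
    """Combine a problem type and a response stage into a single deductive
    code, e.g. a participant not recognising a medical term in an item
    would be coded as 'lexical-understanding'."""
    if problem_type not in PROBLEM_TYPES or stage not in STAGES:
        raise ValueError("Unknown problem type or response stage")
    return f"{problem_type}-{stage}"

print(code_problem("lexical", "understanding"))  # → lexical-understanding
```

Framed this way, the difficult extracts discussed above amount to cases where the problem type was clear but the stage (performance versus response formatting) was ambiguous.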

Amendments
Amendments were made to the questionnaire based on the results of the data analysis. Specifically, aspects of the questionnaire that were identified as problematic by two or more AHLs in the cognitive interviews were reviewed (Brod et al. 2009). In addition, aspects of the questionnaire that received less than perfect ratings or comments in the SME survey were reviewed. Subsequently, three PPI representatives who had hearing loss completed and provided feedback on the revised questionnaire. This process helped to ensure that the amendments were effective.
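The review rule described above (flag any aspect identified as problematic by two or more AHLs, or receiving a less-than-perfect SME rating) can be sketched as a simple filter. All names and data below are hypothetical, and `flag_for_review` is an illustrative helper, not part of the study's actual tooling:

```python
from collections import Counter

def flag_for_review(ahl_reports, sme_ratings, min_reports=2, perfect=4):
    """Return the set of questionnaire aspects needing review: those
    reported as problematic by >= min_reports AHLs, plus those given a
    rating below 'perfect' (4 = "Fulfils criterion") by any SME."""
    counts = Counter(ahl_reports)
    flagged = {aspect for aspect, n in counts.items() if n >= min_reports}
    flagged |= {aspect for aspect, ratings in sme_ratings.items()
                if any(r < perfect for r in ratings)}
    return flagged

# Hypothetical example: "item 7" is flagged by two AHL reports, and
# "response scale" by a single SME rating below 4; "item 12" is not
# flagged (one report, all ratings perfect).
reports = ["item 7", "item 7", "item 12"]
ratings = {"item 12": [4, 4, 4], "response scale": [4, 3, 4]}
print(sorted(flag_for_review(reports, ratings)))  # → ['item 7', 'response scale']
```

The asymmetry in the rule reflects the two data sources: interview problems needed corroboration across participants, whereas any imperfect expert rating triggered a review.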

Relevance
The cognitive interviews showed that the AHLs felt that the majority of behaviour items were representative of their experiences. A small number of items about employment, volunteering, community activities, and interacting with a significant other were irrelevant to some AHLs because these situations did not arise in their daily lives, irrespective of their hearing loss. For example, AHL14 (man, aged 69) reported that the questions were representative of his hearing difficulties, with the exception of one that asked about participation in training courses: "I can answer all those questions … because it's asking something I know about … I don't do any … courses … so I put zero on that [question], but the rest of them … it's very good … it's a very good questionnaire." Some AHLs saw the emotion and identity items as being highly relevant. For example, AHL8 (woman, aged 62) said of the identity items: "They're very relevant because … that's a different level of your hearing loss, isn't it? A different effect that it has is about … how people perceive you … You do get treated as though you're not quite on the planet … at times … so I think that's a very relevant part of the questionnaire."
Contrastingly, some AHLs stated that the emotion and identity items were less relevant than the behaviour items. They explained that hearing loss did not lead them to feel particularly emotional or insecure due to their personality, particularly their self-confidence or sense of humour. AHL10 (man, aged 80) said: "I found it difficult to evaluate myself [in the emotion section] … I don't mind being left out of a conversation … I just … sit through it and then move onto the next one … this comes back to your personality, doesn't it? … I don't feel stressed … I don't feel upset … It's just one of those things."
The results of the SME survey supported the interview findings, with the majority of the SPaRQ-49 items obtaining median and modal relevance ratings of 4 (Supplementary material 3). The mean relevance ratings for the individual items ranged from 3.35 to 4. The SMEs also evaluated the proposed factor structure of the SPaRQ-49. The majority (n = 13) agreed that hearing-related participation restrictions consist of the domains of behaviour, emotion, and identity. However, some SMEs (n = 7) disagreed. In the written comments, some SMEs reported that the behaviour dimension contained some items that represented activity limitations, rather than participation restrictions. Two SMEs stated that the identity domain was the least relevant. SME4 (Head of Adult Audiology Service) said: "[I] don't think identity is as significant as behaviour and emotion … people rarely report the impact of identity … [Patient] needs … mainly focus on behaviour but, with appropriate discussion, are often associated with [the] emotional dimension."

Clarity
The cognitive interviews showed that the AHLs found the majority of items to be easy to understand. Nevertheless, some clarity problems were uncovered through the application of Conrad and Blair's (1996) taxonomy to the data. The most substantial of these was a computational-response formatting problem. Specifically, the majority of AHLs struggled to switch from using the self-efficacy response scale accompanying the positively worded behaviour items to using the agree/disagree response scale accompanying the negatively worded emotion and identity items. Some AHLs did not observe that the response scale had changed and assumed that the self-efficacy scale was present throughout the entire questionnaire. Other AHLs did observe that the response scale had changed but did not know how to adjust their responses accordingly. AHL8 (woman, aged 62) said: "the marking changed … and that made me … stop and think … I couldn't really understand why it had … swapped over." Similarly, AHL2 (man, aged 78) said: "When you swapped over … I think it probably did trip me up … I thought … couldn't you keep [the response scale] the same … all the way through the questionnaire?" Consequently, the AHLs often selected a response that did not accurately represent their views, such as mistakenly selecting "Completely agree" instead of "Completely disagree."
A second computational-response formatting problem was identified. Specifically, a small number of AHLs reported that they would have completed the questionnaire more quickly and easily if the 11-point response scale had fewer (i.e. 5 or 6) options. AHL12 (man, aged 64) said: "Too many options. It's like having a big menu: you can't make a decision." In contrast, several AHLs stated that they had no difficulty with the 11-point response scales. Also, an examination of their responses showed that all of the AHLs used a range of response options, rather than using only the extreme ends or the middle of the scale.
Several AHLs experienced an inclusion/exclusion-performance problem, which was that they found it difficult to determine the scope of certain items, such as determining whether they referred to noisy environments, quiet environments, or both. However, other AHLs did not experience this problem. Instead they selected answers that represented their typical experience across both noisy and quiet environments. A final example of a clarity problem uncovered through the taxonomy was a logical-performance problem, whereby some AHLs perceived that certain items were repetitive, rather than truly distinct from one another. For example, AHL3 (woman, aged 62) felt that several emotion items were repetitive and recommended merging the items that asked about isolation and loneliness. In contrast, AHL8 believed that the isolation and loneliness items were distinct: "I think it's … really important … because … there's a difference … being isolated is almost like [being] on an island watching, whereas [being] lonely is a very personal … sadness or … aloneness … And I … have actually answered them differently."
The interview findings were supported by the SME survey. The majority of the SPaRQ-49 items had median and modal clarity ratings of 4. The mean clarity ratings for the individual items ranged from 2.9 to 4. In their written comments, some SMEs recommended providing more contextual information in certain items, such as clarifying whether they referred to noisy or quiet environments.
In terms of the response scales, the majority of SMEs (n = 11) stated that the self-efficacy response scale did not need to be changed, though several (n = 7) reported that change was required. In addition, the majority of SMEs (n = 16) stated that the agree/disagree response scale did not need to be changed, although a small number (n = 2) asserted that change was required. The written comments showed that one SME thought that there were too many response options, whilst two SMEs felt that the response options at the midpoint of the scales should be labelled, rather than unlabelled.

Comprehensiveness
The AHLs regarded the SPaRQ-49 as being highly comprehensive, as it assessed their main hearing-related difficulties. AHL4 (man, aged 77) said: "basically you've got it all. I don't think you need to change anything … I really don't … I was quite surprised [by] how comprehensive it is." Nevertheless, the interviews uncovered some potentially important participation restrictions that were missing from the questionnaire, including reduced independence, difficulties with participating in lengthy conversations, and friction with communication partners.
In the SME survey, the majority of SMEs (n = 13) agreed that the SPaRQ-49 was a comprehensive measure. For instance, SME15 (Hearing researcher/audiologist) said: "there are some really important questions in here, which I doubt ever get asked in the … time constraints of clinic." However, some SMEs were unsure (n = 5) or disagreed (n = 1) that the SPaRQ-49 was comprehensive. Some recommended introducing open-ended questions that would allow the respondents to personalise the questionnaire. Another recommended ensuring that the participation component of the ICF Core Sets for Hearing Loss had been fully captured.

Acceptability
The majority of AHLs regarded the SPaRQ-49 as appropriate and inoffensive. AHL4 said: "there's nothing personal about it … it's actually very good … It's not intrusive … or anything like that." However, two identity items, which referred to being treated as a nuisance and being perceived as rude, were flagged as being potentially off-putting. Specifically, AHL11 (woman, aged 73) said: "I don't really like … how that's phrased … it's very negative." In addition, one participant, AHL8, reported that completing the questionnaire evoked unexpected thoughts and emotions: "It was very thought-provoking … it actually makes you think about how you feel about not hearing, which … most people try and avoid … it actually made me quite sad … because it … brings home just how much you miss out … but actually it's quite good because … it made me reflect … I didn't feel like I've been … ripped asunder."

Responsiveness
The SMEs were asked whether they agreed that the SPaRQ-49 would be a responsive PROM. Seven agreed and three disagreed. The majority (n = 9) selected "Don't know" in response to this question. SME3 (Lecturer/Hearing therapist) wrote: "[It's] a little hard to know … some questionnaires are designed to be sensitive to change but turn out not to be! However, I do think it asks about some of the things we would like to see change as a result of [an] intervention." SME17 (Hearing researcher/audiologist) suggested that identity might be the least responsive domain: "Tapping into identity is a brave and good idea. I wonder [how] much of this would be expected to improve as a result of our current interventions." Two SMEs warned that responsiveness is somewhat contingent on the timing of follow-up assessments. As participation restrictions tend to change slowly over time, long-term follow-up assessments may be necessary.

Amendments
Several amendments were made to the questionnaire in light of the above findings. The most substantial amendment entailed revising the behaviour items so that they were negatively worded and accompanied by the agree/disagree scale. This revision addressed the difficulties experienced by AHLs when switching from answering the behaviour items to answering the emotion and identity items. The agree/disagree scale was selected to be the sole response scale in the questionnaire because it received a higher rating in the SME survey than the self-efficacy scale and because it was applicable to all of the items, whereas the self-efficacy scale was applicable only to the behaviour items.
Several items were revised to improve their clarity. Specifically, some items were altered to ensure that they were sufficiently distinct from one another. For example, an item about watching live events (e.g. concert) was differentiated from an item about watching television. Some items that substantially overlapped with one another were merged, such as two items about community activities and volunteering. Some items were adjusted so that they included a greater degree of contextual information, such as information about the acoustic environment. One item, which concerned being perceived as rude, was removed because it was found to be both off-putting and unclear. Finally, a small number of new items were created to enhance the comprehensiveness of the questionnaire, including items about managing responsibilities, participating in lengthy conversations, and getting along with others. The resultant iteration of the questionnaire contained 53 items (i.e. SPaRQ-53).

Discussion
This study aimed to evaluate and amend the content of the SPaRQ, a new hearing-specific PROM, in order to maximise its content validity and minimise any respondent burden. The results demonstrated that the majority of the SPaRQ content was relevant, clear, comprehensive, and acceptable. This likely reflects the benefits of having used a literature review and a previous qualitative study with key stakeholders to generate this content (Brod et al. 2009). Nevertheless, a number of potential problems were identified. For example, it was necessary to re-construct several items to improve their clarity. In addition, a small number of new items were created to enhance the comprehensiveness of the measure.
The most substantial problem identified was that most AHLs struggled to switch from answering positively worded behaviour items, which were accompanied by a self-efficacy response scale, to answering negatively worded emotion and identity items, which were accompanied by an agree/disagree response scale. Many questionnaires include reverse-worded items (i.e. items that are worded in the opposite direction to the other items) as a means of preventing response biases, especially inattention and acquiescence (Rattray and Jones 2007; van Sonderen et al. 2013). However, in this study, this approach did not circumvent response biases but rather led to confusion and inaccurate responding. Other studies have similarly demonstrated that reverse-worded items fail to inhibit response biases and instead cause confusion, frustration, errors, and careless responding (Carlson et al. 2011; van Sonderen et al. 2013). Furthermore, reverse-worded items can detrimentally affect the psychometric properties of a questionnaire, particularly internal consistency and factorial validity (Woods 2006; Carlson et al. 2011). Consequently, the revised SPaRQ omitted reverse-worded items and contained a single response scale.
Another potential problem was that some participants felt that the 11-point response scales had too many response options. In particular, some AHLs felt that it would have been easier to select a response had there been five or six options. However, most participants did not object to the 11-point response scales, and all of the AHLs were able to understand and use these scales. Therefore, it was concluded that whilst some participants had a preference for five or six options, this did not mean that they had a problem with 11 options. Furthermore, the literature shows that response scales with a broader range of options are associated with greater responsiveness, reliability, and validity (Alwin 1997; Cummins and Gullone 2000; Weng 2004; Leung 2011). There is also evidence to suggest that most respondents have a discriminative capacity greater than six points, which means that valuable data can be lost by adopting a response scale that is not sufficiently fine-grained (Cummins and Gullone 2000). Consequently, the 11-point response scale was retained.
It is not uncommon for content evaluation studies to uncover potential problems that ultimately do not lead to an amendment or that cannot be amended without creating additional problems. One previous study concluded that some of the potential faults identified within a physical activity questionnaire should not be amended because these amendments were associated with drawbacks as well as benefits (Andersen et al. 2010). Indeed, the purpose of content evaluation research is not the design of a "perfect" questionnaire, but is instead the facilitation of informed decisions about questionnaire design "trade-offs" through uncovering the advantages and disadvantages of different formats (Beatty and Willis 2007).

Recommendations
This research has highlighted the importance of conducting a content evaluation study as part of developing a new PROM. This process can uncover serious problems, particularly irrelevant, unclear, or offensive content, which can reduce the amount and quality of data collected by the measure (McGartland Rubio et al. 2003; Brod et al. 2009). Despite these benefits, to date, just a small number of hearing-specific questionnaires have undergone a rigorous content evaluation (e.g. Smith et al. 2011). Therefore, it is recommended that new and existing hearing-specific questionnaires be evaluated to confirm that they have adequate content validity and minimal respondent burden and thus meet the standards required of high-quality PROMs (e.g. Terwee et al. 2007).
It is important to note that whilst content evaluation studies can provide valuable data on relevance, clarity, comprehensiveness, and appropriateness, they may be less informative when it comes to responsiveness. In this study, many SMEs found it difficult to assess the responsiveness of the SPaRQ. The AHLs were not asked about responsiveness because they were unlikely to be familiar with this concept. Consequently, the responsiveness of the SPaRQ can only be assessed statistically in the later stages of its development (see Terwee et al. 2007). If the SPaRQ is found to have poor responsiveness at that stage, it may be necessary to re-develop and re-validate the measure. Future research should investigate techniques for maximising and assessing responsiveness in the early stages of developing a PROM. One strategy that has already been identified is the use of fine-grained response scales.
This research has demonstrated that it is not advisable to use reverse-worded items and/or multiple response scales in a questionnaire. Furthermore, caution should be exercised when administering a battery of questionnaires to participants, as the various response scales, instructions, and formats could cause confusion. In addition, researchers should consider the potential emotional or psychological impact of their questionnaires. In this study, one AHL found completing the SPaRQ to be an emotional, thought-provoking experience. She suggested making future respondents aware that they might have a similar experience. It is recommended that researchers take this into consideration when preparing the documents that will accompany the study questionnaires, such as by including the contact details of appropriate support services in the participant information sheet or questionnaire booklet.
Finally, this research found that some participants viewed the emotion and identity items as highly relevant, whilst others felt that the opposite was true. Previous research has shown that hearing loss can have a considerable impact on emotions and identity (Vas et al. 2017). Research should examine whether variables such as personality traits and demographic factors influence this impact. The next stage of developing the SPaRQ involves assessing the effect of demographics (e.g. age, gender) on responses to the items.

Limitations
A limitation was that the AHLs were recruited through convenience sampling, which is one of the least rigorous sampling methods (Patton 1990). Consequently, AHLs with certain characteristics (e.g. non-ownership of hearing aids) were under-represented. Furthermore, information about the educational attainment of the AHLs was not collected, although the interviews revealed that they had a wide range of education levels and occupations. An additional limitation was that the participants were aware that EH was involved in developing the SPaRQ. Although they were encouraged to identify problems with the questionnaire, it is possible that some participants were not completely comfortable providing negative feedback to EH. Another potential limitation was that the thematic analysis was deductive, rather than inductive, which may have increased the risk of overlooking important results that did not fit within the pre-existing themes or the researcher's analytical interests. However, even when utilising the inductive approach, it is difficult for researchers to completely suppress their preconceptions and analytical interests (Braun and Clarke 2006). Finally, the peer assessment in this study was somewhat restricted. Ideally, the interview data would have been fully analysed by at least two researchers formally trained in this type of analysis (Conrad and Blair 1996). Unfortunately, this was not feasible within the timeframe of this study.

Conclusion
This study utilised cognitive interviews with AHLs and a survey of SMEs to evaluate and revise the content of the SPaRQ: a new hearing-specific PROM. This process helped to ensure that the SPaRQ had sufficient content validity and minimal respondent burden, which are important measurement properties for any PROM (Reeve et al. 2013). To date, this approach to PROM development is rare in the field of hearing research. This study highlights the value of this approach as a means of eliminating serious flaws from a questionnaire and substantially improving its quality prior to its use in quantitative research. The next stage of this research is a quantitative study in which a modern psychometric analysis technique, namely Rasch analysis, is used to further evaluate and refine the SPaRQ. The ultimate aim of this research is to produce a high-quality measure of hearing-related participation restrictions that is suitable for use in both research and practice.