Determining the Sample Size for Future Trials of Hearing Instruments for Unilaterally Deaf Adults: An Application of Network Meta-analysis.

OBJECTIVE
Previous trials have compared the efficacy of hearing instruments to no intervention in adults with single-sided deafness (SSD) or the relative efficacy of different instruments. Network meta-analysis (NMA) was used to refine estimates of effect sizes to determine required sample sizes for further trials.


DATA SOURCES
PubMed, EMBASE, MEDLINE, Cochrane, CINAHL, and DARE databases were searched with no restrictions on language, with studies to February 2015 included.


STUDY SELECTION
Studies were included that 1) assessed hearing instruments in adults with SSD; 2) compared instruments with other instruments, placebo, or no intervention; 3) measured speech perception in quiet/noise and listening ability; 4) were prospective controlled or observational studies.


DATA EXTRACTION
The following data were extracted: sample size in each group, type of intervention and comparator, type of outcomes, mean outcome scores and their 95% confidence intervals.


DATA SYNTHESIS
Random-effects meta-analysis was conducted to determine pooled effects for each outcome based on direct evidence alone. NMA used a graph-theoretical method to determine pooled effects based on indirect evidence. Sample size calculations were conducted for each outcome for each class of evidence.


CONCLUSIONS
The incorporation of indirect evidence had substantial impacts on some effect sizes but negligible impacts on other effects. The most notable impacts were on self-reported listening ability and measures of speech perception in noise. Changes in effect size estimates and required sample sizes resulting from the incorporation of indirect evidence highlight areas of uncertainty where trials may be feasible to conduct.

Key Words: Baha, Bone conduction hearing aids, Cochlear implants, Meta-analysis, Network meta-analysis, Single-sided deafness.
There have been several prospective studies of the effectiveness of hearing instruments for adults with single-sided deafness (SSD), a condition that has been associated with significant psychological and social burden (1)(2)(3). A recent meta-analysis examined the evidence for various hearing instruments, including devices that re-route signals from the impaired to the nonimpaired ear via air conduction (ACD) or bone conduction (BCD), and cochlear implantation (CI) (4). Comparable outcomes were available across studies on a limited set of measures: the Speech, Spatial and Qualities of Hearing Scale (SSQ) (5), the Abbreviated Profile of Hearing Aid Benefit (APHAB) (6), and the Hearing In Noise Test (HINT) (7).
The systematic review identified that there was a paucity of data for comparisons between certain hearing instruments (Fig. 1). For example, three studies directly compared ACD to the unaided condition (8)(9)(10) and three studies directly compared BCD and ACD (8)(9)(10), whereas comparable outcomes for BCD versus the unaided condition were available from eight studies (8)(9)(10)(11)(12)(13)(14)(15). Few studies compared these interventions to CI. The meta-analysis (MA) of data extracted from those studies was therefore limited by the specific comparisons that had been reported in the published literature. Revised estimates of the relative effects of these different treatment alternatives could be obtained using network meta-analysis (NMA) to fully use all available evidence, both direct and indirect.
To understand NMA in lay terms, the following analogy is useful. We have three hypothetical treatments: A, B, and C. In this scenario there is a lot of data comparing A versus B and A versus C but little to none comparing B versus C (16). These data form a network from which inferences can be made on the basis of the indirect relationships formed by the data. In other words, NMA allows us to draw meaningful conclusions about the relationship between interventions B and C even though we have little, if any, direct evidence to rely upon. It is therefore a statistical meta-analytic technique that incorporates both direct and indirect evidence (17).
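The analogy can be made concrete with a minimal sketch of an indirect comparison through a common comparator. The function name and all numbers below are hypothetical, and the sketch assumes that the A versus B and A versus C studies are independent. It illustrates the idea for a single three-treatment loop only and is not the graph-theoretical method applied later in this article.

```python
import math


def indirect_effect(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of C relative to B via the common comparator A.

    d_ab: pooled effect of B relative to A (hypothetical input)
    d_ac: pooled effect of C relative to A (hypothetical input)
    The variances add because the two sources are assumed independent.
    """
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d_bc, se_bc


# Hypothetical pooled effects in standard deviation units.
d_bc, se_bc = indirect_effect(d_ab=0.30, se_ab=0.10, d_ac=0.50, se_ac=0.15)

# Approximate 95% confidence interval for the indirect estimate.
ci_low = d_bc - 1.96 * se_bc
ci_high = d_bc + 1.96 * se_bc
print(f"B vs C (indirect): {d_bc:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the two variances add, the indirect estimate is always less precise than either of the direct comparisons it is built from.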
We think that NMA has a role to play in reducing research waste by using both indirect and direct evidence to provide best estimates of treatment effects based on all the available evidence. The application of this meta-analytical approach allows research effort to be targeted where there is the greatest treatment uncertainty; i.e., where there are few direct comparisons and the incorporation of indirect evidence has a notable impact on the estimated size of the treatment effect. Conducting clinical trials is costly, and poorly targeted studies risk wasting scarce research budgets. Moreover, errors in design can lead to an inability to draw meaningful clinical conclusions. Chalmers and Glasziou (18) have estimated that 85% of all research effort translates to no meaningful or reproducible output. This may be due to various reasons, including the underreporting of studies with disappointing results, selective publication of results, or inappropriate study design. A cross-sectional analysis has demonstrated that up to half of all National Institutes of Health funded trial results remain unpublished at 30 months after trial completion (19). Furthermore, Glasziou states that "studies of published trial reports showed that the poor description of interventions meant that 40 to 89% were nonreplicable" (20). This poor conversion from research activity to real clinical benefit to patients is of great concern to all those involved in and relying upon clinical research.
Kitterick et al.'s MA identified 30 articles of an original 778 that met the criteria to be included in their review (4). These were identified using the PICOS (participants, intervention(s), comparators, outcomes, and study designs) framework (21) to set the parameters of interest. These can be summarized as (P) patients with average PTA of 30 dB loss in the better ear and ≥70 dB loss in the poorer ear, (I) hearing instruments used in SSD, (C) hearing instruments, placebo, and no intervention, (O) speech perception in quiet and in noise, sound localization, hearing- and health-related quality of life, complications and adverse events, (S) controlled trials and prospective observational studies. The studies from which data were extracted are outlined in Table 1, which shows the interventions assessed and the outcome measures used. One of the main observations arising from the systematic review and MA was the lack of data on comparisons between certain interventions (e.g., BCD vs ACD) and a lack of controlled trials that had been designed prospectively to have sufficient statistical power to detect treatment effects. The authors suggested that the effect sizes from the MA could be used to inform the sample sizes of future studies (4).
NMA is an attractive prospect in this context as it allows one to use all the available evidence to obtain revised estimates of treatment effects. With better estimates of effect size come better knowledge of where the greatest uncertainty lies, and better estimates of the sample sizes required to detect such effects in the context of future prospective clinical trials. The current study subjected data from the previous meta-analysis to NMA to examine whether the incorporation of indirect evidence changed the size and direction of treatment effects. The resulting changes were also assessed to identify the outcomes and comparisons with the greatest level of uncertainty, and to determine whether the required sample sizes based on the revised effect sizes would be feasible to recruit in future clinical trials.

METHODS
The original meta-analysis synthesized data obtained using a variety of outcome measures that followed a prescribed methodology (and thus were likely to have been administered consistently) and were used across multiple studies (4). The Speech, Spatial and Qualities of Hearing Scale (SSQ) measures hearing difficulties across several domains including speech perception, spatial awareness of sound, and sound qualities. It is designed to measure hearing disability across a range of scenarios including those that are affected by binaural function (5). The Abbreviated Profile of Hearing Aid Benefit (APHAB) measures listening ability across four six-item subscales: aversion to sounds, background noise, ease of communication, and reverberation (6). The Hearing In Noise Test (HINT) assesses the ability of participants to understand sentences with a degree of background noise either presented directly ahead (coincident with the speech, $S_0N_0$) or presented toward the impaired ear ($S_0N_{ie}$) or the nonimpaired ear ($S_0N_{ne}$) (7). The test can also be conducted in the absence of background noise (SIQ).
A network meta-analysis of data obtained using these outcome measures was conducted in four steps. First, the raw data obtained using the measures described above and the number of patients for whom data were available were extracted from each study listed in Table 1 and organized into a spreadsheet using Microsoft Excel. Second, these data were loaded into the R statistical programming environment and effect sizes were calculated for each individual study. As all studies used before-after designs, effect sizes were computed by dividing the observed pre-post treatment change on each outcome measure by the standard deviation of that change (22) using the "metafor" package (23). The resulting values expressed the size of each effect in units of standard deviations. Third, the effect sizes for each outcome measure were subjected to traditional random-effects meta-analyses separately for each treatment comparison (e.g., BCD vs unaided). The analyses determined the pooled treatment effect on each outcome measure for each treatment comparison (23) and represent the meta-analysis approach used in the original systematic review (4). We refer to these pooled effects as the "direct evidence." Fourth, and finally, all effect sizes for each outcome measure were subjected to a network meta-analysis to determine pooled effects based on indirect evidence using the graph-theoretical method described by Rücker (24) as implemented in the "netmeta" package for the R statistical programming environment (25). A simple explanation for the general approach is that the analysis determines the indirect evidence for a particular treatment comparison of interest (e.g., A vs B) based on the difference between the direct evidence for other treatment comparisons that involve one of the treatments of interest (e.g., A vs C and B vs C). The general approach to determining the "indirect evidence" can be expressed mathematically in the following form adapted from Cipriani et al. (26): $AB_{\text{indirect}} = AC_{\text{direct}} - BC_{\text{direct}}$. We refer to the treatment effects produced by the network meta-analyses as the "network evidence" as they combine both direct and indirect evidence.
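Steps two and three can be sketched in a few lines of Python. This is an illustration only, not the "metafor" implementation used in the study. The variance formula is a common large-sample approximation, the small-sample bias correction is omitted, and the DerSimonian-Laird estimator of between-study variance is an assumption (the text does not state which estimator was used). All study values are hypothetical.

```python
import math


def standardized_mean_change(mean_change, sd_change, n):
    """Pre-post change divided by the SD of that change, with an
    approximate sampling variance (large-sample formula, assumed here)."""
    d = mean_change / sd_change
    var = 1.0 / n + d ** 2 / (2.0 * n)
    return d, var


def pool_random_effects(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical before-after studies: (mean change, SD of change, n).
studies = [(4.0, 8.0, 20), (6.0, 10.0, 15), (2.0, 8.0, 25)]
es = [standardized_mean_change(m, s, n) for m, s, n in studies]
pooled, se, tau2 = pool_random_effects([d for d, _ in es], [v for _, v in es])
print(f"pooled effect = {pooled:.2f} SD, SE = {se:.2f}, tau^2 = {tau2:.3f}")
```

Running the pooling separately for each treatment comparison gives the pooled effects that the text calls the direct evidence.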
For each treatment comparison, the direct and indirect evidence and the "network evidence" (the result of synthesizing both direct and indirect evidence) are reported in terms of the mean effects and their 95% confidence intervals (27). Given the complexity of the network-based approach to determining treatment estimates based on direct and indirect evidence, metrics and tests have been proposed to aid interpretation of the resulting estimates of treatment effect. We report the proportion of direct evidence that contributes to the network evidence and a statistical test to compare the direct and indirect evidence to assess whether the assumption of consistency was violated (28). The pooled effects resulting from the use of direct and indirect evidence were also compared by noting whether the direction of the effect had changed and whether the size of the effect had changed. Effect sizes were categorized
A sample size calculation was conducted for treatment effects based on direct and network evidence using G*Power (30), a free-to-use cross-platform statistical tool that is available as a download for Windows and Macintosh operating systems from Heinrich Heine University, Düsseldorf (31). The sample size calculation determined the number of participants required to detect a given effect size with 80% power (probability of a false-negative of 0.2) and an alpha of 5% (probability of a false-positive of 0.05). The calculations were based on the assumption that future trials would know the expected direction of the effect (beneficial or harmful) and would be powered to detect changes in mean outcome scores between intervention and control/comparator groups. Therefore, the sample size estimates were based on a one-tailed independent-samples t test with an allocation ratio of 1:1 to the two groups.
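The sample size calculation can be approximated with a short sketch. It uses the normal approximation to the one-tailed two-sample test rather than the noncentral t distribution on which exact power calculations are based, so results may differ by a participant or two per group from those reported by G*Power. The function name and example effect sizes are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a one-tailed,
    two-group comparison with 1:1 allocation (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) / abs(effect_size)) ** 2)


# Hypothetical standardized effect sizes (small, medium, large).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Because the required number scales with the inverse square of the effect size, halving the expected effect roughly quadruples the required sample.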

RESULTS
Tables 2 and 3 list the estimates of effect size for comparisons between the unaided condition, ACD, BCD, and CI for the self-reported outcomes (APHAB, SSQ) and speech perception outcomes (HINT), respectively. Inconsistency between direct and indirect evidence was not identified for the self-reported outcomes but was identified for the $S_0N_0$ and $S_0N_{NE}$ conditions of the HINT (Z and p values in Table 3). The size of the change resulting from the incorporation of indirect evidence varied from negligible (0.01 standard deviations, SD) to notable (0.38 SD). In one case the incorporation of indirect evidence altered the direction of the mean effect of ACD from being detrimental to listening ability to being beneficial (ACD vs unaided; SSQ). However, in all cases the 95% confidence intervals of the effect sizes estimated from direct and network evidence overlapped, with the incorporation of indirect evidence widening confidence intervals around the treatment effects. Table 4 reports sample size calculations performed using effect sizes based on direct and network evidence for comparisons between ACD, BCD, and the unaided condition. The incorporation of indirect evidence reduced the sample size required to detect changes in SSQ scores when comparing CI to unaided from 36 to 26. However, the inclusion of indirect evidence increased the sample size required to detect changes in SSQ scores when comparing CI to ACD (46 to 48) and CI to BCD.

DISCUSSION
Network meta-analysis is a useful adjunct to standard meta-analytical techniques in cases where there are multiple treatment options for a condition and few studies that directly compare certain pairs of interventions. It is a technique that is yet to be widely adopted in the otological sciences; for example, at the time of publication the only example in the field indexed on PubMed is a protocol for an NMA in sudden sensorineural hearing loss (32). The application of NMA in the context of hearing instruments for adults with SSD resulted in some notable changes in terms of both the direction and size of treatment effects. For example, when using SSQ to measure listening abilities with ACD compared to the unaided condition, the incorporation of indirect evidence revised the mean treatment effect on listening ability from being a small detrimental effect to a medium beneficial effect. Such cases highlight areas where there is considerable uncertainty over treatment effects.
Differences in treatment effects based on direct and network evidence could arise due to a variety of factors. There may be an imbalance in the quantity of direct and indirect evidence. For example, the effect size associated with the difference in SSQ scores for CI versus BCD decreased substantially from 0.79 to 0.35, a 56% reduction, once the indirect evidence was considered; the network evidence for this comparison had the lowest proportion of direct evidence (67%) across all the comparisons examined in the current study. In that case, direct evidence was available from only one trial that reported a large positive treatment effect (8), whereas two studies reported comparisons of BCD with the unaided condition using that outcome measure. Differences in study methodology or population could also have resulted in varying effect sizes across these studies. For example, the study that compared BCD to CI provided bone-conduction devices on a softband/tension clamp, whereas all of the studies comparing BCD with the unaided condition used osseo-integrated implants. Osseo-integrated (percutaneous) implants are more effective at transducing high frequencies than transcutaneous devices such as softband-mounted devices (33). These factors and other differences in study designs, such as how the treatments were delivered and the duration of follow-up, could account for the significant inconsistency between direct and indirect evidence noted for some of the treatment comparisons.
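How the balance between direct and indirect evidence shapes a network estimate can be illustrated for a single comparison. The sketch below is a simplification that assumes one direct and one independent indirect estimate combined by inverse-variance weighting; it is not the "netmeta" implementation, and the function name and all input values are hypothetical rather than taken from the tables.

```python
import math
from statistics import NormalDist


def combine_direct_indirect(d_dir, se_dir, d_ind, se_ind):
    """Inverse-variance combination of a direct and an indirect estimate,
    with the share of direct evidence and a z test for inconsistency."""
    w_dir = 1.0 / se_dir ** 2
    w_ind = 1.0 / se_ind ** 2
    d_net = (w_dir * d_dir + w_ind * d_ind) / (w_dir + w_ind)
    se_net = math.sqrt(1.0 / (w_dir + w_ind))
    prop_direct = w_dir / (w_dir + w_ind)
    # Difference between direct and indirect estimates, tested two-sided.
    z = (d_dir - d_ind) / math.sqrt(se_dir ** 2 + se_ind ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return {"network": d_net, "se": se_net,
            "prop_direct": prop_direct, "z": z, "p": p}


# Hypothetical inputs: a large direct effect and a small indirect effect.
result = combine_direct_indirect(d_dir=0.8, se_dir=0.3, d_ind=0.1, se_ind=0.4)
print({key: round(value, 3) for key, value in result.items()})
```

In this hypothetical case the direct evidence carries about two-thirds of the weight, yet the network estimate is still pulled well below the direct estimate, which is the pattern described above for a comparison informed by a single direct trial.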
The substantial reduction in the estimated size of treatment effect of CI versus BCD with and without indirect evidence increased the required sample size by a factor of 5. A similar implication arose when comparing ACD to the unaided condition using the HINT sentence test in a frequently used testing configuration for patients with SSD; i.e., speech from in front and noise toward the nonimpaired (good) ear ($S_0N_{NE}$). The sample size increased from 50 patients to 150 when indirect evidence was considered. For certain outcome measures, the numbers needed to power studies adequately became infeasible if one considered the indirect evidence (e.g., using APHAB to measure outcomes in ACD vs unaided) or conversely were reduced to potentially feasible levels (e.g., using SSQ to measure outcomes in ACD vs unaided). These examples illustrate how NMA could prevent an underpowered trial being conducted or avoid unnecessary burden by over-recruitment, and in doing so prevent wastage of scarce research resources.
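The factor of 5 for CI versus BCD is what one would expect from the way sample size scales with effect size. As a rough check, assuming the usual normal-approximation formula for a two-group comparison (an approximation, not the exact calculation performed in G*Power), the required number per group is inversely proportional to the square of the standardized effect size $d$:

```latex
\[
n_{\text{per group}} \approx \frac{2\,(z_{1-\alpha} + z_{1-\beta})^{2}}{d^{2}},
\qquad
\frac{n_{\text{network}}}{n_{\text{direct}}}
\approx \left(\frac{d_{\text{direct}}}{d_{\text{network}}}\right)^{2}
= \left(\frac{0.79}{0.35}\right)^{2} \approx 5.1
\]
```

The critical values $z_{1-\alpha}$ and $z_{1-\beta}$ cancel in the ratio, so the increase depends only on the two effect sizes.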
When discussing sample size calculations for future clinical trials, a distinction must be made between an observed difference reported in a published study, such as those incorporated into the current meta-analyses, and a clinically important difference. In areas where researchers are unsure of a treatment effect it is expedient to determine the minimal clinically important difference (MCID) in the primary outcome of interest (34). The MCID can be defined as the minimum change in outcome that is deemed clinically significant. For example, an increase in the SSQ score of a few points may be statistically significant following an intervention but may or may not give a patient a clinically important (perceptible) benefit. It is therefore relevant to consider not only which effect sizes may be the subject of uncertainty (as indicated by large changes in Tables 2 and 3) and whether it is feasible to conduct a trial based on the required sample size (Table 4), but also whether the estimated effect size is likely to be meaningful to the clinician and patient alike.
Integral to the challenge of meta-analysis is the difficulty of comparing differing methodologies and outcome measures and synthesizing them into a meaningful discourse about the benefits of interventions. Comparison between trials can be facilitated by the development of Core Outcome Sets (COS) that offer the prospect of a uniform way of measuring interventions in the context of clinical trials (35). They can also inform the choice of primary outcome for future trial design by identifying outcomes that are important to patients. By adopting a COS it will be possible to directly compare interventions across trials. There is currently a COS in development for adults with SSD (36,37).

LIMITATIONS
The prospect of being able to use all available data to provide new evidence of treatment effectiveness and therefore inform clinical trial design and clinical decision making is undoubtedly attractive. While NMA is able to adjust for bias when used in conjunction with conventional direct comparison techniques (38), as with any statistical procedure there are limitations to the NMA technique. To carry out NMA, as with traditional MA, assumptions have to be made to allow the grouping and comparison of studies. These include, but are not limited to: 1) the study populations are likely to respond in comparable ways to the treatments under consideration; 2) the interventions are delivered in a similar way; and 3) the study designs are broadly similar. This is the concept of "transitivity" (26,39). An example arises when we compare populations that have received a CI to those that have received ACD or BCD. There are likely to be subtle variations between these groups, including differences in the characteristics of those eligible for implantation and fit for a surgical procedure versus those unable, ineligible, or unwilling to receive a cochlear implant. In addition, data were only available from a small number of studies with small sample sizes, restricting the evidence upon which any inferences can be made about treatment effects, whether based on direct or network evidence.
Indirect evidence such as that provided by NMA is not afforded the same status as direct evidence found in head-to-head comparisons, in part because the application of this technique is still an emerging field (26). Donegan et al.'s review of reporting and methodological quality in indirect analyses drew attention to the fact that the "underlying assumptions are not routinely explored or reported when undertaking indirect comparisons" (40). Chou also cites the limitation of indirect comparisons when comparing "complex and rapidly evolving interventions" (41). However, there is a move toward placing additional weight on indirect analyses: since 2015, 10% of Cochrane reviews have used NMA (42), and some have called for a re-evaluation of the evidential status accorded to NMA (42). This technique is therefore illustrated here as an adjunct to conventional head-to-head comparisons that may be useful in situations where some treatment comparisons are under-represented in the published literature.

CONCLUSION
The application of network meta-analysis to extend existing analyses published alongside systematic reviews, or to supplement the conduct of future reviews, can aid the design of future trials of hearing-related interventions. The current results suggest that there is considerable uncertainty surrounding some published estimates of treatment effects associated with hearing instruments for adults with SSD. These results, together with further research to establish MCIDs and ongoing work to define a COS for SSD, will help ensure that future trials are targeted to reduce known uncertainty around treatment alternatives and make effective use of limited research resources.