The natural history of subjective tinnitus in adults: A systematic review and meta‐analysis of no‐intervention periods in controlled trials

Tinnitus is a prevalent condition, but little has been published regarding the natural history of the condition. One technique for evaluating the long‐term progression of the disease is to examine what happens to participants in the no‐intervention control arm of a clinical trial. The aim of this study was to examine no‐intervention or waiting‐list data reported in trials, in which participants on the active arm received any form of tinnitus intervention.


INTRODUCTION
Part of the counseling provided to tinnitus patients by practitioners involves reassurance that both the perceived loudness of the tinnitus sounds and the emotional symptoms of tinnitus generally improve with time. Although this may be true, data to support the validity of this statement and to quantify any improvement in symptoms have been poorly presented in the literature. There are a small number of longitudinal studies of tinnitus, which give some support to the suggestion that tinnitus impact lessens with time. [1][2][3] However, participants in these studies could access healthcare services for their symptoms, and it is therefore difficult to ascertain whether any change reflects natural improvement with time or a treatment effect. One technique used to study what happens to symptoms over time among people receiving no treatment is to examine the outcome of participants on a no-intervention or waiting-list control arm of clinical trials, and this methodology has a long pedigree of usage in the field of mental health. [4][5][6] Amalgamating the control groups of multiple trials makes meta-analysis of the outcome viable. A limited study of what happens to patients with tinnitus while in a no-intervention or waiting-list control group has previously been undertaken, [7] but this was restricted to studies that had incorporated cognitive behavior therapy as the active arm of the trial. Restricting participants to those willing to embrace psychological therapies for their tinnitus potentially produces a study population that is not representative of the wider tinnitus population. The aim of the current study was to expand that original work by looking at people with tinnitus who had been allocated to a no-intervention or waiting-list control group in the context of a trial evaluating any form of tinnitus therapy.
The following research questions were posited: 1) During a period of no intervention or waiting list, what changes occur in self-reported measures of tinnitus? 2) During a period of no intervention or waiting list, what changes occur in self-reported measures of tinnitus-related problems of mood and quality of life? 3) During a period of no-intervention waiting, what changes occur in perceived tinnitus loudness? The first research question addressed our primary outcome measure, and the second and third research questions addressed our secondary outcome measures.

Study Registration
Details of the proposed study eligibility criteria, information sources, search strategy, selection and data collection processes, as well as data synthesis methods were registered at PROSPERO, the international database of prospectively registered systematic reviews (PROSPERO 2013:CRD42013003334). Reporting of the review has been conducted using the criteria recommended by Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). [8] Presentation of the meta-analysis complies with MOOSE Guidelines for Meta-Analyses and Systematic Reviews of Observational Studies. [9]

Study Selection
In the protocol registered in PROSPERO, the condition of interest was referred to as "watchful waiting." Because this term implies some degree of symptom monitoring, which was not necessarily evident in the records found, and because our study selection strategy did not necessarily seek to exclude study designs in which a group was not anticipating receiving an intervention, we refer instead to this group throughout as "no-intervention" or "waiting-list" control. Inclusion criteria were formed using the Participants, Intervention, Control, Outcomes, and Study designs (PICOS) strategy. [10] These were: Participants, adults with tinnitus; Intervention, no-intervention or waiting-list control; Comparator, any intervention for tinnitus; Outcomes, primary, one or more tinnitus-specific measures using a multi-item patient-reported questionnaire; Study design, randomized controlled trial or observational study with a control group involving no intervention.
Studies that were not available in English were also excluded as we did not have the resources to translate them. Records that had not been through a peer-review process (grey literature) were excluded as a quality-control measure.
The search was not explicitly time limited, but the first multi-item patient-reported tinnitus-related questionnaire was published in 1988. [11] Hence, no clinical trials meeting our inclusion criteria would have been published prior to this date. For the purposes of the review, adult was defined as aged 16 years or older.

Appropriate Outcome Measures
Eligible studies were those reporting at least one patient-reported outcome relating to tinnitus, measured using a multi-item patient-reported tinnitus-specific questionnaire with scores that were reported both before and after the time period corresponding to the intervention for the active comparator group. Examples of acceptable measurement instruments are shown in Supporting Table SI in the online version of this article. This is not an exhaustive list, and, if encountered, other tools were considered. Outcomes that were considered as a secondary question in this review were those multi-item patient-reported questionnaires of mood and quality of life, and tools for estimating a change in tinnitus percept, namely loudness, with scores that were reported both before and after the time period corresponding to the intervention for the comparator group. Such assessments were not prerequisites for study inclusion, but where such information was available it was extracted and analyzed. In a change to the study design as registered in PROSPERO, we did not investigate the change in audiological or physiological outcome measures as secondary questions.

Appropriate Study Design
Eligible study designs were randomized controlled trials in which adult participants were allocated to a no-intervention control group receiving no support. Observational studies in which there was a no-intervention group were also eligible. Cross-over designs were included if a no-intervention period preceded an active intervention comparator and data from the preintervention period could be separately extracted.

Search Strategy
A systematic search of the literature was conducted by one of the authors (D.J.H.) to identify relevant articles from eight literature search platforms: CINAHL, PsycINFO, Embase, ASSIA, PubMed, Web of Science, Science Direct, and EBSCO Host. For each database, the search was run using the Boolean search term: tinnitus AND waiting OR wait* OR waiting-list OR watchful OR observation. A sample search strategy, as generated by PubMed when executing the search, is given in Supporting Table SII in the online version of this article.
In addition, hand-searching of the reference lists of all articles returned from the search was undertaken, and articles published by shortlisted authors were screened to identify any relevant articles that may not have been returned by the initial database searches. Cochrane and other relevant systematic reviews were searched. In October 2015, hand searches were conducted of articles published in issues since April 2013 of the prespecified journals (Acta Otolaryngologica, Ear and Hearing, Hearing Research, Journal of Psychosomatic Medicine, Psychosomatic Research, International Journal of Audiology, International Tinnitus Journal, Laryngoscope, Otolaryngology Head and Neck Surgery, Otology and Neurotology, and PLoS One). Finally, the data collection form associated with an independent systematic review of clinical trials of tinnitus published between July 2006 and March 2015 was searched. [12]

Data Management
All identified records were saved into a Microsoft Excel (Microsoft Corp., Redmond, WA) master file where records were tracked through the screening and data collection process by a unique study identification number. A simple system of record annotation was implemented to capture reasons for exclusion. Two authors (J.S.P. and D.J.M.) independently assessed the search results to identify studies for inclusion in the review and extracted the relevant data. Any discrepancies in study selection or data extraction were resolved in discussion with a third author (D.A.H. or D.J.H.). One of the authors (J.S.P.) was the data guarantor.

Data Extraction
Data extracted included study design, participants (demographics, baseline characteristics), context of waiting (waiting list for crossover or no intervention), comparator, outcomes measures used, study findings, and conclusions. A data extraction form was developed and piloted for the purpose. Where data were missing or unclearly reported, an attempt was made to contact the relevant corresponding author of the study; the most common problem was that the results had been presented graphically and numerical data for the meta-analysis could not be extracted. Supporting Table SIII in the online version of this article provides a summary of 18 study records for which we sought clarification or additional information. Of those, only three did not reply; six did reply but were unable to provide the data requested.

Risk of Bias (Quality) Assessment
Risk of bias assessment was guided by Higgins et al. [13] and was conducted by three authors (J.S.P., D.A.H., D.J.H.) on those study records included in the meta-analysis. The following terminology was specified: 1) Selection bias refers to how participants were allocated to the intervention arms of the trial and was assessed according to two criteria, namely sequence generation for the randomization process and allocation concealment to ensure that the schedule of random assignments prevented advance knowledge about the forthcoming allocations. 2) Attrition bias refers to how participants withdrew from any trial and was assessed by identifying incomplete outcome data. 3) Detection bias refers to how the outcomes were determined and was assessed according to the blinding of participants and outcome assessors assessing patient- or clinician-reported questionnaires, respectively. In the protocol registered in PROSPERO, these three categories of risk of bias were described as 1) study design, 2) compliance and drop out, and 3) blinding. Sample size was not evaluated in this section because this is a marker of quality, not risk of bias. [14] Similarly, external validity of the study sample (i.e., specialist subgroups such as occupational setting, tertiary clinic, severe tinnitus only) was not formally evaluated.

Measures of Effect
From each study a standardized mean difference (SMD) was calculated for every included score obtained on all tinnitus questionnaires. SMD was calculated for each post-baseline time point and was defined as the difference between the group mean questionnaire score at baseline and after n weeks of no-intervention waiting, divided by the pooled standard deviation. A positive SMD indicated an improvement over time. This difference was then converted to Hedges' g, [15] a commonly used measure of effect that controls for the bias in effect size that might be introduced by studies with small participant sample size. The test-retest correlation between the repeated time points was set to 90% for all questionnaires. Where multiple questionnaires were used at the same time point, a mean effect size was calculated by averaging the individual effect sizes.
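The calculation described above can be sketched in Python. This is a minimal illustration rather than the procedure implemented in the software used for the review. It assumes that the pooled standard deviation is the root mean square of the baseline and follow-up standard deviations, that the sampling variance takes the common pre-post form scaled by 2(1 - r), and that the small-sample correction factor is J = 1 - 3/(4·df - 1) with df = n - 1. The function name and all input values are hypothetical.

```python
import math


def hedges_g_pre_post(mean_pre, sd_pre, mean_post, sd_post, n, r=0.9):
    """Return (g, variance of g) for one group measured at two time points.

    Assumptions (not taken from the source): pooled SD is the root mean
    square of the two SDs, and the variance uses the pre-post formula
    with an assumed test-retest correlation r.
    """
    # Pooled standard deviation of the baseline and follow-up scores.
    sd_pooled = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    # Baseline minus follow-up, so a falling score (improvement) is positive.
    d = (mean_pre - mean_post) / sd_pooled
    # Sampling variance of d for correlated (repeated) measurements.
    var_d = (1 / n + d ** 2 / (2 * n)) * 2 * (1 - r)
    # Small-sample correction factor that converts d to Hedges' g.
    j = 1 - 3 / (4 * (n - 1) - 1)
    return j * d, (j ** 2) * var_d


# Hypothetical waiting-list group: 30 participants, score falls from 40 to 38.
g, var_g = hedges_g_pre_post(40.0, 20.0, 38.0, 20.0, n=30)
print(f"g = {g:.4f}, SE = {math.sqrt(var_g):.4f}")
```

With a high assumed correlation such as 0.9, the variance shrinks considerably, so the choice of r mainly affects the weight a study receives rather than its point estimate.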

Meta-analysis
Mean effect sizes across studies were calculated using Comprehensive Meta-Analysis (version 2.2.048; Biostat, Englewood, NJ). For the primary synthesis, the latest time point in each study was selected, and a random effects model was run. A random effects model assumes that the true effect may vary from study to study; here it was assumed that changes in the impact of tinnitus over time are not likely a constant effect but may be influenced by study factors such as age of participants, duration of tinnitus, education level, or general health. Sensitivity analyses were conducted pooling effect sizes per time from baseline (6 weeks, 12 weeks, 6 months). For all meta-analyses, it was assumed that the multi-item questionnaires included showed sufficient convergent validity to be pooled; tinnitus questionnaires are generally demonstrated to measure the same underlying construct of the everyday impact of tinnitus.
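To illustrate what a random effects model does with per-study effect sizes, the sketch below pools them using the DerSimonian-Laird method-of-moments estimator of between-study variance. The choice of estimator is an assumption made for illustration; the source states only that a random effects model was run in Comprehensive Meta-Analysis. The function name and the example inputs are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes under a random effects model (DerSimonian-Laird).

    Returns (pooled effect, standard error, tau squared, Cochran's Q).
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Method-of-moments estimate of between-study variance, floored at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each study's variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2, q


# Hypothetical effect sizes (Hedges' g) and variances from three studies.
pooled, se, tau2, q = random_effects_pool([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(f"pooled g = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}, Q = {q:.2f}")
```

Because the between-study variance is added to every study's own variance, large studies dominate the pooled estimate less than they would under a fixed-effect model.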

Missing Data
Data queries were satisfactorily answered for eight study records and partly answered for one other (see Supporting Table SIII in the online version of this article).

Data Synthesis
The time spent in the no-intervention or waiting-list period varied from 1 to 52 weeks, with an average of 12 weeks. Information about the individual percentage and effect size of change in tinnitus severity, as measured by tinnitus questionnaire score, is provided in Table II. Two studies (Fackrell et al., 2016; Krick et al., 2015) were excluded from the meta-analysis because the interval between assessments for most or all patients was as little as 7 days. [20][30] Caffier et al. (2006) was excluded, as numerical data were not sufficiently available. [19] Jakes et al. (1992) [27] was excluded from the meta-analysis, as their tinnitus outcome questionnaire was the Tinnitus Effects Questionnaire, [11] which does not yield a global score.
Across the remaining studies, over the longest period reported, there was a small decrease in global tinnitus of 2.3%, indicating a trend for improvement over time. How clinically meaningful that is cannot be interpreted; although it was assumed that tinnitus questionnaires measure the same construct of the everyday impact of tinnitus, clinically meaningful change scores on those questionnaires differ. Strikingly, no study demonstrated statistically significant worsening of tinnitus over time.
There was, however, considerable heterogeneity across studies. Reports of changes in depression, anxiety, quality of life, and tinnitus loudness were few and not significant.

Risk of Bias Assessment
A summary of the risk of bias of the 21 study records that were included in the meta-analysis is shown in Table III. Low risk of bias was achieved on 51% of occasions. Six studies had a high or unclear risk of bias on two of the criteria, [22][23][34][37][38] whereas one study had a high or unclear risk of bias on all three criteria. [36] Detection bias was the most poorly reported. Support for judgement concerning selection bias, attrition bias, and detection bias is provided in Supporting Tables SIV-SVI in the online version of this article, respectively.

Effects Over Time on Global Tinnitus
Twenty-three study groups (788 participants) in 21 study records reported changes in tinnitus over time. Effect sizes (Hedges' g) for the maximum interval within studies ranged from −0.17 to 0.55. The primary meta-analysis pooled data across studies using the longest timeframe reported in each study record, irrespective of the absolute length of time (23 study groups, M = 12 weeks, range = 4-52 weeks). There was significant heterogeneity across studies (Q[df = 22] = 112.97, P < .001, I² = 80.53). In a random effects model, the mean effect size was statistically significant in favor of tinnitus improving (Hedges' g = 0.122, 95% confidence interval [CI]: 0.055 to 0.188, P < .001) (Fig. 2).
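The heterogeneity and significance figures reported here can be cross-checked from the published summary values alone. The sketch below assumes the standard definition I² = (Q - df)/Q, a normal approximation for the 95% confidence interval, and a two-sided test; the function names are hypothetical, and the inputs are the reported Q of 112.97 on 22 degrees of freedom and the pooled Hedges' g of 0.122 with its interval of 0.055 to 0.188.

```python
import math


def i_squared(q, df):
    """Percentage of total variation attributed to between-study heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def ci_to_se_z_p(estimate, lower, upper):
    """Recover the standard error, z statistic, and two-sided p value
    from a point estimate and its 95% confidence interval, assuming a
    normal sampling distribution."""
    se = (upper - lower) / (2 * 1.959964)
    z = estimate / se
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


print(round(i_squared(112.97, 22), 2))  # 80.53, matching the reported value
se, z, p = ci_to_se_z_p(0.122, 0.055, 0.188)
print(f"SE = {se:.4f}, z = {z:.2f}, p < .001: {p < 0.001}")
```

Both checks agree with the reported results: I² reproduces as 80.53, and the interval implies a p value below .001.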

Effects Over Time on Depression
Eight studies (301 participants) reported changes in depression questionnaire scores, using the Hospital Anxiety Depression Scale-Depression [41] or Beck Depression Inventory (BDI) [42] questionnaires over intervals ranging from 6 to 26 weeks (M = 16.2). Henry et al. (1998) [22] measured the BDI score at three time intervals (baseline, 26 weeks, and 52 weeks later). Hedges' g across the eight studies ranged from 0.469 to 0.182, with one study favoring a worsening and two studies favoring an improvement in scores over time (Fig. 3). The pooled effect size across all studies (using the 26-week measure from Henry et al., 1996 [21]) was positive but not significant (Hedges' g = 0.006, 95% CI: −0.045 to 0.057, P = .828), indicating no significant change in depression over time.

Effects Over Time on Generalized Anxiety
Five studies (161 participants) reported changes in anxiety questionnaire scores, using the Hospital Anxiety Depression Scale-Anxiety (HADS-A) (Zigmond and Snaith 1983) over intervals ranging from 6 to 12 weeks (M = 8). Hedges' g across studies ranged from 0.089 to 0.206, with one study favoring an improvement in scores over time (Fig. 4). The pooled effect size across all studies was positive but not significant (Hedges' g = 0.058, 95% CI: −0.012 to 0.127, P = .104), indicating no significant change in anxiety over time. Andersson et al. (2002) [16] additionally measured "fear of anxiety-related somatic sensation" using the Anxiety Sensitivity Index, [43] noting a slight improvement over time; the mean score was reduced from 19.1 (standard deviation [SD] = 12.7) at baseline to 17.8 (SD = 12.1) at 6 weeks. In contrast, Andersson et al. (2005) [17] reported an increase in Anxiety Sensitivity Index score in their waiting-list control group after about 6 weeks; the mean score increased from 18.9 (±10.0) to 26.3 (±10.5).
Table note: Negative percentage change indicates decreased questionnaire score (tinnitus improves); positive percentage change indicates increased questionnaire score (tinnitus worse).

Effects Over Time on Tinnitus Loudness
In six studies, tinnitus loudness was measured using a visual analogue scale. [16][17][21][27][28][34] However, Jakes et al. (1992) [27] reported abandoning the measure during the study for several reasons including poor compliance, and Andersson et al. (2005) [17] did not report numerical values. Of the four remaining, three used a 0 to 10 scale and reported a 0.8-point decrease, [34] no change, [16] and a 0.1-point increase [28] in scores, respectively, after 6 to 8 weeks of watchful waiting. One study used a 0 to 4 scale and reported a reduction of <0.1 after 4 weeks. [21] Although single-item measures of tinnitus show good correlation with each other, they do not measure meaningful tinnitus-related constructs, so these data were not subjected to meta-analysis.
Table note: Selection bias includes sequence generation for the randomization process and allocation concealment. Bias was judged for those study records included in the meta-analysis, and a full description of the review authors' judgements about risk of bias for each included study is given in the Supporting Information, Tables SIV-SVI.
Fig. 2. Meta-analysis of change in self-reported tinnitus severity over longest interval in individual studies indicating an improvement over time. Black square = effect size (Hedges' g) in that study. Black diamond = pooled effect size. The relative sample size and hence relative influence of individual studies on the pooled effect size is indicated by the size of the black square (i.e., the study by Ross et al. [2007] [35] has the greatest influence on the pooled result, followed by Henry [2007] [23] and Henry [2016] [24]). CI = confidence interval.

DISCUSSION
This systematic review with a meta-analysis presents the most inclusive evaluation of the natural time course of tinnitus under controlled experimental conditions to date. The random effects meta-analysis gives a reliable overall summary of findings, because the analysis accounts for heterogeneity and is weighted by sample size. This revealed a small but significant improvement in global tinnitus severity up to 4 months, but studies with longer assessment periods did not reveal any change. This finding may reflect a lack of statistical power for this subgroup analysis or an insensitivity of tinnitus questionnaire measures over longer periods. Even for the 2- and 4-month analyses, it must be cautioned that we cannot ascertain with certainty whether the small statistically significant improvement is equivalent to a clinically meaningful improvement that is noticeable to people with tinnitus. Clinical interpretation of the findings by anchoring numerical values against patient-reported experience is under-reported to date.
In contrast to the small improvements in global tinnitus severity, our meta-analyses did not reveal statistically significant improvement in measures of mood. This finding contradicts that of a study by Posternak and Miller (2001), which looked at mental health conditions in isolation and found improvement while on waiting-list control groups. [4] It is possible that the null findings in the current study simply represent the relatively low number of tinnitus studies that had incorporated a measure of depression or generalized anxiety.
Although this systematic review accepted studies testing any form of tinnitus intervention, the meta-analysis was biased toward psychological interventions, with 11 out of the 21 included studies testing a psychological management modality. The current study, although skewed toward psychological treatment trials, adds to previous work by Hesser et al. in that it assessed a much broader range of tinnitus experiences than this previous work. Our findings incorporated those participants enrolled into a range of tinnitus intervention studies, namely tinnitus retraining therapy, education, auditory discrimination training, self-help using books, drug treatment, and Qigong (a combination of body posture, breathing control, and meditation developed in China). We believe that this inadvertent bias toward psychological interventions is in large part a reflection of the type of control group favored by trial designs assessing different types of tinnitus study. Pharmacological intervention studies will typically use a placebo medication as control, whereas studies assessing a device treatment, such as repetitive transcranial magnetic stimulation or laser therapy, will generally employ a sham treatment as control. For psychological therapies, such as cognitive behavior therapy, acceptance and commitment therapy, or mindfulness meditation, a placebo or sham psychological therapy control is unethical for the trial design, and so those trials are much more likely to use a no-intervention or waiting-list control. Moreover, psychological therapies present a routine therapeutic option for people with bothersome tinnitus, often with a natural waiting list for an initial appointment, and so a no-intervention control is often a straightforward pragmatic option.
One limitation for our interpretation of the findings is that it is not clear whether tinnitus patients consenting to participate in a psychological treatment trial are representative of tinnitus patients in general, and more specifically whether they are equivalent to those consenting to join a drug trial or medical device study. Two studies drew their participants directly from US military veterans, [23][24] hence participants were more likely to be male, to have been exposed to a greater than average risk of noise-induced hearing loss, and to have experienced the psychological stress associated with military service. These limitations are mitigated to some degree by the meta-analysis, which pooled findings from a wide range of studies. One further potential limitation of note is the exclusion of studies not available in English (because of limited resources), and studies that appear only in the grey literature. Although excluding grey literature may have introduced a publication bias, including grey literature could in itself introduce bias if the included sample of unpublished studies was not representative of all unpublished studies. It would be interesting to explore this issue in further analyses.

CONCLUSION
Participants enrolled into clinical trials assessing tinnitus interventions generally demonstrate a small but statistically significant improvement in self-reported global tinnitus severity scores over time, despite receiving no intervention. This finding provides statistical evidence that tinnitus generally improves over time, albeit the effect is highly variable across individuals, and how clinically meaningful the effect is cannot be interpreted at a general level. This evidence can therefore cautiously be used when counseling patients.