The Influence of Cochlear Spectral Processing on the Timing and Amplitude of the Speech-evoked Auditory Brain Stem Response

The speech-evoked auditory brain stem response (speech ABR) is widely considered to provide an index of the quality of neural temporal encoding in the central auditory pathway. The aim of the present study was to evaluate the extent to which the speech ABR is shaped by spectral processing in the cochlea. High-pass noise masking was used to record speech ABRs from delimited octave-wide frequency bands between 0.5 and 8 kHz in normal-hearing young adults. The latency of the frequency-delimited responses decreased from the lowest to the highest frequency band by up to 3.6 ms. The observed frequency-latency function was compatible with model predictions based on wave V of the click ABR. The frequency-delimited speech ABR amplitude was largest in the 2- to 4-kHz frequency band and decreased toward both higher and lower frequency bands despite the predominance of low-frequency energy in the speech stimulus. We argue that the frequency dependence of speech ABR latency and amplitude results from the decrease in cochlear filter width with decreasing frequency. The results suggest that the amplitude and latency of the speech ABR may reflect interindividual differences in cochlear, as well as central, processing. The high-pass noise-masking technique provides a useful tool for differentiating between peripheral and central effects on the speech ABR. It can be used for further elucidating the neural basis of the perceptual speech deficits that have been associated with individual differences in speech ABR characteristics.

Keywords: speech-evoked auditory brain stem response; auditory temporal processing; cochlear response time; speech-in-noise; auditory filter

Deficits in temporal processing in the central auditory pathway are thought to contribute to difficulties in speech perception, particularly in noise (Boets et al. 2007; Pichora-Fuller and Souza 2003).
Recent studies have proposed a neurophysiological correlate of temporal processing deficits in the scalp-recorded auditory brain stem response to speech (referred to hereafter as "speech ABR"), typically evoked by a consonant-vowel (CV) stimulus (Anderson et al. 2010a; Hornickel et al. 2011; Song et al. 2011). It is generally assumed that the speech ABR is generated by the summed synchronous firing of neurons in the upper auditory brain stem (Chandrasekaran and Kraus 2010). The response consists of an onset peak, evoked by the high-frequency onset burst of the CV syllable, followed by a series of peaks that synchronize to the fundamental frequency (F0) of the periodic portion of the syllable, which comprises a formant transition period followed by a steady-state vowel. The onset peak of the speech ABR is generally thought to share common neural generators with the click ABR wave V. These latter generators are presumed to comprise onset- or primary-like units located in the lateral lemniscus as it enters the inferior colliculus (Melcher and Kiang 1996; Møller and Jannetta 1983). The neural generators of the periodic component of the speech ABR are less clear; these may similarly include onset- and primary-like units that respond to the sharp periodic peaks in the stimulus envelope, but they may alternatively or additionally comprise chopper-type neural units that phase-lock to the periodicity of the envelope (Pfeiffer 1966).
Abnormalities in the speech ABR have been repeatedly correlated with deficits in speech-in-noise perception, particularly in aging populations (Anderson et al. 2011, 2012; Ruggles et al. 2012) and in children, usually with language-related learning problems (Anderson et al. 2010b; Hornickel et al. 2011; Hornickel and Kraus 2013). A peripheral basis for these abnormalities in terms of hearing sensitivity has generally been ruled out, because the participant groups tested presented clinically normal audiograms. Instead, it has been suggested that, in these populations, the speech ABR abnormalities reflect reduced precision of phase locking in central neurons.
It can be questioned, however, whether normal audiometric thresholds guarantee normal suprathreshold cochlear function. There is evidence that even in listeners with normal audiometric thresholds, there is considerable variability in suprathreshold measures of cochlear amplification (Dubno et al. 2007; Sommers and Gehr 2010), which is presumed to be driven by outer hair cells and determines not only the sensitivity but also the frequency resolution and dynamic range of human hearing. Similarly, a large degree of interindividual variability has been reported for the medial olivocochlear reflex (MOCR), a neural feedback pathway that projects back into the cochlea and modulates the cochlear amplifier gain (Backus and Guinan 2007; Cooper and Guinan 2006).
The effect of cochlear processing on the ABRs evoked by simple stimuli has been well documented (Dau 2003). Of particular importance for the click ABR wave V is the increase in cochlear response time from high-frequency (basal) to low-frequency (apical) regions. This increase results in part from the travelling wave delay, which is determined by the passive mechanical properties of the cochlear partition, but to a larger extent results from the increase in filter build-up time due to the narrowing of cochlear filters, the tuning of which is determined by the cochlear amplification process (Don et al. 1998). The frequency-dependent cochlear delays are preserved in the latency of the click ABR wave V, which increases by up to 3 ms when the cochlear place of origin is restricted and moved from base to apex (Burkard and Hecox 1983). In fact, the delays increase so rapidly toward the apex that responses from neurons tuned to lower frequencies contribute little to the ABR as a result of their extensive desynchronization (Dau 2003). Furthermore, upward spread of excitation causes higher frequency neurons to respond to lower frequency stimuli via the basal tails of their tuning curves. This means that at moderately high intensities, the ABR wave V mainly reflects activity from neurons tuned to higher frequencies, even when the stimulus has mainly low-frequency content (Dau 2003). A further consequence of the frequency-dependent variation in cochlear response time on the click ABR is that responses from neurons tuned to different frequencies can add together either constructively or destructively in the overall response, depending on their relative delays, which introduces a further cochlear source of variability (Don et al. 1994).
Address for reprint requests and other correspondence: J. de Boer, MRC Institute of Hearing Research, Science Road, Univ. Park, Nottingham, NG7 2RD UK (e-mail: jdb@ihr.mrc.ac.uk).
J Neurophysiol 113: 3683-3691, 2015. First published March 18, 2015; doi:10.1152/jn.00548.2014.
In this study, we investigated whether the speech ABR shows a similar dependence on cochlear frequency analysis as the click ABR. The investigation specifically focused on the periodic portion of the speech ABR, which has been suggested to reflect distinct neural processes compared with the click ABR wave V and is widely used to index the quality of speech encoding in the brain stem. The aim was to compare the relative amplitude and latency of this portion of the speech ABR as a function of frequency. To this purpose, speech ABRs were recorded from four octave-wide frequency regions between 0.5 and 8 kHz using a subtractive masking technique (Don and Eggermont 1978). Three main hypotheses were tested, based on previous findings on the click ABR wave V: first, that the latency of the frequency-delimited speech ABR increases with decreasing frequency, according to a power-law function that reflects the increase in cochlear filter width; second, that interindividual variability in the latency of the frequency-delimited speech ABR correlates with variability in cochlear filter bandwidth measured psychophysically; and third, that the amplitude of the frequency-delimited speech ABR as a function of frequency is not proportional to the stimulus spectrum, but instead is biased toward mid to high frequencies.

METHODS
Participants. Twenty-six native English speakers (age range, 18-39 yr; mean age, 22.4 yr; 17 women) took part in this study. All participants had pure-tone hearing thresholds at or below 20 dB hearing level (HL) at octave frequencies between 250 and 8,000 Hz and presented a normal wave V response to 100-μs clicks presented monaurally at a 31.1-Hz repetition rate and a peak-equivalent (pe) level of 70-dB sound pressure level (SPL). Written informed consent was obtained from all participants. The experimental procedures were approved by the Ethics Committee of the University of Nottingham Medical School and were in accordance with the guidelines of the Declaration of Helsinki. Participants were paid at an hourly rate.
Design. Speech ABRs were recorded in quiet and in five different high-pass noise-masking conditions. For all participants, speech ABRs were recorded across two blocks. In each block, responses were recorded for each of the six conditions so that two replicate responses were collected from each participant in each condition. The order of conditions was counterbalanced across participants and was the same for the first and second block of recordings. Fourteen of the participants attended one experiment in which only speech ABRs were recorded in one experimental session. The remaining 12 participants attended a second experiment in which speech ABRs were recorded in one session and additional tests, including psychophysical measurements of cochlear filter bandwidths (see below), were performed in a second session. Filter widths were measured at four different frequencies using notched noises with four notch widths. The measurements were performed in order from lowest to highest frequency and from narrowest to widest notch width.
Derived-band subtraction. Derived-band speech ABRs were obtained by subtracting recordings acquired under different high-pass noise-masking conditions, as illustrated in Fig. 1. With the use of this method, derived-band responses were obtained from four adjacent frequency regions (0.5-1, 1-2, 2-4, and 4-8 kHz) that constituted octave-wide bands centered around 0.7, 1.4, 2.8, and 5.7 kHz, respectively. The resulting derived-band responses still reflect neural activity from the rostral brain stem, but now comprise responses only from those neural units that receive input from the cochlear frequency channels that fall within the respective octave-wide bands.
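The subtraction step can be illustrated with a minimal sketch. It assumes that each averaged recording is available as a list of samples keyed by the high-pass masker cutoff; the function and variable names are hypothetical and are not taken from the authors' analysis code. The band centers are computed as geometric means of the band edges, which reproduces the rounded values of 0.7, 1.4, 2.8, and 5.7 kHz.

```python
import math

# High-pass masker cutoffs (kHz) named in the text; all identifiers are illustrative.
CUTOFFS_KHZ = [0.5, 1.0, 2.0, 4.0, 8.0]


def derived_bands(responses):
    """Subtract responses recorded under adjacent high-pass masker cutoffs.

    `responses` maps a masker cutoff (kHz) to an averaged waveform (list of
    floats). The response recorded with the lower cutoff is subtracted from
    the response recorded with the higher cutoff, so that the activity common
    to both cancels and only the octave band in between remains.
    """
    bands = {}
    for lo, hi in zip(CUTOFFS_KHZ[:-1], CUTOFFS_KHZ[1:]):
        wide, narrow = responses[hi], responses[lo]
        bands[(lo, hi)] = {
            "center_khz": math.sqrt(lo * hi),  # geometric center of the octave
            "waveform": [w - n for w, n in zip(wide, narrow)],
        }
    return bands
```

For example, `derived_bands(responses)[(4.0, 8.0)]["waveform"]` would hold the 4- to 8-kHz derived-band response, given recordings for the 4- and 8-kHz masker conditions.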
Stimuli. The speech stimulus was a 170-ms CV syllable ([da]) with five formants and a constant 100-Hz fundamental frequency. The syllable was developed using a KLATT synthesizer and provided by Nina Kraus's group (Northwestern University, Chicago, IL). It comprised a 5-ms stop burst followed by a 50-ms formant transition region with a linearly rising F1 (400-720 Hz), a linearly falling F2 (1,700-1,240 Hz) and F3 (2,580-2,500 Hz), and a flat F4 (3,300 Hz) and F5 (3,900 Hz). The stop burst contained frequencies around F4 and F5. The high-pass noise maskers were composed of "equally exciting" noise, a uniform noise filtered to contain approximately equal energy within each cochlear filter [equivalent rectangular bandwidth (ERB); Glasberg and Moore 1990]. To generate the five high-pass maskers, this noise was high-pass filtered with cutoff frequencies of 0.5, 1, 2, 4, or 8 kHz, with the low-pass cutoff always at 12.2 kHz.
Sound generation and presentation were controlled by a TDT System 3 (Tucker Davis Technologies, Alachua, FL) and MATLAB (The MathWorks, Natick, MA). Stimuli were generated digitally at a 24.4-kHz sampling rate, digital-to-analog converted with a 24-bit amplitude resolution (TDT RP2.1) and amplified (TDT HB7). The [da] stimulus and the high-pass noise maskers were set to levels of 70 dB peSPL and 80 dB SPL per ERB, respectively, and were mixed digitally. Initial pilot experiments indicated that this combination of sound levels resulted in full masking of the speech ABR without high-pass filtering. Speech and noise were presented monaurally to the left ear via a magnetically shielded insert earphone (ER-1; Etymotic Research, Elk Grove Village, IL). In each recording, the noise was turned on 5 s before the speech stimuli and turned off after 2,000 responses had been accepted by the data acquisition system (see below).
Fig. 1 (caption, partial). In this example, the noise maskers used in recordings A and B were cut off above 8 and 4 kHz, respectively. When recording B is subtracted from recording A, the response to the speech stimulus that is common to both (below 4 kHz in this example) is canceled out, leaving a derived-band response from 4 to 8 kHz.
Electrophysiology. Speech ABRs were recorded using the Intelligent Hearing Systems SmartEP evoked potentials system (Miami, FL) in electric ABR mode, which allows the use of an external trigger. Electroencephalographic (EEG) signals were differentially measured between Ag-AgCl scalp electrodes placed at Cz (+) and the right earlobe (−). An electrode placed on the mid forehead (Fpz) served as the common ground. All electrode impedances were maintained below 5 kΩ. The raw EEG signal was amplified by a factor of 10^5 and band-pass filtered online between 30 and 3,000 Hz. The external trigger was generated in MATLAB during stimulus presentation and initiated acquisition of a 200-ms poststimulus epoch. Epochs containing activity exceeding 35 μV were rejected as artifacts. Alternating polarity responses were averaged together online until 2,000 artifact-free epochs (1,000 for each stimulus polarity) were accepted for each condition. Data sets from two experiments were combined (see Design). Each experiment used the same equipment and recording procedures but different analog-to-digital sampling rates and interstimulus intervals (ISI). In the first experiment, stimuli were presented at an ISI of 130 ms and responses were sampled at 10 kHz. In the second experiment, stimuli were presented at an ISI of 85 ms and responses were sampled at 20 kHz. The two data sets were pooled after the latter were downsampled to 10 kHz. No significant differences in response amplitude or latency were found between the two experiments.
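The online averaging with artifact rejection can be sketched as follows. This is only an illustration of the logic described above, not the acquisition system's implementation: the function name, the data layout (one list of samples per epoch), and the assumption that the rejection criterion is expressed in microvolts are all hypothetical.

```python
def average_accepted(epochs, polarities, reject_uv=35.0, per_polarity=1000):
    """Average epochs while skipping artifacts, until enough epochs of each
    stimulus polarity (+1 or -1) have been accepted.

    `epochs` is a sequence of equal-length sample lists (assumed microvolts);
    an epoch is rejected if any sample exceeds `reject_uv` in absolute value.
    Returns the average waveform and the accepted count per polarity.
    """
    counts = {+1: 0, -1: 0}
    total = None
    for epoch, pol in zip(epochs, polarities):
        if counts[pol] >= per_polarity:
            continue  # this polarity already has enough accepted epochs
        if max(abs(v) for v in epoch) > reject_uv:
            continue  # artifact: discard the epoch
        total = list(epoch) if total is None else [s + v for s, v in zip(total, epoch)]
        counts[pol] += 1
        if all(c >= per_polarity for c in counts.values()):
            break
    n = sum(counts.values())
    average = [s / n for s in total] if total else []
    return average, counts
```

Averaging the two polarities together, as in this sketch, emphasizes the response components that do not invert with stimulus polarity.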
Data analysis. Offline data preprocessing and analysis were performed in MATLAB. All recorded responses were digitally band-pass filtered between 70 and 2,000 Hz using a 12 dB/octave zero-phase-shift Butterworth filter. The onset peak latencies of the grand-average responses were estimated by manual peak picking. No estimates of onset latencies were performed for the individual derived-band responses, because this peak was not reliably present in all participants and frequency bands. The periodic portion of the response was analyzed in a time window between 22.7 and 170 ms. The frequency-dependent timing of this portion of the response was evaluated by measuring the relative delay between the derived-band responses and the broadband response. This was accomplished by cross-correlating the derived-band waveforms to the broadband response over a range of relative delays, or "lags," that were imposed by shifting the derived-band waveform forward and backward in time (−4 to +4 ms). The lag at which the maximal cross-correlation occurred was taken to correspond to the latency difference between the derived-band and grand-average broadband response. For individual derived-band responses, this cross-correlation procedure was performed with respect to the grand-average broadband response, rather than the individual's broadband response, to maximize the signal-to-noise ratio in the cross-correlation. First, the delay between the individual's broadband response and the grand-average broadband response was estimated, using the same cross-correlation procedure. The grand-average broadband response was then aligned temporally with the individual's broadband response before being used in the cross-correlation procedure to estimate the delay of the individual derived-band response. The delay between the derived-band response and the broadband response is henceforth referred to as "relative latency." The relative latency could take a positive or negative value, indicating that the derived-band response started later (lag) or earlier (lead) than the broadband response, respectively. Note that the relative latency reflects only the frequency-dependent portion of the response latency and does not include the frequency-independent neural conduction delay between the cochlea and the neural generators of the speech ABR, presumed to be located in the rostral auditory brain stem. The amplitude of the periodic portion of the derived-band responses was estimated by calculating the complex cross-spectrum between two replicate waveforms within the 22.7- to 170-ms time window and summing its real part across frequency. Use of the cross-spectrum reduces the bias introduced by random noise, because it includes only those signal components that have the same phase (timing) in both replicates. To obtain the grand-average amplitudes, individual cross-spectra were averaged and the real part of the resulting grand-average complex spectrum was summed.
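The two measures described in this paragraph can be sketched in plain Python. The sketch below is a simplified illustration rather than the authors' MATLAB code: the function names are hypothetical, the correlation is computed over the overlapping samples at each lag, and the cross-spectrum is summed over all bins of an unnormalized discrete Fourier transform, whereas the scaling and frequency range used in the study are not specified here.

```python
import cmath
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_latency_ms(derived, broadband, fs_hz=10_000, max_lag_ms=4.0):
    """Lag (ms) of `derived` relative to `broadband` that maximizes their
    correlation; a positive value means the derived-band response is later."""
    max_shift = int(round(max_lag_ms * fs_hz / 1000))
    n = len(broadband)
    best_shift, best_r = 0, -2.0
    for shift in range(-max_shift, max_shift + 1):
        # Compare broadband[k] with derived[k + shift] over the overlap.
        lo, hi = max(0, -shift), min(n, n - shift)
        r = pearson(broadband[lo:hi], derived[lo + shift:hi + shift])
        if r > best_r:
            best_shift, best_r = shift, r
    return 1000.0 * best_shift / fs_hz


def dft(x):
    """Naive discrete Fourier transform (adequate for short illustrations)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def cross_spectrum_amplitude(rep1, rep2):
    """Sum over frequency of the real part of the cross-spectrum of two
    replicate waveforms."""
    spec1, spec2 = dft(rep1), dft(rep2)
    return sum((a * b.conjugate()).real for a, b in zip(spec1, spec2))
```

Because noise that is uncorrelated between the two replicates contributes terms of random sign to the real part of the cross-spectrum, those terms tend to cancel in the sum, which is the bias reduction referred to in the text.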
Fitting procedures. The function relating the relative latency of the derived-band responses to the band center frequency was fitted to the model developed by Strelcyk et al. (2009) for the click ABR wave V. The model predicts that the click ABR wave V latency varies as a function of frequency and level according to the following equation:
$$\mathrm{lat}(f, i) = a + b \cdot c^{\,0.93 - i} \cdot f^{-d} \qquad (1)$$
where a, b, c, and d are free parameters and f and i are the stimulus frequency (in kHz) and level (in dB SPL divided by 100), respectively. The parameter a represents the asymptotic delay reached as frequency approaches infinity, which is independent of frequency and level; it can be interpreted as the summed postcochlear neural and synaptic delays; b represents the cochlear response time at a frequency of 1 kHz and a pe click level of 93 dB SPL; c describes the level dependence and d the frequency dependence of the wave V latency. Strelcyk et al. (2009) found the population means of these parameters to be a = 4.7 ms, b = 3.4 ms, c = 5.2, and d = 0.5. In the present study, the aim was to evaluate whether the latency of the periodic portion of the speech ABR follows a power-law dependence on frequency similar to that of the click ABR wave V. To this purpose, the relative latencies of individual derived-band speech ABRs were fitted to Eq. 1 as a function of derived-band center frequency. The free parameter of interest was d.
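As a worked illustration of the latency model, the sketch below evaluates it with the population means quoted above. It assumes that the model takes the form a + b·c^(0.93 − i)·f^(−d), with f in kHz and i equal to the level in dB SPL divided by 100, so that b is the cochlear delay at 1 kHz and 93 dB SPL; the function name is hypothetical.

```python
def wave_v_latency_ms(f_khz, level_db_spl, a=4.7, b=3.4, c=5.2, d=0.5):
    """Click ABR wave V latency (ms) under the assumed model form.

    Assumption: latency = a + b * c**(0.93 - i) * f**(-d), with f in kHz and
    i = level / 100. Defaults are the population means quoted in the text.
    """
    i = level_db_spl / 100.0
    return a + b * c ** (0.93 - i) * f_khz ** (-d)


# Frequency-dependent (cochlear) scale factor at the 70-dB SPL stimulus level.
B = 3.4 * 5.2 ** (0.93 - 0.70)

for f in (0.7, 1.4, 2.8, 5.7):  # derived-band center frequencies in kHz
    total = wave_v_latency_ms(f, 70)
    cochlear = B * f ** -0.5
    print(f"{f:>4} kHz: total {total:.2f} ms, cochlear part {cochlear:.2f} ms")
```

Under these assumptions the predicted latency difference between the 2.8- and 5.7-kHz bands is a little under 0.9 ms, and the cochlear part of the delay grows steeply toward the lowest band.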
Because the model was fitted to relative rather than absolute latencies, parameter a in this fit does not represent the summed neural and synaptic delays, but instead reflects the asymptotic delay of the derived-band response relative to the broadband response as frequency approaches infinity. The value of this parameter could not be established a priori or inferred from the click ABR wave V results, and it was therefore retained as a free parameter in the model fit. The parameters b and c were kept fixed at the population mean values estimated by Strelcyk et al. (2009) to reduce the number of free parameters in the fitting procedure and avoid overfitting. The stimulus level i was set to the 70-dB SPL stimulus level used in the present study. The fit function was thus simplified to the following equation:
$$\mathrm{lat}(f) = a + B \cdot f^{-d} \qquad (2)$$
where B = 3.4 × 5.2^(0.93 − 0.70) = 4.96. To evaluate the model, fits were first obtained for individual participants by using a nonlinear least-squares procedure, which minimized the sum of squared errors between the predicted (Eq. 2) and the observed latencies. Subsequently, population mean estimates for the fit parameters were obtained by submitting all individual data combined to a statistical model (see Statistical analyses for further details). The starting point for d in the fitting procedures was set at the population mean found by Strelcyk et al. (2009; estimated using a mixed-effects model approach) for the click ABR wave V, and the starting point for parameter a was set to zero.
Behavior. To assess psychophysical frequency selectivity, cochlear filter bandwidths were measured using the simultaneous notched-noise masking method (Glasberg and Moore 1990; Patterson 1974, 1976). In the present study, an abbreviated, audiometer-based version of this method was used, which was developed and provided by Glasberg and Moore (University of Cambridge, Cambridge, UK). This test involves measuring the detection threshold of a pulsed tone signal (20-ms raised-cosine ramps, 160-ms steady duration, 200-ms interval between pulses) in a simultaneously presented noise masker with a spectral notch centered on the signal frequency. The masker notch width is specified as the notch width, Δf, divided by the signal frequency, f, i.e., Δf/f. The pulsed tone and notched noise were presented using a two-channel audiometer. During each measurement, the noise was presented continuously and the pulsed tone was presented for about 1 s once every 2-4 s. Participants were asked to press a button when they heard the pulsed tone. The time of signal presentation and the noise level were controlled by the experimenter.
The threshold procedure was similar to that used in pure-tone audiometry (British Society of Audiology 2004). However, in the present study the tone signal level was held constant and the masker level was varied. Thresholds were measured for four signal frequencies (0.5, 1, 2, and 4 kHz) and four masker notch widths (0.0, 0.1, 0.2, and 0.3). For each signal frequency, detection thresholds were first measured for the pulsed tone in quiet, using a final step size of 2 dB. The level of the pulsed tone was then fixed at 10 dB above threshold, and the level of the noise was varied, again using a step size of 2 dB, to find the noise level at which the tone was just audible for each notch width.
The method assumes that a signal at a frequency f, in a notched noise, is detected within the cochlear filter centered on f. The bandwidth of this filter is expressed as the ERB and was estimated from the change in threshold with increasing notch width using the fitting procedure developed by Glasberg and Moore (1990). Linear regression lines were fitted to the auditory filter bandwidths as a function of frequency for each participant. The resulting fit coefficients were then used to predict auditory filter bandwidth values at 0.7, 1.4, 2.8, and 5.7 kHz for each participant.
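The per-participant regression and prediction step can be sketched as follows. Because no individual measurements are reproduced here, the example uses placeholder bandwidths generated from the normal-hearing ERB formula of Glasberg and Moore (1990), ERB = 24.7(4.37F + 1) with F in kHz; these values stand in for one participant's data and are not results from the study. All names are hypothetical.

```python
def erb_hz(f_khz):
    """Normal-hearing ERB (Hz) at center frequency f_khz (Glasberg and Moore 1990)."""
    return 24.7 * (4.37 * f_khz + 1.0)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


signal_khz = [0.5, 1.0, 2.0, 4.0]           # signal frequencies tested
measured = [erb_hz(f) for f in signal_khz]  # placeholder values, not study data
slope, intercept = fit_line(signal_khz, measured)

# Predict bandwidths at the derived-band center frequencies.
predicted = {f: slope * f + intercept for f in (0.7, 1.4, 2.8, 5.7)}
```

Note that the highest band center (5.7 kHz) lies above the highest measured signal frequency (4 kHz), so the prediction there is an extrapolation of the fitted line.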
Statistical analyses. Statistical analyses were conducted in the statistical software package R (R Core Team 2013). To evaluate the frequency dependence of the three outcome measures (relative latencies and amplitudes of derived-band speech ABRs, psychophysical cochlear filter bandwidths), a repeated-measures design with frequency as the independent within-participants variable was used. Mixed-effects models were used to account for interindividual variability ("nlme" package, Pinheiro et al. 2013). These models incorporate both fixed effects, which describe the population behavior, and random effects, which describe the variation between experimental units, in this case the individual participants. Model residuals were inspected for violations of the assumption of homogeneity of variance and normality using Levene's test ("car" package, Fox and Weisberg 2011) and inspection of quantile-quantile plots, respectively.
Linear mixed-effects models were used for the derived-band amplitude and psychophysical cochlear filter bandwidths, with frequency entered as a categorical factor, and a participant-related random effect included for the intercept. The general formula for these statistical models was thus
$$Y_{i,j} = \beta_0 + \beta_i + b_{0,j} + \varepsilon_{i,j}$$
where Y_{i,j} is the observed value of the dependent variable for participant j at frequency i; β_0 represents the fixed intercept, β_i the fixed effect at frequency i, b_{0,j} the random intercept for participant j, and ε_{i,j} the residual error for participant j at frequency i. Data points that exerted disproportionate influence on model parameters were identified using the Cook's distance measure ("lme4" package, Bates et al. 2013; "influence.ME" package, Nieuwenhuis et al. 2012). The cutoff level was defined as 4/N, where N is the total number of observations. These points were removed if their inclusion was found to have a significant effect on the fixed effects. On the basis of these criteria, one influential data point was removed from the amplitude data. No data points were removed from the psychophysical cochlear filter bandwidths, but a log transformation was applied to the data to remedy the violation of homogeneity of variance observed in the residuals of the model. For post hoc comparisons between different levels of frequency as a fixed factor, Tukey's honestly significant difference test was applied ("lsmeans" package, Lenth and Herve 2014).
For the relative latency data, a nonlinear mixed-effects model was used initially, in which the frequency dependence was described by Eq. 2. In this model, parameters a and d were entered as fixed effects, for which random effects were also included. The full statistical model was described by
$$\mathrm{lat}_{i,j} = (\beta_a + b_{a,j}) + B \cdot f_i^{-(\beta_d + b_{d,j})} + \varepsilon_{i,j}$$
where lat_{i,j} is the relative latency observed in participant j at frequency i, β_a and β_d represent the fixed effects, and b_{a,j} and b_{d,j} the random effects, associated with parameters a and d, respectively, and ε_{i,j} represents the residual error for participant j at frequency i. Neither of the random effects was found to contribute significantly to the model fit according to a log-likelihood ratio test. Therefore, a nonlinear least-squares regression model ("nls" in the core R package) was used instead, in which only fixed-effect terms for a and d were evaluated. The homogeneity of variance assumption was found to be violated for the residuals of the nonlinear regression model fit, mainly due to a greater variance in the highest frequency band compared with the lower frequency bands. To evaluate the effect of the inhomogeneity on the estimated values for a and d, a frequency-dependent weighting was applied to the data in the nonlinear regression model, with the weighting factors set to the inverse of the variance observed in each frequency band. The difference in the estimates for a and d between the weighted and nonweighted nonlinear regression was found to be less than the respective standard errors, indicating that the inhomogeneity did not have a substantial effect on the parameter estimates. Both nonweighted and weighted estimates are reported in RESULTS. To evaluate the goodness of fit of the (nonweighted) nonlinear regression, a parametric bootstrap procedure was performed on the R-squared value (Stute et al. 1993). This procedure tests the hypothesis that the observed data belong to the distribution of expected outcomes for the fitted model. One thousand samples of "repeat experiment" data were generated for the same number of participants and observations included in the original model. The simulated data points were calculated based on the model estimates for the fixed effects, and individual variability was simulated by adding to each data point a random sample of noise taken from a normal distribution with zero mean and a standard deviation equal to the model estimate of the residual error. Each simulated data set was submitted to the original model and the R-squared value calculated, thus generating a distribution of simulated R-squared values. The hypothesis was rejected if the proportion of simulated R-squared values that fell below that of the actual data was greater than (1 − α), where α is the significance level. If the hypothesis was not rejected, this implied that the observed data were a typical outcome of the model.
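The parametric bootstrap on the R-squared value can be sketched in a few lines. The example below is a simplified stand-in for the R analysis, not a reproduction of it: it assumes the two-parameter power-law form lat = a + B·f^(−d) with B fixed, replaces the nonlinear least-squares routine with a grid search over d (for a fixed d, the best-fitting a has a closed form), and uses hypothetical function names throughout.

```python
import random

B = 3.4 * 5.2 ** (0.93 - 0.70)  # fixed scale factor at the 70-dB SPL level


def fit_power_law(freqs, lats, d_grid=None):
    """Least-squares fit of lat = a + B * f**(-d); returns (a, d, sse).

    Simplification: d is found by grid search rather than by an iterative
    nonlinear least-squares algorithm.
    """
    d_grid = d_grid or [k / 100 for k in range(-50, 151)]
    best = None
    for d in d_grid:
        basis = [B * f ** (-d) for f in freqs]
        a = sum(y - g for y, g in zip(lats, basis)) / len(lats)
        sse = sum((y - a - g) ** 2 for y, g in zip(lats, basis))
        if best is None or sse < best[2]:
            best = (a, d, sse)
    return best


def r_squared(lats, sse):
    """Proportion of variance explained, given the residual sum of squares."""
    mean = sum(lats) / len(lats)
    return 1.0 - sse / sum((y - mean) ** 2 for y in lats)


def bootstrap_r2_proportion(freqs, lats, n_sim=200, seed=0):
    """Proportion of simulated R-squared values below the observed value."""
    rng = random.Random(seed)
    a, d, sse = fit_power_law(freqs, lats)
    observed = r_squared(lats, sse)
    sigma = (sse / (len(lats) - 2)) ** 0.5  # residual SD, two fitted parameters
    below = 0
    for _ in range(n_sim):
        # Simulate a "repeat experiment" from the fitted model plus Gaussian noise.
        sim = [a + B * f ** (-d) + rng.gauss(0.0, sigma) for f in freqs]
        _, _, sim_sse = fit_power_law(freqs, sim)
        if r_squared(sim, sim_sse) < observed:
            below += 1
    return below / n_sim
```

The returned proportion is the quantity that the text compares against (1 − α); a value well inside the interval from 0 to 1 indicates that the observed R-squared is typical of data generated by the fitted model.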
Fitting of individual data to linear (auditory filter bandwidths) and nonlinear (latencies) frequency functions was performed using least-squares regression. Correlations between variables were evaluated using Pearson's r, and mean comparisons were performed using two-tailed Student's t-tests, or the Wilcoxon signed-rank test when data were nonnormally distributed. In all analyses, the α level for significance was set at 0.05, Bonferroni-corrected for multiple comparisons where appropriate.

RESULTS
Overview. Figure 2A shows the grand-average broadband speech ABR (i.e., recorded in the absence of frequency-delimiting high-pass noise) overlaid with the stimulus waveform. The stimulus has been moved forward in time in the plot to visually align the stimulus envelope with the periodic response peaks, to show the time-locking of the response to the periodicity of the stimulus envelope. The grand-average onset response peak latency was 10.2 ms. Figure 2, B-E, shows the grand-average derived-band speech ABRs (gray) overlaid with the broadband response (black). The onset peak was observable only in the two highest frequency bands, as would be expected given the high-frequency content of the onset burst. The grand-average onset latency in the 5.7-kHz band was 9.8 ms, which is 0.9 ms earlier than the 10.7-ms onset latency in the 2.8-kHz band. This difference is roughly in line with predictions based on the click ABR wave V model (Eq. 1; the model prediction for the delay at 70 dB SPL is 0.88 ms). No further analysis of the frequency dependence of the onset response was possible because of the low amplitude of the response in the lower frequency bands and in the individual derived-band responses overall.
In contrast, the periodic portion of the speech ABR showed identifiable responses in each octave band, based on both the reproducibility of the waveform and its resemblance to the broadband response (Table 1). This implies that this part of the response includes contributions from an extensive region of the cochlea, spanning several octaves. The relative latency of the derived-band speech ABRs decreased systematically from low- to high-frequency regions. This is evident from the relative timing between the waveforms of the derived-band and broadband responses, which changed from a just discernible lead of the derived-band response in the highest frequency band (center frequency = 5.7 kHz) to a notable lag in the lowest band (0.7 kHz) (Fig. 2, B-E, Table 1). The amplitude of the speech ABR was greatest in the 2.8-kHz band (Fig. 2D, Table 1) and substantially decreased toward lower and higher frequencies. These observations are in qualitative agreement with the frequency dependence of the click ABR wave V. In the following sections, these observations are tested statistically on the basis of the individual data.
Speech ABR latency decreases with increasing derived-band center frequency. For the individual derived-band speech ABRs, the median relative latency of the periodic portion of the response decreased with increasing center frequency (Fig. 3). As a first step, individual latency-frequency functions were fitted to the power-law model adapted from Strelcyk et al. (2009) (Eq. 2) for each participant separately to evaluate the variation in model fits and range of parameter values across participants (Fig. 3B). The median values of the individually fitted parameters were d = 0.46 (range −0.14 to 1.01) and a = −3.93 (range −5.17 to −2.96 ms).
Next, the set of individual relative latencies was submitted to a nonlinear least-squares regression model (see METHODS), to obtain population estimates for the fit parameters. The resulting estimates for d and a were 0.46 (0.06) and −3.83 (0.18) ms, respectively [means (SE); see Fig. 3B, black dashed line]. These estimates were not substantially altered when the nonlinear least-squares regression was repeated with a weighting factor applied to each frequency level, equal to the inverse of the variance at that frequency [a: −3.81 (0.14); d: 0.5 (0.05)]; this suggests that these estimates were not affected by the inhomogeneity of variance observed in the fit residuals (see METHODS).
To assess the goodness of fit of the nonlinear regression model, a parametric bootstrap procedure was performed (see METHODS). This showed that the R-squared value of the regression model fitted to the actual data was greater than that observed in 37% of 1,000 simulated repeat experiments, implying that the observed data are a representative outcome of the model. The results indicate that the latency-frequency functions are well fitted by the power-law function described by Eq. 2. The estimate for a represents a frequency-independent delay (see METHODS); its value is specific to the procedure used in the present study to measure the relative latencies of the derived-band responses and so cannot be meaningfully compared with previous findings for the click ABR wave V. More importantly, however, the estimate for the frequency-dependent parameter d was highly comparable to that of Strelcyk et al. (2009; d = 0.5). This supports the hypothesis that the latency of the periodic portion of the speech ABR follows the same dependence on cochlear response time as the click ABR wave V.
Table 1. Values are relative latency, amplitude, and waveform cross-correlations of grand-average speech auditory brain stem responses recorded from different frequency regions. Cross-correlations were calculated between replicate waveforms (r_Rep) and between derived-band and broadband waveforms (r_BB).
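The logic of the parametric bootstrap for goodness of fit can be sketched as follows. This is not the procedure from METHODS but a minimal illustration under stated assumptions: the model form latency = a + b·f^(−d), Gaussian residuals with a single standard deviation, and entirely synthetic data (12 hypothetical participants, b = 5.0 ms, noise SD = 0.4 ms). The helper names `fit` and `bootstrap_r2` are hypothetical, and 200 simulations are used instead of 1,000 to keep the run short.

```python
import math
import random

D_GRID = [i / 50 for i in range(5, 51)]  # candidate exponents d: 0.10 ... 1.00


def fit(freqs, y):
    """Least-squares fit of y = a + b * f**(-d); returns (a, b, d, r_squared)."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    best = None
    for d in D_GRID:
        x = [f ** (-d) for f in freqs]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
        a = mean_y - b * mean_x
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, a, b, d)
    sse, a, b, d = best
    return a, b, d, 1.0 - sse / sst


def bootstrap_r2(freqs, y, n_sim=200, seed=0):
    """Fraction of simulated data sets whose R-squared falls below the observed one."""
    rng = random.Random(seed)
    a, b, d, r2_obs = fit(freqs, y)
    predicted = [a + b * f ** (-d) for f in freqs]
    # Residual SD, with the 3 fitted parameters (a, b, d) removed from the df.
    sigma = math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, predicted)) / (len(y) - 3))
    below = 0
    for _ in range(n_sim):
        # Simulate a repeat experiment from the fitted model, then refit it.
        y_sim = [pi + rng.gauss(0.0, sigma) for pi in predicted]
        if fit(freqs, y_sim)[3] < r2_obs:
            below += 1
    return r2_obs, below / n_sim


# Hypothetical data: 12 participants x 4 bands, Gaussian noise with SD 0.4 ms.
data_rng = random.Random(1)
freqs = [f for _ in range(12) for f in (0.7, 1.4, 2.8, 5.7)]
y = [-3.83 + 5.0 * f ** (-0.46) + data_rng.gauss(0.0, 0.4) for f in freqs]

r2_obs, fraction_below = bootstrap_r2(freqs, y)
print(f"R^2 = {r2_obs:.3f}; exceeds {100 * fraction_below:.0f}% of simulated fits")
```

An observed R-squared that falls well inside the simulated distribution, rather than in its lower tail, is what would be expected if the data were generated by the fitted model.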
Psychophysical estimates of cochlear filter bandwidths do not predict interindividual variation in derived-band speech ABR latency. In line with previous studies (Glasberg and Moore 1990), psychophysically measured cochlear filter bandwidths broadened with increasing center frequency in all 12 participants. Linear regression fits to the individual data (Fig. 4A) yielded a mean slope of 136.50 (9.94) Hz⁻¹ and a mean intercept of 19.29 (12.01) Hz⁻¹. These values are in close agreement with the reported estimates of Glasberg and Moore (1990). To assess the statistical significance of this frequency dependence, the data were submitted to a linear mixed-effects model (see METHODS). To adjust for the increasing variance with increasing center frequency (Fig. 4B), the bandwidths were log-transformed. The test for fixed effects confirmed a significant effect of frequency [F(3,33) = 282.3, P < 0.0001].
Previous findings for the click ABR wave V have shown that the variance in frequency-specific latencies is at least partly explained by the perceptual auditory filter width measured at the corresponding frequency (Strelcyk et al. 2009). To evaluate whether this was also the case for the speech ABR, correlations were calculated between the relative latencies and perceptual filter widths at each derived-band frequency separately. The corresponding plots are shown together in Fig. 4C. No significant correlation was found between latency and filter width at any of the separately tested frequencies (indicated by the different symbols in Fig. 4C; 0.7 kHz: r = 0.17, P = 0.61; 1.4 kHz: r = 0.30, P = 0.34; 2.8 kHz: r = −0.34, P = 0.28; 5.7 kHz: r = −0.05, P = 0.75; α level for significance after Bonferroni correction = 0.0125). These data do not support the hypothesis that interindividual variation in derived-band speech ABR latency is explained by auditory filter bandwidth.
Speech ABR amplitude is attenuated at lower derived-band center frequencies. As shown in Fig. 5A, the predominant contribution to the speech ABR originated from the 2.8-kHz band. Frequency region was confirmed to have a significant effect on response amplitude [F(3,74) = 20.0, P < 0.0001; one influential outlier removed]. Post hoc analysis revealed that the differences in amplitude between derived bands were all significant (0.7-1.4 kHz: P = 0.0044; 0.7-2.8 kHz: P < 0.001; 1.4-2.8 kHz: P = 0.007; 1.4-5.7 kHz: P = 0.002; 2.8-5.7 kHz: P < 0.001; Fig. 5A), apart from the comparison between the 0.7- and 5.7-kHz bands (P = 0.99). The very low amplitude in the highest frequency band (5.7 kHz) likely resulted from the steep drop-off in stimulus energy above 4 kHz (Fig. 5B, light gray line), which would be expected to cause very little excitation in the 4- to 8-kHz region of the cochlea (Fig. 5B, black line). However, the stimulus contained considerable energy in both lower frequency bands, which might thus be expected to produce a comparable, or greater, cochlear response than the 2.8-kHz band. The smaller derived-band speech ABR amplitudes from the lower frequency bands suggest that, like the click ABR, the speech ABR is biased toward the higher frequency regions of the cochlea.
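Returning to the correlations between latency and filter width reported above, the Bonferroni-corrected significance threshold can be checked with a short worked example. The r and P values are copied from the text; the family-wise α of 0.05 is an assumption, chosen because it reproduces the reported per-test level of 0.0125 across four tests.

```python
# r and P values as reported in the text for each derived-band center frequency (kHz).
reported = {
    0.7: {"r": 0.17, "p": 0.61},
    1.4: {"r": 0.30, "p": 0.34},
    2.8: {"r": -0.34, "p": 0.28},
    5.7: {"r": -0.05, "p": 0.75},
}

family_alpha = 0.05  # assumed family-wise level; implied by 0.0125 x 4 tests
alpha_per_test = family_alpha / len(reported)

# A band counts as significant only if its P value is below the corrected level.
significant = [f for f, stats in reported.items() if stats["p"] < alpha_per_test]
print(f"per-test alpha = {alpha_per_test:.4f}; significant bands: {significant}")
```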

DISCUSSION
The findings of the present study show that both the amplitude and the latency of the periodic portion of the speech ABR are strongly dependent on the frequency region of origin in the cochlea. The latency of the response was found to increase by 3.6 ms as the cochlear place of origin moved from regions of high (5.7 kHz) to low (0.7 kHz) frequency. The amplitude of the response was maximal in the 2.8-kHz frequency region, whereas responses from lower frequency regions were attenuated relative to their representation in the stimulus. These results are in line with previous findings on the click ABR wave V (Burkard and Hecox 1983; Don et al. 1977; Strelcyk et al. 2009). In particular, the variation of latency with frequency was well fitted by the power-law function derived from click ABR wave V data (Strelcyk et al. 2009). The population mean estimate for the parameter d, which determines the shape of the frequency dependence, was highly similar to that obtained for the click ABR wave V (d = 0.46 in this study vs. 0.5 in Strelcyk et al. 2009). The amplitude attenuation toward lower frequency regions was also in line with findings from previous studies focusing on the click ABR wave V, where high-pass noise masking was also used to obtain octave-wide derived-band responses (Burkard and Hecox 1983; Don et al. 1994, 1998).
The frequency dependence of the click ABR wave V is thought to result mainly from the narrowing of the auditory filters from higher to lower frequency regions in the cochlea, which causes an increase in the cochlear response time (Don et al. 1998). The resulting frequency-dependent response delay in the cochlea is preserved at the level of the neural generators of the click ABR wave V, where it is reflected in the peak latency of the response. The increasingly rapid variations in cochlear response time toward lower frequencies are assumed to give rise to phase cancellations (Dau 2003), resulting in a relative attenuation of contributions from these regions. The frequency dependence observed in the present study for the periodic portion of the speech ABR can reasonably be assumed to arise similarly from the narrowing of cochlear filter widths from base to apex. In addition to increasing cochlear response time, narrowing of the auditory filters also results in a reduced ability to follow amplitude modulation, which declines steeply when the modulation frequency exceeds the auditory filter width (Joris and Yin 1992). For the periodic portion of the speech ABR, this would have reduced the modulation depth of the cochlear response to the envelope of the speech stimulus in the 0.7-kHz band, and to a lesser degree in the 1.4-kHz band. It is likely that both reduced modulation depth and phase cancellations contributed to the attenuation of the amplitude of the response in these frequency bands. One way to evaluate the relative contributions of phase cancellation and reduced modulation depth to the low-frequency attenuation could be to reduce the width of the derived bands to half-octaves. When half-octave-wide bands were used to study click ABR wave V, derived-band amplitudes at lower center frequencies were not found to be attenuated (Don and Eggermont 1978). This difference from other studies on the derived-band click ABR wave V (Burkard and Hecox 1983; Don et al. 1994, 1998) may be attributable to the more restricted range of frequencies in the half-octave derived bands, which would have limited the degree of phase cancellation in the responses. If the derived-band speech ABRs showed a similar dependence on derived-band width, this would indicate a contribution of phase cancellations; no such dependence would be expected to arise if the attenuation of the lower frequency derived bands resulted purely from decreased modulation depth.
One aspect of the present findings that did not agree with expectations based on click ABR wave V studies was the nonsignificant relationship between derived-band speech ABR latency and perceptual auditory filter widths. Strelcyk et al. (2009) reported a significant correlation between these measures for the click ABR wave V at a center frequency of 2 kHz when both normally hearing and hearing-impaired listeners were included. No correlation was reported in the normally hearing group alone, but the sample size was low (n = 5). The present study tested this relationship in a larger sample of normally hearing participants and at multiple frequencies but still found no correlation at any frequency tested. This may be due to the limited interindividual variability in auditory filter bandwidth in the absence of a hearing loss when measured using the notched-noise method (Sommers and Humes 1993). It has been proposed that this method may not provide the most accurate estimate of auditory filter bandwidth (Moore and Glasberg 1981; Oxenham and Shera 2003). It also may be the case that the relative latency estimates for the individual speech ABR derived bands were limited by an inadequate signal-to-noise ratio. Future investigations may need to include hearing-impaired participants and use an alternative method to measure auditory filter bandwidth (Shera et al. 2002). Additional presentations of the stimulus in the acquisition of the response also may be required to improve the signal-to-noise ratio of the derived-band speech ABR.
Scalp-recorded brain stem potentials evoked by simple stimuli, such as clicks or tone bursts, are assumed to represent a linear summation of neural activity across frequency channels (Dau 2003; Goldstein and Kiang 1958). This assumption is supported by computational models that have successfully simulated key properties of these responses (Dau 2003; Rønne et al. 2013). These models incorporate physiological models of cochlear processing and have demonstrated the importance of cochlear frequency dispersion in the formation of the summated scalp-recorded responses. This has been further corroborated by experimental manipulations that have elicited enhanced ABR wave V amplitudes by compensating for cochlear response delays (Dau et al. 2000; Don et al. 1994; Elberling and Don 2008). It is reasonable to expect that the formation of the scalp-recorded speech ABR involves a similar summation across frequencies. A consequence of this summation would be that the amplitude and latency of the overall response could be altered by any cochlear changes that affect the relative weighting and/or delay between contributions from different frequency regions in the cochlea. This could arise, for example, from a loss of sensitivity in the higher frequency regions of the cochlea, which are known to be particularly vulnerable to noise-induced and age-related hearing loss, or from broadening of cochlear filters. In addition, selective loss of high-threshold auditory nerve fibers, which can lead to "hidden hearing loss" (Furman et al. 2013; Kujawa and Liberman 2009), could, if frequency specific, lead to changes in both amplitude and latency of the response. Such peripheral factors thus could constitute a potential confound for the use of speech ABRs in diagnosing central temporal processing deficits. This might be expected to be the case when there is any degree of hearing loss but could even arise in the presence of clinically normal hearing thresholds.
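The effect of delay dispersion on a linearly summed response can be illustrated with a toy calculation. The sketch below sums equal-amplitude sinusoidal components, one per frequency band, first with different delays and then with the delays compensated. It is an illustration only, not a model of the ABR generators: the 100-Hz periodicity, the equal weights, and the intermediate delay values are assumptions; only the 3.6-ms overall delay range comes from the text.

```python
import cmath
import math


def summed_amplitude(delays_ms, mod_freq_hz, weights=None):
    """Amplitude of the sum of equal-frequency sinusoids with different delays."""
    if weights is None:
        weights = [1.0] * len(delays_ms)
    # Each component is a phasor whose phase lag is set by its delay.
    total = sum(
        w * cmath.exp(-2j * math.pi * mod_freq_hz * delay / 1000.0)
        for w, delay in zip(weights, delays_ms)
    )
    return abs(total)


# Hypothetical per-band delays spanning the 3.6-ms range reported in the text
# (0.7-, 1.4-, 2.8-, and 5.7-kHz bands); the intermediate values are illustrative.
dispersed_ms = [3.6, 2.0, 0.9, 0.0]
aligned_ms = [0.0, 0.0, 0.0, 0.0]  # delays fully compensated
mod_freq_hz = 100.0                # assumed periodicity of the response

amp_dispersed = summed_amplitude(dispersed_ms, mod_freq_hz)
amp_aligned = summed_amplitude(aligned_ms, mod_freq_hz)
print(f"dispersed: {amp_dispersed:.2f}, aligned: {amp_aligned:.2f}")
```

Changing the `weights` argument, for example to reduce the contribution of the highest band, shifts both the amplitude and the phase of the sum, which is the sense in which peripheral changes could alter the overall response.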
The high-pass masking paradigm used in the present study can provide a useful tool for detecting and/or controlling for these potentially confounding peripheral influences on the speech ABR. The paradigm also could be useful for examining the frequency specificity of variations in speech ABR characteristics that have been observed in certain clinical populations (Anderson et al. 2010b, 2012; Hornickel et al. 2009, 2013; Song et al. 2011). Such information could help to elucidate the underlying mechanisms that link the speech ABR to indexes of speech-in-noise perception and speech- and language-related deficits in these populations. The present study demonstrates that this technique can be successfully applied to obtain frequency-specific speech ABRs. The main drawback of this paradigm is the long recording times required to obtain responses with adequate signal-to-noise ratio. An alternative method for obtaining frequency-specific responses is to use notched-noise masking (Picton et al. 1979; Terkildsen et al. 1975). This method also uses intense noise masking but utilizes a broadband masker with a spectral gap, or notch, covering the frequency range of interest. This paradigm in principle involves a shorter recording time, because the frequency-specific response is obtained from one recording, rather than from the subtraction of two separate recordings, as in the high-pass masking paradigm. However, a problem that arises with notched-noise masking is the phenomenon of upward spread of excitation, whereby at moderate-to-high sound levels, low-frequency sounds will produce substantial excitation in more basal (high frequency) regions of the cochlea. Noise frequencies below the lower edge of the notch would be expected to spread into the spectral gap and might partially mask the frequency region of interest; this could lead to reduced amplitudes in the frequency-specific responses obtained with the use of this method (Wegner and Dau 2002). Nevertheless, it may be useful for future studies to investigate the relative advantages of the notched-noise versus high-pass noise-masking paradigm for recording frequency-specific speech ABRs.
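The subtraction of two recordings that underlies the high-pass masking paradigm can be sketched with toy waveforms. The example assumes ideal masking and purely additive contributions from each cochlear region, and it ignores recording noise; the function names and waveforms are hypothetical. The geometric-mean band center is likewise an assumption, adopted because it reproduces the center frequencies quoted in the text (0.7, 1.4, 2.8, and 5.7 kHz for octave bands between 0.5 and 8 kHz).

```python
import math


def derived_band(rec_higher_cutoff, rec_lower_cutoff):
    """Subtract the recording with the lower masker cutoff from the one with the higher cutoff."""
    if len(rec_higher_cutoff) != len(rec_lower_cutoff):
        raise ValueError("recordings must have the same number of samples")
    return [a - b for a, b in zip(rec_higher_cutoff, rec_lower_cutoff)]


def band_center_khz(low_edge_khz, high_edge_khz):
    """Geometric-mean center of a band (assumed definition of the band centers)."""
    return math.sqrt(low_edge_khz * high_edge_khz)


# Toy responses (arbitrary units) from two cochlear regions.
n = 200
below_4k = [math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
from_4_to_8k = [0.5 * math.sin(2 * math.pi * 7 * i / n) for i in range(n)]

recording_a = [x + y for x, y in zip(below_4k, from_4_to_8k)]  # masker above 8 kHz
recording_b = list(below_4k)                                    # masker above 4 kHz

# The response common to both recordings cancels, leaving the 4- to 8-kHz band.
derived = derived_band(recording_a, recording_b)
print(round(band_center_khz(4.0, 8.0), 1))
```

In real recordings each of the two waveforms carries independent noise, so the subtraction increases the noise in the derived-band response, which is one reason the paradigm requires long recording times.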
In conclusion, ABRs evoked by complex sounds, including speech, provide an important tool to investigate the mechanisms of speech encoding in humans, and thus to identify the neural basis of the widespread problems in understanding speech in challenging conditions. The results of the present study highlight the importance of considering the effect of cochlear processing on the amplitude and latency of the speech ABR and of using methods, such as the high-pass masking technique used in this work, that provide more rigorous means to test mechanistic hypotheses that relate the speech ABR to central temporal encoding.

Fig. 1. Schematic of subtractive masking technique. The x-axis represents the center frequency along the cochlear partition. The gray-shaded area (Masker) represents the part of the cochlea masked by high-pass noise, and the open area (Response) represents the region that is left free to respond to the speech stimulus. In this example, the noise maskers used in recordings A and B were cut off above 8 and 4 kHz, respectively. When recording B is subtracted from recording A, the response to the speech stimulus that is common to both (below 4 kHz in this example) is canceled out, leaving a derived-band response from 4 to 8 kHz.

Fig. 2. Grand-average speech auditory brain stem response (ABR) waveforms. A: broadband response (black) overlaid with the stimulus (gray), which has been shifted forward in time to align the periodic peaks in stimulus and response. Brackets indicate the different regions of the speech ABR. B-E: derived-band speech ABRs (gray) overlaid with broadband response (black) at 0.7 (B), 1.4 (C), 2.8 (D), and 5.7 kHz (E).

Fig. 3. Relative latency of the periodic portion of the speech ABR as a function of derived-band center frequency. A: medians (central marks) and interquartile ranges (box edges) of individual latencies grouped by center frequency. Crosses show outliers (see METHODS); whiskers show the highest and lowest values not considered outliers. B: individual latency-frequency functions. Open circles and gray lines show individual data and regression lines for model fits. The population model fit is shown by the dashed black line.

Fig. 4. Relationship between auditory filter bandwidth and relative latency of the periodic portion of the derived-band speech ABR. A: individual filter bandwidths as a function of frequency (open circles). Dotted lines show the associated linear regression fits. B: medians (central marks) and interquartile ranges (box edges) of individual filter bandwidths grouped by center frequency. Crosses show outliers; whiskers show the highest and lowest values not considered outliers. C: relationship between derived-band latencies and filter bandwidths at 0.7 (triangles), 1.4 (circles), 2.8 (squares), and 5.7 kHz (diamonds). Note that bandwidths were interpolated from the fits in A (see METHODS).

Fig. 5. Amplitude of the periodic portion of the derived-band speech ABR as a function of center frequency. A: medians (central marks) and interquartile ranges (box edges) of individual derived-band amplitudes grouped by center frequency. Crosses show outliers (see METHODS); whiskers show the highest and lowest values not considered outliers. B: cochlear excitation (black line) evoked by the [da] spectrum (light gray line) as a function of frequency. Filled circles show the expected summed excitation in each of the derived bands, delimited by vertical dashed lines.