Revealing reionization with the thermal history of the intergalactic medium: new constraints from the Lyman-$\alpha$ flux power spectrum

We present a new investigation of the thermal history of the intergalactic medium (IGM) during and after reionization using the Lyman-$\alpha$ forest flux power spectrum at $4.0\lesssim z\lesssim5.2$. Using a sample of 15 high-resolution spectra, we measure the flux power down to the smallest scales ever probed at these redshifts ($-1\lesssim \log(k/$km$^{-1}$s)$\lesssim -0.7$). These scales are highly sensitive to both the instantaneous temperature of the IGM and the total energy injected per unit mass during and after reionization. We measure temperatures at the mean density of $T_{0}\sim7000$-8000 K, consistent with no significant temperature evolution for redshifts $4.2\lesssim z\lesssim5.0$. We also present the first observational constraints on the integrated IGM thermal history, finding that the total energy input per unit mass increases from $u_{0}\sim4.6$ ${\rm eV}$ $m_{\rm p}^{-1}$ to 7.3 eV $m_{\rm p}^{-1}$ from $z\sim 6$ to $4.2$ assuming a $\Lambda$-CDM cosmology. We show how these results can be used simultaneously to obtain information on the timing and the sources of the reionization process. Our first proof of concept using simplistic models of instantaneous reionization produces results comparable to and consistent with the recent Planck constraints, favoring models with $z_{\rm rei}\sim 8.5^{+1.1}_{-0.8}$.


INTRODUCTION
The epoch of hydrogen reionization, represents one the most dramatic phases of evolution of the Universe. During this period, the UV radiation from the first luminous sources reionized the neutral hydrogen (and He I) atoms in the diffuse intergalactic medium (IGM), driving the transition from a neutral to an highly ionized Universe. Understanding sources and timing of this transformation can reveal crucial information on the properties of the first objects and the environment in which they were formed. When and how reionization happened therefore remains a primary subject of interest in extragalactic astrophysics (for a review, see Becker et al. 2015a).
The most direct probes of the highly ionized IGM have been obtained from observations of intergalactic Lyman-α (Lyα) absorption along the lines of sight to high-redshift quasars. Measurements of Lyα transmission along some lines of sight suggest that reionization was largely complete by z ∼ 6 (e.g., McGreer et al. 2015). On the other hand, large fluctuations in IGM opacity remain at z 6, suggesting that lingering evidence of reionization may remain in the IGM to somewhat lower redshifts (Fan et al. 2006;Becker et al. 2015b;Bosman et al. 2018;Eilers et al. 2018). While current constraints from cosmic microwave background (CMB) observations are consistent with a rapid reionization at redshift z rei 7.7 ± 0.7 (Planck Collaboration et al. 2018), meaelisa.boera@gmail.com surements of the fraction of neutral hydrogen at high redshift have been also obtained from the presence of Lyα damping wings (Mortlock et al. 2011;Simcoe et al. 2012;Greig et al. 2017;Davies et al. 2018) and from the weakening of Lyα emission lines in z ∼ 6 − 8 galaxies (e.g., Caruana et al. 2014;Schmidt et al. 2016;Sadoun et al. 2017;Mason et al. 2018). The available data seem to generally support a late reionization scenario (with the bulk of reionization happening at z ∼ 6 − 8) but are still consistent with a relatively broad range of reionization histories. The sources responsible for reionization also remain uncertain. Star-forming galaxies have commonly been considered the most likely candidate (e.g., Finkelstein 2016;Bouwens et al. 2016Bouwens et al. , 2015. Scenarios in which active galactic nuclei (AGN) make a substantial contribution, however, continue to be considered (e.g., Giallongo et al. 2015;Madau & Haardt 2015;Parsa et al. 2017;D'Aloisio et al. 2017).
Further insight may be gained from using the IGM thermal history to constrain the reionization process. The temperature of the IGM should increase significantly via photoionization heating during hydrogen reionization and, because its cooling time is long, the low density gas retains some useful memory of when and how it was reionized (Miralda-Escudé & Rees 1994;Abel & Haehnelt 1999;Upton Sanderbeck et al. 2016). At the mean density of the IGM the characteristic signature of reionization is expected to be an increase in temperature of tens of thousands of Kelvin as an ionization front sweeps through (e.g., D'Aloisio et al. 2018), followed by cooling (over ∆z ∼ 1 − 2) towards a thermal asymp-tote set primarily by the balance between photo-heating by the UV background (UVB) and adiabatic cooling due to the expansion of the Universe (e.g., McQuinn et al. 2009). The interplay among these effects is expected to lead to a powerlaw temperature-density (T-ρ) relation for the low density gas (∆ = ρ/ρ 10) of the form where T 0 is the temperature at the mean density and (γ-1) is the slope of the relation (Hui & Gnedin 1997;Puchwein et al. 2014;. Following reionization, the increase in gas pressure due to the boost in temperature smooths out of the gas on small scales (e.g., Gnedin & Hui 1998;Rorai et al. 2013;Kulkarni et al. 2015). The degree of "Jeans smoothing" in the IGM prior to a given redshift is sensitive to timing and the total heat injection during and after reionization. Measurements of both the gas temperature evolution and the Jeans smoothing at redshifts approaching reionization (z 4) can therefore constrain the timing of this process and potentially provide information on the nature of the ionizing sources.
In the last two decades the Lyα forest in quasar spectra has been the main laboratory for the study of the thermal state of the IGM. In combination with cosmological hydrodynamical simulations, previous efforts have used a variety of statistical approaches to measure the IGM temperature-density relation at 1.5 z 5.4 from the shapes of the Lyα absorption lines (e.g., Schaye et al. 2000;Ricotti et al. 2000;McDonald et al. 2001;Theuns et al. 2002;Bolton et al. 2008;Lidz et al. 2010;Becker et al. 2011;Rudie et al. 2012;Bolton et al. 2014;Boera et al. 2014Boera et al. , 2016Hiss et al. 2017;Rorai et al. 2017aRorai et al. , 2018. However, the widths of these features are sensitive to both the instantaneous temperature of the gas (thermal broadening) and Jeans smoothing (which increases the Hubble broadening) due to the heat injection at higher redshifts. In previous works the impact of pressure smoothing has generally either not been included or has been considered a source of systematic error. For example, Viel et al. 2013a andIršič et al. 2017 account for this effect by adding the redshift of reionization as a nuisance parameter for their warm dark matter constraints.
On the other hand, the first direct measurement of the characteristic filtering scale over which the gas is pressure smoothed (λ P ) has recently been obtained from the analysis of the Lyα absorption correlations using close quasar pairs at z ∼ 2 − 3 (Rorai et al. 2017b). This method largely disentangles the impacts of thermal broadening and pressure smoothing; however, the lack of known quasar pairs at higher redshifts prevents it from being used at redshifts closer to hydrogen reionization.
An alternative means of simultaneously constrain temperature and Jeans smoothing is presented by the Lyα flux power spectrum (Puchwein et al. 2014;Nasir et al. 2016, hereafter N16;Walther et al. 2018b). N16 demonstrated using hydrodynamical simulations that the Lyα flux power spectrum exhibits different scale dependences for the temperature and Jeans smoothing. In particular, probing small scales (wavenumber log(k/km −1 s) −1) increases the sen-sitivity to different reionization scenarios (see also Oñorbe et al. 2017 for an independent analysis). Although the onedimensional flux power spectrum statistic has been already explored in several works (e.g., Kim et al. 2004;McDonald et al. 2006;Viel et al. 2013a;Palanque-Delabrouille et al. 2015;Iršič et al. 2017;Yèche et al. 2017), the lack of high resolution, high signal-to-noise (S/N) Lyα forest spectra has so far prevented these small scales from being measured at redshifts approaching reionization (but see Walther et al. 2018a for an analysis at z < 4).
In this paper we present a a new measurement of the Lyα flux power spectrum at z ∼ 4 − 5.2 obtained from a sample of high resolution, high S/N spectra. We extend the measurement to previously unexplored small scales (log(k/km −1 s)= −0.7). By comparing the data to predictions from a suite of hydrodynamical simulations we investigate the IGM temperature evolution and, simultaneously, its integrated thermal history. We then demonstrate how the combined constraints offer new insights on the timing and sources of the hydrogen reionization process.
For this work we have adopted the parametrization of the Jeans smoothing effect described in N16. We characterize the integrated thermal history of the IGM using the cumulative energy per unit mass, u 0 , injected into the gas at the mean cosmic density during and after the reionization process. As we demonstrate, this quantity can be directly used to constrain reionization models. This paper is organized as follows. In Section 2 we introduce the observational sample of high-resolution spectra. An overview of the simulations used to test and interpret the measurements is presented in Section 3. In Section 4 we introduce the power spectrum method, discussing the effect of the most relevant thermal parameters. In Section 5 we present the observational power spectrum results and discuss the strategies applied to take into account and reduce systematic uncertainties. The calibration and analysis of the synthetic power spectrum models are described in Section 6. The Markov Chain Monte Carlo (MCMC) analysis, comparing models with the observational measurements is described in Section 7, where we also present our main results for the IGM temperature at the mean density and the integrated thermal history. As an example of how our thermal constraints can be used to test reionization histories, we apply our results to instantaneous reionization models in Section 8. We summarize our findings and conclude in Section 9. Tests for various systematic effects are described in the appendices.
2. OBSERVATIONAL SPECTRA We obtained high-resolution spectra of a sample of 15 quasars spanning emission redshifts 4.8 z em 5.4. The quasars and their basic properties are listed in Table 1. The spectra for eleven of the objects were obtained with the Keck High Resolution Echelle Spectrometer (HIRES; Vogt et al. 1994) while the remaining four were taken with the Ultraviolet and Visual Echelle Spectrograph (UVES; Dekker et al. 2000) on the Very Large Telescope (VLT).
The spectra were reduced using a custom set of IDL routines that include optimal sky subtraction (Kelson 2003) and extraction techniques (Horne 1986). For each object a single one-dimensional spectrum was extracted simultaneously from all exposures after individually applying telluric absorption corrections and relative flux calibration to the two-dimensional frames. Telluric corrections were modeled based on the ESO SKYCALC Cerro Paranal Advanced Sky Model (Noll et al. 2012;Jones et al. 2013). For the UVES data we found that flux calibration derived from standard stars yielded sufficiently accurate agreement between overlapping. For HIRES, however, this approach produced well-known moderate (∼10%) inter-order flux discrepancies. For all except one of our HIRES quasars, therefore, we used lower-resolution spectra from Keck/ESI, VLT/X-Shooter, or Gemini/GMOS to derive a custom response function for each exposure. The remaining object, J2111−0156, was calibrated using a response function from a standard star. We verified that our final flux power spectra remained essentially unchanged if standard star flux calibration was used for every object. We therefore do not expect this aspect of the reduction to significantly impact our results.
The HIRES objects were observed using a 0. 86 slit, giving a nominal resolution FWHM of ∼6 km s −1 . The UVES spectra were taken with a 1. 0 slit, giving a nominal resolution of ∼7 km s −1 . The telluric models for the UVES data, however, indicated somewhat higher resolution consistent with a typical seeing of 0. 8. Consequently, we adopt a resolution 6 km s −1 for the full data set, which is sufficient to resolve small-scale features in the Lyα forest. We therefore expect that even the smallest scale of the flux power spectrum measured in this work (log(k/km −1 s) = −0.7, or ∆v ∼ 30 km s −1 ) will not be strongly affected by the finite spectroscopic resolution (but see Section 5.6.3). For all the quasars, the echelle orders were redispersed onto a common wavelength scale with a dispersion of 2.5 km s −1 per pixel.
According to the analysis presented in N16 using mock observations with a redshift path ∆z = 4, a continuum-tonoise ratio (C/N) of ∼15 per 3 km s −1 pixel is necessary to break the degeneracy between thermal broadening and pressure smoothing and measure the thermal parameters with a statistical uncertainty of ∼20%. Conservatively, we have chosen our sample imposing this minimum threshold inside the Lyα forest region.
We have fitted the continuum in our spectra using spline fits guided by power-law extrapolations of the continuum redwards of the Lyα emission line. Given the high levels of absorption at z 4 the continuum measurements are necessarily characterized by large uncertainties (∼10-20%). We therefore use these estimations only to derive a rough estimate of the C/N level. In measuring the power spectrum, as described in Section 5.1, we adopt an approach that does not require a priori knowledge of the continuum. The redshift coverage and the median C/N for the Lyα forest region of our sample is reported in Table 1. The majority of the spectra have larger C/N than our cut with a typical value per pixel in the range of 20-30. This high C/N assures that the power spectrum measurement at small scales will not be strongly affected by uncertainties in the noise modeling. Table 1. List of quasars used for this analysis. For each object we report the name (column 1) based on the J2000 coordinates of the object and the emission redshift (column 2). The redshift intervals associated with the Lyα absorption used for this analysis are also reported with the corresponding median C/N level per pixel (column 3, 4 & 5). Finally, the instrument with which the spectrum was taken is listed in column 6. 3. THE SIMULATIONS To test systematics associated with the observed power spectrum and to interpret our observational results, we used synthetic spectra derived from hydrodynamical simulations and processed to closely match the characteristics of the real data. We ran a large set of hydrodynamical simulations that span a range of thermal histories at z > 4. The simulations run following the Sherwood simulations suite (Bolton et al. 2017) which uses a modified version of the parallel smoothed particle hydrodynamics code P-GADGET-3, an updated and extended version of GADGET-2 (Springel 2005). The models adopt the cosmological parameters Ω m = 0.308, Ω Λ = 0.692, h = 0.678, Ω b = 0.0482, σ 8 = 0.829 and n s = 0.961, consistent with the cosmic microwave background constraints of Planck Collaboration et al. (2014). Initial conditions were obtained using transfer functions generated by CAMB (Lewis et al. 2000). Because the vast majority of the absorption systems probed by the Lyα forest at z > 4 corresponds to overdensities ∆ = ρ/ρ 10 our analysis will not be affected by the star formation prescription (Viel et al. 2013b). Therefore, to increase the computational speed, gas particles with temperature T < 10 5 K and overdensity ∆ > 10 3 are converted to collisionless particles .
The bulk of our simulations uses a box size of 10 h −1 cMpc and 2 × 512 3 gas and dark matter particles, corresponding to a gas particle mass of 9.97×10 4 h −1 M . In addition, we use runs with larger box size and different mass resolution to test numerical convergence (see Appendix D).
We note that our simulations are not intended to selfconsistently model reionization. Instead, we employ models with a wide variety of thermal histories so that our ulti-mate constraints on the temperature and integrated heating of the IGM are as general as possible. The gas in our models becomes optically thin at a redshift z OT , after which it is photo-ionized and heated by a uniform ultraviolet background (UVB), which is a scaled version of the background from Haardt & Madau (2012). The thermal history of a given simulation is therefore determined by the choice of z OT and UVB scaling factor.
The photo-heating rates from Haardt & Madau (2012) ( HM 12 i ) for the different species (i=[H I, He I, He II]) have been rescaled proportionally by a constant factor ζ using the relation i = ζ HM 12 i (see Table 2). The combination of z OT and ζ will determine both the instantaneous temperature and the total integrated heating per unit mass at the epoch where the power spectrum is measured. Models with larger z OT and/or ζ will tend to have higher values of u 0 .
A summary of the simulations used in this work is listed in Table 2. For each model we selected the simulation outputs between 4.0 z 5.4 with a redshift step ∆z = 0.1. At each redshift, synthetic spectra of Lyα forest were produced by choosing 5000 "lines of sight" throughout the simulation box. In Section 6.2 we describe how these lines of sight were combined to create realistic mock spectra.
Following N16, the integrated thermal history in our simulations is parametrized using u 0 , the cumulative energy deposited per unit mass into the gas at the mean density. At each redshift u 0 is defined as: whereρ is the mean mass density and n i and i represent, respectively, the number density and the photo-heating rates for the species i=[H I, He I, He II]. As shown in N16 (see their Figure 4), this parameter correlates with the density power spectrum of the cosmic gas in the simulations, with larger u 0 corresponding to a smoother distribution of gas for overdensities ∆ < 10. These are the overdensities at which the Lyα forest is sensitive at z > 4 (e.g., Becker et al. 2011), suggesting that, at these redshifts, u 0 serves as a useful parametrization for the prior IGM thermal history. In Section 6.3.3 we will further consider the redshift range of integration over which u 0 optimally correlates with the flux power spectrum. Examples of the evolution of u 0 in our models are presented in Figure 1 along with the corresponding evolution of the temperature at the mean density, T 0 (for the complete set of models, see Appendix G ). The left panels show how increasing the photo-heating rate in the simulations produces larger values in both the temperature and u 0 . The right-hand panels show models with the same photo-heating rate but different z OT . These converge to the same value of T 0 provided sufficient time has elapsed after the onset of heating (∆z ∼ 1 − 2; e.g., McQuinn & Upton Sanderbeck 2016); however, they remain distinct in terms of u 0 values, reflecting differences in the total integrated thermal history and therefore in the amount of pressure smoothing. 4. THE LYMAN-α FLUX POWER SPECTRUM Both thermal broadening and pressure smoothing tend to reduce the amount of small-scale structure in the forest. Fig-ure 2 shows the effect of thermal broadening (top panel) and pressure smoothing (bottom panel) on simulated Lyα forest spectra at z = 5. While the impact of T 0 and u 0 are visually similar, the scale dependences of these effects makes it possible to break the degeneracy (e.g., N16, Oñorbe et al. 2017).
The top row of Figure 3 demonstrates how the shape of the 1D Lyα flux power spectrum at z = 5 varies for models with different instantaneous temperature (left panel) and integrated thermal histories (right panel). Similar results were shown in N16, but are expanded here to include a broader range of thermal histories. As described in Section 6.3.1, we use post-processing to vary the T 0 for a fixed u 0 (top left) or to impose the same T 0 for models with different u 0 (top right). We also demonstrate the impact of varying γ (bottom left) and the effective optical depth, τ eff (bottom right). As noted by N16, the scale dependence of T 0 and u 0 differ somewhat. While the impact of pure thermal broadening increases continuously towards smaller scales, the effect of changing u 0 peaks near log(k/km −1 s) ∼ −0.9 to −0.8.
Comparing the two panels of the first row of Figure 3, it is clear that in order to distinguish models characterized by an early reionization (large u 0 ) from those with high T 0 values, it is necessary to probe the power spectrum down to log(k/km −1 s) ∼ −0.7. Our effort to measure the power spectrum down to these scales is described in the following section.
5. DATA ANALYSIS In this Section we describe our procedure for measuring the flux power spectrum from the observed spectra. The following strategies have been tested using synthetic spectra for two reasons: first, to detect and quantify systematic effects in the calculation of the power spectrum, and second, to guarantee a fair comparison between simulated models and the observed data.

Rolling mean
We performed the power spectrum measurement on the flux contrast estimator where F is the transmission in the Lyα forest andF is the mean flux. When computing δ F we need to first divide out the intrinsic shape of the quasar spectrum, which can impact the power spectrum at large scales (log(k/km −1 s) −2; e.g., Kim et al. 2004;Viel et al. 2013a;Iršič et al. 2017). However, directly estimating the continuum is difficult at z 4 due to the high levels of absorption in the Lyα forest. We therefore used a rolling mean approach, whereinF is estimated locally by smoothing the observed spectrum using a boxcar average. We used a boxcar window of 40 h −1 cMpc, which was chosen to be short enough to roughly capture relevant features in the quasar continua over the forest (see Appendix A.1 for details). Examples of this approach are presented in Figure 4. Table 2. List of hydrodynamical simulations used in this work. Entries in bold correspond to the Sherwood simulations first introduced in Bolton et al. (2017); the model names used in that work are given in brackets. For each simulation we report the name (column 1), box size (column 2), mass resolution (column 3), the redshift at which the gas becomes optically thin (column 4), and the constant factor used to rescale the photo-heating rates for different thermal histories (column 5). The thermal parameters that describe the T-ρ relation at z = 5 are also listed: the temperature of the gas at the mean density (column 6) and the power-law index γ (column 7). Finally, the cumulative energy per unit mass deposited into the IGM at the mean density by z = 5 is given in column 8 (see text for details). Further details on the simulation methodology are presented in Bolton et al. (2017).

Proximity regions
Regions near to a quasar are subjected to the local influence of its UV radiation field and are therefore expected to show lower Lyα absorption with respect to the cosmic mean. To avoid any proximity effect bias we excluded these regions from the analysis. The UV flux of a bright quasar, is thought to affect regions 10 proper Mpc along its line of sight (e.g., Scott et al. 2000;Worseck & Wisotzki 2006). We conservatively masked 30 proper Mpc bluewards of the quasar redshift Lyα emission line. Moreover, to exclude possible blueshifted Lyβ absorption we also masked a velocity interval corresponding to 10 proper Mpc redwards of the Lyβ emission line. Excluding the proximity regions moderately changes the power (by 5%) only for the highest redshift bin at z = 5, although the correction is always well within the statistical error.

DLAs
We excluded damped Lyman-α (DLA) systems from our spectra. DLAs were identified visually and masked prior to computing the power spectrum. This step changes the power up to ∼5-10% which is within the statistical uncertainties at all scales.

Bad pixels
We masked bad pixels characterized by negative or zero flux errors. We also masked discrete regions affected by sky emission line residuals, which tend to be noisy. These features mainly impact smaller scales than the ones we want to compute (log(k/km −1 s) −0.5), but they may affect the evaluation of the correct noise power (see Section 5.6.2) and therefore need to be removed.

Lyα sections and redshift sub-samples
We compute the flux power spectrum on sections of 20 h −1 cMpc (comoving distance). This scale was chosen to be small enough that we would have enough sub-samples  Examples of the evolution of parameters governing the thermal state of the IGM in our simulations. Left panels: evolution as a function of redshift of the temperature (top) and the cumulative energy per unit mass at the mean density (bottom) for models in which heating begins at the same zOT but the photo-heating rates are changed. Right panels: models with the same photo-heating rates and different zOT.
The values of these quantities at z = 5 for all our simulations are also listed in Table 2. The full suite of thermal histories is plotted in Figure  G12. (N > 30) to evaluate the statistical uncertainty in the flux power via bootstrapping, yet large enough to avoid significant windowing affects (see Appendix A.2). Each of the Lyα sections have been examined by eye to avoid sections containing too many masked pixels. The power spectrum results from the useful sections are then collected and averaged in redshift bins of ∆z = 0.4 centered at z = 4.2, 4.6 and 5.0.

Measuring the power spectrum
For each of the 20 h −1 cMpc forest regions we calculate the power spectrum from the flux contrast δ F defined in Eq. 3. Our spectra are unevenly sampled because they are masked so we use a Lomb-Scargle algorithm (Lomb 1976;Scargle 1982) to compute the raw power of each region (P masked (k)). In all of our calculations we use k-bins logarithmically spaced with ∆logk = 0.1. To obtain the final 2.0 1.5 In all panels we plot a fiducial model with T0 = 7000 K, γ = 1.5, u0 = 4.78 eV m −1 p and τ eff = 1.85 using a black solid line. The four parameters are varied separately as indicated in each panel. Residuals dP k /P k relative to the fiducial value are displayed for each scale. For comparison, the 68% errors relative to the observational power spectrum computed in this work at z = 5 are also shown (shaded green region). Models with higher temperature show decreasing power towards smaller scales with the most prominent effect at scales log(k/km −1 s) > −1 (top left). Changes in the integrated thermal history (top right) produce variations in the pressure smoothing experienced by the gas. This effect has a somewhat different scale dependence than pure thermal broadening. The power spectrum at this redshift is not highly sensitive to variations in γ (bottom left) although decreasing γ tends to increase the power at log(k/km −1 s) −0.8. Differences in the effective optical depth (i.e., mean flux) create changes in the normalization of the power spectrum (bottom right).
power spectrum values, P F (k), for each section we first correct the raw P masked (k) for the effect of masking. Secondly, we subtract from the corrected P data (k) an estimate of the contribution to the power from noise, P N (k). All these steps are described in the following sections.

Masking correction function
The masking procedure described in Sections 5.3 and 5.4, and in particular the masking of sky line residuals impacts the power spectrum due to the application of a complex window function. In order to correct for this we apply a masking correction function, C m (k), to the raw power obtained from each of the 20 h −1 cMpc Lyα forest sections, where P data (k) is the corrected quantity used to infer the final power and P masked (k) is the raw power initially computed from masked spectra. We determine the effect of masking for each of the Lyα forest sections contributing to the analysis using the follow-ing procedure. First, we create hundreds of synthetic spectra with the same characteristics (i.e. size, noise, redshift) of each of the real 20 h −1 cMpc sections with and without the same masking applied. The final correction is then obtained from the average of the ratio between the power of the unmasked (P sim ) and masked (P mask sim ) simulated spectra, Because the impact of masking on the power spectrum in principle depends on its underlying shape, possible systematics may arise from choosing a particular simulation run to compute C m (k). While we use the 20h −1 cMpc run of Table  2 for the final correction, therefore, we quantify these possible uncertainties in Appendix A.5.

Noise subtraction
In principle the noise power can be directly computed from the flux error array output by the data reduction pipeline (e.g.,   Iršič et al. 2017;Walther et al. 2018a). This approach, however, relies on the precision of the pipeline; underestimating or overestimating these uncertainties could significantly impact on the final power, especially at the small scales we are interested in. We therefore estimate the amount of noise for each forest section directly from the raw power spectrum of the data. At the smallest scales (log(k/km −1 s) −0.2 ) the power is dominated by noise fluctuations and, assuming the noise in adjacent wavelength bins is uncorrelated, can be fitted with a constant value. We then assume that P N is constant over all scales and subtract it from the total power obtaining the "noiseless" power spectra. This method is illustrated in Figure 5 and has been tested on synthetic data after adding the observational noise arrays to the simulated lines of sight (see Appendix A.3).

Resolution correction
In this work we forward model the synthetic spectra generated from simulations to match the instrumental resolution and pixel size of the data. For reference, however, we include a version of the observed flux power spectrum that has been corrected for resolution as using the window function adopted in Palanque-Delabrouille et al. (2015), Assuming our nominal resolution R = 2.55 km s −1 (FWHM = 6 km s −1 ) and pixel size dv p = 2.5 km s −1 , the correction for the smallest scale considered in this work (log(k/km −1 s) = −0.7 ) is W 2 R ∼ 0.76. We note that the actual spectral resolution of the data will depend on the seeing of the observation and may be different from the nominal one. A possible error in the power spectrum due to uncertainties in the spectral resolution, even when forward modeling the simulations, must therefore be taken into account. We estimate an error of 10% in the spectral resolution, corresponding to an uncertainty in the power of 5% at log(k/km −1 s) −0.7. This correction is smaller than our statistical error, so we do not expect that uncertainties in the resolution will significantly affect the measurements (see Appendix A.4).

Metals
The flux power spectrum measured directly from the observational spectra contains both the power coming from the Lyα forest and a small contribution from intervening metal lines. These lines tend to show individual components significantly narrower than Lyα (b 15 km s −1 ), which will increase the power on small scales (e.g., Lidz et al. 2010). This Section describes our approach to quantifying and removing the effect of metals on our final power spectrum measurements.
The high level of Lyα absorption at high redshift makes it very challenging to directly identify all metal lines in the for-est. We therefore estimate the metal power spectrum directly from regions of quasar spectra redwards of the Lyα emission line, where only metal absorption systems are present (e.g., McDonald et al. 2005;Palanque-Delabrouille et al. 2013). The metal power measured in this way will not take into account transitions with rest-frame wavelength shorter than the Lyα line. Correlation features like the one observed for Si III (λ 1206) in McDonald et al. (2006), however, will tend to affect the power spectrum on scales larger than the ones considered in this work (log(k/km −1 s) −2.5).
We measured the metal power spectrum from two samples of high-resolution quasar spectra. First, we use a subset of the spectra listed in Table 1 with emission redshift 4.5 z em 5.3. Second, we use a sample of spectra of quasars with emissions redshifts 3.4 z em 4.1 from Boera et al. (2014Boera et al. ( , 2016 (Table 3). The latter sample allows us to measure the metal power spectrum over observed wavelengths similar to those spanned by the Lyα forest at 4.0 z 4.4. While metals redwards of Lyα are not a perfect estimate of those that appear in the forest at higher redshifts, analyzing multiple samples allows us to check for redshift evolution in the metal power spectrum. Figure 6 shows the comparison between our measurements of the metal power spectrum at z ∼ 4.2 (blue dashed line) and at z ∼ 5.4 (red dashed line), obtained using the same data analysis procedure described in the previous sections. The most significant difference in the metal power between these two redshifts is at large scales (log(k/km −1 s) −1.5) where the contribution of metals to the final flux power spectrum measurement is less relevant. Note that our measurement of the metal power spectrum at z ∼ 4.2 is also in good agreement with the one computed from the XQ-100 data sample at the same redshift (Iršič et al. 2017) (blue dotted line) even if the latter is slightly nosier.
Given the weak evolution in redshift of the metals power (already noted in previous works, e.g., Palanque-Delabrouille et al. 2013) we corrected our final power spectrum measurements assuming a metal contribution constant with redshift and equal to the average between our two measured metal power spectra (black solid line in Figure 6). The effect of the metal correction on the final flux power spectrum is shown in Appendix A.6.

The new power spectrum measurements
The main observational results of this work are presented in Figure 7, where we plot the final Lyα flux power spectrum measured from our data. The values are tabulated in Appendix I and reported as a function of scale for the three redshift bins centered at z = 4.2, 4.6 and 5.0. Solid colored lines and data points represents the power spectrum results without the resolution correction described in 5.6.3, while the corresponding dashed lines are the measurements corrected for finite resolution and pixel size. The 1σ errors are estimated from the bootstrap covariance matrix of the data, corrected and regularized following the procedure described in Section 5.9.
Our measurements at scales log(k/km −1 s) −1 are the first ones made at these redshifts (see Walther et al. 2018a Table 3. List of quasars used for the analysis of the metal power spectrum at z ∼ 4.2. For each object we report the name (column 1) based on the J2000 coordinates of the quasar and the emission redshift (column 2). All the spectra have been taken with the UVES spectrograph (see Boera et al. 2014 andMurphy et al. 2018 Table 3 is compared with the metal power measured at z ∼ 5.4 (red dashed line) from the quasars subsample with 4.5 zem 5.3 of Table 1. The average is given by the black solid line. For comparison, the metal power spectrum computed from the XQ-100 data sample at z ∼ 4.2 (Iršič et al. 2017) is also shown (blue dotted line).
for an analysis at z < 4). At larger scales, however, we can compare, with the results derived from high-resolution spectra by Viel et al. (2013a). This comparison is presented in Appendix C.

Covariance matrix
As demonstrated in previous works (e.g., Viel et al. 2013a;Iršič et al. 2017), the covariance matrix obtained via bootstrapping of a limited data set is necessarily noisy. We therefore regularized the observed covariance matrix using the correlation coefficients estimated from the simulated spectra following an approach similar to the one used by Lidz et al. (2006). We first used the simulations to verify the ability of the bootstrapped errors to reproduce the real statisti- cal variance. For this test, using the 40 h −1 cMpc box simulation S40 − 1z15, we created hundreds of samples of simulated lines of sight that closely reproduce the characteristics of the observational data (see Section 6.2 for details) and we compare the variance computed directly from these realizations with the uncertainty obtained from the bootstrapping of only one synthetic sample randomly chosen. We verify that, as already shown by previous studies (e.g., Kim et al. 2004;Iršič et al. 2013;Palanque-Delabrouille et al. 2013), the bootstrapping technique underestimates the cosmic variance by up to ∼25%, with the discrepancy level increasing towards smaller scales. We therefore increased the observational bootstrapped error at all scales by ∼15-25%, where the correction has been computed separately for each redshift. The elements of the final covariance matrix, C ij , are then computed as: with where C data ii are the diagonal elements of the bootstrapped observational covariance matrix, corrected as previously described, and C sim are the elements of the simulated covariance matrix obtained from the multiple realizations of synthetic lines of sight.
For comparison, Figure 8 shows the simulated and observational covariance matrices for the redshift bin at z = 5. As expected, the bootstrapped matrix is noisier but both the matrices show a similar structure, with correlations increasing towards the smallest scales, log(k/km −1 s) −1.5. This similarity gives us confidence that the simulation model used for the covariance matrix regularization is reasonably capturing the data properties. We note that the off-diagonal correlation structure will depend somewhat on the precise shape of the power spectrum and therefore on the thermal parameters characterizing the model. In Appendix B we verify that the particular choice of the simulation S40 − 1z15 for this analysis is not significantly affecting our final constraints. 6. SIMULATIONS ANALYSIS In this Section we describe how we calibrate and analyze synthetic spectra to create power spectrum models that will be used to fit the observational measurements in Section 7.

Constructing mock lines of sight
To ensure the correct comparison between simulations and observational data we need to produce mock lines of sight with the same resolution and redshift coverage as our observed sample. We first resample and smooth the synthetic Lyα spectra produced from the simulations in Table 2 to match the spectral resolution and the pixel size of the real spectra. We then progressively merge multiple synthetic sections randomly selected from the ∆z = 0.1 simulation snapshot closest to the Lyα forest redshift that we want to cover. We choose an arbitrary starting point along each section, taking advantage of the periodicity of the simulation box. We take into account the mild redshift evolution of the mean flux along the line of sight by rescaling the optical depths such that the global effective Lyα optical depth (τ eff = − ln(F )) in the simulation box follows the relation See Appendix E for details. We note that, while we use Equation 10 to calibrate the optical depth evolution within each simulated line of sight, the overall Lyα mean flux of each redshift bin will be treated as a free parameter in our models. When required for testing purposes (see Appendix A), the spectral noise at the same level of the corresponding observational line of sight is added to the synthetic spectra as well as the same pixel masking. Because the power spectrum computed from the real data is corrected for these systematics, the final power spectrum models used for the MCMC fit are computed without noise or pixel masking.

Modeling the power spectrum
To retrieve the final flux power spectrum for each of the models in Table 2 we average the power computed from hun-dreds of mock data samples. Each measurements has been obtained following a similar procedure to the one described in Section 5, with a few necessary expedients: • Lyα sections: As for the real data we compute the flux contrast estimator (Equation 3) using a 40 h −1 cMpc boxcar rolling mean over the reconstructed line of sight. After the rolling mean is applied we re-divide the line of sight into the original 10 h −1 cMpc sections and use these to compute the power spectra. We verified that discontinuities in the flux on the border between individual sections do not substantially affect the rolling mean.
• Mass resolution and box size corrections: Our 10 h −1 cMpc models with 2 × 512 3 gas and dark matter particles represent a necessary compromise in terms of computational resources (Bolton & Becker 2009) and need to be corrected for small errors due to box size and resolution convergence. Therefore, we rescaled our models by factors obtained from reference simulations with larger box size (40 h −1 cMpc box) and higher mass resolution (2 × 768 3 particles) in the convergence tests presented in Appendix D.

Varying model parameters
To be able to fit the power spectrum measurement of Section 5.8 we need a grid of models that cover the parameter space that we want to explore. In each redshift bin we consider four parameters to describe the power spectrum: the thermal parameters T 0 , u 0 and γ and the effective Lyα optical depth, τ eff . While the large set of simulations, listed in Table 2 spans a wide range of thermal histories, by themselves they are not sufficient to evaluate the power spectrum in all the possible combination of thermal parameters. We therefore use the interpolation scheme described below.
6.3.1. Varying T0 and γ In order to separate the impact of thermal broadening and Jeans smoothing in the power spectrum models, we applied a simple post-processing procedure to the simulated spectra. We recompute the optical depths in each of our models over an extended range of power law T-ρ relationships, with T 0 =[3000-15000 K] in steps of 1000 K and γ=[0.7-1.7] in steps of 0.1.
We note that at z 4 the Lyα forest is mainly sensitive to gas close to the mean density (e.g., Becker et al. 2011). For this reason we do not expect to place strong constraints on γ. We nevertheless treat γ as a free parameter in our fitting code. The impact of T 0 and γ on the flux power spectrum are demonstrated in Figure 3. Note that scales log(k/km −1 s) −0.8 seem to be insensitive to variations in γ while considerable changes in this parameter create minor shifts in the power for scales log(k/km −1 s) −0.8.

Varying τ eff
We rescaled the optical depths in our models to span a wide range of τ eff values. At each redshift the reference value has been obtained from Equation 10 while the entire range of optical depths covered by our models is τ eff =[0.6-2.2] in steps of 0.1. The impact of varying τ eff on the power spectrum is shown in the bottom right panel of Figure 3.

Varying u0
By post-processing our simulations to a common set of thermal parameters we can isolate how the power spectrum depends on the integrated heating. N16 demonstrated that the flux power spectrum at z = 5.0, (averaged over scales −1.5 log(k/km −1 s) −0.8) correlates with u 0 . They further argued that the correlation is strongest when u 0 is integrated between z = [12 − 5], reflecting the timescales over which the Jeans smoothing is sensitive to heat injection. Here we re-evaluate this redshift dependence using our more extended suite of models. For each of the redshift bins at which we compute the power spectrum and each of the scales sensitive to u 0 ( −1.4 log(k/km −1 s) −0.8) we empirically determine the "characteristic" redshift range of integration (∆ z u0 ) for which the power is closest to a one-to-one function of u 0 . The method is demonstrated in Figure 9. We first post-process all of the 10 h −1 cMpc simulations of Table 2 to the same values of T 0 , γ and τ eff . We then fit a power law to kP k versus u 0 , where u 0 is integrated using Eq. 2 over a redshift interval ∆z u0 . The preferred interval,∆ z u0 , is the one that minimize the χ 2 for this fit. Table 4. Fiducial redshifts ranges used to compute the u0 parameters for fitting the flux power spectrum at different redshifts. Redshift bins are indicated in column 1 with the corresponding fiducial u0 redshift intervals in column 2. The characteristic∆ z u0 computed for the scales −1.4 log(k/km −1 s) −0.8 are reported in Figure 10 for the different redshift bins. As expected, the sensitivity of the power spectrum to the previous thermal history varies slightly with the redshift at which the power spectrum is measured. The Lyα structures observed at progressively lower redshifts seem to slowly lose sensitivity to earlier epochs; while the power spectrum measured at z = 5.0 still maintains sensitivity up to z 13, at z = 4.2 the forest traces the thermal history of the gas mainly for z 12. Interestingly, we find that the power spectrum at z = 5.0 is less sensitive to heating happening at z 6, even though the power spectra at z = 4.6 and 4.2 retain sensitivity all the way down to their respective redshifts. We generally expect that the gas density distribution will exhibit some delay in responding to changes in gas pressure. Further investigation revealed that a delay did appear for all three redshifts when peculiar velocities were turned off. The delay increased with the redshift at which the power spectrum was measured, with a delay at z = 5.0 that was larger than the one found with peculiar velocities turned on. This suggests that peculiar velocities may play a role by decreasing the delay between heat injection and a change in the power spectrum. Presumably this occurs because, as the gas is heated, redshift distortions created by accelerating the gas precede changes in the density field. This effect may partly explain the lack of a gap at z = 4.2 and 4.6. For now we adopt these relations as empirical, and leave more detailed physical insights to future work.
Because the∆ z u0 values are generally constant among different scales within the same redshift bin, we adopt average values (black dashed vertical lines in Figure 10) as integration bounds in Eq. 2. The fiducial redshift range over which u 0 is integrated is given in Table 4. We note that our u 0 -kP k relationship, while remarkably tight over scales sensitive to u 0 , do exhibit scatter. In the final MCMC analysis therefore, the amount of scatter about the u 0 -kP k fit at each scale has been included as systematic uncertainty. We note that, while we chose our fiducial redshift ranges to maximize the sensitivity of the power spectrum to u 0 , in principle we could constrain this parameter integrated within any reasonable redshift range if properly accounting for the systematic uncertainty in the u 0 versus kP k fit. 7. THERMAL STATE CONSTRAINTS To obtain constraints on the IGM temperature and integrated thermal history from the observational power spectrum measurements obtained in Section 5, we adopted a Bayesian MCMC approach to measure T 0 , u 0 , γ and τ eff for each of the three redshift bins independently. In this Section S10_1z7 S10_1z9 S10_1z15 S10_1z12 S10_1z19 S10_18z7 S10_18z9 S10_18z12 S10_18z15 S10_18z19 S10_055z7 S10_055z9 S10_055z15 S10_055z19 S10_055z12 S10_1z7 S10_1z9 S10_1z15 S10_1z12 S10_1z19 S10_18z7 S10_18z9 S10_18z12 S10_18z15 S10_18z19 S10_055z7 S10_055z9 S10_055z15 S10_055z19 S10_055z12 Characteristic∆ z u 0 computed for −1.4 log(k/km −1 s) −0.8. For each log(k/km −1 s) on the y axis, the colored horizontal line covers the redshift interval that produces the best fit between the power spectrum and the integrated heating. Each panel shows the results for a different redshift bin. While there is a mild evolution with redshift, the∆ z u 0 values are generally constant within the same redshift bin. We use the average∆ z u 0 indicated by the black dashed vertical lines to compute the fiducial u0 parameters at each redshift.
we present the method and the main findings of this analysis.

The MCMC analysis
We constructed a grid of power spectrum models following the post-processing approach given above, where for a given choice of T 0 , γ, and τ eff the dependence of the power spec-trum on u 0 is derived from the fits described in Section 6.3.3. We then perform a multi-linear interpolation among the grid points of the four dimensional parameter space. We implemented the interpolation scheme using a Bayesian MCMC approach. At each redshift, applying flat priors for all variables, we obtain the set of parameters that maximize a Gaussian multivariate likelihood function: where ∆ is the residual vector between the power spectrum values of the data and the model and C is the N × N data covariance matrix (where N is the number of data points). We tested the interpolation scheme by removing one of the models used for the interpolation and using it to generate mock data (see Appendix F). We found that the parameters u 0 and T 0 are recovered accurately. Small biases (within the 68% uncertainties) appear in the recovered values of γ and τ eff due to their intrinsic degeneracy at large scales and the poor sensitivity of the high redshift power spectrum to γ. Fortunately, however, the relatively weak constraints on these parameters do not bias our results for T 0 and u 0 .
To test the reliability of the best fitting values, for each redshift we ran three independent chains of 2 × 10 5 iterations (half of which are discarded as burn-in) from different randomly chosen initial parameters. We verify that all the chains were converged by comparing the betweenchain and within-chain variances for each parameter using the Gelman-Rubin test.

Results
Figures 11, 12 and 13 display the posterior likelihood distributions for the parameters T 0 , u 0 , γ and τ eff at redshifts 4.2, 4.6, and 5.0 respectively. While the inclusion of small scales (log(k/km −1 s) −1.0) in the power spectrum allows relatively tight constraints on both T 0 and u 0 , some de- Table 5. Best fitting values and marginalized 68% confidence intervals for the fits to our power spectrum measurements. The power spectrum redshift (column 1) is reported along with the best fitting values of T0 (column 2), u0 (column 3), γ (column 4) and τ eff (column 5). generacy between these two variables is still noticeable at all redshifts. This is expected since both of these parameters act on intermediate scales in a similar way. Degeneracies between γ and τ eff increase with redshift with slightly weaker constraints on τ eff obtained towards higher redshifts. As expected, γ shows broad bounds at all redshift (with the 1σ contours covering almost the entire parameter space), reaffirming that the Lyα forest at high-redshifts mainly probes gas around the mean density and is not highly sensitive to the slope of the T-ρ relation. Figure 11, 12 and 13 demonstrate that our measurements of u 0 and T 0 are not highly affected by degeneracies with γ and τ eff . The final results of the MCMC analysis are summarized in Table 5. The temperatures are constrained with ∼15% uncertainties at all redshifts, while the error on u 0 varies from ∼18% for the z = 4.2 and z = 4.6 redshift bins up to ∼30% at the highest redshift, in good agreement with the forecast presented by N16. Our results for τ eff are highly consistent with the measurements of Becker et al. (2013) at z = 4.2. We are somewhat higher at z = 4.6, but all together our constraints appear to bridge the evolution of τ eff at z 4 measured by Becker et al. (2013)  In Figure 14 we show the best power spectrum models compared with the measurement. Visually there is good agreement with the data at all redshifts.
The final temperature constraints at the IGM mean density are presented in Figure 15. Our new results (green points) are compared with the previous measurements of Becker et al. 2011 (gray triangles) at z > 3.5 obtained with the curvature method. We have added the systematic uncertainty for Jeans smoothing estimated by Becker et al. to those data. Our temperature measurements show good agreements with this previous work in the overlapping redshift bins. This accord is significant because we analyzed a largely independent set of quasar spectra with a different method and using a new suite of hydrodynamical simulations. Most significantly, we now explicitly fit for u 0 , removing the systematic uncertainty in T 0 related to Jeans smoothing. Overall our T 0 values are consistent with little evolution over 4.2 z 5.0. Given the known trend of increasing temperatures at z 4 (e.g., Becker et al. 2011;Boera et al. 2014;Walther et al. 2018b) our measurement at z = 4.2 may include a contribution from the initial phase of IGM reheating due to the He II reionization (e.g., Worseck et al. 2011;Syphers & Shull 2014). We consider this possibility in the final part of our analysis, when we use the new thermal constraints to evaluate hydrogen reionization scenarios.
Finally, in Figure 16 we present our first constraints on the integrated thermal history of the IGM. Our u 0 measurements, are plotted at the minimum redshift of the fiducial ranges given in Table 5. As expected, u 0 increases from z = 6 to z = 4.2, reflecting ongoing heat injection after reionization. 8. REIONIZATION CONSTRAINTS Our observational constraints on T 0 and u 0 can, in principle, be used to test any reionization model for which a thermal history can be calculated. While we leave for future work the analysis of extended and more realistic reionization scenarios, in this Section we demonstrate the potential of this approach using semi-analytical models of instantaneous reionization.

Modeling instantaneous reionization
We model the thermal history of instantaneous hydrogen reionization using a semi-analytical approach similar to the one adopted in Upton . To obtain T 0 as a function of redshift we solve the equation describing the temperature evolution of a Lagrangian fluid element at the cosmic mean density, i.e., with ∆ = 1 (e.g. where H is the Hubble parameter and n tot is the total number density of particles (electrons and ions). Equation 12 is valid in the approximation that the number of particles remains fixed, describing well the post-reionization gas. The first term on the right side of Equation 12 takes into account the cooling due to adiabatic expansion while the second term gives the adiabatic heating and cooling due to structure formation. The thermal history at the mean density has been shown to depend weakly on this second term (McQuinn & Upton Sanderbeck 2016); we will therefore ignore it in our calculation. The differences among the models will depend instead upon the third term, which encodes photo-heating (the only heating source considered in these calculations) and radiative cooling processes. We can expand this term as: where dQ photo,X dt is the photo-heating rate of ion X, dQ Compton dt is the Compton cooling rate and R i,X is the cooling rate coefficient for the ion X and cooling mechanism i. Because we are modeling the temperature of the gas at the end of hydrogen reionization, in Equation 13 we will consider only the H I (dominant) and He I photo-heating. As for the cooling term, we include Compton and H II recombination in our calculations. As discussed in McQuinn & Upton Sanderbeck (2016), these represent the relevant cooling processes that shape the temperature evolution. We compute these cooling terms using the rate coefficients provided by Hui & Gnedin (1997). The optically thin photo-heating after reionization is modeled as (e.g., Upton Sanderbeck et al.

2016)
where ν X is the frequency associated with the ionization potential of species X, and γ X is the corresponding approximate power-law index of the photo-ionization cross section, for which we assume γ X = 2.8 for H I and γ X = 1.7 for He I. Equation 14 is valid in the approximation of photoionization equilibrium with an ionizing background that has a power-law specific intensity of the form J ν ∝ ν −α bk . In detail, the photo-heating rate will also depend on α A,X , the case A recombination coefficient associated with the transition from X I→ X for species X ∈ [H I, He I]; on the number density of the speciesX ∈ [H, He]; and on the electron number density n e . To compute the total energy deposited into the gas by photo-heating, u 0 , we just need to consider the third term of Equation 12 (and the first term of Equation 13). The equation to solve for u 0 will then be Because the specific internal energy can be expressed as u = 3 2 k B T 1 m , Equation 15 can be solved as whereρ is the mean mass density.

Instantaneous reionization parameters
We parametrize our models using three numbers: the redshift of instantaneous reionization, z rei , the temperature reached by the IGM during hydrogen reionization, T rei , and the spectral index of the post-reionization ionizing background, α bk . Figure 17 presents the effects on the evolution of T 0 (top row) and u 0 (bottom row) of these three parameters. The first column shows models with the same z rei and α bk but different reionization temperatures. Radiative transfer calculations suggest that temperatures during reionization should reach 17,000 K T rei 25,000 K (e.g., Miralda-Escudé & Rees 1994;D'Aloisio et al. 2018); however, we explored T rei down to 10,000 K and up to 30,000 K, where the upper range is similar to the hottest scenario of short and late reionization considered in D' Aloisio et al. 2018. The second column in Figure 17 demonstrates how changing the timing of reionization influences the histories of T 0 and u 0 . We tested models spanning a range of redshifts from z rei = 5.5 up to z rei = 12. Finally, the third column shows the effect of changing the spectral index of the post-reionization ionizing background. The value of α bk can be connected to the intrinsic spectral index of the sources, α s , via the expression α bk ≈ α s − 3(β − 1), where β is the logarithmic slope of the column density distribution of intergalactic hydrogen absorbers (Upton , which is valid at z 3 when the physical mean free path of 1 Ry photons λ M F P cH −1 . The value of β may vary, but for this analysis we adopt β = 1.3 from Songaila & Cowie (2010).
Here we focus on two cases: reionization driven by starforming galaxies with a soft α bk =1.5 (α s ∼ 2.4, within the commonly adopted range between 1 and 3; e.g., Bolton & Haehnelt 2007;Kuhlen & Faucher-Giguère 2012), and models of quasar-driven reionization with α bk = 0.5 (corresponding to α s ∼ 1.4; e.g., Telfer et al. 2002;Shull et al. 2012). We note that radiative transfer calculations have shown that the temperature increase from the passage of an ionization front may not depend strongly on the spectrum of the ionizing sources (D'Aloisio et al. 2018). We therefore constrain T rei independently from α bk .

Method
To be conservative, the constraints on instantaneous reionization models presented in this paper will be obtained using the thermal parameters in the lowest redshift bin (z = 4.2) only as upper limits because they may be affected by the extra heating due to the He II reionization. For each combination of parameters (α bk , z rei , T rei ) we obtain three likelihood values corresponding to the redshifts of the observational constraints: L model [< T z=4.  Figures 11, 12, and 13. The final probability of each model is computed by multiplying the independent likelihood values obtained at each redshift :

Results
Before presenting the final results, we stress that, unlike previous attempts to constrain reionization using the instantaneous temperature alone (e.g., Theuns et al. 2002;Raskutti et al. 2012) the power of our approach relies on the simultaneous use of measurements of both T 0 and u 0 . Figure 18 demonstrates how these measurements separately constrain the likelihood contours for our galaxy-driven reionization models. In the top panel the 68% and 95% probability contours are shown for the temperature constraints only, while in the middle panel they are given for the u 0 constraints only. The models that better fit the T 0 and u 0 data cover two different but intersecting regions in the T rei vs z rei parameter space. Applying both constraints simultaneously therefore, reduces the allowed parameter space considerably (bottom panel of Figure 18). Figure 19 shows the final 68% and 95% two-dimensional probability contours for models of instantaneous reionization driven by softer (α bk = 1.5; green contours) and harder sources (α bk = 0.5; blue contours). For softer sources the favored models are the ones with z rei ∼ 8 and reionization temperature of 20,000 K T rei 25, 000 K. These temperatures are consistent with the values predicted by radiative transfer models (e.g., Miralda-Escudé & Rees 1994;McQuinn 2012;D'Aloisio et al. 2018). For harder sources the thermal data prefer earlier reionizations and lower temperatures (T rei 20, 000 K). These results are driven by the fact that for harder post-reionization ionizing backgrounds the IGM temperature needs more time to cool in order to match the relatively low values observed at z 5. We note that even lower T rei would be needed to fit the observations if we included the contribution of He II photo-heating, which has been conservatively excluded. Radiative transfer calculations may disfavor T rei 17, 000 K, as this would imply reionization front speeds unexpectedly low even for the early stages of reionization (D'Aloisio et al. 2018). Some of the parameter space in Figure 19 preferred by harder sources may therefore be disfavored on physical grounds.
In Figure 17, we also compare our constraints to the empirically calibrated UV background model recently presented by Puchwein et al. 2018. Their predictions for the instantaneous temperature are larger than our measurements, and are inconsistent with our new constraints at around the 2σ level. Furthermore, the cumulative energy input into the IGM at mean density also exceeds our constraint at z = 6, and is again inconsistent at around 2σ. This suggests that there is slightly too much IGM heating IGM at z > 6 in the fiducial Puchwein et al. (2018) model, under the assumption of a Λ-CDM cosmology. This difference will be exacerbated further for models where AGN provide a substantial contribution to the photon budget for reionization.
Finally, we compare our results to constraints on the redshift of instantaneous reionization derived from the most recent Planck measurements of the Thompson scattering optical depth (Planck Collaboration et al. 2018). We marginalized over T rei and α bk (where α bk was allowed to vary from 0.5 to 1.5) to obtain the 1D probability distribution on z rei from the IGM thermal history. Figure 20 compares our marginalized constraints on z rei (green solid line) to those derived for an instantaneous reionization from the Planck baseline optical depth constraint τ e = 0.0544 ± 0.0073 (based on P lank TT,TE,EE+lowE+lensing; Planck Collaboration et al. 2018) (red solid line). The two distributions are broadly consistent and have comparable constraining power, with z rei 8.5 +1.1 −0.8 from the thermal history and z rei 7.7 +0.7 −0.7 from the Planck results. The combined probability distribution (blue dashed line) give z rei 8.1 +0.5 −0.5 . While we emphasize that the instantaneous reionization adopted in this Section is simplistic, the proof of concept presented here demonstrates the potential of our observational constraints to inform reionization scenarios. 9. CONCLUSIONS In this work we have presented the first simultaneous constraints on the instantaneous temperature and integrated thermal history of the IGM at z > 4 and demonstrated how these results can be used to test different scenarios of hydrogen reionization. We have utilized a sample of 15 Keck/HIRES and VLT/UVES high-resolution and high-C/N spectra to obtain new measurements of the Lyα forest flux power spectrum over redshifts 4.0 z 5.2, for the first time pushing the measurement down to the smallest scales currently accessible to high-resolution quasar spectra at these redshifts (log(k/km −1 s) ≤ −0.7). We fit the new flux power spectra to obtain robust constraints on the instantaneous IGM temperature, T 0 , and integrated energy input per unit mass, u 0 , marginalizing over the slope of the T-ρ relation and the effective optical depth, and assuming a Λ-CDM cosmology.
In agreement with previous results from the curvature method (Becker et al. 2011), we find temperatures of T 0 ∼ 7000 − 8000 K and no strong temperature evolution over 4.2 z 5.0. Our first constraints on u 0 show a significant increase from u 0 ∼ 4.5 eV m −1 P at z > 6 to 7.1 eV m −1 10000 12500 15000 17500 20000 22500 25000 27500 30000 T rei [K] 7 8 9 10 11 12 z rei bk = 0.5 bk = 1.5 Figure 19. Constraints on instantaneous reionization models for two choices of the post-reionization ionizing background spectrum. The twodimensional 68% and 95% probability contours in the Trei vs zrei parameter space are reported for softer (α bk = 1.5; green contours) and harder ionizing (α bk = 0.5; blue contours) backgrounds.

ACKNOWLEDGMENTS
We thank Simeon Bird and Anson D'Aloisio for helpful conversations. EB and GDB were supported by the National Science Foundation through grant AST-1615814. JSB acknowledges the support of a Royal Society University Research Fellowship. This work is based on observations made at the W.M. Keck Observatory, which is operated as a scientific partnership between the California Institute of Technology and the University of California; it was made possible by the generous support of the W.M. Keck Foundation. It also includes observations made with ESO Telescopes at the La Silla Paranal Observatory under program ID 092.A-0770. EB thanks Michael Murphy and the Centre of Astrophysics and Supercomputing at Swinburne for granting the access to the Swinburne supercomputer facility during the preparation of this work. The hydrodynamical simulations used in this work were performed with supercomputer time awarded by the Partnership for Advanced Computing in Europe (PRACE) 8th Call. We acknowledge PRACE for awarding us access to the Curie supercomputer, based in France at the Tres Grand Centre de Calcul (TGCC). This work also made use of the DiRAC High Performance Computing System (HPCS) at the University of Cambridge. These are operated on behalf of the STFC DiRAC HPC facility. This equipment is funded by BIS National E-infrastructure capital grant ST/J005673/1 and STFC grants ST/H008586/1, ST/K00333X/1. We thank Volker Springel for making P-GADGET-3 available. APPENDIX A. SYSTEMATIC EFFECTS In this Appendix we review some of the steps of our analysis to check and quantify possible systematic uncertainties arising from the specific strategies adopted.
A.1. Rolling mean As described in Section 5.1 we computed the flux contrast δ F of Equation 3 using a rolling mean along the entire Lyα forest region. The application of this technique on both simulated and observational lines of sight guarantees a fair comparison between models and the real measurement when the continuum level is unknown, but it may introduce possible bias when comparing our power spectrum with previous works in which the power was computed from continuum-normalized spectra. Using the simulations we tested different averaging functions and window sizes in order to minimize the impact of the rolling mean on the power spectrum at the relevant redshifts and to verify the ability of the rolling mean to capture continuum fluctuations. As demonstrated below, we found that a 40 h −1 cMpc boxcar rolling mean is able to recover the power at all relevant scales even in the presence of continuum fluctuations. Figure A1 shows, for our three redshift bins, the comparison between the power spectrum computed from simulated data sets using the rolling mean technique (green dashed line) and using a fixed mean flux (black solid line). Both the synthetic samples of lines of sight used in this test have been created following the procedure described in Section 6. For the rolling mean model, we first imposed on each of the lines of sight a random continuum selected from the real continua fitted for the XQ-100 survey (López et al. 2016). We then run the 40 h −1 cMpc boxcar rolling mean directly on the total Lyα + continuum flux. Differences between the two power spectra are shown in the bottom panel of each plot and compared with the statistical error characterizing our observational sample (green shaded region; see Section 5.9). We note that, at all redshifts, the discrepancies between the two models always lie well within the statistical error, with systematic uncertainties σ roll typically 0.20σ stat . We therefore do not expect our results to be sensitive to this averaging choice.

A.2. Windowing effects
As explained in Section 5.5 we compute the observational power spectrum in 20 h −1 cMpc sections of Lyα forest. Dividing the spectra into many small regions may introduce artificial excess power at intermediate and small scales due to a windowing effect. This effect does not not appear in the simulations because of their periodicity.
In Figure A2 we present the effect on the flux power spectrum of dividing the spectra into smaller sections. At each redshift bin, the power computed from the largest 40 h −1 cMpc simulation box (baseline: solid black line) is compared with the power computed from the same simulation but dividing each of the native synthetic spectra into sections of 10 h −1 cMpc (blue dotdashed line) and 20 h −1 cMpc (green dashed line). Changes in the power are reported as fractions of the baseline power spectrum values in the bottom panel of each plot. While cutting the spectra into 10 h −1 cMpc sections introduce an excess of power at the small scales (log(k/km −1 s) −1.1) of ∼20-25%, the windowing effect effect for the 20 h −1 cMpc sections is less significant, with variations in the power 8% at all scales. We therefore opted for this latter section size in our analysis.

A.3. Noise subtraction
In this Appendix we test the noise power subtraction method described in Section 5.6.2 using synthetic datasets generated from simulations. For each synthesized line of sight we use the error and the flux arrays of one of our observed spectra to fit a linear correlation between the signal and the noise level. Using these correlations we then add noise to the simulated samples in a flux-dependent way. We construct synthetic samples of lines of sight following the procedure described in Section 6.2 using our 20 h −1 cMpc simulation box and compute the power spectrum with and without adding noise. In Figure A3 we show the noisy (red dot-dashed line) and noiseless (baseline; black solid line) power spectra obtained for each redshift bins. We finally applied the noise power subtraction method to the noisy model and compare the corrected power spectrum (green dashed line in Figure A3) to the noiseless baseline. The noiseless power is recovered with errors of 2%. This suggests that this step of the data analysis is not introducing relevant systematic effects in the final results.

A.4. Instrumental resolution
In this Appendix we review the effect of uncertainties in the instrumental resolution when smoothing the synthetic spectra to match the resolution of the observed data. We calibrate the synthetic Lyα forest spectra using different instrumental resolutions, taking our nominal values of FWHM = 6 km s −1 as a baseline. Figure A4 shows the variations in the power spectrum expected for a change of +10% (red-dot-dashed line) and −10% (green dashed line) in the observed spectral resolution. We note that the power changes by 5% at all scales (see bottom panel of each plot).

A.5. Masking correction function
In this Appendix we analyze the systematic uncertainties arising from the choice of a particular simulation in computing the masking correction, C m (k), described in Section 5.6.1. For the fiducial masking correction we adopted the simulation S20 − 1z15 of Table 2. To demonstrate that this particular choice of model (with T 0 ∼ 7500 K and γ ∼ 1.5 at the redshifts of interest) does not relevantly affect the final power spectrum measurements, we compare our fiducial results with the measurements obtained when computing the masking correction from post-processed runs with different values of T 0 and γ. In particular we tested two extreme cases: a colder model, C test1 m (k), with T 0 ∼ 4000 K and γ ∼ 1.5, and an isothermal model, C test2 m (k), with T 0 ∼ 7500 K and γ ∼ 1.0.  Figure A5 shows the power spectrum obtained using different masking correction functions for the redshift bins considered in our analysis. In all the cases, computing the masking correction using models with different thermal parameters affects the small scales only mildly, with variations in the power of 4% for scales log(k/km −1 s) −1.1. We therefore do not expect that uncertainties in the masking correction function will relevantly affect our final constraints.

A.6. Metals correction
The Lyα forest region is affected by narrow metal line contaminants that may increase the flux power at the small scales. To correct for the metal contribution we subtract the metal power evaluated in Section 5.7 from the final power spectrum measurements. The effect of the metal subtraction on the power spectrum measurements is presented in Figure A6 for our three redshift bins. As expected, only the small scales (log(k/km −1 s) −1.0) are significantly affected. The variations are relatively small (<10%) and well within the statitical errors of our final measurements.

B. COVARIANCE MATRIX UNCERTAINTIES
In this Appendix we test how strongly the choice of simulation model for the covariance matrix regularization affects the final constraints on T 0 and u 0 . In principle, off-diagonal coefficients of the covariance matrix will mildly depend on the shape of the power spectrum and therefore on the thermal parameters characterizing the models. Figure B7 shows the results for T 0 and u 0 obtained when using a covariance matrix derived from the fiducial model S40 − 1z15 (Covariance matrix − 1) and from the S40 − 1z9 model (Covariance matrix − 2). Both the nominal results and errors estimates shows only modest ( 3%) changes. C. COMPARISON WITH VIEL ET AL. (2013) In this Appendix we compare our power spectrum measurements to those from Viel et al. (2013a). While Viel et al. (2013a) included somewhat lower resolution spectra from Magellan/MIKE, for consistency we limit our comparison to their measurements obtained with the HIRES spectrograph.
We consider two main differences between our new estimates and the older Viel et al. (2013a) power spectra: flux contrast estimators and cosmic variance. First, Viel et al. (2013a) normalized the spectra using a spline continuum estimate and then computed the mean flux in large sections of data. To test whether this affected the results we recomputed the Viel et al. (2013a) power spectra by applying the procedure described in Section 5 to the Viel et al. data, using the same sections of spectra. Note that given the somewhat larger scales probed by Viel et al. we do not expect any relevant effect due to noise or resolution. We verified that we were able to reproduce the previous results with good precision and no significant bias was introduced by the different flux contrast estimators.
We next consider whether sample size may be playing a role. In each of our redshift bins the number of independent lines of sight (LOS) contributing to our measurement is always more than double the number in Viel et al. (2013a). In particular, for the z = 4.2 bin we used 12 LOS versus 4 LOS in Viel et al. (2013a), for the z = 5.0 bin we used 12 versus 5, while at z = 4.6, where the largest differences between the power spectra are seen, we used 15 quasars versus 5 in the previous work.
To determine whether cosmic variance can explain the discrepancy between our results and those of Viel et al. (2013a) we computed the power spectrum from subsamples of our data. Figure C8  agreement is slightly worse but still largely within the 95% region. We note that the errors in different k bins are correlated, as discussed in Section 5.9. We further note that our sample size is still relatively modest, and our Monte Carlo technique is likely to underestimate the cosmic variance at the 95% level.
D. NUMERICAL CONVERGENCE In this Appendix we examine the convergence of the Lyα flux power spectrum in the simulations used in this work. We used multiple simulations with the thermal history model 1 − z15 in Table 2. All of the synthetic Lyα forest lines of sight were produced using the procedure described in Section 6.1. The flux power spectrum was then computed as in Section 6.2.
The tests are shown in Figure D9, where the convergence with box size for a fixed mass resolution (M gas = 9.97 × 10 4 h −1 M ) is reported in the left column and the convergence with mass resolution for a fixed box size (L = 10 h −1 cMpc) is displayed on the right. The results show that a small correction for both box size and mass resolution needs to be applied to the power spectra derived from our nominal 10 h −1 cMpc simulations. When increasing the box size, the power decrease up to ∼15%, particularly at very small scales (log(k/km −1 s) > −1). In contrast, when increasing the mass resolution the power towards small scales (log(k/km −1 s) −1.4) increase progressively, reaching a correction of ∼15% at z = 5.0 for our nominal mass resolution. Because the corrections are in opposite directions the final scale factor for box and mass resolution convergence is 5% at all scales. We further verified, using lower mass resolution simulations that increasing the box size up to L = 160 h −1 cMpc does not introduce additional power at the scales considered in this work. E. EFFECTIVE OPTICAL DEPTH EVOLUTION As explained in Section 6.1, when constructing mock samples we account for the mild redshift evolution of the mean flux along the line of sight by initially rescaling the effective Lyα optical depth using Eq.10. In this Appendix we show how the choice of this relation for the τ eff evolution, while somewhat arbitrary, represents a reasonable transition between the measurements of For comparison, we also show the constraints on the optical depth obtained from our MCMC chains and reported in Table 5 (green points). We recover values broadly consistent with Eq.10 for the three redshift bins.

F. INTERPOLATION UNCERTAINTIES
In this Appendix we describe the test performed to verify the interpolation scheme implemented in the MCMC analysis. For this we remove one model from the interpolation grid (in the example below we excluded the model S10 − 0.55z9 of Table 2) and test how well the thermal parameters for this model are recovered when it is used to generate artificial data. Figure F11 displays the correct values (red squares) overlaid on the parameters constraints obtained from the MCMC analysis at z = 5.0. The thermal parameters T 0 and u 0 are recovered accurately by the analysis, with discrepancies 5%. The correct values of γ and τ eff fall within the 1σ probability distribution, although the peaks of their posterior distributions are somewhat biased towards lower values. As expected, the power spectrum at these redshifts is not sensitive enough to break the degeneracy  Figure A6. Effect of the metal correction on the Lyα power spectrum measurements for the three redshift bins considered in this work. In each panel the average metal power computed in Section 5.7 (black solid line) is subtracted from the total power (colored solid lines and data points) to obtain the corrected measurement (colored dashed lines). The effect of metals at small scales is more relevant in the lowest redshift bin because the amount of Lyα absorption is lower. Nevertheless, the changes in power due to the metal contribution, reported in the bottom panels as fraction of the total power, is always <10% and well within the statistical errors (green shaded regions).
between γ and τ eff . The poor constraints on γ, however, do not affect our constraints on T 0 and u 0 . We have also tested how well our interpolation scheme was able to recover the thermal parameters of a completely independent model. We fit the power spectrum extracted from a simulation using the UV background model of Puchwein et al. (2018) assuming non-equilibrium ionization, and verified that the values of T 0 and u 0 (see Figure 17) were recovered within the 68% uncertainties given by our MCMC method. This gives some reassurance that, as intended, our results do not significantly depend on the specific thermal histories adopted for the modeling in this work.
G. THERMAL HISTORIES OVERVIEW For illustrative purposes we show in Figure G12 the evolution of the thermal parameters u 0 and T 0 for all the simulations listed in Table 2. While these models are not meant to represent realistic reionization scenarios, they provide a wide range of thermal histories and can be used to explore the thermal state of the IGM in a relatively model-independent way.
H. INTEGRATED HEATING VS REAL SPACE FLUX CUTOFF SCALE In this Appendix we show the relationship between u 0 and the characteristic real space flux power cutoff scale, λ P , as defined by Kulkarni et al. (2015). At each redshift we compute λ P for all the models of Table 2 following the method described in Kulkarni et al. We then fit a relationship between the corresponding u 0 computed over the fiducial redshift range. Figure H13  best fitting relationship between u 0 and λ P for the redshifts considered in this work. While a certain level of scatter about the fit is present at all redshifts, there is clearly a positive correlation between the two variables. Using the current constraints on u 0 we can then attempt to obtain an rough estimate of the λ P (green square with error bars).
We note these estimates for λ P are smaller than the recent constraints at 2 < z < 4 from quasar pairs presented by Rorai et al. (2017a), suggesting that the pressure smoothing scale increases toward lower redshift as the IGM is photo-heated further (cf. the instantaneous temperature measurements presented by Becker Figure D9. Convergence of the flux power spectrum with box size and mass resolution for the redshifts relevant in this work. Power spectra for this test were computed from noise-free mock lines of sight using the procedure described in Section 6.2. The left column shows the convergence with box size at a fixed mass resolution (Mgas = 9.97 × 10 4 h −1 M ), while the right column displays the convergence with respect to the highest mass resolution model (S10−1z15−768 in Table 2) for a fixed box size (L = 10 h −1 cMpc). The change in power is plotted as a scale factor with respect to the reference model in the bottom section of each panel. Both resolution and box size corrections have been applied to our fitting models.
I. LYα FLUX POWER SPECTRUM MEASUREMENTS In Tables 6 through 8 we report the power spectrum measurements obtained in this work for the three redshift bins centered at z = 4.2, 4.6 and 5.0. In each Table the values of the power spectrum obtained with (column 3) and without (column 2) instrumental resolution and pixel size correction (R.C.) are reported for each scale (column 1). The corresponding 68% uncertainties are shown in column 4. The covariance matrices for the power spectrum measurements may be found in the on-line version of this article.    This work z = 5.0 Figure H13. Relationship between the integrated heating per unit mass u0 and the real space flux power cutoff scale λP of Kulkarni et al. (2015). Colored points correspond to the different simulations of Table 2. While some scatter about the fit (black dashed line) is always present, there is a significant positive correlation between the two variables. For reference, along the fit at each redshift we plot our value of u0 with the corresponding value of λP (green squares with 68% errors).