New approach for precise computation of Lyman-alpha forest power spectrum with hydrodynamical simulations

We present a suite of cosmological N-body simulations with cold dark matter and baryons aimed at modeling the low-density regions of the IGM as probed by the Lyman-$\alpha$ forest at high redshift. The simulations are designed to match the requirements imposed by the quality of BOSS and eBOSS data. They are made using either 2x768$^3$ or 2x192$^3$ particles, spanning volumes ranging from (25 Mpc.h$^{-1}$)$^3$ for high-resolution simulations to (100 Mpc.h$^{-1}$)$^3$ for large-volume ones. Using a splicing technique, the resolution is further enhanced to reach the equivalent of simulations with 2x3072$^3$ = 58 billion particles in a (100 Mpc.h$^{-1}$)$^3$ box, i.e. a mean mass per gas particle of 1.2x10$^5$ M$_\odot$.h$^{-1}$. We show that the resulting power spectrum is accurate at the 2% level over the full range from a few Mpc to several tens of Mpc. We explore the effect on the one-dimensional transmitted-flux power spectrum of four cosmological parameters ($n_s, \sigma_8, \Omega_m, H_0$) and two astrophysical parameters ($T_0, \gamma$) related to the heating rate of the IGM. By varying the input parameters around a central model chosen to be in agreement with the latest Planck results, we build a grid of simulations that allows the study of the impact of these six relevant parameters on the flux power spectrum. We improve upon previous studies by not only measuring the effect of each parameter individually, but also probing the impact of the simultaneous variation of each pair of parameters. We thus provide a full second-order expansion, including cross-terms, around our central model. We check the validity of the second-order expansion with independent simulations obtained either with different cosmological parameters or different seeds. Finally, a comparison to the one-dimensional Lyman-$\alpha$ forest power spectrum obtained with BOSS shows excellent agreement.


Introduction
In the intergalactic medium, light is absorbed by neutral hydrogen at the Lyman-α wavelength λ_Lyα ∼ 1216 Å. Combined with cosmological redshifting, this produces an absorption spectrum that is observed toward any background source as a map of the transmitted fraction as a function of redshift [2]. For sources at sufficiently high redshift, where the absorption by intergalactic matter is strong, the continuous nature of the absorption spectrum is easily observable as the Lyman-α forest. Although this spectrum can be seen as a series of merged absorption lines, simulations have shown that it is in reality a map of density fluctuations in the intervening intergalactic medium seen in redshift space, with peaks of absorption at the density peaks of the absorbing gas [3,4]. Moreover, the fluctuations in the Lyman-α forest absorption can be used as a tracer of the varying density of intergalactic gas expected from the growth of structure from primordial fluctuations in the Universe [5]. An intergalactic medium heated exclusively by photo-ionization can be modeled with hydrodynamic simulations [6][7][8][9][10][11], and the physics at play in this model is well understood. However, mechanisms such as radiative transfer effects during hydrogen and helium reionisation [12] or the mechanical effects of galactic winds and quasar outflows can quickly complicate this simple picture, making simulations even more useful.
Small numbers of high-resolution spectra were first used to measure the Lyman-α forest power spectrum: 1 Keck HIRES spectrum [5], 19 spectra from the Herschel telescope on La Palma or the AAT [26], 8 Keck HIRES spectra [18], a set of 30 Keck HIRES and 23 Keck LRIS spectra [22], or a set of 27 high-resolution UVES/VLT QSO spectra at redshifts ∼ 2 to 3 [11,27,28]. The Sloan Digital Sky Survey [29] led to a major breakthrough, providing a much larger sample of 3035 medium-resolution (R = λ/Δλ_FWHM ≈ 2000) quasar spectra for the measurement of the Lyman-α forest power spectrum by McDonald et al. [30]. The large number of observed quasars allowed detailed measurements, with well-characterized errors, of the power spectrum up to larger scales, probing the linear regime and providing cosmological constraints [24,31].
The next step was carried out by the Sloan Digital Sky Survey III [32] through the Baryon Oscillation Spectroscopic Survey (BOSS, Dawson et al. [33]). Quasars at redshift z > 2, which are useful for Lyman-α forest analyses, are specifically targeted, leading to a much higher number of such quasar spectra than in previous surveys (Dawson et al. [33] and references therein). This allows a measurement of the Lyman-α power spectrum in both three-dimensional and one-dimensional redshift space. The 60,000 quasar spectra with Lyman-α forest absorption [34,35] of the Data Release 9 [36] have already permitted the measurement of the BAO peak position and new constraints on the expansion history of the universe [37][38][39] using the three-dimensional power spectrum. A measurement of the one-dimensional power spectrum P_1D, with a significant improvement in precision over previous studies, has also been conducted [1]. Other background sources, such as Lyman-break galaxies, are also being investigated for a dense mapping of the Lyman-α forest [40].
Whereas the measurement of the three-dimensional power spectrum uses only information from the flux correlation of pixel pairs in different quasar spectra, and thus provides information on rather large scales, the one-dimensional power spectrum P_1D, defined by

P_1D(k_∥) = (1/2π) ∫_{k_∥}^{∞} P_3D(k) k dk ,    (1.1)

uses the correlation of pixel pairs in the same quasar spectrum, and thus provides complementary, useful information on the smaller scales that are fundamental to constrain the physical parameters of the Lyman-α forest. The one-dimensional P_1D probes scales at the transition from the linear to the non-linear regime. Therefore, cosmological simulations are required to provide insight into the non-linear physics of the intergalactic medium on the small scales probed by P_1D. Such simulations are then used to constrain various cosmological and astrophysical parameters that have an effect on the power spectrum [11,25,31,[41][42][43][44].
Here, we present a set of 28 cosmological smoothed particle hydrodynamics (SPH) and N-body simulations that reproduce the impact on the one-dimensional matter power spectrum of the values taken by the most relevant cosmological and astrophysical parameters. Only the baryonic particles undergo an SPH treatment, i.e. they receive an additional hydrodynamic acceleration, and their internal entropy per unit mass is evolved as an independent thermodynamic variable. The requirements in terms of box size, resolution and redshift coverage of our simulations were derived from the Data Release 9 quasar catalogue [34,36] of the Baryon Oscillation Spectroscopic Survey [33]. We extrapolate these requirements so that this suite of simulations may also be used for future spectroscopic surveys such as eBOSS 1 (planned for 2014-2018) or DESI 2 [45] (2018-2023). The full suite of simulations will be made available upon request to the authors.
The outline of the paper is as follows. In section 2 we describe our grid and the values chosen for the different parameters we varied. In section 3, we present the simulations pipeline, along with our solutions to issues such as the generation of the initial conditions or the radiative cooling and heating processes that occur in the intergalactic medium (IGM). In section 4 we present tests that were made to determine the required characteristics of our simulations in the light of our goals. We describe, in section 5, the splicing technique we apply in order to obtain simulations with the desired resolution and box size. We demonstrate the validity of our grid approach and present final discussions on this suite of simulations in section 6. Conclusions and perspectives are given in section 7. A recapitulation of all the simulations performed for this study is given in appendix A.

Simulation grid
Ideally, in order to derive confidence intervals on each parameter of a cosmological model with eight to ten free parameters, one would like to compute theoretical predictions for thousands of models, exploring most of the parameter space. Statistical frameworks have been studied to optimize the precision of the model for a reduced number of simulations, such as Latin hypercube sampling [46]. While this method is superior to a random sampling of the parameters, for instance in terms of attained precision [47], it still requires a large number of simulations. Latin hypercube sampling has so far only been tested to predict the power spectrum on large scales, using low-resolution simulations of order 128³ particles for a 450 Mpc.h⁻¹ box [48,49].
When dealing with Lyman-α data, running large numbers of simulations is not possible due to the high execution time of each hydrodynamical simulation. Hence, various approximate methods have been developed in which a restricted number of simulations is used either to calibrate a flux-to-matter power-spectrum bias function or to Taylor-expand the flux power spectrum with respect to cosmological parameters in the vicinity of a best-fit model. For cosmological predictions of the power spectrum in the Lyman-α regime, where hydrodynamical simulations are required, the grid approach presented in [50] is generally adopted (cf. [51] for a recent application). This is the method we have selected for this work.
The second-order Taylor expansion about our best-guess model is given in Eq. 2.1:

P_F(k; α) = P_F(k; α⁰) + Σ_i (∂P_F/∂α_i)(α_i − α_i⁰) + ½ Σ_{i,j} (∂²P_F/∂α_i ∂α_j)(α_i − α_i⁰)(α_j − α_j⁰) .    (2.1)

With n parameters, the total number of simulations required to obtain the Taylor expansion coefficients is 1 + 2n + n(n − 1)/2, where the terms account for, respectively, the central (or best-guess) model, two other values of each parameter to derive the first and second-order derivatives, and the simultaneous variation of each pair of parameters to compute the cross derivatives (cf. figure 1). With this lattice, all derivatives are approximated to second order except the cross derivatives, which are approximated to first order. This approximation is justified by the fact that the parameters are reasonably decoupled, and it allowed us to reduce the CPU time consumption, since second-order cross derivatives would require n(n − 1)/2 additional simulations.

Figure 1. Illustration of the required grid for a second-order Taylor expansion in a two-dimensional parameter space.
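The finite-difference construction of the Taylor coefficients from such a grid can be sketched as follows. This is a schematic re-implementation under our own naming conventions, not the authors' pipeline: first and second derivatives come from central differences on the (+, 0, −) runs, and cross derivatives from the (+, +) corner runs.

```python
import numpy as np

def taylor_predict(P0, P_plus, P_minus, P_cross, dalpha, delta):
    """Second-order Taylor prediction of a power spectrum from a simulation grid.

    P0        : spectrum at the central model (array over k)
    P_plus[i] : spectrum with parameter i shifted by +dalpha[i]
    P_minus[i]: spectrum with parameter i shifted by -dalpha[i]
    P_cross[(i, j)]: spectrum with parameters i and j both shifted by +dalpha
    dalpha    : grid step for each parameter
    delta     : requested offsets from the central model
    """
    n = len(dalpha)
    P = P0.astype(float)
    for i in range(n):
        # Central differences: second-order accurate first and second derivatives.
        d1 = (P_plus[i] - P_minus[i]) / (2.0 * dalpha[i])
        d2 = (P_plus[i] - 2.0 * P0 + P_minus[i]) / dalpha[i] ** 2
        P = P + d1 * delta[i] + 0.5 * d2 * delta[i] ** 2
    for (i, j), Pij in P_cross.items():
        # Forward difference on the corner run: first-order accurate cross term.
        dij = (Pij - P_plus[i] - P_plus[j] + P0) / (dalpha[i] * dalpha[j])
        P = P + dij * delta[i] * delta[j]
    return P
```

For n = 6 parameters, this lattice indeed requires 1 + 2×6 + 6×5/2 = 28 simulations, as quoted later in the text.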

Simulation variable parameters
To model the physics of the Universe, we introduce two categories of parameters that are varied in the simulations: cosmological parameters that describe the cosmological model, in the simplest case of flat ΛCDM with massless neutrinos, and astrophysical parameters that model the astrophysics within the IGM and the relation between the temperature and the density of the gas. A summary of all simulations performed to compute the coefficients of the Taylor expansion is given in appendix A.

Cosmological parameters
This first category contains four parameters: the amplitude of the matter power spectrum σ 8 , the spectral index of primordial density fluctuations n s , the matter density Ω m and the Hubble constant H 0 .
The values for our central model are in agreement with the latest best-fit values from Planck [52], which we recall in table 1.
We chose the range of variation for these parameters so as to include other recent constraints from the Wilkinson Microwave Anisotropy Probe seven-year data [53], the South Pole Telescope data [54] and the SuperNova Legacy Survey three-year data [55,56], thus taking into account the fact that the Planck results for H₀ (respectively Ω_m) are low (respectively high) compared to other measurements. The central value and range for each of the cosmological parameters are given in table 2.

Astrophysical parameters
This second category includes two redshift-dependent parameters that describe the temperature-density relation of the IGM for ρ/⟨ρ⟩ ≤ 10:

T(ρ) = T₀ (ρ/⟨ρ⟩)^(γ−1) ,

where ρ is the baryonic density, ⟨ρ⟩ its mean value, T₀(z) is a normalization temperature and γ(z) a logarithmic slope.
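The power-law relation T = T₀ (ρ/⟨ρ⟩)^(γ−1) is straightforward to evaluate. In the sketch below, γ = 1.3 is the central value quoted in the text, while the default T₀ is a placeholder number, not the grid's central value.

```python
def igm_temperature(delta, T0=14000.0, gamma=1.3):
    """IGM temperature [K] at over-density delta = rho/<rho> (valid for delta <~ 10).

    T0 (normalization temperature) is a placeholder value; gamma = 1.3 is the
    central logarithmic slope adopted in the text.
    """
    return T0 * delta ** (gamma - 1.0)
```

At mean density (delta = 1) the relation returns T₀ by construction, which is what makes T₀ a convenient normalization parameter.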
In the post-processing step, we also have the possibility to modify the effective optical depth τ_eff = −ln⟨F⟩ = −ln⟨e^(−τ)⟩, where F is the transmitted flux and τ is the optical depth. We can thus test the dependence on the mean flux.
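A common way to implement such a post-processing modification (an assumption about the implementation, not a description of the authors' code) is to rescale all optical depths by a constant factor A chosen so that −ln⟨e^(−Aτ)⟩ matches a target τ_eff. Since τ_eff is monotonically increasing in A, a simple bisection suffices:

```python
import numpy as np

def rescale_tau(tau, target_tau_eff, tol=1e-8):
    """Return A*tau, with A chosen so that -ln(<exp(-A*tau)>) = target_tau_eff."""
    def tau_eff(A):
        return -np.log(np.mean(np.exp(-A * tau)))
    lo, hi = 0.0, 1.0
    while tau_eff(hi) < target_tau_eff:   # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:                  # bisect: tau_eff is monotonic in A
        mid = 0.5 * (lo + hi)
        if tau_eff(mid) < target_tau_eff:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * tau
```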
In the absence of a clear consensus on the heating history of the IGM, we took the T(ρ) measurements from Becker et al. [57], assuming γ = 1.3, as our central model, and we chose a wide variation around these values so that other recent measurements [58][59][60] fall into the explored range. The evolution with redshift of γ(z) and T₀(z) in our simulations is therefore designed to reproduce the T(ρ) measurements presented by Becker et al. [57], through an adaptation of the cooling routines in the simulation code. We thus only need to fix those two parameters at a given redshift, in our case z = 3.0, which corresponds to the central redshift of our study. In practice, we do not set T₀(z = 3) and γ(z = 3) directly but instead use two internal code parameters, AMPL and GRAD, that alter the amplitude and density dependence of the photo-ionization heating rates, such that ε̃_i = AMPL × δ^GRAD × ε_i, where the ε_i are the heating rates and δ is the over-density. T₀ and γ are evaluated after the simulations have run, as explained at the end of section 3. Given the one-to-one correspondence between (T₀, γ) and (AMPL, GRAD), we prefer to keep quoting T₀ and γ, since these parameters have a physical meaning and can be compared to other studies.

Grid values
The central values and variation ranges of the parameters of our study are summarized in table 2. With six varying parameters, this represents a total of 28 cosmological simulations in our grid.

Requirements
We base our minimal requirements for the resolution and box size of our simulations on the largest currently-available spectroscopic survey: SDSS-III/BOSS [33]. Those requirements are in part determined by the resolution of the spectrograph and the extension of the Lyman-α forest data. In addition, because the 1D power spectrum results from an integral over the 3D power spectrum up to k = ∞ (cf. Eq. 1.1), the resolution of the simulation has to be of the order of the size of the smallest structures in the transverse direction. For structures in local hydrostatic equilibrium, this would be the Jeans scale, of order a few 100 kpc at z = 3. In an SPH approach, over-dense regions are sampled with much higher spatial resolution than average, so the difficulty does not reside there. Under-dense regions, on the other hand, might not necessarily be in local hydrostatic equilibrium. The decisive solution to ensure that the simulations do resolve the relevant structures is therefore to perform a convergence test. We present the output of such a test in Sec. 4. We describe below a first estimate of the simulation requirements driven by instrumental issues and experimental data.

Table 2. Central values and variation ranges of the cosmological parameters for our simulation grid.
The quasar coadded spectra provided by the SDSS pipeline [61] are computed with a constant pixel width of Δv = 69 km s⁻¹. The largest possible mode is determined by the Nyquist-Shannon limit at k_Nyquist = π/Δv = 4.5 × 10⁻² (km/s)⁻¹. For the other bound, the smallest k-mode is driven by the extension of the Lyman-α forest, which lies between the Lyman-α and Lyman-β emission lines at 1216 Å and 1026 Å respectively. The minimum k-mode one can obtain must take into account that the exploitable Lyman-α forest is smaller than the separation of the two emission peaks, due to their respective widths. In addition, instrumental constraints often make the first and last theoretical modes very difficult to obtain with reasonable precision from data. McDonald et al. [30] used the region 1041 Å < λ_rest < 1185 Å, which corresponds to Δv ≈ 4 × 10⁴ km s⁻¹ and k_min = 1.6 × 10⁻⁴ (km/s)⁻¹. They restricted their analysis however to k_min = 1.4 × 10⁻³ (km/s)⁻¹ and k_max = 1.8 × 10⁻² (km/s)⁻¹. Using more recent BOSS data, Palanque-Delabrouille et al. [1] computed the 1D power spectrum from forest lengths corresponding to a third of the total available range, in order to restrain the redshift span to Δz = 0.2 at most. This led to k_min ∼ 1.0 × 10⁻³ (km/s)⁻¹. We therefore consider simulations that should cover the minimal range 1 × 10⁻³ (km/s)⁻¹ < k < 2 × 10⁻² (km/s)⁻¹, which corresponds approximately to 0.
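The instrumental bounds quoted above follow from simple arithmetic on the pixel width and forest extent:

```python
import numpy as np

# Largest mode: Nyquist-Shannon limit of the SDSS coadd pixelization.
dv_pixel = 69.0                       # pixel width [km/s]
k_nyquist = np.pi / dv_pixel          # [(km/s)^-1], ~4.5e-2 as quoted

# Smallest mode: fundamental mode of the usable forest length.
dv_forest = 4.0e4                     # forest velocity extent [km/s]
k_min = 2.0 * np.pi / dv_forest       # [(km/s)^-1], ~1.6e-4 as quoted
```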
In numerical simulations, the two important parameters are the size of the box L that determines the smallest k-mode, and the ratio N 1/3 /L, where N is number of particles, that drives the largest k-mode. One may note that due to the computational algorithms used nowadays in simulations, such as smooth-particles hydrodynamics (SPH) or adaptive mesh refinement (AMR) in which "resolution follows density", particle spacing in regions of interest will be significantly smaller than L/N 1/3 . However, we will use this approximation as it reflects a worst-case scenario and is thus fairly conservative.
For a simulation evolving N particles in a box of size L, the smallest k-mode is given by k_min = 2π/L, and the Nyquist limit by k_Nyquist = πN^(1/3)/L. Therefore, the theoretical minimum requirement for a simulation to reproduce the BOSS DR9 data precision is a box of about 100 Mpc h⁻¹ with ∼ 100³ particles of each type. These theoretical requirements will be refined with the convergence tests provided in section 4. In particular, we will see that many more particles are needed to achieve the required resolution.

Figure 2. Our simulation pipeline: red circles represent input from the user, blue rectangles are software packages and scripts, and yellow ellipses stand for outputs from these packages.
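The (100 Mpc h⁻¹, ∼100³) minimum can be checked by converting the box modes to velocity units. The conversion factor H(z)/(h(1+z)) below assumes flat ΛCDM with Ω_m = 0.31 and h = 0.675, the central cosmology used for the convergence tests:

```python
import numpy as np

# Conversion from comoving h/Mpc to (km/s)^-1 at z = 3 (flat LambdaCDM assumed).
Om, h, z = 0.31, 0.675, 3.0
H = 100.0 * h * np.sqrt(Om * (1 + z) ** 3 + (1 - Om))  # H(z) [km/s/Mpc]
dv_per_Mpch = H / (h * (1 + z))       # km/s per comoving Mpc/h

L, N = 100.0, 100                     # box size [Mpc/h], particles per side
k_min = (2.0 * np.pi / L) / dv_per_Mpch       # smallest mode [(km/s)^-1]
k_nyq = (np.pi * N / L) / dv_per_Mpch         # Nyquist mode [(km/s)^-1]
```

Both ends of the requested range 1 × 10⁻³ < k < 2 × 10⁻² (km/s)⁻¹ are covered by this configuration.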

Pipeline
All the components of our simulation workflow are represented in figure 2. The first part of the pipeline is the production of the initial-condition snapshot. This is done in the linear approximation with perturbations treated up to second order. The simulations are then performed using both N-body and hydrodynamic (SPH) treatments. The post-processing stage takes the result of the simulations and computes the power spectra that will be compared to data through the Taylor expansion described earlier.
The products of our suite of simulations are obtained at 13 predefined redshifts, equally spaced every Δz = 0.2 from z = 2.2 to 4.6. Our selection of redshifts reflects the possibilities of current and foreseen large-scale spectroscopic surveys. The lower bound results from the UV cut-off of CCDs at λ ∼ 350 nm, which prevents the observation of the Lyman-α forest below z ∼ 2.2. The upper bound results from the quasar luminosity function, which peaks near z ∼ 2 and drops significantly at z > 3. The density of QSOs at z > 4 is of order 2 per square degree to a limiting magnitude g < 23, as expected for the future DESI survey. This is less than an order of magnitude smaller than at z ∼ 2 [62].

CAMB
The Code for Anisotropies in the Microwave Background (CAMB) 3 [63] is a numerical Boltzmann code written in Fortran 90. It is a parallelized line-of-sight integration code developed from CMBFAST [64] and COSMICS [65], which is widely used (and thus tested) to calculate not only the lensed cosmic microwave background temperature and polarization spectra but also linear matter power spectra for different species of particles (in our case baryons and dark matter).
CAMB is here used to compute the transfer functions and linear power spectra that will be used in the next step to compute the initial displacement of particles.

2LPT
All our simulations are tuned to obtain a given σ₈ at z = 0. This is done with the spnorm Python script that rescales the total matter power spectrum P_S^CAMB issued from CAMB before generating the initial conditions, such that

P_S(k, z_i) = (σ₈ / σ₈^CAMB)² × P_S^CAMB(k, z_i) ,

where z_i is the redshift at which the initial conditions are generated, σ₈^CAMB is the value of σ₈ obtained with CAMB for the chosen cosmological model, and σ₈ is the desired amplitude. This allows us to avoid rerunning CAMB with the corresponding initial spectral amplitude A_s. The rescaled power spectra are then used as input to the 2LPTIC 4 code that provides initial conditions based on second-order Lagrangian Perturbation Theory (2LPT), rather than first-order (Zel'dovich approximation). The choice of second-order precision initial conditions is motivated by the discussion in Crocce et al. [66] and the fact that we also run cosmological simulations including neutrinos as a new particle type [67]. Indeed, because of their high velocity, neutrinos require initial conditions taken at rather low redshift in order to reduce Poisson noise [44,68]. Initial conditions for all the grid simulations are run with the same seed.
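Since spnorm itself is only briefly described, a minimal re-implementation of the rescaling step might look as follows, with σ₈ computed from the spectrum by top-hat filtering at R = 8 Mpc/h and simple trapezoidal quadrature:

```python
import numpy as np

def sigma8(k, P):
    """sigma_8 from a linear power spectrum P(k) sampled at k [h/Mpc]:
    sigma_R^2 = (1/2pi^2) int k^2 P(k) W(kR)^2 dk with a top-hat window
    W(x) = 3(sin x - x cos x)/x^3 of radius R = 8 Mpc/h."""
    x = 8.0 * k
    W = 3.0 * (np.sin(x) - x * np.cos(x)) / x ** 3
    f = k ** 2 * P * W ** 2
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(k))  # trapezoid rule
    return np.sqrt(integral / (2.0 * np.pi ** 2))

def rescale_to_sigma8(k, P_camb, sigma8_target):
    """Rescale the CAMB spectrum to the requested sigma_8; the power
    spectrum scales as the square of the fluctuation amplitude."""
    return P_camb * (sigma8_target / sigma8(k, P_camb)) ** 2
```

The rescaling is a single multiplicative constant, which is why it can replace a rerun of CAMB with a different A_s.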

Gadget-3
GADGET-3 (GAlaxies with Dark matter and Gas intEracT) is a massively parallel tree-SPH code for cosmological simulations, originally developed by Volker Springel and collaborators [69,70]. It is written in ANSI C, and uses the standardized message passing interface (MPI) along with several open-source libraries (GSL 5 , FFTW 6 ). Gravitational interactions are computed via a hierarchical multipole expansion using the standard N-body method, and gas-dynamics are followed with smoothed particle hydrodynamics (SPH); collisionless dark matter and gas are both represented by particles. Since its original version (GADGET-1), the code underwent a series of improvements and optimizations over several years (GADGET-2 and 3), to maximize the work-load balance and the efficiency in memory consumption and communication bandwidth. In what follows, we briefly describe the key features of the code.
GADGET-3 follows a collisionless fluid with the standard N-body method, and an ideal gas with smoothed particle hydrodynamics (SPH). The code solves simultaneously for the dynamics of the collisionless component and of the ideal gas, both subject to and coupled by gravity in an expanding background space. The N-body implementation only differs from other cosmological codes by the accuracy of the gravitational field computation. A number of further physical processes have also been implemented in GADGET-3, from radiative cooling/heating physics to non-standard dark matter dynamics, star formation and feedback. In figure 3, we present the evolution of a filament with redshift and in figure 7 we show the image of a snapshot made with splotch 7 . Such realizations can be used for visual confirmation before quantitative analysis as well as for public outreach and education.
Several optimization strategies have been added in GADGET-3. These include a Peano-Hilbert space decomposition, a massively parallel version of the Fast Fourier Transform library, the possibility of splitting the simulation data across several files (to facilitate and speed-up the input/output process), and the fact that the code can be run on an arbitrary number of processors. In its current version, GADGET-3 is highly efficient in memory consumption (it allocates up to 80 bytes per particle) and communication bandwidth, is versatile and flexible, accurate and fast. Another important aspect is the scalability of the code, i.e. its performance when the number of processors is increased, which has currently been tested up to 16,000 cores.

extract
The GADGET-3 snapshots contain various fields, among which the position p and velocity v for both dark matter and gas particles. They also contain fields that are specific to the SPH treatment of gas particles: internal energy per unit mass U, density ρ, electron fraction N_e, hydrogen fraction N_H and smoothing length h. We use these fields to extract two samples:
• a particle sample: we extract a subsample of particles to study the temperature-density relation. For each particle the temperature is derived with the formula

T = (γ − 1) (μ m_H / k_B) U ,    with μ = 4 / (1 + 3X_H + 4X_H N_e) ,

where γ is the adiabatic index (5/3 for a monoatomic gas), m_H is the mass of a hydrogen atom, k_B is the Boltzmann constant, μ is the mean molecular weight and X_H is the hydrogen fraction by mass. Figure 10 illustrates typical temperature-density diagrams obtained from this particle sample.
• a line-of-sight sample: we extract lines of sight (LOS) from the simulation cube, choosing random origins and axes. For each pixel of each LOS, we derive the density ρ, temperature T, peculiar velocity v and optical depth τ, all for H I only, using the SPH interpolation

A(r) = Σ_j (m_j / ρ_j) A_j W(|r − r_j|, h_j) ,

where A is a scalar quantity, r a position in the cube, m_j, ρ_j and h_j the mass, density and smoothing length of particle j, and W a kernel function. The index j runs over all particles. We use the 3D cubic spline kernel

W(r, h) = (8 / πh³) × { 1 − 6q² + 6q³  for 0 ≤ q ≤ 1/2 ;  2(1 − q)³  for 1/2 < q ≤ 1 ;  0  otherwise } ,

where q_j = |r − r_j| / h_j. These LOS are not mock spectra, in the sense that they do not match any properties (such as noise, resolution, metal absorption, . . . ) of observational data. The quantity we are particularly interested in is the optical depth for H I, from which we then compute the mean transmitted flux for each pixel.
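The cubic spline kernel (the standard Monaghan-Lattanzio form used by GADGET) and the generic SPH interpolation can be sketched as below. The function names are ours, and a real extraction code would use a neighbor search rather than summing over all particles:

```python
import numpy as np

def cubic_spline_W(r, h):
    """3D cubic spline kernel W(r, h), normalized so its volume integral is 1."""
    q = np.asarray(r, dtype=float) / h
    w = np.zeros_like(q)
    inner = q <= 0.5
    outer = (q > 0.5) & (q <= 1.0)
    w[inner] = 1.0 - 6.0 * q[inner] ** 2 + 6.0 * q[inner] ** 3
    w[outer] = 2.0 * (1.0 - q[outer]) ** 3
    return 8.0 / (np.pi * h ** 3) * w

def sph_interpolate(r, positions, masses, rhos, A, h):
    """SPH estimate A(r) = sum_j (m_j / rho_j) A_j W(|r - r_j|, h_j).

    Brute-force sum over all particles, for illustration only.
    """
    d = np.linalg.norm(positions - r, axis=1)
    return np.sum(masses / rhos * A * cubic_spline_W(d, h))
```

The kernel normalization (∫ W 4πr² dr = 1 for h = 1) is what guarantees that interpolating a constant field over a well-sampled region returns that constant.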

Post-processing
The post-processing stage allows us to extract two categories of results from our simulations. The first one is the large-box high-resolution power spectrum that is derived by an appropriate combination of the power spectra from 3 lower-resolution or smaller-box simulations, using the splicing technique described in section 5. The second one is a particle sample that is used to derive the parameters T 0 (z) and γ(z) in the IGM. This is performed by estimating the location of the most populated region of the diagram using the mode of the 2D distribution, for particles lying in the region defined by log(δ) ∈ [−0.5, 0.0] and log(T/1 K) < 5.0, with δ = ρ/ ρ . Given the large tail of particles toward the high temperature regions where clusters of galaxies reside, in particular at low redshift, the mode was preferred to the mean since it is not affected by the precise choice of the (δ, T ) bounds used to define the IGM. We estimate the mode by taking bins of 1000 K and computing the barycenter of the five highest bins. A linear fit is then performed using these points.
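A minimal sketch of this mode-based fit, under the prescriptions stated above (particles with log δ between −0.5 and 0 and T below 10⁵ K, 1000 K temperature bins, barycenter of the five highest bins, then a linear fit); the function name and the binning of log δ are our illustrative choices:

```python
import numpy as np

def fit_T0_gamma(log_delta, T, nbins=8):
    """Estimate (T0, gamma) from log T0 + (gamma - 1) log delta = log T,
    using the mode of the temperature distribution in bins of log10(delta)."""
    sel = (log_delta >= -0.5) & (log_delta <= 0.0) & (T < 1e5)
    ld, t = log_delta[sel], T[sel]
    edges = np.linspace(-0.5, 0.0, nbins + 1)
    x, y = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        tt = t[(ld >= lo) & (ld < hi)]
        if len(tt) < 10:
            continue
        # Mode estimate: barycenter of the five highest 1000 K bins.
        counts, tedges = np.histogram(tt, bins=np.arange(0.0, 1e5, 1000.0))
        centers = 0.5 * (tedges[:-1] + tedges[1:])
        top5 = np.argsort(counts)[-5:]
        mode = np.sum(centers[top5] * counts[top5]) / np.sum(counts[top5])
        x.append(0.5 * (lo + hi))
        y.append(np.log10(mode))
    slope, intercept = np.polyfit(x, y, 1)
    return 10.0 ** intercept, 1.0 + slope   # T0 [K], gamma
```

The count-weighted barycenter mirrors the robustness argument in the text: empty or sparsely populated bins get negligible weight, so the estimate is insensitive to the hot tail of the (δ, T) distribution.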

Convergence tests
We describe below the convergence tests we performed to determine the mass resolution (or particle number) and box size needed to meet our requirements. As for the grid, the initial conditions for all simulations are obtained with the same seed. The cosmology used in these tests has (Ω_m, Ω_b h², h, σ₈, n_s) = (0.31, 0.021, 0.675, 0.83, 0.96). We ran two sets of simulations: the first set with simulations having the same box size L of ∼ 20 Mpc/h but a changing particle loading N³, and therefore mass resolution; the second with simulations having the same mass resolution but varying volumes, keeping L/N fixed at ∼ 0.12. These two sets are listed in table 3 and the results at three different redshifts are presented in figures 4(a) and 4(b). Hereafter, we will use the notation (L,N) to represent a simulation with N³ particles of each species (dark matter and baryons) in a box of size L Mpc h⁻¹ on a side. These convergence tests are more stringent than what has been done before, justified by our aim to use our simulation suite for comparison to data of higher quality. For instance, to probe the effect of the box size, [11] compared to a reference simulation with (L, N) = (120, 200) and thus L/N ∼ 0.60, and [71] to a reference simulation (80, 400), i.e. L/N ∼ 0.20. This is to be compared to our L/N of 0.12. As regards the convergence of the mass resolution, we explored a similar range of mass resolutions as [71] (in contrast, [11] restricted themselves to a minimum mass per particle 3 times larger), but using a 20 Mpc h⁻¹ box instead of 10 Mpc h⁻¹.
The most difficult redshifts at which to achieve convergence are those at z > 3, since the mean flux level becomes very small at such epochs and even under-dense regions, which are less well sampled than average in an SPH framework, produce absorption. High-redshift bins, however, are very important since gravitational collapse tends to suppress the differences in the linear-theory power spectra. These bins therefore highlight primordial differences between the matter power spectra resulting from different contributions of the various cosmological constituents. Some convergence problems can also arise at z ∼ 2 due to the fact that strong systems, which are very non-linear, might be simulated inadequately due to cosmic variance or lack of resolution.

Mass resolution
We computed the ratio of the power spectra of each of the simulations listed in the first column of table 3 to the power spectrum of the (120,1024) simulation. The results presented in figure 4(a) show that an excess of power on large scales (small k) and a lack of power on small scales (large k) appear with decreasing resolution. As expected, this effect is stronger at higher redshift where the Lyman-α forest probes low density regions, which are less well resolved in the SPH treatment since it is the mass (and not the spatial) resolution that is kept fixed. Further details about this effect can be found in Bolton and Becker [71].
The dashed curves in figure 4(a) illustrate the level of the 1σ statistical uncertainties observed in the BOSS analysis [1] at each redshift. At z = 4.2, the experimental uncertainties are larger than the maximum ±15% departure displayed on the plot and therefore no longer appear.
Simulations with a mass resolution at least as good as that of the (20,512) simulation all deviate by less than 2.5% from the highest mass-resolution power spectrum over our minimal k-range. This corresponds to a mean mass per particle of M = 2.2 × 10⁵ M_⊙ h⁻¹. Extending to k_max = 0.1 (km/s)⁻¹, the (20,512) simulation deviates by ∼ 10% at the highest redshift.

Box size
The results of figure 4(b) show that the box size has an effect on all scales, and not only on the large scales that approach the Nyquist limit. This is due to the non-linear coupling of modes during gravitational evolution, and to the fact that even on scales close to the box size, mass-fluctuations are not fully linear.
As before, the dashed curves in figure 4(b) illustrate the level of the 1σ statistical data uncertainties at each redshift. To reach k_min = 1.0 × 10⁻³ (km/s)⁻¹, we see that we need a box size of at least 90 Mpc h⁻¹. The most significant constraint comes from the largest scales, which cannot be probed (or not with adequate precision) otherwise.

Summary of convergence requirements
In conclusion, the ideal simulation for our study should use a ∼ 100 Mpc h⁻¹ box and a mass resolution roughly equivalent to that of a (20,614) simulation, which translates into 3072³ particles of each species. The mean mass of a gas particle is then M = 1.2 × 10⁵ M_⊙ h⁻¹.
Although convergence tests are specific to each problem and each statistical property for which convergence is sought, we can briefly compare with the results obtained by other studies. To infer the dark matter power spectrum from the Lyman-α forest in high-resolution QSO absorption spectra covering 0.003 < k < 0.03 (km/s)⁻¹, Viel et al. [11] chose a (60, 400) simulation, i.e. a mass per gas particle of ∼ 4 × 10⁷ M_⊙ h⁻¹. To resolve the high-redshift Lyman-α forest in smoothed particle hydrodynamics simulations, a problem similar to our own, Bolton and Becker [71] found that a box size of at least 40 Mpc h⁻¹ is preferable at all redshifts. They also found that while a mean gas particle mass M_gas ≤ 1.6 × 10⁶ M_⊙ h⁻¹ is required at z = 2, a mass resolution at least 8 times better is needed at z = 5, i.e. M_gas ≤ 2 × 10⁵ M_⊙ h⁻¹. Our requirements are thus more stringent than those adopted in the past, both in terms of box size and mass resolution.
Several tens of such simulations, as needed to compute our grid of cosmological simulations, would require several tens of millions of hours to be run, which is not an acceptable computational time. We address and solve this issue with the splicing technique presented in the next section.

Splicing
In the previous section we have estimated that simulating a flux power spectrum covering the range k = 1 × 10⁻³ (km/s)⁻¹ to k = 2 × 10⁻² (km/s)⁻¹ with a unique simulation, at sufficient precision for every redshift in the range 2.2 < z < 4.6, requires N = 3072³ particles of each species in a box of size L = 100 Mpc h⁻¹. To obtain power spectra of equivalent resolution and box size in a reasonable computational time, we use the technique described in [72]. In this method, the competing demands of large box size and high resolution are met by splicing together the power spectra from pairs of large-box and small-box simulations, using L = 100 Mpc h⁻¹ for the large-scale power and L = 25 Mpc h⁻¹ for the small-scale power, both with N = 768³. One must then correct the large-box simulation for its lack of resolution, and the small-box one for its lack of non-linear coupling between the highest and the lowest k-modes. The corrections are computed using a transition (25, 192) simulation that has the same resolution as a (100, 768) and the same box size as a (25, 768).
One needs to distinguish three regimes when computing the full power spectrum:

• k < k_min,25, where k_min,25 = 2π/25 h Mpc^−1 is the minimum k present in an L = 25 Mpc h^−1 box. The spliced flux power P_F is the flux power P_F,100,768 of the (100,768) simulation, here taken as our reference, corrected for its low resolution by a k-independent factor evaluated at k_min,25:

P_F(k) = P_F,100,768(k) × P_F,25,768(k_min,25) / P_F,25,192(k_min,25).

The possibility of using a constant factor for the largest k-modes has been tested in [72].

• k_min,25 < k < k_Nyq,100/4, where k_Nyq,100 = 768π/100 h Mpc^−1 is the Nyquist wave number of the large box. In this regime we use a similar correcting ratio, but taken at the wave number k at which the flux power is calculated:

P_F(k) = P_F,100,768(k) × P_F,25,768(k) / P_F,25,192(k).

This is mathematically equivalent to considering the high-resolution (25,768) simulation as our reference and correcting it for its small box size.

• k > k_Nyq,100/4. The spliced flux power is the flux power P_F,25,768 of the high-resolution (25,768) simulation, corrected for its small box size by a k-independent factor evaluated at k_Nyq,100/4:

P_F(k) = P_F,25,768(k) × P_F,100,768(k_Nyq,100/4) / P_F,25,192(k_Nyq,100/4).
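The three-regime splicing can be sketched compactly in Python. This is a minimal illustration under our own naming conventions; the three input spectra are assumed to have been interpolated onto a common k grid beforehand:

```python
import numpy as np

def splice_power(k, p_large, p_small, p_trans, k_min_small, k_split):
    """Splice 1D flux power spectra following the three regimes in the text.

    p_large, p_small, p_trans : flux power of the (100,768), (25,768) and
    (25,192) simulations, sampled on the common grid `k` (h/Mpc).
    k_min_small = 2*pi/25 and k_split = k_Nyq,100/4 delimit the regimes.
    """
    def at(p, k0):  # evaluate a tabulated spectrum at a single mode
        return np.interp(k0, k, p)

    p = np.empty_like(p_large)
    low = k < k_min_small
    mid = (k >= k_min_small) & (k < k_split)
    high = k >= k_split

    # regime 1: large box, constant resolution correction at k_min,25
    p[low] = p_large[low] * at(p_small, k_min_small) / at(p_trans, k_min_small)
    # regime 2: large box, k-dependent resolution correction
    p[mid] = p_large[mid] * p_small[mid] / p_trans[mid]
    # regime 3: small box, constant box-size correction at k_Nyq,100/4
    p[high] = p_small[high] * at(p_large, k_split) / at(p_trans, k_split)
    return p
```

By construction, the spliced spectrum is continuous at the two regime boundaries even though the individual correction factors are not.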
The splicing technique is applied at each redshift at which we compute the power spectrum. We illustrate the method and its accuracy in figure 5, using a set of lower-resolution simulations to enhance the contrast between the different power spectra, as well as to allow the comparison to a full-resolution run (labelled "exact" in the figure) with 1024^3 particles of each species in a (100 Mpc/h)^3 box. For this illustration, the large box-size, high-resolution, and transition simulations are (100,256), (25,256), and (25,64), respectively. The resolution correction shows less scale dependence when taken as a box-size correction to the high-resolution power spectrum. Although the correction factors show discontinuities at the boundaries where the simulation chosen as reference changes, the spliced power spectrum is continuous by construction.
For the box size and resolution chosen for our simulation suite, the last regime begins at k = 5.3 × 10 −2 (km/s) −1 for z = 3.0, which is beyond the maximum mode that can be reached with BOSS or eBOSS data. The maximum correction factor, obtained for k = 2.0 × 10 −2 (km/s) −1 , is thus smaller than in the previous illustration. It ranges from 22% at z = 4.6 to 5% at z = 2.2.
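These boundary values follow from the standard conversion from comoving to velocity wave numbers, k_v = k_r (1+z)/H(z). A short numerical check, assuming flat ΛCDM with the Ω_m = 0.31 of our central model (function name ours, for illustration):

```python
import numpy as np

def k_to_velocity_units(k_r, z, omega_m=0.31):
    """Convert a comoving wave number k_r [h/Mpc] to k_v [(km/s)^-1].

    Uses k_v = k_r * (1+z) / H(z), with H(z) in h km/s/Mpc for flat LCDM.
    """
    hz = 100.0 * np.sqrt(omega_m * (1 + z) ** 3 + (1 - omega_m))
    return k_r * (1 + z) / hz

# boundary of the last splicing regime: k_Nyq,100 / 4 for N = 768, L = 100 Mpc/h
k_split = 768 * np.pi / 100 / 4  # ~6.03 h/Mpc
```

At z = 3.0 this gives k_v ≈ 5.3 × 10^−2 (km/s)^−1 for k_split, consistent with the value quoted above.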
We estimate the accuracy of the technique from the splicing residuals, defined as the ratio of the spliced to the exact power spectrum. The splicing residuals show no significant dependence on redshift. The residuals at z = 3.0 are overlaid on the right plot of figure 5. In figure 6, they are plotted for z = 2.2 and z = 4.2, along with the statistical uncertainty at the same redshifts obtained in the most recent BOSS analysis [1]. Over the k-range of interest for BOSS data, the residuals have an average of 0.98 with an rms of 0.01. The largest excess is seen near k = 10^−3 (km/s)^−1; a simulation with a larger box size would be needed to reduce the splicing residuals further. For the purpose of this study, the splicing technique is accurate at the 2% level over the entire k-range of interest.
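The residual statistics quoted here are simply the mean and rms of the spliced-to-exact ratio; as an illustrative helper (our naming):

```python
import numpy as np

def splicing_residuals(p_spliced, p_exact):
    """Splicing residuals: ratio of spliced to exact power spectrum.

    Returns the mean and rms of the residuals over the supplied k-range;
    values close to (1, 0) indicate an accurate splicing.
    """
    r = np.asarray(p_spliced) / np.asarray(p_exact)
    return r.mean(), r.std()
```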

Results and discussions
Several checks were performed to validate our simulations. We first verified that the power spectrum of independent simulations obtained either with different cosmological parameters or different seeds is consistent with the power spectrum derived from the Taylor expansion of Sec. 2. We then present a comparison of our central model with the one-dimensional Lyman-α forest power spectrum obtained with BOSS by [1]. This allows us to quantify the agreement between our simulation and the measured power spectrum. Finally, we discuss some characteristics of our simulations. In particular, we describe the effect on the flux power spectrum of some of the parameters we have varied, and we show the T − ρ diagrams from which we derive the two parameters T 0 (z) and γ(z) that describe the IGM.

Checks with independent simulations
We validate our second-order Taylor expansion by comparing its predictions to the simulated power spectrum for simulations other than those used in the grid. For each simulation, we extract 10 subsamples of 10,000 lines of sight each, with random origins and axes. For each redshift and mode, the power-spectrum value is taken as the mean over the 10,000 lines of sight, and the uncertainty on the mean as the rms of the distribution divided by √10000. We produced a new simulation with the same parameters as our central simulation but using a different random seed to compute the initial conditions. Snapshots of the resulting gas distribution in the two cases are shown in figure 7. The derived power spectra for the two seeds are in excellent agreement on small scales. On the largest scales, the two power spectra can differ by up to 2 to 3σ at all redshifts, indicating a cosmic-variance contribution to the uncertainty on the simulated power spectrum, since the simulation box has the size of the largest modes measured.

Figure 7. Visualisation using splotch of the baryonic gas from a GADGET-3 snapshot taken at z = 2.2 for two simulations run with identical parameters but different random seeds to compute the initial conditions. Both simulations use 2x768^3 particles in a (25 Mpc h^−1)^3 box. Color represents gas temperature (from blue to red) and density is mapped to intensity. Left: the random seed used for the grid; right: a different random seed.
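The estimator described above for the power spectrum and its uncertainty can be sketched as follows (a toy example with stand-in data; names are ours):

```python
import numpy as np

def power_with_error(p_los):
    """Mean 1D flux power and uncertainty from a sample of lines of sight.

    p_los : (n_los, n_k) array, one power-spectrum estimate per sightline.
    The mean over sightlines estimates P(k); the error on the mean is the
    rms of the distribution divided by sqrt(n_los), as used in the text.
    """
    n_los = p_los.shape[0]
    return p_los.mean(axis=0), p_los.std(axis=0, ddof=1) / np.sqrt(n_los)
```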
We also ran additional simulations keeping the same seed but with different cosmological and astrophysical parameters. For each subsample, we compute the power spectra corresponding to the twelve redshift bins in the range z = [2.1−4.5]. We then perform a simple fit of the six parameters (n_s, σ_8, Ω_m, H_0, AMPL, GRAD) using our second-order Taylor expansion as the model.
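The fit model is the second-order Taylor expansion with cross terms; it can be evaluated as below. This is a sketch under our own array conventions; in practice, the first and second derivatives are estimated by finite differences between grid simulations:

```python
import numpy as np

def taylor_power(p0, grad, hess, dtheta):
    """Second-order Taylor expansion of P(k) around the central model.

    p0     : central-model power, shape (n_k,)
    grad   : dP/dtheta_i, shape (n_par, n_k)
    hess   : d2P/dtheta_i dtheta_j (cross terms included),
             shape (n_par, n_par, n_k)
    dtheta : parameter offsets theta - theta_central, shape (n_par,)
    """
    first = np.tensordot(dtheta, grad, axes=1)
    second = 0.5 * np.einsum("i,j,ijk->k", dtheta, dtheta, hess)
    return p0 + first + second
```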
The results are summarized in Tab. 4 and Tab. 5. The last column shows the average of the fitted values over the 10 subsamples. The uncertainty on the fitted value is estimated as the rms of the distribution over the 10 subsamples divided by √10. These four configurations probe different regions of the parameter space. They give results in good agreement with the input cosmological and astrophysical parameters. The level of accuracy achieved in these validation tests is 3 to 5 times better than the errors we expect on these parameters from a fit to data, given the uncertainties of Palanque-Delabrouille et al. [1]. These validations demonstrate the robustness of our second-order Taylor expansion over the entire parameter space covered by our grid of simulations.

Table 5. Comparison of the simulated parameters and the fitted parameters for three different sets of parameters keeping the same seed in the simulations.

Comparison to SDSS-III/BOSS DR9 data
In Palanque-Delabrouille et al. [1], the one-dimensional Ly-α forest power spectrum is measured with 13,821 quasar spectra from SDSS-III/BOSS DR9 selected on the basis of their high quality, large signal-to-noise ratio, and good spectral resolution. The power spectra are measured over twelve redshift bins from z = 2.2 to z = 4.4, and scales from 0.001 (km/s) −1 to 0.02 (km/s) −1 (see figure 8).
In order to compare with the power spectrum obtained for our central model, we normalized the simulation power spectrum at each redshift by constraining the effective optical depth to follow a power-law evolution, τ_eff(z) = 0.0025 × (1 + z)^3.7. To account for the effect of correlated Si III absorption, we correct the simulated power spectrum by a multiplicative term, 1 + a^2 + 2a cos(vk), with a = f_SiIII/(1 − F(z)), following the suggestion of McDonald et al. [30]. The parameter f_SiIII is adjusted and v is fixed at 2271 km/s. We model the imperfect resolution of the BOSS spectra through a multiplicative term. Finally, we allow for imperfections in the noise estimate of the BOSS spectra with eight additive terms (one for each redshift bin). Figure 8 illustrates the good agreement between the data and the simulations. Without any adjustment of the cosmological and astrophysical parameters, the χ^2 per degree of freedom is already better than 1.2. The good agreement between data and simulation covers the whole redshift range, z = [2.1 − 4.5], in contrast with the cosmological analysis described in Palanque-Delabrouille et al. [1], which was performed over the reduced redshift range z = [2.1 − 3.7]. This simple comparison demonstrates the improvement obtained with these simulations over the previous generation of simulations [25].

Figure 8. One-dimensional Ly-α forest power spectrum obtained with BOSS spectra. The dots are the power spectrum measured by Palanque-Delabrouille et al. [1]. The solid line represents the power spectrum for our central model after adjustment of nuisance parameters to account for imperfect modeling of the instrumental parameters in the 1D power spectrum measurement.

Figure 9 illustrates the impact on the power spectrum of our four cosmological parameters. We compare the power spectrum computed from our best-guess model to the one obtained when varying each parameter, one at a time.
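The Si III modulation described above has a simple closed form; a minimal sketch (function name ours; f_SiIII and the mean transmitted flux F(z) come from the fit to data):

```python
import numpy as np

def si3_correction(k_v, mean_flux, f_si3, v_si3=2271.0):
    """Correction for correlated Si III absorption, 1 + a^2 + 2a cos(v k),
    with a = f_SiIII / (1 - F(z)), following McDonald et al.

    k_v is in (km/s)^-1 and v_si3 = 2271 km/s as in the text.
    """
    a = f_si3 / (1.0 - mean_flux)
    return 1.0 + a ** 2 + 2.0 * a * np.cos(v_si3 * k_v)
```

With f_SiIII = 0 the correction reduces to unity, as it should.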
Power spectrum
We note that the dependence on the value of the four parameters is as expected from their physical meaning. We briefly explain the different behaviors below. The spectral index n_s describes the scale dependence of the primordial density fluctuations through P(k) ∝ k^(n_s−1). A larger n_s therefore increases the power at large k, as seen in the top left panel of figure 9.
The parameter σ_8 measures the rms amplitude of the linear matter density fluctuations today in spheres of radius 8 h^−1 Mpc, and thus sets the normalization of the matter power spectrum. To first order, increasing the value of σ_8 therefore increases the power spectrum on all scales, as shown in the top right panel of figure 9. A slightly larger effect, however, is seen on large scales: an excess in the amplitude of the fluctuations favors the merging of small-scale fluctuations, thus enhancing the power on larger scales. This small trend is purely non-linear and is not expected in the evolution with σ_8 of the linear power spectrum.
The present-day Hubble constant H_0 (in units of velocity/distance) sets the conversion from distance space to k-space (in units of inverse velocity). Therefore, if H_0 is increased, a given distance corresponds to a higher k, leading to an increase of power since the power spectrum, which is a decreasing function of k, is shifted to the right. This is indeed what is observed in the lower left panel of figure 9.

Figure 9. Effect of the parameters n_s, σ_8, H_0, and Ω_m on the power spectrum (limited to the k-range of our study) at z = 3.2. P^+(k) and P^−(k) refer to the power spectra extracted from the simulations using the upper and lower limits on each considered parameter, respectively. The fit to the points is a 5th-order polynomial function.
Finally, the parameter Ω_m quantifies the fraction of matter density in a flat Universe. Because Ω_m and the dark-energy density Ω_Λ vary in opposite directions, a higher Ω_m delays the onset of dark-energy domination, thus increasing the time available for structure formation. In addition, in a universe with larger Ω_m, more structures form (in particular small ones that would not otherwise collapse), leading to an increase of the power spectrum, especially at high k. This is in agreement with the lower right panel of figure 9.

Density-temperature relation
In figure 10, we present the T − ρ diagrams obtained from our central simulation at each of the snapshot redshifts. We can distinguish three different populations -the IGM, the stars, and the clusters -with a clear evolution with redshift for each of them.
The IGM is described by the low density and low temperature particles. This is the regime that dominates at high redshift. At later times, however, fewer and fewer particles reside in this part of the T − ρ diagram, since they are captured by collapsing over-densities. We use this region to extract the T 0 (z) and γ(z) parameters, displayed in figure 11, where they are compared to the measurements of Becker et al. [57].
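Extracting T_0(z) and γ(z) amounts to a straight-line fit in log-log space, since the low-density IGM follows T = T_0 (1 + δ)^(γ−1). A minimal sketch (our naming; the published analysis fits mode-based estimates in density bins rather than the plain least squares shown here):

```python
import numpy as np

def fit_t_rho(delta, temp, delta_max=0.0):
    """Fit T = T0 * (1 + delta)^(gamma - 1) to low-density IGM particles.

    A straight-line fit of log10(T) versus log10(1 + delta), restricted to
    under-dense gas (log10(1 + delta) < delta_max), gives gamma - 1 as the
    slope and log10(T0) as the intercept.
    """
    x = np.log10(1.0 + delta)
    y = np.log10(temp)
    sel = x < delta_max
    slope, intercept = np.polyfit(x[sel], y[sel], 1)
    return 10.0 ** intercept, slope + 1.0  # T0, gamma
```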
The particles with higher temperatures correspond to clusters and galactic gas. As expected, their density increases as structures form in the simulation box. They therefore become more prominent at lower redshifts.

Figure 10. Temperature-density diagrams at various redshifts. Color represents the particle density on a logarithmic scale. The black line represents the fitted T − ρ relation from several mode-estimated points. δ is the normalized density ρ/⟨ρ⟩.
In our simulations, dedicated to the study of the IGM through Lyman-α forest measurements, star formation undergoes a simplified treatment, which is reflected in the sharp cut-off at log(δ) ≈ 3: any particle that is sufficiently dense and cool is transformed into a star particle. The latter enters the gravity-force calculation but does not undergo the SPH treatment applied to the baryonic gas.

Conclusions
We have designed and produced a grid of cosmological simulations that is useful to extract constraints on cosmological parameters from Lyman-α surveys, whether current like SDSS-III/BOSS or future like SDSS-IV/eBOSS. These simulations cover the redshift range 2.2 − 4.6. They explore the cosmological parameters n s , σ 8 , H 0 , and Ω m over a large range centered on Planck measurements, as well as the astrophysical parameters T 0 and γ in a range covering most recent results.
Using the splicing technique of McDonald [72], we computed 1D power spectra from simulations equivalent to a 100 Mpc h^−1 box filled with 3072^3 particles of each species (here dark matter and baryonic gas), denoted (100,3072), from lower-resolution (100,768) and smaller box-size (25,768) simulations, combined using a transition (25,192) simulation. We show that the splicing technique allows us to approximate the exact full-resolution large-box simulation with an accuracy at the 2% level. While one full-size high-resolution (100,3072) simulation would have required of order one million CPU hours, one equivalent set of 3 simulations consumes an average of 70,000 hours of CPU time, most of it spent in GADGET. The data volume produced by each set is 1.6 terabytes. The whole grid therefore represents about 2 million hours of CPU time and a volume of 45 terabytes of data.
From the 1D power spectra that we computed at each point of the grid, we derived a second-order Taylor expansion around our best-guess model. It describes the evolution of the 1D power spectrum with changes in either the cosmological or the astrophysical parameters that we studied. We performed several check runs to ensure the quality and validity of our simulation grid, using either different seeds or off-the-grid values of the cosmological and astrophysical parameters. These checks were all consistent with the power spectrum predicted by our second-order Taylor expansion, thus validating it. We compared our central simulation to published data from BOSS and showed that they are already in good agreement without any adjustment of the simulation parameters. In forthcoming work, we will use this Taylor expansion for a quantitative comparison to data in order to extract best-fit cosmological parameters.
These simulations are accompanied by a set of simulations that include massive neutrinos. These required additional developments for an efficient treatment and a proper accounting of the additional particles at all levels of the pipeline (in CAMB, in the setup of the initial conditions for thermal velocities, in GADGET-3, etc.), but are otherwise produced with a pipeline similar to the one presented in this study. The details of the simulations with massive neutrinos can be found in the companion paper [67]. Additional parameters can be included in the same framework. However, because of the cross terms needed for an accurate modeling of the likelihood function, which captures the variation of the power spectrum in all directions of this growing parameter space, adding new parameters will become increasingly expensive in terms of computing time.

A List of all simulations
We summarize in table 6 all the simulations mentioned in this paper. For box size and number of particles, (L, N) refers to a simulation with N^3 particles per species (gas or dark matter, thus 2 × N^3 particles in total) in a box of size L Mpc h^−1 on a side. Standard parameters are (n_s, σ_8, Ω_m, H_0, T_0, γ) = (0.96, 0.83, 0.31, 67.5, 14,000 K, 1.3). Unless parameter names are explicitly listed, values are given for all parameters in the order just mentioned. All parameters are assumed to have their standard value unless specified otherwise. Except for the simulations performed for the convergence tests or to compute the exact power spectrum in the splicing test, all simulations use the splicing technique to combine each set of three simulations into a single one with the box size of the largest box and the mass resolution of the best-resolved one.