The evolution of the low-density HI intergalactic medium from z=3.6 to 0: Data, transmitted flux and HI column density

We present a new, uniform analysis of the HI transmitted flux (F) and HI column density (N(HI)) distribution in the low-density IGM as a function of redshift z for 0<z<3.6 using 55 HST/COS FUV (Delta(z)=7.2 at z<0.5), five HST/STIS+COS NUV (Delta(z)=1.3 at z~1) and 24 VLT/UVES and Keck/HIRES (Delta(z)=11.6 at 1.7<z<3.6) AGN spectra. We performed a consistent, uniform Voigt profile analysis to combine spectra taken with different instruments, to reduce systematics and to remove metal-line contamination. We confirm previously known conclusions on firmer quantitative grounds in particular by improving the measurements at z~1. Two flux statistics at 0<F<1, the mean HI flux and the flux probability distribution function (PDF), show that considerable evolution occurs from z=3.6 to z=1.5, after which it slows down to become effectively stable for z<0.5. However, there are large sightline variations. For the HI column density distribution function (CDDF, f proptional to N(HI)^(-beta)) at log (N(HI)/1cm^-2)=[13.5, 16.0], beta increases as z decreases from beta~1.60 at z~3.4 to beta~1.82 at z~0.1. The CDDF shape at lower redshifts can be reproduced by a small amount of clockwise rotation of a higher-z CDDF with a slightly larger CDDF normalisation. The absorption line number per z (dn/dz) shows a similar evolutionary break at z~1.5 as seen in the flux statistics. High-N(HI) absorbers evolve more rapidly than low-N(HI) absorbers to decrease in number or cross-section with time. The individual dn/dz shows a large scatter at a given z. The scatter increases toward lower z, possibly caused by a stronger clustering at lower z.


INTRODUCTION
The small amount of neutral hydrogen (H i) in the diffuse, warm (∼ 10 4 K), highly ionised intergalactic medium (IGM) produces a rich series of narrow absorption lines blueward of the Lyα emission line in the spectra of AGN, also known as the Lyα forest 1 . Combined with theory and state-of-art cosmological, hydrodynamic simulations, the evolution of the Lyα forest over cosmic time provides some of the most powerful cosmological and astrophysical constraints as 1) hydrogen is the most abundant element and a mostly unbiased basic building block of stars and galaxies, 2) the forest is the largest reservoir of baryons at all epochs, 3) it traces the underlying dark matter in a simple manner, thus outlining the skeleton of the large-scale structure, 4) its thermal state provides clues on the reionisation history, and 5) it contains information on galaxy formation and evolution through the gas infall from the surrounding IGM and galactic feedback (Sargent et al. 1980;Cen et al. 1994;Weymann et al. 1998;Schaye 2001;Lehner et al. 2007;Davé et al. 2010;Shen et al. 2012;Ford et al. 2013;Danforth et al. 2016;Martizzi et al. 2019).
The physics of the Lyα forest is largely determined by a combination of the Hubble expansion, the changes in the ionising UV background radiation field (UVB) and the formation and evolution of the large-scale structure and galaxies. The Hubble expansion cools the gas adiabatically and decreases the gas density and the recombination rate. This process is fairly well-constrained by the cosmological parameters from WMAP and Planck observations (Jarosik et al. 2011;Planck collaboration 2016).
On the other hand, the UVB assumed to originate primarily from AGN and in some degree also from starforming galaxies photoionises and heats the IGM. If the intensity of the UVB decreases, the H i fraction increases. Unfortunately, the UVB and its evolution are less well constrained both theoretically and observationally. The relative contributions from AGN and galaxies are poorly known as a function of redshift, in part since the escape fraction of H i ionising photons and the amount of dust attenuation of galaxies is uncertain and since the AGN spectral energy distribution including both obscured and unobscured AGN is poorly constrained. The process of the photoionisation and recombination of the integrated UV emission through the clumpy, opaque IGM is also complex (Bolton et al. 2005;Faucher-Giguère et al. 2008b;Haardt & Madau 2012;Kollmeier et al. 2014;Puchwein et al. 2019;Faucher-Giguère 2020). At the same time, outflows from star formation and AGN activity change the dynamical, chemical and thermal states of galaxy halos and the surrounding IGM, slowing down the gas infall Steidel et al. 2010;Suresh et al. 2015). In addition, structure evolution is expected to create collisionally-ionised hot gas known as the Warm-Hot Intergalactic Medium (WHIM) with temperature ∼ 10 5−7 K through gravitational shock heating. The WHIM becomes a more dominant phase at z < 1 and could hide a large fraction of missing baryons (Fukugita et al. 1998;Cen & Ostriker 1999;Savage et al. 2014;Haider et al. 2016).
All of these physical processes leave their footprints on the evolutionary properties of the diffuse IGM in the expanding universe through the shape and number of absorption profiles. The H i column density NH i is determined by a combination of the neutral fraction of photoionised hydrogen, the gas density and the UVB, while the absorption line width constrains the temperature and non-thermal turbulent motion of the IGM.
At 1.5 < z < 3.6, the evolution of the Lyα forest is wellestablished observationally from the Voigt profile fitting analysis of high-resolution and high signal-to-noise (S/N) ground-based optical QSO spectra taken with instruments such as the HIRES (HIgh-Resolution Echelle Spectrometer, Vogt (1994Vogt ( , 2002) on Keck I and the UVES (UV-Visible Echelle Spectrograph, Dekker et al. (2000)) on the VLT (Very Large Telescope), as the H i absorption lines at NH i 10 17 cm −2 are usually fully resolved.
At z < 1.5, the H i Lyα can be observed only in the UV region from space due to the atmospheric cutoff at ∼3050Å. Before the installation of COS (Cosmic Origins Spectrograph) onboard HST in 2009, the low sensitivity of available UV spectrographs such as HST/STIS (Space Telescope Imaging Spectrograph) had seriously limited the sample size and data quality, hindering a consistent analysis of the IGM combined at z > 1.5 from optical data and at z < 1.5 from UV data (Weymann et al. 1998;Janknecht et al. 2006;Lehner et al. 2007). With its factor of ∼10 higher throughput than STIS, COS has opened a new era for the low-z IGM study from a unprecedented large number of goodquality AGN spectra (Danforth et al. 2016). Although the COS G130M/G160M grating has a factor of 3 lower resolution (∼ 19 km s −1 ) than the UVES/HIRES resolution, most low-z H i lines are resolved at the COS resolution ( Fig. 1) and line blending is not as problematic as at z > 1.
Here in the first of a series from our ongoing observational study on the redshift evolution of the IGM, we present the properties of the transmitted flux F at 0 < F < 1 and H i column density NH i at NH i ∈ [13.5, 17] of the low-density intergalactic H i from z = 3.6 to z = 0, i.e. since the universe was 1.8 Gyr old. We constructed a high-quality IGM sample from three public archives: 55 HST/COS FUV G130M/G160M AGN spectra covering the Lyα forest at z < 0.47, two QSO spectra from the HST/STIS E230M archive supplemented with our new observations of three QSOs with the HST/COS NUV G225M grating at z ∼ 1 and 24 VLT-UVES/Keck I-HIRES QSO spectra at 1.7 < z < 3.6 2 .
We have performed our own consistent, uniform indepth Voigt profile fitting analysis to the three data sets, instead of compiling fitted line parameters from literature, cf. Tilton et al. (2012). Although time-intensive, this approach  (2009)) and HST/COS (lower panel, taken at Lifetime Position 1). The low-resolution FOS spectrum does not show an asymmetric profile of the H i Lyα at 1363Å as convincingly as STIS E140M and COS G130M spectra. Two weak absorption lines with log N H i ∼ 12.65 are clearly present at 1366.2Å and 1368.4Å in the higher-S/N COS spectrum. Being usually much narrower than H i, most metal lines are not resolved at the COS resolution, as seen in the profiles of Galactic ISM Ni ii λ1370.13 from STIS and COS.
is the only viable option to reduce any systematics, to account for the different spectral characteristics of each spectrograph and to remove metal contamination. One of our primary aims is to provide the fundamental measurements of the low-density IGM from the self-consistent analysis for theoreticians to test cosmological simulations and theories on structure/galaxy formation and evolution.
We produced two sets of the fitted parameters: one using only the Lyα (the Lyα-only fit) as most simulations use the Lyα forest region and another using all the available Lyman series (the Lyman series fit) to derive reliable line parameters of saturated Lyα lines. Although the redshift coverage is not continuous and the sample size at z ∼ 1 is rather small, the analysed redshift range is the best compromise within the capabilities of currently available ground-based and space-based spectrographs.
This paper is organised as follows. Our data sets are presented in Section 2. The Voigt profile fitting technique and its caveats are discussed in Section 3. The H i continuous flux statistics are found in Section 4. The distribution of H i column densities is discussed in Section 5. We summarise our results in Section 6. All the long tables are published electronically on the MNRAS webpage. Throughout this study, the cosmological parameters are assumed to be the matter density Ωm = 0.3, the cosmological constant ΩΛ = 0.7, and the current Hubble constant H0 = 100 h km s −1 Mpc −1 with h = 0.7. The logarithm NH i is defined as log NH i = log(NH i/1 cm −2 ). All the quoted S/N ratios are per resolution element. The atomic parameters are taken from the atomic parameter file in the Voigt profile fitting package VPFIT (Carswell & Webb 2014), with some unlisted values from the NIST (National Institute of Standards and Technology) Atomic Spectra Database. We also use the terms "absorbers", "components" and "absorption lines" interchangeably.

General description of the analysed data
The most physically meaningful analysis of absorption spectra is to decompose absorption lines into discrete components to derive column densities and line widths, assuming the profile shape to be the Voigt function. The commonly used curve-of-growth analysis from the equivalent width measurement is straightforward with the mathematically well-characterised associated error (Ebbets 1995). However, its derived column density is degenerate with the absorption line width for a single-line transition, such as typical IGM H i Lyα with log NH i 13.5 for which Lyβ cannot be detected in COS spectra with S/N 25. Since about 60% of IGM H i lines with log NH i ∈ [13,15] at z ∼ 0.2 have log NH i 13.5, inability of constraining the line width, thus the column density in some degree, is a serious drawback of the curve-ofgrowth analysis. Moreover, deblending of absorption complexes is not straightforward in the curve-of-growth analysis. High-z IGM spectra suffer from severe blending and measuring the equivalent width in high-resolution UVES/HIRES spectra is almost impossible and meaningless.
The Voigt profile fitting analysis requires high-quality spectra in which absorption lines are resolved and deblending is possible. In order to achieve a data quality adequate enough for the profile fitting analysis, we have built the three IGM data sets by selecting good-quality AGN spectra publicly available as of the end of 2017 from HST, FUSE, VLT and Keck archives. Due to the rapid increase of the number of absorption lines with z, it is essential to have highresolution, high-S/N spectra that allow for deblending at z > 1.5. At lower redshifts, high resolution is not as crucial due to much less blending, but a high S/N is still required to place a reliable continuum and to obtain robust fitted line parameters. Our main AGN selection criteria are: (i) Sightlines without damped Lyα systems (DLA, log NH i 20.3) in the Lyα forest region and only a few Lyman limit systems (log NH i 17.2) in the entire spectrum in order to maximise useful wavelength regions.
(ii) Spectra covering higher-order H i Lyman lines, at least Lyβ, to obtain a reliable line parameter for saturated Lyα lines. Available FUSE spectra were included to cover the corresponding Lyman series of COS Lyα.
(iii) For COS FUV, STIS and UVES/HIRES spectra, the S/N cut is set to be 18, 18 and 40 per resolution element in a large fraction of forest regions. This rather arbitrary S/N cut is a compromise between having well measurable lines and as large a sample as possible.
(iv) To increase the sample size at z ∼ 1, we relax the S/N cut and include our three new COS NUV QSO spectra obtained through HST GO program 14265. Two have S/N ∼ 15-18, while one has S/N ∼ 10-15. Since high-order The red histogram is from the two STIS E230M spectra used for both Lyα-only and Lyman series fits. The thick vertical blue lines mark the median z of each redshift bin used for the Lyα-only fit. Each redshift bin is shaded as gray and yellow alternatively to clearly distinguish from each other. These bins are chosen to exclude the z range at which the Lyman series fit does not contain enough lines. Middle and lower panels: Number of H i lines from the Lyα-only and Lyman series fits, respectively.
Lyman regions of the three sightlines are only partly observed, we use these spectra only for the Lyα-only analysis. The lower-S/N increases the lowest reliable value for NH i and leads to a unreliable measurement of the transmitted flux (see Section 4.2). However, including the three COS NUV spectra does not change our conclusions.
(v) For COS FUV/NUV spectra, a region with S/N lower than each S/N cut is discarded if it is longer than ∼5Å, so as not to compromise the reliable Voigt profile fitting and flux statistics.
(vi) The forest region with S/N > 18 of the COS FUV spectra is required to be 100Å wide. This limits the emission redshift to be zem > 0.1, for which the possible forest coverage is 120Å. Considering that the forest is ∼550Å long at z ∼ 2.5, such a small wavelength coverage makes cosmic variance a major issue. To avoid confusion with highorder Lyman lines in the FUV spectra, the maximum forest z is set to be 0.47.
(vii) No broad absorption line (BAL) AGN. Mini-BALs are included with the affected wavelength region excluded.
The COS FUV (1100-1800Å), COS NUV (2225-2525Å), and STIS NUV E230M (1850-3050Å) spectra contain many Galactic ISM lines, such as Si ii λλ 1260. 42, 1304.37, 1526.70, C ii λ 1334.53, Mg ii λλ, 2796.35, 2803.53, and Fe ii λλ 1608.45, 2382.76, 2600. The profile fit can easily reveal typical IGM lines blended with ISM lines, if multiple transitions of the same ISM ion are available and if some of the clean transitions are not saturated. However, broad and/or weak blended IGM lines can not be always validated when the spectrum has a low resolution, low S/N or fixed pattern noise. The most noticeable ISM line in the NUV region of our interest is multiple Fe ii including nonsaturated transitions so that blended IGM lines above the detection limit are easily detected. However, the COS FUV region contain many single/multiple ISM lines as well as geocoronal emission lines. Therefore, regions contaminated with strong and medium-strength ISM lines are excluded in our IGM study, regardless of available multiple transitions of the same ion.
Our final sample consists of 24 UVES/HIRES QSOs covering the forest at 1.67 < z < 3.56 with the total analyzed z range ∆z = 11.6, five STIS E230M and COS NUV QSOs at 0.76 < z < 1.30 with ∆z = 1.3 and 55 COS FUV AGN at 0.00 < z < 0.47 with ∆z = 7.  [3.20, 3.55] (z = 3.38), respectively. At z > 1.5, a sightline with less than ∼100Å-long in a redshift bin is excluded to reduce a sightline variation, since each z bin samples a wavelength range with 400Å. The middle and lower panels show the number of H i lines at log NH i ∈ [13, 15] from the Lyα-only fit and Lyman series fits, respectively. The steep decrease of the number of H i lines from the Lyman series fit at z ∼ 1.95 is caused by the atmospheric cutoff at 3050Å in the optical spectra without a corresponding UV spectrum.
All the analysed spectra are in the heliocentric velocity frame. In order to avoid the proximity effect, the region of 5,000 km s −1 blueward of the Lyα emission was excluded. When a sub-DLA with log NH i ∈ [19.0, 20.3] is present in the Lyα forest region, a region of ±50Å centred at the sub-DLA was discarded, as the low-density H i around sub-DLAs is not likely to represent the typical IGM due to a strong influence by the galaxy producing the sub-DLA. Table 1 lists the 24 QSOs observed with the UVES at the VLT or with the HIRES at Keck I, along with their emission redshift, analysed absorption redshift ranges and S/N per resolution element in the Lyα forest region. The UVES spectra are the same ones analysed by Kim et al. (2007Kim et al. ( , 2013Kim et al. ( , 2016, while the HIRES spectra are the same ones described by Boksenberg & Sargent (2015). The UVES and HIRES spectra were sampled at 0.05Å and 0.04Å, respectively. Their resolution is about 6.7 km s −1 . Although the S/N differs from QSO to QSO and even varies along the same QSO, the practical NH i detection limit is log NH i ∼ 12.5. Table 1 also lists the absorption distance path length, ∆X, which accounts for comoving coordinates at a given z  (Savaglio et al. 1999) covers a high-order Lyman region at 2550-3057Å. k: A sub-DLA at z = 2.187 in the Lyα region. l: The Lyα emission is slightly double-peaked. The redshift is set to the wavelength of the highest flux. m: A sub-DLA at z = 2.305 in the Lyα region. n: The emission feature is very flat with several intrinsic absorption lines. The redshift is set to be the center of the flat emission feature. o: A sub-DLA at z = 3.087 in the Lyα region causes the flux to be zero at 3754Å. p: As the right wing of the sub-DLA at z = 3.087 covers the Lyα emission feature in addition to several intrinsic absorbers, the redshift is not accurate.

HST/STIS data
Due to the low efficiency of STIS E230M, the HST archive offers only one good-quality AGN spectrum covering the forest at z ∼ 1, QSO PG 1634+706. The spectrum has S/N ∼ 40, comparable to UVES/HIRES data. In order to increase our sample at z ∼ 1, PG 1718+481 with the second highest S/N (∼20) is also included ( Table 2). These spectra are same as those analysed by Wakker & Savage (2009). The resolution is ∼ 10 km s −1 , if the slightly non-Gaussian line spread function (LSF) is approximated as a Gaussian (see more details in Section 3.2). The typical detection limit is log NH i ∼ 13.0. The pixel size of the final combined STIS spectra contin-uously increases toward longer wavelengths, ∼0.034Å per pixel at ∼2100Å and ∼0.039Å per pixel at ∼2400Å.

HST/COS NUV data
The three selected QSOs observed with the COS NUV G225M grating are part of our observing program (HST GO 14265) to study the IGM at z ∼ 1 ( Table 2). The observations were obtained in TIME-TAG mode in 2015-2016. The central wavelength setting was setup to produce a continuous wavelength coverage at ∼2226-2524Å. To increase the S/N of individual extractions, we ran the COS data reduction pipeline CalCOS version 3.3.4 with a 12-pixel-wide extraction box instead of the CalCOS default 57-pixel extraction box.
Coadding mis-aligned absorption lines due to wavelength calibration errors produces absorption lines artificially broader and smoother. While UVES, HIRES and STIS have a wavelength uncertainty less than 1 km s −1 , the CalCOS wavelength calibration uncertainty is quoted as ∼15 km s −1 (Dashtamirova et al. 2019). In general, the Cal-COS wavelength uncertainty tends to vary with wavelength and becomes larger at the edges of detector segments. A custom-built semi-automatic IDL program was developed to improve the CalCOS wavelength calibration and to coadd the individual CalCOS extractions (Wakker et al. 2015, see their Appendix for details on the COS wavelength re-calibration procedure). We first recalibrate the CalCOS wavelength on a relative scale better than ∼5 km s −1 between the same absorption features by cross-correlating the strong, clean Galactic ISM or IGM lines in all the available, individual extractions of the same QSO in the HST COS/STIS and FUSE archives. The absolute wavelength calibration was further performed using Galactic 21 cm emission toward the QSO by aligning this with the interstellar lines (Wakker et al. 2015). Since the majority of individual extractions have low S/N, it is not always straightforward to align weak/moderate-strength lines in the presence of fixed pattern noise, with the wavelength calibration uncertainty at 5-10 km s −1 . For strong lines, our wavelength recalibration has uncertainty better than 5 km s −1 in general. However, when absorption lines fall on the edge of the COS detector, their wavelength uncertainty can be at 10-15 km s −1 occasionally. The final coadded spectrum is sampled at ∼0.034Å per pixel, slightly smaller at longer wavelengths. The resolution is ∼12 km s −1 with a time-independent non-Gaussian LSF. While the non-Gaussian LSF has an extended wing, the FWHM (∼10.5 km s −1 ) at the core is comparable to the one of STIS spectra. Unfortunately, the two QSOs, HE 1211−1322 and HE 0331−4112, had become fainter at the time of observations compared to earlier low-resolution spectra, causing a lower S/N than the expected S/N ∼ 18. The region having a much lower S/N than quoted in Table 2 is discarded to keep the spectral quality as high as the data allow. The typical COS NUV NH i limit is log NH i ∼ 13.0.
The Lyβ region and the higher-order Lyman regions are not observed or are only observed in part by other UV spectrographs. Since one of the selection criteria is the coverage of the Lyβ forest, the three NUV G225M spectra are only used for the Lyα-only analysis.

HST/COS FUV data
We select 55 COS G130M/G160M (1100-1800Å) AGN spectra (Table 3). We note that 44 out of our 55 COS AGN are also included in the COS IGM sample of Danforth et al. (2016, D16 hereafter). However, our analysis methods are different and there is a difference in line identifications and fitted line parameters for ∼30% of the lines (see more details in Section 3.4).
All the raw, individual COS exposures were reduced with CalCOS versions 3.0 or 3.1 with the flat-field correction on. Similar to the treatment of COS NUV data as outlined in Section 2.4, we re-calibrated the CalCOS wavelength to an uncertainty better than ∼5-10 km s −1 (Wakker et al. 2015, see their Appendix for details) and coadded the individual extractions sampled at 0.00997Å (0.01223Å) per pixel for the G130M (G160M) grating. Since COS spectra are highly oversampled, we binned the final coadded spectrum by 3 pixels, sampled at 0.02991Å (0.03669Å) per pixel for the G130M (G160M) grating. The resolving power of each individual extraction is quoted as R ∼ 18,000 to 20,000, which corresponds to 15 to 17 km s −1 for a Gaussian LSF. However, the COS FUV LSF shows the time-dependent non-Gaussianity and the resolving power degraded with time. The spectral resolution can be approximated to ∼19 km s −1 for the COS non-Gaussian LSF at Lifetime Position 1 (see more details in Section 3.2). The wavelength regions contaminated by strong Galactic ISM lines are discarded. The typical COS FUV NH i limit is log NH i ∼ 13.0.
The CalCOS flat-field correction corrects strong wire grid shadow features greater than ∼20% in intensity, but not weak ( 10% in intensity) fixed pattern noise (FPN) produced by the hexagonal pattern of the fiber bundles in the COS FUV micro-channel plate known as MCP Hex (Dashtamirova et al. 2019). MPC Hex is supposed to be fixed in the detector pixel space, but not in the wavelength space. In practice, the position of MCP Hex and its intensity change along the pixel space. This sometimes produces false, equally-spaced weak absorption-like features in the high-S/N region of the coadded spectrum (Fig. 3). It is the most conspicuous when a high-flux Lyα emission region falls on the longer-wavelength edge of Segment B of the detector. Due to FPN, the noise is not Gaussian and the conventional way to quote noise as the reciprocal of 1 r.m.s. of the unabsorbed region underestimates true noise (Keeney et al. 2012). Since only an individual extraction with S/N 12 shows distinct FPN and the majority of our individual extractions has a lower S/N, we did not correct for MCP Hex (Fitzpatrick & Spitzer 1994;Savage et al. 2014;Wakker et al. 2015). Notes -a: The redshift is measured from the observed Lyα emission line of the QSO in the COS FUV spectra. Otherwise, the redshift is taken from NED or Simbad. b: If the z/wavelength range of the Lyαβ region in COS and/or FUSE spectra is different from the Lyα region, it is listed in parenthesis in the next row. c: F-An available FUSE spectrum is used to cover the high-order Lyman lines. The number in parenthesis is a S/N per resolution element at ∼1050Å. D16-The AGN is also included in the low-z COS IGM study in D16, although our adopted AGN naming is often different. The emission redshift is set to be that of the Lyα absorption at the highest redshift. j: Due to strong intrinsic absorbers on top of the emission peak, the redshift is slightly uncertain. k: The emission peak is relatively flat. The redshift is set to be at the highest flux around the peak. l: The S/N ratio changes abruptly in the forest region: S/N 35 at 1460Å and ∼14-17 at 1460Å. To satisfy our S/N selection criteria, only the forest region at 1460Å is included. m: Both NED and Simbad list its redshift as 0.16. However, the STIS E230M spectrum shows that it is a BL Lac type and the redshift is higher than 0.604 from the Lyα absorption features.

FUSE data
Available FUSE spectra (917-1187Å) were used to obtain a reliable column density of saturated COS FUV H i Lyα lines, since FUSE spectra cover high-order Lyman lines at z 0.12. The 8th column of Table 3 lists whether the COS AGN has a corresponding FUSE spectrum. The FUSE spectra used in this study are the same ones analysed by Wakker (2006). They are sampled at ∼0.0066Å per pixel, weakly dependent on the wavelength. As they are oversampled, we binned the FUSE spectra by 3, 5 or 7 pixels to increase the S/N. The S/N in general increases toward longer wavelengths, i.e. more reliable Lyβ profiles than Lyγ profiles. The S/N per resolution element at ∼1050Å is listed in parenthesis in the 5th column of Table 3. Since wavelength regions with S/N < 5 are not very useful to deblend saturated lines reliably, we excluded these low-S/N regions in our Lyman series fit. The 4th column in Table 3 accounts for this exclusion. AGN with low-S/N FUSE spectra but with a low-z limit z ∼ 0.002 (∼1218Å) do not have a saturated Lyα (no need for FUSE spectra) or have a higher S/N in FUSE Lyβ regions of interest than the S/N at ∼1050Å as quoted in Table 3. The resolution varies from AGN to AGN, usually ranging from ∼20 km s −1 above 1000Å to ∼25-30 km s −1 below 1000Å. For 3C273, its FUSE observations were taken in the early operation days when the telescope suffered from a focusing problem. This degraded the resolution to ∼30 km s −1 at 1100Å and to ∼60 km s −1 at 930Å. The wavelength uncertainty is about 5-10 km s −1 . However, if the Galactic molecular hydrogen with numerous transitions is detected, the wavelength uncertainty can be 5 km s −1 .

The Voigt profile fitting analysis
From the profile fitting of identified lines, three line parameters are obtained, the redshift z, the column density N in cm −2 and the line width or the Doppler parameter b in km s −1 . For thermal broadening, the b parameter (= √ 2σ, where σ is the standard deviation) is related to the gas temperature T in K by b = 2kBT /mion, where kB is the Boltzmann constant and mion is the atomic mass of ions.
We have performed the profile analysis to all the AGN spectra in this study using VPFIT version 10.2 3 with the VPFIT continuum adjustment option on (Carswell & Webb 2014). We remind readers that the publicly available VPFIT code has been extensively tested by the IGM community over three decades, including comparisons to curve-ofgrowth fit results. Our already published UVES and HIRES spectra (Kim et al. 2007(Kim et al. , 2013(Kim et al. , 2016 were also refit with VPFIT v10.2 to be consistent with the new COS and STIS fits. While the new fits overall do not change significantly from the previous ones, the errors produced by VPFIT v10.2 tend to be larger when the components are at absorption wings. Also note that the COS FUV spectra and line lists used in this study are updated from our previous ones analysed in Viel et al. (2017). Unfortunately, the Voigt profile fitting result is not unique (Kirkman & Tytler 1997;Tripp et al. 2008;Kim et al. 2013). The normalised χ 2 ν criterion does not always guarantee a good actual fit, as illustrated in Fig. 4. The number of fitted components is more sensitive to S/N than the spectral resolution since both STIS and COS spectra resolve the IGM H i lines. As S/N increases, a fitting program often tends to include more narrow, weak components to reproduce small fluctuations. Although additional components added to improve χ 2 ν are in general weak, log NH i 13.5, an actual change in the fitted parameters depends on S/N and differs for each absorption complex. Despite the nonuniqueness, our fitting analysis uses the same program to fit similar-quality spectra within each data set. Any judgmental calls and systematics would be repeated in similar ways. Therefore, our final combined fitted parameters from different spectrographs can be considered consistent and uniform within our own data sets.

The COS FUV line spread function
The profile fitting technique requires an instrumental line spread function (LSF) to convolve with the model fit profile. The LSFs of UVES, HIRES, STIS and COS NUV spectra are straightforward and well-characterised (Vogt 1994;Dekker et al. 2000;Riley 2018;Dashtamirova et al. 2019).
The COS FUV LSF is more complicated and changes with wavelength and time. The COS optics do not correct for the mid-frequency wavefront errors due to polishing irregularities in the HST primary and secondary mirrors. This causes the non-Gaussian COS FUV LSF with an extended wing and a broader and shallower core. This is stronger at shorter wavelengths and in particular evident for strong, saturated absorption lines (Kriss 2011;Keeney et al. 2012). The non-Gaussianity produces a broader and shallower line, with the bottom of saturated lines not reaching to a zero flux. Therefore, the flux statistics directly obtained from observed COS spectra cannot be compared with the one from STIS, UVES and HIRES spectra. The non-Gaussian LSF also increases the NH i detection limit compared to the same Gaussian resolving power.
In addition, the COS FUV detector loses its sensitivity from accumulated exposures known as gain sag. To avoid gain sagged regions, the position of the science spectrum on 3 Carswell et al.: http://www.ast.cam.ac.uk/∼rfc/vpfit.html the FUV detector has been moved to a different Lifetime Position (LP) periodically in the cross-dispersion direction, as noted in the 9th column of Table 3. At the later lifetime positions, the COS FUV LSF has a broader core and more extended non-Gaussian wings (Dashtamirova et al. 2019). Both non-Gaussianity and LP change reduce the resolving power as a function of wavelength and time: at 1300Å, the resolving power at LP3 decreases ∼12% from LP1. Note that ∼80% of our COS sample is taken at LP1.

Voigt profile fitting procedure
Our fitting approach is: (i) The COS FUV/NUV LSF is taken from the HST/COS Spectral Resolution homepage 4 , taking account of the Lifetime Position of the FUV LSF. The STIS E230M LSF is taken from the HST/STIS Spectral Resolution homepage 5 .
(ii) The error array is scaled to satisfy that the r.m.s. of the unabsorbed region is similar to the average of the errors in the same region, as the rebinning and interpolation during the data reduction often overestimates the error.
(iii) The appropriate good-fit χ 2 ν is set to be ∼1.3, as the average error array does not always correspond to the r.m.s. of the science array and noise is not often Gaussian.
We followed the standard approach for absorption line analysis (Carswell et al. 2002;Kim et al. 2007). First, the entire spectrum was divided into several regions. The number of divided regions is dependent on an apparent underlying continuum shape. When the continuum varies smoothly, divided regions are ∼100Å-long. However, when the continuum varies rapidly such as a region around the Lyα emission or the Lyβ+O vi emission, the length of divided regions is adopted to accommodate the rapid change of the continuum, 5-30Å. For COS, STIS and FUSE spectra, an initial continuum fit was obtained by iterating a cubic spline polynomial fit for each region, rejecting deviant regions at |(flux − fit)/fit| > 0.025 (Songaila 1998). The used fit order is between three and seven, depending on a underlying continuum shape. The continua of each region were joined to form an initial continuum of the entire spectrum. Any disjointed continua at the joined regions are adjusted manually as well as the global continuum after visual inspection, which often gives a better continuum placement. For UVES/HIRES spectra, we used the same normalised spectra analysed by Kim et al. (2013), which follows the same procedure to obtain a localised initial continuum except using the CONTINUUM/ECHELLE command in IRAF.
Second, all possible metal lines were searched for. We started from the most common metals found in the IGM (such as C iv, Si iv and O vi doublets, C ii and Si ii multiplets, and Si iii and C iii singlets) at their expected position for each H i, regardless of NH i. If any of these common metal lines are detected, we searched for other less common metals, such as Fe ii, Mg ii and Al ii. We also used empirically known facts, such as that Mg ii is not associated with low-NH i lines. When metals were found, they were fit first, using the same z and b values for the same ionic transitions. When metal lines were blended with H i, these H i absorption regions were also included in the fit. The rest of the absorption features were assumed to be H i and were fitted, including all the available higher-order Lyman series, such as Lyβ and Lyγ. When χ 2 ν 1.5, additional components are added manually and included only if they improve χ 2 ν significantly. When lines are too narrow to be H i, i.e. b 10 km s −1 , but without a robust line identification, the identification is noted as "??", but fitted assuming H i. These lines are usually weak at log NH i 12.8. The contamination by these unidentified metals is negligible at z < 1, but can be around 2-3% at z ∼3.
For each fit, we checked whether the initial continuum was appropriate for the available Lyman series and different transitions by the same ion. When necessary, a small amount of continuum adjustment was applied to achieve χ 2 ν 1.3. The entire spectrum was re-fitted with this re-adjusted continuum. In most cases, re-adjusting a local continuum makes it necessary to increase a previous continuum slightly, especially below the Lyβ emission where weak high-order Lyman absorptions at higher z can depress the continuum. This iteration has been performed several times until the final fit of lines with 3-4σ significance was obtained at χ 2 ν 1.3. Due to un-removed fixed pattern noise and continuum uncertainties, we did not fit all the absorption features at ∼3.5σ such as closely spaced several weak absorption lines as seen in Fig. 3. Any noticeable velocity shifts caused by the COS wavelength calibration uncertainty between the multiple transitions of the same ion are accounted for with the VPFIT "<<" option. The line identification and/or fitting are independently checked by B. P. Wakker for COS/STIS spectra and R. F. Carswell for STIS/UVES/HIRES spectra, and are finalised by T.-S. Kim.
Since most IGM simulations analyse the Lyα forest without incorporating high-order Lyman series, we also performed a fit using only Lyα. Note that even including all the available high-order Lyman lines does not vouch for the completely resolved profile structure of heavily saturated lines at log NH i 17-18, if severe line blending and intervening Lyman limits leaves no clean high-order Lyman lines.
The line parameters from VPFIT include the uncertainty due to statistical flux fluctuations and fitting errors. However, they do not include the error due to the continuum placement uncertainty. For the Galactic ISM, the continuum uncertainty is often estimated simply by shifting a fraction of the r.m.s. of the continuum Sembach et al. 1991) or by estimating all the uncertainties associated with a polynomial function fit to a continuum around an absorption line (Sembach & Savage 1992). In high-z IGM spectra for which VPFIT was initially developed, line blending is too severe to estimate a realistic local continuum around each absorption feature and the flux calibration of high-resolution echelle spectra is not very reliable due to a lack of well-calibrated high-S/N, high-resolution spectra of flux standard stars. The continuum-adjustment <> option in VPFIT does not use a similar procedure.
We estimated a continuum error by shifting ± 0.25σ of our fiducial continuum for 50 COS H i absorption features as shown in Fig. 5, since COS IGM H i features are not much affected by line blending. The ± 0.25σ shift is decided by visual inspection (see also Sembach et al. (1991); Penton et al. (2000); Kim et al. (2007)). Obviously the −0.25σ (+0.25σ) continuum returns a smaller (larger) b and NH i. Both sets of line parameters are within the fiducial VPFIT 1σ fitting error, with b values being more sensitive to the continuum. In general, the continuum error is 5% of the fitting error when log NH i 13.5 and S/N 30 (upper panel). The continuum error becomes larger for low S/N and NH i, especially for larger b values. In the lower panel, the continuum error of b and NH i is ∼25% for H i with b ∼ 40 km s −1 and log NH i ∼ 13.0 at 1251.4Å and is ∼15% with b ∼ 33 km s −1 and log NH i ∼ 13.0 at 1252.2Å. We remind that a large fraction of H i at log NH i 13.1 can be spurious if S/N 20-25.
Although VPFIT does not include a continuum error as in the ISM studies, its fitting errors are calibrated with the curve-of-growth analysis and the associated error array. Weak and broad lines at lower S/N have larger associated error arrays and continuum uncertainties, thus have larger fitting errors. Since our sample has mostly S/N > 20 and our analysed NH i range in the absorption line statistics is log NH i 13.5, including the continuum fitting error will increase the fiducial fitting error by 5-10%. Our main scientific goal is to quantify the observational estimates as uniformly as possible, reducing a systematic bias. Since it is not clear how to define a reasonable continuum for highlyblended high-z IGM spectra, we therefore used the VPFIT fitting error without including the ±0.25σ continuum error in this work for consistency.
A profile fit of a single-line of H i has been claimed to overestimate the true line width by ∼1.5 compared to a curve-of-growth fit using all available high-order Lyman lines in STIS, COS and FUSE spectra Danforth et al. 2010). We do not find such a tendency when we compare the Lyα-only and Lyman series fits for relatively clean, isolated and unsaturated H i Lyα from high-S/N, high-resolution optical UVES/HIRES spectra. Combined with large wavelength calibration uncertainties, imperfect line spread function (LSF) and fixed pattern noise, an observed absorption profile in lower-quality UV spectra does not necessarily show a Voigt-profile shape convolved with the true LSF. We often find that the profile shapes of Lyman lines, such as Lyα and Lyβ or Lyα and Lyγ, are inconsistent in COS and FUSE spectra. The discrepancy of b measurements between the profile and curve-of-growth fits is likely to be caused by low-quality data or an inaccurate mathematical treatment in some private profile fitting codes, not by the fundamental inferiority of a profile fit to a curveof-growth fit. We remind readers that the VPFIT profile fit compromises all the absorption profile shapes included in the fit as a function of S/N. The VPFIT fitting error can be used for reliability of fitted parameters.

Comparisons with published line parameters
Due to different data treatments and the non-uniqueness of the profile fit, discrepancies between different studies are inevitable. The discrepancy introduces a systematic uncertainty and can result in a contradictory result, especially for low-S/N data. Since only a few sightlines from UVES/HIRES spectra have published line lists besides our own, we compare the fit measurements exclusively using the D16 COS FUV line parameters. D16 sometimes misidentifies the weak Galactic ISM lines such as Mg ii λλ 1239.92, 1240.39 and orphaned high-velocity components as inter- galactic H i Lyα and does not fully account for contaminations by the ISM lines. Misidentification as metals and unaccounted metal contamination affects ∼10% of the D16 lines at their log NH i ∈ [12.6, 17.0]. We use our own line identification and measurements as a reference in this section.
D16 adopts the H i absorption line parameter from a Voigt profile fit at log NH i 14 (no other Lyman lines can be detected in low-S/N COS spectra) and a curve-of-growth fit at log NH i 14 (high-order Lyman lines can be detected), respectively. Without including FUSE spectra, D16 measures H i line parameters only from a single-line Lyα at z < 0.1. The vast majority (∼86%) of detected IGM H i lines at z ∼ 0.15 have log NH i 14. Therefore, the comparison is done for our Lyα-only fit and their Lyα-only profile fit and Lyα curve-of-growth measurements. Both N measurements for a saturated Lyα should be treated as lower limits, although VPFIT gives a very reliable column density for mildly saturated lines.
The two upper panels of Fig. 6   While the column density of common lines is mostly in good agreement, their b shows a larger difference, especially at larger b. Twenty components out of 21 with b > 60 km s −1 (12%, 21/173) have log NH i 14. At the typical COS S/N in our sample, these broad, weak lines are highly susceptible to the continuum placement and the line alignment among individual extractions to coadd, which reflects in the large b errors. The mean difference and its standard error (= 1σ/ √ N with N being the number of common lines) of 173 common H i lines is ∆b = 0.6 ± 0.4 km s −1 and ∆ log NH i = 0.03±0.07. The difference in line parameters for the lines not shown due to a different one-to-one component structure or uncorrected metal blending is obviously much larger.
The difference becomes increasingly larger for 88 common weaker lines at logNH i ∈ [12.6, 13.0] (lower panels). As the profile fitting is exclusively based on the absorption profile, the discrepancy is largely due to the difference in the profile shape of weak lines in the two studies, likely caused by the different coaddition procedure and by our improved wavelength re-calibration. As broader lines are highly sensitive to the local S/N and continuum, only 7% (8 out of a total of our 119 secure H i) have b > 60 km s −1 . About 12% (14/119) have unaccounted metal contamination or are wrongly identified as metals in D16. The mean difference and its standard error of common lines is ∆b = 1.2 ± 0.6 km s −1 and ∆ log NH i = 0.03 ± 0.01.
The difference is even larger for lower-S/N spectra (Fig. 7), since the coadded profile shape is more sensitive to the coaddition procedure and line alignment. At log NH i ∈ [13.2, 17.0] ([12.8, 13.2]), the mean difference and its standard error of 293 (133) common lines noted as filled circles is ∆b = 0.7 ± 0.4 km s −1 (∆b = 1.2 ± 1.0 km s −1 ) and ∆ log NH i = 0.01 ± 0.01 (∆ log NH i = 0.02 ± 0.01). Figure 8 displays the histogram of weak H i components in both studies. With wavelength calibration uncertainties at 5-10 km s −1 and fixed pattern noise (FPN), the fitted line parameters, in particular b, and identifications of weak lines are not as reliable as for strong lines. We measured 580 components at log NH i 13.2. Certain and uncertain (∼3.5σ) components are 73% (422/580) and 25% (146/580), respectively. The remaining is FPN features (gray-shade histogram, Fig. 3). Real weak absorption features can be missed easily in noisy spectra and a large fraction of detected weak lines can be spurious at log NH i 12.8. This incompleteness decreases the number of detected H i lines toward lower-NH i end. We did not attempt to remove any FPN in our coadding procedure (Wakker et al. 2015). With a very conservative approach, we flagged weak absorption features in coadded spectra as FPN only when we were certain by examining individual extractions. Not all of flagged fixed pattern noise were fitted.
In the lower panel of Fig. 8, the distribution of their 576 H i components and 468 FPN features from D16 suggests that FPN features become dominant at log NH i 12.8. D16 strictly measures all the absorption features at 3σ, thus their detection of weak absorption features is likely to be more objective and less biased. About 61% (287/468) of absorption features flagged as FPN in D16 are not measured in our study. However, their identification of weak lines should be taken with caution. For example, their H i features at ∼1288Å toward HE 1228+0131 (their Q 1230+0115) and at ∼1292Å toward 1H 0419−577 (their RBS 542) are likely to be FPN as shown in Fig. 3.

The mean flux and the flux PDF
The two simplest measurements of the amount of intergalactic H i are the transmitted mean flux and the transmitted flux probability distribution function (PDF). Both measurements are motivated by the current picture of the IGM in which the absorption arises from continuous matter fluctuations instead of discrete clouds, so are measured from the continuous spectrum and often referred to as "continuous flux statistics".
The mean H i flux is the average intervening absorption along the sightline and is proportional to the mean NH i through a combination of the gas density, the number of lines and line widths in redshift space. For the highly photoionised IGM, NH i is inversely proportional to the UV background intensity. In practice, the mean flux is used to calibrate simulations to observations and constrains the combined effect of the baryon density, the amplitude of the matter density fluctuation σ8, the temperature-density relation and the UVB (Rauch et al. 1997;Kirkman et al. 2007;Becker et al. 2013;Oñorbe et al. 2017).
The mean H i flux is related to the effective opacity τ eff , where f obs is the observed flux, fcont is the continuum flux, τ is the optical depth and τ eff is the effective optical depth. The effective optical depth is introduced to account for the fact that when close to 0 the normalised flux cannot be converted to the correct τ . The uncertainty is largely due to the continuum placement and the amount of unremoved metal lines, but this is not straightforward to determine. Based on a visual inspection of each spectrum, we arbitrarlly define the error as 0.25 times the r.m.s. of the unabsorbed region. The probability distribution function (PDF or P (F )) of the transmitted flux F is a higher order continuous statistic. It is defined as the fraction of pixels having a flux between F and F + ∆F for a given flux F (Jenkins & Ostriker 1991;Rauch et al. 1997;McDonald et al. 2000). While the mean H i flux is a one-parameter function of z, the flux PDF is a two-parameter function that constrains the amount of absorptions as a function of z and absorption strength F . Being a higher-order statistic, the PDF is more sensitive to the profile shape of absorption lines through the density distribution and thermal state of the IGM than the mean H i flux (Bolton et al. 2008). In practice, the PDF is also sensitive to the continuum uncertainties at F ∼ 1 and to the amount of unremoved metal lines at F ∼ 0.4 (Kim et al. 2007;Calura et al. 2012;Lee 2012;Rollinde et al. 2013).
With a large number of pixels per redshift bin, the conventional standard deviation significantly underestimates the actual PDF errors. Therefore, the errors were calculated using a modified jackknife method as outlined in Lidz et al. (2006). First, all the individual spectra longer than 50Å in each z bin are put together to generate a single, long spectrum to calculate the averaged PDF, with the bin size ∆F = 0.05 at 0 < F < 1. Pixels with F 0.025 or F 0.975 are included in the F = 0.0 and the F = 1.0 bins. Second, this long spectrum was divided into nc chunks with a length of ∼50Å. In thez = 0.08 bin, nc is 40 from the single long spectrum composed from 24 individual spectra. If the PDF estimated at the flux bin Fi is P (Fi) and the PDF estimated without the k-th chunk at the flux bin Fi is P k (Fi), then the variance at a flux bin Fi becomes This modified jackknife method is not sensitive to the length of chunks, but the errors become larger when the number of chunks is too small.

Data quality on the flux statistics
Removing the metal contamination in the AGN spectrum is not straightforward, especially when metals can be often blended with strong H i complexes over a considerable wavelength range. In addition, due to the non-Gaussian LSF of COS and STIS, the flux statistics directly measured from these spectra cannot be compared to the UVES/HIRES spectra (see Section 3.2). To avoid these drawbacks, a set of H i-only spectra was generated for each AGN. We included the fitted H i only with log NH i < 19, excluding sub-DLAs. A Gaussian LSF was assumed to be 19 km s −1 , 12 km s −1 , 10 km s −1 and 6.7 km s −1 for COS FUV, COS NUV, STIS and UVES/HIRES data, respectively. Note that the majority of our COS FUV spectra were taken at Lifetime Position 1 when the approximated Gaussian resolution was ∼19 km s −1 . As almost all COS H i lines are resolved, accounting for a degraded resolution by a few km/sec at a later Lifetime Position does not make any difference in the generated spectrum. The wavelength coverage used for the Lyα-only fit is in general larger than for the Lyman series fit, while both fits reproduce the observed absorption profiles within noise. Therefore, we used the Lyα-only fit to generate the metal-free spectrum to study the flux statistics. Lowest NH i included differs for each AGN. We also added artificial Gaussian noise to each generated spectrum, using the observed S/N (S/N = 1/σ, where 1σ is the r.m.s. of the unabsorbed region).
The effect of different S/N and undetected weak lines on the continuous flux statistics is demonstrated in Fig. 9. In the left panel, the filled circles are <F > measured from the generated spectrum of Mrk 1014 as a function of artificially added S/N. The mean flux is not sensitive to S/N as expected from Gaussian noise being symmetrical at F = 1, although the errors (0.25 times the r.m.s. of the unabsorbed region) are larger at lower S/N by definition.
Mrk 1014 is one of the lowest-S/N COS FUV spectra in this study with a detection limit log NH i ∼ 13.0. However, the highest-S/N COS FUV spectra (3C 273 and Mrk 876) show H i at log NH i < 13.0, indicating that real weak absorptions are undetected in low-S/N spectra. We manually add the expected number of H i lines at log NH i ∈ [12.3, 13.0] by extrapolating from the number of lines at NH i > 13.0 per NH i (Section 5.1 for details). The red open squares are the mean H i flux averaged from 10 generated spectra including artificial weak lines at each S/N. Added weak lines produce more absorption, but <F > decreases insignificantly by ∼0.004, less than 0.5%. The expected decrease becomes even lower for higher-S/N sightlines since they have a lower NH i detection limit so that the number of added weak lines below the detection limit down to log NH i ∼ 12.3 is smaller. We conclude that undetected weak lines do not have any meaningful impact on the mean H i flux.
The S/N has a significant impact on the PDF, as shown in the middle panel of Fig. 9. The PDF at 0.1 < F < 0.7 converges if S/N > 23. In the right panel, adding supposedly undetected weak lines has a noticeable impact on the PDF only when S/N > 60 at F 0.85 since added weak lines with log NH i 13.0 (F 0.9) can be detected only at high S/N. Note that this discrepancy is negligible for COS FUV spectra with observed S/N > 60, since H i at log NH i ∈ [12.5, 13.0] is detected and included in the PDF at F 0.9.
The PDF at F ∼ 1 is also subject to continuum placement uncertainty, especially at high redshifts (Kim et al. 1997;Calura et al. 2012;Lee 2012). The largest systematic uncertainty comes from the unknown, possible overall continuum depression by the Gunn-Peterson effect (Faucher-Giguére et al. 2008a), which is likely to be removed during the local continuum fit as we did. At zem < 3.5-3.7, the profile fit using all the available Lyman lines of the highest-S/N QSO spectra does not require a significant Gunn-Peterson depression (Calura et al. 2012). Our previous work (Kim et al. 2007, their Fig. 2) and our experience on high-S/N UVES/HIRES QSO spectra suggest that a continuum in general changes very smoothly over large wavelength ranges. Therefore, we do not expect our continuum error is much larger than ∼2% at z ∼ 3 if the S/N is larger than ∼70 per resolution element. Note that 21 out of our 24 UVES/HIRES QSO spectra have S/N 70. Since we apply the same procedure to the continuum placement for our high-z QSO spectra, we assume that a systematic contin- uum uncertainty is smaller than the statistical uncertainty at z < 3.5.
Our approach directly removes the metal contribution from the IGM, instead of commonly-used masking the metal regions (McDonald et al. 2001;Kirkman et al. 2007) or removing statistically using the metal contribution above the Lyα emission (Faucher-Giguére et al. 2008a). At z < 0.5, metals are almost fully identified, as line blending is low and the Lyα line is observed down to z = 0 so that associated metals are easily identified. At z > 1, most mediumstrength/strong metal lines are fully identified, however, weak narrow lines are not. Fortunately, when mediumstrength/weak unidentified metal lines are blended with H i lines, their contribution to the whole blended profile is often negligible. We empirically conclude that the unremoved metal contamination contributes 1% to <F > at z ∼ 3 and only affect the PDF at F ∼ 1.
The PDF from most COS FUV and STIS spectra (S/N ∼ 18-40) is sensitive to the continuum placement at F ∼ 1 and to S/N at F > 0.7, and the PDF from most UVES/HIRES spectra (S/N 60) has the largest uncertainty at F ∼ 1 due to the continuum error. Out of five COS NUV spectra, only one (HE 1211-1322) has a lower S/N (10-15) than the S/N cut of 18 for COS FUV data. However, its contribution to the total wavelength length at z ∼ 1 is only 18%. Therefore, we will consider the PDF only at 0.1 < F < 0.7 at 0 < z < 3.6 in this study.

The observed mean H i flux
The upper panel of Fig. 10 plots the mean H i flux of individual AGN from the Lyα-only fit as a function of log(1 + z) with gray filled circles. The mean flux toward each sightline is available as an online table on the MNRAS website (Table S1). The adopted error of 0.25σ of unabsorbed regions does not reflect a true relative error, but the S/N of each spectrum, and this adopted error is likely to be overestimated. The filled circles are the averaged mean H i flux <F >ave, listed in Table 4. This is not an arithmetic mean of Notes -a: The first error is the jackknife error of individual <F > values and the second error is the standard deviation of their adopted associated error (0.25σ).
individual <F > at each z bin, but is estimated from a single long spectrum combined from all the generated H i-only spectra with an appropriate Gaussian noise. Due to a large number of pixels in each z bin, any standard error estimates significantly under-estimate a true error. Therefore, we used the sum of the two error estimates: the jackknife error of individual <F > values in the z bin and the standard deviation of the associated error (0.25σ) of individual <F > to account for a continuum uncertainty. Our measurement is consistent with the previous observations within the errors. The mean flux from each sightline shows a large scatter (the inset plot). This scatter is more clearly seen in the lower panel. The deviation from the averaged mean flux at each sightline is calculated using the standard error (1σ<F > = 1σ/ √ N with N being the number of sightlines) of the arithmetic mean of all the sightlines within a given redshift range ∆z, but excluding the sightline in consideration. Due to the paucity of data points at higher redshifts, we use a different ∆z at different redshifts: ∆z = 0.05 at z < 0.45, ∆z = 0.51 at z ∼ 1, ∆z = 0.2 at 1.9 < z < 3.0 and ∆z = 0.35 at 3.0 < z < 3.6, respectively. About 71% of the sightlines have a mean flux at 1σ<F > and about 55% have a mean flux at 2σ<F >. This considerable cosmic variance depends largely on the occurrence rate of passing through intervening overdense or underdense environments such as galaxy groups or galaxy voids. Note that the large discrepancy from the Becker measurement noted as the cyan cross (Becker et al. 2013) is mainly caused by the fact that our sample does not have enough sightlines at z > 3, given that the cosmic variance is important.
The overlaid solid black curve is a conventional single power-law fit to individual measurements at 0 < z < 3.6, ln <F > = −τ eff = A0(1+z) α with A0 = −0.0060±0.0001 and α = 2.87 ± 0.01. Note that we used a median error ±0.005 of the UVES/HIRES data as the error of both COS/STIS individual <F > for this fit, since the adopted error of the latter incorrectly gives more weight to the UVES/HIRES data at z > 1.5. This simple single power law over-predicts <F > at z < 1.5, i.e. less absorption than the observations. The suggested single exponential fit (red curve) by Oñorbe et al. (2017) also overpredicts the observations at z < 1.5, more than a simple power law.
In fact, <F > increases faster (less absorption) from z = 3.6 → 1.5, slows down at z ∼ 1, then becomes almost invariant at z < 0.5. This requires a more complicated fitting function. If a double power law to individual data points is assumed, A0 = −0.0145 ± 0.0003 and α = 1.86 ± 0.07 at z < 1.5 (magenta dashed curve) and A0 = −0.0040 ± 0.0001 and α = 3.18 ± 0.02 at z > 1.5 (orange dashed curve), respectively. Note that a single power-law fit at z < 0.5 is similar to the fit at z < 1.5: A0 = −0.0142 ± 0.0004 and α = 2.06 ± 0.16 (not shown). This means that the mean flux does not show any abrupt evolutionary change at z < 1.5.

The observed flux PDF
The upper panel of Fig. 11 shows the mean PDF, <P (F )>, measured from a single, long spectrum combined from all the H i-only AGN spectra as filled circles at each z bin. Table 5 lists <P (F )> and their errors estimated from the modified jackknife method. The absorption path length ∆X noted in each panel provides a relative sample size, as the number of included pixels is meaningless due to the different pixel size for the different data sets. The green open squares at z ∼ 0.08 and 0.25 are from a subset of high-S/N COS spectra. A factor of 10 smaller ∆X at z ∼ 0.25 causes <P (F )> from the subset sample to be ∼25% smaller, demonstrating importance of a large sample to reduce systematic bias.
At the redshift bin with a large number of AGN, the individual PDF (thin gray curves) varies significantly, 40-50% at F ∼ 0.5. This sightline variance becomes stronger at lower redshifts. This is in part caused by the fact that the number of pixels per sightline is on average a factor of 18 smaller at z < 0.5 than at z > 2.5, i.e. coverage bias, and in part by the fact that the forest clustering increases at lower redshifts (Kim et al. 1997).
In thez = 3.38 panel, a noticeable difference exists between the PDF measured by Calura et al. (2012) (open dark-orange squares) and our measurements at F > 0.5. Although within 1σ errors, the amount of difference depends on F , suggesting that the main cause of the discrepancy might be the continuum uncertainties at high redshifts (Calura et al. 2012), in addition to the small number of sightlines included in both studies and the different redshift range studied. In thez = 2.99 panel, the open orange squares are the PDF at z = 3.0 measured by McDonald et al. (2000), about 1.7σ larger than our present measurements. The mean H i flux as a function of z (upper x-axis) and log(1 + z) (lower x-axis) is plotted as gray filled circles for individual sightlines and as filled circles for the averaged mean flux for each z bin. The x-axis error is the z range. The y-axis error is the 0.25 r.m.s. of unabsorbed regions for individual sightlines and is the sum of the jackknife error and standard deviation of the errors of individual <F > in each bin for the averaged mean flux. The inset plot shows the sightline variation at low z more clearly. In both panels, the solid curve is a single power-law model for our individual measurements at 0 < z < 3.6, while the magenta and orange dashed curves are the fit for z < 1.5 and z > 1.5, respectively. The red curve is the single exponential fit τ = 0.00126 × e (3.294× The discrepancy is in part caused by their imperfect metal removal as metal contamination increases <P (F )> especially at 0.2 < F < 0.6 (Kim et al. 2007), and in part by the sightline variance as their sample size is smaller by a factor of 2. In the same panel, the open purple triangles are our previous measurement at z = 2.94 which are ∼1.4σ smaller (Kim et al. 2007). Since we treated the data in a similar manner in both studies, the discrepancy is likely due to the fact that our older sample size is 2 times smaller and the measurement was done at a slightly lower z. At each z, the overall shape of <P (F )> is a convex function with the z-independent minimum at F ∼ 0.2: <P (F )> rapidly decreases at F = 0.0 → 0.2, then it increases slowly at F = 0.2 → 0.6 and rapidly at F = 0.6 → 1.0. At a given F , <P (F )> decreases rapidly as z decreases (the lower panel), consistent with the higher mean flux (lower H i absorption) at lower z. If the line width of a typical H i line is assumed to be b ∼ 25 km s −1 , F = 0.3 (F = 0.7) corresponds to log NH i ∼ 13.7 (13.1). This approximately translates that only lines with log NH i 13.7 can contribute to the PDF at F ∼ 0.3. If we ignore the b-dependence on z and NH i, a factor of 18 lower <P (F = 0.3)> atz = 0.08 than at z ∼ 3.37 indicates that the number of H i absorbers with log NH i 13.7 is a factor of 18 lower atz = 0.08.
The z-evolution of the PDF is more clearly illustrated in Fig. 12 with a larger F bin size ∆F = 0.1 to decrease a statistical fluctuation caused by a smaller F range. The overlaid  2007)). Lower panel: The difference between the mean PDF atz = 3.38 and at the given redshift. dashed line is a single power-law fit P (F, z) = C0(1 + z) C 1 at 0 < z < 3.6, while the solid line is a double power-law fit at z < 1.5 and z > 1.5, respectively, with the fit parameters listed in online Table S2 on the MNRAS web site.
This evolution reflects the fact that Lyα forest absorption typically probes rarer, higher density gas toward lower redshift due to the evolution of the UVB and the decrease in the proper density of gas in the IGM . Although a different IGM structure corresponds to a different F (or NH i) at a different z due to large-scale structure evolution (Davé et al. 1999;Schaye 2001;Hiss et al. 2018), the pixels with 0.2 < F < 0.7 and F ∼ 1 can be considered to sample roughly the filaments/sheets and cosmic flux voids (under-dense regions and regions under enhanced ionisation radiation) of the low-density IGM structure, re-spectively. The <P (F, z)> measurements shown in Fig. 12 qualitatively suggest that the volume fraction of flux voids increases rapidly from z ∼ 3.5 down to z ∼ 1.5, reflecting the higher Hubble expansion rate and also probably the rapidly increasing number of UV H i ionising photons compared to lower redshifts (Theuns et al. 1998a;Davé et al. 1999;Haardt & Madau 2012). The volume fraction increases slowly at z < 1.5. In contrast, the volume fraction occupied by IGM filaments and sheets decreases continuously with time, faster at z > 1.5 and slower at z < 1.5.

ABSORPTION LINE STATISTICS
Our three data sets almost fully resolve the IGM H i at log NH i 17. Therefore, the reliability of absorption line statistics combined from lower-S/N COS/STIS data and higher-S/N UVES/HIRES data is largely dependent on the chosen H i column density range for which each data set provides robust fitted line parameters, i.e. above the detection limit of NH i. In order to obtain a reliable NH i of saturated lines, our fiducial line parameter for absorption line statistics is from the Lyman series fit.

The H i column density distribution function
The H i column density distribution function (CDDF) is an analogue of the galaxy luminosity function. It is defined by the number of absorbers per H i column density and per absorption distance path length dX as defined by Eq. 1 (Rahmati et al. 2012): where n is the number of absorbers in a column density range dNH i centred on NH i and in the redshift range dz centred on z. Tables 1, 2 and 3 list dX without excluded regions, i.e. the Galactic ISM-contaminated regions. Since photons produced by the UVB, stellar and/or AGN feedback affect the observed NH i, comparisons between the observed and simulated CDDFs have been used to probe the importance of these effects (Kollmeier et al. 2014;Shull et al. 2015;Gurvich et al. 2017;Viel et al. 2017;Gaikwad et al. 2019).
At z ∼ 3, the shape of the CDDF at the entire observable range log NH i ∈ [12.5, 22.0] displays various dips and knees due to the non-uniform spatial distribution of H i absorbers, importance of self-shielding, changes in the UVB and the ionisation state of absorbers and galaxy feedback (Noterdaeme et al. 2009;Davé et al. 2010;Prochaska et al. 2010;Altay et al. 2011;Rahmati et al. 2012;Kim et al. 2013;O'Meara et al. 2013;Rudie et al. 2013). However, it has been customary to fit the CDDF with a power law over a smaller NH i range, f = BN −β H i , with β ∼ 1.5 at z ∼ 3 for the forest (Carswell et al. 1987;Petitjean et al. 1993;Hu et al. 1995;Kim et al. 2013;Rudie et al. 2013).
Since the same NH i samples a higher overdensity at lower z, i.e. probing different structures at different z, the slope β is also expected to change with z due to structure formation/evolution. Indeed, various simulations have predicted a steepening of the CDDF slope from ∼1.5 at z = 2 to ∼1.9 at z ∼ 0 (Paschos et al. 2009;Davé et al. 2010;Tepper-García et al. 2012;Nasir et al. 2017). A few existing low-z IGM studies at z < 2 find indeed a steeper β ∼ 1.7 at log NH i ∈ [13, 16], without any hint of dips and knees (Lehner et al. 2007;Janknecht et al. 2006;Tilton et al. 2012;Danforth et al. 2016). Penton et al. (2004) suggested a deviation from a single power law at z ∼ 0.03. However, their H i column density was converted from the equivalent width assuming a fixed b value for all H i lines, which may result in an incorrect conclusion.
The upper panel of Fig. 13 shows the logarithmic CDDF, log f , measured from the Lyman series fit (orange dots) and the Lyα-only fit (black dots). The shown CDDF is measured at log NH i ∈ [12.5, 17.0] with a log NH i bin size varying randomly between 0.1 and 0.5 to capture the various CDDF features in details. A total of 53 such measurements were performed with ∼500 data points per unit log NH i. This approach can produce several CDDF measurements at the same NH i, but each CDDF is measured over a different ∆ log NH i, e.g. the number of lines whose NH i is in log NH i = 13.5 ± 0.3 vs log NH i = 13.5 ± 0.5. A large scatter at a given NH i indicates that the lines whose NH i is around this NH i are rare and are not uniformly distributed in redshift space. At z ∼ 0.25, there are no lines with log NH i ∼ 15.7.
When there are no lines at NH i ± ∆NH i, the CDDF is not shown. This is more evident at log NH i 15.5, as higher-NH i lines are rarer, thus requiring more sightlines. At z ∼ 1, there exist no lines at log NH i ∈ [15.2, 17.2] from the Lyman series fit (only 39 lines at log NH i ∈ [13.5, 16.0] vs 92 lines from the Lyα-only fit). Therefore, there is no CDDF measurement at log NH i 15.3 from the Lyman series fit (orange dots). The difference between the CDDFs measured from the Lyman series and Lyα-only fits is evident at log NH i > 14.5, where the line parameters of saturated lines cannot be reliably measured from Lyα only.
The turnover of the CDDF at log NH i ∼ 12.5-13.0 is mainly caused by incompleteness as expected from Fig. 8. Due to noise including COS fixed pattern noise (FPN), limited S/N and line blending, all the real absorption lines around the detection limits of NH i and b cannot be detected, causing a CDDF turnover below the NH i detection limit. In fact, the CDDF measured from a subset of highest-S/N COS data (S/N > 30) atz = 0.08 shows the higher CDDF at log NH i 13.1 (green circles). Without a full FPN characterisation, the non-Gaussian COS LSF and low-S/N varying along the same COS spectrum, we did not attempt to do incompleteness corrections for COS data. Similarly, without knowing the amount of line blending at higher redshifts in addition to the continuum uncertainty, we also did not correct incompleteness for STIS/UVES/HIRES data.
In the upper and lower panels, the red dot-dashed line is a power-law fit to the Lyman-series-fit H i lines at log NH i ∈ [13.5, 16.0] atz = 3.38, while the blue dashed line is a power-law fit at each z ( Table 6). The fit error is the Figure 13. Upper panel: The logarithmic CDDF (log f ) as a function of log N H i . The orange and black dots are the CDDF measured from the Lyman series and the Lyα-only fits above the detection limit for each z bin, while gray dots are measured from the Lyman series fit below the detection limit. The filled green circles atz = 0.08 andz = 0.25 plot the CDDF at log N H i 13.5 measured from a subset of the high-S/N COS spectra used in Fig. 11 standard deviation of the 53 sets of the CDDF measurements shown in Fig. 13. The slope β of the CDDF is sensitive to the column density range fitted (Kim et al. 2013). The fit becomes more reliable with a larger fitting range because small-scale deviations from the power law are smoothed out. At log NH i < 14.5, IGM H i lines are more uniformly distributed in the intergalactic space for the CDDF to follow a power-law distribution. In contrast, at log NH i > 14.5, the IGM distribution starts to show irregularity. This is in part due to a stronger clustering of higher-NH i absorbers (Kim et al. 1997, D16) and in part due to a lower number of higher-NH i absorbers, i.e. 81 absorbers at log NH i ∈ [13.5, 13.8] versus two absorbers at log NH i ∈ [15.5, 15.8] from the Lyman series fit atz = 3.38. Therefore, determining a reliable shape for the CDDF at NH i 14.5 requires a larger total path length to decrease the fluctuations by these effects. Interestingly, this NH i range at log NH i 14.5 is also where the intergalactic H i starts to reside in collapsed regions and to interact with galaxies through IGM accretion and stellar/AGN feedback. The interaction between the IGM and galactic outflows affects the small-scale distribution of high-NH i absorbers around galaxies, which might result in the stronger clustering and the deviation from a power-law CDDF. In addition, the IGM temperature-density relation starts to break down at log NH i > 14.5 (Hui & Gnedin 1997;Theuns et al. 1998b;Davé et al. 2010;Peeples et al. 2010;Martizzi et al. 2019).
The impact of incompleteness and the non-uniqueness of fitted line parameters including spurious lines on the CDDF are more significant at low NH i as better illus-  (Tilton et al. 2012). Matching the observations and simulations at low-NH i end should be approached with caution. The left panel of Fig. 15 displays the redshift evolution of the overall shape of the CDDF. The CDDF shape at lower redshifts can be reproduced by a small amount of clockwise rotation of a higher-z CDDF with a slightly larger CDDF normalisation B. This is caused by a fact that the number Figure 14. Comparisons between the four CDDF measurements per unit redshift dz instead of dX at the low-N H i end. Incompleteness causes a turnover in our COS CDDF (filled circles) at log N H i ∼ 13.1 (the shaded region), while the incompleteness-corrected D16 COS CDDF continuously increases at log N H i 13.1. The CDDF calculated from the high-S/N subsample (green filled circles) used in Fig. 13 abruptly increases at log N H i ∼ 13.0. Note that the CDDF plotted depends both on log N H i and ∆N H i . Figure 15. Left panel: The CDDF from the Lyman series fit at the seven redshift bins. The symbol sizes atz = 3.38 (highest z) and 0.08 (lowest z) are a factor of 2 larger to contrast the CDDF over the largest z interval as its z-evolution is weak. The Poisson errors are not shown for clarity. Right panel: The CDDF slope β measured at log N H i ∈ [13.5, 16.0] from the Lyman series fit (filled circles) and the Ly-α-only fit (open red squares). Only errors larger than the symbol size are plotted.
of H i absorbers decreases faster at higher NH i and at lower z with a self-similar manner in terms of the evolution of the large-scale structure and the degree of the IGM-galaxy interaction as a function of z.
The right panel indicates that the CDDF slope β from the Lyman series fit in general becomes steeper as z decreases, if thez αβ = 1.03 CDDF is excluded due to the smallnumber statistics (Table 6). Atz αβ = 1.03, a very small ∆X coverage also decreases a probability of detecting less common H i absorbers at log NH i > 15.0. These lead to a factor of 3 larger statistical error than at other redshifts. This trend also holds for β estimated from the Lyα-only fit. On the other hand, the lower β at z ∼ 3 compared to at the adjacent z seems to be real as the number of analysed lines are large enough to obtain a reliable β. Due to a lack of data at z > 3.5 we cannot discard a possibility of β continuously increasing at z = 3 → 4 with a local minimum at z ∼ 3, which could be caused by a change in the IGM NH i distribution due to extra heating and ionisation by He ii reionisation at z ∼ 3. (Reimers et al. 1997;Songaila 1998;Syphers & Shull 2013;Worseck et al. 2016).

The forest gas-phase mass density
One of the key cosmological parameters constrained by the IGM is the gas-phase hydrogen mass density relative to the critical density of the universe (ΩH). This ΩH is modeldependent and is bound to be revised with an advent of more realistic models and with a better constraint on the UVB, the characteristic size of the IGM geometry and a density profile (Schaye 2001;Penton et al. 2004;Tilton et al. 2012). We used a simple method developed by Schaye (2001) to obtain a qualitative trend over time: where fg is a mass fraction in gas-phase hydrogen, the hydrogen photoionisation rate Γ12 ≡ ΓH i×10 −12 s −1 and the gas temperature T ≡ T4 × 10 4 K, respectively, for our assumed cosmology h = 0.7 (Schaye 2001). Strictly speaking, this holds only for overdense regions, i.e. log NH i 13.5 at z ∼ 3 and ΩH can be under-estimated by ∼20% at log NH i 13.5 (Penton et al. 2004).
We directly integrated f (NH i, dX) as shown in Fig. 13 over several different column density ranges with the ±1σ Poisson errors. The model-independent factor 2.2 × 10 −9 h −1 N 1/3 HI f (NHI , dX) dNHI and the modeldependent ΩH are are tabulated in an online table on the MNRAS website (Table S3). The model-independent factor is a purely observational quantity and will not be likely to be changed significantly within our adopted column density range at log NH i ∈ [13, 16] in the near future. For the modeldependent ΩH, Γ12 is interpolated from the HM01 QG UVB at the given z, while fg and T were interpolated from the IllustrisTNG simulation (Martizzi et al. 2019, their Table 1 and Fig. 4, respectively). For simplicity, we assume that the observed Lyα forest is mostly from the cool diffuse IGM and the halo gas in filaments and sheets and that the minimum temperature of simulated filaments at log nH = −4 is a fair representative of the IGM temperature. An uncertainty of 10% in Γ12, fg and T4 changes ΩH by ∼3%, ∼3% and ∼6%, respectively, indicating that T4 is the most important model-dependent parameter. However, the uncertainties associated with T4 and Γ12 are likely to be different, especially at low redshifts. Simulated distributions of H i line widths which are determined by gas temperature and nonthermal motion are not in agreement with observations by a factor of ∼2 at z ∼ 0.1 . Several studies also suggest a factor of 2-5 larger Γ12 than the widely-used theoretical prediction by Haardt & Madau (2012) at z ∼ 0.2 (Kollmeier et al. 2014;Shull et al. 2015;Wakker et al. 2015;Faucher-Giguère 2020).
Being fully consistent with previous studies, the low-NH i absorbers at log NH i ∈ [13, 15] contain most baryons at z > 2.5, but their contribution decreases down to about 22% at z ∼ 0 (Rauch et al. 1997;Shull et al. 2012;Danforth et al. 2016). The relative contribution to Ω b by absorbers at log NH i ∈ [13.0, 14.5] and at log NH i ∈ [14.5, 16.0] is about 4.5 at z ∼ 0 and 2 at z ∼ 3.4, which reflects a steeper slope of the CDDF at lower z. Due to incompleteness at log NH i ∼ 13, it is not currently possible to constrain the contribution to Ω b by these weaker absorbers.

Absorption line number density dn/dz
The H i absorber number density, dn/dz, is defined as the number of absorbers per unit redshift. It is proportional to the cross section and comoving number density of absorbers. It is usually measured over a specified H i column density range and its evolution as a function of z is traditionally described as a single power law, dn/dz = n0×(1+z) γn , where n0 is the number density at z = 0. Due to the growth of structure, the same H i column density corresponds to a higher overdensity at lower z (overdensity δ = ρ/ρo, where ρo is the cosmic mean matter density): log NH i = 15 corresponds to δ ∼ 100 (inside halos) at z = 0 and δ ∼ 6 (the diffuse IGM) at z = 3 (Davé et al. 1999). In addition, star formation and feedback is predicted to affect the H i absorbers close to galaxies Nasir et al. 2017). Therefore, dn/dz is expected to change with NH i and z, constraining structure evolution (Theuns et al. 1998a;Schaye 2001;Davé et al. 2010;Williger et al. 2010;Kim et al. 2013). Figure 16 displays the dn/dz evolution from the Lyαonly (upper panels) and Lyman series (middle panels) fits at two different NH i ranges, at log NH i ∈ [13.5, 14.5] (left panels) and at log NH i ∈ [14, 17] (right panels), respectively. The criterion of S/N > 18 for COS/STIS spectra enables detection of H i at log NH i 13. However, since the NH i detection limit varies with b and the two sightlines at z ∼ 1 have S/N ∼ 10-18, to be conservative we use a lower NH i limit of log NH i = 13.5.
The most striking feature of dn/dz in the upper and middle panels of Fig. 16 is a large scatter in individual dn/dz (gray filled circles, tabulated in the supplementary online Tables S4 and S5 on the MNRAS webpage) at any given redshift for both NH i ranges. The scatter becomes larger at lower redshifts, spanning about an order of magnitude at z ∼ 0 (Fig. 17). About half the COS AGN sightlines at z < 0.5 do not contain an absorber at log NH i 14.5. At the same time, Fig. 17 indicates that 9% (5/55) of sightlines at z < 0.5 contain more absorbers with 8σ at log NH i ∈ [13.5, 14.5] than the averaged dn/dz, compared to none at z > 1.5. The contrast between extremely high and low dn/dz becomes more prominent at a higher column density range and at low redshifts. A dn/dz study based on 27 STIS/FUSE spectra at z > 0.02 (Tilton et al. 2012, filled orange squares) is consistent with our individual COS dn/dz: although the STIS resolution is 3 times higher, its S/N is much lower and a different method was used for estimating NH i. Even though not shown, the D16 individual dn/dz also shows a large scatter.
This large scatter is in part intrinsic caused by a stronger clustering of stronger absorbers (a large positive σ dn/dz combined with a sightline without strong absorbers) toward lower z as a result of structure evolution, cooled-down galactic outflows near star-forming galaxies and enhanced H i ionizing photons (Dobrzycki et al. 2002;Dall'Aglio et al. 2008;Davé et al. 2010;Nasir et al. 2017). The scatter is also in part caused by the different z coverage for each sightline. This redshift coverage bias is especially significant at z ∼ 0 and z ∼ 2 as the wavelength coverage is smaller due to the rest-frame Lyα and atmospheric cutoffs, respectively. The large scatter due to both cosmic variance and redshift coverage bias implies that the dn/dz study requires many sightlines, especially at lower z. A small sample size is the primary reason of the earlier discrepancy between the FOS and STIS dn/dz studies as the STIS dn/dz was measured using only a few sightlines (Lehner et al. 2007;Williger et al. 2010).
While the parameter space occupied by individual dn/dz measurements on the z-dn/dz plane is important for constraining the inhomogeneity of H i distribution, the averaged dn/dz (filled circles) is a better quantity to directly compare to simulations which usually average thousands of sightlines. To reduce redshift coverage bias, the averaged dn/dz is measured from the combined line lists of all the AGN per NH i and per z instead of an arithmetic mean. Considering the large scatter in the individual dn/dz, the commonly-used Poisson errors seem to underestimate the real errors. Therefore, we include the bootstrap error measured from the combined lines for each z bin in addition to the Poisson errors. The averaged dn/dz is tabulated in Table 7 with the first dndz error being the Poisson error and the second error being 0.5 times the standard deviation.
In Fig. 16, the green solid line is a single power-law fit to the individual dn/dz at 0 < z < 3.6. Due to the large scatter at a given z, this single power-law fit roughly describes the overall individual dn/dz evolution, although a underprediction of dn/dz is suggested at z > 3.6. The blue dot-dashed line shows a best-fit single power law to the averaged dn/dz at z > 1.5. At both NH i ranges for the Lyα-only and Lyman series fits, this fit underpredicts the dn/dz at z < 0.5 (Weymann et al. 1998;Kim et al. 2013). The discrepancy is larger at the higher-NH i range, since stronger absorbers are expected to disappear more rapidly at lower z when extrapolated from high z. The red dashed lines represents a best-fit single power law to the averaged dn/dz at z < 1.5. This fit underpredicts the observed dn/dz at z > 1.5. The fit parameters are listed in Table 8. Note that there is no significant  [13.5, 14.5]. The individual and averaged dn/dz are shown as filled gray and black circles, respectively. The x-axis errors indicate the redshift range, while the y-axis errors are the 1σ Poisson error accounting for lines with a questionable identification. The blue dot-dashed and red dashed lines are a best-fit single power law to the averaged dn/dz at z > 1.5 and z < 1.5, respectively. The green solid line is a single power-law fit to the individual dn/dz at 0 < z < 3.6. Left middle panel: dn/dz from the Lyman series fit. Left lower panel: The difference in log dn/dz between the Lyα-only and Lyman series fits. The line number density dn/dz from the Lyα-only fit was recalculated over the same z range used in the Lyman series fit. Right panels: Same as the left panels except for log N H i ∈ [14,17]. When there is no line, log dn/dz is set to be 0.41 without y-axis errors at the bottom of the panel. The open blue triangles are dn/dz from HST/FOS spectra (Weymann et al. 1998), converted from equivalent width measurements assuming b = 25 km s −1 . In the upper panel, the yellow shade outlines the dn/dz range from theoretical predictions by Davé et al. (2010) and Nasir et al. (2017), while the pink dot-dot-dot-dashed curve is a prediction by Davé et al. (1999). Figure 17. Deviation of the individual dn/dz from the averaged dn/dz for the Lyman series fit. The deviation is calculated using the Poisson error of the averaged dn/dz within a given redshift range ∆z excluding the sightline in consideration: ∆z = 0.05 at z < 0.45, ∆z = 0.48 at z ∼ 1, ∆z = 0.2 at 1.9 < z < 3.0 and ∆z = 0.35 at 3.0 < z < 3.6, respectively. For a sightline without H i absorbers in a given column density range, σ dn/dz is assigned to be −10 with gray circles. The positive deviation indicates that the sightline contains more H i absorbers than the averaged dn/dz. difference between the single power-law fits to the averaged dn/dz (not shown) and the individual dn/dz (green solid line).
For both NH i ranges, the inadequacy of a single powerlaw fit is consistent with the evolution of <F > and the PDF -there exists an IGM evolutionary break at z ∼ 1.5-1.7 and the stronger absorbers evolve more strongly, i.e. a larger γn (Theuns et al. 1998a;Scott et al. 2000;Kim et al. 2013). Ribaudo et al. (2011) find that the averaged dn/dz of Lyman limit systems at log NH i 17.5 at 0.0 < z < 2.6 is well described with γn = 1.33±0.61. Although the errors are large for both studies and their dn/dz does not show any evolutionary break at z ∼ 1.5, their γn combined with ours at 0 < z < 3.6 at log NH i ∈ [13.5, 14.0] (γn ∼ 1.10) and [14.0, 17.0] (γn ∼ 1.27) suggests that γn increases with NH i.
At log NH i ∈ [14, 17] (right upper panel), the yellow shade represents the predicted dn/dz evolution in terms of NH i instead of the equivalent width, compiled from various simulations from outflow models to no-wind models under a quasars+galaxies UVB Nasir et al. 2017). As outflows eject the processed gas into halos, which subsequently cools down and produces strong absorbers, outflow models tend to predict higher dn/dz. However, it is clear that these models significantly underpredict the observed dn/dz by a factor of ∼3-5, suggesting that saturated Lyα absorbers at low redshift are not yet correctly simulated.
In the same panel, the pink dot-dot-dot-dashed line is a predicted dn/dz by Davé et al. (1999) under the quasarsonly UVB. Their prediction is in a strikingly good agreement with our measurement. This can indicate that the quasarsonly UVB is more preferable than the quasars+galaxies UVB under which latest simulations including more recent models by Davé et al. (2010) do not reproduce the observations. However, this comparison is potentially complicated by the fact that their Λ-CDM model is based on outdated cosmological parameters such as ΩΛ = 0.6 without incorporating any extra heating source such as He ii photoheating at z > 2 (Syphers & Shull 2013;Worseck et al. 2016;Nasir et al. 2017) nor any stellar/AGN feedback. The simulation includes only 64 3 particles in a small box of side length 11h −1 comoving Mpc so that it does not resolve the IGM gas particles as well as some of current IGM simulations Nasir et al. 2017;Martizzi et al. 2019).
The lower left panel displays the difference in dn/dz between the Lyα-only and Lyman series fits. At log NH i ∈ [13.5, 14.5], a few individual sightlines display a difference up to ∼20%. However, there is no noticeable difference in the average except at z ∼ 1 where the averaged dn/dz suffers from small number statistics. At log NH i 14.5, Lyα lines start to saturate in the UVES/HIRES spectra. Since some saturated lines can be resolved into several weaker components in higher order lines and since the Lyα-only fit in general gives a lower NH i limit for saturated lines, the difference between the two sets becomes more noticeable at higher NH i (lower right panel). At log NH i ∈ [14, 17], the dn/dz from the Lyman series fit is about a factor of 1.2 larger than from the Lyα-only fit.

CONCLUSIONS
We performed a new uniform, consistent Voigt profile fitting analysis on the 84 high-quality AGN spectra from the HST/COS, HST/STIS, VLT/UVES and Keck/HIRES archives in order to characterise the redshift evolution of the transmitted flux F and column density of neutral hydrogen H i of the low-density IGM at 0 < z < 3.6. Although this data set does not sample the IGM continuously in redshift space, the selected redshift ranges are the best compromise within the capabilities of currently available ground-based and space-based spectrographs: • VLT-UVES/Keck I-HIRES set consists of 24 QSO spectra with a resolution of ∼6.7 km s −1 and a S/N ratio per resolution element of 40-250, sampling the IGM at 1.7 < z < 3.6 with the total z coverage ∆z = 11.59 for the Lyα-only fit range. The typical NH i detection limit is log NH i ∼ 12.5.
• HST/STIS+COS NUV set covers the IGM at z ∼ 1 with ∆z = 1.27 (the Lyα-only fit). The set includes two QSO spectra from the HST/STIS archive supplemented with our new observations of three QSO spectra taken with the HST/COS NUV G225M grating. The approximated Gaussian resolution of STIS E230M and COS NUV spectra is ∼10 km s −1 and ∼12 km s −1 , respectively, with a non-Gaussian wing. The S/N range is ∼13-46. The log NH i detection limit is ∼13.
• HST/COS FUV set has 55 AGN spectra with S/N ∼ 18-85 covering the IGM at 0 < z < 0.5 with ∆z = 7.20 (the Lyα-only fit). The resolution can be approximated to ∼19 km s −1 with a non-Gaussian wing. The log NH i detection limit is ∼13.
For the continuous flux statistics, we used artificial spectra generated from the Lyα-only fit since it can use a larger wavelength range than the Lyman series fit for which the useful wavelength is sometimes shortened due to the need for coverage of high-order Lyman lines. The generated spectra also enable to combine the COS/STIS spectra having a non-Gaussian line spread function with the UVES/HIRES data having a Gaussian one and to remove the metal contamination. Our consistent analysis based on the best data currently available confirms previous findings qualitatively (Weymann et al. 1998;Penton et al. 2004;Lehner et al. 2007;Tilton et al. 2012;Danforth et al. 2016) and provides more robust quantitative results. We have found: (i) The mean transmitted H i flux is not sensitive to S/N, nor supposedly undetected weak lines due to noise. While the flux PDF (probability distribution function, the fraction of pixels having a given normalized flux F ) is not sensitive to undetected weak lines, the flux PDF is directly comparable among different S/N data only at 0.1 < F < 0.7.
(iii) The mean PDF as a function of F and z, <P (F, z)>, qualitatively suggests that the volume fraction occupied by flux voids (F ∼ 1) increases rapidly at z = 3.6 → 1.5, then increases slowly at z < 1.5. With no absorption defined as F = 1, this evolution reflects the thinning of the forest toward lower redshift, due to the evolution in the gas proper density and the intensity of the UV background.
For the NH i distribution, we used the Lyman series fit for more reliable determination of NH i for saturated H i Lyα. For the UV spectra taken with the HST, a corresponding non-Gaussian LSF provided for each instrument setting and observation date is used. At log NH i ∈ [13.5, 16.0] where incompleteness is negligible, 24 UVES/HIRES spectra at 1.9 < z < 3.6, two STIS+three COS NUV spectra at z ∼ 1 and 55 COS FUV spectra at 0 < z < 0.45 provide 1798 (2043), 39 (93) and 371 (360) H i lines, respectively, for the Lyman series (Lyα-only) fit. We have found: (i) The redshift evolution of the column density distribution function (CDDF), albeit weak over a small z range, is such that the overall shape of the CDDF at lower redshifts can be reproduced by a small amount of clockwise rotation of a higher-z CDDF with a slightly larger normalisation (bottom panels of Fig. 13 and left panel of Fig. 15).
(ii) For a conventional fit to the CDDF, f ∝ N −β H i , the slope β at log NH i ∈ [13.5, 16.0] in general becomes steeper at lower z: β = 1.60±0.02 at z ∼ 3.4 and β = 1.82±0.03 at z ∼ 0.1. This reflects that higher-NH i absorbers disappear more rapidly and decrease in number or cross-section over time.
(iii) The slope β is lower than the overall trend at z ∼ 1 where an evolutionary break in the flux statistics is seen and at z ∼ 3. The deviation at z ∼ 1 could be spurious due to the small sample size, while the deviation at z ∼ 3 could be caused by a change in the NH i distribution due to extra heating and ionisation by the hypothetical He ii reionisation at z ∼ 3. A further study with more data at z ∼ 1 and at z > 3.6 is required to confirm the β deviation.
(iv) The individual dn/dz (the number of absorbers per unit z) shows a large scatter at a given z. The scatter increases toward lower z and spans about an order of magnitude at z ∼ 0, possibly caused by a combination of a stronger clustering at lower z, outflows near star-forming galaxies, locally enhanced H i ionization rates and a shorter redshift coverage of some sightlines.

DATA AVAILABILITY
The data underlying this article are available in the article and in its online supplementary material. When the fitted line parameters will be completely analysed in our future papers, the entire fitted line list will be available online.

ACKNOWLEDGMENTS
We are grateful to everyone in the COS, STIS, UVES and HIRES instrument teams for building such superb spectrographs, which makes possible this study. This research has made use of the services of the ESO Science Archive Facility. This also has made use of the Keck Observatory Archive  Notes -a: if left blank, it is the same as for the Lyα-only fit. Table 2. Simple fit parameters of the mean PDF 0 < z < 3.5 z < 1.5 z > 1.5   Notes -a: the excluded wavelength is accounted for. b: the number in the parenthesis is the number of uncertain H i absorption lines in a given column density range. All of them are from COS AGN. Notes -a: the excluded wavelength is accounted for. b: the number in the parenthesis is the number of uncertain H i absorption lines in a given column density range. All of them are from COS AGN.