Searching for the shadows of giants II: the effect of local ionisation on the Lyman-$\alpha$ absorption signatures of protoclusters at redshift $z\sim2.4$

Local variations in the intergalactic medium (IGM) neutral hydrogen fraction will affect the Ly-$\alpha$ absorption signature of protoclusters identified in tomographic surveys. Using the IllustrisTNG simulations, we investigate how the AGN proximity effect and hot, collisionally ionised gas arising from gravitational infall and black hole feedback change the Ly-$\alpha$ absorption associated with $M_{z=0}\simeq10^{14}\,M_\odot$ protoclusters at $z\simeq2.4$. We find that protocluster galaxy overdensities exhibit a weak anti-correlation with Ly-$\alpha$ transmission in IGM transmission maps, but local HI ionisation enhancements due to hot $T>10^{6}\rm\,K$ gas or nearby AGN can disrupt this relationship within individual protoclusters. On average, however, we find that strong reductions in the IGM neutral fraction are limited to within $\lesssim 5h^{-1}\,\textrm{cMpc}$ of the dark matter haloes. Local ionisation enhancements will therefore have a minimal impact on the completeness of protocluster identification in tomographic surveys if Ly-$\alpha$ transmission maps are smoothed over scales of $\sim4 h^{-1}\,\textrm{cMpc}$, as is typically done in observations. However, if the relationship between the matter density and Ly-$\alpha$ transmission in tomographic maps is calibrated using simple analytical models for the Ly-$\alpha$ forest opacity, the presence of hot gas around haloes can still result in systematically lower estimates of $M_{z=0}$ for the most massive protoclusters.


INTRODUCTION
A fundamental prediction of ΛCDM cosmogonies is that galaxy clusters are built from the assembly of lower mass haloes such as galaxy groups and isolated galaxies (e.g. White & Frenk 1991). At low redshifts, clusters are single dark matter haloes, filled with massive, evolved, early-type galaxies orbiting the brightest cluster galaxy. The progenitor of this halo at redshift $z > 2$ is a diffuse collection of smaller haloes spread over tens of comoving Mpc, all of which are rapidly growing and merging (Chiang et al. 2013; Muldrew et al. 2015). These ensembles of gravitationally bound but not yet virialised structures are known as protoclusters (Overzier 2016). They are the highest density regions in the early Universe and are therefore the most active sites of structure assembly.
Traditionally, protoclusters are located as galaxy overdensities in either photometric (Daddi et al. 2009; Chiang et al. 2014) or spectroscopic (Steidel et al. 2005; Cucciati et al. 2014; Chiang et al. 2015; Lemaux et al. 2017; Harikane et al. 2019) surveys. Hundreds of protocluster candidates have now been identified as regions of high galaxy density on scales of a few to tens of arcminutes (Wylezalek et al. 2013; Toshikawa et al. 2018). A small but growing number of protoclusters have been selected not by their galaxy properties, however, but by the properties of their gas, the dominant baryonic component, which lies in the intergalactic/intra-protocluster medium. X-ray and mm-wavelength observations (the latter via the Sunyaev-Zel'dovich effect; Sunyaev & Zeldovich 1972) are used to identify the $\sim10^{7}\rm\,K$ intracluster medium within massive collapsed clusters and groups (Ebeling et al. 2010; Finoguenov et al. 2010; Bleem et al. 2015), leading to the detection of clusters up to redshifts of $z = 1.7$ (Strazzullo et al. 2019). Yet the majority of the gas by volume within protoclusters has not yet been shock heated to X-ray emitting temperatures and instead has $T\sim10^{4}\rm\,K$ (Miller et al. 2019).

★ E-mail: joel.miller@nottingham.ac.uk
Fortunately, neutral hydrogen in the intergalactic medium (IGM) also traces the underlying dark matter structure closely on scales $\gtrsim1h^{-1}\,\textrm{cMpc}$ (Croft et al. 2002; Viel et al. 2004). Protoclusters at redshift $z > 2$ have sizes of $10$-$50h^{-1}\,\textrm{cMpc}$ (Muldrew et al. 2015), so their large overdensities may be traced by neutral hydrogen. This neutral hydrogen can be detected in the spectra of background quasi-stellar objects (QSOs) as a series of Ly-$\alpha$ absorption lines known as the Ly-$\alpha$ forest (Rauch 1998). Coherent, large scale decrements in the Ly-$\alpha$ forest transmitted flux within individual QSO spectra may then correspond to intergalactic H I associated with significant mass overdensities. In particular, Cai et al. (2017) have used spectral regions with strong Ly-$\alpha$ transmission decrements over a scale of $15h^{-1}\,\textrm{cMpc}$, which they name Coherently Strong Ly-$\alpha$ Absorption systems (CoSLAs), to locate a protocluster at $z = 2.3$ (see also Cai et al. 2016; Zheng et al. 2021; Shi et al. 2021). However, using cosmological hydrodynamical simulations, Miller et al. (2019) (hereafter Paper I) also showed that such CoSLAs are rare$^{1}$ and are not an exclusive probe of protoclusters. It is possible to adopt a rather strict (and model dependent) CoSLA detection threshold that removes contamination from coherent structures originating in the diffuse IGM, but any such protocluster sample is then incomplete.
An alternative technique for protocluster identification that also makes use of intergalactic Ly-$\alpha$ absorption is IGM or Ly-$\alpha$ forest tomography (Pichon et al. 2001; Caucci et al. 2008; Lee et al. 2014; Stark et al. 2015; Horowitz et al. 2019; Porqueres et al. 2020; Li et al. 2021). If a sufficient number of individual Ly-$\alpha$ forest sight-lines sample a given volume, a three dimensional map of the Ly-$\alpha$ transmission from the IGM may be reconstructed. A large sample of QSO sight-lines can be used for this purpose (see e.g. Ravoux et al. 2020), but recent observational advances have also allowed spectra from background star-forming galaxies to be used, thus providing the high density of sight-lines needed to reconstruct the Ly-$\alpha$ transmission on scales of a few comoving Mpc (Lee et al. 2016, 2018; Mukae et al. 2020a; Newman et al. 2020). This approach has been successfully used to locate dense structures in the early Universe, some of which are expected to be protoclusters. Furthermore, combining these new tomographic Ly-$\alpha$ transmission maps with coeval galaxy surveys provides a powerful insight into the galaxy-IGM connection at $z > 2$, and can improve the accuracy of the tomographic reconstruction of the underlying density field (Mukae et al. 2017; Momose et al. 2021; Mukae et al. 2020b; Liang et al. 2021; Horowitz et al. 2021).
A key component in all of these recent IGM tomography studies is the numerical simulation of the Ly-$\alpha$ transmitted flux; such simulations are used to translate the 3D reconstruction of the Ly-$\alpha$ transmitted flux into the underlying matter density. The most common approach used to create simulated Ly-$\alpha$ tomographic maps is to apply the fluctuating Gunn & Peterson (1965) approximation (FGPA) to the density field from large collisionless dark matter simulations (Stark et al. 2015; Lee et al. 2016; Newman et al. 2020). The FGPA assumes that the baryons trace the dark matter density field (modulo a correction for smoothing on the Jeans scale), that the neutral hydrogen is in photo-ionisation equilibrium with a spatially uniform UV background, and that there is a single, power-law relationship between the gas density and temperature (see e.g. Rauch 1998; Becker et al. 2015). These assumptions are usually excellent ones when modelling the diffuse IGM at low densities, $\Delta = \rho/\langle\rho\rangle \lesssim 10$. It is well known, however, that neutral hydrogen (and hence also the Ly-$\alpha$ transmission) is not a completely unbiased tracer of the underlying density field, particularly in highly overdense regions. More specifically, the FGPA will no longer hold for: (i) gas that is hot and predominantly collisionally ionised, either due to shocks from gravitational infall or energetic feedback from supernovae driven winds and/or black hole accretion, (ii) high density gas that is self-shielded to Lyman continuum photons, and (iii) local enhancements in the otherwise spatially uniform metagalactic UV background due to the presence of bright, rare sources (i.e. the proximity effect). Indeed, the presence of hot, highly ionised gas was suggested by Lee et al. (2016) as an explanation for the lack of a strong Ly-$\alpha$ transmission decrement associated with a galaxy overdensity in their tomographic maps at $z\simeq2.3$. Mukae et al. (2020a) also demonstrated a spatial offset of $\sim3$-$5h^{-1}\,\textrm{cMpc}$ between Ly-$\alpha$ emitting galaxies and the minimum Ly-$\alpha$ transmission in their tomographic reconstruction around the MAMMOTH-1 nebula. These authors suggested that local fluctuations in the ionising background may explain this offset, by changing the distribution of neutral hydrogen in the surrounding IGM (see also Momose et al. 2021, for a similar result obtained from the cross-correlation of galaxies and Ly-$\alpha$ tomographic maps).

$^{1}$ On analysing mock Ly-$\alpha$ forest spectra drawn through the $(80h^{-1}\,\textrm{cMpc})^{3}$ Sherwood simulation volume with an average transverse separation of $0.75h^{-1}\,\textrm{cMpc}$, only $\sim0.1$ per cent of sight-lines exhibited a CoSLA (Paper I).
In this paper we investigate this issue of "ionisation bias" further using state-of-the-art hydrodynamical simulations from the IllustrisTNG project. We explore how local ionisation variations in the IGM, due either to the presence of hot, collisionally ionised gas or to the QSO proximity effect, impact on the detectability of protoclusters with Ly-$\alpha$ forest tomography. Furthermore, we assess how these variations may affect the relationship between the Ly-$\alpha$ transmission and the distribution of coeval galaxies, and how the assumption of the FGPA may bias constraints on protocluster mass. The goal of this work is not to test the efficacy of tomographic reconstruction techniques; this is already discussed in the literature in some detail (see e.g. Stark et al. 2015; Horowitz et al. 2019; Porqueres et al. 2020; Li et al. 2021). In this work, rather than using a full forward model, we instead create idealised, noiseless Ly-$\alpha$ transmission maps by degrading our simulations to match the resolution of the tomographically reconstructed observations from Lee et al. (2018) and Newman et al. (2020). The advantage of this approach is that it allows us to isolate the effect of astrophysical systematics from any uncertainties associated with the reconstruction methodology.
In Section 2 we introduce the hydrodynamical simulations and local ionisation models used throughout this work, and then examine the expected Ly-$\alpha$ transmission profiles around dark matter haloes in Section 3. We discuss Ly-$\alpha$ transmission maps of protoclusters and their relationship with coeval Ly-$\alpha$ emitting galaxies in Section 4, and assess the role that local ionisation variations may play in Ly-$\alpha$ tomography measurements. Finally, we conclude in Section 5. Throughout this paper, we refer to comoving distance units using the prefix "c" and to proper distance units using the prefix "p".

Cosmological hydrodynamical simulations
In this work we shall primarily use the publicly available TNG100-1 simulation from the IllustrisTNG collaboration (Nelson et al. 2019). IllustrisTNG has been performed using the moving-mesh hydrodynamics code AREPO (Springel 2010), and is described in detail in a series of five introductory papers (Pillepich et al. 2018; Springel et al. 2018; Naiman et al. 2018; Marinacci et al. 2018; Nelson et al. 2018). We use three further IllustrisTNG models with different box sizes and mass resolutions (TNG100-2, TNG100-3 and TNG300-1) to assess the numerical convergence of our results (see Appendix A for further details).

In addition to the IllustrisTNG models, we also use the earlier Illustris-1 simulation (Vogelsberger et al. 2014; Nelson et al. 2015) to assess the effect of a different sub-grid physics model on our results. The key differences between IllustrisTNG and Illustris-1 are summarised in table 2 of Nelson et al. (2019). These include changes to the stellar and AGN feedback implementations, and the addition of ideal magneto-hydrodynamics in IllustrisTNG (Pakmor et al. 2011). There are also small differences in the ΛCDM cosmological parameters used in the two models. Importantly, however, the TNG100-1 initial conditions have the same random seed as Illustris-1, so we are able to directly compare the large scale structure of intergalactic gas in these models.

Table 1. Hydrodynamical simulations used in this work. The columns list, from left to right: the simulation name, the box size in $h^{-1}\,\textrm{cMpc}$, the total number of gas cells and dark matter particles, and the typical dark matter particle and gas cell masses. The IllustrisTNG simulations assume a Planck Collaboration et al. (2015) consistent cosmology, with $\Omega_{\rm m} = 0.3089$, $\Omega_\Lambda = 0.6911$, $\Omega_{\rm b} = 0.0486$, $\sigma_8 = 0.8159$, $n_{\rm s} = 0.9667$ and $h = 0.6774$. The cosmological parameters used in the Illustris-1 simulation instead take values consistent with WMAP-9 (Hinshaw et al. 2013).
All five of the simulations used in this work are summarised in Table 1. For each simulation we use the snapshots and halo catalogues at $z = 2.44$ and $z = 0$.

Protocluster identification
Throughout this paper we take advantage of the ability of simulations to connect physical structures at different instances in time. Following Paper I, we define protoclusters in the TNG100-1 model as the structures that form clusters with $M_{z=0} \geq 10^{14}\,M_\odot$ at redshift $z = 0$. We identify the simulation resolution elements that belong to these protoclusters as being all those within friends-of-friends haloes with $M_{z=0} \geq 10^{14}\,M_\odot$ at redshift $z = 0$. We then find these resolution elements at redshift $z = 2.44$ and use their positions to compute the centre of mass of each protocluster and the radial extent around the centre of mass, $R_{95}$, that contains 95 per cent of the protocluster's $z = 0$ mass. This procedure yields a total of 22 protoclusters in the TNG100-1 volume at $z = 2.44$.
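The centre of mass and $R_{95}$ computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline applied to the simulations: the function names, the plain-list inputs and the minimum-image treatment of the periodic box are all assumptions made here for clarity, and the sketch further assumes each protocluster spans less than half the box.

```python
import math

def minimum_image(delta, box_size):
    """Wrap a coordinate offset into [-box_size/2, box_size/2] (periodic box)."""
    return delta - box_size * round(delta / box_size)

def protocluster_centre_and_r95(positions, masses, box_size, mass_fraction=0.95):
    """Return the centre of mass and the radius enclosing `mass_fraction` of the mass.

    positions : list of (x, y, z) tuples in comoving units (hypothetical input format)
    masses    : list of resolution-element masses
    Assumes the structure spans less than half the box, so that unwrapping
    relative to the first element is unambiguous.
    """
    total_mass = sum(masses)
    reference = positions[0]
    centre = []
    for axis in range(3):
        # Unwrap every element relative to the reference before averaging.
        weighted_offset = sum(
            m * minimum_image(p[axis] - reference[axis], box_size)
            for p, m in zip(positions, masses)
        )
        centre.append((reference[axis] + weighted_offset / total_mass) % box_size)

    # Sort elements by distance from the centre and accumulate mass outwards.
    radii = sorted(
        (math.sqrt(sum(minimum_image(p[a] - centre[a], box_size) ** 2 for a in range(3))), m)
        for p, m in zip(positions, masses)
    )
    enclosed = 0.0
    for radius, mass in radii:
        enclosed += mass
        if enclosed >= mass_fraction * total_mass:
            return centre, radius
    return centre, radii[-1][0]
```

Unwrapping relative to a reference element keeps the centre of mass meaningful when a protocluster straddles a face of the periodic volume, where a naive average of the raw coordinates would land in the wrong place.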
An example of one such protocluster from the TNG100-1 simulation with $z = 0$ mass $M_{z=0} = 10^{14.46}\,M_\odot$ is displayed in the upper left and central panels of Fig. 1. A 2D projection of the logarithm of the normalised gas density, $\Delta = \rho/\langle\rho\rangle$, and the logarithm of the gas temperature, $T$, are shown within $\pm1h^{-1}\,\textrm{cMpc}$ of the protocluster centre of mass. The white dashed circle in each panel shows the radial extent of the protocluster, $R_{95}$. As discussed in detail in Paper I, a wide range of protocluster morphologies are expected using our protocluster definition, where typically $R_{95} = 5$-$10h^{-1}\,\textrm{cMpc}$. On average, the gas in protoclusters will exhibit slightly higher densities, temperatures and neutral hydrogen fractions compared to the surrounding IGM.

Local ionisation models
The primary focus of this work is assessing the impact that local variations in the IGM ionisation state may have on the identification of protoclusters using Ly-$\alpha$ absorption. We now turn to describing the three different IGM ionisation models we use for this purpose.
In our fiducial ionisation model we adopt a similar approach to Paper I and assume a spatially uniform UV background using the Faucher-Giguère (2020) synthesis model. For reference, the Faucher-Giguère (2020) model has an H I photo-ionisation rate $\Gamma_{\rm HI} = 9.76\times10^{-13}\rm\,s^{-1}$ at $z = 2.44$, which is consistent with independent constraints on $\Gamma_{\rm HI}$ from the Ly-$\alpha$ forest opacity (Becker & Bolton 2013). We calculate neutral hydrogen fractions in each cell of the simulation under the assumption of ionisation equilibrium by using the coupled equations given by Katz et al. (1996), after updating the recombination and collisional ionisation rates to match those used by Bolton et al. (2017). We also use the Rahmati et al. (2013) prescription for self-shielding to obtain the correct incidence of absorbers that are optically thick to Lyman continuum photons (i.e. for $N_{\rm HI} \geq 10^{17.2}\rm\,cm^{-2}$). We have already verified in Paper I (see fig. 1 in that work) that this procedure reproduces the shape of the observed H I column density distribution over the range $10^{12}\rm\,cm^{-2} \leq N_{\rm HI} \leq 10^{22}\rm\,cm^{-2}$ very well.
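To illustrate the kind of equilibrium calculation involved, the sketch below solves for the neutral fraction of a pure-hydrogen gas in which photo-ionisation and collisional ionisation balance recombination. It is a deliberately simplified stand-in for the coupled hydrogen and helium equations of Katz et al. (1996): helium and self-shielding are ignored, the function name is hypothetical, and the rate coefficients are passed in as arguments rather than taken from any particular fit.

```python
import math

def neutral_fraction(n_h, gamma_photo, alpha_rec, beta_coll):
    """Equilibrium H I fraction x = n_HI / n_H for a pure-hydrogen gas.

    Balances photo-ionisation plus collisional ionisation against recombination:
        x * (gamma_photo + beta_coll * n_e) = alpha_rec * (1 - x) * n_e,
    with n_e = (1 - x) * n_h. Helium and self-shielding are ignored here.

    n_h         : hydrogen number density [cm^-3]
    gamma_photo : photo-ionisation rate [s^-1]
    alpha_rec   : recombination coefficient [cm^3 s^-1]
    beta_coll   : collisional ionisation coefficient [cm^3 s^-1]
    """
    a = alpha_rec * n_h   # recombination rate scale [s^-1]
    b = beta_coll * n_h   # collisional ionisation rate scale [s^-1]
    # Quadratic (a + b) x^2 - (2a + b + gamma) x + a = 0; the smaller root lies in [0, 1].
    quad_a = a + b
    quad_b = 2.0 * a + b + gamma_photo
    disc = max(quad_b * quad_b - 4.0 * quad_a * a, 0.0)
    # Numerically stable form of the smaller root (avoids cancellation when x << 1).
    return 2.0 * a / (quad_b + math.sqrt(disc))

# Illustrative numbers only; the two rate coefficients are placeholders for hot gas,
# not values taken from the rate tables used in the paper.
n_h = 1.0e-3           # cm^-3
gamma_uvb = 9.76e-13   # s^-1, the uniform background value quoted in the text
alpha_hot = 1.0e-14    # cm^3 s^-1 (placeholder)
beta_hot = 1.0e-8      # cm^3 s^-1 (placeholder)

x_with_collisions = neutral_fraction(n_h, gamma_uvb, alpha_hot, beta_hot)
x_no_collisions = neutral_fraction(n_h, gamma_uvb, alpha_hot, 0.0)
```

Setting the collisional coefficient to zero always raises the neutral fraction in this toy model, which is the sense in which neglecting collisional ionisation overestimates the H I content of hot gas.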
In addition to our fiducial model, we investigate two further, alternative ionisation models. In the first we assume a spatially uniform UV background, but now ignore the effects of collisional ionisation and self-shielding on the neutral hydrogen fraction. We achieve this by setting the collisional ionisation rates to zero and neglecting the Rahmati et al. (2013) correction when calculating the H I fractions in each gas cell of the hydrodynamical simulations. Collisional ionisation will be particularly important for the ionisation state of gas around haloes, where gas is heated to $T > 10^{6}\rm\,K$ by gravitational infall and AGN or supernovae feedback (see e.g. the protocluster in the upper central panel of Fig. 1). Neglecting collisional ionisation in these hot, dense regions will result in an overestimate of the H I fraction, and hence an overestimate of the Ly-$\alpha$ optical depth associated with the gas. By contrast, ignoring self-shielding will instead result in an underestimate of the number of rare, high column density absorption systems with $N_{\rm HI} \geq 10^{17.2}\rm\,cm^{-2}$ that arise from cool, dense gas. We refer to this model as "No Collisional", shortened to NoCol, throughout this paper. The NoCol model is chosen to be similar (but not identical) to the fluctuating Gunn-Peterson approximation (FGPA) that has been commonly used in the recent literature to link the Ly-$\alpha$ optical depth to the underlying gas or dark matter density (e.g. Stark et al. 2015; Newman et al. 2020). The FGPA assumes photo-ionisation equilibrium in an IGM which follows a power-law temperature-density relation, $T = T_0\Delta^{\gamma-1}$, which is a good approximation only for gas with $\Delta \leq 10$ at $z\simeq2$ (see e.g. Rauch 1998).

Our second alternative ionisation model includes the effect of local enhancements in the IGM ionisation state due to quasars and active galactic nuclei (i.e. the proximity effect, Murdoch et al. 1986; Bajtlik et al. 1988). At the redshift we consider in this work, $z = 2.44$, the mean free path of Lyman continuum photons is $\sim300\rm\,pMpc$ (Worseck et al. 2014); on smaller scales the UV background is to a good approximation spatially uniform. However, the presence of active galactic nuclei (AGN) in close proximity to protoclusters could mean the background photo-ionisation rate is significantly enhanced on scales up to a few proper Mpc in the vicinity of the AGN.
Figure 1. Top: A series of 2D projections of the gas overdensity (left), gas temperature (centre) and the real space Ly-$\alpha$ forest transmission $F_{\rm real}$ (right, see text for details) for a protocluster with $z = 0$ mass $M_{z=0} = 10^{14.46}\,M_\odot$ in the TNG100-1 simulation at $z = 2.44$. The slices are projected over a distance of $2h^{-1}\,\textrm{cMpc}$ and are centred on the protocluster centre of mass. The dashed circle represents $R_{95}$ for the protocluster, while the yellow stars denote the locations of haloes within the slice that are populated with AGN in our local source ionisation model (see Section 2.3 for details). Bottom: Slices showing the difference in $F_{\rm real}$ between our fiducial model and a model with no collisional ionisation or self-shielding (left), a model with local enhancements in the IGM ionisation state due to the proximity effect from AGN (centre), and the Illustris-1 simulation, which uses a different sub-grid physics implementation compared to TNG100-1 (right). Here red represents a larger Ly-$\alpha$ transmission (less absorption) in the fiducial model, while blue represents a smaller transmission (more absorption).

We model the effect of a local enhancement in the ionisation level of neutral hydrogen following the simple model described in Bolton & Viel (2011). We populate the TNG100-1 simulation with AGN at $z = 2.44$ by requiring that the number of AGN in a comoving volume, $V$, equals $V$ times the integral of $\phi(M_{1450})$ over all AGN brighter than the minimum luminosity, where $\phi(M_{1450})$ is the AGN luminosity function from Kulkarni et al. (2019) at $z = 2.44$. We assume a minimum luminosity of $L_{\rm min} = 10^{43.2}\rm\,erg\,s^{-1}$, corresponding to an absolute AB magnitude $M_{1450} = -18$. We assign a luminosity, $L_{1450}$, to each AGN by Monte Carlo sampling the luminosity function from Kulkarni et al. (2019), and then populate the simulation by assigning AGN to haloes in a one-to-one rank order fashion, such that the most luminous AGN resides in the most massive halo (i.e. we effectively assume an AGN duty cycle of one). This yields 281 AGN within the TNG100-1 volume, with a median $M_{1450} = -18.9$ and a minimum (i.e. brightest) magnitude of $M_{1450} = -24.7$.
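The one-to-one rank ordering described above amounts to sorting both lists and pairing them off. The following sketch shows the idea; the function name and the list-based inputs are hypothetical, and the Monte Carlo sampling of the luminosity function is assumed to have been done already.

```python
def assign_agn_to_haloes(halo_masses, agn_magnitudes):
    """Rank-order match AGN to haloes: brightest AGN -> most massive halo.

    Returns a dict mapping halo index -> absolute magnitude M_1450.
    Haloes left over once the AGN run out receive no entry (unoccupied).
    """
    # Halo indices ordered from most to least massive.
    halo_order = sorted(range(len(halo_masses)), key=lambda i: halo_masses[i], reverse=True)
    # More negative magnitude = more luminous, so ascending order is brightest first.
    brightest_first = sorted(agn_magnitudes)
    return dict(zip(halo_order, brightest_first))
```

Because every sampled AGN is placed in a halo, this corresponds to a duty cycle of one; thinning the AGN list before matching would be one simple way to mimic a lower duty cycle.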
Next, for each AGN we assume the spectral energy distribution used by Kulkarni et al. (2019). We then compute the specific intensity, $J(\mathbf{r},\nu)$, of the ionising emission from the AGN on a $256^{3}$ grid, assuming each AGN emits isotropically and that the IGM is optically thin within the periodic simulation volume. Hence

$$J(\mathbf{r},\nu) = \frac{1}{4\pi}\sum_{i}\frac{L_{\nu,i}}{4\pi|\mathbf{r}_i-\mathbf{r}|^{2}}, \qquad (3)$$

where $L_{\nu,i}$ is the specific luminosity of the $i$th AGN and $|\mathbf{r}_i-\mathbf{r}|$ is the distance of the $i$th AGN from $\mathbf{r}$. Finally we compute the spatially varying photo-ionisation rate from the AGN by evaluating

$$\Gamma^{\rm AGN}_{\rm HI}(\mathbf{r}) = 4\pi\int_{\nu_{\rm HI}}^{\infty}\frac{J(\mathbf{r},\nu)}{h_{\rm P}\nu}\,\sigma_{\rm HI}(\nu)\,{\rm d}\nu, \qquad (4)$$

where $h_{\rm P}$ is Planck's constant, $\sigma_{\rm HI}(\nu)$ is the photo-ionisation cross-section from Verner et al. (1996) and $\nu_{\rm HI}$ is the frequency at the hydrogen Lyman limit. The photo-ionisation rate for each gas cell is then obtained by trilinear interpolation from the nearest points of the $256^{3}$ grid to the Voronoi cell centre. If the photo-ionisation rate from Eq. (4) exceeds the value from the Faucher-Giguère (2020) synthesis model at $z = 2.44$ in any given cell, we use the former to calculate the ionisation fraction. Throughout this work we shall refer to this as our "local sources" model, shortened to LoSo.

In Fig. 1, we perform an initial assessment of the effect of these ionisation models on the average Ly-$\alpha$ forest transmission. We consider a region of width $\Delta x = 2h^{-1}\,\textrm{cMpc}$ centred around the protocluster, and obtain an estimate of the real space transmitted flux, $F_{\rm real}$, from the column density, $N_{\rm HI}$, in each pixel following a similar approach to Kulkarni et al. (2015), in which $\Lambda_{\rm Ly\alpha} = 6.265\times10^{8}\rm\,s^{-1}$ is the Ly-$\alpha$ damping constant and $\lambda_{\rm Ly\alpha} = 1216$ Å. This approximation ignores the effect of peculiar velocities and thermal broadening on the Ly-$\alpha$ opacity, and as a consequence it does not provide an accurate value for the average Ly-$\alpha$ transmission along a given line of sight. However, it provides a convenient illustration of the relative effect of our ionisation models on the Ly-$\alpha$ opacity within protoclusters.
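Returning to the LoSo photo-ionisation rate, the optically thin, inverse-square calculation and the "use whichever rate is larger" rule can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used here: it adopts a single power-law ionising spectrum with a free index `alpha_uv`, replaces the Verner et al. (1996) cross-section with the approximate scaling $\sigma_{\rm HI}\propto\nu^{-3}$ so that the frequency integral is analytic, evaluates the rate at a single point instead of on a grid, and counts only the nearest periodic image of each AGN. All function and variable names are hypothetical.

```python
import math

SIGMA_LL = 6.30e-18      # cm^2, H I photo-ionisation cross-section at the Lyman limit
H_PLANCK = 6.626e-27     # erg s
CM_PER_MPC = 3.0857e24   # cm in one Mpc

def gamma_single_agn(l_nu_ll, distance_pmpc, alpha_uv=1.7):
    """Optically thin photo-ionisation rate [s^-1] from one isotropic source.

    l_nu_ll       : specific luminosity at the Lyman limit [erg s^-1 Hz^-1]
    distance_pmpc : proper distance from the source [pMpc], must be non-zero
    alpha_uv      : assumed ionising spectral index, L_nu proportional to nu^-alpha_uv
    With sigma(nu) proportional to nu^-3 (an approximation), the frequency
    integral reduces to a factor 1 / (alpha_uv + 3).
    """
    r_cm = distance_pmpc * CM_PER_MPC
    return l_nu_ll * SIGMA_LL / (4.0 * math.pi * r_cm ** 2 * H_PLANCK * (alpha_uv + 3.0))

def local_sources_gamma(point, agn_positions, agn_l_nu_ll, box_pmpc, gamma_background):
    """Sum the AGN contributions at `point`, then keep whichever of the AGN total
    or the uniform background rate is larger."""
    gamma_agn = 0.0
    for pos, lum in zip(agn_positions, agn_l_nu_ll):
        # Separation to the nearest periodic image of this AGN.
        sep2 = 0.0
        for axis in range(3):
            d = pos[axis] - point[axis]
            d -= box_pmpc * round(d / box_pmpc)
            sep2 += d * d
        gamma_agn += gamma_single_agn(lum, math.sqrt(sep2))
    return max(gamma_agn, gamma_background)
```

Taking the maximum rather than the sum of the local and background rates means that cells far from any AGN revert exactly to the uniform background model.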
In the top right panel of Fig. 1 we show $F_{\rm real}$ for the example TNG100-1 protocluster at $z = 2.44$ in the fiducial ionisation model, whilst in the lower panels we show the difference in $F_{\rm real}$ between the fiducial model and the NoCol model (left), LoSo model (centre), and the same region in the Illustris-1 simulation (right). In the NoCol model there is a decrease in the transmission from the filaments within the protocluster, leading to relatively more transmission in the fiducial model. The filaments (which are most apparent in the upper panels of Fig. 1) span up to $\sim10h^{-1}\,\textrm{cMpc}$ in length, with widths on the order of $\sim100h^{-1}\,\textrm{ckpc}$, and consist of overdense gas ($\Delta\sim10$-$100$) at high temperatures ($T\sim10^{5}$-$10^{7}\rm\,K$). This corresponds to gas that has been heated by shocks and outflows and is therefore collisionally ionised in the fiducial model. Hence, we expect that ignoring hot, collisionally ionised gas will underestimate the Ly-$\alpha$ transmission from the gas in protoclusters. By contrast, in the LoSo model there is an increase in the Ly-$\alpha$ transmission in the protocluster relative to the fiducial model, with a magnitude that decreases radially and is generally more pronounced in cooler, less dense regions where photo-ionisation dominates. Finally, comparing the different sub-grid physics implementation used in Illustris-1 to the fiducial TNG100-1 model (see Section 2.1 for further details), we find the variation in transmission along the filaments of the cosmic web is more complex. Once again, these differences are driven primarily by changes in the thermal and ionisation state of the hydrogen gas.$^{2}$ Note the transmission from the low density IGM with $\Delta\lesssim1$ remains unchanged, however, as the gas in voids is largely unaffected by shocks, AGN or supernovae driven winds at $z = 2.44$ (e.g. Theuns et al. 2002; Viel et al. 2013).

Mock Ly-$\alpha$ absorption spectra
In the remainder of this work we will analyse the Ly-$\alpha$ absorption associated with protoclusters using simulated Ly-$\alpha$ forest spectra. We again follow the procedure described in Paper I, which we briefly repeat here. Mock Ly-$\alpha$ absorption spectra are extracted from the simulations by assigning each Voronoi cell a smoothing length, $h_i$, based on the cell volume, $V_i$, such that

$$h_i = \left(\frac{3N_{\rm sph}V_i}{4\pi}\right)^{1/3}.$$

We assume $N_{\rm sph} = 64$ for the number of smoothing neighbours. The interpolation scheme described by Theuns et al. (1998) is then used to extract Ly-$\alpha$ optical depths using the Voigt profile approximation from Tepper-García (2006). Unless otherwise stated, we also rescale the optical depths of each pixel in our mock spectra by a constant to match observational constraints on the Ly-$\alpha$ forest effective optical depth, $\tau_{\rm eff} = -\ln\langle F\rangle = 0.20$ at $z = 2.4$. The transmitted flux in each pixel is then given by $F = e^{-\tau}$, and we define the transmitted flux contrast, $\delta_F$, as the relative transmission, averaged over some velocity window of width $\Delta v$, around the IGM mean value:

$$\delta_F = \frac{\langle F\rangle_{\Delta v}}{\langle F\rangle} - 1.$$

A negative (positive) value of $\delta_F$ thus represents a decrease (increase) in the Ly-$\alpha$ transmission relative to the mean transmitted flux, $\langle F\rangle$, of the IGM.
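The optical depth rescaling and the flux contrast can be illustrated with a short sketch. It assumes the standard form $\delta_F = \langle F\rangle_{\Delta v}/\langle F\rangle - 1$, consistent with the sign convention stated above, and finds the rescaling constant by bisection; the function names are hypothetical, and the choice of bisection is ours rather than a statement about how the rescaling is done in practice.

```python
import math

def mean_flux(taus, scale=1.0):
    """Mean transmitted flux <exp(-scale * tau)> over all pixels."""
    return sum(math.exp(-scale * t) for t in taus) / len(taus)

def rescale_factor(taus, tau_eff_target, tol=1e-10):
    """Constant A such that -ln <exp(-A * tau)> equals tau_eff_target (bisection)."""
    if max(taus) <= 0.0:
        raise ValueError("need at least one pixel with positive optical depth")
    target = math.exp(-tau_eff_target)
    lo, hi = 0.0, 1.0
    # The mean flux falls monotonically with A, so grow the bracket until it
    # encloses the target value.
    while mean_flux(taus, hi) > target:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if mean_flux(taus, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def flux_contrast(window_fluxes, mean_flux_igm):
    """delta_F = <F>_window / <F>_IGM - 1 for the pixels inside one velocity window."""
    return sum(window_fluxes) / len(window_fluxes) / mean_flux_igm - 1.0
```

In use, `rescale_factor` would be evaluated once over all pixels in the mock spectra, after which each pixel's flux is `exp(-A * tau)` and `flux_contrast` is applied window by window.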

Ly-$\alpha$ ABSORPTION PROFILES AROUND HALOES
We perform a consistency test of our mock Ly-$\alpha$ absorption spectra in Fig. 2, where we show the transmitted flux contrast for our different ionisation models around haloes in three bins of halo mass, $M_{\rm h}$: $M_{\rm h} \geq 10^{12.8}\,M_\odot$ (left), $10^{12.4}\,M_\odot \leq M_{\rm h} < 10^{12.8}\,M_\odot$ (centre) and $10^{12.1}\,M_\odot \leq M_{\rm h} < 10^{12.4}\,M_\odot$ (right). We select the mock spectra using a grid of sight-lines running the length of the simulation box in all three cardinal directions, with a mean transverse separation of $1.96h^{-1}\,\textrm{cMpc}$. We then calculate the mean transmission within a velocity window of $2000\rm\,km\,s^{-1}$, and bin the transmission in terms of the halo impact parameter, $b$. The results are compared to observational measurements of $\delta_F$ around QSOs from Mukae et al.  Sorini et al. (2018).
Several earlier studies have already discussed the level of agreement between hydrodynamical simulations and observations of the neutral hydrogen distribution around QSOs (e.g. Fumagalli et al. 2014; Rahmati et al. 2015; Faucher-Giguère et al. 2016; Meiksin et al. 2017; Sorini et al. 2020; Nagamine et al. 2021). In general, differences in stellar and AGN feedback implementations, halo mass and numerical resolution all play an important role. The differences we find here are consistent with earlier work, where the relative transmission at small scales, $b < 0.5h^{-1}\,\textrm{cMpc}$, in Illustris-1 (blue curves) and TNG100-1 (black curves) is larger than the observed relative transmission. The relative difference between these two models, particularly in the $\geq10^{12.8}\,M_\odot$ bin, is most likely associated with the more aggressive AGN feedback implementation within Illustris-1, which leads to more hot, collisionally ionised gas.
Recently, however, Sorini et al. (2020)  various ionisation models. In Fig. 2 we find these differences are largest for the highest mass haloes, with masses $\geq10^{12.8}\,M_\odot$. In general, the NoCol and LoSo models show less and more Ly-$\alpha$ transmission relative to the fiducial TNG100-1 model, respectively. In the NoCol model (orange curves) this is due to neglecting collisional ionisation from hot circumgalactic gas, where in general the temperature and physical extent of the hot gas increase with halo mass. Interestingly, the NoCol model predicts too little transmission in the highest mass bin relative to the Prochaska et al. (2013) measurements, suggesting that collisional ionisation (and hence gas temperature) plays an important role in setting the Ly-$\alpha$ transmission at $b < 1h^{-1}\,\textrm{cMpc}$ (see also Sorini et al. 2018).
The increased transmission in the LoSo model (green curves) due to enhanced ionisation by the proximity effect is also most pronounced in the $\geq10^{12.8}\,M_\odot$ bin, as these haloes are populated with the highest luminosity AGN in our model.$^{3}$ Note, however, that in contrast to the NoCol case (orange curves), the differences between the LoSo (green curves) and the fiducial model (black curves) are largest at $1h^{-1}\,\textrm{cMpc} \leq b \leq 5h^{-1}\,\textrm{cMpc}$. This is because the enhanced photo-ionisation rate only begins to dominate over collisional ionisation at $b \gtrsim 1h^{-1}\,\textrm{cMpc}$. By contrast, the haloes in the lower two mass bins either host fainter AGN or are unoccupied. As a result, the local source model does not have a significant effect on the Ly-$\alpha$ transmission profiles for haloes with masses $<10^{12.8}\,M_\odot$. Note, however, that we have deliberately adopted a model that maximises the proximity effect around the most massive haloes, and adopting a duty cycle $f_{\rm duty} < 1$ (e.g. Shankar et al. 2010) would push these AGN into lower mass hosts. Finally, in the $\geq10^{12.8}\,M_\odot$ bin the LoSo model is in slightly better agreement with the Mukae et al. (2020a) data at $b < 5h^{-1}\,\textrm{cMpc}$, although due to the large error bars the significance is not high. This appears to be consistent with the interpretation advanced by Mukae et al. (2020a) that the MAMMOTH1-QSO tomographic map exhibits a QSO proximity zone.

$^{3}$ If the AGN emission is preferentially beamed along the line of sight rather than in the transverse direction, our isotropic emission model will overestimate the impact of the proximity effect on the transmission profile. Similarly, non-equilibrium photo-ionisation and light travel time effects due to flickering AGN emission may also result in gas that is less highly ionised.
Since the LoSo and NoCol models effectively bracket the plausible range in the Ly-$\alpha$ transmission profiles, we proceed to investigate the effect these models have on the Ly-$\alpha$ transmission associated with protoclusters in TNG100-1. The different sub-grid physics implementation in Illustris-1 sits between the extremes explored by these models, and so we do not investigate it further.

Smoothed Ly-$\alpha$ forest transmission maps
We now turn to investigate how the Ly-$\alpha$ transmission around protoclusters is altered by changes in the local ionisation state of the IGM. As already discussed, we do not create the Ly-$\alpha$ transmission maps by forward modelling the observational data (e.g. Stark et al. 2015). Instead, we use the noiseless spectra drawn from the simulations to create idealised maps of the relative transmission, $\delta_F$, around each of the 22 protoclusters in the TNG100-1 volume. We then degrade these maps to match the final resolution of the observational data presented by Lee et al. (2018) and Newman et al. (2020) by smoothing with a Gaussian filter. Our results will therefore not capture the effect of any systematic uncertainties associated with the accuracy of tomographic reconstruction techniques, or the signal-to-noise properties of the data.

Figure 3. Positive (negative) values of $\delta_F/\sigma$ correspond to Ly-$\alpha$ transmission that is larger (smaller) relative to the average for the IGM. The maps are obtained by averaging the Ly-$\alpha$ forest over a $\Delta v = 1000\rm\,km\,s^{-1}$ window, smoothing in the transverse direction with a Gaussian filter with standard deviation $4h^{-1}\,\textrm{cMpc}$, and then centring on velocity windows $\Delta v = \pm500\rm\,km\,s^{-1}$ where $\delta_F/\sigma$ is minimised within $R_{95}$ for each protocluster. Each row shows a different protocluster for the fiducial (left), no collisional (centre) and local sources (right) ionisation models. On top of each map we display the locations of coeval Ly-$\alpha$ emitting galaxies that satisfy the criteria $L_{\rm Ly\alpha} > 10^{41.5}\rm\,erg\,s^{-1}$ and ${\rm EW}_{\rm Ly\alpha} > 15$ Å (grey filled circles) using a simple empirical model (see text for details). The grey contours show the logarithm of the LAE overdensity, $\log(1+\delta_{\rm LAE}) = \log(\rho_{\rm LAE}/\langle\rho_{\rm LAE}\rangle)$, obtained from the distance to the 5th nearest neighbour, in increments of 0.2 dex. In the right column, the yellow stars show the locations of AGN in the local sources model, with sizes scaled according to their luminosity. Red crosses display the locations of coherently strong Ly-$\alpha$ absorption systems (CoSLAs). The dashed black circle shows the 2D cross-section of the sphere of radius $R_{95}$ (the radius that contains 95 per cent of the $z = 0$ mass, $M_{z=0}$) that intersects the velocity window for each protocluster. From top to bottom, the selected protoclusters have $M_{z=0} = 10^{14.18}\,M_\odot$, $M_{z=0} = 10^{14.46}\,M_\odot$ and $M_{z=0} = 10^{14.43}\,M_\odot$. Note the protocluster in the middle row is also shown in Fig. 1.
We first extract spectra in a $90\times90$ grid in a $15^2\,h^{-2}\,\textrm{cMpc}^2$ area centred on each protocluster's centre of mass, following the procedure described in Section 2.4. We then construct Ly-$\alpha$ transmission maps by obtaining the average Ly-$\alpha$ transmission over velocity windows of width $\Delta v = 1000\rm\,km\,s^{-1}$, and then smoothing the relative transmission, $\delta_F$, in the transverse direction using a Gaussian with standard deviation $4\,h^{-1}\,\textrm{cMpc}$. The velocity window, $\Delta v$, is chosen to match the full width at half maximum of the Gaussian filter at $z = 2.44$. The Gaussian filter width matches the transverse smoothing scale applied in the Ly-$\alpha$ Tomography IMACS Survey (LATIS, Newman et al. 2020) and the COSMOS Ly-$\alpha$ Mapping and Observations survey (CLAMATO, Lee et al. 2018). Finally, we normalise each transmission map by the standard deviation, $\sigma$, of $\delta_F$ obtained from the full simulation volume. We obtain $\sigma = 0.076$, $\sigma = 0.081$ and $\sigma = 0.070$ for the fiducial, NoCol and LoSo models, respectively. The standard deviation is slightly increased in the NoCol model with respect to fiducial, because ignoring collisional ionisation increases the neutral hydrogen fraction in dense, hot gas, thus increasing the amount of strong Ly-$\alpha$ absorption. Conversely, the standard deviation is reduced relative to fiducial in our LoSo model, as the ionising sources (AGN) are placed into high density regions, thus reducing the incidence of strong Ly-$\alpha$ absorption.
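The map construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for this work: the function name `transmission_map`, the array layout (two transverse axes followed by a velocity axis), the `mode="nearest"` boundary handling and the example pixel size are all assumptions made for the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def transmission_map(flux, mean_flux, pixel_size, sigma_norm, sigma_smooth=4.0):
    """Return a smoothed delta_F / sigma map (hypothetical helper).

    flux         : array (ny, nx, nv), transmitted flux inside one velocity window
    mean_flux    : mean IGM transmitted flux used as the reference level
    pixel_size   : transverse pixel size, in the same units as sigma_smooth
    sigma_norm   : standard deviation of delta_F used for the normalisation
    sigma_smooth : standard deviation of the transverse Gaussian filter
    """
    # Average the transmission over the velocity window for every sight line.
    window_mean = flux.mean(axis=-1)
    # Relative transmission: negative values mean less transmission than average.
    delta_f = window_mean / mean_flux - 1.0
    # Transverse Gaussian smoothing; the filter width is converted to pixels.
    # The boundary mode is an assumption of this sketch.
    smoothed = gaussian_filter(delta_f, sigma=sigma_smooth / pixel_size, mode="nearest")
    # Normalise by the supplied standard deviation.
    return smoothed / sigma_norm


# Illustrative call on a synthetic cube: a uniform field with one absorbing patch.
cube = np.full((90, 90, 20), 0.8)
cube[40:50, 40:50, :] = 0.4
example_map = transmission_map(cube, mean_flux=0.8, pixel_size=15.0 / 90, sigma_norm=0.076)
```

Passing `sigma_norm` explicitly keeps the normalisation tied to a value measured over a larger volume, rather than to the scatter within a single small map.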
The resulting Ly-$\alpha$ transmission maps for three different protoclusters are shown in each row of Fig. 3. The different local ionisation models for the protoclusters are displayed in each column, with the yellow stars in the right column showing the location of coeval AGN in the local sources model. The three protoclusters have been selected to show: the region containing the most massive halo (and hence also the brightest AGN) in the TNG100-1 simulation (upper row, $M_{z=0} = 10^{14.18}\,M_\odot$), a region where the Ly-$\alpha$ transmission within $R_{95}$ is higher than average (middle row, $M_{z=0} = 10^{14.46}\,M_\odot$) and a region that is representative of the average Ly-$\alpha$ transmission associated with a protocluster in TNG100-1 (lower row, $M_{z=0} = 10^{14.43}\,M_\odot$). Note the protocluster in the middle row is also displayed in Fig. 1. The positions of the maps are selected using the velocity window within $R_{95}$ where $\delta_F/\sigma$ is minimised, similar to how these structures are identified within observed tomographic maps.
In each map we also mark the locations of individual sight lines that contain coherently strong Ly-$\alpha$ absorption systems (CoSLAs) using red crosses. Following Cai et al. (2017), CoSLAs are defined as sight lines that exhibit a fluctuation in the Ly-$\alpha$ forest effective optical depth, $\delta_{\tau_{\rm eff}} > 3.5$, over a scale of $15\,h^{-1}\,\textrm{cMpc}$, after excluding any Ly-$\alpha$ absorbers with damping wings, $N_{\rm HI} \geq 10^{19}\rm\,cm^{-2}$ (see also Paper I for further details). This allows us to assess how CoSLAs are distributed relative to the Ly-$\alpha$ transmission maps.
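As a rough illustration of this selection, the sketch below flags a single sight-line segment as a CoSLA. It assumes the effective optical depth is $\tau_{\rm eff} = -\ln\langle F\rangle$ measured over the $15\,h^{-1}\,\textrm{cMpc}$ segment and that the fluctuation is $\tau_{\rm eff}/\langle\tau_{\rm eff}\rangle - 1$; these definitions, and the function and argument names, are assumptions of the sketch and are not taken from the text (see Paper I for the definition actually used).

```python
import numpy as np


def is_cosla(flux_segment, mean_tau_eff, max_column_density,
             threshold=3.5, damped_limit=1e19):
    """Flag a sight-line segment as a CoSLA candidate (hypothetical helper).

    flux_segment       : 1D transmitted flux over a 15 h^-1 cMpc stretch
    mean_tau_eff       : mean effective optical depth of the IGM
    max_column_density : largest HI column density in the segment [cm^-2]
    """
    # Segments containing a damped absorber are excluded from the sample.
    if max_column_density >= damped_limit:
        return False
    # Assumed definition: effective optical depth from the mean flux.
    tau_eff = -np.log(np.mean(flux_segment))
    # Assumed definition: fractional fluctuation about the mean.
    delta_tau = tau_eff / mean_tau_eff - 1.0
    return bool(delta_tau > threshold)
```

The damped-absorber cut is applied first because, as the comparison of ionisation models in this section shows, whether an absorber counts as damped can change the CoSLA classification.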
Lastly, we also use a simple model based on empirically derived scaling relations to display the locations of coeval Ly-$\alpha$ emitting galaxies (grey circles). The Ly-$\alpha$ luminosities and equivalent widths for the galaxies were estimated using the stellar mass and star formation rate (SFR) for each sub-halo in TNG100-1. We convert the instantaneous SFR into a luminosity at 1216 Å using the relation from Dijkstra (2017), where $C_{\rm Ly\alpha} = 0.445$ (Hayes et al. 2011). The dust attenuation at Ly-$\alpha$ wavelengths is derived using the relation between extinction at H$\alpha$ wavelengths and stellar mass derived by Garn & Best (2010), and then converting to Ly-$\alpha$ using the Calzetti et al. (2000) dust law. We also estimate the rest frame equivalent width in Angstroms, $\textrm{EW}_{\rm Ly\alpha}$, for each Ly-$\alpha$ emitter (LAE) using the relation $\textrm{EW}_{\rm Ly\alpha} = f^{\rm Ly\alpha}_{\rm esc}/0.0048$ from Sobral & Matthee (2019). The LAEs displayed in the maps are selected by requiring $L_{\rm Ly\alpha} > 10^{41.5}\rm\,erg\,s^{-1}$ and $\textrm{EW}_{\rm Ly\alpha} > 15\,$Å (e.g. Shimakawa et al. 2017). The grey dashed contours correspond to the logarithm of the LAE overdensity, $\log(1+\delta_{\rm LAE}) = \log(\rho_{\rm LAE}/\bar{\rho}_{\rm LAE})$, determined by a fifth nearest neighbour algorithm. We note, however, that this simple model does not include a self-consistent coupling between the visibility of the Ly-$\alpha$ emission line and the Ly-$\alpha$ opacity of the intervening circumgalactic medium (CGM) or IGM in the TNG100-1 simulation. It furthermore does not follow the complex Ly-$\alpha$ radiative transfer within the interstellar medium of the galaxies (e.g. Laursen et al. 2011; Gurung-López et al. 2020). As such, while the model is consistent with average LAE properties by design, it may still underestimate the variation in $L_{\rm Ly\alpha}$ for a given stellar mass.
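The selection cuts and the nearest-neighbour overdensity can be illustrated with the short sketch below. It takes the Ly-$\alpha$ luminosity and escape fraction as given inputs and does not reproduce the SFR and dust relations. The function names, the use of projected (2D) positions, the surface density estimator $k/(\pi d_k^2)$ and the externally supplied mean density are all assumptions, since the text does not specify these details.

```python
import numpy as np
from scipy.spatial import cKDTree


def select_laes(l_lya, f_esc, l_min=10**41.5, ew_min=15.0):
    """Boolean mask of galaxies passing the luminosity and equivalent width cuts."""
    # Rest-frame equivalent width in Angstroms from the escape fraction.
    ew = f_esc / 0.0048
    return (l_lya > l_min) & (ew > ew_min)


def knn_log_overdensity(xy, mean_density, k=5):
    """log10(1 + delta) from the distance to the k-th nearest neighbour.

    xy           : array (n, 2) of projected positions (assumed 2D here)
    mean_density : reference surface density in matching units (assumed input)
    """
    tree = cKDTree(xy)
    # Query k + 1 neighbours because each point's nearest match is itself.
    distances, _ = tree.query(xy, k=k + 1)
    d_k = distances[:, -1]
    # Assumed estimator: k objects inside a circle of radius d_k.
    density = k / (np.pi * d_k**2)
    return np.log10(density / mean_density)
```

For example, `select_laes(np.array([1e42]), np.array([0.1]))` keeps a galaxy with an equivalent width of about 21 Å, whereas an escape fraction of 0.05 (about 10 Å) would fail the cut.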
We first consider the fiducial ionisation model, displayed in the left column of Fig. 3. There is an anti-correlation between the LAE density and $\delta_F/\sigma$ for all three protoclusters, and any CoSLAs are typically situated where the galaxy clustering is strongest. The maps generally exhibit a smaller $\delta_F/\sigma$ (less Ly-$\alpha$ transmission) where the LAE density is largest. However, for the protocluster displayed in the top row of Fig. 3 there is an offset between where the LAEs are most strongly clustered, around a massive halo with mass $10^{13.5}\,M_\odot$ at $(x, y)\simeq(-4, 0)\,h^{-1}\,\textrm{cMpc}$, and the largest Ly-$\alpha$ transmission decrement at $(x, y)\simeq(3, -6)\,h^{-1}\,\textrm{cMpc}$. This is qualitatively similar to the observation from Lee et al. (2016), where no strong Ly-$\alpha$ transmission decrement was detected around a galaxy overdensity in their CLAMATO tomographic maps. These authors speculated that higher gas temperatures due to shocks or feedback may play a role in ionising gas and hence suppressing Ly-$\alpha$ absorption in the vicinity of galaxy overdensities. This is indeed the case for the example here; the gas around the massive halo at $(x, y)\simeq(-4, 0)\,h^{-1}\,\textrm{cMpc}$ has been heated to $T > 10^{6}\rm\,K$ and is therefore highly ionised, whereas the IGM associated with the Ly-$\alpha$ transmission decrement in the lower right of the map is significantly cooler, with $T < 10^{5}\rm\,K$. The protocluster displayed in the middle row (see also the same object in Fig. 1) exhibits more Ly-$\alpha$ transmission compared to the other protoclusters for a similar reason; in addition to the presence of larger underdensities within $R_{95}$ in this protocluster, there is an extended region of $T > 10^{6}\rm\,K$ gas around the protocluster centre of mass that further increases the Ly-$\alpha$ transmission.
In the central column of Fig. 3 we show the smoothed Ly-$\alpha$ transmission maps for the same three protoclusters, but now using the NoCol ionisation model. As expected, all three protoclusters exhibit smaller values of $\delta_F/\sigma$ where the LAE density is largest, but there is no significant change in $\delta_F/\sigma$ where the LAE density is lower. This is because hot, collisionally ionised gas is found around massive haloes and filaments, and this is the environment where most of the LAEs reside in our model. Another striking feature of the NoCol ionisation models is that they contain a much higher incidence of CoSLAs (red crosses). There are two reasons for this. The first is that ignoring collisional ionisation produces larger H I fractions, and hence stronger Ly-$\alpha$ absorption. The second is that in the NoCol model we also neglect the effect of self-shielding to Lyman continuum photons on the Ly-$\alpha$ absorption. This means that strong Ly-$\alpha$ absorbers with column densities $N_{\rm HI} > 10^{19}\rm\,cm^{-2}$ in the fiducial model (i.e. damped systems) are over-ionised and have lower column densities in the NoCol model. Hence, damped absorption systems that are excised when selecting the CoSLA sample in the fiducial model are erroneously classified as lower column density CoSLAs in the NoCol model. As discussed in Paper I, this highlights the importance of correctly modelling high column density absorbers when simulating the incidence of coherent Ly-$\alpha$ systems.
Finally, in the right column of Fig. 3 we show the smoothed Ly-$\alpha$ forest transmission maps for the LoSo ionisation model. The AGN positions are marked in the maps with star symbols. Due to the proximity effect, all three protoclusters exhibit larger $\delta_F/\sigma$ (more transmission) in comparison to the fiducial and NoCol models. The greatest increase in $\delta_F/\sigma$ occurs where the LAE density is largest, but there is also a small increase in $\delta_F/\sigma$ at lower densities. This can be further understood from the halo profiles in Fig. 2, where the brightest AGN in the model can ionise their surroundings up to $\sim 5\,h^{-1}\,\textrm{cMpc}$ from the centre of their host halo. The largest proximity zone in the simulation volume is shown in the upper right panel of Fig. 3, where the massive halo at $(x, y)\simeq(-4, 0)\,h^{-1}\,\textrm{cMpc}$ hosts an AGN with $M_{1450} = -24.7$. This further enhances the existing spatial offset between the largest LAE density and the weakest Ly-$\alpha$ transmission/strongest Ly-$\alpha$ absorption. A qualitatively similar observational result, but on much larger scales of $40\,h^{-1}\,\textrm{cMpc}$, has been reported by Mukae et al. (2020a), who find an H I underdensity in the CLAMATO tomographic maps associated with a LAE overdensity. These authors suggest this is due to the enhanced ionisation of the IGM by multiple nearby QSO proximity regions.
Due to the increased level of ionisation around massive haloes, the LoSo model also has a slightly reduced incidence of CoSLAs in comparison to the fiducial model. Interestingly, however, there are a few cases in which a CoSLA is present in the LoSo model but missing in the fiducial model. An example of this can be seen in the central region of the protocluster in the middle row of Fig. 3. This is the result of absorption systems that are classified as damped ($N_{\rm HI} > 10^{19}\rm\,cm^{-2}$) in the fiducial model and are thus rejected when selecting CoSLAs, but instead correspond to lower column density absorbers in the LoSo model. The column densities of the damped absorbers in the fiducial model are reduced due to the proximity effect, and these regions are then classified as CoSLAs.

Protocluster masses and the correlation between LAEs and Ly-$\alpha$ transmission in smoothed maps
It is apparent from our qualitative discussion of the IGM transmission maps in Section 4.1 that local ionisation plays an important role in the correlation between Ly-$\alpha$ transmission, coeval galaxies and the distribution of coherent Ly-$\alpha$ absorption systems within individual protoclusters. We now consider how local ionisation variations impact two quantities derived from tomographic maps: estimates of the $z = 0$ protocluster mass, $M_{z=0}$ (Lee et al. 2016; Newman et al. 2020), and the correlation between LAE overdensity and Ly-$\alpha$ transmission (Mukae et al. 2017, 2020a; Liang et al. 2021). We first examine the relationship between the minimum relative transmission within the protocluster, $(\delta_F/\sigma)_{\rm min}$, and the $z = 0$ mass of the clusters, $M_{z=0}$, in Fig. 4. Candidate protoclusters are identified in both the CLAMATO and LATIS tomographic surveys by applying $(\delta_F/\sigma)$ thresholds to the smoothed Ly-$\alpha$ transmission maps. Newman et al. (2020) define their matter overdensity/protocluster candidates as regions with $\delta_F/\sigma < -2.35$ in the LATIS survey, whilst for the CLAMATO survey Lee et al. (2016) use a more conservative value of $\delta_F/\sigma < -3$.
The left panel of Fig. 4 shows $(\delta_F/\sigma)_{\rm min}$ obtained from our smoothed transmission maps, centred on the velocity window containing $(\delta_F/\sigma)_{\rm min}$ within $R_{95}$ for each protocluster. We find the wide variety of protocluster morphologies (see also Paper I) means the minimum transmission can be located anywhere up to $\sim 10\,h^{-1}\,\textrm{cMpc}$ from the true protocluster centre of mass. For comparison, the right panel shows $(\delta_F/\sigma)_{\rm min}$ when the maps for all models are instead centred at the location of $(\delta_F/\sigma)_{\rm min}$ in the fiducial model. Additionally, in this case we use $\sigma = \sigma_{\rm fid}$ for all three models. This allows us to focus on how the ionisation models affect $\delta_F$ in the same physical region, as opposed to selecting $(\delta_F/\sigma)_{\rm min}$ in a way that mimics the observations. On average, the changes in $(\delta_F/\sigma)_{\rm min}$ for the different local ionisation models are small in the left hand panel of Fig. 4, with the largest differences occurring for the protocluster with $M_{z=0} = 10^{14.18}\,M_\odot$ that harbours the brightest AGN/most massive halo in the TNG100-1 volume (see the upper panels of Fig. 3). This suggests that smoothing Ly-$\alpha$ tomographic maps on $\sim 4\,h^{-1}\,\textrm{cMpc}$ scales should help mitigate the possible bias in inferred cluster masses due to local ionisation variations, as well as optimising protocluster detectability (Stark et al. 2015). This is furthermore consistent with our earlier finding that the largest differences in the local ionisation models occur on scales $< 1\,h^{-1}\,\textrm{cMpc}$ (see Fig. 2). For comparison, we find that when selecting the same physical locations in the transmission maps and fixing $\sigma = \sigma_{\rm fid}$ (right hand panel), in almost all cases $(\delta_F/\sigma)_{\rm min}$ is largest in the LoSo model and lowest in the NoCol model, as one would naively expect. However, the differences between the ionisation models are again modest for most protoclusters.
From the left hand panel of Fig. 4, the protocluster completeness for the selection thresholds $(\delta_F/\sigma) < -2.35$ ($< -3$) in the fiducial model is 86 (36) per cent, and this remains similar at 86 (41) per cent and 82 (32) per cent for the NoCol and LoSo models, respectively. However, we find the best fit linear relation between $(\delta_F/\sigma)_{\rm min}$ and $M_{z=0}$ is slightly shallower compared to the relationship obtained by Lee et al. (2016) from collisionless cosmological simulations (red dashed line in Fig. 4). Our best fit relation to the fiducial model is $M_{z=0} = 10^{11.9-0.89(\delta_F/\sigma)_{\rm min}}\,M_\odot$, shown by the blue dashed line in the left panel of Fig. 4. This implies that, for a given $(\delta_F/\sigma)$, our fiducial model will favour larger $z = 0$ masses for the most massive candidate protoclusters compared to the Lee et al. (2016) calibration, possibly as a result of including hot gas with $T > 10^{6}\rm\,K$ from shocks and AGN feedback (see also fig. 6 in Lee et al. 2016 and the related discussion). We caution, however, that the relatively small TNG100-1 box means we also have a much smaller sample of $M_{z=0} > 10^{14}\,M_\odot$ protoclusters compared to Lee et al. (2016), who use a collisionless dark matter simulation with box size $256\,h^{-1}\,\textrm{cMpc}$. Note also that in this work we analyse idealised transmission maps, and we have not performed a tomographic reconstruction of the Ly-$\alpha$ forest transmission using noisy data.
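The arithmetic behind the calibration and the completeness figures can be checked with a small sketch. The coefficients 11.9 and $-0.89$ are the fiducial best fit quoted above; the function names and the input values in the usage note are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np


def log_mass_from_transmission(delta_f_sigma_min, intercept=11.9, slope=-0.89):
    """log10 of the z = 0 mass (in solar masses) from the fiducial best-fit relation."""
    return intercept + slope * delta_f_sigma_min


def completeness(delta_f_sigma_min, threshold):
    """Fraction of protoclusters whose minimum transmission falls below a threshold."""
    values = np.asarray(delta_f_sigma_min, dtype=float)
    return float(np.mean(values < threshold))
```

Evaluating the relation at the two survey thresholds gives $\log_{10}(M_{z=0}/M_\odot)\approx13.99$ at $(\delta_F/\sigma)_{\rm min}=-2.35$ and $\approx14.57$ at $-3$. For a sample of 22 objects, a completeness of 86 per cent corresponds to 19 objects passing the threshold ($19/22\approx0.864$) and 36 per cent to 8 objects ($8/22\approx0.364$).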
In the smoothed transmission maps in Fig. 3 we also observed an anti-correlation between the LAE overdensity, $\delta_{\rm LAE}$, and the relative transmission, $\delta_F/\sigma$. In Fig. 5 we examine this further by showing the relationship between $\delta_F/\sigma$ and $\delta_{\rm LAE}$ for all 22 protoclusters in the TNG100-1 volume. The three different ionisation models are shown in the individual panels. The filled diamonds correspond to the LAEs in the transmission maps centred on $(\delta_F/\sigma)_{\rm min}$, while the red curve shows the median relation obtained by randomly sampling the maps. In all three models there is significant scatter in $\delta_F/\sigma$ at fixed $\delta_{\rm LAE}$, but the median trend shows decreasing $\delta_F/\sigma$ with increasing $\delta_{\rm LAE}$ for $\delta_{\rm LAE} \lesssim 1.5$. The Spearman's rank correlation coefficient for the LAEs with $\delta_{\rm LAE} < 1.5$ is $-0.4$ in all three models, consistent with a weak anti-correlation. This is followed by a flattening at $\delta_{\rm LAE} \gtrsim 1.5$ due to the $4\,h^{-1}\,\textrm{cMpc}$ Gaussian smoothing we apply to the transmission maps; we have verified that adopting a smaller smoothing scale reduces this apparent flattening and extends the anti-correlation to larger values of $\delta_{\rm LAE}$. As was the case in Fig. 4, the relative transmission in the NoCol and LoSo models typically decreases and increases, respectively, compared to the fiducial model. However, any changes remain very small compared to the scatter in the $\delta_F/\sigma$-$\delta_{\rm LAE}$ plane, and are unimportant for the shape of the median trend. The colours of each point in Fig. 5 show the Ly-$\alpha$ luminosity of the LAEs, which are selected using the criteria $L_{\rm Ly\alpha} > 10^{41.5}\rm\,erg\,s^{-1}$ and $\textrm{EW}_{\rm Ly\alpha} > 15\,$Å. There is no correlation (Spearman's rank coefficient $-0.04$ in all three models) apparent between $\delta_F/\sigma$ and the LAE luminosity, $L_{\rm Ly\alpha}$, in our maps.
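A rank correlation restricted to $\delta_{\rm LAE} < 1.5$, together with a median trend, could be computed along the following lines. This is a hedged sketch: the function names are hypothetical, and the simple binned median shown here stands in for the random sampling of the maps used for the red curve in Fig. 5, which is not reproduced.

```python
import numpy as np
from scipy.stats import spearmanr


def rank_correlation(delta_lae, delta_f_sigma, delta_lae_max=1.5):
    """Spearman's rank coefficient and p-value for points below an overdensity cut."""
    delta_lae = np.asarray(delta_lae, dtype=float)
    delta_f_sigma = np.asarray(delta_f_sigma, dtype=float)
    mask = delta_lae < delta_lae_max
    rho, p_value = spearmanr(delta_lae[mask], delta_f_sigma[mask])
    return rho, p_value


def binned_median(x, y, edges):
    """Median of y in each bin of x; empty bins are returned as NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.digitize returns 1 for the first bin, so shift to zero-based indices.
    index = np.digitize(x, edges) - 1
    medians = []
    for i in range(len(edges) - 1):
        in_bin = index == i
        medians.append(np.median(y[in_bin]) if in_bin.any() else np.nan)
    return np.array(medians)
```

A rank statistic is a natural choice here because the trend flattens at large overdensity, so the relation is monotonic at best rather than linear.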
We may also compare the results in Fig. 5 to recent observational determinations of the relationship between $\delta_F$ and $\delta_{\rm LAE}$ from Liang et al. (2021) (see also Mukae et al. 2017, 2020a; Momose et al. 2021, for closely related work), as well as the results from other cosmological hydrodynamical simulations (Nagamine et al. 2021). Note that these different studies do not calculate $\delta_{\rm LAE}$ in the same way as this work, so a direct comparison with the results we present here is not possible. Nevertheless, we may still gain some insight from a qualitative comparison. Liang et al. (2021) identify LAEs at $z \sim 2.2$ from Subaru/Hyper Suprime-Cam data and compare the LAE overdensity to nearby Ly-$\alpha$ absorbers in the Extended-BOSS database (Dawson et al. 2016). These authors do not use a tomographic reconstruction of the Ly-$\alpha$ forest, and instead compute $\delta_{\rm LAE}$ and $\delta_F$ within cylindrical apertures. Assuming a linear best fit relation between $\delta_F$ and $\delta_{\rm LAE}$, these authors find an anti-correlation with slope $-0.116^{+0.018}_{-0.022}$ and intercept $-0.248^{+0.082}_{-0.093}$. Similarly, Nagamine et al. (2021) use the GADGET3-Osaka simulations to find a shallower relation with slope $-0.0664 \pm 0.00476$ and intercept $-0.100 \pm 0.0006$, also obtained using a cylindrical aperture matched to the Liang et al. (2021) measurement. Although the slopes and normalisation of the linear fits from these two studies differ, the relationship between $\delta_F$ and $\delta_{\rm LAE}$ is qualitatively similar to the weak anti-correlation we observe at $\delta_{\rm LAE} \lesssim 1.5$. This is consistent with the interpretation that LAEs are preferentially located in regions with increased Ly-$\alpha$ absorption and hence larger H I densities at $z \sim 2$-$3$. Finally, although the relationship between $\delta_F$ and $\delta_{\rm LAE}$ is not significantly altered in our different ionisation models, we note that the visibility of Ly-$\alpha$ emission lines and variations in the IGM/circumgalactic medium (CGM) Ly-$\alpha$ transmission are closely coupled. As discussed previously, the volume averaged effective escape fraction we use, $f^{\rm Ly\alpha}_{\rm esc}$, does not self-consistently capture the effect of this coupling on $\delta_{\rm LAE}$ in the TNG100-1 simulation. Detailed Ly-$\alpha$ radiative transfer models that include the effect of both inflows and outflows in the CGM will be required to investigate the relationship between $\delta_F$ and $\delta_{\rm LAE}$ further (e.g. Barnes et al. 2011; Laursen et al. 2011; Gurung-López et al. 2020).

CONCLUSIONS
In this work we have investigated the effect that local ionisation variations in the intergalactic medium (IGM), due to the proximity effect from AGN and hot, $T > 10^{6}\rm\,K$ gas from shocks and AGN feedback, have on the Ly-$\alpha$ absorption signature of protoclusters in the IllustrisTNG simulations at redshift $z \simeq 2.4$. We consider three different local ionisation models in our analysis: a fiducial model with a spatially uniform UV background, a second model that ignores the effect of collisional ionisation and self-shielding on the H I fraction in the IGM, and a final model where we incorporate spatial variations in the UV background due to the proximity effect from AGN. The impact of "ionisation bias" on the Ly-$\alpha$ transmission profiles around massive haloes and Ly-$\alpha$ transmission maps is then investigated. We quantify this by computing the relative Ly-$\alpha$ transmission averaged over a velocity window $\Delta v$, $\delta_F = (\langle F\rangle_{\Delta v}/\langle F\rangle) - 1$, where a negative (positive) value of $\delta_F$ represents a decrease (increase) in the Ly-$\alpha$ transmission relative to the mean IGM transmitted flux, $\langle F\rangle$. We furthermore examine the relationship between the relative Ly-$\alpha$ transmission in the smoothed transmission maps and the distribution of coeval Ly-$\alpha$ emitting galaxies (LAEs) and coherently strong Ly-$\alpha$ absorption systems (CoSLAs). Our main conclusions are as follows:
• We find local ionisation effects have a significant impact on Ly-$\alpha$ absorption in the vicinity of massive dark matter haloes with masses $\geq 10^{12.8}\,M_\odot$ for impact parameters $\lesssim 1\,h^{-1}\,\textrm{cMpc}$ (see also Sorini et al. 2018). In particular, the presence of hot ($T > 10^{6}\rm\,K$) collisionally ionised gas will strongly increase $\delta_F$ within $\sim 1\,h^{-1}\,\textrm{cMpc}$ of dark matter haloes. We furthermore find that the proximity effect associated with AGN (which have absolute magnitudes in the range $-24.7 \leq M_{1450} \leq -18.9$ in our model) results in a modest increase in $\delta_F$ for impact parameters in the range $1$-$5\,h^{-1}\,\textrm{cMpc}$, corresponding to distances where the photo-ionisation of the IGM begins to dominate over collisional ionisation. However, both of these effects become less important on larger scales, $\gtrsim 5\,h^{-1}\,\textrm{cMpc}$, and around less massive haloes with masses $< 10^{12.8}\,M_\odot$ in our model.
• We construct idealised mock Ly-$\alpha$ transmission maps around the 22 protoclusters with $M_{z=0} \geq 10^{14}\,M_\odot$ in the TNG100-1 volume (cf. Lee et al. 2018; Newman et al. 2020). We find that local ionisation effects can play an important role in the correlation between the Ly-$\alpha$ transmission, coeval galaxies and CoSLAs within a small number of individual protoclusters. In particular, we find a spatial offset of $\sim 9\,h^{-1}\,\textrm{cMpc}$ between a LAE overdensity around a massive halo and the largest Ly-$\alpha$ flux decrement in a protocluster with $M_{z=0} = 10^{14.18}\,M_\odot$. This offset is due to collisionally ionised gas with temperature $T > 10^{6}\rm\,K$ surrounding the halo associated with the LAE density peak. This is qualitatively similar to the galaxy-Ly-$\alpha$ absorption offset observed by Lee et al. (2016) and Mukae et al. (2020a) in CLAMATO tomographic maps. The transmission contrast of this spatial offset is further enhanced by the proximity effect associated with a $M_{1450} = -24.7$ AGN hosted within the halo. We furthermore find that the incidence of CoSLAs within protoclusters is sensitive to changes in our local ionisation models, largely as a result of changes in the number of self-shielded, damped Ly-$\alpha$ absorbers with $N_{\rm HI} \geq 10^{19}\rm\,cm^{-2}$ (see also Miller et al. 2019).
• After smoothing the simulated Ly-$\alpha$ transmission maps with a Gaussian of standard deviation $4\,h^{-1}\,\textrm{cMpc}$ (Lee et al. 2018; Newman et al. 2020) we find that local ionisation effects have a rather limited impact on the completeness of protocluster identification if using a fixed identification threshold of $(\delta_F/\sigma)_{\rm min} \leq -2.35$ (Newman et al. 2020) or $(\delta_F/\sigma)_{\rm min} \leq -3.00$ (Lee et al. 2016). For an ensemble of 22 protoclusters drawn from the TNG100-1 volume, we obtain a completeness of 82-86 per cent and 32-41 per cent, respectively, for these thresholds if applied across all three of our ionisation models. These results suggest that, in addition to optimising protocluster detection (Stark et al. 2015), smoothing the Ly-$\alpha$ tomographic maps on $4\,h^{-1}\,\textrm{cMpc}$ scales may also help mitigate a possible "ionisation bias" in the completeness of a statistical sample of protoclusters. Within our model, this is because the largest differences in the Ly-$\alpha$ forest transmission typically occur on scales $< 4\,h^{-1}\,\textrm{cMpc}$ around dark matter haloes. However, we also find the presence of hot gas around haloes may still result in systematically lower estimates of $M_{z=0}$ for the most massive protoclusters if calibrating against mock tomographic Ly-$\alpha$ maps created using the fluctuating Gunn-Peterson approximation. We find $M_{z=0} = 10^{11.9-0.89(\delta_F/\sigma)_{\rm min}}\,M_\odot$ for the 22 protoclusters in our fiducial model.
• A simple model that uses empirically derived scaling relations for the volume averaged effective Ly-$\alpha$ escape fraction (Hayes et al. 2011) and the Ly-$\alpha$ rest frame equivalent width (Sobral & Matthee 2019) is used to populate the IGM transmission maps with Ly-$\alpha$ emitting galaxies. In qualitative agreement with recent results from observations (Mukae et al. 2017; Liang et al. 2021) and cosmological hydrodynamical simulations (Nagamine et al. 2021), we observe a modest anti-correlation (Spearman's rank correlation coefficient of $\sim -0.4$) between $\delta_F$ and the LAE overdensity, $\delta_{\rm LAE}$, for all three of our ionisation models at $\delta_{\rm LAE} \lesssim 1.5$. This is consistent with these galaxies being preferentially located in overdense regions which exhibit smaller $\delta_F$ (i.e. stronger Ly-$\alpha$ absorption) at $z \simeq 2.4$ compared to the average IGM value.
There remains plenty of scope for improving upon the numerical modelling in this work. In particular, the dynamic range of the hydrodynamical simulations should ideally be larger. A mass resolution of $M_{\rm gas} \sim 10^{6}\,M_\odot$ is required to resolve Ly-$\alpha$ absorption from the IGM at $z \sim 2$ (Bolton & Becker 2009; Miller et al. 2019). While the Ly-$\alpha$ forest in the TNG100-1 simulation is therefore well resolved, the statistics are somewhat limited with only 22 protoclusters with $M_{z=0} \geq 10^{14}\,M_\odot$. Additionally, the lack of any $M_{z=0} \geq 10^{15}\,M_\odot$ clusters in the TNG100-1 simulation means that we are unable to study the impact of local ionisation effects on the most massive structures. In our local sources model we have furthermore assumed isotropic AGN emission, and have ignored non-equilibrium ionisation and light travel time effects (e.g. Schmidt et al. 2019). Finally, we have adopted a simple model for the distribution of LAEs in the transmission maps that does not include the effect of the local IGM opacity on Ly-$\alpha$ emitter visibility. Simulations of Ly-$\alpha$ radiative transfer through the interstellar and circumgalactic/intergalactic medium will be required to address this question further (e.g. Barnes et al. 2011; Laursen et al. 2011; Gurung-López et al. 2020).
In summary, models that incorporate all of these physical effects will be important for fully unravelling the relationship between H I gas density and galaxies from Ly-$\alpha$ tomographic surveys at $z \sim 2$. Encouragingly, however, our results confirm that the identification and completeness of $M_{z=0} \simeq 10^{14}\,M_\odot$ protoclusters identified from Ly-$\alpha$ forest transmission maps smoothed on scales $\sim 4\,h^{-1}\,\textrm{cMpc}$ should not be strongly affected by the variations in the local ionisation state of the IGM at $z \simeq 2.4$.

APPENDIX A: THE EFFECT OF BOX SIZE AND MASS RESOLUTION ON Ly-$\alpha$ ABSORPTION AROUND HALOES
Following on from Fig. 2 and the associated discussion in Section 3, the effect of simulation mass resolution and box size on the transmission profiles around haloes are shown in Fig. A1 and Fig. A2. Here, in addition to the fiducial TNG100-1 model, we use the publicly available TNG100-2, TNG100-3 and TNG300-1 simulations. The properties of these additional simulations are outlined in Table 1. There is generally very good agreement between the different simulations in Fig. A1, suggesting that our results should be sufficiently converged with respect to mass resolution. However, for the simulations with varying box sizes in Fig. A2 we observe larger differences. In the case of the highest mass bin (left panel), there is a divergence between the two simulations at impact parameters $< 1\,h^{-1}\,\textrm{cMpc}$. This is caused by the larger number of massive haloes with masses $\geq 10^{12.8}\,M_\odot$ present in the TNG300-1 simulation, many of which are surrounded by hot $T > 10^{6}\rm\,K$ gas with correspondingly low H I fractions. This explanation is consistent with the fact that in both of the lower mass bins we observe a very good agreement between the two simulations. On larger scales ($> 1\,h^{-1}\,\textrm{cMpc}$), the level of agreement with the Font-Ribera et al. (2013) data is improved for the TNG300-1 model, particularly for the largest halo mass bin. This appears to be consistent with the suggestion by Sorini et al. (2020) that smaller volumes lacking the most massive haloes may predict transmission that is systematically above the Font-Ribera et al. (2013) measurements.
Figure A1. As for Fig. 2, but comparing simulations with fixed box size and different mass resolutions. The fiducial TNG100-1 simulation (black curves) is compared to the TNG100-2 (blue curves) and TNG100-3 (orange curves) simulations. These have dark matter particle masses a factor of 8 and 64 times larger than the TNG100-1 simulation, respectively. The transmission profiles are consistent within the 68 per cent scatter around the median, shown by the shaded regions.
Figure A2. As for Fig. 2, but comparing simulations with different box sizes and very similar mass resolutions. The TNG100-2 simulation (black curves) uses the same box size as the fiducial TNG100-1 model, whereas the TNG300-1 simulation (blue curves) has a volume 27 times larger. The TNG300-1 simulation is in slightly better agreement with the observational measurements from Font-Ribera et al. (2013) on large scales, particularly for the highest halo mass bin in the left panel. The TNG300-1 model also exhibits more transmission at impact parameters $< 1\,h^{-1}\,\textrm{cMpc}$ around the most massive haloes.