Lessons from a high CO2 world: an ocean view from ~3 million years ago

A range of future climate scenarios are projected for high atmospheric CO2 concentrations, given uncertainties over future human actions as well as potential environmental and climatic feedbacks. The geological record offers an opportunity to understand climate system response to a range of forcings and feedbacks which operate over multiple temporal and spatial scales. Here, we examine a single interglacial during the late Pliocene 55 (KM5c, ca. 3.205 +/0.01 Ma) when atmospheric CO2 concentrations were higher than pre-industrial, but similar to today and to the lowest emission scenarios for this century. As orbital forcing and continental configurations were almost identical to today, we are able to focus on equilibrium climate system response to modern and nearfuture CO2. Using proxy data from 32 sites, we demonstrate that global mean sea-surface temperatures were warmer than pre-industrial, by ~2.3 oC for the combined proxy data (foraminifera Mg/Ca and alkenones), or by 60 ~3.2-3.4oC (alkenones only). Compared to the pre-industrial, reduced meridional gradients and enhanced warming in the North Atlantic are consistently reconstructed. There is broad agreement between data and models at the global scale, with regional differences reflecting ocean circulation and/or proxy signals. An uneven distribution of proxy data in time and space does, however, add uncertainty to our anomaly calculations. The reconstructed global mean sea-surface temperature anomaly for KM5c is warmer than all but three of the PlioMIP2 model 65 outputs, and the reconstructed North Atlantic data tend to align with the warmest KM5c model values. Our results demonstrate that even under low CO2 emission scenarios, surface ocean warming may be expected to exceed model projections, and will be accentuated in the higher latitudes.

and Tingley, 2018). It has also been proposed that previous approaches to integrating Pliocene sea-surface temperature (SST) data may have introduced bias to data-model comparison . For example, the Pliocene Research Interpretation and Synoptic Mapping (PRISM) project generated warm peak averages within specified time windows (Figure 1) (outlined in Dowsett et al., 2016 and references therein). However, by integrating multiple warm peaks within the 3.1-3.3 Ma mid-Piacenzian data synthesis windows (Figure 1), 90 regional and time-transgressive responses to orbital forcing (Prescott et al., 2014;Fischer et al., 2018;Hoffman et al., 2017;Feng et al., 2017) are potentially recorded in the proxy data, which may not align with the more narrowly-defined time interval being modelled Dowsett et al., 2016).
Here, we present a new, globally-distributed synthesis of SST data for the mid-Piacenzian stage, addressing two concerns. First, we minimise the impact of orbital forcing on regional and global climate signals by synthesising data from a specific interglacial stage, a 20-kyr time slice centred on 3.205 Ma (KM5c; see Figure 1). At 3. 205 Ma, both seasonal and regional distributions of incoming insolation are close to modern, making this time an important analogue for 21 st century climate . The low variability in orbital forcing through KM5c minimises the potential for time-transgressive regional signals to be a feature of the geological data 100 Prescott et al., 2014). Second, we provide a range of estimates from different SST proxies, taking into consideration the uncertainties in proxy-to-temperature calibrations and/or secular processes that may bias proxy estimates. This synthesis is possible due to robust stratigraphic constraints placed on the datasets by the PAGES-PlioVAR working group (see Methods).

The KM5c Interglacial
KM5c (also referred to as KM5.3) is an interglacial centred on a ~100 kyr window of relatively depleted benthic 18 O values, that immediately follows a pronounced  18 O peak during the glacial stage M2 (3.3 Ma; Figure 1).

Age models
The PAGES-PlioVAR working group agreed on a set of stratigraphic protocols to maximise confidence in the identification and analysis of orbital-scale variability within the mid-Piacenzian stage (McClymont et al., 2017).
Sites were only included in the synthesis if they had either (i) ≤10 kyr resolution benthic  18 O data which could 120 be (or had been) tied to the LR04 stack (Lisiecki and Raymo, 2005) or the HMM-Stack (Ahn et al., 2017); and/or (ii) the palaeomagnetic tie-points for Mammoth top (C2An.2n (b) at 3. 22 Ma) and Mammoth bottom (C2An.3n (t) at 3. 33 Ma). At one site (ODP Site 1090) these conditions were not met (see Supplement), but tuning to LR04 had been made using a record of dust concentrations under the assumption that higher dust flux occurred during glacials as observed during the Pleistocene (Martinez-Garcia et al., 2011). At ODP Site 806, uncertainty over age 125 control resulted from the absence of an agreed splice across the multiple holes drilled by ODP, and a new age model has been constructed (see Supplement). For some sites (see online summary at https://pliovar.github.io/km5c.html), revisions to the published age model were made, for example if the original data had been published prior to the LR04 stack (Lisiecki and Raymo, 2005) or prior to revisions to the palaeomagnetic timescale (Gradstein et al., 2012). In total, data from 32 sites was compiled, extending from 46°S to 69°N (Figure 2).

Proxy sea-surface temperature (SST) data
A multi-proxy approach was taken, to maximise the information available on changing climates and environments during the KM5c interval. Two SST proxies were analysed: the alkenone-derived U K 37' index (Prahl and Wakeham, 1987) and foraminifera calcite Mg/Ca (Delaney et al., 1985). Both proxies have several calibrations to 135 modern SST: here we explore the impact of calibration choice on KM5c SST data, by comparing and contrasting outputs between proxies and between calibrations. Although the TEX86 proxy (Schouten et al., 2002) has also been used to generate mid-Piacenzian SSTs (e.g. O'Brien et al., 2014;Petrick et al., 2015;Rommerskirchen et al., 2011), this data is not included here because it could not be confidently assigned to the KM5c interval either due to low sampling resolution and/or our age control protocol was not met.

Alkenone SSTs (the U K 37' index)
The majority of the 23 alkenone-derived sea-surface temperature (SST) datasets included in the PlioVAR synthesis used the U K 37' index, and applied the linear core-top calibration (60°S-60°N) (Müller et al., 1998) (hereafter Müller98; Tables S2 and S3). The Müller98 calibration applies the best fit between core-top U K 37' and 145 modern SSTs, recorded at the sea surface (0 m water depth) and consistent with haptophyte productivity in the photic zone. The sedimentary signal is proposed to record annual mean SST based on linear regression (Müller et al., 1998). Cultures of one of the dominant haptophytes, Emiliania huxleyi, generated only minor differences in the slope of the U K 37'-temperature relationship (Table S2), where growth temperature was used for calibration (Prahl et al., 1988). Several PlioVAR datasets were originally published using the Prahl et al. (1988) calibration (Table S3).
A recent expansion of the global core-top database (<70°N) was accompanied by Bayesian statistical analysis to assess the relationship(s) between predicted (from U K 37') and recorded ocean temperatures (Tierney and Tingley, 2018). The revised U K 37' calibration, BAYSPLINE, addresses non-linearity in the U K 37'-SST relationship at the 155 high end of the calibration, i.e. in the low-latitude oceans (Pelejero and Calvo, 2003;Sonzogni et al., 1997).
BAYSPLINE also highlights scatter between predicted and observed SSTs at the high latitudes, and explicitly reconstructs seasonal SSTs >45°N (Pacific) and >48°N (Atlantic), and in the Mediterranean Sea (Tierney and Tingley, 2018).

160
To test the impact of different alkenone temperature calibrations on the quantification of mid-Piacenzian SSTs, we converted all U K 37' data to SSTs using both the Müller98 calibration and the BAYSPLINE calibration. For most sites, BAYSPLINE was run with the recommended setting for the prior standard deviation scalar (pstd) of 10 (Tierney and Tingley, 2018). At high U K 37' values (above ~24°C) it is recommended to use the more restrictive value of 5, to minimise the possibility of generating unrealistic SSTs (e.g. >40°C) given that the slope of the U K 37'temperature calibration becomes attenuated (Tierney and Tingley, 2018).
To test the impact of different foraminifera Mg/Ca SST calibrations on mid-Piacenzian SSTs, we compare published SSTs with the recently developed BAYMAG calibration (Tierney et al., 2019b). We use published SSTs because the original researchers used their best judgement to choose a particular Mg/Ca-SST calibration, given that it (i) fitted modern (regional) core-top values; (ii) accounted for known environmental impacts (e.g.
[CO3 2-] correction); (iii) was developed within a particular research group; and/or (iv) fitted conventional wisdom at the time. BAYMAG uses a Bayesian approach that incorporates laboratory culture and core-top information to generate probabilistic estimates of past temperatures. BAYMAG assumes a sensitivity of Mg/Ca to salinity, pH and saturation state at each core site, and also accounts for Mg/Caseawater evolution through a linear scaling (i.e.

190
there is no change in the sensitivity of the palaeo-temperature equation as Mg/Caseawater evolves) (Tierney et al., 2019b). For each site with Mg/Ca data, we computed SSTs using BAYMAG's species-specific hierarchical model. In the absence of knowledge concerning changes in salinity, pH, and saturation state in the Pliocene, we assumed that these values were the same as today. We drew seasonal sea-surface salinity from the World Ocean Atlas 2013 product (Boyer et al., 2013), and pH and bottom water saturation state from the GLODAPv2 product 195 (Lauvset et al., 2016;Olsen et al., 2016). We used a prior standard deviation of 6˚C for all sites.

Climate models
The model outputs used here were generated from the 15 models that contribute to the Pliocene modelling intercomparison project, Phase 2 (PlioMIP2) (Haywood et al., in review, 2020). The boundary conditions for the 200 experiments, and their large-scale results for Pliocene and pre-industrial climates are detailed elsewhere (Haywood et al., in review, 2020;Haywood et al., 2016b), so are briefly outlined here.
The Pliocene simulations are intended to represent KM5c (~3.205Ma) and were forced with PRISM4 boundary conditions (Haywood et al., 2016b). Atmospheric CO2 concentration was set at 400 ppmv (Haywood et al., in 205 review, 2020), in line with the upper estimates of atmospheric CO2 from boron isotope data ( Figure 1; Foster et al., 2017). Lower estimates from the alkenone carbon isotope proxy ( Figure 1) are likely to reflect an insensitivity of this proxy to atmospheric CO2 in the Pliocene (Badger et al., 2019). All other trace gases, orbital parameters and the solar constant were specified to be consistent with each model's preindustrial experiment. The Greenland Ice Sheet was confined to high elevations in the Eastern Greenland Mountains, covering an area approximately 210 25% of the present-day ice sheet. The Antarctic ice sheet has no ice over Western Antarctica. The reconstructed PRISM4 ice sheets have a total volume of 20.1 × 10 6 km 3 , equating to a sea-level increase relative to present day of less than ∼24 m (Dowsett et al., 2016).
Modelling groups had some choices regarding exact implementation of boundary conditions, however 14 of the 215 15 models used the 'enhanced' PRISM4 boundary conditions (Dowsett et al., 2016) which included all reconstructed changes to the land/sea mask and ocean bathymetry. Key ocean gateway changes relative to modern are closure of the Bering Strait and Canadian archipelago. and exposure of the Sunda and Sahul shelves (Dowsett et al., 2016). Initialisation of the experiments varied between models (Haywood et al., in review, 2020). Some models were initialised from a pre-industrial state while others were initialised from the end of a previous Pliocene 220 simulation or another warm state. The simulations reached equilibrium towards the end of the runs as per PlioMIP2 protocol.

Statistical analysis (calculating of global means / gradients)
For all anomaly calculations we obtain pre-industrial SST from the NOAA-ERSST5 dataset for years 1870-1899 225 CE (Huang et al., 2017), ensuring alignment between the KM5c proxy data and the KM5c model experiments (Haywood et al., in review, 2020). This pre-industrial time window excludes the largest cooling linked to the Little Ice Age and pre-dates the onset of 20 th century warming (Owens et al., 2017;PAGES2k Consortium, 2017). The global mean SST anomaly from the proxy data was obtained as follows: firstly, the SST anomaly between the proxy data and the NOAA-ERSST5 data was obtained for each location, and the data collated into bins of 15° of 230 latitude. It is assumed that the average of all the data in each bin represents the average SST anomaly for that latitude band. Next, the area of the ocean surface for each bin is obtained. The average SST anomaly is then the average of all the bins weighted by the ocean area in the relevant latitude band.
Meridional gradients were obtained in a similar way. A low-latitude SST anomaly was obtained as the weighted 235 average of all the bins containing low-latitude SSTs (for example the 4 × 15° bins containing latitudes of 30°S -30°N). A high-latitude SST anomaly was obtained as the weighted average of all bins containing high latitude SSTs (>60°N, because there were no proxy data points >60°S). As only the Atlantic Ocean contained data points poleward of 65°N, the high latitude region used in the gradient calculations for both proxies and models was focused on the longitudinal window from 70°W-5°E. The meridional gradient SST anomaly is then the low-240 latitude SST anomaly minus the high-latitude SST anomaly, relative to the pre-industrial.
There are some uncertainties in this calculation of the global mean SST, in particular, the fact that the proxy data is not evenly distributed throughout a latitude bin, and also that some bins contain very few data points. There is a higher density of data in the Atlantic Ocean, compared to the Indian Ocean and Pacific Ocean, and no highlatitude data is available to consider a Southern Ocean response ( Figure 2). Nevertheless, this method of calculating averages does attempt to account for unevenly distributed data and provides a SST anomaly (SSTA) that is comparable with model results. The impact of proxy choice was examined in the calculation of the global means and meridional SST gradients. As no Mg/Ca data were available >50°N or >30°S, we calculated global mean SST and the meridional SST gradients either including or excluding the Mg/Ca data; both results are outlined 250 below and shown in Table 1.

Results
Relative to the pre-industrial, the combined U K 37' and Mg/Ca proxy data, using the original calibrations, indicate a KM5c global mean SST anomaly of +2.3ºC and a meridional SST gradient reduced by 2.6ºC ( Figure 3). The amplitude of the global SST mean anomaly in the combined proxy data exceeds those indicated in 10 of the 255 PlioMIP2 models, but is lower than the global SST anomaly from six models. The meridional temperature gradient anomalies are more comparable (Figure 3). If only the U K 37' data is used, the global mean SST anomaly from proxies is higher than all but three of the PlioMIP2 models, and the U K 37' meridional gradient calculations are smaller than all models (BAYSPLINE) or one model (original calibration; Figure 3).

260
Overall, the proxy data show the lowest temperature anomalies in the low latitudes, regardless of proxy (from +3ºC to -4ºC for sites <30°N/S). A larger range of temperature anomalies is reconstructed in the mid-and highlatitudes (from +9ºC to -2ºC for sites >30°N/S) ( Figure 4). Thus, there is a broad, but complex, pattern of enhanced warming at the mid-and high-latitudes, reflecting a combination of regional influences on circulation patterns, and to some extent, proxy choice. This pattern is not explained by temporal variability nor sample density within 265 the KM5c time interval: regardless of sample number per site, the standard deviation at any site within the KM5c time bin is <1.5ºC ( Figure S4). We note that of the 32 sites examined here, 7 provided a single data point for the

270
Calibration choice has a small impact over the reconstructed patterns of KM5c SST anomalies (Figure 4,and Figures S2 and S3). Below 24°C, absolute U K 37' SSTs using Müller98 are <1°C lower than those using BAYSPLINE. At high temperatures the non-linearity in the BAYSPLINE calibration means that BAYSPLINE-SSTs can be up to 1.67 °C ± 0.01°C higher than when using Müller98 ( Figure S2). The low latitude offset between

275
Müller98 and BAYSPLINE has two effects: it elevates the global mean SST (Figure 3, Table 1) and increases the KM5c meridional SST gradient towards pre-industrial values (Figures 3 and 4, Table 1). The calibration offsets are less systematic for Mg/Ca. There is a wider range of offsets between BAYMAG and published SST values (from -4 to +5°C, Figure S3 Table S3), although the smallest KM5c SST anomalies continue to be reconstructed in the low-latitudes, regardless of which Mg/Ca calibration is applied (Figure 4).
Overall, the U K 37'-temperature anomalies lie within the range given by PlioMIP2 models (Figure 4). The Mg/Ca estimates are mainly from the low latitudes, and high-latitude (>60ºN/S) Mg/Ca SST data are not available to calculate meridional gradients using foraminifera data alone (Figure 4). Mg/Ca-SST anomalies are generally lower than for U K 37', and a cooler KM5c than pre-industrial is consistently (but not always) recorded in the low-latitudes by Mg/Ca regardless of calibration choice ( Figure 4). As a result, combining U K 37' and Mg/Ca data leads to a cooler global mean SST (~2.3°C) than when using U K 37' alone (~3.2°C with Müller98, ~3.4°C with BAYSPLINE, Figure 3 and Table 1). At 8 sites, the negative KM5c SST anomalies in Mg/Ca disagree with both the U K 37' data and the PlioMIP2 model outputs (Figure 4)

300
KM5c is characterised by a surface ocean which is ~2.3°C (alkenones and Mg/Ca), ~3.2°C (alkenones-only, Müller98 calibration) or ~3.4°C (alkenones-only, BAYSPLINE calibration) warmer than pre-industrial, with a ~2.6°C reduction in the meridional SST gradient. The global mean SST anomaly is higher than the 1.7°C previously calculated for the wider mid-Piacenzian warm period (3.1-3.3 Ma), regardless of proxy choice (IPCC, 2014b). Previous analysis of a suite of models suggested that a climate state resembling the mid-Piacenzian was 305 likely to develop and be sustained under RCP4.5 (Burke et al., 2018). The PlioMIP2 ensemble (Haywood et al., in review, 2020) indicates that best estimates for mid-Piacenzian warming in surface air temperatures (1.7-5.2°) are comparable to projections for the RCP4.5 to 8.5 scenarios by 2100 CE(RCP4.5 = 1.8±0.5°, RCP8.5 = 3.7±0.7°C (IPCC 2013). Our proxy-based mean global SST anomaly is larger than in most PlioMIP2 models when we use alkenones-only or alkenones and Mg/Ca combined (Figure 3), and hence our results suggest that the global 310 annual surface air temperature anomaly for KM5c would exceed the PlioMIP2 multi-model surface air temperature mean of 2.8°C (Haywood et al., in review, 2020). The higher global SST mean recorded in the KM5c proxy data, compared to the PlioMIP2 models, occurs despite the available atmospheric CO2 reconstructions indicating values below the ~400 ppmv used in the PlioMIP2 models (Figure 1). Our synthesis of SST data thus indicates that with atmospheric CO2 concentrations ≤400 ppmv (comparable to RCP4.5), the surface ocean 315 warming response will likely be larger than indicated in models. Further work is required to increase the temporal resolution of the atmospheric CO2 reconstructions through KM5c, to improve our understanding of the reconstructed SST response to CO2 forcing, including whether (or by how much) the reconstructed atmospheric CO2 differ from model boundary conditions, and whether other changes in the model boundary conditions also influence SST patterns (e.g. gateway changes outlined in section 2.4).
Proxy choice, calibration choice, and site selection, have all had an impact on the magnitude of the change in meridional SST gradient for KM5c compared to the pre-industrial (Table 1)

. Focussing only on a Northern
Hemisphere SST gradient leads to higher gradient anomalies than when all of the low-latitudes are included (30°S-30°N), because it excludes the high SST anomalies of the Benguela upwelling sites (20-25°S, discussed below, 325 Figure 4). Smaller meridional SST gradient anomalies occur using BAYSPLINE (+0.03 to -1.66°C) than the Müller98 calibration for U K 37' (-1.18 to -3.00°C, Table 1), due to the increased low-latitude SST anomalies generated by BAYSPLINE (Figure 4). Due to several (but not all) low-latitude sites recording negative SST anomalies for KM5c using foraminifera Mg/Ca, the inclusion of Mg/Ca data leads to a larger difference in the meridional SST gradient relative to the pre-industrial (-2.19 to -4.08°C). Further work is required to fully 330 understand the negative KM5c SST anomalies in some of the low-latitude sites (discussed further below), given their impact on the meridional SST gradients. However, a robust pattern emerging from the data is that the KM5c proxy data detail smaller low-latitude SST anomalies than those of the mid-and high-latitudes SST anomalies ( Figure 4), leading to a reduction in the meridional SST gradient relative to the pre-industrial. Enhanced mid-and high-latitude warming has been observed in other warm intervals of the geological past, including the last 335 interglacial and the Eocene (Evans et al., 2018;Fischer et al., 2018), and is a feature of future climate under elevated CO2 concentrations (IPCC, 2014a).
There is complexity in the amplitude of the KM5c SST anomaly by latitude and basin, which may reflect patterns of surface ocean circulation. In the Northern Hemisphere, relatively muted warming in the East Greenland Current 340 (ODP Site 907, 69ºN) may reflect the presence of at least seasonal sea-ice cover from c.4.5 Ma (Clotten et al., 2018). In contrast, relatively high SST anomalies at ODP Sites 642 (67ºN) and 982 (58ºN) track northward flow of the North Atlantic Current, accounting for the enhanced warming relative to North-east Pacific IODP Site U1417 (57ºN; Figures 4 and 5). The large North Atlantic SST anomalies also contribute to an enhanced Northern Hemisphere meridional SST gradient of up to 4ºC (>60ºN minus 0-30ºN; Table 1). For the Southern Hemisphere,

355
Given that our proxy data meridional SST gradient calculations use only two sites to calculate the high-latitude SSTs (ODP Sites 907 and 642), which are also both from the Nordic Seas (Figure 2), we explored the impact of expanding our high-latitude band into the mid-latitudes. We also explored narrowing the low-latitude band so that it does not include the Benguela upwelling sites, which have a significant data-model offset ( Figure 4) and may be influenced by localised circulation changes (see section 4.2). Previous calculations of Pliocene meridional SST gradients have also considered differences between the mid and low latitudes through time (Fedorov et al., 2015).
Despite adding 4 more sites by expanding the high-latitude band to 45°N/S, the meridional SST gradients are reduced by <0.4°C, from -1.18 to -1.56°C using the original U K 37' data (Table 1). However, it is clear from the distribution of sites (Figure 2) that our reconstructed KM5c SSTs (and thus the global mean and meridional gradients) have a strong signal from the Atlantic Ocean. There is a relative scarcity of sites from the Indian Ocean, Pacific Ocean and Southern Ocean, but it is difficult to ascertain what impact this may have had on our global analysis. Further work is required to increase the spatial density of SST data for KM5c and the wider mid-Piacenzian stage, to better evaluate the magnitude of the warming and gradient changes outlined here.

370
For the mid-and high-latitudes, we find broad proxy data/model agreements for most sites. In the North Atlantic Ocean, reconstructed SST KM5c anomalies from U K 37' fall within the ranges provided by the PlioMIP2 models ( Figure 4) for all but one site (IODP Site U1387, 37°N). The overall U K 37'-model agreement for the North Atlantic Ocean suggests that, as proposed by Haywood et al. (2013), a focus on a specific interglacial within the mid-Piacenzian provides an improved comparison to the climate being simulated by the PlioMIP2 models. Thus, some 375 of the data-model mismatch in previous mid-Piacenzian syntheses (e.g. Dowsett et al., 2012) may have been due to the averaging of warm peaks which may not have been synchronous in time between sites and/or with the interval being modelled. Disagreements occur between proxies (North Atlantic Ocean) and between proxies and models (Benguela upwelling, Gulf of Cadiz, Mediterranean Sea) (Figure 4). Here, we explore the potential causes for these offsets in turn.

380
The largely U K 37'-derived data from the North Atlantic Ocean tend to align with the warmest model outputs (Figure 4), and the U K 37'-SST anomalies also tend to be larger than those from Mg/Ca. A challenge for understanding the cause(s) of the U K 37'-Mg/Ca differences is that only two sites have data from both proxies, and these do not show a consistent signal. There is good correspondence between U K 37' and the original published

385
Mg/Ca SSTs for IODP Site U1313 (41°N), whereas at ODP Site 609 Mg/Ca SSTs (both calibrations) are between 4.6-6.1°C cooler than U K 37'. It has also been shown that the U K 37'-Mg/Ca SST offset at Site 609 is not constant with time for the late Pliocene (Lawrence and Woodard, 2017). When using BAYMAG, warmer KM5c SSTs are reconstructed than the original published data at DSDP Site 603 and IODP Site U1313 (35°N and 41°N, Figure   4), but BAYMAG reconstructs SSTs only 0.8°C warmer than from the original published SSTs at Site 609. These 390 mid-latitude North Atlantic Mg/Ca data are provided by G. bulloides, which may calcify at depth in the water column (e.g. Mortyn and Charles, 2003;Schiebel et al., 1997), and account for those sites where Mg/Ca reconstructions give lower reconstructed SSTs than from U K 37' (Bolton et al., 2018;De Schepper et al., 2013). Alternatively, an offset between alkenones and Mg/Ca might be accounted for if there is a seasonal bias to the U K 37' calibration (e.g. Conte et al., 2006;Schneider et al., 2010). Despite documented seasonality in alkenone 395 production at high latitudes, it has been proposed that mean annual SSTs continue to be recorded by U K 37' in sediments (Rosell-Melé and Prahl, 2013), as indicated by the original U K 37' calibration (Müller et al., 1998). In contrast, BAYSPLINE explicitly assumes an autumn signal is recorded in Atlantic sites >45ºN (Tierney and Tingley, 2018). Despite these differences in interpretation, BAYSPLINE values for KM5c are <0.7ºC cooler than the original published U K 37' data ( Figure S2). Although the North Atlantic U K 37' data align with range of mean annual SST anomalies generated by the PlioMIP2 models (Figure 4), three of the sites show alignment between U K 37'-SSTs and the July-November values from the multi-model means ( Figure 5). In contrast, Site 907 aligns with cool spring temperatures in the models, perhaps reflecting production after sea ice melt.
The large data-model discrepancy at 30ºS reflects 3 sites which today sit beneath the Benguela upwelling system 405 in the South-East Atlantic (20-26°S, Figure 4). Part of the data-model discrepancy in the KM5c anomaly can be attributed to the models over-estimating pre-industrial SSTs in the northern Benguela sites (NOAA-ERSST5 SSTs are 2-5ºC below the pre-industrial model range), and suggests that models are not fully capturing the local dynamics of the coastal upwelling today (Small et al., 2015). Realistic representations of the Benguela upwelling system today are proposed to require realistic wind stress curl and high-resolution atmosphere and ocean models 410 (<1°, Small et al., 2015). Most of the PlioMIP2 simulations use lower resolution atmosphere and ocean models (Haywood et al., in review, 2020). An increased density of proxy data reconstructing KM5c atmospheric circulation, as well as application of high-resolution models, may help to understand the observed KM5c datamodel discrepancy. Furthermore, there was a deep thermocline during the Pliocene (as reconstructed in the Equatorial Pacific (Ford and Ravelo, 2019;Ford et al., 2015;Steph et al., 2006;Steph et al., 2010) and theorized 415 globally (Philander and Fedorov, 2003)), so that warmer subsurface waters than today were upwelled, enhancing local warming. However, warming of ~3.4ºC in subsurface waters (Ford and Ravelo, 2019) and ~2.5ºC in intermediate waters (McClymont et al., 2016) for Pliocene interglacials suggest that Pliocene upwelling of warmer waters is unable to fully account for the 7-10ºC SST anomalies in Benguela sites for KM5c. Changes to the distribution of export productivity and SSTs indicate that an overall poleward displacement of the Benguela

420
Upwelling system occurred during the Pliocene, so that the main zone of upwelling likely sat close to ODP Site 1087 at 31ºS (Etourneau et al., 2009;Petrick et al., 2018;Rosell-Melé et al., 2014). As the northern and southern Benguela regions are today marked by differences in the seasonality of the upwelling, a temporal shift in upwelling intensity may also account for some of the large SST anomaly (Haywood et al., in review, 2020). Thus, the datamodel disagreement may be accounted for by a combination of displaced upwelling and warmer upwelled waters,

425
giving large SST anomalies in Benguela proxy data, alongside the challenges of modelling both the pre-industrial and KM5c upwelling system and its associated SSTs.
Data-model disagreement also occurs at two northern hemisphere sites where U K 37'-SST anomalies exceed those given by the range of model predictions (Figure 4). Punto Piccola (Sicily, 37°N) is located within the Mediterranean Sea, whereas IODP Site U1387 (37°N, Iberian margin) records the influence of the waters sourced from the Azores Current and the subtropical gyre. The data-model disagreement for KM5c reflects warmer SST estimates from the proxy data compared to the models, despite the good agreement for the pre-industrial suggesting that locally complex ocean circulation in these near-shore and marginal marine settings may have been captured in the models. The data-model offset is also likely to be a minimum, because BAYSPLINE 435 Mediterranean SSTs explicitly record November-May temperatures (Tierney and Tingley, 2018), and alkenone production below the sea surface has also been proposed (Ternois et al., 1997): both scenarios would act to raise mean annual SSTs further from those simulated in the PlioMIP2 models (Figure 4). Further multi-proxy investigation is required to identify whether the data-model disagreements in Benguela upwelling, Gulf of Cadiz, and Mediterranean Sea reflect challenges in modelling near-shore or complex oceanographic systems and/or 440 biases in the temperature signal recorded by the proxy data.

Data-model comparisons for low-latitude sites
The low-latitude U K 37'-SST anomalies for KM5c align well with the PlioMIP2 models (Figure 4). At ODP Sites 806 and 959, the Mg/Ca anomalies using the original calibrations are both +0.3°C compared to the pre-industrial 445 ( Figure 4) and also align with the PlioMIP2 models. At Site 806 the BAYMAG KM5c anomaly (+1.7°C) also aligns with the PlioMIP2 models. Only one low-latitude site has both U K 37' and Mg/Ca SST data: ODP Site 1143 (9°N) records KM5c anomalies of +0.8 to +2.5°C (U K 37') or -0.7 to -1.3 °C (Mg/Ca). Although the U K 37' data align with the model outputs for Site 1143, the negative anomaly in Mg/Ca lies outside the model range for mean annual SST (Figure 4).

450
Six of the low-latitude sites have negative low-latitude SST anomalies in KM5c from foraminifera Mg/Ca; these occur regardless of whether the original or BAYMAG calibrations are applied, and for both G. ruber and T.
sacculifer-based reconstructions. The negative KM5c Mg/Ca-SST anomalies lie beyond those shown across the PlioMIP2 model range (Figure 4), despite the absolute Mg/Ca-SSTs reconstructed from these sites for KM5c 455 falling within the model range for all but two of the sites (ODP Sites 999 (13°N) and 1241 (6°N); Figure S5).
However, the absolute SST values reconstructed for KM5c from Mg/Ca tend to align with the colder model outputs ( Figure S5).
Mg/Ca-SST calibration choice has no consistent impact on the KM5c anomalies (across all latitudes, Figure 4).

460
Therefore, the corrections for secular seawater Mg/Ca change and/or non-thermal influences over Mg/Ca, which are accounted for in BAYMAG (Tierney et al., 2019b) do not account for these cold tropical KM5c anomalies.
For example, for ODP Site 806 in the Western Pacific warm pool, BAYMAG SST estimates for KM5c are ~1°C warmer than the published Mg/Ca record (Wara et al., 2005). For Site 999 in the Caribbean Sea, BAYMAG SST estimates for KM5c are ~0.5°C cooler than the published Mg/Ca record (De Schepper et al., 2013). This also 465 suggests the impact of Mg/Caseawater change on SST is small on warm pool sites. The Mg/Caseawater correction used in BAYMAG is conservative, drawing on multiple lines of physical evidence (corals, fluid inclusions, calcite veins, etc) (Tierney et al., 2019b). Given the variable directions of the offsets between published and BAYMAG SSTs shown here, the Mg/Caseawater correction is unable to account for the data-model offsets observed for the low latitudes.

470
CaCO3 dissolution in the water column and sediments could lead to a cool bias on the Mg/Ca-SSTs (Dekens et al., 2002;Regenberg et al., 2006;Regenberg et al., 2009). However, the cool KM5c anomalies also occur if the forward-modelled core-top Mg/Ca SSTs from BAYMAG are used as the pre-industrial 'reference' ( Figure S6).
The cold low-latitude anomalies for KM5c could reflect an increase in the calcification depth of the foraminifera, 475 since the surface-dwelling foraminifera analysed here calcify at a range of depths, particularly in the tropics where the thermocline is deep in comparison to mid-to high-latitudes (Fairbanks et al., 1982;Curry et al., 1983). The negative anomalies are broadly smaller for G. ruber (-0.4 to -1.2°C) than for T. sacculifer (-0.6 to -3.5°C), consistent with a deeper depth-habitat for the latter (Curry et al., 1983), although at Site 959 the G. ruber anomaly using BAYMAG is -3.8°C. There is therefore a lack of consistency between sites, which is difficult to resolve when single species have been analysed for each of the sites through KM5c.
Where there are very large differences between BAYMAG and published Mg/Ca SST estimates, regardless of latitude (e.g. North Atlantic, Figure 4), we suggest that some combination of calibration difference, Mg/Caseawater change and/or other environmental factors including seasonality and calcification depth may offer an explanation.

485
To fully investigate the cause(s) of offsets in Mg/Ca SST reconstructions requires future multi-species analysis for Mg/Ca for each site, and multi-proxy analysis for each site. Such an approach would enable exploration of a wider range of potential influences over both the Mg/Ca and U K 37' SST reconstructions, and a reduction in the uncertainties of the reconstructed SSTs and their anomalies. Alongside foraminifera Mg/Ca and U K 37' analyses, additional proxies which are likely to add valuable information about water column structure and seasonality 490 could include TEX86 (Schouten et al., 2002), long-chain diols (Rampen et al., 2012), and clumped isotopes (Tripati et al., 2010). Previous research has demonstrated that even within a single site there can be offsets between proxies which are not continuous through time (e.g. Lawrence and Woodard, 2017;Petrick et al., 2018), so that high resolution and multi-proxy work is required to fully understand the offsets we have identified here. Resolving the causes of the different proxy-proxy and proxy-model offsets is important, because they impact the calculation of the global mean SST anomaly relative to pre-industrial; however, even with inclusion of the overall cooler Mg/Ca data, the combined KM5c proxy data still indicate a global mean SST anomaly which is larger than most models from the PlioMIP2 experiments ( Figure 3).

500
This study has generated a new multi-proxy synthesis of SST data for an interglacial stage (KM5c) from the Pliocene. By selecting an individual interglacial, with orbital forcing similar to modern, we are able to focus on the SST response to atmospheric CO2 concentrations comparable to today and the near-future (~400 ppmv), but elevated relative to the pre-industrial. Using strict stratigraphic protocols we selected only those data which could be confidently aligned to KM5c. By comparing different calibrations and two different proxy systems (U K 37' and

505
Mg/Ca in planktonic foraminifera) we identified several robust signals which are proxy-independent. First, global mean SSTs during KM5c were warmer than pre-industrial. Second, there was a reduced meridional SST gradient which is the result of relatively small low-latitude SST anomalies and a larger range of warming anomalies for the mid-and high-latitudes. Overall, there is good data-model agreement for both the absolute SSTs and the anomalies relative to the pre-industrial, although there are complexities in the results. Further work is required to 510 generate multi-proxy SST data from single sites, accompanied by robust reconstructions of thermocline temperatures using multi-species foraminifera analysis, so that the range of factors explaining proxy-and calibration-offsets can be explored more fully.
The choice of proxy for SST reconstruction impacts the overall calculation of global mean SST and the meridional 515 gradients. The negative anomalies in Mg/Ca-SSTs in six of the sixteen low-latitude sites lowers the global mean SST of KM5c from ~3.2-3.4ºC (U K 37'-only) to ~2.3ºC (combined U K 37' and Mg/Ca). The meridional SST gradient anomalies are decreased to -2.6ºC (combined U K 37' and Mg/Ca) relative to the pre-industrial, although a more muted reduction (up to -1.18°C) occurs with U K 37' alone. A number of factors may lead to a cool bias in the foraminifera Mg/Ca SSTs, which require further investigation through multi-proxy and multi-species analysis, particularly in low-latitude sites.
We identify the strongest warming across the North Atlantic region. The results are consistent with the PlioMIP2 models, although the largely U K 37' data sit at the high end of the calculated model anomalies. Although seasonality may play a role in the proxy data signal, these results also suggest that many models may under-estimate high-525 latitude warming even with the moderate CO2 increases identified in KM5c relative to the pre-industrial. More data points are required to fully explore these patterns: for seven sites only one data point lay within KM5c, and more than half of the analysed sites (18/32) recorded Atlantic Ocean SSTs.
Both the PlioMIP2 models (Haywood et al., in review, 2020) and future projections (IPCC, 2018) indicate that 530 warming is higher over land than in the oceans in response to higher atmospheric CO2 concentrations. Our synthesis of KM5c thus likely represents a minimum warming to be expected with atmospheric CO2 concentrations of ~400 ppmv. Even under low CO2 emission scenarios, our results demonstrate that surface ocean warming may be expected to exceed model projections, and will be accentuated in the higher latitudes.

555
The authors declare they have no conflict of interest.

Acknowledgements
This work is an outcome from the several workshops sponsored by Past Global Change (PAGES) as contributions to the working group on Pliocene Climate Variability over glacial-interglacial timescales (PlioVAR). We 560 acknowledge PAGES for their support and the workshop participants for discussions. Funding support has also been provided by NERC (NE/I027703/1 and NE/L002426/1 to ELM, NERC NE/N015045/1 to HLF), Leverhulme   Table S3 (U K 37') and    60°N-75°N, so that a more negative change in the gradient reflects a larger warming anomaly at high latitudes relative to low latitudes. As we only had data points poleward of 65°N in the Atlantic Ocean, the high latitude region for both proxy and model gradient calculations focuses on the longitudinal window from 70°W-5°E. Proxy data calculations were made using either all proxy data (U K 37' and Mg/Ca using their original calibrations), or using only U K 37' data and comparing the original and BAYSPLINE calibrations. No Mg/Ca data is 865 available >60°N so we were unable to calculate Mg/Ca-only gradients (Figure 4). The impact of changing the low-and high-latitude bands is explored in Table 1.

885
(yellow), which tend to be the maxima and minima, respectively.