Effects of strongly eddying oceans on multidecadal climate variability in the Community Earth System Model

Climate variability on multidecadal time scales appears to be organized in pronounced patterns with clear expressions in sea surface temperature, such as the Atlantic Multidecadal Variability and the Pacific Decadal Oscillation. These patterns are now well studied both in observations and global climate models and are important in the attribution of climate change. Results from CMIP5 models have indicated large biases in these patterns with consequences for ocean heat storage variability and eventually the global mean surface temperature. In this paper, we use two multi-century Community Earth 5 System Model simulations at coarse (1◦) and fine (0.1◦) ocean model horizontal grid spacing to study the effects of the representation of mesoscale ocean flows on major patterns of multidecadal variability. We find that resolving mesoscale ocean flows both improves the characteristics of the modes of variability with respect to observations and increases the amplitude of the heat content variability in the individual ocean basins. The effect on the global mean surface temperature is relatively minor.

coupler. The high resolution (HIGH) simulation employs a 0.1 • ocean horizontal grid spacing on a tripolar grid, while the low resolution (LOW) simulation has a 1 • ocean horizontal grid spacing with a displaced dipolar grid. The effect of subgrid-scale 95 processes on tracer and momentum transport is captured with a biharmonic diffusion operator in the HIGH simulation and the GM parameterization in the LOW simulation (Gent and McWilliams, 1990). The HIGH simulation was initialized from a simulation of several decades provided by the National Center for Atmospheric Research, while the LOW simulation was initialized from a decally averaged ocean state at year 1000 of a CESM 1.1.2 simulation performed with the same resolution. CESM 1.1.2 and CESM 1.0.4 exhibit only minor differences in their ocean state with CESM 1.0.4 having slightly higher SSTs 100 (not shown). Table 1 summarizes the important simulation characteristics. From earlier simulations with the same model components (CCSM3.5: Kirtman et al. (2012) and CESM1: Small et al. (2014)), it is known that the climatology of the higher resolution simulation improves compared to the lower resolution simulation in many aspects. The overall SST biases reduce due to a better representation of boundary currents, ocean upwelling and air-sea interactions (Small et al., 2014). Some biases remain, though, such as too warm high latitude SSTs which result in low 105 sea ice extent and sea ice volume biases in the high resolution simulation (Kirtman et al., 2012). Figure 1 shows the global mean surface heat flux into the ocean indicating the equilibration of the simulations. The HIGH simulation equilibrates significantly faster than the LOW simulation. The GFDL CM2 climate models show a similarly decreased drift in their higher resolution setups which is due to an enhanced upward heat transport by the explicitly resolved eddies (Delworth et al., 2012;Griffies et al., 2015). In our CESM simulations, all basins are initiated too cool and hence the The results in Fig. 1 indicate that a trend must be removed despite the constant forcing as the model simulations are still 115 adjusting towards their statistical equilibrium. We remove a quadratic trend at each grid point of the relevant model output.
This approach allows for different proximities to the statistical equilibrium and different time scales of adjustment at each grid cell. A further choice in the analysis is the selection of model years which are used in the analysis. Of the HIGH simulation we ignore the first 50 spinup years as strong adjustment is evident in the global mean SST approximately until year 40. As mentioned, the LOW simulation is initialized at year 1000 of a CESM 1.1.2 simulation with the same forcing and grid that maintains a near constant surface heat flux of 2.2 ZJ yr −1 concurrent to the LOW simulation. We choose to discard the first 10 years of data to avoid fast adjustment processes and analyze years 11-260 (cf. Table 1). Monthly data is de-seasonalized by removing the mean annual cycle, i.e. the mean monthly difference to the annual mean.
To compare the SST model results with observations, we use the HadISST sea surface temperature dataset from 1870-2018 which is provided on a 1 • × 1 • grid (Rayner et al., 2003). The observational dataset must be detrended with an appropriate 125 estimate of the historical forced signal to allow for a fair comparison with the results of the model simulations, which are obtained under constant forcing. A simple linear detrending is unable to remove the non-linear historical forcing signal . When an ensemble of model simulations is available, the forced signal can be approximated as the ensemble mean assuming that the internal variability of the individual ensemble members is uncorrelated. One can use either the mean of a single model ensemble (such as that of the Max-Planck Institute Grand Ensemble (Maher et al., 2019) or the CESM Large

130
Ensemble (Kay et al., 2015)) or a multi-model ensemble (multi-model mean: MMM; such as that of the CMIP5 ensemble).
The MMM is generally superior to the single model ensemble mean in the historical period (Frankcombe et al., 2018), so we use CMIP5 ensemble results. The forced signal can furthermore be estimated as a single time series (single factor detrending; used for example by Steinman et al. (2015)) or as a linear combination of different forcing signals (multi-factor detrending as described below).

135
For the HadISST dataset, we choose to separate the natural and anthropogenic forcing signals which was found to be superior to single-factor detrending (Frankcombe et al., 2018). We use the CMIP5 MMM forced with observed aerosol and greenhouse gas concentrations until 2005 and those from the RCP8.5 scenario afterwards until 2018 (like, e.g. Kajtar et al. (2019)). The natural forcing is assumed to equal the CMIP5 MMM of the historical (natural forcing only) simulations. The anthropogenic contribution (MMM ant ) due to greenhouse gases and aerosols is calculated as the difference between the fully forced MMM 140 (MMM all ) and the MMM with only natural forcing (MMM nat ), i.e., At each grid point, scaled versions of these two time series are subtracted from the historical reconstructed SST, such that the detrended SST is To assess the spatial expression of the modes of multidecadal variability, we use regression plots (Deser et al., 2010). The plotted regression values R are defined as the covariance of the SST field and the normalized index X(t), where X = AMV, 150 PDO, or SOM, according to R(x, y) = cov(SST(x, y, t), X(t)) std(X(t)) , such that R has units of temperature. The significance against a no-correlation null hypothesis is tested with a two-tailed Student's t-test. Because the time series are autocorrelated and filtered, the effective number of data points n is lower than the original sample size n and can be estimated as the maximum of the reduced sample size due to filtering and due to 155 autocorrelation (Trenberth, 1984): where r 1,X (r 1,Y ) is the lag-1 autocorrelation of time series X (Y ), f the filtering frequency, and ∆t the time step.
We analyze time series in the spectral domain with multi-taper spectra (Ghil et al., 2002). This estimator of spectral density is superior to the classic periodogram in that it reduces spectral leakage and is statistically robust, i.e. the estimated noise 160 reduces with more data points. However, as a trade-off the effective spectral resolution is reduced which becomes problematic at periodicities near the length of the time series. We therefore limit our analysis to periodicities below 50 years when focussing on model-observation comparisons but extend the range to 100 year periods for comparisons between the simulations. We use a bandwidth parameter of 2 and the number of tapers is 3. To test significance against a red noise null hypothesis (Hasselmann, 1976), we generate 10,000 Monte-Carlo first order autoregressive (AR(1)) processes. The autocorrelation coefficient and noise    ison between the results, also the period of variability of the historical data has been extended to 50 years, although the higher period results are less reliable than in the models, because of the difference in data series length. In the detrended observations, 220 significant (99%) spectral power for all three indices occurs at large periodicities above 40 years. The PDO signal also extends beyond the red noise (95%) at periods of about 20 years. Most previously published spectral analysis of the North Atlantic observed temperatures find significant periodicities around 50-70 years, but they overwhelmingly remove a linear trend only which distorts the signal . Longer proxy data records reveal significant multidecadal variability around the North Atlantic from sources including Greenland ice cores (Chylek et al., 2011), tree rings (Gray et al., 2004), sediments 225 (Knudsen et al., 2011), or a combination of proxies (Delworth and Mann, 2000;Wang et al., 2017). In the HIGH simulation, significant spectral power peaks only exist at 40-50 years for the AMV index (>95%), around 13-15 and 20-25 years for the PDO index (>99%), while the SOM index shows significant variability around 15 years (>95%) and above 40 years (>95%).
The indices of the LOW simulation exhibit no significant spectral power for either the AMV or SOM indices and the red noise null hypothesis cannot be rejected at 95% level, and only the PDO index has a significant spectral peak (95%) at 16 years. year HIGH and LOW simulations (center and right, respectively). The units of the spectral power are [K 2 yr] for the SST average AMV and SOM and [yr] for PDO index which is dimensionless as a principal component. As a null hypothesis 10,000 AR(1) processes were simulated to estimate the 95% and 99% red noise confidence interval (solid and dashed orange lines).  integrated Atlantic and Pacific multidecadal spectral power is higher in the HIGH simulation compared to the LOW simulation.

Surface Heat Fluxes
Remarkable is further that the Southern Ocean SHF spectrum is nearly white with similar power at all frequencies, while both the Atlantic and Pacific show decreasing power at lower frequencies.

Ocean Heat Content
The basin-scale ocean heat content ( (2019)), but they lack the detail that we aim to investigate here so that we compare model data only.
The HIGH simulation temperature data has been interpolated to a 0.4 • rectangular grid and the OHC calculations were performed on that grid. The OHC of the LOW simulation is calculated on the original displaced dipole grid and zonal integrals 250 are performed along the grid x-direction (so not along parallels in the high northern latitudes). The first two columns of Figure 6 are Hovmöller diagrams of 13 year low-pass filtered zonally and vertically integrated OHC anomalies, indicated by OHC v , i.e., OHC v (y, t) = c p ρθ(x, y, z, t) dx dz , where θ is the potential temperature, ρ the density, and c p = 3996 J kg −1 K −1 the sea water heat capacity used in the CESM.

255
The last column in Figure 6 shows the standard deviation of the 13 year low-pass filtered anomalies as a function of latitude.
Almost everywhere the variability as measured by this standard deviation is higher in HIGH compared to LOW. The first row shows the global signal which also captures the Southern Ocean signal south of 34 • S (marked by the green line). Here signs of the SOM in the HIGH simulation can be seen with north-and southward propagating anomalies; these are absent in the LOW simulation. The Atlantic behavior is qualitatively similar between the HIGH and LOW simulations with southward propagating This is also visible to a lesser extent in the South Pacific just north of 30 • S. In the equatorial Pacific, the unrealistically strong ENSO signal of the LOW simulation is filtered out by the 13 year low-pass filter. However, compared to the HIGH OHC v , 265 the difference is reduced between the tropical and extratropical low frequency spectral power as revealed by the standard deviations. The Atlantic and Pacific peaks at northern midlatitudes are shifted equatorward in the HIGH simulation compared to the LOW simulation due to a better representation of the western boundary current separation. In the Indian Ocean, the HIGH simulation shows a less meridionally coherent pattern compared to the LOW simulation. Figure 7 shows 13 year low-pass filtered Hovmöller diagrams of the horizontally integrated OHC anomalies, given by 270 OHC h (z, t) = c p ρθ(x, y, z, t) dx dy, 12 https://doi.org/10.5194/os-2020-85 Preprint. Discussion started: 18 September 2020 c Author(s) 2020. CC BY 4.0 License.

Global Mean Surface Temperature
The multidecadal global mean surface temperature (GMST) evolution is a consequence of the heat flux convergence in the atmosphere as energy is exchanged through the sea surface with the oceans and through the top of the atmosphere with outer space. Figure 8 shows the time series and spectral estimates of the GMST. At periods around 4 years, the too strong ENSO of the LOW simulation (as extensively analysed in Wieners et al. (2019)), leads to a larger interannual variability in the GMST time 290 series and manifests iteself as a peak of the GMST spectral signal. At multidecadal time scales (beyond periods of 13 years) the integrated spectral power in the HIGH simulation is higher than in the LOW simulation, but for periodicities at 35-60 years the LOW GMST spectral power is higher than that of the HIGH simulation. The spectral power of the two-factor detrended historical GMST data lies between both simulations at sub-decadal time scales, agrees well with both simulations at (inter-) decadal time scales, and exceeds both simulations at periods above 30 years.

Summary and Discussion
We investigated the effect of ocean model resolution on multidecadal variability by contrasting two multi-century simulations with the Community Earth System Model (CESM), one with a non-eddying ocean typical of the CMIP5 models (LOW) and one with a strongly eddying ocean (HIGH). The enhanced horizontal ocean model resolution allows for more detailed features of the circulation with many documented improvements, such as better representation of boundary currents, reduced SST biases, more pronounced in the HIGH simulation than in the LOW simulation (Fig. 2) and compare more favourably to observations both in regression patterns (Fig. 3) and spectral properties (Fig. 4). At multidecadal time scales, stronger surface heat fluxes 305 (globally and in all major ocean basins) are associated with this larger SST variability (Fig. 5). The integrated ocean heat content (OHC) signal varies more strongly in the HIGH simulation than in the LOW simulations and anomalies extend to greater depths (Figs. 6, 7). However, while the integrated spectral power of the global mean surface temperature (GMST) at multidecadal tie scales is larger in the HIGH simulation than the LOW simulation, there are frequencies at which the LOW GMST variability is larger (Fig. 8). From the analyzed LOW and HIGH CESM simulations, we conclude that representing 310 mesoscale eddies in the CESM leads to an enhanced intensity of multidecadal variability in the ocean.
In the Atlantic, an improvement in the simulation of the AMV pattern is evident in the HIGH simulation compared to the LOW simulation. While the overall AMV pattern is similar in observations and model simulations, the subpolar maximum in the regression patterns in the HIGH simulation matches the detrended observations in contrast to the subtropical maximum in the LOW resolution. Due to the specifics of our detrending approach, the historical regression pattern looks somewhat different from Deser et al. (2010), especially outside the North Atlantic. In particular, the pattern shows a statistically significant positive correlation in the northwestern tropical Pacific that is not evident in the regression pattern of Deser et al. (2010). The AMV spectra also show that the HIGH simulation performs better compared with the historical SST data by showing significant power at multidecadal periodicities against the red noise null hypothesis, while the spectral estimate of the LOW AMV signal fails to reject this null hypothesis (Fig. 4). While the magnitude of OHC variability is larger in the HIGH simulation, the 320 meridional and depth structures of the Atlantic OHC variability are similar between the HIGH and LOW simulations (Figs. 6   7). This suggests that the AMOC effects on the OHC variability are captured with the coarse resolution in agreement with earlier studies (Delworth et al., 1993;Delworth and Mann, 2000) providing support for physical mechanisms of the AMV deduced from idealized models which are independent of mesoscale variability (Te Raa and Dijkstra, 2002). Increasing the resolution can still improve the AMV relative to observations by reducing ocean mean state biases, in particular the representation of the 325 Gulf Stream and of deep water formation.
We use the PDO index to capture Pacific low frequency variability and the resulting regression patterns are similar between observations and simulations. The spectral estimates of both the historical PDO and simulated ones extend beyond the 95% confidence interval interdecadal time scales around 20 year periodicity. The LOW simulation exhibits a strong and regular ENSO signal (Wieners et al. (2019); visible e.g. in Fig. 5c) and the low frequency tropical Pacific SHF spectral power is rela-330 tively larger than the extratropical spectral power compared to the HIGH simulation (Fig. 5i). That the PDO partly comprises a low-frequency ENSO signal is visible in the strong tropical Pacific regression maximum on the LOW simulation. With the better representation of the western boundary currents, in particular the Kuroshio, in the HIGH simulation, there are qualitative differences in the meridional structure of the Pacific OHC anomaly propagation: around 30 • N, and to a lesser degree around 30 • S, equatorward propagation is evident in the HIGH simulation while meridionally coherent anomalies appear in the LOW 335 simulation (Fig. 6).
The representation of the Antarcic Circumpolar Current in the Southern Ocean changes dramatically between the HIGH and LOW simulations due to the presence of mesoscale eddies. The multidecadal SHF variability is much enhanced in the strongly eddying HIGH simulation compared to the LOW simulation. Ocean heat anomalies also penetrate much deeper into the Southern Ocean, suggesting a crucial difference between explicitly simulated mesoscale eddies and their parametrization 340 in advecting heat downward (Fig. 7). In the HIGH simulation, the SOM is visible in the meridional structure with north-and southward moving anomalies, while anomalies simply converge on 45 • S in the LOW simulation (Fig. 6). The presence of the SOM is expected in the HIGH simulation as it is thought to be caused by the interaction of mesoscale eddies and the mean flow (Hogg and Blundell, 2006;Le Bars et al., 2016;Jüling et al., 2018). However, the SOM regression patterns of the HIGH and LOW simulations are more similar to one another than to the observations with a more azonal, circumpolar wavenumber-3 345 pattern and stronger correlations extending into the South Pacific. However, the pattern in the HadISST dataset is likely to be biased due to the limited number of observations in this region, in particular prior to satellite observations.
Multiple mechanisms through which mesoscale eddies can generate multidecadal variability have been alluded to in the introduction, such as eddy-mean flow interactions (Hogg and Blundell, 2006;Berloff et al., 2007a), the changes to the stratifi-cation introducing memory (Manucharyan et al., 2017) or the modification of conditions for convection (Dufour et al., 2017).
Furthermore, only eddying models are capable of simulating observed high-frequency variability correctly (Penduff et al., 2011), where an 'inverse temporal cascade' is important for the shift of variance from high to low frequencies. Any mechanism, involving eddies or not, leading to such a cascade thus requires a source of spectral energy at high frequencies which can be provided by the eddies. It is beyond the scope of this study to analyze the exact mechanisms at play in the HIGH simulation, but it will be the subject of further study.

355
Naturally, the comparison between observations and model results is fraught with challenges, not least of which is the choice of an appropriate detrending procedure. The inadequate removal of the forced signal in the observations leads to biased regression patterns (Brown et al., 2015). In the model simulations the forcing is held constant and there is only a modest model drift to detrend ( Fig. 1) for which we chose to a second order polynomial. The observations are based on the climate system's response to non-stationary forcing and we chose the scaled multi-model mean approach (Frankcombe et al., 2015). Here the 360 detrending signal is derived from the multi-model mean of the CMIP5 models which are biased in their own ways and have different relative climate sensitivities to different forcings. Furthermore, the external forcing prescribed in the CMIP5 models is in itself uncertain and hence may alter the relative contribution due to external forcing and internal variability. Furthermore, the atmospheric grid spacing decreases from 1 • in the LOW to 0.5 • in the HIGH simulation. However, no new essential atmospheric processes are resolved, so no significant changes are expected apart from coupling to different ocean boundary 365 conditions.
These caveats do not detract from our main point which is to emphasize that low frequency variability may be underestimated in climate models with low-resolution ocean model components and that representing mesoscale eddies in climate models can improve the simulation of this variability. Of course, we showed this here only for the CESM but expect the improvement of multidecadal variability to be generalizable to other coupled climate models, as some of the features like enhanced vertical heat 370 transport are consistent with results obtained with other models (Griffies et al., 2015). The introduction of high frequency variability through mesoscale ocean features and the reduced ocean state bias can further improve the simulations' multidecadal variability skill. Certainly, better eddy parametrizations may improve shortcomings of the traditional Gent-McWilliams approach (Zanna et al., 2017). A recent study claimed an absence of evidence for multidecadal variability in CMIP5 models (Mann et al., 2020), but all these models use non-eddying ocean components. As most CMIP6 models outside the High Reso-375 lution Model Intercomparison Project (Haarsma et al., 2016) still use non-eddying ocean components, not much improvement is expected on the representation of multidecacal variability in these models compared to CMIP5 models.
Internal variability obscures any forced GMST signal and periods of accelerated and decelerated warming are observed, such as the recent warming trend slowdown . Many studies use global circulation models with coarse resolution ocean components to investigate, for example, the origin of these so-called hiatuses (e.g. Maher et al. (2014)). In 380 light of our findings, estimates of the frequency and magnitude of excursions from the forced trend may be systematically low biased in low-resolution models, as internal multidecadal variability is underestimated. The increased internal variability also implies that the attribution of forced signals becomes more difficult and the issue of the origin of the recent warming trend slowdown may therefore never be satisfactorily resolved (Hedemann et al., 2017). Finally, our finding of stronger than