The Zone of Influence: Matching sea level variability from coastal altimetry and tide gauges for vertical land motion estimation

Vertical land motion (VLM) at the coast is a substantial contributor to relative sea level change. In this work, we present a refined method for its determination, which is based on the combination of absolute satellite altimetry (SAT) sea level measurements and relative sea level changes recorded by tide gauges (TG). These measurements complement VLM estimates from GNSS (Global Navigation Satellite System) by increasing their spatial coverage. Trend estimates from SAT and TG 5 combination are particularly sensitive to the quality and resolution of applied altimetry data as well as to the coupling procedure of altimetry and tide gauges. Hence, a multi-mission, dedicated coastal along-track altimetry dataset is coupled with highfrequency tide gauge measurements at 58 stations. To improve the coupling-procedure, a so-called ’Zone of Influence’ is defined to identify coherent zones of sea level variability on the basis of relative levels of comparability between tide gauge and altimetry observations. Selecting 20% of the most representative absolute sea level observations in a 300 km radius around 10 the tide gauges results in the best VLM-estimates in terms of accuracies and uncertainties. At this threshold, VLMSAT-TG estimates have median formal uncertainties of 0.59 mm/year. Validation against GNSS VLM estimates yields a root-meansquare (RMS∆VLM) of VLMSAT-TG and VLMGNSS differences of 1.28 mm/year, demonstrating the level of accuracy of our approach. Compared to a reference 250 km radius selection, the 300 km Zone of Influence improves trend accuracies by 12% and uncertainties by 28%. With progressing record lengths, the spatial scales of coastal sea level trend coherency increase. 15 Therefore the relevance of the ZOI for improving VLMSAT-TG accuracies decreases. Further individual Zone of Influence adaptations offer the prospect of bringing the accuracy of the estimates below 1 mm/year.

. Applied models and geophysical corrections for estimating sea level anomalies.

Parameter
Model reference Range and Sea State Bias ALES (Passaro et al., 2014) Inverse barometer DAC-ERA (Carrère et al., 2016) Wet troposphere GPD+ (Fernandes et al., 2015) Dry troposphere VMF3 (Landskron and Böhm, 2018) Ionosphere NIC09 (Scharroo and Smith, 2010) Ocean and Load tide FES2014 (Carrère et al., 2015) Solid Earth and Pole tide IERS 2010 (Petit and Luzum, 2010) Mean Sea surface DTU18MSS (Andersen et al., 2018) the empirical ocean tide model (EOT19p) by Piccioni et al. (2019). If available, the Dynamic Atmopsheric Correction consists of the ECMWF ERA-Interim reanalysis (DAC-ERA, Carrère et al. (2016)). This product especially reduces along-track sea 160 level errors in the earlier missions (in this study . Because this product is unavailable for the very recent missions, we implement the DAC (Carrère and Lyard, 2003) based on ECMWF for the last cycles of Jason-2 (and its extended mission) and the full Jason-3 and Saral missions. To reduce radial errors in the different missions, the tailored coastal altimetry product is cross-calibrated using the multi-mission crossover analysis (MMXO) global calibration (Bosch and Savcenko, 2007;Bosch et al., 2014). 165 We map all altimetry records on 1-Hz nominal tracks consistent with the CTOH nominal paths (Center for Topographic studies of the Ocean and Hydrosphere, www.ctoh.legos.obs-mip.fr) of the individual missions, using nearest-neighbour interpolation. Then, we scan the data for outliers along the tracks, to hinder spurious extreme values to propagate in time series.
This scheme features: -Absolute thresholds: Any absolute SLA exceeding ±2 m is excluded.

170
-Running median test: If the absolute difference of the data and its running median (centered, over 20 points) is greater than ± 12 cm, data are excluded.
-Consecutive difference test: Outliers are detected when the difference of consecutive points exceeds ±8 cm. The test identifies the outliers according to the differences of the other neighbouring values The absolute thresholds (±12 cm, ±8 cm) correspond to 2-σ of the median running variability and 2-σ of absolute consec-175 utive differences based on the analysis of different tracks of Jason-2 and ERS-2.
SLAs along the same track and cycle are then averaged over predefined areas as described in sections 2.5 and 2.6. We built a time series by considering all averaged SLAs from the along-track multi-mission dataset for the study period. To check for outliers in each SLA time series, we exclude values exceeding absolute values of ±3-σ of the data. This cleaned 1 Hz coastal altimeter dataset is hereinafter called ALES and used for the combination with the TG-datasets described in section 3.1.

Gridded altimetry data -AVISO
The gridded Ssalto/Duacs altimeter product was produced and distributed by the Copernicus Marine Environment Monitoring Service (CMEMS, http://marine.copernicus.eu) and is hereinafter called AVISO as it was previously distributed by CNES AVISO+. We use monthly sea level anomalies, which are resolved on a 0.25 • Cartesian grid and cover the period from 1992-

Monthly tide gauge data -PSMSL
We use monthly mean tide gauge data from the datum controlled PSMSL (Holgate et al., 2013) database. PSMSL constitutes 190 the primary source of tide gauge data for most of sea-level research, or for the assessment of long-term trends of VLM based on SAT and TGs. The service undertakes quality control of the data including checks for consistency of the annual cycle, outlier detection or intercomparisons with neighbouring stations, which enhances the reliability of the data. Among all available stations, we select those which contain at least 180 months (15 years) of valid measurements during the altimetric era (1993 -present), resulting in a total number of 627 stations. We apply the same monthly-averaged DAC-correction as used for the 195 AVISO data (Carrère and Lyard, 2003). To match the DAC-correction with the tide gauge records, we select among the 9-closest grid-points of the solution, the one which results in the highest variance reduction.

High frequent tide gauge data -GESLA
In addition to monthly-mean PSMSL TG-data, we exploit the GESLA dataset (Woodworth et al., 2016), which contains a large global collection of high-frequency TG records with sampling rates ranging from hours down to 6 minutes. The latest version 200 GESLA 2 contains in total 1355 station records and was assembled from a variety of international and national databanks (e.g. UHSLC (University of Hawaii Sea Level Center) and GLOSS) or independent sources. It thus also shares many stations with the monthly PSMSL database, the preferred dataset for VLM SAT-TG computation. Unfortunately, at this time, GESLA holds only data until 2015. Therefore, we also restrict the considered period for all dataset combinations (see section 3.1) to before 2015. As for the PSMSL data, we select stations with at least 180 months of valid data.

205
In contrast to PSMSL data used in WM16, GESLA tide gauges feature no rigorous outlier rejection by default in except that of the primary data providers (Woodworth et al., 2016). Extreme values from strong signals like tsunamis or station shifts and other irregularities are still present in the data. Some of those issues are adressed on the GESLA-webpage, however, for the sake of long-term trend evaluation, we perform a further global outlier analysis. Therefore, we check all tide gauge time series manually for irregularities: Station shifts from seasonal to interannual timescales are either handled by dismissing 210 certain sections of the time series or completely excluding the tide gauge from the analysis. Single extreme events from hourly to monthly timescales are only excluded, when they deviate from the measurements by several meters, because we want to maintain as much data as possible. If such events are present, we flag any values beyond the upper/lower 0.999 quantiles of a fitted normal distribution of the data. Occasionally we apply this quantile-outlier exclusion recursively.
To obtain a uniform temporal resolution, we resample this outlier-free tide gauge set to hourly records by cubic interpolation.

215
The records are then corrected for the tidal signal as well as for the ocean response to atmospheric wind and pressure forcing.
The tidal variability is suppressed by using a 40-h loess (locally estimated scatterplot smoothing) filter (Cleveland et al., 1988) as in Saraceno et al. (2008). In accordance with PSMSL-tide gauge data, we implement the same Dynamic Atmospheric Correction (Carrère and Lyard, 2003). This solution features a 6h sampling frequency, which is therefore down-sampled to hourly anomalies by cubic interpolation. For the global dataset, we obtain a mean variance reduction of 37.8% and a mean 220 correlation of 0.6. As in WM16 and Ponte (2006), we find a distinct latitude dependence of correlations and variance reduction, with decreasing performance in low-latitude regions. We note, that the total variance reduction, which we apply on the high-rate tide gauge data is naturally less than in WM16, who corrected monthly mean, detrended and deseasoned data.

Dataset combinations 225
To understand the sensitivity of the VLM estimations on (1) quality and resolution of the data and (2) the selection procedure, we analyse the performances of four different dataset combinations: ALES-PSMSL-250km, ALES-GESLA-250km, AVISO-PSMSL-250km and ALES-GESLA-ZOI.
The first three combinations are constructed to compare the performances of the along-track (ALES) against the gridded altimetry product (AVISO) combined with monthly tide gauge observations. With ALES-GESLA-250km we also investigate 230 the possible advantage of using the GESLA high rate tide gauge product. For all these experimental sets, SLA time series are merged or averaged within a 250 km radius around the tide gauges, which is thus a selection procedure independent of the actual comparability of SLAs.
To produce the ALES-GESLA-250km dataset, we derive differences of the merged, non-uniformly sampled SLAs and the hourly-sampled GESLA tide gauge records, by cubic interpolation of the latter. We down-sample these high-rate differenced 235 time series to monthly means. For ALES-PSMSL-250km on the other hand, we first compute monthly-means from SLAs and subsequently subtract these monthly SLAs from the monthly-sampled relative SLAs from PSMSL. Finally, we directly compute the differenced SAT-TG time series from the averaged monthly AVISO and the PSMSL data, which yields the AVISO-PSMSL-250km dataset.
Using these combinations, we investigate the mere changes from matching along-track data at high or at low frequency 240 (ALES-GESLA-250km and ALES-PSMSL-250km), or using monthly gridded data (AVISO-PSMSL-250km). Here, 'highfrequency' refers to daily time scales of variability and 'low-frequency' to monthly time scales. The dataset ALES-GESLA-ZOI incorporates further SLA-selection-schemes, which are explained in the following section.

The Zone of Influence
We aim to develop a new SLA-selection scheme, which accounts for the observed coherency of sea level variability. However, due to the diversity of the underlying physical mechanisms and their complex interplay with the coast, the spatial coherency of sea level dynamics is highly variable in coastal regions (Woodworth et al., 2019). Coastally trapped waves, for instance, were argued to establish long-range correlations along the continental slopes (Hughes and Meredith, 2006)  The key concept of our approach is to capture the extent to which coastal altimetry measurements are similar to the in situ tide gauge observations. To do so, we extend the methodology proposed by Santamaría- Gómez et al. (2014), who looked for 255 the altimetry grid point mostly correlated with the tide gauge, and Kleinherenbrink et al. (2018), who considered a larger set of points based on absolute thresholds of correlation. In contrast to these previous studies, we assess the influence of using relative thresholds of comparability on both the accuracy and the uncertainty of the trends.
We exploit combinations of along-track ALES data and high-frequent GESLA records, to identify regions of sea level variability that show maximum coherency with tide gauge observations, which we hereinafter call the Zone of Influence (ZOI).

260
With this approach, our objective is to decrease noise of the differenced, high-frequent VLM SAT-TG time series using the ZOI to hone trends and uncertainties estimates.
To define the ZOI, we investigate different statistical criteria S, which provide a measure of similarity of sea level variability between TG and SAT observations. Here, we use the Pearson correlation coefficient, the RMS SAT-TG as well as as the amplitude of the residual annual cycle between both TG and SAT records. We compute each of those measures for every point of the 1 Hz 265 along-track data (ALES) in combination with the TG records from GESLA. As for ALES-GESLA-250km, tide gauge data are interpolated onto the time step of the altimetry records. Correlations and RMS SAT-TG are computed from the detrended TG and SAT time series. The amplitude of the residual annual cycle is obtained from the remaining seasonal signal of the difference of the time series (SAT-TG). We acquire a dataset, holding information of the performance of multi-mission along-track data in the vicinity of every GESLA TG.

270
To confine the ZOI, we select sub-sets of the data containing the best-performing statistics (i.e. highest correlation, lowest RMS SAT-TG or residual annual cycle) above the Xth-percentile according to the distribution of the statistic S in a 300 km radius around the tide gauges. Every sub-set (X, S) represents an individual ZOI, in which we average SLAs in accordance with the steps involved in the aforementioned 250 km-radius-selection (ALES-GESLA-250km, section 3.1). The high-rate SLA time series (ALES) are then again subtracted from GESLA, providing the ALES-GESLA-ZOI dataset for VLM estimation (section 275 3.3).
Note that in contrast to the 250 km selection, we extend the range in which SLAs are taken into account to 300 km to define the ZOI. The previous 250 km selection is, as in Kleinherenbrink et al. (2018), based on the space auto-correlation scales of SLAs, which reflect characteristic eddy length scales (Stammer and Böning, 1992;Ducet et al., 2000). These scales decrease towards higher latitudes with changing internal Rossby radius. However, several studies found much larger correlation length 280 scales of SLAs, in particular along shorelines (Calafat et al., 2018;Hughes and Meredith, 2006). Other mechanisms than mesoscale eddy activity were investigated to account for these coherent changes. One example is given by Calafat et al. (2018) The obtained coherent structures reveal notable dependencies on the local bathymetric and coastal properties. Figure 1 (a), for instance, shows far-reaching alongshore correlations, which is supported by all of the analysed selection criteria. In this example, the separation of the region of the coastal shelf-sea dynamics from region of offshore variability is in good 295 agreement with the underlying bathymetric gradients. Kurapov et al. (2016) found similarly pronounced SLA coherency along for the Californian coast, as shown in Figure 1b). Based on model data and tide gauge observations, they explained the largescale along-shore correlation pattern in part with the propagation of coastal trapped waves. In other locations such as in Chichijima island (see Figure 1c), coastal and bathymetric control of SL is reduced and different structures of coherency evolve.
Consequently, the ZOI can strongly vary in shape depending on the local coastal features and drivers of coastal variability.  Comparing these three examples, we also observe that absolute values of the statistics differ from site to site. Correlations of along-track data near the Australian coastline, for instance, outperform the ones in example Figure 1b. The same holds for RMS SAT-TG values. These differences not only indicate different degrees of coherency, but can also stem from regional deviations in the quality of data, i.e. quality of tide gauge records or error sources in the altimetric product, such as tidal adjustments or coastal corrections. Differences can also be caused by coastal properties, e.g. when the tide gauges are located 305 in sheltered areas, which separates the in situ variability from the one measured at distant altimeter tracks. We analyse therefore the use of relative thresholds, to select the SLAs, since setting absolute thresholds as in Kleinherenbrink et al. (2018) might not be applicable in all cases. Figure 1c) also shows that different statistics can determine different extents of the ZOI, considering that rather poorly correlated areas are partially characterised by low residual annual cycle amplitudes.
A correct choice of the ZOI based on a sub-set of high performant SLAs can significantly reduce the SAT-TG residuals as are 310 exemplified in Figure 2. Here, we show three time series of SAT-TG differences for the Australian site (see Figure 1a). The first series ( Figure 2a) indicates much lower residual noise, when the time series is constructed from the 20% best SLAs (according to the RMS SAT-TG ). Here, the ALES-GESLA-ZOI residuals outperform those of the other combinations ALES-PSMSL-250km and AVISO-PSMSL-250km, which are still affected by a pronounced annual cycle not related to VLM.
While using relative thresholds can reduce the noise of VLM SAT-TG time series for individual stations, we seek to identify 315 a globally optimal ZOI definition and associated criteria and thresholds, which lead to largest improvements of uncertainties and accuracies of VLM SAT-TG . Therefore, we vary the relative thresholds X between 0.0 and 0.975 (with a stepsize of 0.025), which refers to using 100% and 2.5% of the best performing SLAs according to each criteria. For each threshold and criterion we derive an individual global VLM SAT-TG trend and uncertainty dataset. We validate the performance of the trend estimates for a specific ZOI definition in accordance with section 3.4. Optimal parameter X, S are then suggested for the global application 320 (section 4.2).
When combining altimetry and tide gauges for VLM estimation, several sources can contaminate the differenced time series and inflate the actual 'red'-noise (low-frequency) content in the residuals, which generates auto-correlated signals in the data.
Such sources include next to SLA correction or adjustment errors most predominantly sea level dynamics, which do not reflect 330 the tide gauge observations. Therefore, to avoid underestimation of the uncertainties of the parameters, we take into account auto-correlation in the residuals of the detrended and deseasoned time series. We describe the power spectral density of the noise with a combination of a power-law and a white noise model (using the Hector software (Bos and Fernandes, 2019)).
The power-law process assumes that time-correlated noise power is proportional to f κ , which for negative spectral indices κ describes increasing power at lower frequencies f and a white-noise process when κ = 0 (Agnew, 1992). Gómez et al. (2011) 335 showed that this combination (of power-law and white noise model) represents the best approximation of the noise content for 275 GNSS station position time series. This combination was also implemented in studies concerned with VLM SAT-TG estimation (WM16, Kleinherenbrink et al. (2018); Ballu et al. (2019)). In particular, the spectral index κ can contribute to detect the intrusion of low-frequency signals in the differenced time series. Next to the spectral index κ we estimate the individual fractions of the power-law and white noise models, as well as the total variance σ 2 which scales the amplitude of 340 the noise.

Validation of VLM SAT-TG with VLM GNSS trends
To validate SAT-TG-based trend estimates, we use the ULR6a GPS solution provided by the GNSS data assembly centre  (2000)) ellipsoid. Although there is difference of 70 cm between the 350 semi-major axes of both ellipsoids, the GNSS and SAT vertical trends can be compared without degradation of precision, as both ellipsoids are geocentric and have the same orientation with respect to the Earth's body (e.g., the ellipsoid minor axes coincide with the mean Earth's rotation axis, and the major axes are on the Earth's equatorial plane). We take into account GNSS stations which are closer than 1 km to a tide gauge. With this constraint we aim to avoid potential differential vertical motions between the tide gauge and the GNSS-antenna (WM16). The tide gauge locations and record lengths differ among the presented experimental datasets (section 3.1). Therefore, we define several requirements for the validation of those experimental-datasets, to obtain a consistent set of TG and GNSS validation pairs. In contrast to PSMSL records GESLA-TG observations only last until 2015. Even when PSMSL TG records are limited to before 2015, they still contain more months of valid data than GESLA (see Figure 3b). Hence, we align the time period covered by the PSMSL-TGs to the corresponding GESLA-TGs for all following experimental datasets. Generally, we 360 only take into account SAT-TG time series, when they cover at least 120 months of valid data. Note, that the outlier analysis (section 2.4) or coupling of high-frequent TG data in the ZOI can reduce the length of the SAT-TG time series for GESLA TGs.
Taking into account all these requirements, we obtain 52 common GESLA and PSMSL TGs, which provide a neighbouring We compute the RMS ∆VLM and the median of the differences (∆VLM) of VLM SAT-TG and VLM GNSS for a given dataset combination. We analyse also the median of the absolute value of differences (|∆VLM|). This metric is less prone to extreme deviations and can thus consolidate the evaluation of the dataset performances. We generally assume that GNSS provides a more accurate estimation of the linear component of the VLM with a smaller error than VLM SAT-TG , also despite shorter time

The Zone of influence improves VLM estimates
We investigate how the ZOI selection of SLAs fosters quality of SAT-TG VLM estimates. As addressed in section 3.2, we 405 build the ZOI upon different criteria of comparability: RMS SAT-TG , correlation and the residual annual cycle. First, we focus on the results of using the RMS SAT-TG of the detrended differenced time series (Table 2 and Figure 4, ALES-GESLA-ZOI).
We observe that the RMS ∆VLM , the median of absolute and total differences, as well as trend uncertainties decrease towards higher relative thresholds. The statistics converge to a minimum when the ZOI is restricted to the 30-20% best data. To compare ALES-GESLA-ZOI with the other dataset combinations, we compute the statistics for the same 52 tide gauges used in these 410 configurations (because the shown statistic in Table 2 Figure 2, selecting e.g. highly correlated SLAs efficiently reduces the noise of the The integration of the ZOI primarily reduces uncertainties of VLM SAT-TG trend estimates. Over a considerable range of thresholds (80 -20% of best performing data) trend accuracies do not improve as strongly as the uncertainties decrease. This is in line with Kleinherenbrink et al. (2018), who showed that for a highly correlated sub-set of TGs, increasing absolute correlation thresholds would not significantly reduce the RMS ∆VLM . We thus strive to understand better why trend estimates not always 455 improve when selecting highly comparable (w.r.t. tide gauge) or closely located absolute SLA measurements. This question ultimately leads to the discussion of the importance of identifying the small-scale dynamical components of local sea-level variability, given that long-term absolute sea level trends are large-scale signals.

Space and time dependencies of coastal sea level trends
The results presented in Figure 5 and Table 2 denote average performances for the globally distributed TG-GNSS station 460 pairs for ALES-GESLA-ZOI and support an optimal threshold at 20%. It is however unclear, whether the described optimum threshold for this 'global' selection also reflects the best choice at every considered coastal site. Therefore, we investigate at which relative levels individual VLM SAT-TG and VLM GNSS trends estimates yield the smallest absolute deviations. Postulating that the actual VLM at the TG location is linear and perfectly detected by the GNSS station, these thresholds denote the 'local' optimal levels. 465 Figure 5b displays the distribution of local optimal thresholds for TG-GNSS stations for the ALES-GESLA-ZOI dataset.
Overall, the optimal levels X are broadly distributed from 0 to 0.975. We find highest concentrations between 0.8-0.975, which slightly exceeds the range of the global optimum. At these high thresholds, the mean distance to coast averaged over all SLA measurements in a ZOI is 33 km. Hence, a large proportion of these SLAs heavily profits from demonstrated coastal advancements of the along-track dataset, which are most pronounced in the last 20 km off the coast (Passaro et al., 2015).

470
In contrast to these examples, we find very low local optima for some stations (Figure 5b). Here, local VLM SAT-TG and GNSS trend differences do not converge to a minimum when increasing the comparability of SAT and TG observations. Accordingly, in these cases, vertical land motion estimates do not necessarily benefit from high coastal resolution of the data, because a low relative threshold is simultaneously linked to a larger-scale selection of SLAs (Figure 5d). At the lower level ranges, for instance at 0-0.2, SLAs have an average distance of 95 km to the coast. Supposing that the sources of these larger scales of 475 coherency of coastal SL trends would be known, a more advanced adaption to these additional factors would further increase accuracies of VLM estimates. An associated ideal selection of trends, based on optimal individual levels shown in Figure 5d would largely reduce to RMS ∆VLM to 0.9 mm/year.
To further shed light on the relationships between dynamical-sea-level-based SLA selection and spacial coherence of trends and uncertainties, we show trend and uncertainty maps ( Figure 6) in accordance with those in Figure 1 (displaying maps of sta-480 tistical criteria). Here, we map linear trends and uncertainties onto observed levels of comparability defined by the correlationcriterion. We thus compute these VLM SAT-TG trends over different coastal regions (see further details in the Appendix A). As a result, we observe sharp trend gradients consistent with the degree of comparability: Figure 6a for instance, shows high