Extreme sea levels may cause damage and the disruption of activities in coastal areas. Thus, predicting extreme sea levels is essential for coastal management. Statistical inference of robust return level estimates critically depends on the length and quality of the observed time series. Here, we compare two different methods for extending a very short (

Extreme sea levels (ESLs) can have disastrous consequences in coastal zones, including the flooding of vulnerable assets, loss of life, and disruption of activities (Brown et al., 2018; Vousdoukas et al., 2020; Wahl et al., 2017). Coastal floods generally result from a combination of ESLs, wind, waves, tides, and local conditions, including bathymetry and terrain features. Climate change also affects ESL events due to sea level rise and changes in storm frequency and/or intensity (Rutgersson et al., 2022). Reliable estimates of current and future ESLs are urgently needed to mitigate the impacts of disaster risks and to inform adaptation to climate change. Long time series of observed sea levels are essential for improving confidence in statistically inferred return levels (RLs) (Menéndez et al., 2010; Woodworth et al., 2011) and are therefore often regarded as a prerequisite for coastal planning. International initiatives such as the Global Sea Level Observing System (GLOSS) (Caldwell, 2012; Merrifield et al., 2012) and other works (Woodworth et al., 2010) have highlighted this necessity and called for the recovery of historical records in what is known as “data archaeology”. Nevertheless, the temporal paucity of sea level time series (Holgate et al., 2013) remains a limitation for adequately estimating RLs and ESLs in many places.

This technical note evaluates a machine learning method called random forest (RF) (Breiman, 2001) for extending the sea level time series obtained by a tide gauge of interest using a longer time series at a neighbouring tide gauge in the context of analysing sea level extremes. This is particularly relevant when the initial time series is very short, e.g. in the order of

Our study area lies within the Kattegat Basin, located on the western coast of Sweden, around the city of Halmstad (Fig. 1). Here, according to the Swedish Meteorological and Hydrological Institute (SMHI), the highest recorded Swedish sea level of 235 cm was observed in November 2015. This event was mainly due to local conditions leading to a sea level increase of 50–100 cm in comparison with neighbouring stations, such as Viken; however, a seiche effect could also have added around 25 cm to the total sea level (Johansson, 2018). In this area, tides vary with an amplitude of around 20 cm during spring tides (Svansson, 1975), and current ESLs are mainly due to storm surge effects. However, other factors could also play a role, such as the preconditioning of the Baltic Sea (Andrée et al., 2023). Hieronymus and Kalen (2020) showed that the Swedish western coast is expected to be one of Sweden's most exposed areas due to rising sea levels.

Different methods have been proposed to extend sea level records. For example, Bernier et al. (2007) used short observation time series associated with a 40-year hindcast surge model. Reconstructions by Cid et al. (2018) were based on tide gauge data and atmospheric conditions. Hieronymus et al. (2019) showed good performance of neural networks with respect to predicting sea levels at tide gauges located along the Swedish coast based on different atmospheric variables and tide gauge records. Granata and Di Nunno (2021) found similar results when forecasting tides in the Venice region using different machine learning methods, including RF, regression tree, and multilayer perceptron. Recently, Bellinghausen et al. (2023) demonstrated the utility of using an RF classifier to satisfactorily predict the occurrence of ESLs at a few stations around the Baltic Sea within 3 d based on surface wind and pressure fields, precipitation, and the pre-filling state of the Baltic Sea.

In the following, we systematically evaluate the performance of RF as a means of extending a very short time series of only 10 years and of reconstructing past sea level variations based on a longer time series from a neighbouring station. This approach is compared to the linear regression approach. Both methods have previously been found to reduce biases efficiently and are relatively computationally inexpensive with low complexity when applied to a small number of input variables, as is the case in this study. To evaluate the sensitivity of the reconstructed sea level with respect to the geographic distance from neighbouring stations, we apply the method to data from different stations. Finally, we consider the method's potential and limitations with respect to the reconstruction of sea level extremes when the time series of interest is very short and inherently provides a poor sampling of even moderately extreme events.

Map of northern Europe indicating the study area and the tide gauge stations (red dots).

The datasets used in the analysis are hourly sea level observations from different stations available from SMHI (SMHI, 2021) and the Danish Meteorological Institute (DMI). Three stations are located on the western coast of Sweden – Ringhals (station no. 2105, “RINGHALS”), Halmstad (station no. 35115, “HALMSTAD SJÖV”), and Viken (station no. 2228, “VIKEN”) – and one station is located on the east coast of Denmark – Hornbæk (Hansen, 2007) (Fig. 1). The distance between Hornbæk and Viken is around 9 km, the distance between Hornbæk and Ringhals is around 130 km, and the distance between Viken and Ringhals is around 127 km (Table 1). The geographical location of the stations is important, as it can change how the water level behaves; for example, stations may be constricted in a channel, as is the case for Viken and Hornbæk. Here, ESLs are defined as the total highest measured sea level including tides and storm surges; this choice is motivated by the low tidal range in the area (Svansson, 1975).

Each hourly time series is first linearly detrended and transformed into a time series of daily maxima from which the annual maximum is determined for each year in the series. When determining the annual maxima, we enforce a minimum temporal separation of 2 d to ensure the independence of events at each station. The datasets are of varying lengths (Fig. 2), ranging from 12 years (Halmstad station) to 129 years (Hornbæk station). Long-term linear trends (i.e. sea level rise) were estimated over the whole time series for all stations and found to be 0.33 cm (Ringhals), 0.35 cm (Hornbæk), 1.47 cm (Viken), and 5.51 cm (Halmstad) per decade.
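The preprocessing chain described above (linear detrending, daily maxima, annual maxima with a 2 d separation) could be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the detrending keeps the series mean, and the separation rule is implemented as one possible reading (each annual maximum must lie at least 2 d after the one selected for the previous year).

```python
from collections import defaultdict
from datetime import datetime, timedelta


def linear_detrend(times, values):
    """Subtract the least-squares linear trend while keeping the series mean."""
    x = [(t - times[0]).total_seconds() / 86400.0 for t in times]  # days
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, values))
    slope = sxy / sxx
    return [yi - slope * (xi - mean_x) for xi, yi in zip(x, values)]


def daily_maxima(times, values):
    """Map each calendar date to the highest hourly value of that day."""
    out = {}
    for t, v in zip(times, values):
        d = t.date()
        if d not in out or v > out[d]:
            out[d] = v
    return out


def annual_maxima(daily, min_sep_days=2):
    """One maximum per year, at least `min_sep_days` after the previous pick.

    Assumption: this is one reading of the 2 d separation rule, not
    necessarily the exact rule applied in the study.
    """
    by_year = defaultdict(list)
    for d, v in daily.items():
        by_year[d.year].append((v, d))
    result, previous = {}, None
    for year in sorted(by_year):
        for v, d in sorted(by_year[year], reverse=True):  # highest value first
            if previous is None or (d - previous).days >= min_sep_days:
                result[year] = (d, v)
                previous = d
                break
    return result
```

With this rule, a storm that peaks on 31 December and is still elevated on 1 January contributes only one annual maximum; the next-highest independent event is selected for the following year.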

After being detrended, the Hornbæk sea level varied from

Sea level time series from the four tide gauge stations showing daily maximum values (blue) and yearly maximum values (red circles).

The proposed approach for temporally extending short observed sea level time series at the station of interest (

For both approaches, we only use the sea level of the daily maxima at station

Based on each

All coefficient values from the linear fits are positive and fairly close to 1 (0.765–1.12), meaning that a low sea level at one station corresponds to a low sea level at another station; a similar effect is also found for high and intermediate sea levels. Therefore, the sea level at one station varies at a rather similar rate to that at the other station, as a coefficient value of 1 would mean that the sea level measured at one station increases or decreases at the same rate as at the other station. The x_Hornbæk–y_Halmstad set presents a coefficient that is closer to 1, highlighting a strong correlation between those two stations. Only the x_Ringhals–y_Halmstad and x_Viken–y_Halmstad station pairs present a coefficient higher than 1. This suggests that the sea level at Halmstad varies at a higher pace than at the two predictor stations.
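As a minimal sketch of how such a coefficient is obtained and interpreted, the ordinary least-squares fit between overlapping daily maxima at a predictor station x and a target station y can be written as below. The function names and all numbers are made up for illustration; only the slope value of 1.12 is borrowed from the upper end of the range quoted above.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_linear(model, x):
    """Apply a fitted (slope, intercept) pair to new predictor values."""
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


# Hypothetical overlapping daily maxima (cm) at predictor and target stations.
x_overlap = [-20.0, -5.0, 0.0, 12.0, 35.0, 60.0]
y_overlap = [1.12 * xi + 2.0 for xi in x_overlap]

model = fit_linear(x_overlap, y_overlap)

# Reconstruct the target station for dates where only the predictor exists.
y_reconstructed = predict_linear(model, [80.0, -30.0])
```

A slope above 1, as in this toy case, means that a given change at the predictor station corresponds to a larger change at the target station. Unlike a tree-based model, the fitted line also extrapolates beyond the range of values seen in the overlap period.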

A probabilistic RF model is trained using the sea level at one station as the predictor (

To evaluate the proposed methodology, different analyses with different combinations of stations are used to test the spatial and temporal sensitivity (Table 1). Six analyses using different combinations of station data obtained at Hornbæk, Viken, and Ringhals are carried out using the recent 10 full years (2010–2020) as the common set-up period for model training and validation (see Sect. 2.2.2). Six additional analyses are carried out to predict Viken sea levels from Hornbæk data using the two previous time periods (2000–2010 and 1990–2000) as well as using a 20-year set-up period (1990–2010 and 2000–2020) and a 30-year set-up period (1990–2020) for training and validation to evaluate the temporal sensitivity. All 36 possible combinations have then been analysed to better estimate the spatial and temporal sensitivity. Finally, we compare the reconstructed sea levels at Halmstad using the station data from Hornbæk, Viken, and Ringhals, respectively (see Sect. 3.2), for the period from 2010 to 2020. In the latter case, we also estimate RLs based on the reconstructed time series and compare them to previous results reported for Halmstad.
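One reading that reproduces the count of 36 combinations is that every ordered predictor–target pair among the three long stations is combined with every set-up period listed above. The short enumeration below is a sketch under that assumption; the variable names are hypothetical.

```python
from itertools import permutations, product

stations = ["Hornbæk", "Viken", "Ringhals"]

# Set-up periods named in the text: three 10-year, two 20-year, one 30-year.
setup_periods = [
    (2010, 2020), (2000, 2010), (1990, 2000),
    (1990, 2010), (2000, 2020),
    (1990, 2020),
]

# Ordered pairs, because predicting A from B differs from predicting B from A.
pairs = list(permutations(stations, 2))

analyses = [
    {"predictor": predictor, "target": target, "setup": period}
    for (predictor, target), period in product(pairs, setup_periods)
]
```

Under this assumption, three stations give six ordered pairs, and six pairs times six set-up periods give 36 analyses.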

To assess the performance of each model, different goodness-of-fit (GOF) metrics are chosen: the root-mean-square error (RMSE) and the Pearson correlation coefficient (

To evaluate the model's performance towards the extremes, annual maxima and values above the 95th, 97th, and 99th percentiles from observations are compared with the corresponding predicted values.
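The metrics named above, and their restriction to values above a percentile of the observations, could be computed along the following lines. This is a sketch, not the code used in the study: the helper names are hypothetical, bias is taken as the mean of prediction minus observation, and the percentile uses linear interpolation between order statistics.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def bias(obs, pred):
    """Mean error (prediction minus observation)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_o == 0 or var_p == 0:
        return float("nan")
    return cov / math.sqrt(var_o * var_p)


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation (an assumption)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def metrics_above(obs, pred, q):
    """Metrics restricted to days on which the observation exceeds its q-th percentile."""
    threshold = percentile(obs, q)
    selected = [(o, p) for o, p in zip(obs, pred) if o > threshold]
    o_sel = [o for o, _ in selected]
    p_sel = [p for _, p in selected]
    return {"n": len(selected), "rmse": rmse(o_sel, p_sel), "bias": bias(o_sel, p_sel)}
```

Selecting on the observed values (rather than the predicted ones) keeps the evaluated set of extreme days identical for every model being compared.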

Experimental set-up and summary of analyses. The case study of Halmstad city is highlighted in italic.

The RF method estimates the standard deviation associated with the predicted sea level daily maximum at each time point. We refer to the methodology introduced below as the “RF method with random sampling”. Based on the RF mean predictions and standard deviations of the daily maxima, we select the annual maxima from the reconstructed time series together with their associated standard deviations. We assume that a Gaussian distribution describes the probability for each predicted annual maximum and draw random samples of the annual maxima from these distributions. For each draw, RLs are calculated using a generalized extreme value (GEV) distribution fitted to the annual maxima (Coles, 2001). This yields an ensemble of randomly drawn RL curves, from which the 95th percentile ensemble spread is calculated.
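The sampling procedure can be sketched as below, taking the annual-maximum means and standard deviations as given inputs. Two assumptions should be noted: the GEV is fitted here with L-moments (Hosking's approximation) rather than the likelihood-based fitting described by Coles (2001), simply to keep the sketch dependency-free, and the ensemble spread is summarized as the central 95 % range (2.5th to 97.5th percentiles). All function names are hypothetical.

```python
import math
import random


def fit_gev_lmoments(sample):
    """L-moment GEV fit; returns (loc, scale, k), where k = -shape in Coles's sign convention."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i / (n - 1) * x[i] for i in range(n)) / n
    b2 = sum(i * (i - 1) / ((n - 1) * (n - 2)) * x[i] for i in range(n)) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    c = 2.0 / (3.0 + l3 / l2) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:  # Gumbel limit
        scale = l2 / math.log(2)
        return l1 - 0.5772156649 * scale, scale, 0.0
    g = math.gamma(1.0 + k)
    scale = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    loc = l1 - scale * (1.0 - g) / k
    return loc, scale, k


def return_level(loc, scale, k, period):
    """Level exceeded on average once every `period` years (period > 1)."""
    y = -math.log(1.0 - 1.0 / period)
    if k == 0.0:
        return loc - scale * math.log(y)
    return loc + scale / k * (1.0 - y ** k)


def percentile(sorted_values, q):
    """q-th percentile (0-100) of an already sorted list, linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def rl_ensemble(max_mean, max_std, periods, n_draws=1000, seed=0):
    """Draw Gaussian annual maxima, fit a GEV per draw, summarize the RL curves."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_draws):
        draw = [rng.gauss(m, s) for m, s in zip(max_mean, max_std)]
        params = fit_gev_lmoments(draw)
        curves.append([return_level(*params, p) for p in periods])
    summary = {}
    for j, p in enumerate(periods):
        column = sorted(curve[j] for curve in curves)
        # (lower bound, median, upper bound) of the central 95 % range
        summary[p] = (percentile(column, 2.5), percentile(column, 50), percentile(column, 97.5))
    return summary
```

Because every ensemble member is refitted, the resulting spread reflects the uncertainty of the reconstructed annual maxima; it does not by itself include the sampling uncertainty of the GEV fit.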

We denote

We can then extract the time series of annual maxima from the mean predictions and its associated standard deviation (which we denote as

This method is further compared with the commonly used GEV approach applied directly to the

To validate the models, GOF metrics are calculated (and partly presented in Table 2). For the time series of daily maxima, roughly similar statistics are found for all datasets, irrespective of whether the RF or LR is used. In general, we find slightly (but not significantly) better

Scatter plot between observations and the LR

RMSE and bias values between different datasets evaluated in the validation period. Noticeable improvements (

Table 2 analyses the sensitivity to the distance between the two tide gauges for the 2010–2020 period. When the distance between two stations grows, the accuracy of both models seems to decrease, especially towards the extremes. For example, when looking at daily maxima values as well as the extreme set values, RMSE values of around 8–20 cm and

The highest sea level recorded in Sweden occurred in Halmstad, indicating that Halmstad is highly susceptible to ESLs. However, the length of the local sea level time series is very short. Therefore, three stations – Hornbæk, Viken, and Ringhals – are used to reconstruct the Halmstad sea level time series (Table 2). As shown above, using an RF or LR method, we can, in principle and with reasonable confidence, reconstruct Halmstad sea levels back to 1891 for the period before observations became available in 2009 using Hornbæk station as a predictor, as it has the longest observed time series.

RLs from each reconstructed time series to predict the sea level for Halmstad based on RF model mean outputs

Because of the short length of the Halmstad time series, the training period is almost identical to the full time series; in practice, this makes it difficult to assess the model behaviour on extremes. Therefore, we used different 2-year testing and 8-year training periods to analyse how the model behaves for Halmstad station (the set-up period was from 2010 to 2020 with different testing periods: 2010–2012, 2011–2013, 2012–2014, ..., 2017–2019, and 2018–2020); this was also done to predict the Halmstad sea level from Hornbæk, Ringhals, and Viken separately. Overall, the differences between the testing periods are rather small, with RMSE values ranging from 1.5 to 4.1 cm,

In previous studies, Halmstad's RLs have been calculated for current and future climate scenarios based on reconstructed sea levels from local wind speed observations of the Nidingen offshore station and Viken tide gauge data (Andersson, 2021). For Halmstad, RLs based on extended time series using the three neighbouring stations permit a reduction of the 95th percentile confidence interval (CI) compared with observations. Here, the full-period length of Halmstad's observed values (station

Halmstad's RLs from reconstructed time series using the outputs from the RF method and the RF method with random sampling applied to the Hornbæk (italic) station compared with the assessment by Andersson (2021; bold italic).

We also apply RF-based random sampling to predict RLs probabilistically, as described in Sect. 2.2.4 (Fig. 4b), using Hornbæk, Viken, and Ringhals as predictors (which results in extended time series of around 120, 35, and 45 years, respectively). As would be expected due to the long time series, estimates based on Hornbæk data deliver the best performance and yield what seems like a reasonable 95th percentile ensemble spread (Fig. 4). The inferred RLs are slightly higher than the RLs derived directly from observations, which are associated with a very large 95th percentile CI due to the short length of the time series. The predictions using Viken data present the lowest RLs, with a 95th percentile ensemble spread (upper values) almost corresponding to the median RLs from observations, and thus probably underestimate the extremes. On the other hand, predictions from Ringhals result in the highest RLs; however, like Viken, these values are also associated with a rather large ensemble spread. Because of the lengths of the respective time series, there is low confidence in return periods of rare occurrences such as a 200-year event (although this is a little less pronounced for Hornbæk-based predictions). This challenge with respect to rare occurrences is evident when looking at the 95th percentile CIs for each RL curve resulting from the RF method with random sampling. For Halmstad, RLs based on inputs from the Hornbæk station following the RF method with random sampling are close to those reported by Andersson (2021), highlighting the importance of considering the full uncertainty range when predicting high sea levels from a small sample of such events (Table 3).

It is evident that our statistical reconstructions are limited by several factors, in particular local ocean dynamics and the length of the time series used. Both are especially important for extreme analysis. We implicitly assume that a time window of only 10 years is sufficient to describe the relationship between two stations under normal ocean conditions. While this study seems to support this hypothesis, it is by no means assured that this will be the case for any two neighbouring stations, especially when the relationship is found to be highly non-linear. For non-normal situations like ESLs, it is evident that our set-up period is, in principle, much too short to learn the (inherently non-linear) dynamics related to rare sea level extremes and that our modelling essentially yields an extrapolation of the normal ocean dynamics relating two sites, which may introduce significant biases in the subsequent RL estimates. This limitation is general for most, if not all, types of extensions of observed time series using neighbouring data. Even so, it seems reasonable to assume that non-linear and non-parametric methods like RF outperform other methods in terms of capturing extreme trends within a very short time window.

As indicated earlier, RF is limited in range by the input values. Hence, in principle, this method is not suitable for extrapolating to higher values than what is seen in the training period, as highlighted when predicting the Hornbæk sea levels from the Viken tide gauge based on the 1990–2000 and 2000–2010 set-up periods. This limitation is a known issue when applying RF-based prediction models (Tyralis et al., 2019; Hengl et al., 2018); it can be mitigated to some extent by extending the time series used for model training as new data become available. In this study, we did not find out-of-sample issues to have a strong influence, as the RF model reproduced extremes rather well. Adding additional sources, e.g. observed wind information (Johansson, 2018) or reanalysis (Hieronymus et al., 2019), may also improve predictions. However, these approaches were outside the scope of this technical note, which focused on exploring the limitations and advantages of only using neighbouring observations of sea level. If more complex methods can achieve additional accuracy, this is of course of great value, but the added complexity may also obscure the interpretation. In preliminary tests, adding reanalysis and hindcast data did not appear to improve the results enough to warrant the decreased interpretability, but this is certainly a promising research area.
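The range limitation discussed above can be illustrated with a toy ensemble of bagged one-dimensional regression trees, which is what a random forest effectively reduces to when there is a single predictor. This is a deliberately simplified sketch with hypothetical function names and synthetic data, not the implementation used in the study; the spread across trees stands in for the standard deviation of the prediction, which is an assumption about how such an estimate may be obtained.

```python
import random
import statistics


def fit_tree(points, min_leaf=5):
    """Fit a 1-D regression tree to (x, y) pairs sorted by x.

    Returns a float (leaf mean) or a tuple (threshold, left, right).
    """
    n = len(points)
    ys = [y for _, y in points]
    total = sum(ys)
    if n < 2 * min_leaf:
        return total / n
    best_score, best_i, left_sum = None, None, 0.0
    for i in range(1, n):
        left_sum += ys[i - 1]
        if i < min_leaf or n - i < min_leaf or points[i - 1][0] == points[i][0]:
            continue
        # Minimizing squared error is equivalent to maximizing this score.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if best_score is None or score > best_score:
            best_score, best_i = score, i
    if best_i is None:
        return total / n
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2.0
    return (threshold,
            fit_tree(points[:best_i], min_leaf),
            fit_tree(points[best_i:], min_leaf))


def predict_tree(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagging: each tree is fitted to a bootstrap resample of the data."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    return [fit_tree(sorted(rng.choice(data) for _ in data)) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Mean and spread (standard deviation) of the per-tree predictions."""
    preds = [predict_tree(tree, x) for tree in forest]
    return statistics.fmean(preds), statistics.pstdev(preds)


# Synthetic station pair: the target responds linearly to the predictor plus noise.
rng = random.Random(0)
x_train = [rng.uniform(0.0, 100.0) for _ in range(300)]
y_train = [1.1 * x + rng.gauss(0.0, 3.0) for x in x_train]

forest = fit_forest(x_train, y_train)
inside_mean, inside_std = predict_forest(forest, 50.0)     # within training range
outside_mean, outside_std = predict_forest(forest, 200.0)  # far above training range
```

Every leaf value is an average of training targets, so the prediction for a predictor value of 200 cannot exceed the largest target seen during training, whereas the underlying linear relation would give a much higher value.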

Finally, this study focused on a limited area of the Swedish western coast. The methodology is generally applicable, but it is contingent on local conditions; hence, further research is needed to investigate if similar performance can be found when applying the proposed method to other areas with different ocean dynamics.

This study demonstrates that a sea level time series of daily maxima can be reconstructed relatively successfully from a neighbouring station employing the LR or RF approaches, even when using very short overlapping intervals (10 years). As expected due to the short length of the overlap, ESLs are somewhat underestimated. The RF model is better able to capture the inherent non-linearities and, hence, proves to be more accurate under those conditions. The corresponding absolute bias values are generally lower than those found from the LR. The best reconstructions are generally achieved for stations spatially closer to each other, although this can be partially offset using the RF, which is found to yield better results than the LR for stations located further away from each other. For Halmstad, we also tested a method that we call the RF method with random sampling. When applied to reconstructed time series from a 10-year dataset, the method confirmed the results from a previous more physically based study, reproducing RLs with a reasonable uncertainty range given by the 95th percentile ensemble spread.

The method is easily applicable to other sites and can also be applied across regions as long as two neighbouring stations' sea level time series are available. Overall, using the RF method with random sampling to represent the uncertainty in extremes could be an advantage compared with many single-output machine learning predictions.

The code used to generate the figures and tables can be acquired by contacting the first author (kevin.dubois@geo.uu.se).

The Hornbæk data used are available upon request from the Danish Meteorological Institute. The data from the different Swedish stations are available from

KD developed the code and conducted the analysis. KD prepared the manuscript with contributions from all co-authors.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

The work forms part of the “Extreme events in the coastal zone – a multidisciplinary approach for better preparedness” project.

This research was funded by the Swedish Research Council FORMAS (grant no. 2018-01784) and the Centre of Natural Hazards and Disaster Science (CNDS). Anna Rutgersson and Erik Nilsson were also partially funded by the Research Council of Norway “Machine Ocean” project (project no. 303411).

This paper was edited by Anne Marie Treguier and reviewed by two anonymous referees.