Variability of distributions of wave set-up heights along a shoreline with complicated geometry

The phenomenon of wave set-up may substantially contribute to the formation of devastating coastal flooding in certain coastal areas. We study the appearance and properties of empirical probability density distributions of the occurrence of different set-up heights on an approximately 80 km long section of coastline near Tallinn in the Gulf of Finland, eastern Baltic Sea. The study area is often attacked by high waves propagating from various directions, and the typical approach angle of high waves varies considerably along the shore. The distributions in question are approximated by an exponential distribution with a quadratic polynomial as the exponent. Even though different segments of the study area have substantially different wave regimes, the leading term of this polynomial is usually small (between −0.005 and 0.005) and varies insignificantly along the study area. Consequently, the distribution of wave set-up heights substantially deviates from a Rayleigh or Weibull distribution (that usually reflect the distribution of different wave heights). In about three-quarters of the occasions, it is fairly well approximated by a standard exponential distribution. In about 25 % of the coastal segments, it qualitatively matches a Wald (inverse Gaussian) distribution. The Kolmogorov–Smirnov test (D value) indicates that the inverse Gaussian distribution systematically better matches the empirical probability distributions of set-up heights than the Weibull, exponential, or Gaussian distributions.

Abstract. The phenomenon of wave set-up may substantially contribute to the formation of devastating coastal flooding in certain coastal areas. We study the appearance and properties of empirical probability density distributions of the occurrence of different set-up heights on an approximately 80 km long section of coastline near Tallinn in the Gulf of Finland, eastern Baltic Sea. The study area is often attacked by high waves propagating from various directions, and the typical approach angle of high waves varies considerably along the shore. The distributions in question are approximated by an exponential distribution with a quadratic polynomial as the exponent. Even though different segments of the study area have substantially different wave regimes, the leading term of this polynomial is usually small (between −0.005 and 0.005) and varies insignificantly along the study area. Consequently, the distribution of wave set-up heights substantially deviates from a Rayleigh or Weibull distribution (that usually reflect the distribution of different wave heights). In about three-quarters of the occasions, it is fairly well approximated by a standard exponential distribution. In about 25 % of the coastal segments, it qualitatively matches a Wald (inverse Gaussian) distribution. The Kolmogorov-Smirnov test (D value) indicates that the inverse Gaussian distribution systematically better matches the empirical probability distributions of set-up heights than the Weibull, exponential, or Gaussian distributions.

Introduction
Global sea level rise in most existing projections of climate change (Cazenave et al., 2014) is often associated with major consequences for the coastal zone . The resulting economic damages to low-lying coastal areas (Darwin and Tol, 2001) may lead to a significant loss of worldwide wellbeing by the end of this century (Pycroft et al., 2016). Global sea level rise, however, contributes only a small fraction to the most devastating coastal flooding. These events, in addition to being economically extremely damaging (Meyer et al., 2013), may also lead to massive loss of life and the destruction of entire coastal communities (Dube et al., 2009).
Devastating flooding is usually caused by the interplay of several drivers, with fundamentally different predictability, and physical, dynamic, and statistical properties. A reasonable forecast of the joint impact of tides, low atmospheric pressure (inverted barometric effect), wind-driven surge, and wave-induced effects requires a cluster of dedicated atmospheric, ocean circulation, and wave models. The resulting high water levels may be additionally amplified by specific mechanisms and events such as tide-surge interactions (Batstone et al., 2013;Olbert et al., 2013), meteorologically driven long waves (Pattiaratchi and Wijeratne, 2014;Pellikka et al., 2014;Vilibic et al., 2014), or seiches (Vilibic, 2006;Kulikov and Medvedev, 2013). In addition, wavedriven effects at the waterline such as wave set-up and run-up (Stockdon et al., 2006) may greatly contribute to the damaging potential of extreme water levels. These phenomena are driven by the momentum carried by waves and have different timescales and appearance. When a wave crest reaches the shore, the resulting short-term inland movement of the water, with a timescale comparable with the wave period, is termed run-up (see Didenkulova, 2009, for an overview and references). In contrast, wave set-up is the increase in the mean water level due to the release of the momentum of breaking waves (Dean and Bender, 2006).
Along with contemporary numerical simulations and a direct search for worst-case scenarios (e.g. Averkiev and Klevanny, 2010), the use of the probabilistic approach is another classic way to quantify the properties of extreme water levels and related risks. The relevant literature contains substantial work on statistical parameters of water level variations (e.g. Serafin and Ruggiero, 2014;Fawcett and Walshaw, 2016), extreme water levels, and their return periods (e.g. Purvis et al., 2008;Haigh et al., 2010;Arns et al., 2013). Similar probabilistic analyses have been extensively applied to average and extreme wave properties (e.g. Orimolade et al., 2016;Rueda et al., 2016a), wave-driven effects at the waterline (Holland and Holman, 1993;Stockdon et al., 2006), and the properties of meteotsunamis (Geist et al., 2014, Bechle et al., 2015. On most occasions, severe coastal flooding occurs under the joint impact of several drivers. This feature generates the necessity to consider multivariate distributions of their properties. Most often, the simultaneous occurrence of storm surges and large waves is addressed (e.g. Hawkes et al., 2002;Wadey et al., 2015;Rueda et al., 2016b). A few studies also include an analysis of joint distributions of significant wave heights, periods, and directions (Masina et al., 2015).
Typical probability distributions of different constituents of extreme water levels may be fundamentally different. The distribution of observed and numerically simulated water levels is usually close to a Gaussian one (Bortot et al., 2000;Johansson et al., 2001;Mel and Lionello, 2014;Soomere et al., 2015). The total water level in semi-sheltered seas with extensive subtidal or weekly-scale variability may contain two components. In the Baltic Sea, one of these (that reflects the water volume of the entire sea) has a classic quasi-Gaussian distribution, whereas the other component (that reflects the local storm surge) has an exponential distribution and apparently mirrors a Poisson process  similar to the non-tidal residual in the North Sea (Schmitt et al., 2018). The probabilities of occurrence of different single wave heights are at best approximated either by a Rayleigh (Longuet-Higgins, 1952), Weibull (Forristall, 1978 or Tayfun distribution (Socquet-Juglard et al., 2005). The probability distribution of run-up heights usually follows the relevant distribution for incident wave heights (Denissenko et al., 2011) but can be approximated by a Rayleigh distribution even if the approaching wave field does not represent a Gaussian process (Denissenko et al., 2013). The empirical probabilities of average or significant wave heights in various offshore conditions usually resemble either a Rayleigh or a Weibull distribution (Muraleedharan et al., 2007;Feng et al., 2014), while Pareto-type distributions are more suitable for the analysis of meteotsunami heights (Bechle et al., 2015).
In this paper we focus on the appearance and properties of empirical distributions of wave-driven local water level set-up. This process, called set-up in the following, may often provide as much as one-third of the total water level rise during a storm (Dean and Bender, 2006) and significantly contribute to extreme sea level events (Hoeke et al., 2013;Melet et al., 2016Melet et al., , 2018. This phenomenon contributes to the overall level of danger in the coastal zone. For example, the baseline level of wave run-up (Leijala et al., 2018) includes the local elevation of water level owing to set-up. The physics of set-up has been known for half a century (Longuet-Higgins and Stewart, 1964). Adequate parameterisations of this phenomenon have been introduced more than a decade ago (Stockdon et al., 2006). Numerous field measurements have been performed to establish its properties (see Dean and Walton, 2010, and references therein) and many models take it into account to a certain extent (SWAN, 2007;Roland et al., 2009;Alari and Kõuts, 2012;Moghimi et al., 2013).
The contribution from wave set-up still provides one of the largest challenges in the modelling of storm surges and flooding (Dukhovskoy and Morey, 2011;Melet et al., 2018). This feature reflects the intrinsically complicated nature of its formation. First of all, the set-up height strongly depends on the approach angle of waves at the breaker line. This angle is well defined only if the coastline is almost straight, the nearshore is mostly homogeneous in the alongshore direction, and the wave field is close to monochromatic (Larson et al., 2010;Viška and Soomere, 2013;Lopez-Ruiz et al., 2014. Generally, this angle is a complicated function of shoreline geometry, nearshore bathymetry, wave properties, and instantaneous water level. Even if the basic properties of wave fields (usually given in terms of significant wave height, mean or peak period, and mean propagation direction) are perfectly forecast or hindcast at a nearshore location, the evaluation of the further propagation of waves is a major challenge, because for example, refraction properties and the location of the breaking line change with the local water level.
Several studies have focused on the maxima of set-up heights over certain coastal areas O'Grady et al., 2015) or the maximum contribution from setup to the local water level extremes (Pindsoo and Soomere, 2015). The problem of evaluation of maximum set-up heights has a relatively simple solution on comparatively straight open-ocean coasts. The nearshore of such coasts is usually fairly homogeneous in the alongshore direction, and the highest waves tend to approach the shore at relatively small angles. These features make it possible to use simplified schemes for the evaluation of the joint impact of refraction and shoaling in the nearshore (e.g. Larson et al., 2010). On many occasions it is acceptable to assume that waves prop-agate directly onshore (O'Grady et al., 2015) or to reduce the problem to an evaluation of the properties of the highest waves that approach the shore from a relatively narrow range of directions . In areas with complicated geometry, and especially in coastal segments where high waves may often approach at large angles, it is necessary to take into account full refraction and shoaling in the nearshore (Viška and Soomere, 2013;Pindsoo and Soomere, 2015).
The formation of high set-up depends on many details of the storms and the impacted nearshore. It does not necessarily exhibit its maximum level in the coastal sections that are affected by the highest waves. The maximum storm surge and maximum set-up usually do not occur simultaneously (Pindsoo and Soomere, 2015). On the contrary, in coastal areas with complicated geometry each short segment may have its own "perfect storm" that creates the highest sum of storm surge and set-up . These observations call for further analysis of the properties of the set-up phenomenon.
As described above, research into the statistical properties of the main drivers of high local water levels and the reach of swash generated by large waves that attack the shore has revealed that the relevant distributions of the magnitude of these drivers are very different. Knowledge of the shape and parameters of such distributions is often crucial in various forecasts and management decisions.
In this paper we address the basic features of empirical distributions of set-up heights along an approximately 80 km long coastal section in the vicinity of Tallinn Bay in the Gulf of Finland, Baltic Sea. The shoreline of the study area has a complicated geometry and contains segments with greatly different orientations. The goal is to identify the typical shapes of the distributions of the probability of occurrence of simulated set-up heights and to analyse the alongshore variability of these distributions.
The paper is organised as follows. Section 2 introduces the method of evaluation of the maximum set-up height for obliquely approaching waves, the method for the reconstruction of long-term wave climate, the forcing data, and the procedure of evaluation of properties of breaking waves based on the output of the wave model. Section 3 presents an insight into the appearance of the empirical distribution of wave set-up heights and its variations along the coast. The Kolmogorov-Smirnov test is applied to estimate the goodness of match of the empirical distribution of set-up heights with common theoretical distributions. Several implications of the results are discussed in Sect. 4. The classic concept of wave set-up (Longuet-Higgins and Stewart, 1962) relates the local increase in the water level with the release of the onshore component of radiation stress in the process of wave breaking. In ideal conditions, the maximum set-up height η max (with respect to the still-water level) created by monochromatic waves with a constant height propagating directly onshore along a planar impermeable beach is (McDougal and Hudspeth, 1983;Hsu et al., 2006) where γ b = H b /d b is the breaking index that is assumed to be constant all over the surf zone, d b is the still-water depth at the breaker line, and H b is the wave height at the breaker line ( Fig. 1). Equation (1) is used in many engineering applications (Dean and Dalrymple, 1991) and studies into the properties of the set-up of waves that approach the shore at a relatively small angle (see Soomere et al., 2013, and references therein). If waves approach under a non-negligible angle θ with respect to the shore normal, the situation is much more complicated. Shi and Kirby (2008) argue that the water level setdown at the breaker line is invariant with respect to the approach angle of waves. The average deviation η of the sea surface from the still-water level within the surf zone of an impermeable planar beach is Shi and Kirby, 2008; the power of γ b in the first term on the right-hand side of their expression being corrected) The last term on the right-hand side of Eq.
(2) represents the water level set-down η b at the breaker line, and θ b is the wave approach direction at breaking. Here d = d (x) represents the water depth measured from the still-water level at a particular distance x from the shoreline, and η is a function of x. The maximum wave set-up η max occurs somewhere inland where the maximum "depth" d max is negative, and the thickness of the water sheet d * = d max + η max = 0 and thus η max = −d max . For this location, Eq. (2) reduces to For shore-normal waves, θ b = 0 and Eq. (3) reduces to a linear equation: In this case the maximum set-up height η max is defined by Eq.
(3) is a quadratic equation with respect to q = d max /d b : This equation can be rewritten as Equation (6) has two negative solutions for physically reasonable values of γ b . The physically relevant solution to Eq. (6) must be bounded and should be almost equal to q ≈ −5γ 2 b /16 for very small approach angles (θ b ≈ 0). Therefore, the expression provides the desired solution. Equation (7) deviates from Expression (30) of Hsu et al. (2006) for reasons discussed by Shi and Kirby (2008). The maximum set-up height for obliquely approaching waves is thus This quantity is simply called set-up height below.

Wave time series in the nearshore of the study area
The study area, an approximately 80 km long coastal segment of Tallinn Bay and Muuga Bay (Fig. 2), is an example of a wave-dominated micro-tidal region. The shoreline is locally almost straight for scales up to 1-2 km. Several relatively straight sections along the Suurupi Peninsula (grid points 1-10 in Fig. 2) and the area of Saviranna (grid points 137-143 in Fig. 2) are open to the north. However, at larger scales (from a few kilometres), the coast contains large peninsulas and bays deeply cut into the mainland. The shores of these landforms are open to different directions and have greatly different wave regimes (Soomere, 2005). As the formation of set-up crucially depends on the wave height and approach direction, this type of coastal landscape makes it possible to analyse the wave set-up distribution for coastal sections with radically different wave climates . The fetch length in the Gulf of Finland is > 200 km for westerly and easterly winds but < 100 km for all other wind directions. The highest significant wave height (5.2 m) in the Gulf of Finland has been recorded at a location just a few tens of kilometres to the north of the study area (Tuomi et al., 2011). The strong winds in this region blow predominantly from the south-west and north-north-west. Easterly storms are less frequent but may generate waves as high as those generated by westerly storms (Soomere et al., 2008). Strong storms with winds from the north-north-west may generate significant wave heights > 4 m in the interior of Tallinn Bay (Soomere, 2005). The varying mutual orientation of high winds, propagation direction of waves, and single shoreline segments makes it possible to identify potential alongshore variations in the distributions of set-up heights.
We employ time series of wave properties (significant wave height, peak period, and propagation direction with a temporal resolution of 3 h) reconstructed using the wave model WAM cycle 4 and one-point high-quality wind information from the vicinity of the study area (a caisson lighthouse at Kalbådagrund, Fig. 2) for 1981-2016 in ice-free conditions. The wave model is implemented in a triple-nested version with the resolution of the innermost grid being about 470 m (Soomere, 2005). The wave height is estimated with an accuracy of 1 cm, wave direction is represented using 24 directions, and wave period is estimated using at least 25 frequencies from 0.042 Hz with an increment of 1.1. The details of model implementation and use of wind information are provided in Appendix A. The shoreline of the study area is divided into 174 coastal segments with lengths of about 500 m (Fig. 2). Each segment corresponds to a nearshore wave model grid cell. Ignoring the presence of sea ice may lead to an overestimation of the overall wave energy in the region.
To roughly estimate the adequacy of reconstructed time series of wave properties, the outcome of the wave model is compared with the results of wave measurements made by The data come from a pressure sensor and do not contain information about wave direction. The accuracy of estimates of significant wave height is ±0.1 m. The description of the location and parameters of the sensor are presented at http://efficiensea.org/files/mainoutputs/wp4/ efficiensea_wp4_27.pdf (last access: 31 August 2020). The data set is available at https://www.emodnet-physics.eu/ map/platinfo/piroosplot.aspx?platformid=8974 (last access: 31 August 2020).
As the wave measurements at Tallinnamadal are available starting from 2012, the comparison is performed for the time interval of 2012 to August 2016. The measured wave properties are compared with the modelled properties at the closest grid point of the sea area represented at 470 m resolution at 59 • 41 N, 24 • 45 E. The sensor is located about 3 km from the border of this area, and the distance between the location of the sensor and the centre of the closest grid cell is 3.34 km. The comparison (Fig. 3) only includes the instants when both measured (green) and modelled (red) significant wave height were available.
The basic properties of wave heights such as the maximum (measured 5.58 m, modelled 5.77 m), mean (measured 0.643 m, modelled 0.697 m) and median (measured 0.40 m, modelled 0.54 m) are represented reasonably. The bias of the model (about 0.05 m) is at the same level as the typical bias for modelled wave properties in the Baltic Sea in the most recent simulations . As our study basically relies on statistical properties of wave fields (the probability of occurrence of seas with different significant wave height, period, and direction), the analysis below, strictly speaking, does not require an exact reconstruction of the sequence of wave events. In this context, the root-meansquare difference of the modelled and measured time series wave heights (0.5 m) is reasonable. This value is about twice as large as in Björkqvist et al. (2018) for the Gulf of Finland (0.20-0.31 m) or northern Baltic Proper (0.26 m) and is comparable to the value of this quantity for the Bothnian Sea (0.31-0.56 m; Björkqvist et al., 2018).
The highest waves in the coastal area over the entire simulation interval are plausible (Soomere et al., 2013, Fig. 3). The maximum wave height in 1981-2012 was 5.2 m in a section that was fully open to one of the directions of the strongest winds from north-north-west. Such wave condi-tions have been measured twice since 2001 in the interior of the Gulf of Finland (Soomere, 2005;Pettersson et al., 2013).
The appearance of the empirical probability distributions of the occurrence of different wave heights is similar for both data sets (Fig. 3). The location and height of the peaks of these distributions (that represent the properties of most frequently occurring waves) have a reasonable match. The model overestimates to some extent the frequency of waves with heights of 0.8-1.5 m and underestimates the frequency of highest waves in the area (> 2 m). The overall appearance of the distribution for modelled wave heights resembles a Weibull distribution. The empirical distribution for measured wave heights higher than 0.4 m better matches an exponential distribution and exhibits much larger variability in the frequency of very high (> 3 m) waves.

Nearshore refraction and shoaling
The nearshore grid cells selected for the analysis (Fig. 2) are located in water depths of ≥ 4 m in order to avoid problems with reconstruction of wave heights under possible intense wave breaking in these cells during strong storms. Some of the cells are located in much deeper water at a depth of 20-27 m. The nearshore of the study area contains various underwater features and bottom inhomogeneities. This means that shoaling and refraction may considerably impact the wave fields even along the relatively short paths (normally ≤ 1 km in our model set-up) from the model grid cells to the breaker line. The predominant wind directions during strong storms are from the south-west, north-north-west, and west (Soomere et al., 2008). Consequently, high waves often approach some of the selected grid cells at large angles with respect to the shore normal. As a result, oversimplified approaches to replicate the changes in wave properties in the immediate nearshore (Lopez-Ruiz et al., 2014 and even advanced approximations of refraction and shoaling (Hansen and Larson, 2010) may fail.
For this reason, we calculate the joint impact of shoaling and refraction of approaching waves in the framework of linear wave theory. We assume that the numerically evaluated wave field for each time instant is monochromatic. The wave height is characterised by the numerically simulated (significant) wave height H , peak period T p , and mean propagation direction θ (clockwise with respect to the direction to the north). These properties are evaluated at the centre of each selected grid cell. The significant wave height at this location is denoted as H 0 . The approach direction θ 0 at this location with respect to the onshore-directed normal to the shoreline is calculated from θ based on an approximation of the relevant (about 500 m long) coastal segment by a straight line that follows the average orientation of the shoreline in this segment. It is assumed that the nearshore seabed from the centre of each grid cell to the waterline is a plane with isobaths parallel to this straight line. Finally, we assume that breaking waves are long waves. Then the wave height H b at the breaking line can be found as the smaller real solution of the following algebraic equation of sixth order Viška and Soomere, 2013): Here c g is the group speed, c f is the phase speed, and the subscripts 0 and b indicate the relevant value at the centre of the particular wave model grid cell and at the breaker line, respectively. The phase and group speed at the wave model grid cell are estimated based on the standard expressions of linear theory using the wave number k evaluated from the full dispersion relation for linear monochromatic waves, 2π/T p = √ gk tanh kd 0 , with the period equal to the peak period T p , and the water depth d 0 equal to the model depth for the particular grid cell. The set of assumptions is completed with the common notation that the breaking index is (Dean and Dalrymple, 1991).
The procedure of evaluation of set-up heights is thus as follows. We start from the numerically simulated significant wave height H 0 , peak period T p , and mean propagation direction θ with respect to the north. Next we calculate the phase (c f0 ) and group speed (c g0 ) of such waves at the model grid cell and find the propagation direction θ 0 with respect to the shore normal. Equation (9) is employed subsequently to evaluate the changes to the wave height owing to refraction and shoaling as the wave travels from the model grid point to the breaker line. In essence, it links the model output H 0 (given in terms of significant wave height) with the height H b of breaking waves and also makes it possible to evaluate the breaking depth from the definition of the breaking index. The phase and group speed at the breaker line are estimated from the dispersion relation for long waves: c gb = c fb = √ gd b . The wave approach direction θ b at the breaker line with respect to the shore normal is calculated from Snell's law: sin θ/c f = const. Thereafter we employ Eqs. (7) and (8) to find the set-up height for the particular time instant.
Several earlier studies of extreme set-up heights Pindsoo and Soomere, 2015) took into account only waves that propagated at small angles (not larger than ±15 • ) with respect to the shore normal and ignored the correction expressed in Eqs. (7) and (8) for waves that approached at a nonzero angle. This approach is denoted S2013 below.

Maximum set-up heights
The phenomenon of wave set-up is only significant if large waves propagate towards the shore. This is usually the case on open-ocean coasts where swells almost always create setup. The situation may be different in sheltered sea areas with complicated geometry where the wind wave energy may, on  (1)-(8) (red circles) and similar heights evaluated using only those waves that approach the shore at an angle of less than ±15 • from shore normal (blue diamonds). average, propagate even in an offshore direction. The wind regime of the study area is a superposition of four wind systems (Soomere et al., 2008). The most frequent wind direction is from the south-west (i.e. from the mainland to the sea). The proportion of wave fields that propagate onshore is 40 %-70 % along the study area (Fig. 4). The properties of set-up heights discussed below thus represent 40 000-70 000 examples of wave fields in each coastal segment. The only exception is grid cell 107 (Figs. 2, 4) between the Viimsi Peninsula and the island of Aegna that is sheltered for almost all wind directions.
We start from a comparison of maximum set-up heights evaluated using the above-described approach and the method employed in S2013. The two sets of estimates differ insignificantly (by less than 0.1 m) in about 80 % of the coastal segments (Fig. 5). The alongshore variations in the maxima of set-up heights evaluated from Eq. (8) are considerably smaller than those estimated using the approach of S2013. The largest examples of set-up heights reach 1 m and a majority of maximum set-up heights for single coastal sections are 0.6-0.8 m in both sets of estimates.
The largest differences between the two sets become evident in segments that are sheltered from predominant storm directions and most notably in deeply cut bays. Estimates based on Eqs. (7) and (8) are often remarkably (by up to 50 %) higher in these segments than those derived using S2013. This feature signals that the highest waves approach the shore at a relatively large angle in such segments.
In other words, refraction often redirects wave energy so that even beaches that are seemingly well sheltered geometrically may at times receive remarkable amounts of wave energy (cf. Caliskan and Valle-Levinson, 2008). The differences in the maxima of set-up heights evaluated using the two approaches for such coastal sections are often 0.2-0.3 m and reach up to 0.5 m. Such a strong impact of refraction is thought to be responsible for a local increase in wave heights in the Baltic Sea (Soomere, 2003) and also in extreme ocean conditions (Babanin et al., 2011). The processes that are not resolved by the wave model WAM such as reflection and diffraction may add even more wave energy to seemingly sheltered coastal segments.
On the contrary, S2013 overestimates the maximum set-up height in a few locations at headlands that are fully open to the Gulf of Finland (Fig. 5). A likely reason for this feature is the sensitivity of the formation of set-up with respect to the approach angle of waves. The magnitude of set-up decreases with an increase in the approach angle. This decrease is ignored in S2013, and the height of set-up created by waves that approached at angles of 10-15 • with respect to shore normal was overestimated.
Some differences between the results presented in this paper and those described in S2013 and Pindsoo and Soomere (2015) stem from the use of different time intervals in these papers. Simulations for 1981-2012 indicate that the maximum set-up heights in coastal areas open to the east were mostly created in the 1980s  even though the maximum wave heights occurred much later, starting from the mid-1990s. This feature may be related to a change in the directional structure of strong winds with easterly storms being relatively weak for about 2 decades. There is increasing evidence that strong easterly storms have returned to the area. For example, an event with a significant wave height of 5.2 m was recorded in the Gulf of Finland during an extreme easterly storm on 29-30 November 2012 (Pettersson et al., 2013). Pindsoo and Soomere (2015) observed that many new all-time highest set-up events apparently occurred in coastal segments open to the east since 2012. This process has led to the generation of maxima of simulated wave heights at a number of locations on the eastern Viimsi Peninsula (grid cells 108-130 in Fig. 2) in 2013 (Fig. 6).

Frequency of occurrence of set-up heights
As the modelled wave heights are evaluated with an accuracy of 1 cm, using even finer resolution for the construction of empirical probability distributions is not justified. The total number of entries in the time series of positive set-up heights is 40 000-70 000. Therefore, using the resolution of 1 cm ensures that at least a few examples will belong to the relevant "size classes" of set-up heights down to frequencies of about 10 −2 %. This range of frequencies is apparently large enough to estimate the main properties of the distributions in question.
The shape of empirical distributions of the occurrence of set-up heights varies extensively in the study area (Fig. 7). It matches an exponential distribution in the majority (about 75 %) of the model coastal segments. 1 Such a distribution is represented by a straight line in semilogarithmic (log-linear) 1 An early discussion version of this paper (available at https://doi.org/10.5194/esd-2016-76; Soomere and Pindsoo, 2017) contained a bug in the script for the calculation of set-up heights and for the subsequent evaluation of the parameters of their probability density function. This bug led to an erroneous conclusion about the frequency of occurrence of various kinds of distributions of set-up heights in different coastal sections as well as to severe coordinates used in Figs. 7 and 8. It apparently reflects a background Poisson process that also describes storm surges in the study area . This distribution appears in coastal segments that are open to the common strong wave directions. For example, grid point 23 (Fig. 7a) is open to the north-west and north, grid point 96 (Fig. 7c) is located near the western shore of Viimsi Peninsula and is open to the west and north-west, and grid point 145 (Fig. 7f) is widely open to the directions from the north-west to the north-east.
The distribution is convex upwards at a few locations that are sheltered from most of the predominant approach directions of strong waves, including north-north-west. This shape is evident in the most sheltered location of eastern Kopli Bay (grid point 43, Fig. 7b) and to a lesser extent in coastal sections sheltered by the island of Aegna (grid point 106, Fig. 7d). The relevant empirical distributions of set-up heights can be reasonably approximated by a twoparameter Weibull or Gaussian distribution. They both have a convex upwards shape in log-linear coordinates.
A subset (about one-quarter) of the presented distributions exhibit a different and clearly concave upwards shape in loglinear coordinates. This feature is evident in coastal sections that are sheltered from a few (but not all) predominant wave directions. Strong winds blow in this region usually from three directions: west, north-north-west, and east (Soomere et al., 2008). The segments that exhibit a concave upwards shape of the distribution (e.g. grid point 129, Fig. 7e) are mostly sheltered against waves that approach from the west or north-north-west but are open to the east. However, segments that are widely open to all strong wave directions (e.g. grid point 1, Fig. 8) may also exhibit a concave upwards shape of the empirical distribution of set-up heights.
This concave upwards appearance clearly differs from the shape of the usual distributions of the magnitude of wave phenomena (Fig. 8) such as the classic (Rayleigh) distribution of single wave heights (Longuet-Higgins, 1952), the Tayfun distribution of the heights of largest waves, the Weibull family of distributions for the occurrence of various wave conditions, or the Rayleigh distribution for run-up of (narrow-banded) Gaussian wave fields (Didenkulova et al., 2008).
To further explore the shape of the distributions of setup heights and their possible variations along the shoreline, we assume that these distributions belong to the family of exponential distributions. The overall appearance of empirical distributions in log-linear coordinates (Fig. 7) suggests that their shape can be, as a first approximation, matched with a quadratic polynomial az 2 + bz + c, where z is the setup height. In other words, the empirical probability density P (z) in log-linear coordinates used in Figs. 7 and 8 is approximated by the following function: overestimation of the frequency of matches of these distributions with an inverse Gaussian distribution.
In the case a = 0, distribution (10) reduces to a classic exponential distribution ∼ A exp(bz). The values a < 0 correspond to convex upwards distributions that eventually can be approximated by a Weibull or Gaussian distribution, whereas a > 0 indicates that the distribution is locally concave upwards. A more important difference between distributions with a = 0 and a = 0 is in the nature of their tails. If a = 0, the probability of large set-up heights decreases as ∼ A exp(bz) when z 1, while in the case of a < 0 this probability decreases much faster as ∼ A exp az 2 . It is often said that the former distribution has a heavy tail (and a comparatively large probability of very large set-up heights), whereas the latter distribution has a light tail (and a lower probability of very large set-up heights). The case a > 0 is only possible locally for a certain range of set-up heights and serves as an indication that some relatively large set-up heights are more probable that expected from distributions with a ≤ 0. Such a fitting procedure is not straightforward for several reasons. Firstly, the number of nonzero points of the distributions in Fig. 7 is highly variable along the study area, similar to the variation in the typical magnitude of the setup. Secondly, the relevant empirical distributions have gaps for some value(s) of the set-up height. A natural reason for this feature is that we are looking at very low probabilities (down to 0.001 %, i.e. a few occasions) of occurrence of relatively high set-up events in the period 1981-2016. Thirdly, a few locations have several outliers -remarkably high set-up events that do not follow the general appearance of the empirical distribution of set-up heights for the particular location (Fig. 7b, d, f). Such events apparently reflect severe storms in which the wind pattern was favourable for the development of very large waves that approached a certain coastal segment at a small angle. The presence of similar outliers is characteristic, for example, for time series of sea level in Estonian waters (Suursaar and Sooäär, 2007) and is associated with situations when strong storms blow from a specific direction.
To estimate the impact of these aspects on the results, we performed three versions of the fitting procedure. Firstly, we used all data points in the relevant distributions starting from the height of 0.01 m to evaluate the coefficients a, b, and c. Secondly, we used for the same purpose only set-up heights from 0.01 to 0.4 m (Fig. 7). This approach was not applicable in some locations where set-up heights did not reach 0.4 m. Thirdly, we evaluated these coefficients starting from the height of 0.01 m to the first gap in the empirical distribution (the lowest set-up height that did not occur in 1981-2016). Doing so made it possible to check whether the shape of the distribution is governed by the majority of events or if it is dominated by the presence of a few very large set-up heights (Fig. 8).
The particular values of the coefficients a, b, and c depend to some extent on the chosen version (Fig. 7). The shape of the approximate distribution is invariant with respect to the particular choice. All distributions also match reasonably well data points corresponding to the largest setup heights. The differences between the resulting theoretical distributions are mostly insignificant. The relevant estimates of the coefficients a, b, and c are located almost in the middle of the 95 % confidence intervals of each other (Fig. 7). All coefficients of the quadratic approximation of the exponent vary insignificantly along the study area. This is remarkable, because the shape of the relevant Weibull distribution (and thus the shape parameter of this distribution) for different wave heights varies considerably in the study area (Soomere, 2005).
The coefficients a at the leading term of the approximating polynomial (Figs. 9, 10) are mostly very small in the range of −0.005 to 0.005. Their 95 % confidence intervals normally include the zero value. This feature indicates that on most occasions the parameter a can be set to zero and the distribution of set-up heights can be reasonably approximated with an exponential distribution at a 95 % level of statistical significance. On such occasions, the entire process can be adequately approximated by a Poisson process, and the parameter b characterises the vulnerability of the particular coastal segment with respect to the set-up phenomenon similar to the analysis of storm-driven high water levels .
A few outliers of the parameter a in relatively sheltered coastal segments were negative and reached values down to −0.08 (Fig. 9). These values correspond to distributions with convex upwards shape in semilogarithmic coordinates and are thus qualitatively similar to the family of Gaussian or Weibull distributions.
The values of c are all positive and mostly in the range of 2.5-4 (Fig. 9). The values of parameter b are, as expected, almost everywhere negative, concentrated around −0.2, and typically varying between −0.1 and −0.4. A few locations with positive values of this parameter correspond to large negative values of a.

Comparison of fits to classic distributions
Importantly, in about a quarter of the coastal segments in the study area, the leading term a of the quadratic polynomial az 2 + bz + c is positive at a 95 % significance level. Note that this estimate is valid for single points only and is not applicable for the entire set of such points. A positive leading term corresponds to the concave-up appearance of the relevant distributions of set-up heights for a certain range of these heights. This means that large set-up events may be systematically much higher and/or occur much more frequently than one could expect from the Gaussian, Weibull, or Poisson-type statistics. The described features indicate that the empirical distribution of set-up heights can be, at least locally, approximated using an inverse Gaussian (Wald) distribution with a probability density function (Folks and Chhikara, 1978): For a certain set of parameters λ (the shape parameter) and µ (the mean), a part of the graph of this function has a concave upward shape in semilogarithmic coordinates (Fig. 8) and may locally well approximate the empirical distributions of set-up heights at the relevant locations. For large values of z, this function behaves similar to the probability density function of an exponential distribution. In the above-mentioned notation, it has a "heavy" tail and signals that large set-up heights are more frequent than for a Gaussian or Weibull process.
To shed more light on which theoretical distribution is most suitable for the description of the probabilities of occurrence of wave set-up heights, we fitted the distributions using the R packages "fitdistrplus" for fitting any distribution to the data (Delignette-Muller and Dutang, 2005;v. 1.0-14), "actuar" (v. 2.3-3) for fitting with the inverse Gaussian distribution, and "goft" (v. 1.3.4) to make an initial guess of the parameters of the inverse Gaussian distribution under R version 3.6.1.
As an example of the appearance of the fit by different theoretical distributions, we consider the westernmost point of the study area presented in Fig. 8. The fit was performed for the set-up values in the range from 0.01 to 0.4 m (Fig. 11).
While the inverse Gaussian fit performs well (Fig. 11a), exponential, Gaussian, and Weibull distributions (Fig. 11b) do not replicate the location and height of the maximum of the empirical distribution and inaccurately follow it for larger set-up values. A variation in the cut-off values leads to a certain change in the parameters of the fitted distribution, but the inverse Gaussian fit remains appropriate for all cut-off values.
The goodness of fit of the cumulative distribution function (CDF) with different theoretical distributions is much better (Fig. 12). Still, the empirical data do not follow a Gaussian distribution, whereas both Weibull and exponential distributions provide an acceptable match to the observed distribution. The inverse Gaussian distribution shows the best agreement among the tested CDFs. As expected, the differences between the empirical and theoretical distributions are relatively small for larger set-up heights but become more evident for the frequently occurring set-up heights (Fig. 12).
We applied a Kolmogorov-Smirnov test to clarify which theoretical distribution describes the data in question best.  The best fit of the same data set with exponential (rate 12.44 ± 0.06 m −1 ), Gaussian (mean 0.0804 ± 0.0003 m, SD 0.0700 ± 0.0002 m), and Weibull (shape parameter 1.247 ± 0.004, scale parameter 0.0867 ± 0.0003 m) distributions is also shown.
The smaller the D value (Kolmogorov-Smirnov statistic), the smaller the difference between the distributions. The corresponding D values are 0.039 for the inverse Gaussian distribution, 0.067 for the Weibull distribution, 0.120 for the exponential distribution, and 0.158 for the Gaussian distribution. Even though an inverse Gaussian distribution best matches the empirical distribution of set-up heights at this location, the hypothesis that the data and random points from the fitted inverse Gaussian distribution are from the same distribution was rejected, even at an 80 % confidence level. The probability that the empirical distribution represents a data set with one of the other tested theoretical distributions was much smaller.
This proportion of the "remoteness" of all four distributions from the empirical probability distribution of set-up heights persists for the entire study area and different cut- off values (Fig. 13). The cut-off values were tested in the range from 40 % to 95 % of the maximum set-up height in a particular coastal section. In terms of D values of the Kolmogorov-Smirnov test, an inverse Gaussian distribution systematically shows a better approximation to the empirical distribution of set-up heights than all other distributions. The Weibull distribution overall provides a slightly worse fit except for several (< 7 %) coastline locations (Fig. 13). An exponential distribution, even though it seems to follow the empirical distribution closely on most occasions, has a smaller probability of describing set-up heights adequately. This feature reflects the fact that exponential distributions are a particular (one-parameter) case of the family of (two-parameter) Weibull distributions. in this direction include the use of high-quality wind measurements for wave modelling, systematic calculation of the joint impact of refraction and shoaling from the wave model grid cells to the breaker line, and implementation of an appropriately corrected evaluation scheme of set-up heights for obliquely approaching waves. The core message from the analysis is that the empirical probability distribution of set-up heights, z, can usually be fairly well approximated by a standard exponential distribution, exp (−λz), even in very differently oriented segments of a coastal section with complicated geometry. When the exponent function of the general exponential distribution is approximated using a quadratic function, i.e. az 2 + bz + c, the coefficient of its leading term does not differ from zero at a 95 % significance level for more than three-quarters of the coastal segments of the study area. As the study area contains a variety of sections with different orientations and with radically different wave properties, it is likely that the qualitative shape of the distribution of set-up heights only weakly depends on the properties of the local wave climate. This conjecture is supported by a fairly limited alongshore variation in the coefficients b ≈ −0.2 ± 0.1 and c ≈ 3.25 ± 0.75 of this approximation (Fig. 9).
Another important message is that the basic shape of this distribution function is concave upwards in a log-linear plot for a substantial number of coastal segments. In these seg-ments, a > 0. The concave upwards local shape of such distributions can be adequately approximated with an inverse Gaussian (Wald) distribution. Similar to the exponential distribution, the probability density function of an inverse Gaussian distribution decays as exp (−λz) for z 1. Even though the absolute values of the coefficients of the leading term of such a quadratic approximation are relatively small, the fit with other classic distributions, such as Gaussian or Weibull distributions (Fig. 8) that decay as exp −λz 2 for z 1, is systematically worse. Therefore, even though there is no rigorous evidence that the empirical data follow an inverse Gaussian distribution, it provides the closest fit to the data compared to the other three tested distributions.
This result is intriguing, because applications of inverse Gaussian (Wald) distributions are scarce in descriptions of geophysical phenomena. Perhaps the most well-known example of the use of a Wald distribution is to describe the time something undergoing Brownian motion (with positive drift) takes to reach a fixed positive level. Other examples include statistical properties of soil phosphorus (Manunta et al., 2002), long-distance seed dispersal by wind (Katul et al., 2005), and some models of failure (Park and Padgett, 2005).
A comparison of Figs. 5 and 9 indicates that the variations in the leading coefficient a of the approximating quadratic function are uncorrelated with the values of maximum setup heights. It could thus be hypothesised that the locations where an inverse Gaussian distribution governs the properties of set-up heights appear because of a specific match of the directional structure of winds and the orientation of the coastline. This feature also signals that the basic features of the distribution of set-up heights are only weakly (if at all) connected with the severity of the local wave climate. This conjecture is supported by comparatively small variations in the values of other parameters in the polynomial approximation (Fig. 9b, c).
A subtle implication from the qualitative match of statistics of set-up heights with an inverse Gaussian distribution is that set-up events with heights close to extreme heights may be much more frequent than their estimates based on classic Gaussian or Weibull statistics and also clearly more frequent than similar estimates for Poisson processes. This increase in the probability of large wave set-up events is "balanced" by a similar decrease in the relative number of events with an average magnitude compared to normally or Weibulldistributed events. The described features basically indicate that the frequency and role of close-to-extreme set-up events (and their contribution to damages and economic losses) may be underestimated based on observations of similar events of average height. In particular, severe set-up events may occur substantially more frequently than might be expected from the probability of the occurrence of severe seas. The empirical distribution of measured wave heights (Fig. 3) provides an estimate of statistical properties of wave heights at the border of the study area. This distribution largely follows an exponential distribution for 1.2-3.2 m high waves and is slightly convex upwards for 0.5-1.5 m high waves. The appearance of the distribution of modelled wave heights in the offshore (Fig. 3) is clearly convex upwards in the range of relatively frequent wave heights of 0.5-1.7 m. It would thus be natural to expect that this property also becomes evident in set-up heights that largely follow breaking wave heights. This is, however, not the case: the appearance of the distributions of set-up heights is either an approximately straight line or a line that is locally concave upwards.
The difference in the shapes of distributions for modelled and measured wave heights suggests that the approximations employed to evaluate wave properties (Appendix A) are not responsible for the presence of concave upwards distributions of set-up heights. On the contrary, a natural conjecture from Fig. 3 is that, ignoring ice cover, the particular method for evaluation of wave properties and the use of discontinuous (3 h) one-point wind data have at least partially supported the convex upwards shape of the distribution of offshore wave heights compared to measured data.
Some of the introduced assumptions -such as the ideal plane and rigid seabed, the presence of a dry coast without any vegetation or sources of bottom shear stress, and ignoring the wave period (or steepness) and the particular value of the coastal slope (and thus wave reflection) -in the calculations are not fully realistic. In other words, it is assumed that all coastal segments are (i) favourable for the formation of high set-up and (ii) approximately homogeneous alongshore. These assumptions are only valid for some segments of the study area. They all generally lead to an overestimation of set-up heights. The presence of vegetation or the impact of bottom shear stress leads to a decrease in the set-up height by a constant factor (Dean and Bender, 2006). It is therefore likely that the use of these assumptions mainly leads to a stretch of the resulting distributions of set-up heights towards larger values but does not modify their basic shape.
It might be expected that the impact of other simplifications -such as assuming monochromatic wave fields, using a constant value of the breaking index, and employing a longwave approximation for breaking waves -generally emphasises the role of approach directions. Therefore, it could be hypothesised that the set of assumptions used makes the established features more noticeable than they would be for real wave fields.
Finally, we note that the presented results do not require any modification of the classic estimates of extreme values of set-up heights and their return periods based on, for example, the block maximum method. Namely, the limiting distributions of independent block maxima follow either a Gumbel, three-parameter Weibull, or Fréchet distribution, notwithstanding the distribution of the underlying values (Coles, 2004). This general theorem is obviously also valid for any time series that follows an inverse Gaussian distribution.

Conclusions
The shape of empirical probability distributions of set-up heights is evaluated for different segments of a coastal section with complicated geometry around Tallinn Bay in the Gulf of Finland, Baltic Sea. This distribution commonly follows an exponential distribution for a wide range of probabilities. The leading term of the relevant approximation of the exponent with a quadratic polynomial az 2 +bz+c, where z is the set-up height, is nonzero at a 95 % level of statistical significance in about one-quarter of the segments. In these segments, the coefficient a is positive, and the distribution approximately follows an inverse Gaussian (Wald) distribution. According to Kolmogorov-Smirnov statistics (D value), the inverse Gaussian distribution systematically better matches the empirical probability distributions of set-up heights than two-parameter Weibull, exponential, or Gaussian distributions.
Appendix A: Implementation of the wave model Time series of significant wave height, peak period, and propagation direction were reconstructed using the wave model WAM cycle 4 and one-point high-quality wind information. The wave model is implemented in a triple-nested version with the resolution of the innermost grid being about 470 m as in Soomere (2005). The regular rectangular grid of the coarse model covered the whole Baltic Sea with a step of 3 × 6 (3 along latitudes and 6 along longitudes, i.e. about 3 nmi, nautical miles). The full wave spectrum from the coarse model was used as boundary conditions for the first nesting with a similar regular grid with a step 1 × 2 (about 1 nmi). This grid covered the interior of the Gulf of Finland to the east of 23 • 18 E. The bathymetry of these two models was based on data from Seifert et al. (2001) with a resolution of 1 ×2 . The full wave spectrum from the model for the Gulf of Finland was used as boundary conditions for the second nesting. The study area (Fig. 2) was covered by a regular grid with a step of 0.25 ×0.5 between 24 • 28 and 25 • 16 E and to the south of 59 • 41 N. The bathymetry for the second nesting was constructed based on maps issued by the Estonian Maritime Board with a typical resolution of about 200 m. Wave height is presented in terms of significant wave height H . This quantity, often denoted as H S , is defined as H = √ m 0 , where m 0 is the zero-order moment of the one-dimensional wave spectrum, and it is estimated with a resolution of 1 cm. We applied 24 evenly spaced directions and 25 frequencies ranging from 0.042 to 0.41 Hz with an increment of 1.1. Experience with this model in the Baltic Sea and Finnish archipelago indicates that it is important to adequately represent wave growth in low wind and short fetch conditions (Tuomi et al., 2011(Tuomi et al., , 2012. To meet this requirement, in calculations with the wind speed below 10 m s −1 we applied an increased frequency range of waves up to 2.08 Hz in order to correctly represent wave growth in low wind conditions after calm periods (Soomere, 2005).
We employed a simplified method for rapid reconstruction of long-term wave climate. The computations are speeded up by replacing exact calculations of wave generation, interactions, and propagation by an analysis of precomputed maps of wave properties for different wind speed, direction, and duration. This simplification relies on a favourable feature of the local wave regime, namely that wave fields rapidly become saturated and have relatively short memory in the study area (Soomere, 2005). Consequently, a reasonable reproduction of wave statistics is possible by the assumption that an instant wave field in Tallinn Bay is a function of a short time period of wind dynamics. This assumption justifies splitting the calculations of time series of wave properties into independent sections with duration of 3-12 h. The details of the model set-up, bathymetry used, and the implementation and validation of the outcome have been repeatedly discussed in the literature (Soomere, 2005;Soomere et al., 2013).
As nearshore wave directions in areas with complex geometry and bathymetry may be greatly impacted by local features, it is crucial to properly reconstruct the offshore wave directions (Björkqvist et al., 2017). This is only possible if the wave model has correct information about wind directions. To ensure high quality of the input wind information, we use wind data from an offshore location in the central part of the gulf. The wind recordings at Kalbådagrund (59 • 59 N, 25 • 36 E; a caisson lighthouse located on an offshore shoal) are known to represent marine wind properties well (Soomere et al., 2008). Even though this site is located at a distance of about 60 km from the study area, it is expected to correctly record wind properties in the offshore that govern the generation of surface waves in the open sea. The wind measurements are made at Kalbådagrund at the height of 32 m above the mean sea level. Height correction factors, to reduce the recorded wind speed to the reference height of 10 m used in wave models, are 0.91 for neutral, 0.94 for unstable, and 0.71 for stable stratifications (Launiainen and Laurila, 1984). To the first approximation, the factor 0.85 was used in the computations similarly to Soomere (2005) and Soomere et al. (2013).
Wind properties at Kalbådagrund were recorded starting from 1981 once every 3 h for more than 2 decades, but since then they have been filed at a higher time resolution. To ensure that the forcing data are homogeneous, we downsampled the newer higher-resolution recordings by selecting the data entries once in 3 h. The entire simulation interval 1981-2016 contained 103 498 wind measurement instants with a time step of 3 h. In about 9000 cases (less than 10 % of the entire set) either wind speed or direction was missing. These time instants were excluded from further analysis. As some of these instants involved quite strong winds, our analysis may underestimate the highest wave set-up events for some segments of the shore. However, as we are interested in the statistical properties of most frequently occurring set-up heights and in the alongshore variations of these properties, it is likely that omitting these data does not substantially impact the results.
Code availability. The code is available from the authors upon request (MATLAB and R scripts).
Author contributions. TS designed the study, derived the equations and approximations used in the paper, produced Fig. 5, compiled the introduction and discussion, checked the consistency of the results, and polished the text. KP developed the scripts, ran the simulations, produced most of graphics, and drafted the parts of the article body. NK performed testing of fits of empirical data with different theoretical distributions, wrote the relevant parts of the text, and prepared Figs. 11-13. ME recalculated the graphics, performed a comparison of the properties of the reconstructed waves with measured waves near the study area, evaluated the distributions of set-up heights for time series of wave properties measured in the northern Baltic proper, drafted the relevant parts of the article, and produced updated maps.
Competing interests. The authors declare that they have no conflict of interest.