Ensemble Perturbation Smoother for Optimizing Tidal Boundary Conditions by Assimilation of High-frequency Radar Surface Currents – Application to the German Bight

High-Frequency (HF) radars measure the ocean surface currents at various spatial and temporal scales. These include tidal currents, wind-driven circulation, density-driven circulation and Stokes drift. Sequential assimilation methods updating the model state have been proven successful to correct the density-driven currents by assimilation of observations such as sea surface height, sea surface temperature and in-situ profiles. However, the situation is different for tides in coastal models since these are not generated within the domain, but are rather propagated inside the domain through the boundary conditions. For improving the modeled tidal variability it is therefore not sufficient to update the model state via data assimilation without updating the boundary conditions. The optimization of boundary conditions to match observations inside the domain is traditionally achieved through variational assimilation methods. In this work we present an ensemble smoother to improve the tidal boundary values so that the model represents more closely the observed currents. To create an ensemble of dynamically realistic boundary conditions, a cost function is formulated which is directly related to the probability of each boundary condition perturbation. This cost function ensures that the boundary condition perturbations are spatially smooth and that the structure of the perturbations satisfies approximately the harmonic linearized shallow water equations. Based on those perturbations an ensemble simulation is carried out using the full three-dimensional General Es-tuarine Ocean Model (GETM). Optimized boundary values are obtained by assimilating all observations using the co-variances of the ensemble simulation.


Introduction
Most ensemble-based assimilation schemes such as the EnKF (Ensemble Kalman Filter, Evensen, 2003Evensen, , 2004)), ESSE (Error Subspace Statistical Estimation, Lermusiaux and Robinson, 1999), SEIK filter (Singular Evolutive Interpolated Kalman filter, Pham, 2001) are sequential: the ensemble is updated using observations at the time they are measured.In this sequential approach however the model undergoes a sometimes vigorous adjustment process when the model is restarted (e.g.Malanotte-Rizzoli et al., 1989) if some assumptions of the underlying assimilation scheme are not verified (e.g.poorly known error covariances, model biases or non-Gaussian pdf).A too frequent assimilation of observations can even lead to the situation where the assimilation degrades the model results due to the High-Frequency motions generated by the assimilation (Talagrand, 1972).
Several approaches haven been proposed to reduce this well known problem.Instead of applying the analysis correction at the assimilation time, in the Incremental Analysis Update (Bloom et al., 1996) the correction is added incrementally over several time steps, reducing the generation of spurious gravity waves.This scheme has been used with largescale ocean models (e.g.Keppenne et al., 2005;Ourmières et al., 2006) but it is questionable if it can also be used in regional models representing fast ocean processes such as tides.
In free-surface ocean models, such transient motions often propagate as fast shallow-water barotropic waves.To prevent the generation of spurious gravity waves by assimilation, Dobricic et al. (2007) proposed an iterative method to damp the divergence in the analysis increment of the flow field.Barotropic gravity waves can also be removed explicitly by Published by Copernicus Publications on behalf of the European Geosciences Union.
Those approaches are however not appropriate for tidal simulation since the main processes here are barotropic waves.It would be difficult to distinguish between spurious gravity waves and tidal waves which are missing or misrepresented in the model solution.
As a first step for the assimilation of High-Frequency (HF) radar data, we concentrate for simplicity on the M2 tidal signal.Adding other tidal constituents would not add fundamentally any new complexity except that the model integration period would have to be substantially longer to resolve two constituents with a similar frequency as required by the Rayleigh criterion.In order to distinguish two periodic signals of frequency ν 1 and ν 2 , the length of the time series has to be at least |ν 1 − ν 2 | −1 (Emery and Thomson, 1998).
The objective of this paper is twofold: it aims to apply the ensemble generation scheme providing dynamically constrained perturbations proposed in Barth et al. (2009) to a realistic data assimilation case study and to assess the realism of the assimilation results using an ensemble assimilation scheme.The paper aims also to assess the usefulness of the HF radar data and altimetry-based tidal products to estimate tidal boundary conditions using a smoother scheme for data assimilation.
In Sect.2, the observations used in this assimilation study are presented.A brief description of the model is given in Sect 3. The assimilation strategy is detailed in Sect. 4. The assimilation results are discussed and validated in Sect. 5. Finally, the conclusions of this study are presented in Sect.6.

HF radar data
Surface current observation data were provided by the University of Hamburg by means of HF radar measurements carried out in the context of the PRISMA project (PRISMA, 1994).Two systems were installed at a distance of approx.50 km, the first on the island of Helgoland (54.19 • N, 7.88 • E) and the second on the mainland coast near the town of St. Peter-Ording (54.34 • N, 8.59 • E).They were based on an early CODAR design developed at NOAA (Barrick et al., 1977), which had been modified at the University of Hamburg.
The operating frequency was 29.85 MHz which made the system to couple to 5.02 m long ocean waves.The radar measures the radial component of the surface current by analyzing the additional Doppler shift caused by the Bragg-resonant waves of the underlying current field.Due to the decrease in orbital motion of these waves, they couple to ocean currents down to about 0.5 m below the sea surface.As a result the radar provides surface currents averaged over the top 0.5 m of the water column.The modified CODAR system used a transmit pulse length of 16 µs which results in a range resolution (range cell depth) of 2.4 km.However, the range cells were sampled every 8 µs giving interpolated data every 1.2 km.Azimuthal resolution was provided by means of a four-element array with the antennas arranged in a square at 0.5 λ diagonal spacing (5.02 m).The angle of arrival of the backscattered signal was determined by direction-finding in the frequency domain based on the phase difference between the antennas.After internal pre-processing the sampling time of the CO-DAR was 0.262 s.In each data run 4096 samples were acquired, resulting in an 18 min "coherent integration time" (CIT).
Figure 1 shows a typical backscatter spectrum acquired during a previous experiment.The red vertical lines mark the Doppler frequency expected from Bragg-resonant waves without any additional shift due to the current field (Bragg frequency).

Error statistics of the current velocity
After a fast Fourier transform, the backscatter spectra from the four antennas consist of N FFT = 4096 lines each and the additional Doppler shift caused by the current field is analyzed.As the radial component of the current velocity varies with angle, the radial component of current field is spread over numerous spectral lines.The direction of arrival is calculated for each spectral line with a signal-to-noise ratio S/N≥ 6 dB and the respective radial current velocity u r (i) as well as the associated S/N(i) is then sorted into "directionboxes" at 1 • increments.
The noise level of the backscatter spectrum is calculated as the average power of the areas marked in blue in Fig. 1, the S/N value is the ratio of the power of a signal line in the spectrum to the noise level.This calculation is done for all range cells.
In a next step, a 27 × 21 Cartesian grid with 3 km horizontal resolution is defined.To transform the measurements from radar coordinates (range, azimuth) to this Cartesian grid, all data within a circle of 3 km radius around a grid point is copied from the "direction-boxes" into a "grid-box".After this transformation, there are e.g.K samples of a radial current velocity u r (i) with a signal-to-noise ratio S/N(i) in a "grid-box".
From these K samples, the radial current velocity u r is calculated as This weighting with the S/N implements a "center of gravity" technique which gives preference to strong echoes and provides a more stable estimate of u r .
The variance of the radial current velocity σ 2 r is calculated as Note, that the variance contains measurement errors as well as the variability due to temporal changes in the current field within the CIT caused by e.g.tides.Finally, the accuracy of the radial current velocity Acc r is calculated as Because the value of K is not very large, a correction based on the Student's t distribution is applied.The radial current velocity u r as well as it's variance σ 2 r and accuracy Acc r are calculated for all points of the Cartesian grid, as long as K ≥ 3.
In a final step, the radial current velocities measured by two or more CODARs installed at different locations are combined to form the surface current vector u.The algorithm is published by Gurgel (1994) and is based on a leastsquares-fit.It makes use of the error statistics of the radial current velocities to be combined and thus also provides the variances σ 2 (u) and σ 2 (v) of the meridional (u) and zonal (v) components of the surface current u.The algorithm is outlined in appendix B. These variances include the influence of the geometry (the angle between the radial components) similar to the GDOP (Geometrical Dilution Of Precision) known from the GPS (Global Positioning System).Close to the line connecting the two radar sites, the GDOP becomes very large because both radial measurements basically reflect the same radial component of the current field.If the GDOP is larger than 5, the measurement with the higher S/N is used and the missing information perpendicular to the connecting line is interpolated from surrounding grid points.

The data set used for assimilation
The final data set comprises 8414 samples in time from 9 August 1991 until 4 February 1992 and has a time resolution of 30 min.Due to land coverage and distance from the antennas, the maximum spatial coverage is 396 of 567 available grid cells.The coverage shows high variation over time (see Fig. 2) with a mean spatial coverage of 202 grid cells.The mean coverage over time for each individual point on the observation grid is shown in Fig. 3.This variation is caused by a changing sea state and also due to Radio Frequency Interference (RFI) at the radar frequency.Medium wave height (1-2 m) results in the largest range of the radar measurements, while extremely calm or rough sea reduces the range.This observation is discussed in more detail in Gurgel et al. (1999).
Only observations from 1 September 1991 to 10 October 1991 are used in this work for data assimilation.Observations from 10 October 1991 to 30 October 1991 will be used for validation.The expected error variance obtained from the processing of the radar data is used in the assimilation.The error correlation between different velocity components is ignored here for simplicity.The corresponding error covariance matrix R HF is thus diagonal (but its diagonal elements depend on space).The representation error accounts for missing processes in the model (but present in the observations), and errors that cannot be corrected modifying only the M2 tidal boundary conditions.It also includes the fact that the model and observation error covariances are approximations.The representation error S 2 HF is assumed proportional to the identity matrix and must be added to this error covariance: Figure 4 shows the tidal analysis of the zonal and meridional component of the velocity observations.Since close to the line joining the HF radar sites data have partly been interpolated, the tidal parameters are less accurate near this line (Fig. 5).

Empirical Ocean Tides (EOT08a)
The EOT08a (Savcenko and Bosch, 2008) is a global data set for amplitude and tides of the major ocean tidal constituents, four diurnal tides (K1, O1, P1, and Q1), five semi-diurnal tides (M2, S2, N2, K2, and 2N2), and one non-linear tidal constituent (M4).It is based on empirical analysis of altimeter data of multiple satellite missions and is obtained by a harmonic analysis of the residual of the altimetry data relative to the ocean tidal model FES2004 (Lettellier et al., 2004;Lyard et al., 2006).It has a spatial resolution of 7.5 × 7.5 and it is available at ftp://ftp.dgfi.badw.de/pub/EOT08a/In this study only the M2 component is used (Fig. 6).The real and imaginary parts of the complex tidal parameters are represented as separate elements in the observation vector.For simplicity, the observational error covariance R EOT is assumed to be diagonal with a constant value S 2 EOT on its diagonal: The amplitude and phase of the M2 tidal signal is a time invariant field.

Model
The model used is the General Estuarine Ocean Model (GETM Burchard and Bolding, 2002).It solves the 3-D primitive equations with a free-surface on an Arakawa Cgrid (Arakawa and Lamb, 1977).In the vertical, the present configuration uses 21 σ levels.It covers the German Bight with a resolution of about 0.9 km.Its boundary conditions are extracted from a 5-km resolution North Sea-Baltic Sea model (Staneva et al., 2009).The bathymetric data for the different model configurations are prepared using the ETOPO-1 topography (Amante and Eakins, 2009), together with observations made available from the German Hydrographic Service (Bundesamt für Seeschiffahrt und Hydrographie, BSH, Germany, Dick et al., 2001).The larger scale model and the nested model include tides.The sea surface elevations of the open boundary of the North Sea-Baltic Sea model are generated using tidal constituents obtained from the TOPEX/Poseidon data via the OSU Tidal Inversion Software (Egbert and Erofeeva, 2002).Atmospheric fluxes are estimated by the bulk formulation using 6-hourly ECMWF re-analysis data.The model is also forced by hourly river run-off data provided by the BSH.More details about the large-scale model and the model nesting can be found in Staneva et al. (2009).
The amplitude and phase of the M2 tidal velocity (Fig. 7) are computed for a 60-day model run starting the 1 September 1991 (without data assimilation).The tidal parameters are shown for the same region as for Fig. 4. The overall structure of the u-component amplitude (amplitude of the meridional component of the currents) is in reasonable agreement with the observations.However, a significant phase difference is observed in the u-components.For the v-component (zonal component of the currents), the velocity amplitude is similar to the observations except in the northern part where it is underestimated in the model.The phase differences are however the largest in the southeastern part of the region covered by the HF radar observations.2 6 0  2 6 0   2 7 0  2 7 0   2 8 0  2 8 0   2 9 0  2 9 0  3 0 0   3 0 0   3 1 0   3 1 0   3 1 0   3   4 Data assimilation

Ensemble perturbations
The tides are governed by the shallow water equations which provide a strong dynamical link between elevation and depthaveraged currents.An ensemble of tidal boundary conditions is created as in Barth et al. (2009).In the following we briefly present this approach.The probability of a perturbation x is assumed to be Gaussian distributed: where n is the dimension of x and the function J , called a cost function in variational analysis, is a quadratic function of x given by: where W M , W D and W E are, for simplicity, diagonal weighting matrices, D is a diffusion operator and M is an a priori linear constraint.The subscript of W E refers to energy.The three terms in this cost function ensure respectively that the perturbations have a finite energy, are smooth and satisfy a linear constraint.The matrix B is the underlying covariance matrix of the ensemble perturbations.Its inverse is also called the Hessian matrix of the cost function J .
The precise definition of the matrices W M , W D , W E and M and of the vector x is specific to a particular application.In the present setup, only the M2 tidal constituent is perturbed since it is by far the largest in the German Bight (Schirmer et al., 1994).The perturbation vector x in this case is composed by the complex M2 tidal parameters of elevation ζ , and the depth-averaged currents u and v of the entire model domain: where ω is the M2 tidal frequency.The energy constraint is based on the barotropic wave energy governed by the shallow water equations integrated over the open boundary of the model domain: where g is the acceleration due to gravity and h is the water depth.The parameter α (a dimensional) determines how strong this constraint has to be enforced.In continuous form, the smoothness constraint is written as: l is the spatial dimension tangent to the open boundary.The weighting between variables is thus the same as for the energy constraint.As before, the parameter L determines the importance of the smoothness constraint relative to other constraints.The exponent is chosen such that L represents a length-scale.The smoothness constraint is also only enforced at the open sea boundaries.The dynamical constraint is active over the whole domain and can be written as: where ε ζ is the misfit of the shallow-water continuity equation and ε u and ε v are the misfits of the shallow-water momentum equations (with a linear bottom drag): The covariance matrix B does not need to be formed explicitly to create ensemble perturbations, only its inverse.Since the constraints use derivatives that are approximated as finite differences, the matrix B −1 is a sparse matrix.Only nonzero elements of this matrix are stored, which allows for an efficient implementation (Barth et al., 2009).
Unlike the approach presented in Barth et al. (2009), the energy constraints and the smoothness constraints are applied only at the boundaries.The spatial structure of the perturbations within the domain is thus given only by the dynamical constraint which is sufficient to ensure smooth perturbations.

Non-sequential assimilation scheme
In this sub-section, the general assimilation approach is detailed while in the next sub-section it is shown how this assimilation scheme is applied to the German Bight setup to estimate tidal boundary conditions.
A non-sequential assimilation scheme can be derived from the classical analysis scheme by embedding the time dimension in the observation vector.All observations within the model integration period are thus grouped into a single observation vector (y o ) with their corresponding error covariance (matrix R).
The vector x (k) , where k is the ensemble member index, includes all unknown forcings and parameters of the model.An ensemble of forcing fields are created by perturbing them within the range of their uncertainty.If the perturbations depend on time, then its time evolution is also grouped together into a single vector in a similar way as the observations.
For every perturbation, the model is integrated forward in time.But unlike classical Kalman Filters, the ensemble members are not influenced at this state by the observations.For each ensemble member, the observed part of the model state is extracted.Formally, this extraction can be expressed as a non-linear "observation operator" h(•) applied to the ensemble perturbation x (k) performing the following operations: adding the perturbation to the background estimate, integrate the model and then interpolate the model to the location of the observations.The model dynamics are thus part of the observations operator.Every element in h(x (k) ) can thus be directly compared to its corresponding element in the observation vector y o .www.ocean-sci.net/6/161/2010/Ocean Sci., 6, 161-178, 2010 A. Barth et al.: Ensemble smoother for optimizing tidal boundary conditions In sequential data assimilation, the variables estimated by the assimilation are often the entire model state.However, different definitions can be more appropriate depending on the assimilation problem.Here we choose to include the forcing fields, which is a common strategy in variational data assimilation (e.g.Hoteit et al., 2009).Assimilation can also be used to estimate model parameters for e.g.biological models (Spitz et al., 1998) or weights given to individual models in super-ensemble techniques (Rixen et al., 2009).
The definition of the observation operator depends equally on how the assimilation strategy is implemented.It can be a simple operator interpolating the model results at the location of the observations or a more complex transformation.For example, the observation operator can be an operator extracting the spectral information from the model to assimilate tomographic data (e.g.Rémy et al., 2002), an EOF projection to assimilate EOF amplitudes (e.g.Barth et al., 2006), or a complete radiance sub-model to assimilate radiance observations in a numerical weather model (e.g.Greenwald et al., 2002).
From the ensemble simulation, we derive the following matrices whose columns represent the deviation of the ensemble members around the ensemble mean (S) and the observations (E): where the index k refers to the ensemble member and • is the ensemble average.These matrices are scaled such that the following products represent covariance: The matrices SE T and EE T would be P b H T and HP b H T if the observation operator could be represented by the linear operator H.Those covariance matrices do not need to be formed explicitly as the analysis is performed in the subspace defined by the ensemble members (e.g.Nerger et al., 2005).
The Kalman filter analysis with non-linear observation operators as in Chen and Snyder (2007) provides the optimal perturbation minimizing its expected uncertainty: for a given background estimate x b .The analysis x a provides the "optimal" correction to the unknown forcings based on the observations.The model is then rerun with this perturbation applied.The observations influence thus the model solution only by choosing the optimal combination of the perturbations.
For a linear model and an infinite large ensemble, Eq. ( 22) minimizes, where n references to the indexed quantities at time n.The first term of this cost function is the constraint used to defined the dynamically plausible perturbations (Eq.7.).
The approach used here is closely related to the Ensemble Smoother (van Leeuwen, 2001), 4DEnKF (Hunt et al., 2004, 2007) and the AEnKF (Sakov et al., 2010) where model trajectories (i.e. the model results in space and time), instead of model states, are optimized.In the smoother scheme of van Leeuwen (2001), observations are perturbed as in the standard Ensemble Kalman Filter (Burgers et al., 1998).In 4DEnKF, the observation operator is modified to relate the observations at the time where they are measured to the time where they are assimilated.The AEnKF extends the observations vector and the matrix containing the ensemble members at the location where they are observed by vertically concatenating those vectors and matrices at different time instances.This approach is also used here because it is easier to implement than the 4DEnKF.But both approaches can be seen as equivalent.For an increasing number of ensemble members, the Ensemble Smoother does also converge to the 4DEnKF and AEnKF.In the present study, these approaches are not applied to directly estimate the model trajectory but to the perturbations of the forcings (or the trajectory of the forcing perturbation).Therefore, after the optimal correction of the forcings is computed, the model needs to be rerun to obtain the final model solution.For a linear model, these schemes provide the same results.However, for a non-linear model, the results might be different.The optimal solution from the Ensemble Smoother, 4DEnKF and AEnKF is not guaranteed to satisfy the model equations, while it is per construction the case in the scheme used here.Since the method used here aims to estimate the optimal perturbations, this approach might be called Ensemble Perturbation Smoother.

Application to data assimilation in the German Bight
Errors in the boundary conditions, bathymetry and poorly known bottom friction contribute to errors in the M2 tidal signal inside the model domain.Mourre et al. (2004) showed in a twin experiment how the errors in the bathymetry can be reduced by data assimilation using an Ensemble Kalman Filter.The present study is a first step and focuses on reducing the error generated due to unknown tidal boundary conditions.Also the uncertainties due to errors in the initial conditions, other lateral boundary conditions and atmospheric forcings are not considered here.
The vector x is composed in the present implementation by the complex M2 tidal parameter of elevation ζ , zonal u and meridional velocity v defined over the whole model domain.Those variables vary spatially but are constant in time.This vector has in total 52374 elements.In GETM version 1.6, tidal boundary conditions are implemented such that only elevation at the boundary is used for the ensemble simulation.In future studies, this vector can be augmented by other unknown parameters such as bathymetry or (space dependent) friction coefficients.
The observations vector y o includes the u and v components of the HF radar observations y HF n at all time instances t n (n = 1,...,N t with N t = 1869) within the 40 day model integration at a resolution of 30 min (at 51 time instances no data is available).It also includes the real and imaginary parts of the EOTs elevation M2 tidal parameters y EOT (related to the amplitude and phase).This latter field does not depend on time: In total y o has 2 120 396 elements.The observation error covariance matrix R is here a diagonal matrix whose diagonal elements are obtained by a concatenation similar to equation ( 26).
In order to create an ensemble x (k) of perturbations following the distribution (6), the Hessian matrix of the cost function (B −1 ) is decomposed into eigenvectors and eigenvalues: Only the 50 eigenvectors with the smallest eigenvalues are retained.The decision how many eigenvectors to retain was based on the distribution of the eigenvalues.Those eigenvectors correspond to the dominant error modes defined by the covariance matrix B. From this eigendecomposition, an ensemble of 51 members is created following the 2nd order exact re-sampling strategy of the SEIK Filter (Pham, 2001).
where N is the number of eigenvectors retained, A is a (N +1)×N matrix whose column vectors form an orthonormal basis perpendicular to the column vector with all N + 1 elements equal to one and ( ) k is the kth column of a N ×N random orthogonal matrix .This ensemble will have exactly a zero mean and its ensemble covariance is the covariance matrix B reduced to its 50 largest eigenvectors.
The tidal perturbations are added to the GETM boundary conditions and GETM is run for 40 days with each of those perturbed boundary values.The minimum length should be at least one string tide/neap tide cycle.Otherwise the errors in the S2 tidal signal could lead to modification of the M2 boundary conditions.This would result in a high RMS error when the model currents are to the validation of surface currents.
From those 51 model runs, the surface currents at the locations measured by the HF radar are extracted and will be compared to the HF radar observations.Also for every surface model grid point, the complex M2 tidal parameters of surface elevation are computed using the T TIDE package (Pawlowicz et al., 2002).These complex tidal parameters are interpolated at the grid of the EOT08a data.Once the optimized boundary values following Eq.( 22) are obtained, the model is rerun with the corrected boundary values for 60 days.

Sensitivity to the observational error covariance
In the present approach, the observational error covariances do not influence the ensemble simulation (unlike Ensemble Kalman Filters).Only at the analysis step, which is carried out only once, the observations are used.Different parameters of the observational error covariances can thus be tested without repeating the ensemble simulation which is by far the most CPU resource-intensive step.
In a first series of experiments, the expected error standard deviation of the EOT data S EOT is fixed at 0.01 m (the impact of this value will be discussed later) and 10 different values for S HF are used ranging from 0.002 m/s to 1 m/s.Each blue circle in Fig. 8 corresponds thus to a different analysis and a subsequent 60-day run of the model.The RMS error between the EOT M2 tidal parameter for elevation and the model (according to equation in appendix A) and the RMS error between the HF radar and the model surface currents are computed.The corresponding errors of the model run with unperturbed boundary values (hereafter called the free model run) are also included in Fig. 8 for reference (red line).
If the representative error of the HF radar currents is too small, then a degradation of the model results compared to EOT is observed.The HF radar data set covers indeed only a limited portion of the model domain.The correction of the boundary values is extrapolated from the information of the HF radar.For S HF > 0.12 m/s, an improvement for both data sets is obtained compared to the free model run.As a compromise between improvement relative to HF radar observations and EOT analysis, we choose here a value of S HF = 0.2 m/s for the following experiment.
In a second series of experiments, S HF is thus fixed to 0.2 m/s and S EOT is varied between 0.001 m and 0.5 m.As before, the analysis is repeated for those different values and a model run is performed for 60 days starting from 1 September 1991.As long as S EOT is smaller than 0.017 m, an improvement is observed compared to both data sets (Fig. 9).This justifies a posteriori the value of 0.01 m chosen previously.
www.ocean-sci.net/6/161/2010/Ocean Sci., 6, 161-178, 2010 Free run is in red and run with data assimilation in blue.The dotted line shows the adopted value for S HF .Free run is in red and run with data assimilation in blue.The dotted line shows the adopted value for S EOT .
In a last series of sensitivity experiments, we assess the benefit of assimilating only EOT data, and leaving the HF radar observations as an independent validation data set (Fig. 10).In this case, we notice that for all values of S EOT , the assimilation improves the results compared to the free model run for both data sets.However, as expected, the improvement relative to the HF radar data is smaller.In all following experiment, S HF and S EOT are set to 0.2 m/s and 0.01 m respectively.
Figure 11 shows the elevation correction from the assimilation scheme and the elevation M2 tidal amplitude and phase diagnosed from the assimilative run.Note that the scheme provides the correction over the whole domain but only the corrections at the boundary are actually used by the GETM   model.The assimilation increases the amplitude of the incoming tidal wave (in the south of the western boundary) and decreases its phase.

Error analysis
The amplitude and phase of the M2 tidal constituent are computed over the first 40 days of simulation (starting from 1 September 1991) by a tidal analysis (Fig. 12).The largest improvement obtained by the assimilation is the amelioration of the u-component phase relative to the free model.The assimilation did not damage the u-component amplitude which is close to the observations as it is the case for the free model run.The free model run underestimates the v-component amplitude in the northern part of the zone covered by the HF radar.This is improved by the assimilation, but the model v-component amplitude is still lower than the v-component amplitude obtained from the observations.The v-component phase is in general also closer to the observations in the model run with assimilation than in the free run.
A. Barth et al.: Ensemble smoother for optimizing tidal boundary conditions  To obtain an overall view of the impact of the assimilation, RMS maps of surface currents are computed between the observations and the free and assimilative model runs (Fig. 13).
where the subscripts o and m correspond to the observations and the model (free or with assimilation) respectively and N is the number of observations.It is important to note that the structure seen in this figure can be either due to errors in the model or in the HF radar currents.The higher errors near the line joining both antennas is probably due to the geometric dilution of precision.Since the signal strength decreases away from the antenna a higher error is also expected near the edges of the covered region, in particular near the western and southern edge of the covered zone.In the northeastern part we have probably a combination of this effect and model errors due to the complex topography.Indeed, these zones show a higher expected error standard deviation of the observations (left panel of Fig. 5).
In the interior of the covered zone, the free model run has an RMS error of about 0.2 m/s.This value is reduced to about 0.15 m/s in the model run with assimilation.Since the RMS error is computed directly based on the surface currents, it is the combination from several errors.The RMS error includes the errors in M2 tides generated by the boundary conditions (which the assimilation aims to reduce), but among others also errors in M2 tides generated by errors in the bathymetry and bottom roughness for example, errors due to other tidal constituents, errors in wind forcings and lack of resolution in general.
In an attempt to decompose the model errors we computed the RMS error only due M2 tides (RMS 2 M2 ) and the remaining errors sources (RMS 2 ): The contribution of the M2 tides RMS 2 M2 is computed by a tidal analysis of the observed and model velocity.The RMS error is then computed according to appendix A. This is equivalent of computing the RMS of the error projected onto the subspace spanned by the harmonic functions at the M2 tidal frequency.Such decomposition of the model error in a orthogonal vector space is a useful approach for model validation (e.g.Alvera-Azcárate et al., 2007).Figure 14 shows the RMS difference between observations and models only due to errors in the M2 tides.As expected, the relative error reduction is largest in this analysis: in the interior of the zone covered by the HF radar, the error is reduced from 0.14 m/s to about 0.08 m/s.To reduce the model error in the very shallow eastern part, a higher resolution would probably be required.
All RMS errors in Figs. 13 and 14 are computed based on the 40 days of HF radar velocity used in the assimilation.Those error analyses where repeated for the following 20 days of model simulation.The error reduction for this analysis is very similar to the one obtained by using the assimilated observations (Fig. 15).This indicates that the boundary conditions where not adjusted for a particular month and that they can also be used for simulating a different time period.However, the RMS error at the edges of the covered zone is higher when the model is compared to the data from 10 October 1991 to 30 October 1991.The fact that this is observed in both simulations (free and with assimilation) and also in the averaged expected error standard deviation of the observations indicate that the observations were inherently less accurate in the second time period (Fig. 5).

Tide gauge station
The model results are also compared to tide gauge data from Helgoland (54.18 • N, 7.88 • W) and from Cuxhaven (53.87 • N, 8.72 • W).The tide gauge station at Helgoland is covered by the HF radar observations but this is not the case for the tide gauge at Cuxhaven.For both sites, hourly sea level data from 21 January 2006 to 7 December 2006 are available.
Since the model time frame differs from the time period where sea level is available, a direct comparison of the sea level is not possible.However, the tidal amplitude and phase can be compared (after applying nodal corrections).Table 1 shows the M2 amplitude and phase computed by tidal analysis from the observations, the free model run and the model run with assimilation.At Helgoland, the phase difference is reduced from 14 • to −2 • and also the relative amplitude error is improved from 28% to 14%.At Cuxhaven, the assimilation reduced the phase error from −41 • to −28 • and the relative amplitude error from 30% to 21 %.Table 1 also contains the RMS error as computed by Eq. (A2) of Appendix A. Those RMS errors combine the contribution of the amplitude and phase improvements.The assimilation reduces the RMS error by a factor of 2 for Helgoland and by a factor of 1.4 for Cuxhaven.

Optimizing the model trajectory instead of tidal boundary conditions
Instead of optimizing the tidal boundary conditions one could directly try to estimate the best model trajectory (van Leeuwen, 2001;Hunt et al., 2004Hunt et al., , 2007;;Sakov et al., 2010).Whether one approach is preferred over the other depends on the application.In the present case, the amplitude and phase of the M2 tidal signal is a time invariant field.If tidal amplitude and phase at the boundary are corrected, the model can be rerun for any time period.This is not the case if the model trajectory is corrected.However, in the present approach one can easily try both methods without re-computing the ensemble members.
The vector x(k) represents the model trajectory (space and time) and the observation operator h(•) extracts the observed surface currents and elevation tidal parameters from the model trajectory.The rows of the matrix S are defined in a similar way as previously: The matrix containing the scaled ensemble anomalies of the observed part of the model trajectory E is the same as before.According to the analysis step of the Kalman filter, the optimal trajectory xa is given by:   The correction to the trajectory xa is thus a linear combination of the rows of S. The coefficients of this linear combination are the same as the ones obtained for optimizing the tidal boundary conditions Eq. ( 22) since: Therefore with almost no additional effort, one can compute the optimal trajectory xa .However, unlike the optimization of the boundary conditions, one cannot compute the model results obtained for a different time period and compare them to the corresponding HF radar observations.We choose thus to compare the model results to the tide gauge data to compare both approaches (as in Sect.5.3).By correcting the model trajectory, one obtains a RMS error 0.46410 m and 0.11976 m for Cuxhaven and Helgoland respectively.These results are very similar to the RMS error obtained by optimizing the tidal boundary condition (0.46431 m for Cuxhaven and 0.11990 m for Helgoland).One notices that the correction to the model state leads to slightly better results than correcting the boundary conditions but essentially both approaches provide virtually the same results.One could expect exactly the same results if the model dynamics would be linear.The fact that the RMS errors differ only on the order of tenths of millimeters shows a posteriori that the tidal propagation is mostly linear.The advantage however for correcting directly the tidal signal is that the new tidal parameters can be used for simulating different time periods.

Conclusions
A new ensemble generation scheme to create dynamically constrained perturbations is tested in a realistic assimilative model setup.Since tidal boundary conditions are determined by an ensemble assimilation scheme, the ensemble perturbations are required to satisfy approximately the harmonic shallow water equations.An ensemble based on this method was successfully used to improve M2 boundary conditions by assimilation of HF radar and tidal data derived from altimetry.
The feasibility of a non-sequential assimilation scheme to derive the optimal correction as a combination from ensemble perturbations is shown.Since the perturbations of uncertain forcings fields (here tidal boundary conditions) are analyzed by using all observations within the model integration period, the assimilation scheme acts similar as an Kalman Smoother.For a linear model and infinite ensemble, it would provide the same solution as 4D-Var but without requiring the formulation of an adjoint.The method aims to derive the optimal perturbation and not the optimal state.Therefore, the final solution is obtained by rerunning the model.The analyzed model solution satisfies thus the model equations exactly.No spurious, transient motion such as gravity waves, are generated during the model run with this procedure as it is often the case in sequential assimilation schemes.All quantities which are conserved by a free-running model (taking the fluxes through boundaries of the domain into account) will also be conserved by the analyzed solution.The applicability of this approach for different degrees of non-linearity and for uncertain time-dependent forcings remain to be shown.In the latter case, one could apply a timelocalization similar to the space localization used in reducedrank Kalman filters.
Besides the expected error reduction relative to the assimilated HF radar observations and the EOT08a data set (Empirical Ocean Tides), the improved boundary conditions also reduce the error relative to HF radar velocities set aside for validation and the M2 amplitude and phase of tide gauge observations at Helgoland and Cuxhaven.
Instead of optimizing the M2 tidal boundary conditions, one can with little effort also obtain the solution of the smoother problem where the trajectory is directly optimized (van Leeuwen, 2001;Hunt et al., 2004Hunt et al., , 2007;;Sakov et al., 2010).If the system is linear, both approaches should provide the same result.In the present case, both methods lead indeed to almost the same result indicating a posteriori that the non-linear effects in the M2 tidal propagation are indeed small.The main advantage however in optimizing the tidal parameters is that the correction can be used to simulate a different time period.
Several points can also be improved concerning the way the HF radar data are assimilated.Instead of working with the estimated u and v components of the surface currents, it would be preferable to directly assimilate the radial velocities.The u and v components of the surface currents can only be estimated where at least two radial measurements are available.Not all data can thus be used when the u and v components are assimilated.
Another future area of research is the handling of the Stokes drift which is included in the data but not in the model.For the analysis scheme it is thus seen as a "spurious" variability in the observations.The testing of various observation error variances and the comparison of the analyzed solution to the observations provide an indirect safeguard preventing to fit the Stokes drift by adjusting the tidal boundary conditions.However, an explicit approach to remove the Stokes drift from the data (or adding it to the model) is preferable.
Only errors due to uncertainties in the M2 tidal boundary conditions are considered in the manuscript.As a next step, this procedure could be extended to other tidal constituents.This would require a longer integration time period to distinguish constituents with similar frequency.However, the tidal signal is also affected by error bathymetry and friction.To further improve the representation of tides, those parameters can also be perturbed and their uncertainties reduced by data assimilation (see Mourre et al. (2004) for an application where errors in the bathymetry are taken into account).The proposed scheme is particularly well suited to optimize tidal boundary conditions since the tidal amplitude and phase are essentially time independent.Beyond the improvement of tides, it would be interesting to study if the scheme is also applicable for time dependent fields such as wind forcings and heat flux (by assimilating for example satellite SST).
The Source code of the Ensemble Perturbation Smoother is freely available at http://modb.oce.ulg.ac.be/mediawiki or from the author.The assimilation code runs on Matlab and GNU Octave.

Fig. 1 .
Fig.1.An example of a typical backscatter spectrum acquired during a previous experiment(Helgoland, German Bight, 2 December 1987, 21:36 UTC).The vertical red lines mark the frequency shift due to the phase velocity of the Bragg-resonant ocean waves traveling towards and away from the radar.The areas marked in blue indicate the part of the spectrum the noise level is calculated from, the blue horizontal line indicates the noise level found.

Fig. 2 .
Fig. 2.Temporal coverage of the HF radar zonal and meridional surface velocity observations: the number of observation grid points with valid data available is shown for each sample time.The Cartesian observation grid has 567 points, but due to land coverage and distance from the antennas, the maximum spatial coverage is 396 grid cells.

Fig. 3 .
Fig. 3. Spatial coverage of the HF radar zonal and meridional surface velocity observations: the number of samples available at each observation grid point is color-coded according to the colorbar.The entire data set comprises 8414 samples in time.Contour lines show depth in meters as used in the hydrodynamic model's bathymetry.The crosses show the location of HF radar antennas.

Fig. 4 .
Fig. 4. M2 amplitude (in m/s) and phase (in degrees) of zonal (u) and meridional (v) surface currents of the observations.

Fig. 5 .
Fig. 5.The square root of the averaged expected observational error variance (without the representation error) for the observations from 1 September 1991 to 10 October 1991 (assimilated; left panel) and from 10 October 1991 to 30 October 1991 (not assimilated; right panel).Units are m/s.

Fig. 6 .
Fig. 6.M2 amplitude (in m) and phase (in degrees) of EOT08a for the German Bight

Fig. 7 .
Fig. 7. M2 amplitude (in m/s) and phase (in degrees) of zonal (u) and meridional (v) surface currents of the free model run.

Fig. 8 .
Fig. 8. RMS error relative to EOT analysis (in m) and HF radar observations (in m/s) for S EOT = 0.01m and different values for S HF (in m/s).Free run is in red and run with data assimilation in blue.The dotted line shows the adopted value for S HF .

Fig. 9 .
Fig. 9. RMS error relative to EOT analysis (in m) and HF radar observations (in m/s) for S HF = 0.2 m/s and different values for S EOT (in m).Free run is in red and run with data assimilation in blue.The dotted line shows the adopted value for S EOT .

Fig. 10 .
Fig. 10.RMS error relative to EOT analysis (in m) and HF radar (in m/s) for different values for S EOT (in m) without assimilation of HF radar observations.Free run is in red and run with data assimilation in blue.

Fig. 11 .
Fig. 11.Amplitude and phase of the correction of the M2 tidal elevation (in m, left panel) and of the assimilative model run (right panel).The increment of the phase contour lines is 10.

Fig. 12 .
Fig. 12. M2 amplitude (in m/s) and phase (in degrees) of zonal (u) and meridional (v) surface currents of the model run with assimilation.

Fig. 13 .
Fig. 13.RMS difference (in m/s) between surface current observations and model results without (left panel) and with assimilation (right panel).

Fig. 14 .
Fig. 14.RMS difference (in m/s) between surface current observations due to the M2 tides and the corresponding model results without (left panel) and with assimilation (right panel).

Fig. 15 .
Fig. 15.RMS difference (in m/s) between surface current observation and model results without (left panel) and with assimilation (right panel) compared to independent data.

Table 1 .
Comparison with tide gauge observations.Amplitude and RMS are in m and phase in degrees.