Data assimilation of sea surface temperature and salinity using basin-scale reconstruction from empirical orthogonal functions: a feasibility study in the northeastern Baltic Sea

The tested data assimilation (DA) method based on EOF (Empirical Orthogonal Functions) reconstruction of observations decreased centred root-mean-square difference (RMSD) of surface temperature (SST) and salinity (SSS) in reference to observations in the NE Baltic Sea by 22 % and 34 %, respectively, compared to the control run without DA. The method is based on the covariance estimates from longterm model data. The amplitudes of the pre-calculated dominating EOF modes are estimated from point observations using least-squares optimization; the method builds the variables on a regular grid. The study used a large number of in situ FerryBox observations along four ship tracks from 1 May to 31 December 2015, and observations from research vessels. Within DA, observations were reconstructed as daily SST and SSS maps on the coarse grid with a resolution of 5× 10 arcmin by N and E (ca. 5 nautical miles) and subsequently were interpolated to the fine grid of the prognostic model with a resolution of 0.5× 1 arcmin by N and E (ca. 0.5 nautical miles). The fine-grid observational fields were used in the DA relaxation scheme with daily interval. DA with EOF reconstruction technique was found to be feasible for further implementation studies, since (1) the method that works on the large-scale patterns (mesoscale features are neglected by taking only the leading EOF modes) improves the high-resolution model performance by a comparable or even better degree than in the other published studies, and (2) the method is computationally effective.


Introduction
In the coastal oceans and marginal seas, basin-scale observation, modelling and forecasting of oceanographic and biogeochemical variables is a continuing challenge. As an example from the Baltic Sea, large-scale nutrient dynamics (Andersen et al., 2017;Savchuk, 2018) control the level of eutrophication and hypoxia, affected by nutrient loads and changing climate (Meier et al., 2019). Placke et al. (2018) have recently shown, by comparison of different models, that temperature is much better reproduced than salinity. A similar evaluation has been obtained earlier by Golbeck et al. (2015), based on 13 operational models used routinely in the Baltic and North seas.
Data assimilation (DA) is a key element to improve the model accuracy with respect to observations, both in the operational forecast and the reanalysis context (Martin et al., 2015;Buizza et al., 2018;Moore et al., 2019). DA methods are built upon dynamical models and they are based on some kind of minimization (minimum variance, variational cost function formulation etc.) of modelling errors (Carrassi et al., 2018), using estimated statistical characteristics of the studied variables. Most of the widespread methods (optimal interpolation, 3DVar, 4DVar, various options of the Kalman filter, including their ensemble formulations) use covariance as the basic statistical characteristic. Recent overviews on different DA applications in the Baltic Sea can be found in the papers by Liu and Fu (2018), Zujev and Elken (2018), Goodliff et al. (2019) and She et al. (2020). Whereas there are several results from Baltic Sea reanalysis studies available (Axell and Liu, 2016;Liu et al., 2017), the operational Baltic Sea forecasts within CMEMS (Copernicus Marine En-vironment Monitoring Service) do not presently include DA (Huess, 2020) and there is ongoing work to implement an automated DA system which would be robust, reliable and well validated.
Results of DA-based forecasting depend heavily on the spatio-temporal configuration of the observing system (Le-Traon et al., 2019). Unlike the regular weather observing networks, observation systems in marginal seas are rather fragmented, where areas and periods of dense sampling can be neighboured by large observation gaps. Therefore, special OSE (observing system experiment) studies have been initiated, to find optimal observation network configurations to achieve best skill of DA (Fuji et al., 2019). However, most of the observations of the Baltic Sea surface variables, not yet detectable by remote sensing (like salinity, nutrients etc.), stem from the FerryBox systems installed on board regularly cruising commercial passenger or cargo ships , and planning can be done only within the existing routes. Therefore, development of improved gap-filling techniques is a challenge and it would be highly beneficial for a region with sparse observations.
Recently, a novel method for EOF reconstruction of gridded sea surface temperature (SST) and salinity (SSS) fields, using the data from (mostly) irregular and (often) sparse observations, was presented and thoroughly tested in the NE Baltic Sea . The method relies on the estimate of covariance matrix from the long-term model data, which is decomposed into the full set of EOF modes. The mode values at observation points, together with the observed values, enable least-squares estimation of observational amplitudes. The method is able to follow on the regular grid the pointwise observed temporal changes of the mean state and of the major basin-scale gradients. The aim of the present study is to implement this statistical reconstruction technique into the data assimilation of the forecast model, and to study the feasibility of such an assimilation method.
The paper is organized as follows. In the section on data and methods, an overview of the sub-regional oceanographic background and a short model description are presented. Observational in situ data have been compiled from three sources, and they contain shipborne monitoring and Ferry-Box platforms. The reconstruction method is presented in detail, and the section ends with the description of the data assimilation method used. The results section starts with the presentation of experiments in order to find the optimized parameters for reconstruction of gridded fields. The rest of the section is devoted to the analysis of the results of data assimilation experiments, ending with the performance evaluation. Finally, discussion and conclusions are presented.
2 Data and methods

Study area and the circulation model
We have chosen the study area in the NE Baltic within 56. 9417-60.725 • N, 21.55-30.35 • E (Fig. 1), motivated by several Estonian national interests within the operational forecast of sea state and assessments of the marine environment. The region covers the Gulf of Finland, the Gulf of Riga and part of the Baltic Proper adjacent to these gulfs. The region is rather shallow: the mean and maximum depths are 26 and 62 m in the Gulf of Riga (Yurkovskis et al., 1993) and 37 and 123 m in the Gulf of Finland (Alenius et al., 1998), respectively.
The region lies in the temperate climatic zone. During the summer, SST exceeds usually 15 • C in July or August (Alenius et al., 1998), with highest values up to 25 • C observed in some years in the shallow coastal zones (Stramska and Białogrodzka, 2015). The warm upper layer of 10-20 m thickness is well mixed down to the thermocline or bottom, whichever of them is shallower. Occasionally, winddriven coastal upwelling processes disrupt this warm layer (Uiboupin and Laanemets, 2009). Nearly every winter, sea ice forms with variable extent and thickness; during severe winters, the Gulf of Finland and the Gulf of Riga are fully ice-covered (Jevrejeva et al., 2004). The region is impacted by large rivers: the Gulf of Finland and the Gulf of Riga together receive 34 % of the total freshwater discharge to the Baltic Sea as can be calculated from the data by Johansson (2017). As a result, there is an estuarine increase in SSS from east to west (Alenius et al., 1998;Yurkovskis et al., 1993), reaching 7-8 g kg −1 in the Baltic Proper (Kõuts and Omstedt, 1993). The Gulf of Finland has a free connection to the Baltic Proper without a sill or any other topographic restriction; therefore deeper more saline waters of the Baltic Proper penetrate into the Gulf of Finland and form an estuarine halocline (Liblik et al., 2013). A shallow sill with a depth of 15 m connects the Gulf of Riga with the Baltic Proper; therefore deep layers of the Gulf of Riga can receive only surface waters of the Baltic Proper (Lilover et al., 1998). The two gulfs, located in the NE Baltic, play an essential role in the dynamics of the whole Baltic Sea (Omstedt and Axell, 2003).
For the modelling, an Estonian sub-regional set-up ( Fig. 1) of the Baltic-wide HBM model was applied with a resolution of 0.5 × 1 arcmin by N and E (ca. 0.5 nautical miles) containing the entire Gulf of Finland, Gulf of Riga and NE portion of Baltic Proper (Lagemaa, 2012;Zujev and Elken, 2018). The model fields are three-dimensional having 455 × 529 × 39 grid cells (by latitude, longitude and depth correspondingly), with 750 088 wet points and 71 986 of them on the surface with a layer thickness of 3 m.
At the western open boundary, the data were taken from the Baltic-wide HBM model (Huess, 2020), operated by the Copernicus Marine Environment Monitoring Service (CMEMS, https://marine.copernicus.eu/, last access: 2 May  Rohde (1998). Location of our study area is shown in the insert by a red box. 2020). Atmospheric forcing was provided by the Estonian implementation of HIRLAM (Männik and Merilain, 2007). HBM uses the Arakawa C-grid and produces a forecast for 16 ocean variables including temperature, salinity, current speed and ice concentration. A detailed description of the HBM model and its validation can be found by Berg and Poulsen (2012); further analysis and evaluations are given by Golbeck et al. (2015), Hernandez et al. (2015), Tuomi et al. (2018), Huess (2020) and She et al. (2020). In particular, the CMEMS Quality Information Document (Golbeck et al., 2018) concludes that temperature forecast between the surface and about 100 m depth is one of the major strengths of the CMEMS-V4 product, below which the halocline deviations of forecast from observations increase. Regarding salinity, the values are slightly underestimated and the underestimation increases with depth.
The model set-up has been designed for operational forecasting. For computational reasons, it was decided to keep the operational 0.5 nautical mile grid resolution and to perform shorter feasibility experiments, instead of choosing larger grid steps and making longer experiments. The model is used routinely by the Estonian Weather Service (implemented by one of the authors, Priidik Lagemaa); SST is displayed on the web page https://ilmateenistus. ee/meri/mereprognoosid/merevee-temperatuur/ (last access: 8 May 2020) and SSS is shown on the page https:// ilmateenistus.ee/meri/mereprognoosid/soolsus/ (last access: 8 May 2020). In compliance and for comparability rea-sons with the recent study by Zujev and Elken (2018), we chose the study period from 1 May to 31 December 2015, to be used for the DA experiments. The model experiments were conducted in the framework of operational forecasting, where the forcing files were updated daily. There were no gaps during the study period in meteorological data nor in open-boundary conditions nor any other input.

Observational data
All available SST and SSS data from three sources were compiled:

The Copernicus Marine Environment Monitoring
Service (CMEMS, https://marine.copernicus.eu/, last access: 8 May 2020) contains among other data sources the quality-checked data set of Baltic in situ near-real-time multiparameter observations: https://resources.marine.copernicus.eu/?option=com_ csw&view=details&product_id=INSITU_BAL_NRT_ OBSERVATIONS_013_032 (last access: 24 October 2019). This data set, accessible through free-of-charge registration, contains in our study region data from several FerryBox systems (automatic observations made from ferries and other ships crossing the sea areas on a regular basis). There are also a number of coastal stations, but they record mainly sea level and water temperature, whereas salinity observations are missing; therefore we are not using coastal stations. In our study 3. National monitoring database KESE (https://kese.envir. ee/kese/viewProgramNew.action?uid=473556, last access: 11 December 2020, search for "mereseire") contains detailed records of all variables observed under the national environmental monitoring programme. The data that were downloaded on 18 October 2019 contain different data records for every environmental variable. Except for a few cases, these data are also found in the ICES/HELCOM database. Duplicate entries were avoided from the composite data set by averaging over small time and space intervals.
The largest amount of synchronous SST and SSS data originates from the FerryBox systems, accessed through the CMEMS (Table 1). There were about 330 000 initial observation points from FerryBox, distributed over a few ship lanes ( Fig. 2a) with a resolution of a few hundred metres and from daily to a few days interval. The analysed water is strongly mixed in the surface layer by the moving ship. Typical observation depth may be considered 5 m, although variations between the ships and due to the variable shipload exist (Lips et al., 2008;Karlson et al., 2016). There were also about 370 observations from shipborne monitoring stations. Distribution of the amounts of observations in selected temporal and longitude intervals (Fig. 2b)  Two sets of compressed (averaged) FerryBox data were created for further data analysis, containing mean observed values, coordinates and observation times over the selected intervals. Firstly, for the model validation study, daily mean spatial averages over a fine grid with a resolution of 0.5 × 1 arcmin by N and E (as in the used model) cells were created, resulting in about 110 000 values. Secondly, for the EOF pattern analysis and reconstruction of SST and SSS fields, daily mean spatial averages over the coarse grid (5 × 10 arcmin by N and E, about 5 nautical miles) were created. The main benefit of the coarse grid is to save computational costs while keeping the large-scale patterns well resolved (see Sect. 2.4 for more details on the advantages and disadvantages of using the coarse grid). In this procedure, the initial observations were compressed on the coarse grid by roughly 25 times yielding about 13 000 average values for SST and SSS. Within the temporal averaging, it was chosen not to apply any diurnal cycle correction, and all the observations at different hours were averaged to the closest midnight.
For the interpretation of model and DA results, meteorological data were taken from the model forcing fields. For the occasional comparison, CMEMS remote sensing SST Level 4 data were retrieved from the service portfolio https://resources.marine.copernicus.eu/?option=com_ csw&view=details&product_id=SST_BAL_SST_L4_NRT_ OBSERVATIONS_010_007_b (last access: 8 May 2020).

Reconstruction of gridded data from point observations
For the purpose of DA, we chose to use EOF reconstruction of large-scale SST and SSS fields, using the orthogonal patterns from models following the detailed outline by , and subsequent relaxation of gridded observations within the model time-stepping. In order to correct the modelled basin-scale patterns towards observations, the spatiotemporal distribution of in situ data was too irregular to use standard interpolation and filtering algorithms like the Cressman method or optimal interpolation with approximated covariance (see an example from the same region by Zujev and Elken, 2018). In this section, we summarize the well-known EOF decomposition and present general features of EOF reconstruction as a problem when the number of observations is less than the number of EOF modes (equal to the number of model grid cells). The basic option of EOF reconstruction uses at each DA time step time-fixed amplitudes , encountering the observations spanning over a certain time frame (which can be longer than DA time step) that are transferred to the fixed times by some interpolation or filtering/averaging procedure. The amplitudes are estimated using time-fixed observations by minimizing the root-mean-square-difference between the observations and the EOF reconstruction. The amplitudes at adjacent time moments are not directly related, but in the case of longer temporal filters when observations overlap on different DA time steps, indirect relations between adjacent amplitudes become evident.
Elken at al. (2019) also proposed an advanced method with time-dependent amplitudes. Within this approach, the amplitudes and their time derivatives are estimated together with observations within a selected time interval, in order to find least squares between the observations and EOF reconstruction in the observational framework.
The main steps of EOF reconstruction are as follows. During the standard EOF decomposition, the orthonormal eigenvector matrix E (contains the spatial eigenvectors e k ) is found from the eigenvalue problem BE = E, where B is M × M spatial covariance matrix, calculated from the M × N spatio-temporal matrix X of the "values of interest" by time averaging, and is a diagonal matrix that contains eigenvalues λ k . The data set X contains time slices x i that are spatial state vectors at time i. Although in the present study we use the data set X selection as 2D sub-sets of individual oceanographic fields, applications towards multivariate analysis and/or extending over the 3D physical domain are straightforward. While E is non-dimensional, the dimensional amplitudes (or in other words, factors) of EOF decomposition are found byã i = E T x i , and the decomposition is reconstructed to the "values of interest" by x i = Eã i . Here we have used the notationã i = a i , where a i is the nondimensional amplitude. The eigenvalues λ k present the variance (energy) of the eigenvectors e k over the whole period, and the sum of all eigenvalues is equal to σ 2 , the variance of X. EOF decomposition offers the possibility to keep only the most energetic modes in the reconstruction and truncate the higher modes in E. When L most energetic modes are taken into account in the sorted list of eigenvalues and vectors, the sum from λ 1 to λ L presents the explained variance, and the contribution of truncated modes forms the error variance. If white noise with a variance ε 2 is present in the decomposed data due to sub-grid-scale processes and/or sampling errors, the noise variance appears only as additive to the diagonal elements of the covariance matrix. The eigenvalue problem becomes B + ε 2 I E = E, where I is a unity matrix. Patterns of spatial modes remain unaffected by adding the white noise, but the eigenvalues and energy share of the modes decrease according to a factor 1 + ε 2 /σ 2 −1 . When the sum of eigenvalues of the included dominating modes is less than σ 2 − ε 2 , the contribution of noise is effectively smoothed. During EOF reconstruction from observations y i , the number of observations K is assumedly smaller than the number of points M in the spatial eigenvectors e k that are determined on the model grid and evaluated from the model statistics. For the comparison with observations, the model data x i are transformed to the observation points by the observation operator H i by the formula H ixi = H i Eâ i , wherê a i denotes the "observational" amplitudes. Further, theâ i values should follow least-square minimization of reconstruction error in relation to observations y i − H i Eâ i 96 M. Zujev et al.: A feasibility study in the NE Baltic Sea min. The expressions to find observational amplitudes and reconstructed fields arê In the reconstruction by Eq. (1), the critical point is a possibility of spurious amplitudes based on few and unfavourably spaced observation points. Experiments with pseudo-observations  revealed that the values ofâ i of dominating L modes should match the limits derived from statistics ofã i , whereas higher modes with outlying amplitudes should be neglected.
Most of the oceanographic observations are not made at the same time. It may take several days or even weeks to cover a larger sea area with shipborne monitoring. When P observations y p are taken at different times p, then construct an observation operatorĤ p that allows pointwise comparison of y p andĤ p x i converted from gridded values at specified time i. Assume that within the short time span the amplitudes depend linearly on time and introduceb p =â i +d i ·δt p , whereâ i is the time-fixed amplitude, d i is the rate of change vector, and δt p = t p − t i is the difference between the observation and reference times. The function to be minimized regarding reconstruction errors is , which for fixed time i yields a system of 2L linear equations obtained from ∂Q/∂â l = 0, ∂Q/∂d l = 0, l = 1. . .L: (2) Here the vector of unknowns combines the amplitudes and their rates of change z = â 1 . . .â L , d 1 . . .d L . Instead of the full set of EOF mode values, as would be used during standard decomposition, we take the modified/interpolated mode values at observation points; then f We note that when all observations have the same time stamp and δt p = 0, Eq. (2) is reduced to Eq. (1).
Time-dependent reconstruction allows the reference time and length of time interval to be selected. As with the timefixed reconstruction, the highest "usable" mode is determined by checking the amplitude values with statistical limits. The method also allows estimation of amplitudes and reconstruction only by backward observational data. This feature makes the method useful in operational forecasts, where only past observations can be taken into account for drawing the present nowcast maps.

Method for data assimilation
Many DA techniques use (irregular) point observations of a variable ψ as the input source. In our approach, gridded maps ψ o are used; they are optimized by EOF reconstruction as described in Sect. 2.3. Therefore, in the continuous equivalent, DA is performed by Newtonian relaxation (e.g. Holland and Malanotte-Rizzoli, 1989): a discrete form of which has been applied for DA, for example, using gridded climate data (Moore and Reason, 1993) or using optimally interpolated daily satellite-based SST data (Ravichandran et al., 2013). Equation (3) is then written for DA time step t in two stages as where ψ f is the raw forecast field calculated from the previous analysis field ψ a-1 using only the model operator F without DA during this time step, and ψ a is the new analysis field. Equation (3) contains adjustable relaxation time τ that is transformed in Eq. (4) to non-dimensional α = t/τ . This is the main DA calibration parameter, since extensive use of covariance statistics, including the effects of observation errors, has been included in the estimation of gridded reconstruction of point observations. Newtonian relaxation of gridded observations, applied during the model run at DA time steps is also named "analysis nudging" (e.g. Stauffer and Seaman, 1990), which has had recent meteorological applications (Bullock et al., 2018). In practical calculations, SST and SSS observational data were reconstructed on the coarser grid with a resolution of 5 × 10 arcmin by N and E (ca. 5 nautical miles) and interpolated or extrapolated by bilinear procedure to the finer model grid with a resolution of 0.5 × 1 arcmin by N and E (ca. 0.5 nautical miles). Such a simple transition of data from a coarse to a finer grid includes smoothing, since ψ o lacks the details that are present on the finer grid. We have tested that the effect of added smoothing is smaller than the physical diffusion. In our study area, generation of meso-and smallscale features is of high intensity; therefore relaxation to the smooth observation fields does not apparently damp the finegrid variability. The approach of using two grids with different resolutions is justified by irregular distribution of observations; reliable estimation is possible only for large-scale patterns of SST and SSS fields. The computationally more efficient coarser grid resolves these patterns with enough details.
The above DA method is computationally efficient. The EOF modes are calculated prior to DA cycles. For each DA time step, only one system of linear equations of rank of the number of EOF modes (about 3-6) has to be solved for the entire grid. The coefficients of the matrix are found by summation of the products of EOF mode values over the observation points (Eq. 2). For comparison, optimal interpolation requires solving the system of linear equations of rank of the number of observation points (about 100) for each grid cell (about 1000), with a single inverse matrix calculated for the time step.
The model performance with respect to observations was evaluated over those grid cells -time span pairs when observations were available. Since observations covered only a small part of the study domain, DA results were also compared with the control run without DA, but then it is possible to only analyse the changes due to DA, without evidence of possible improvement. Standard statistical characteristics were calculated for individual fields such as mean and standard deviation, and in the case of differences (for example, relative to observations) bias, RMSD (centred rootmean-square difference that equals to the standard deviation of difference field) and the Pearson correlation.

Covariance, modes and reconstruction tests
The EOF modes were calculated on the coarse grid (5 × 10 arcmin by N and E) on the basis of space-averaged results from the fine-grid (0.5 × 1 arcmin by N and E) model, running from 1 July 2010 to 30 June 2015 . This analysis revealed that mean distributions of modelled SST and SSS, serving as the basis for calculation of deviations in the variability studies, were close to the climatological maps calculated on the basis of observations (Janssen et al., 1999). The highest temporal variability was found in the shallow coastal areas for SST, whereas the largest SSS variations were revealed near the larger river mouths and in the NE area of the Gulf of Finland. While temporal changes strongly dominate in the variability of SST, spatial changes prevail in SSS variability.
Calculated SST and SSS covariance matrices have significant spreading of individual values over pairs of points, especially for the dominating gravest modes where big covariance values may occur over large distances. Covariance of residual fields (sum of higher EOF modes) has a decay scale of about 30 km with increasing space lag, both for SST and SSS. The first, most energetic EOF modes have nearly "flat" patterns without sign change (energy share 97.6 % for SST and 36.2 % for SSS); their amplitudes are dominated by a seasonal signal. A space-dependent mean biharmonic seasonal cycle was not removed from the model time series prior to the analysis, since special experiments revealed only a small effect of seasonality suppression on EOF mode patterns. The second EOF mode of SST (1.3 %) presents dif-ferential heating and cooling in shallow areas, compared to the deeper offshore waters. Transverse anomaly stripes near northern or southern coasts, like those due to coherent upwelling and downwelling in the region, were evident in the second SSS mode pattern (16.9 %) and third SST mode pattern (0.31 %). There is also a pattern of SSS changes in the freshwater spreading pathway near the northern coast of the Gulf of Finland (third SSS mode, 7.1 %) and longitudinal SST changes in the east-west direction (fourth SST mode, 0.14 %).
The data set used in the present DA study (Fig. 2) is rather irregular, compared to the reconstruction experiments by . Therefore, we revisit the covariance issues and perform additional reconstruction tests, before finding in the next subsection the best options for the automatic reconstruction procedure. Spatial interrelation of observed values at a specific point to the values in the rest of the region is found from the extract of the spatial covariance matrix, which can be shown as a map. One example of SSS covariance with a frequently sampled HELCOM monitoring station BMP F3 is shown in Fig. 3. The covariance of three dominating EOF modes (Fig. 3b) comprises most of the unfiltered data covariance (Fig. 3a) at large distances. High covariance locations have clear basin-scale geographical explanations: under the similar weather and seasonal forcing, which is spatially nearly uniform, SSS changes in distant river influence areas are closely interlinked. Correlation (not shown) may exceed 0.4 at distances greater than 500 km; therefore, assumptions of fast decay of correlation with space lag (like using the Gaussian covariance approximation), adopted in offshore areas with negligible coastal influence, are not valid. Covariance of residuals to the largescale variations are presented by higher EOF modes (Fig. 3c). Such smaller-scale variations have nearly Gaussian structure, with elliptical anisotropy stretched along the axis of the basins similar to the results by Høyer and She (2007): spatial scales in Fig. 3c are 30 and 15 km along the main axis and perpendicular to the axis, respectively. Similar regularitiesphysically explained high covariance at large distances, localized covariance patterns for the higher EOF modes -were found for other points of reference, both for SSS and SST fields.
The EOF reconstruction method relies on the full covariance matrix, without any approximation. Covariance is further treated using EOF modes. For the reconstruction procedure, we keep the lowest EOF modes without any approximation, and covariance from higher modes as shown in Fig. 3c is not taken into account. The large-scale features of the EOF reconstruction and associated DA exclude the possibility of creating spurious "bullseye" patterns around observation points, which may happen for instance during unfavourable selection of optimal interpolation parameters. Subsequently, our DA method handles the large-scale features and excludes the possibility to assimilate smaller-scale features, which can be described by the higher modes. A full covariance matrix can be implemented in optimal interpolation as well. While the EOF method needs to limit the number of included modes, smoothing in such a way smaller-scale variability and observational errors, optimal interpolation needs to include observational error variance ("nugget effect" in terms of Kriging method, equivalent to optimal interpolation); otherwise the system of underlying linear equations may become close to singular and the result may become unrealistically spiky. In some examples (not shown), EOF reconstruction and optimal interpolation based on full covariance produced similar results, but these relations need further studies. When observed values were close to the model-computed climatological background, visual similarity was caused mainly by the dominance of spatial gradients of mean SSS over the spatio-temporal variability. Optimal interpolation with Gaussian approximation to the covariance produced realistic results in the neighbourhood of observation points but gave unrealistic patterns and values in the distant SW extrapolation area.

Finding the parameters for reconstruction of gridded observation fields
Multiple checks performed on our data set suggested that only the three leading modes were included in the EOF reconstruction. In order to find the best options for reconstruction, experiments were made with different intervals (time window) t R around the reference time t i ; including the observations within time window from t i − t R /2 to t i + t R /2. The results were evaluated to fulfil the following goals: a small RMSD between the observed values and the reconstructed fields; a small number of gaps in the reconstructed time series; a low number or missing presence of "spikes" and/or "jumps" in the time series.
Two basic options for temporal handling of the reconstruction procedures were tested: a. application of procedure by Eq. (1) of time-fixed amplitudes; time average of observations was taken for each grid cell, time adopted in each grid cell as constant reference time; b. full application of the procedure by Eq.
(2) of timedependent amplitudes; all the daily mean observations (average was taken also over coordinates and time) were kept separate for each coarse grid cell where the observations existed.
In addition, the procedure by Eq.
(2) was tested with an option with a time average of observations in each grid cell, and with selection of observations closest to the reference time. These experiments provided more spikes and 70 % higher RMSD than the basic options (a) and (b) and they were neglected from further consideration. As a first step in all the experiments with variable time windows, the EOF amplitudes of the mode k were checked for the limit â i,k < 2 √ λ k = 2σ (ã k ) , where σ denotes standard deviation. DA data for the days with higher amplitudes were left blank since these reconstruction results most frequently became unrealistic. In addition, when the number of observations was less than six, reconstruction was not performed and the DA step using Eq. (4) was skipped.
The time windows t R for experiments (a) and (b) were selected to be 10, 20 and 30 d.  have found that the correlation timescales (e-folding drop, correlation value 0.368) of EOF SST amplitudes were 65 d for the seasonal first (overall heating/cooling) and second (faster heating/cooling in shallow coastal areas) modes, and 15 d for the third "upwelling" mode. Timescales of the SSS modes were 65 d for the second and third modes, representing the largescale gradients, and 110 d for the first mode describing longterm variations of mean salinity.
Methods of time-fixed (a) and time-dependent (b) reconstructions revealed similar statistical results during the study period in 2015, whereas RMSD between observed and reconstructed values of (a) was by 5 % larger than that of (b). By increasing the time window, RMSD of reconstruction slightly increases due to the stronger smoothing. The smoothing effect can be seen from the reconstruction examples given in Fig. 4. It should be noted that the reconstruction is designed to yield the best approximation to the observations over the entire region; therefore, it does not need to present the local best fit at individual points.
A network of observations, available during the study period, appeared favourable for the reconstruction, although observations were missing in the southern part of the Gulf of Riga and eastern part of the Gulf of Finland. With a time window of 30 d, there were no reconstruction gaps identified during the study period, determined for both of the methods by the above-described amplitude limit criteria. Smaller time windows yielded some gaps in 2015. During the longer period from 2010-2018, gaps were found in most of the years (except our study period), whereas shorter time windows result in more reconstruction gaps. Detailed comparison of the time-fixed (a) and time-dependent (b) methods revealed that time-fixed reconstruction might create spurious "jumps" when there is a gap in observations which has a length close to the time window. In that case, a backward average is taken before the gap and forward average after the gap, which may result in "jumpy" results. Time-dependent reconstruction, which also accounts for the temporal changes within the time window, handled such situations more smoothly.

Data assimilation experiments
We have used a two-scale DA approach (see detailed explanation in Sect. 2.4), where observations were reconstructed on the coarse grid. Results were interpolated into the fine grid of the model and were subsequently used for relaxing the fine-scale model results towards basin-scale observational patterns. More specifically, gridded observational SST and SSS data were pre-calculated each day using the timedependent EOF reconstruction method with a time window t R = 30 d as presented in Sect. 3.1. Reconstructed SST and SSS fields were interpolated bilinearly to the fine 0.5 nautical mile grid and used for relaxing the model results towards observational counterparts, based on Eqs. (3)-(4) with t = 1 d. Two basic experiments were conducted, with relaxation time 10 d (weight of observations 0.1, experiment code DA01) and with a relaxation time of 5 d (weight 0.2, experiment code DA02). In addition, a variety of short-term trials was performed in a preparatory phase (results not graphically presented) which led to the two basic experiments. Comparison data were coded as FR for the control run without DA, and FB for observed FerryBox data.

Example from the beginning of August
There was an interesting oceanographic situation in the beginning of August, when a moderate but extensive upwelling SST pattern at the northern coasts of the basins (Fig. 5), with some effects on SSS (Fig. 6), was combined with fast heating of the thin (6-9 m) surface layer (Fig. 7). Since the middle of July, moderate winds with speeds from 4 to 6 m s −1 , which had a westerly zonal component (favouring upwelling at the northern coasts of the basins), were blowing above the Gulf of Finland. After 3 August 2015 (the maps in Figs. 5 and 6 are taken on this date), wind ceased and air temperatures increased by 10 August across the study area up to 25-27 • C in the Gulf of Finland and up to 31 • C in the southern Gulf of Riga, creating a thin layer of warm surface water. Heating of surface waters was favoured by high night-time air temperatures, higher than SST. Vertical profiles (not shown) in the Gulf of Finland revealed a deep thermocline at 40 m depth near the southern (downwelling) coast and a shallower thermocline near the northern coast; the warm surface water column near Tallinn was 2 to 3 times thicker than near Helsinki. From the end of July to 10 August, warming resulted in an increase in SST (Fig. 7) near Tallinn from 16.5 to 18.5 • C and near Helsinki from 14.5 to 18 • C.
The SST maps presented in Fig. 5 include control run, reconstructed in situ observations, one experiment with DA (the other experiment yielded similar results) and satellite observations. When warm waters with SST above 17 • C dominated the study area, all the maps revealed moderate upwelling near the northern coasts of the basins. However, the minimum temperatures and the spatial extent of the colder waters were different. The warmest "cold" waters were observed on satellite images. While satellites measure SST of a thin surface layer, FerryBox and models acquire temperature over a much thicker layer. It is known that in the Gulf of Finland satellite and FerryBox can have similar SST values in cases of winds stronger than 5 m s −1 (Uiboupin and Laanemets, 2015); at smaller wind speeds the SST bias can be 1-3 • C in reference to FerryBox observations. Within these accuracy limitations, satellite observations presented in Fig. 5d confirm the model patterns to some extent. The control run (Fig. 5a) was characterized by SST contrasts that are too high, compared to the satellite data ( Fig. 5d; for the data source see Sect. 2.2). From the earlier study by Zujev and Elken (2018), it is known that the free model without DA forecasts faster heating and cooling of shallow coastal areas and slower heat dynamics in offshore areas. Data assimilation (Fig. 5c), made using the reconstructed FerryBox data (Fig. 5b), reduced discrepancies with satellite observation. The major large-scale differences between the satellite data (Fig. 5d) and the best DA02 (Fig. 5c) can be outlined as follows: (1) the colder upwelling water extended on the satellite image further to the east, (2) warmer waters were found on the satellite images in the southern Gulf of Riga, near the Daugava river and in the shallow areas between the Estonian islands, and (3) in the Gulf of Riga, a strip of colder waters was modelled along the western coast, while satellite observations revealed warmer waters near this coast.
There were also numerous mesoscale features evident on SST (Fig. 5) and SSS (Fig. 6) maps, like colder upwelling filaments along the northern coasts of the Gulf of Finland and the Gulf of Riga, and decaying anticyclonic warm-core eddies near the southern coast of the Gulf of Finland. The Irbe Front (Lilover et al., 1998;Raudsepp and Elken, 1999), formed by the salinity difference between the Gulf of Riga and the Baltic Proper, was found by the SSS maps in the outward position, stretching from the strait towards the open sea. This salinity structure was also repeated in the SST patterns; the satellite observations confirmed the predicted outward position during the taken snapshot. The model predicted that in the Gulf of Riga the Daugava river waters were spreading by narrow coastal strips of lower salinity in both the NE and NW directions (Fig. 6).

Time series in the areas of dense observations
Locations with dense observations allow us to validate the model and visually evaluate assimilation quality. We compared SST and SSS data of the control run (FR) and DA options DA01 and DA02 with FerryBox data (FB) at two points near Tallinn and Helsinki (Fig. 7). While SST followed the seasonal cycle, with weather-dependent deviations, then SSS behaviour was more irregular. In the given variation scales of SST and SSS (16 • C and 2 g kg −1 respectively), all the compared SST data sources were more similar to each other than that of SSS. Still, most of the time the assimilation curve (DA02) was closer to the FerryBox observations than the control run, for both SST and SSS.
Warm conditions in the beginning of August (Sect. 3.2.1) are clearly visible on SST time series (Fig. 7a, c). Comparing the values near Tallinn and Helsinki, the southern part of the Gulf of Finland was roughly 2 • C warmer than the northern part, whereas the northern part had an unstable day-to-day pattern, possibly due to the fluctuations of the upwelling pattern. This is consistent with the spatial maps given in Fig. 5. Near the southern coast, an upwelling event occurred in September, reducing SST during a few days by nearly 4 • C (Fig. 7a). A larger SST drop during the southern coast upwelling (at easterly winds), compared to the northern coast upwelling (at westerly winds of the same magnitude), is explained by the steeper topography slopes in the southern part of the Gulf of Finland ). This upwelling event was properly resolved by all the data sets, with DA02 being closest to observations. In general, a free model without DA expected warming at a lower rate during summer and was more precise in autumn, while both assimilation experiments properly corrected the SST and SSS values. However, in some cases, assimilated temperature was somewhat higher than observed and modelled SST.
Assimilation resulted in one major SSS improvement in early summer when the model predicted upwelling with salinity near Helsinki that is too high. Nevertheless, in some cases DA made minor corrections at one of the locations, ignoring observations and sticking to the control run (e.g. late July to early August near Tallinn, and the middle of October near Helsinki). When the model overshoots at both locations, DA properly corrects temperature and salinity values. This implies that DA of surface observations tends to correct the mean values better than the cross-gulf gradients, for which 3D circulation (presently not assimilated) has a significant impact.
In the salinity time series, a "freshwater event" with reduced salinity was observed in the Gulf of Finland at the end of September and beginning of October. In the daily SSS data (Fig. 7b, d) the event was spiky, possibly due to the mesoscale features not assimilated in the present study: without DA, the eddies tend to have a random phase, and the spikes in the time series of different model options and observations do not need to be coherent. However, in the weekly averaged data (not shown) the mesoscale activity was suppressed and the fresh event appeared simultaneously in all the data within the central and western part of the Gulf of Finland.
Increasing assimilation weight in Eq. (4) two times did not make assimilation results two times closer to the observations. As can be seen from Fig. 7, the results of assimilation experiments DA01 and DA02, with relaxation times of 10 and 5 d respectively, were not placed between the free run and the observations proportionally to the corresponding weights 0.1 and 0.2. They diverged as the study region experienced a temperature drop or daily trend change. Both options of assimilated SST could either coincide for a long time or go in parallel, but DA02 was systematically closer to the FerryBox observations. Salinity fluctuations had larger amplitudes in the free run without assimilation, but both DA options, with a "thumb" rule -the bigger the weight, the bigger the change -had properly corrected them. Still, in December DA01 showed better results, being closer to the FerryBox salinity than assimilation DA02.

Spatio-temporal dynamics
We have chosen to compare assimilation with the best results (DA02) to the control run without data assimilation (FR) and track the continuous time-latitude changes of SST and SSS (Fig. 8) in two sub-basins -the Gulf of Finland and Gulf of Riga -along the coast-to-coast transects given in Fig. 2a. Using DA, temperature was corrected approximately by 1-2 • C, and salinity by less than 1 g kg −1 . Major systematic change (in the Gulf of Finland this was validated as improvement; see Sect. 3.2.4 for further details) was seen near the coasts and in the spring and autumn periods, while summer temperatures underwent minor corrections. Salinity corrections had a more uniform distribution and smooth drifting pattern -DA consistently increased SSS values with time in both of the sub-basins.
Data assimilation had increased SST in the Gulf of Finland in open waters during the warming period and in late autumn all across the gulf and had decreased in the coastal areas during the warming period, whereas near the northern coast this decrease continued until September. In the Gulf of Riga, the SST increase dominated throughout the study period, but it was interrupted occasionally by basin-wide events when DA had decreased the temperature compared to the results from FR. The largest corrections of both SST and SSS were evident in the coastal waters. Salinity was increased by DA in most of the cases in the Gulf of Finland, except for May-July near Tallinn. The largest increase in SSS occurred in November and December, when control run results dropped compared to the earlier period.
Some unusual basin-wide events can be found on the difference charts in Fig. 9. For example, abrupt warming of the surface around 10 August 2015 (Sect. 3.2.1) was correctly predicted by the free run model (Fig. 7c), but it was oversmoothed by the data assimilation. A similar line in December on both charts denotes the occurrence of fronts of cold and saline water due to strong winds and storms.
As there are not enough observations available in the Gulf of Riga for validation, we cannot definitely say whether DA improved the situation in the region and to what extent.

Evaluation of DA-based forecast performance
Ocean model performance (e.g. Stow et al., 2009;Golbeck et al., 2015;Placke et al., 2018) is usually evaluated by the differences between the observations and the model results, transferred to the times and locations of observations so that they can be directly compared. The overall mean difference (over time and space) is termed bias and the standard deviation of differences at all the observation points is denoted as RMSD (centred root-mean-square difference). The forecast skill is usually non-dimensional, with the RMSD of the studied option (in our case, DA) scaled to reference data (FR in our case) as skill = function of [RMSD(DA,FB) / RMSD(FR,FB)]. The present ocean model has a fine resolution of about 0.5 nautical miles (930 m) (Sect. 2.1); therefore for comparison with observations we used a simplified approach and took av- Data from the DA experiments DA01 and DA02 were compared to the same compressed observational FB data as the data from the control run without assimilation (FR). Hernandez et al. (2015), who reviewed the problems of performance evaluations of operational ocean models, noted that most available observations are used to adjust models and reduce analysis errors. Therefore, a widespread approach is withholding part of the data set for statistical quantification of errors. In our study, the option of withholding the observations was performed: an evaluation was made of how much the DA result will change if DA is performed using 50 % of the available data (Gregg et al., 2009). The present implementation of EOF DA used about 13 000 observational averages over a coarse grid of about 5 nautical miles. The reconstruction procedure by Eqs. (1)-(2) has no direct connection to the ongoing modelling (although it includes sta-tistical results from longer model runs), and the fields of ψ o in Eqs. (3)-(4) are the only link where observations enter the DA process. The experiments which took every second available observation "box" into account (this resulted in a mean sampling interval along ship tracks about 20 km instead of 10 km) revealed that performing DA during the study period with a reduced data set (6.5 000 averaged observation data instead of 13 000) changed RMSD of SST by only 1 % and of SSS by 2 %, whereas the RMSD values were 0.05 • C for SST and 0.027 g kg −1 for SSS. An evaluation was made over the full time span and domain using 182 000 coarse grid cells; correlation between the data sets was higher than 0.999. We have also checked reconstruction results with FerryBox data only, excluding the data from shipborne monitoring stations. Compared with the full data set, the largest (but still minor) differences with RMSD of SSS up to 0.03 g kg −1 were found in the Gulf of Riga and the eastern Gulf of Finland, where FB data were missing. Consequently, for our large-scale approach DA results are robust to the reasonable variation of data amount, and we used FB data for reference in the performance evaluations.
Evaluated forecast performance metrics are presented in Table 2. Only those fine-grid points which had respective value of FerryBox observations on the same day were used for metrics calculation. Wet points of the model without corresponding observation value were left out from the procedure.
The statistical properties presented in Table 2 reflect that DA improves the model performance significantly: RMSD of SST was reduced by 22 % and SSS by 34 %, compared to the control run. From DA01 to DA02, a slight improvement of DA performance was observed; therefore we adopted DA02 as the major result. The spatial pattern of RMSD change between the DA and FR (Fig. 9) reveals that larger reduction rates (up to 50 %), for both SST and SSS, were found in the observation-covered areas in the Gulf of Finland. Overly cold waters produced by FR near the northern coast of the Gulf of Finland were effectively corrected by DA (see also Fig. 5); therefore highest improvement percentage scores were detected in this region. Near the western open boundary, nonassimilated SST and SSS values of the larger model were advected into the area, and therefore RMSD reduction was small, or even negative for SSS.
The applied EOF DA method does not assimilate mesoscale variability. Applying the weekly average statistics like Zujev and Elken (2018) further reduced RMSD by 13 % for SST and 9 % for SSS, compared to the daily data in Table 2. Weekly statistics suppresses the mesoscale variability and reveals a better match between the DA and the observations. DA decreased the bias, especially for SSS. At the same time, the correlation of SSS between DA and observations increased considerably. We may conclude that DA made major improvements in the forecasting of SSS. Still, the forecast RMSD in reference to the observations is 62 % of the observed standard deviations, which suggests that there may be further room for improvement. Modelling of SST is already more accurate than SSS without DA: RMSD of the control run (FR) makes 18 % of the standard deviation of observations for SST and 94 % for SSS.

Discussion
The Baltic Sea is considered as one of the most studied marine areas in the world (e.g. Andersen et al., 2017). However, the large observational data sets are distributed unevenly. If we divide our study area into 744 eddy-averaging grid cells of 5 × 10 arcmin by N and E, then during the study period 330 000 FerryBox observations covered only 18 % of the sea region. Shipborne monitoring added 8 % more coverage of the area, but with a much smaller frequency of sampling. Having in mind that the ocean models tend to deviate in the NE Baltic from the observations not only by constant bias but also for large-scale and longer-term responses, the introduction of non-local, region-wide data assimilation is of high importance.
It is interesting to consider how our statistical evaluations of model and DA performance, given in Table 2, compare with other Baltic Sea studies. For remote sensing versus in situ reference, Kozlov et al. (2014) have found RMSD 1.31 • C in the Curonian Lagoon. Uiboupin and Laanemets (2015) have estimated RMSD of various satellite products to FerryBox in the Gulf of Finland from 0.29 to 0.98 • C. Our control run gave RMSD of 0.72 • C. Golbeck et al. (2015) compared SST from 13 models with satellite data and found yearly RMSD for SST of 0.65-0.87 • C in the Baltic Sea. They found a larger relative spread of SSS ensemble members than of SST: deviations in the Gulf of Finland between the models were up to nearly 1 g kg −1 , while the average SSS is only about 4 g kg −1 . Unfortunately, there were not enough validating observations for SSS available. Fu et al. (2011) found even larger RMSD for SST for the control run, 1.0 • C, based on satellite observations. They also used DA with ensemble optimal interpolation and found that DA reduced RMSD between the forecasts and observations by 25 % for SST and 34 % for SSS. With our simpler and less computationally demanding EOF DA technique, similar RMSD reductions have been obtained (Sect. 3.2.4) compared to earlier studies.
We have developed and tested an EOF-based relaxation technique where the large-scale observed fields to be assimilated are pre-calculated independently from the ongoing model. From sparse observations, it is possible to estimate the amplitudes of only the gravest, large-scale EOF modes. The EOF DA method handles large-scale features over the sea basin(s), like change of mean SST, SSS and their gradients, including differential heating in coastal and offshore areas, major patterns from upwelling, and spreading of river discharge. The method can work well with irregular data but cannot resolve mesoscale features in the areas of dense observations, because the EOF amplitudes of higher modes get noisy, according to our experiments. Optimal interpolation, Figure 9. Improvement of RMSD of DA compared to that of FR, both taken in reference to 110 000 FerryBox observations. Comparison is made for 20 × 20 grid cells (about 10 nautical miles) for SST (a) and SSS (b) over the whole study period. Legend codes: few points -less than 100 observations in a box, small values -absolute percentage change less than 10 %, negative -DA RMSD growth more than 10 %, positive -DA improvement (RMSD reduction) from 10 % to 30 %, large positive -improvement more than 30 %. successive corrections and similar methods usually assume localized covariance and/or radius of influence (e.g. Axell and Liu, 2016); they work well in resolving mesoscale features in dense sampling areas, but regions of rare observations remain unaffected by DA. For the mesoscale range, in our study area there are only satellite observations of surface variables available. They were omitted from our study, since salinity as a variable of primary interest can be presently only be determined in situ in the Baltic. It is possible to implement on top of EOF DA more traditional localized DA methods to assimilate mesoscale data when and where such data are available. Studies on using EOF DA for handling large-scale data are also ongoing in the UK Met Office by Daniel Lea (Haines, 2018).
We have tested the EOF-based DA in a centred time window of 30 d, based mainly on available FerryBox data during the study period. As shown by reconstruction experiments by , the time-dependent method can also work with backward observations as if it occurs during operational forecasts. When more observations become available, for example from new automated buoy stations, Argo floats and gliders, the time window can be shortened. A full covariance matrix estimated from the model results is the backbone of the EOF DA method. Prior and/or complementary to implementation of the method into operational practice, detailed covariance studies using results from multiple models could be useful, as well as additional reconstruction and DA studies using more data sources over longer periods.
The EOF DA method has some practical advantages. Firstly, for assimilation of basin-scale patterns, it can be implemented on a coarse grid, and therefore it has small computational effort compared to the localized methods (like optimal interpolation etc.) that should be usually implemented on the model resolution, i.e. on the fine grid. Secondly, inter-mediate results are in the form of maps that are easily understandable and can be checked visually or taught to be analysed by artificial intelligence. For optimizing the observational data needs, the concept of OSEs (observing system experiments), which check various data configurations for DA performance, is high on the agenda. Since the quality of DA and forecasting are primarily determined by the quality of EOF reconstruction (when extensive mesoscale observations are not available), then it would be possible to save a significant amount of computing power and perform most of the experiments using EOF basis vectors.
There are obvious possible extensions of the EOF DA method to other variables and layers: improvement of stratification modelling; extension to biogeochemical models; and DA of oxygen, nitrogen and phosphorus. Applicability depends on how well the model reproduces the studied fields and their covariance as well as how much variance is explained by the major EOF modes. There are a number of questions that may be addressed, such as the following: What is the minimal amount of observations needed to produce decent results? What areas are reconstructed with higher accuracy with given observation design, nearshore, offshore, open basins? What areas are most problematic to reconstruct, complicated coastline, straits and channels, semi-enclosed basins, regions of river influence? Are there some specific locations that can be used as a proxy for larger regions? Is it possible to measure SST/SSS just at these points in order to give enough input for successful reconstruction?

Conclusions
The present study was aimed to implement EOF-based statistical reconstruction technique into the data assimilation of the forecast model, and to study the feasibility of such assimilation method. Gridded EOF modes were determined from the 5 yr long model results. "Observational" EOF amplitudes were found each day to minimize the RMSD between the reconstructed and observed values at the observation points, using a time-dependent technique where both the amplitudes and their time rate of change were searched for the best fit. In this procedure, a time window of 30 d was selected that ensured acceptable SST and SSS reconstruction patterns by three leading EOF modes throughout the whole study period from 1 May to 31 December 2015. The study used about 330 000 FerryBox observations along four ship tracks from 1 May to 31 December 2015, and 370 observations from research vessels. Statistically gridded observations were assimilated into the model daily by the relaxation techniques, using restoring times of 5 and 10 d.
The tested EOF-based data assimilation (DA) method decreased RMSD of surface temperature (SST) and salinity (SSS) in the NE Baltic Sea by 22 % and 34 %, respectively, compared to the control run without DA. Using the observation-estimated amplitudes of the pre-calculated gravest model-based EOF modes, the method is able to follow on the regular grid the pointwise observed temporal changes of the mean state and of the major basin-scale gradients. DA with EOF reconstruction technique was found to be feasible for further implementation studies, since (1) the method that works on the large-scale patterns (mesoscale features are neglected by taking only the leading EOF modes) improves the high-resolution model performance by comparable or even better degree than in the other published studies, and (2) the method is computationally effective.