Assessment of the three-dimensional temperature and salinity observational networks in the Baltic Sea and North Sea

Spatially averaged correlations are presented in 1.5° × 1.5° bins for the North Sea and Baltic Sea region. The averaged correlations are computed from proxy ocean data generated by the operational forecast model of the Danish Meteorological Institute (DMI). The spatial distribution of the averaged correlations is shown to reflect the combined influence of local atmospheric forcing, complex topography, coastlines, and boundary and bottom effects. Comparisons with satellite SST data demonstrate that the proxy ocean data reproduce realistic correlations at the surface. Based on the spatial bin-averaged correlations, a general correlation model is assumed to approximate the spatial and temporal correlation structure. Parameters of the correlation model are obtained on the standard Levitus levels. The correlation model is found not to be the typical Gaussian-type function. For instance, for temperature the exponent in the longitudinal direction varies from 0.75 at the surface to 1.33 at 250 m depth. For salinity, the temporal correlation can be approximated by an exponential function. Two complementary quality indicators, the effective coverage rate and the "explained" variance, are defined based on these correlation models. The two indicators identify the "influence area" of the information content of a given observational network and the relative importance of observations at different locations. Using these indicators, the 3-D temperature and salinity observational networks in the Baltic Sea and North Sea are assessed for the period 2004–2006. The surface level is found to be more effectively covered by the existing networks than the deep waters. In addition, the Belt Sea and the Baltic Proper show good coverage for both temperature and salinity. However, more observations are required in the Norwegian Trench and the Kattegat. In the vertical, the two indicators show smaller values from 50 m to 125 m in this region, indicating the need for more observations there.

Correspondence to: W. Fu (wfu@dmi.dk)


Introduction
Ocean observational networks provide the most reliable knowledge about the real ocean state. Such information also plays an essential role in ocean forecasting systems. However, the quality of the observations is largely affected by two components: data quality assurance and the sampling scheme. In particular, the total information content depends mainly on the sampling of an observational network. A suitably designed observational network provides insight into specific oceanic phenomena, while an ill-designed network is not cost-effective. Over the past decade, the number of available oceanic observations has increased enormously, from instruments and platforms such as XBTs, CTDs, satellites, Argo floats and moored stations. It should be noted that designing an observational network requires some existing knowledge, which in turn relies on observations (possibly from another network). The problem is therefore two-way and reciprocal. To obtain a better design, objective assessments of existing observational networks are clearly necessary. As a result, observational network assessment has been an active research topic for decades (McIntosh, 1987; Barth and Wunsch, 1990; She, 1996; Kelly, 1997; Kuo et al., 1998; Hackert et al., 1998; Bishop et al., 2001; Hirschi et al., 2003; Schiller et al., 2004; Oke and Schiller, 2007).
The Baltic Sea and North Sea are two adjacent marginal seas connected through the Danish transition waters. Water exchange through the Danish transition waters is strongly constrained by shallow sills and narrow channels, as well as by hydrodynamic features such as fronts and mixing. The processes governing this flow are not completely understood, although some investigations have been conducted in the context of observing experiments such as BALTEX (the Baltic Sea Experiment) (Raschke et al., 2001) and the Baltic Sea Patchiness Experiment, PEX (ICES, 1989, 1992, 1994). In recent years, ongoing operational modeling activities (Buch et al., 2006) have also benefited from these observational networks. However, most of the observational networks in this region are based on ad hoc designs and deployed for national interests. Such networks are therefore probably neither sufficient nor efficient to support operational ocean forecasting in the Baltic Sea and North Sea. Meanwhile, the lack of observations remains a major obstacle to the progress of operational forecasting systems. In several EU projects, such as ODON (Optimal Design of Observational Networks) and PAPA (Programme for a Baltic network to assess and upgrade an oPerational observing and forecAsting system), meta data and historical temperature and salinity data were collected for the period 2004–2006. For this period, relatively complete observations are available in the Baltic Sea and North Sea. These data make an objective assessment in three-dimensional space practically feasible.

Published by Copernicus Publications on behalf of the European Geosciences Union.
The assessment of observational networks serves as a benchmark for further design and can be addressed in different ways. In practice, statistical and dynamical methods are the most popular. Among the dynamical methods, Observing System Experiments (OSEs) and Observing System Simulation Experiments (OSSEs) have been widely used in the assessment and design of ocean observing systems (McIntosh, 1987; Hackert et al., 1998; Hirschi et al., 2003; Oke and Schiller, 2007; Sakov and Oke, 2008). The advantage of OSEs/OSSEs is that model dynamics are used together with the observations to reconstruct the ocean state, which reduces the data requirements on the observational networks. However, OSEs/OSSEs have two disadvantages: first, experiments with complex models are very time-consuming and computationally costly; second, the modeling and assimilation methods may have large impacts on the OSEs/OSSEs, so that different combinations of models and assimilation schemes can yield different assessment results for the same observational network. For instance, Fu et al. (2009) assimilated sea level data with both 3DVAR and EnOI in a tropical Pacific model. Both schemes reduced the root mean square errors (RMSE), but their effects differed clearly in some areas. Although the assimilated data were the same, the resulting improvements and their spatial distribution differed because of the different configurations of the two methods.
In addition to OSEs/OSSEs, ensemble-based methods have been used in recent years to assess and design observational networks (e.g., Bishop et al., 2001). These methods are based on ensemble square root filter theory (Tippett et al., 2003). One advantage is their ability to handle large systems in which explicit manipulation of the background error covariance is not possible. Optimal observational network design with ensemble-based methods considers the problem of "targeted observations" or adaptive sampling, aimed at improving the model forecast at a given time (Langland, 2005; Khare and Anderson, 2006; Le Hénaff et al., 2009). However, these methods also suffer from high computational cost. At the same time, generating the ensemble poses another formidable problem. Apart from the above methods, a number of quantitative methods with different indicators have been used to assess observational networks, e.g., effective coverage (She, 1996), noise-to-signal ratio (Smith and Meyers, 1996) and sampling error (She and Nakamoto, 1996). These methods are computationally efficient, and their results are independent of the model and data assimilation scheme. Their disadvantage is that the constraints of the physical model are excluded. Two indicators, the effective coverage and the explained variance, are used for the assessments in this study. The effective coverage (She et al., 2007) identifies the gaps and the area effectively covered by a given observational network and gives a clear picture of its performance. The "explained variance", on the other hand, reveals the relative importance of observations through their ability to reconstruct the time series at a given position. Moreover, these indicators are computationally efficient compared with OSEs/OSSEs. The statistics are obtained from the daily output of a regional operational forecast model, i.e., the model output is regarded as "proxy ocean data". The rationale is threefold: first, the proxy data have a spatial and temporal coverage that observations cannot currently reach; second, this may be the only feasible approach given the lack of observations, particularly in deep layers; finally, continuing developments of complex models allow a better representation of model physics and variability.
This paper is organized as follows. The data used in this study are described in Sect. 2; they include the meta data to be assessed, the proxy ocean data and the satellite data used for comparison. The characteristic scale analysis of the proxy ocean data is presented in Sect. 3. The definition of the effective coverage is given in Sect. 4, together with the assessment of the existing observational network. In Sect. 5, results obtained with the explained variance are presented in a similar way. Finally, concluding remarks are given in Sect. 6.

Data
In this section, the different kinds of data are briefly described. The meta data represent the existing temperature and salinity observational networks and include the sampling location, frequency and platform information. We assume that these observations are all that could be obtained during the given period. The proxy ocean data are regarded as the best surrogate of the "real ocean state" because they are produced by combining a state-of-the-art model with some observations. The characteristic parameters used in the assessment are derived from the proxy ocean data. In addition, satellite data are used to verify the proxy ocean data.

Ocean Sci., 7, 75–90, 2011, www.ocean-sci.net/7/75/2011/

Meta data
Meta data and historical data of the temperature and salinity observational networks were collected in the Baltic Sea and North Sea for the 3-year period 2004–2006. The observations come from different platforms, including CTDs, VOS, XBTs, moored arrays, Argo floats, gliders and observing stations. For the Baltic Sea and North Sea, the Danish Meteorological Institute (DMI) established a meta dataset during the EU ODON and PAPA projects, which is used in this study. The main groups of meta data are shown in Fig. 1 for temperature and salinity. Most of the observations come from buoys and research vessels. A large number of CTD stations are found in the Baltic Sea and the Danish transition waters. It should be noted that the meta dataset is incomplete in a wide area of the North Sea, and this must be taken into account when interpreting the assessment results. In total, the numbers of sampling locations for CTDs, moored buoys and stations are 808, 72 and 114, respectively. The Baltic Sea and the North Sea are treated as a whole, but our focus is on the Baltic Sea, where the meta dataset is relatively complete.

Proxy ocean data
The proxy ocean data in the Baltic Sea and North Sea cover the area 48  (Larsen et al., 2007). The proxy data were then extracted from a hindcast run for the three-year period 2004–2006 and interpolated vertically onto the standard Levitus levels. There are 14 standard levels, at depths of 0, 10, 20, 30, 50, 75, 100, 125, 150, 200, 250, 300, 400 and 500 m. Spline interpolation is used to transform from the model levels to the Levitus levels. DMI BSHcmod is a hydrostatic, free-surface, two-way dynamically nested model, originally developed as BSHcmod at the Bundesamt für Seeschifffahrt und Hydrographie (BSH) (Kleine, 1994). BSHcmod has been running operationally at BSH since 1994 (Dick et al., 2001)    barotropic surge model provides surge boundary conditions to a 3-D North Sea-Baltic Sea model, which is dynamically two-way nested (Berg, 2003; Barth et al., 2005) to a 3-D model (on a regular 1 by 1.6 grid) for the narrow transition waters in the Danish straits, covering the Kattegat from Skagen in the north and the Arkona Basin to the island of Bornholm in the east. An extended classical k-omega turbulence model (Wilcox, 1988) for buoyancy-affected geophysical flows (Umlauf et al., 2003) is used, with a new set of coefficients developed at DMI to obtain consistency. Different algebraic stability functions are applied to the vertical diffusivities of momentum, heat and salt (Canuto et al., 2002). To account for short-wave radiation into the subsurface layers, a parameterization of penetrating insolation suited to the Baltic Sea area was implemented (Meier, 2001). At the surface, the model is forced by hourly meteorological fields (10 m winds, 2 m air temperature, mean sea level pressure, relative humidity and cloud cover) from DMI's operational numerical weather prediction (NWP) model DMI-HIRLAM. Along the North Sea and English Channel boundaries, the model is forced by tides from satellite altimetry observations and by surges from the North-east Atlantic barotropic surge model. At the open lateral boundaries, climatological temperature and salinity fields are used in a sponge layer.

Satellite SST data
The satellite SST data used in this study are a merged product of observations from up to 10 different satellite sensors, such as AVHRR (on the NOAA satellites), MODIS and AMSR-E. Only nighttime SST observations are used in the interpolation because these are more representative of the temperature in the upper meters of the water column. Gaps in the observations due to clouds are filled using a three-dimensional optimal interpolation technique (Høyer and She, 2007). The interpolation scheme uses locally derived statistics and provides the "best possible" estimate of the SST, assuming stationary statistics. The mean error of the gridded SSTs is about 0.5–0.7 °C. The SST data cover the domain from 5° W to 30° E and from 47° N to 67° N, with a spatial and temporal resolution of about 3.3 km × 3.3 km × 1 day.

Proxy ocean analysis
In this section, the statistical analysis of the proxy ocean data is presented. First, the importance of removing the annual and semi-annual cycles from the temperature is demonstrated. Spatial correlations averaged in small bins are then computed across the model domain, and the features revealed by the bin-averaged correlations are described. Finally, correlation models are determined on the standard Levitus levels, and the parameters used in the assessment are derived.

Annual and semiannual harmonics
The Baltic Sea and North Sea exhibit spatial and temporal variability over a wide spectral range, from seasonal to decadal. Investigations of sea level (Plag and Tsimplis, 1999; Chen and Omstedt, 2005) show strong semi-annual and annual cycles due to the zonal wind and vorticity. In addition, changes in wind and sea level lead to changes in the currents. For example, the Norwegian Coastal Current has a pronounced annual cycle, with a substantial increase during summer. In the 3-year proxy ocean data, we also find strong annual and semi-annual signals in the temperature field. Figure 2 presents the standard deviation of SST from the original data and from the data with the annual and semi-annual harmonics removed, together with the ratio of the total variance accounted for by these harmonics. In the original data, the temporal variations are very strong in both the Baltic Sea and the North Sea. The Baltic Sea is characterized by large annual and semi-annual cycles, and the maximum standard deviation reaches 8 °C. In the northern North Sea, the standard deviation is relatively small but still larger than 3 °C. After removal of the annual and semi-annual cycles, however, the standard deviations are reduced to about 1–2 °C in many parts of the Baltic Sea and North Sea. The ratio shows that the annual and semi-annual signals account for about 3/4 of the total variance of the SST field, demonstrating that these harmonics dominate in this region. Moreover, the spatial and temporal scales of the annual and semi-annual harmonics are of the order of hundreds of kilometers and several months, and these signals can be resolved by both the ocean model and the existing observational networks. Consequently, these harmonics must be removed so that the statistical analysis robustly captures the signals at other scales. Similarly to the SST, Fig. 3 presents the standard deviations of temperature at 75 m depth. Compared with the SST, one noticeable difference is that the annual and semi-annual variations are weaker, accounting for only about 30 % of the total variance. This is expected because the impact of surface atmospheric forcing declines quickly below the thermocline, so the variations at this depth are less affected by synoptic atmospheric processes. For the salinity field, the annual and semi-annual harmonics are not removed because they have only a small effect on the total variance (figure not shown). In this study, the main focus is on the temperature anomalies. In practice, we first fit the annual and semi-annual harmonics to the time series at each model grid point and reconstruct the corresponding annual and semi-annual time series. The reconstructed time series are then subtracted from the original ones to obtain the anomalies.
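The harmonic removal described above can be sketched as an ordinary least-squares fit at a single grid point. This is a minimal illustration, assuming daily sampling, a 365.25-day year and a constant mean fitted alongside the two harmonics; the function name `remove_annual_semiannual` is hypothetical and not part of the DMI processing chain. It also returns the fraction of variance explained by the fitted cycle, analogous to the ratio mapped in Fig. 2.

```python
import numpy as np

def remove_annual_semiannual(t_days, series, year=365.25):
    """Least-squares fit of a mean plus annual and semi-annual harmonics.

    Returns (anomaly, cycle, explained_fraction), where explained_fraction
    is the share of the total variance accounted for by the fitted cycle.
    """
    t = np.asarray(t_days, dtype=float)
    y = np.asarray(series, dtype=float)
    w = 2.0 * np.pi / year  # annual angular frequency (rad/day)
    # Design matrix columns: mean, annual cos/sin, semi-annual cos/sin
    G = np.column_stack([np.ones_like(t),
                         np.cos(w * t), np.sin(w * t),
                         np.cos(2 * w * t), np.sin(2 * w * t)])
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    cycle = G @ coef          # reconstructed seasonal time series
    anomaly = y - cycle       # what remains for the correlation analysis
    explained = 1.0 - anomaly.var() / y.var()
    return anomaly, cycle, explained

# Synthetic 3-year daily SST-like series (illustrative values only)
t = np.arange(3 * 365)
w = 2 * np.pi / 365.25
sst = 10 + 6 * np.cos(w * t - 3.0) + 1.5 * np.sin(2 * w * t)
rng = np.random.default_rng(0)
noisy = sst + rng.normal(0.0, 1.0, t.size)
anom, cycle, frac = remove_annual_semiannual(t, noisy)
print(f"variance explained by harmonics: {frac:.2f}")
```

Because the mean is fitted together with the harmonics, the anomalies have zero mean; this does not affect the correlations computed from them.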

Bin-averaged correlations
The average depth of the Baltic Sea is about 54 m. The southern North Sea is also shallow, mostly less than 50 m deep. The water masses and flows in the Baltic Sea and North Sea are strongly influenced by the complex topography, coastlines and bathymetry, and in shallow water even by the bottom. The correlations between a point and its surrounding points reflect very local features, i.e. the combined effect of local current advection, bathymetry, coastlines, etc. To present a general picture of the local correlations, we calculate averaged correlations in small bins defined by geographic position. One advantage of this treatment is a reduced computational cost. In addition, it gives a visual picture of the flow-dependent features across different bins. In this study, the bin size is empirically set to about 1.5° × 1.5° in the longitudinal and latitudinal directions. For each point inside a given bin, we calculate its correlations with neighboring points for lags of less than ±120 km in both directions. Two factors are considered in selecting the bin size. First, the bin size is a compromise between computational cost and physical interpretation; the computational cost increases greatly if the bin size is too small. Second, previous calculations using satellite SST (Høyer and She, 2007) reveal spatial scales of the order of a hundred kilometers in longitude and latitude. For points near the coast, only points in water are included. The correlation coefficients of all point pairs with the same lags in a given bin are then averaged. Finally, the averaged correlations are displayed as a function of lag in each bin, where the central point denotes pairs with zero spatial lag (the correlation of a point with itself is 1.0). As expected, the correlations decline outwards from the center point in every spatial bin. The distribution of the averaged correlations for temperature at the surface, 30 m and 75 m is presented in Fig. 4. These three levels are representative of the surface and intermediate waters. For a concise description, we define the part of a bin where the averaged correlation is greater than 0.7 as the "High Correlation Area" (HCA).
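A brute-force sketch of the bin-averaging procedure follows. It assumes the anomalies sit on a regular grid, so that lags are counted in grid steps (the ±120 km window then corresponds to a hypothetical `max_lag` in grid units), and that land points are marked with NaN; all names are illustrative. For every water point in a bin, the Pearson correlation with each water neighbour at lag (dy, dx) is accumulated and then averaged over all pairs that share that lag.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation of two 1-D time series (NaN if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return np.nan if denom == 0 else float((a * b).sum() / denom)

def bin_averaged_correlation(field, iy_slice, ix_slice, max_lag):
    """Average lagged correlations over all water points in one bin.

    field: array (nt, ny, nx) of anomalies, NaN on land.
    Returns an array (2*max_lag+1, 2*max_lag+1) indexed [dy+max_lag, dx+max_lag];
    lags with no valid pairs are NaN.
    """
    nt, ny, nx = field.shape
    size = 2 * max_lag + 1
    total = np.zeros((size, size))
    count = np.zeros((size, size))
    water = ~np.isnan(field).any(axis=0)  # water if never NaN in time
    for j in range(ny)[iy_slice]:
        for i in range(nx)[ix_slice]:
            if not water[j, i]:
                continue
            for dj in range(-max_lag, max_lag + 1):
                for di in range(-max_lag, max_lag + 1):
                    jj, ii = j + dj, i + di
                    # Partner must lie inside the domain and in water
                    if 0 <= jj < ny and 0 <= ii < nx and water[jj, ii]:
                        r = pearson(field[:, j, i], field[:, jj, ii])
                        if not np.isnan(r):
                            total[dj + max_lag, di + max_lag] += r
                            count[dj + max_lag, di + max_lag] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count

# Synthetic anomalies: a shared large-scale signal plus local noise
rng = np.random.default_rng(1)
nt, ny, nx = 300, 10, 10
signal = rng.normal(size=(nt, 1, 1))
field = signal + 0.5 * rng.normal(size=(nt, ny, nx))
field[:, :, 0] = np.nan  # a land column on the western edge
C = bin_averaged_correlation(field, slice(2, 6), slice(2, 6), max_lag=2)
```

The centre element of `C` is 1 by construction. A production version would vectorize the inner loops, whose cost scales with (points per bin) × (number of lags) × (time steps).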
For SST, the averaged correlations differ markedly between bins. Large correlations appear in the Baltic Sea and the central North Sea, where the HCAs occupy almost the whole bin, implying large characteristic correlation scales in these regions. However, correlations decline rapidly near the Norwegian Trench, the English Channel and the Gulf of Finland, where they are strongly modulated by the coastline and topography. In the English Channel and the Skagerrak, the rotation of the HCA axes corresponds well with the current system. For example, the Channel water flows northeastward along the coastline, merging with the Continental coastal water and the Jutland coastal water (Turrell, 1992), and then flows northward into the Skagerrak. This feature can be clearly identified from the HCAs in different bins. The rotation of the HCA axes in the Baltic Sea also agrees well with the local flows. Unlike at the surface, the HCAs at 30 m show a general decline almost throughout the model domain. Still, the correlations from the English Channel to the coastal region exhibit flow-steered features similar to those at the surface. The HCAs are large in the central North Sea but much smaller elsewhere, partly owing to the reduced effect of atmospheric forcing with depth. At 75 m, however, the HCAs show no general decrease compared with 30 m.
Two factors should be taken into account when interpreting the distribution of HCAs at different levels. The first is the model's "boundary effect": the model's open boundary conditions tend to increase the averaged correlations. This is especially clear for SST near the open boundaries of the North Sea. The second is the "bottom effect", which should be noted when analyzing the results in subsurface waters. The correlation calculation stops when a grid point becomes land, so a subsurface bin may be smaller than 1.5° × 1.5°, which may give a high bin-averaged correlation. On the other hand, the "bottom effect" can also decrease the averaged correlation. The HCAs in the subsurface waters are affected by all these factors. The correlations and the axis rotation could also be useful for data assimilation studies in this region of complex topography.
Figure 5 presents the spatially averaged correlations calculated from the satellite SST data and the differences in the total mean correlation of each bin between the proxy and satellite SST data. Compared with the satellite data, the proxy ocean data reproduce quite realistic results in the Baltic Sea, where most of the HCAs and their axis tilts agree well with those of the satellite data. This can also be seen from the total mean correlation in each bin in Fig. 5b: the differences in mean correlation are small in this region. Discrepancies between the satellite and proxy ocean data may arise from errors in the satellite observations. The white-noise part of the satellite observational errors lowers the satellite correlations at very small spatial lags compared with the model-derived correlations. In addition, errors related to atmospheric effects have very large scales and enhance the satellite correlations at very large lags. An indication of these effects can be seen in Fig. 5c.
The results are comparable with the observations in the regions near the Kattegat and the inner Danish waters. In particular, both datasets show that the axis of the averaged correlation is rotated along the Norwegian Coastal Current. For the North Sea, the results from the two datasets show some differences. Near the boundary areas (the English Channel and the northern model boundary), the HCAs are much larger than in the satellite observations. This is primarily because restoring boundary conditions are employed in the model, as discussed above. Nevertheless, the axis rotation agrees well with that of the satellite data in many bins of these areas. At the location (5.35° E, 59° N), the correlations with varying lags are compared in the meridional and zonal directions, and the two datasets show similar trends. In some regions the correlations do not reflect the expected residual flow pattern, e.g. in the Gulf of Finland, where the residual flow is east–west whereas the correlations are more north–south oriented. This north–south correlation pattern is found in both the satellite and proxy data and may reflect that we are only looking at fluctuations around a mean: if a residual flow is not associated with temperature variations, it will not appear in the correlations.
The averaged correlations in each bin for salinity are calculated with the same definition as for temperature. Results at the surface, 30 m and 75 m are shown in Fig. 6. The surface layer of the Baltic Sea is occupied by low-salinity water due to river runoff and surplus precipitation. The source of salt in the Baltic Sea is the inflow of saline water through the Danish Sounds, which is essential for maintaining the vertical stratification. For surface salinity, the HCAs are clearly much smaller than those of temperature at the same level. Large HCAs appear in the Danish transition waters and the area near the English Channel. The orientation of the HCA axes can be explained by the local currents, as discussed above. For salinity, the HCAs show more pronounced flow-steered features across the model domain. At 30 m, the bin-averaged correlations show a distribution similar to that at the surface. In some areas, such as the English Channel and the Bothnian Bay, the HCAs are larger. In the Bothnian Bay, the similarity between the surface and 30 m can partly be explained by the weak stratification. Similar features can be seen at 75 m in most parts of the North Sea. The HCAs are large in many parts of the Baltic Sea, such as the Baltic Proper and the Bothnian Sea. One reason is that saline water from the Kattegat can reside for a long time in the deep layers of these regions; the "bottom effect" in the calculation can also contribute. It is also interesting that the axes of the HCAs keep the same orientation in many bins at different depths, reflecting that currents in this semi-enclosed sea are largely steered by the topography.

Correlation models
The choice of correlation function is of great importance. An ideal correlation function should represent the flow-steered features revealed in the bins described above. Traditionally, the forecast error covariance in data assimilation is assumed to be homogeneous, isotropic and stationary (Bartello and Mitchell, 1992; Daley, 1993; Chen and Wang, 1999). Such assumptions are considered invalid for the coastal ocean, especially where the topography is very complex. The averaged correlations in the above figures clearly show that the correlation distribution is strongly influenced by the local coastline, bathymetry and topography. In the meridional and zonal directions, particularly in the Baltic Proper, the Gulf of Finland and the Norwegian Trench, the correlation axes of SST tend to be strongly rotated, i.e. the correlations are not isotropic. Therefore, the correlation structure cannot be fitted well by a single isotropic function. The definition of the correlation model is critical for our assessment because it is the foundation of the assessment criteria. The correlation model is also beneficial for assimilation schemes such as 3DVAR and Optimum Interpolation (OI), where the forecast error covariance is usually approximated by an isotropic Gaussian-type function.
In this study, we represent the spatial covariance with a covariance model defined in the longitudinal, latitudinal and temporal directions. Thus, we can only reproduce correlations whose major axis is aligned with the x or y axis; the axis-rotation effect is excluded. Ideally, the spatial and temporal correlations could be estimated from the observed anomalies at every grid point. In practical applications, however, a correlation model is often fitted to the empirical correlation estimates within a small domain. Several covariance models have been used in oceanography and meteorology (see e.g. Thiebaux and Pedder, 1987; Leeuwenburgh, 2001). In this paper, the correlation model is assumed to take the general form

$$\rho(\Delta x, \Delta y, \Delta t) = \exp\left(-a\,|\Delta x|^{\alpha} - b\,|\Delta y|^{\beta} - c\,|\Delta t|^{\gamma}\right), \qquad (1)$$

where Δx, Δy and Δt are the longitudinal, meridional and temporal lags, respectively, and (a, b, c) and (α, β, γ) are parameters to be determined. We further assume that the parameters can be determined separately for the longitudinal, latitudinal and temporal correlations. The spatial correlation parameters are determined from all the empirical correlations calculated in the 1.5° × 1.5° bins. The best temporal model is obtained by calculating the autocorrelation of the time series at every grid point. The lagged correlations in space and time are calculated only where more than 50 pairs are available; this minimum number is chosen to ensure a robust outcome. For each level, the empirical correlations are averaged over the whole domain to obtain the exponents (α, β, γ), while the coefficients (a, b, c) are treated as locally dependent values. In a broad sense, (a, b, c) correspond to the spatial and temporal characteristic scales, while (α, β, γ) describe how quickly the function declines: the larger (α, β, γ) are, the faster the function declines.
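Because the parameters are assumed separable, each direction can be fitted on its own with a one-dimensional model ρ(d) = exp(−a d^α). One simple way to do this, assumed here and not necessarily the fitting method used by the authors, is to linearize the model as ln(−ln ρ) = ln a + α ln d and apply a straight-line fit. Lags supported by 50 or fewer pairs are dropped, mirroring the threshold in the text; `fit_stretched_exponential` is a hypothetical helper.

```python
import numpy as np

def fit_stretched_exponential(lags, rho, counts=None, min_pairs=50):
    """Fit rho(d) = exp(-a * d**alpha) along one direction (x, y or t).

    Uses the linearization ln(-ln rho) = ln a + alpha * ln d, so only lags
    d > 0 with 0 < rho < 1 (and, if counts are given, more than min_pairs
    pairs) enter the fit. Returns (a, alpha).
    """
    d = np.asarray(lags, dtype=float)
    r = np.asarray(rho, dtype=float)
    ok = (d > 0) & (r > 0) & (r < 1)
    if counts is not None:
        ok &= np.asarray(counts) > min_pairs
    # polyfit with degree 1 returns [slope, intercept] = [alpha, ln a]
    alpha, log_a = np.polyfit(np.log(d[ok]), np.log(-np.log(r[ok])), 1)
    return float(np.exp(log_a)), float(alpha)

# Recover known parameters from noise-free synthetic correlations (km lags)
d = np.arange(5, 125, 5)
r = np.exp(-0.07 * d ** 0.75)
a_hat, alpha_hat = fit_stretched_exponential(d, r)
counts = np.full(d.size, 100)
counts[:3] = 10  # pretend the first three lags have too few pairs
a_hat2, alpha_hat2 = fit_stretched_exponential(d, r, counts)
```

The log transform amplifies noise where ρ is close to 0 or 1, so a nonlinear least-squares fit on ρ itself is a common alternative for noisy empirical correlations.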
The mean spatial–temporal correlation function for SST is obtained by fitting the correlation model to the averaged correlations in all bins (Fig. 7). This gives the empirical correlation function for the whole domain:

$$\rho(\Delta x, \Delta y, \Delta t) = \exp\left(-0.0717\,|\Delta x|^{0.752} - 0.0114\,|\Delta y|^{0.615} - 0.0224\,|\Delta t|^{1.26}\right), \qquad (2)$$

where Δx and Δy are in kilometers and Δt is in days. Other correlation models were also tested against the averaged curves, but the selected model gives the best fit. For example, the Gaussian function is frequently adopted in oceanic and atmospheric applications, but in our tests the fit of the Gaussian correlation function is significantly poorer than that of the model used here. The correlation models considered here satisfy two critical conditions: they fit the empirical correlations very well, and they produce positive definite covariance matrices. By repeating the fitting, we obtain the empirical function at every Levitus level for both temperature and salinity. Table 1 lists the coefficients (α, β, γ) for temperature, and Table 2 those for salinity. It should be noted that most of the (α, β, γ) values for salinity are close to 1.0, i.e. the correlation model for salinity is nearly exponential.
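As a quick consistency check on Eq. (2), the sketch below evaluates the domain-mean SST correlation model and the lag at which each separable factor drops to 1/e, i.e. where coef · L^expo = 1, so L = coef^(−1/expo). Taking absolute lags is an assumption needed for the fractional exponents; the dictionary and function names are illustrative.

```python
import numpy as np

# Domain-mean SST coefficients and exponents transcribed from Eq. (2)
COEF = {"x": 0.0717, "y": 0.0114, "t": 0.0224}  # a, b, c
EXPO = {"x": 0.752, "y": 0.615, "t": 1.26}      # alpha, beta, gamma

def rho_sst(dx_km, dy_km, dt_days):
    """Separable correlation model of Eq. (2); lags taken as absolute values."""
    s = (COEF["x"] * np.abs(dx_km) ** EXPO["x"]
         + COEF["y"] * np.abs(dy_km) ** EXPO["y"]
         + COEF["t"] * np.abs(dt_days) ** EXPO["t"])
    return np.exp(-s)

def e_folding_lag(axis):
    """Lag along one axis at which that factor equals 1/e (coef * L**expo = 1)."""
    return COEF[axis] ** (-1.0 / EXPO[axis])

lx, lt = e_folding_lag("x"), e_folding_lag("t")
print(f"zonal e-folding lag ~ {lx:.1f} km, temporal ~ {lt:.1f} days")
```

With these coefficients the zonal e-folding lag is roughly 33 km and the temporal one roughly 20 days. The meridional value implied by Eq. (2) lies far outside the ±120 km lag window used for the bin correlations, so it should be read as an extrapolation rather than a resolved scale.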

Correlation parameters
Two indicators are used to assess the observational networks in this paper: the effective coverage and the explained variance. The effective coverage is based on the characteristic scale analysis, and the explained variance also uses information from the scale analysis. After (α, β, γ) have been obtained for each level, the fitting is performed in each bin with (α, β, γ) held fixed and (a, b, c) to be determined. Figure 8

The effective coverage (She et al., 2007) is defined to evaluate the impact of an observational network in a given domain more quantitatively by taking the local characteristic scales into account. The representative area of a measurement is assumed to be proportional to the local characteristic scales. Mathematically, for a given grid cell (x_o, y_o, t_o), if a grid cell (x_i, y_i, t_i) satisfies

$$\rho(x_i - x_o,\; y_i - y_o,\; t_i - t_o) \ge \rho_c,$$

where ρ_c is the cut-off correlation that defines the characteristic scales, then the grid cells (x_i, y_i, t_i) and (x_o, y_o, t_o) are called a pair of "impact cells". We use the e-folding scale in this study, i.e. ρ_c = 1/e. Impact cells can be illustrated for the sea surface temperature. Using the parameters derived from the correlation model above, two grid cells at the surface are called "impact cells" if they satisfy

$$a\,|x_i - x_o|^{0.752} + b\,|y_i - y_o|^{0.615} + c\,|t_i - t_o|^{1.26} \le 1,$$

where a, b and c are the local coefficients at (x_o, y_o). This equation includes the effect of the local characteristics, implying that the number of impact cells of a given location differs between levels.
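With the exponents held fixed at a level, the local coefficient in each bin follows from a one-parameter least-squares problem in the linearized form −ln ρ = a d^α, which has a closed-form solution. The sketch below, assuming this linearized fit (not necessarily the authors' procedure), also shows the impact-cell test: since the model is an exponential of a sum, ρ ≥ 1/e is equivalent to the weighted lag sum being at most 1. Both function names are hypothetical.

```python
import numpy as np

def fit_local_coef(lags, rho, expo):
    """Least-squares estimate of a in -ln(rho) = a * d**expo, expo held fixed."""
    d = np.asarray(lags, dtype=float)
    r = np.asarray(rho, dtype=float)
    ok = (d > 0) & (r > 0) & (r < 1)
    p = d[ok] ** expo
    # Minimizing sum((-ln r - a p)^2) over a gives a = sum(p * -ln r) / sum(p^2)
    return float((p * -np.log(r[ok])).sum() / (p * p).sum())

def is_impact_pair(dx, dy, dt, coefs, expos, rho_c=np.exp(-1.0)):
    """True if rho(dx, dy, dt) >= rho_c for rho = exp(-sum_k c_k |d_k|**e_k)."""
    a, b, c = coefs
    al, be, ga = expos
    s = a * abs(dx) ** al + b * abs(dy) ** be + c * abs(dt) ** ga
    return bool(np.exp(-s) >= rho_c)

# Local zonal coefficient for a hypothetical bin, surface exponents from Eq. (2)
d = np.arange(5.0, 125.0, 5.0)
rho_bin = np.exp(-0.05 * d ** 0.752)
a_loc = fit_local_coef(d, rho_bin, 0.752)
coefs = (a_loc, 0.0114, 0.0224)
expos = (0.752, 0.615, 1.26)
near = is_impact_pair(10.0, 5.0, 1.0, coefs, expos)   # 10 km, 5 km, 1 day apart
far = is_impact_pair(200.0, 0.0, 0.0, coefs, expos)   # 200 km apart
```

Here the meridional and temporal coefficients are simply borrowed from the domain mean for illustration; in the assessment each of a, b and c would come from the local fit.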
The following criteria are defined to quantify the area represented by a given measurement: a grid cell (x_o, y_o, t_o) is regarded as effectively covered either when it contains an observation or, if it is not observed, when at least one of its impact cells is observed. For a given observational network, the effective coverage is the total area of the effectively covered grid cells, and the ratio of the number of effectively covered grid cells to the total number of grid cells over a given period is called the total "effective coverage rate" of the network. With these statistics, the gaps and the areas effectively covered by an observational network can be identified more quantitatively.
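The definition above can be sketched as a brute-force loop over grid cells. This is a minimal illustration assuming cells are given as (x km, y km, t day) tuples and that a user-supplied function returns the local (a, b, c, α, β, γ) for each cell; the names `is_impact`, `effective_coverage_rate` and `params_at` are hypothetical, and a real implementation on a 6 km model grid would need spatial indexing rather than an all-pairs search.

```python
def is_impact(cell_o, cell_i, params):
    """Impact-cell test with cutting correlation 1/e:
    a|dx|^alpha + b|dy|^beta + c|dt|^gamma <= 1, equivalent to rho >= 1/e."""
    a, b, c, alpha, beta, gamma = params
    dx, dy, dt = (abs(p - q) for p, q in zip(cell_i, cell_o))
    return a * dx**alpha + b * dy**beta + c * dt**gamma <= 1.0


def effective_coverage_rate(cells, observed, params_at):
    """Fraction of cells that are observed or have at least one observed impact cell.

    cells     : list of (x_km, y_km, t_day) tuples (all grid cells)
    observed  : set of cells holding an observation
    params_at : hypothetical function mapping a cell to its local (a, b, c, alpha, beta, gamma)
    """
    covered = 0
    for cell in cells:
        if cell in observed or any(is_impact(cell, obs, params_at(cell)) for obs in observed):
            covered += 1
    return covered / len(cells)


def sst_params(cell):
    """Toy assumption: domain-mean SST parameters of Eq. (2) everywhere."""
    return (0.0717, 0.0114, 0.0224, 0.752, 0.615, 1.26)


# One row of 17 cells at 6 km spacing, a single day, one observation at x = 48 km
cells = [(x, 0.0, 0.0) for x in range(0, 97, 6)]
observed = {(48, 0.0, 0.0)}
rate = effective_coverage_rate(cells, observed, sst_params)
print(rate)
```

With a longitudinal e-folding scale of about 33 km, the single observation covers the 11 cells between x = 18 km and x = 78 km, so the toy rate is 11/17.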

Meta data assessment
Table 3 lists the total effective coverage rate at standard Levitus levels down to 300 m for both temperature and salinity.
For temperature at the surface and at 10 m, the North Sea and Baltic Sea are well covered, with total effective coverage rates of 0.911 and 0.776, respectively. However, the total effective coverage rate drops quickly to 0.263 at 20 m and remains at about 0.3 for all other levels. It should be noted that the effective coverage rate for temperature does not decrease monotonically with depth. Apart from the bottom layer, the effective coverage rate for the Baltic Sea and North Sea as a whole is lowest at 75 m. This level lies at the typical mean depth of the thermocline, which is difficult to simulate realistically in the model. The table also shows that more observations are required between 20 m and 75 m. For temperature, the effective coverage rate below 100 m is comparable to that of the upper levels. This is partly because the total number of grid cells at these depths is much smaller than at the upper levels, even though fewer observations are available there. Changes in the characteristic scales with depth also affect the total effective coverage rate and should be taken into account. For salinity, the total effective coverage rate generally increases with depth, except at 300 m. This is caused by three factors: the characteristic scales vary little between levels; the total number of grid cells decreases gradually with depth; and the salinity meta data consist mainly of station and buoy data, which exist at most levels. Figure 9 presents the effective coverage rate of the observational network for temperature at 10 m and 75 m. Large values mean that a grid cell is well covered in both space and time. Small values (blue) denote large gaps, suggesting that observations in the area are either discontinuous or sparse. At 10 m, most of the North Sea and Baltic Sea are well covered for temperature. The relatively poorly covered areas include the Norwegian Trench, the southern Kattegat and the strip along 54° N.
In these regions there are few observations and the characteristic scales are relatively short. Note that the 3 buoys in the English Channel are very effective, leading to a good effective coverage rate in this area. At 75 m, three areas are effectively covered: the western Skagerrak, the southern Baltic Proper and the area centered at (57° N, 0.8° E), where a buoy observation exists. The regions north of the Baltic Proper are poorly covered. For salinity at 10 m (Fig. 10a), the effectively covered part of the North Sea corresponds well with the observation locations. Meanwhile, the Skagerrak and most of the Baltic Sea are well covered owing to the relatively complete meta data. At 75 m, the effective coverage rate for salinity is higher than for temperature because the temporal characteristic scales of salinity are larger. However, small patches with an effective coverage rate below 0.6 remain inside the Baltic Sea. In terms of the effective coverage rate, the existing observational networks give good coverage of the North Sea and Baltic Sea. It should be noted that a good effective coverage rate implies that the existing observational networks provide relevant and possibly useful information in the given area; it does not mean that the information is sufficient.

"Explained variance"
As stated above, the effective coverage provides a quantitative assessment but has a limitation: the effective coverage rate cannot clearly reflect the relative importance of observations at different locations. To deal with this, the explained variance is defined. It assesses the relative importance of the existing observations by their ability to reconstruct the time series at locations without observations. We assume that the time series B(i) at a given location (location B), where i = 1, …, n indexes time, can be constructed from the time series at m nearby observation points. In practice, the "nearby" points are limited to a radius of 100 km. In addition, the effect of complex topography is taken into account: for example, time series from west of Jutland may lie within 100 km of a point in the Belt Sea, but they are excluded when constructing its time series. Suppose B(i) satisfies

$$B(i)=\sum_{j=1}^{m} A(i,j)\,X(j), \qquad i = 1,\dots,n,$$

where A denotes the observed time series from the m locations and X(j) are the weights, which can be obtained by regression.
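Selecting the "nearby" observations can be sketched as a great-circle distance filter plus a hand-supplied exclusion list standing in for the topographic screening. This is a minimal sketch assuming a spherical Earth of radius 6371 km; the station ids, coordinates and the function names `great_circle_km` and `nearby_observations` are hypothetical, and the paper does not state how the topographic exclusions were determined.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearby_observations(target, stations, radius_km=100.0, excluded=frozenset()):
    """Ids of stations within radius_km of target, minus topographically excluded ids.

    target   : (lat, lon) of location B
    stations : dict id -> (lat, lon)
    excluded : ids ruled out by topography (e.g. west of Jutland for a Belt Sea
               point); supplied by hand in this sketch
    """
    lat0, lon0 = target
    return sorted(
        sid for sid, (lat, lon) in stations.items()
        if sid not in excluded and great_circle_km(lat0, lon0, lat, lon) <= radius_km
    )


# Hypothetical stations around a hypothetical target point
stations = {"A": (56.0, 10.5), "B": (57.0, 10.5), "C": (55.5, 9.0)}
print(nearby_observations((55.5, 10.5), stations))                  # distance filter only
print(nearby_observations((55.5, 10.5), stations, excluded={"C"}))  # with exclusion
```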
In this study, the proxy ocean data are used for the matrices A and B. With the weights X and the m observed time series, the time series at location B can be reconstructed.
The square of the correlation between the proxy ocean time series and the reconstructed one is defined as the "explained" variance, which measures the effectiveness of a given observational network.
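The regression and the explained variance can be sketched as follows: the weights X minimize the squared misfit of B = AX (solved here through the normal equations), and the explained variance is the squared Pearson correlation between B and its reconstruction AX. This is a minimal sketch assuming complete, zero-mean anomaly series with no intercept term; it therefore ignores the temporal gaps that, as discussed next, strongly reduce the explained variance. The names `solve`, `regression_weights` and `explained_variance` are hypothetical.

```python
def solve(M, v):
    """Solve the small square system M w = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][k] * w[k] for k in range(r + 1, n))) / aug[r][r]
    return w


def regression_weights(A, B):
    """Least-squares X minimizing ||A X - B||^2 via (A^T A) X = A^T B.

    A : n rows (time steps), each with m values (the nearby points)
    B : n values at location B
    """
    m = len(A[0])
    ata = [[sum(row[j] * row[k] for row in A) for k in range(m)] for j in range(m)]
    atb = [sum(row[j] * b for row, b in zip(A, B)) for j in range(m)]
    return solve(ata, atb)


def explained_variance(A, B, X):
    """Squared Pearson correlation between B and its reconstruction A X."""
    R = [sum(a * x for a, x in zip(row, X)) for row in A]
    n = len(B)
    mb, mr = sum(B) / n, sum(R) / n
    cov = sum((b - mb) * (r - mr) for b, r in zip(B, R))
    vb = sum((b - mb) ** 2 for b in B)
    vr = sum((r - mr) ** 2 for r in R)
    return cov * cov / (vb * vr)


# Toy example: B is exactly 0.5 * (point 1) + 2 * (point 2)
A = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
B = [0.5, 2.0, 2.5, 3.0, 6.5]
X = regression_weights(A, B)
print(X, explained_variance(A, B, X))
```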
In general, the explained variance at a given point is determined by two factors: the number of observations and their locations. The weight matrix is calculated from the daily proxy ocean data. Ideally, if a location is itself observed daily without temporal gaps, its explained variance is 1.0. Temporal gaps greatly reduce the "explained" variance. The spatial locations of the observations also play an important role: some locations are more effective and contribute more than others. The "explained" variance therefore reflects the overall effect of the existing observational network and can serve as a good indicator of the spatial and temporal coverage of the data. Compared with the "effective coverage", the "explained" variance is a more stringent criterion. Its ability to identify the relative importance of observations at different locations makes it a potential tool for the design and planning of observational networks. When choosing the observations near a given location, it is clearly not appropriate to include observations in the North Sea for a location in the Bothnian Bay. In this study, the time series at a given location is therefore reconstructed using observations within an area defined by the correlation models for the different levels.

Meta data assessment with the explained variance
The mean explained variances at the standard Levitus levels are listed in Table 4. Temperature and salinity share some common features. For example, the explained variance decreases from the surface down to 75 m and then increases to 200 m, before dropping to a very low value at 300 m. In the intermediate layer from 50 m to 100 m, the explained variances are lower than at other levels. This suggests that more observations are needed to better explain the variance close to the thermocline, which is also revealed by the total effective coverage rate for temperature. Below 200 m, the mean explained variance is smaller for both temperature and salinity.
Figure 11 presents the mean explained variance of the observational network for temperature at 10 m and 75 m. At 10 m, the largest explained variance, about 0.6, is found in the Bothnian Bay. In the Belt Sea and the southern part of the Baltic Proper, the explained variance is also about 0.6, meaning that the observations have good spatial and temporal coverage in these two regions. It should be noted that the "explained" variance is lower close to the Jutland coast even though the effective coverage rate there is high (Fig. 9); this is due to temporal gaps in these data. In the English Channel, the 3 buoys produce an explained variance of about 0.4 over a large area. The "explained" variance in the Skagerrak and Kattegat is very low: apart from the temporal gaps in the data, this area has complex topography and is affected by fronts, tides and other processes. At 75 m, most of the explained variances are lower than 0.4. Large values appear in the central Baltic Sea, consistent with the effective coverage at this depth. For salinity at 10 m (Fig. 12a), the explained variances are close to 0.8 in the Belt Sea and Baltic Proper region. Elsewhere, the largest explained variance, about 0.3, is found in the southern Baltic Sea. In general, salinity shows lower "explained" variances than temperature for the existing observational networks. To understand the effective coverage rate and the "explained" variance, the frequency of the observations must be taken into account. For example, during 2004-2006 CTD casts provided observations much less frequently than buoys or observing stations (Fig. 1). The observing stations are mainly located near the coast, while there are only a few buoys in the North Sea and Baltic Sea. The spatial distribution of the effective coverage rate agrees well with the locations of the more frequent observations. For example, the buoys in the North Sea correspond well with areas of relatively large effective coverage rate. The effective coverage is poor in the Kattegat because only CTD data, with their much lower frequency, are available there. The correspondence is even clearer in the spatial distribution of the explained variance for both temperature and salinity. There are more buoys and observing stations in the inner Danish waters, such as the Belt Sea, and in the Baltic Proper, which produces larger "explained" variances there than in other areas.

Concluding remarks
The three-dimensional in situ observational networks in the Baltic Sea and North Sea are assessed by means of the meta data collected during 2004-2006. Because the data are relatively complete there, the focus is placed on the Baltic Sea. Two complementary quality indicators, the "effective" coverage rate and the "explained" variance, are used as criteria to identify gaps and redundancy in the three-dimensional observational networks. The characteristic scale analysis is performed first because it provides the information needed to define the effective coverage. Because observations below the surface are lacking, the proxy ocean data generated by DMI-BSHCmod are used to calculate the characteristic parameters. These data have a horizontal resolution of about 6 km and are interpolated vertically to the standard Levitus levels.
The spatial averaged correlations in 1.5° × 1.5° bins are calculated on each Levitus level for both temperature and salinity. After the dominant annual and semiannual cycles are removed from the temperature field, the anomalies are used to compute the bin-averaged correlations across the North Sea and Baltic Sea. The distribution of the bin-averaged correlations reveals features clearly associated with local flows, complex topography, coastline effects, etc., as shown by the rotation of the axis of the bin-averaged correlation at different locations. The bin-averaged correlation is also subject to atmospheric forcing and to boundary and bottom effects. For example, the averaged correlations are quite large for SST, with the HCA occupying a large part of most bins, because SST is largely controlled by atmospheric forcing. Below the surface, the HCAs in most of the Baltic Sea are much smaller than for SST. The rotation of the bin-averaged correlation is also similar at different depths, reflecting that the water in this semi-enclosed marginal sea is strongly influenced by coastlines, channels and boundaries. The comparison with satellite SST shows many similarities. The proxy ocean data tend to produce larger bin-averaged correlations close to the boundaries, notably in the northern boundary area and the English Channel, which may partly be explained by the restoring boundary conditions used in the dynamical model. Importantly, the rotation of the axis in the bin-averaged correlation agrees well with that from the observations. The bin-averaged correlations of salinity show "axis rotation" features similar to those of temperature, but the HCAs are much smaller and vary in a complex way with depth.
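Removing the annual and semiannual cycles can be sketched as a least-squares fit of a mean plus two harmonics to each daily series, keeping the residuals as anomalies. This is a minimal sketch assuming a least-squares harmonic fit with a 365.25-day period; the paper does not specify the fitting method, and the names `remove_seasonal_cycle` and `_solve` are hypothetical. The returned variance fraction corresponds to the quantity mapped in Fig. 2c.

```python
import math


def _solve(M, v):
    """Gaussian elimination with partial pivoting for a small dense system M w = v."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][k] * w[k] for k in range(r + 1, n))) / aug[r][r]
    return w


def remove_seasonal_cycle(series, period_days=365.25):
    """Fit mean + annual + semiannual harmonics to a daily series by least squares.

    Returns (anomalies, fraction of the total variance explained by the harmonics).
    """
    basis = []
    for day in range(len(series)):
        w = 2.0 * math.pi * day / period_days
        basis.append([1.0, math.cos(w), math.sin(w), math.cos(2 * w), math.sin(2 * w)])
    k = len(basis[0])
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(basis, series)) for i in range(k)]
    coef = _solve(ata, aty)
    anomalies = [y - sum(c * b for c, b in zip(coef, r)) for r, y in zip(basis, series)]
    mean = sum(series) / len(series)
    total = sum((y - mean) ** 2 for y in series)
    return anomalies, 1.0 - sum(a * a for a in anomalies) / total


# Synthetic three-year daily temperature: seasonal cycle only
days = range(3 * 365)
sst = [10 + 5 * math.cos(2 * math.pi * d / 365.25) + math.sin(4 * math.pi * d / 365.25)
       for d in days]
anom, frac = remove_seasonal_cycle(sst)
print(max(abs(a) for a in anom), frac)
```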
The correlation models are estimated from the bin-averaged correlations in the North Sea and Baltic Sea. Longitudinal, meridional and temporal lags are all included, and a general correlation model is assumed in the form $\rho(\Delta x,\Delta y,\Delta t)=\exp(-a\,\Delta x^{\alpha}-b\,\Delta y^{\beta}-c\,\Delta t^{\gamma})$, where Δx, Δy and Δt are the longitudinal, meridional and temporal lags. The parameters (α, β, γ) and (a, b, c) are obtained by fitting the model to the bin-averaged correlations. For temperature in the upper 300 m, the exponents vary from 0.75 to 1.33 in the longitudinal direction and from 0.6 to 1.56 in the latitudinal direction. In the temporal direction, the exponents vary from 1.05 to 1.36. These values differ from the exponent of 2 in a Gaussian function, and the correlation model assumed here gives a smaller fitting error than the Gaussian-type function. For salinity, the exponents are around 1.0, especially in the temporal direction. These correlation models also provide useful information for implementing data assimilation schemes in the North Sea and Baltic Sea.
The three-dimensional observational networks in the Baltic Sea and North Sea are assessed using two complementary quality indicators: the effective coverage rate and the explained variance. The effective coverage rate gives the domain of influence of the total observed information from a given network, while the explained variance helps identify the relative importance of information from different locations. The assessments are performed on the standard Levitus levels. For temperature, the total effective coverage rate is about 0.9 at the surface and 0.776 at 10 m, but it drops to about 0.25 at other levels. The total effective coverage rate of salinity generally increases with depth. Spatially, the whole domain is well covered for SST except in the Norwegian Trench, the Kattegat and the "band" along 54° N. At 75 m, the effective coverage rate is lower in the Norwegian Trench and the Bothnian Sea. The total coverage rate of temperature is smallest around 75 m, which corresponds to the thermocline depth. The explained variance is a more stringent criterion than the effective coverage rate because it can identify the relative importance of observations at different locations. The mean explained variances decline from the surface to 75 m and then rise to 200 m for both temperature and salinity. The explained variance is very low below 200 m, in agreement with the effective coverage rate of temperature. Spatially, the Belt Sea and part of the Baltic Proper show relatively large explained variance. For both indicators, the sampling frequency of the observations plays a very important role: CTD data, with their large temporal gaps, contribute less than buoys and observing stations. It should also be noted that the effective coverage rate and explained variance are relatively low from 50 m to 125 m (except for the effective coverage of salinity). This means that more observations are required to better understand the intermediate water in this region.
The effective coverage rate and explained variance together give an overall assessment of the observational networks in the North Sea and Baltic Sea. However, the design problem is not addressed in this study. In the next step, the effective coverage and explained variance will be used as criteria for the optimal design of the observational networks: for a given number of sampling locations, an optimal design will maximize the effective coverage and explained variance. Some design experiments are being conducted in which the optimum is sought by simulated annealing.
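The annealing-based design idea can be illustrated on a toy problem: choose k sites from a candidate list so that the number of grid cells within a fixed radius of some site is maximized. This is a minimal sketch, not the design experiments mentioned above; the one-dimensional coverage count stands in for the space-time effective coverage, and the names `coverage` and `anneal_sites`, the swap move, the starting temperature and the geometric cooling schedule are all illustrative assumptions.

```python
import math
import random


def coverage(sites, cells, radius):
    """Toy objective: number of cells within `radius` of at least one selected site (1-D)."""
    return sum(1 for c in cells if any(abs(c - s) <= radius for s in sites))


def anneal_sites(candidates, k, cells, radius, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Simulated annealing over k-subsets of candidates; returns (best sites, best score)."""
    rng = random.Random(seed)
    current = rng.sample(candidates, k)
    cur_score = coverage(current, cells, radius)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(steps):
        # Proposal: swap one selected site for a currently unselected candidate
        trial = list(current)
        pool = [c for c in candidates if c not in current]
        trial[rng.randrange(k)] = rng.choice(pool)
        score = coverage(trial, cells, radius)
        delta = score - cur_score
        # Accept improvements always, deteriorations with Boltzmann probability
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = trial, score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
        temp = max(temp * cooling, 1e-6)  # geometric cooling with a floor
    return sorted(best), best_score


# Toy setup: 100 cells, candidate sites every 5 cells, 3 sites, influence radius 15 cells
cands = list(range(0, 100, 5))
cells = list(range(100))
sites, score = anneal_sites(cands, 3, cells, 15)
print(sites, score)
```

Because the best configuration seen is retained, the result is never worse than the random starting design; in this toy case the optimum is 92 covered cells.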
and at DMI since 2001. Three domains are applied in the present model setup: a 2-D North-East Atlantic

Figure 1. Spatial distribution of the meta data for temperature (a) and salinity (b) measurements during 2004-2006 in the Baltic Sea and North Sea. CTDs are marked by red dots, station observations by blue dots and buoys by black dots. The size of each dot corresponds to the sampling frequency of the observations.

Figure 2. Standard deviation of SST in the North Sea and Baltic Sea (a) before and (b) after removal of the annual and semi-annual harmonics. The contour interval is 0.5 °C. (c) Percentage of the total variance accounted for by the annual and semi-annual harmonics; the contour interval is 0.1.

Figure 3. As Fig. 2, but at the depth of 75 m.

Figure 4. Contours of the spatial bin-averaged correlations in 1.5° × 1.5° bins calculated from the proxy ocean data at (a) the surface, (b) 30 m and (c) 75 m for temperature.

Figure 5. (a) Contours of the spatial averaged correlations in 1.5° × 1.5° bins calculated from the satellite SST; (b) differences in the total mean correlation for each bin between the proxy ocean and satellite data, where the size of the circles denotes the magnitude of the differences, ranging from 0.01 to about 0.35; (c) comparison of the correlations with meridional and zonal lags at (5.35° E, 59° N) from the proxy ocean data (red line) and the satellite data (black line).

Figure 6. Contours of the spatial averaged correlations in 1.5° × 1.5° bins calculated from the proxy ocean data at (a) the surface, (b) 30 m and (c) 75 m for salinity.

Figure 7. Best fit of the mean correlation model to the averaged correlations in the (a) longitudinal, (b) latitudinal and (c) temporal directions. The correlation model is fitted to the correlations of all spatial bins.

Table 1. Parameters of the mean correlation model, $\rho(\Delta x,\Delta y,\Delta t)=\exp(-a\,\Delta x^{\alpha}-b\,\Delta y^{\beta}-c\,\Delta t^{\gamma})$, for temperature at standard Levitus levels down to 300 m. (a, b, c) correspond to the spatial and temporal characteristic scales, and (α, β, γ) determine how quickly the function declines. Δx, Δy and Δt are the longitudinal (km), meridional (km) and temporal (day) lags, respectively.

Table 2. Parameters of the mean correlation model, $\rho(\Delta x,\Delta y,\Delta t)=\exp(-a\,\Delta x^{\alpha}-b\,\Delta y^{\beta}-c\,\Delta t^{\gamma})$, for salinity at standard Levitus levels down to 300 m. (a, b, c) correspond to the spatial and temporal characteristic scales, and (α, β, γ) determine how quickly the function declines. Δx, Δy and Δt are the longitudinal (km), meridional (km) and temporal (day) lags, respectively.

Figure 8. Spatially varying parameters of the correlation model calculated from the proxy SST in the (a) longitudinal, (b) latitudinal and (c) temporal directions.
Figure 8 gives the spatial distribution of the correlation model parameters (a, b, c) for SST. Small values correspond to large spatial or temporal scales, and vice versa. The figure shows significant spatial variations in the characteristic scales. Large spatial scales are clearly seen in the central North Sea, as expected from the averaged correlations. In the Baltic Sea, large spatial scales are found over most parts for the zonal correlations and in the central part for the meridional correlations. The spatial scales are relatively small in the Gulf of Finland and the Transition Zone, where the correlation may be strongly influenced by the narrow channels. Small scales also appear in the Norwegian Coastal Current area and the Skagerrak in both the longitudinal and latitudinal directions. The fitted temporal correlation models show the largest temporal scales in the eastern and northern Baltic Sea and in the central North Sea, and smaller scales near the boundaries, especially in the Norwegian Trench and the southern North Sea near the English Channel. The variations in the meridional and zonal scales agree well with Fig. 2a. The spatially varying parameters at other levels are not shown for brevity.

Figure 9. Effective coverage rate at the depth of (a) 10 m and (b) 75 m for temperature.

Figure 10. Effective coverage rate at the depth of (a) 10 m and (b) 75 m for salinity.

Figure 11. Explained variance at the depth of (a) 10 m and (b) 75 m for temperature.

Figure 12. Explained variance at the depth of (a) 10 m and (b) 75 m for salinity.

The proxy ocean data set has been generated from a specific run of the Danish Meteorological Institute (DMI)'s BSHcmod. A lower-resolution model run was first initialized on 15 June 2000 from climatological salinity and temperature and run until 22 December 2003, at which date salinity and temperature were interpolated to the finer-resolution model grid. The satellite SST data are assimilated by a reduced ensemble Kalman filter.

Table 3. Total effective coverage rate at standard Levitus levels down to 300 m for temperature and salinity.

Table 4. Total explained variance at standard Levitus levels down to 300 m for temperature and salinity.