Evaluation of numerical models by FerryBox and Fixed Platform in-situ data in the southern North Sea



Introduction
The North Sea is a marginal sea that has among the highest densities of ship traffic in the world. It is an economically important region, sustaining commercial fisheries, wind farming, oil production and tourism (Kannen, 2012; OSPAR, 2010). As a major part of the north-western European continental shelf, the North Sea has a mean depth of 90 m. Bathymetry varies: while the southern part is shallow (15-50 m), the northern part deepens to 100-200 m, and in the Norwegian Trench to well below 200 m. The south-eastern part of the North Sea is known as the German Bight, with the Wadden Sea at its coastal margins. Because of freshwater inflow from several rivers in the southern North Sea (e.g. Rhine, Maas, Elbe), salinity near the coasts is in the range of 15-25. In the central North Sea, salinity is approximately 35 (Janssen et al., 1999; OSPAR, 2000). Besides the freshwater inflow, the North Sea is also strongly influenced by tides and residual circulation, which is governed by bathymetry, density distribution and wind stress (Queste et al., 2013). An anti-clockwise circulation dominates the North Sea, with North Atlantic water entering at its north-western boundary near the Shetland Islands (0.4-0.5 Sv, OSPAR, 2000), travelling along the Scottish and English coast, and leaving along the Norwegian coasts (Turrell, 1992) (Fig. 2). Some of the North Atlantic water entering from the north reaches the southern North Sea, but the majority circulates north of the Dogger Bank. A much smaller portion of North Atlantic water enters through the Dover Strait (approximately 0.07-0.12 Sv, OSPAR, 2000) and travels up to the entrance of the Baltic Sea, where less saline water is entrained into the North Sea water through the Skagerrak and Kattegat. The relatively salty English Channel water (> 35) is mixed on its way along the south-eastern coasts of the North Sea with freshwater from several rivers, passes the German Bight, and enters the Norwegian Trench region, mixing with the northern branch of the North Sea circulation. The estimated residence time of North Sea water is less than 1 year (Jickells, 1998; Lenhart and Pohlmann, 1997; Thomas et al., 2003).
Given the importance of the North Sea to the European economy and to the coastal communities, it is vital to monitor and understand its current ecological state. The FerryBox system provides regular high-frequency scientific measurements of ecologically important parameters, including temperature and salinity. It is installed on ships of opportunity (SoO) in European coastal regions, as well as on fixed onshore stations near harbours, river banks or estuaries (e.g. at Cuxhaven harbour located at the mouth of the Elbe River estuary). It is a flow-through system that continuously measures biogeochemical parameters every 10 s. FerryBoxes are a valuable platform to test and operate newly developed oceanographic sensors in a sheltered environment (e.g. ship or container) without limitation of power supply.
During the FerryBox project from 2002 to 2005 (FerryBox, 2014; Petersen et al., 2005), a cooperation between several international oceanographic institutions was launched, which targeted development of new sensors and observing systems, as well as best practices in quality control, maintenance and biofouling prevention (Hydes et al., 2009; Petersen et al., 2005, 2007).
Shelf seas are complex regions governed by many processes. Along with operational monitoring using in situ and satellite observing systems, numerical simulation has long been acknowledged to be important for understanding the hydrodynamics of coastal regions. Since the 1980s, baroclinic 3-D models have been developed to predict water temperature and salinity variations in the North Sea. All countries around the North Sea have been contributing to this effort, i.e. Denmark (Vested et al., 1992), Norway (Svendsen et al., 1996), the UK (Proctor and James, 1996), Belgium (Delhez and Martin, 1992; Luyten et al., 1996), the Netherlands (de Kok, 1997) and Germany (Backhaus, 1985; Dick et al., 2001). For the present study, two different hydrodynamic models, BSHcmod and FOAM AMM7 NEMO, were used. These models provide the hydrodynamics for other studies, e.g. for ecosystem modelling (Edwards et al., 2012; Maar et al., 2011) and predicting wave-tide-current interactions (Pleskachevsky et al., 2009) in the North Sea. The German Federal Maritime and Hydrographic Agency (Bundesamt für Seeschifffahrt und Hydrographie, BSH) developed the BSHcmod hydrodynamic model for operational use in the North and Baltic seas (Dick et al., 2001). The coupled Forecasting Ocean Assimilation Model (FOAM) consists of a hydrodynamic (O'Dea et al., 2012) and an ecosystem (Edwards et al., 2012) part. The hydrodynamics are provided by the Nucleus for European Modelling of the Ocean (NEMO, Madec, 2008), while the ecosystem part is supplied by the European Regional Seas Ecosystem Model (ERSEM, Baretta et al., 1995; Blackford et al., 2004). The FOAM is a regional model, nested to the UK Met Office global ocean model (Blockley et al., 2014).
Besides the FerryBoxes, several other measurement networks are available in the North Sea, including the COSYNA coastal observing system (COSYNA, 2014; Grayek et al., 2011; Riethmuller et al., 2009; Stanev et al., 2011). Also, other observational networks like MARNET (BSH, 2014) and the SmartBuoys network (Cefas, 2014; Mills et al., 2003) measure water temperature and salinity on buoys and fixed platforms. Satellite coverage is generally limited in temporal resolution and even more restricted due to cloud coverage, e.g. when using visible parts of the spectrum (Petersen et al., 2008; Volent et al., 2011).
FerryBox data can bridge the gap between existing in situ observations typically used for data assimilation, as they provide reliable and high-resolution in situ data for transects in the North Sea (Petersen et al., 2008). However, the FerryBox data coverage is limited to grid points along a transect. To overcome this limitation, Wehde et al. (2006) and Petersen et al. (2011) applied a water transport model for comparison of FerryBox measurements with other operational observations.
The aim of the present study was to compare numerical model data with in situ measurements of different monitoring systems (FerryBox, fixed platforms). The goal of this study was to evaluate the quality of modelled water temperature and salinity data in different areas of the North Sea and to identify related weaknesses of the AMM7 and BSHcmod v4 operational models. These models are used by a variety of sectors for a range of applications. The most important applications supported by BSHcmod v4 are, however, the sea level prediction and storm surge warning service for the German coast and different kinds of drift forecasts (e.g. for oil spill combating or search-and-rescue at sea), with sea surface heights and currents being the primary outputs required. This study gives some indication of where it could be beneficial to improve the computed mass distribution or baroclinic dynamics.
The first section of our study describes the data sets and the applied methods. Then, data from a complete FerryBox transect data set are compared with model results. Discrete point comparisons of model data and observations are then presented.

FerryBox system
In general, all European FerryBox systems have a similar design. The differences are in the design of the flow-through system, the degree of automation and biofouling prevention, as well as the possibilities of supervision and remote control. The FerryBox systems used in the present study are designed and manufactured by 4H-Jena engineering GmbH and Helmholtz-Zentrum Geesthacht (HZG) and have the following specifications.
The water is pumped from a subsurface inlet (located at 5 m depth) into the flow-through system containing multiple sensors. Due to the ship's movement and turbulence, the water pumped into the FerryBox system originates from the surface layer of the water column. A debubbling unit removes air bubbles, which may enter the system during heavy seas. Coupled to the debubbler, an internal water loop circulates the seawater with a constant velocity of about 1 m s⁻¹. At certain positions along the transects, water samples are collected in an automated cooler sampler for subsequent laboratory analyses. FerryBox locations are tracked via GPS positioning.
More information about the FerryBox system can be found e.g. in Petersen et al. (2003, 2005).
For this study, the data sets of two different commercial ships have been used. The data are available at the FerryBox database at Helmholtz-Zentrum Geesthacht (HZG) (http://ferrydata.hzg.de). In Fig. 1, the transect of the TorDania (which was in service until April 2012) is shown. TorDania travelled on the route between Cuxhaven (GER) and Immingham (GB) every 2 days with an average cruising speed of 12 kn. The temporal resolution of TorDania measurements is 10 s. The cargo vessel Lysbris (IMO number 9144263) operates on a route going along the coasts of the North Sea (Fig. 1). At certain times, the ship also travelled along the Elbe River estuary up to the port of Hamburg.
For FerryBox water temperature and salinity measurements, the Citadel TS-NH thermosalinograph (Teledyne Technology Company) is used. The basic salinity instrument measures inductive conductivity, while the temperatures are measured by a thermistor in close proximity. Accuracy for salinity is ±0.015, and for temperature, ±0.005 °C. For validation of the Citadel sensor, water samples taken on board are analysed in the laboratory. Salinity of discrete samples was measured using the Guildline Autosal Salinometer 8400B until 2012, after which a more recent OPTIMARE Precision Salinometer system was used. The accuracy of the Guildline Autosal Salinometer is ±0.002. The OPTIMARE is accurate to ±0.003, verified in laboratory tests with standard seawater probes from the OSIL company in 2012.

MARNET observing system
The MARNET station network consists of several measurement sites in the German coastal parts of the Baltic Sea and the North Sea (BSH, 2014). It is also part of the COSYNA observing system in the North and Arctic seas. MARNET has a long tradition of monitoring in coastal waters (on unmanned light ships since 1984) and is operated by the BSH. This study uses data from North Sea MARNET station Deutsche Bucht (German Bight), located west of the island of Helgoland. It is an unmanned light vessel, located at position 54°10′ N, 7°27′ E. Observations started in 1989 at seven depths (3, 6, 10, 15, 20, 25, and 30 m), with meteorological measurements at 14 m height. Hourly data of water temperature and salinity from 6 m depth are used. There, the station is equipped with a CTD SBE 37-SIP MicroCAT (Sea-Bird Electronics Inc). The probe provides water temperatures with an accuracy of ±0.002 °C and conductivity values with an accuracy of ±0.0003 mS cm⁻¹.

The BSHcmod v4 model
The BSHcmod is a 3-D baroclinic ocean circulation model for the North Sea and Baltic Sea (Dick et al., 2001)  patterns can be viewed at BSH (2015), along with the pertaining statistical distribution. The monthly simulated BSH mean surface circulation for the whole North Sea is published in several reports, e.g. in Loewe (2009) and Loewe et al. (2013), showing a pronounced seasonal as well as inter-annual variability strongly related to the atmospheric circulation pattern over the North Sea. This has also been described in a detailed review of the physical oceanography for the North Sea by Otto et al. (1990).
The model is based on the Reynolds-averaged Navier-Stokes equations which are discretized on a geographical Arakawa-C grid and on adaptive vertical coordinates. A two-way nesting approach is applied with a coarse-resolution grid (5 km grid spacing) in the North and Baltic seas and a fine-resolution grid (900 m grid spacing) in the German Bight and the western part of the Baltic Sea (focus region). Internally, BSHcmod v4 makes use of adaptive layers with variable thickness (8 m in the English Channel, 1-2 m in the German Bight), depending e.g. on tidal amplitude. There are 36 layers in the coarse grid, and 25 layers in the fine-grid domain. The mixing scheme used in the model is described by Dick et al. (2001). When archived, BSHcmod data are interpolated on a coarser grid with constant vertical layer resolution. Thus, the archived data at forecast time step 0 applied here are from the surface layer, which has a thickness of 5 m and a temporal resolution of 15 min.
Meteorological forcing is provided by the German Weather Service (Deutscher Wetterdienst, DWD) (Doms and Schättler, 1999). The 10 m wind components are extrapolated from the lowest pressure level height data, considering also the stability conditions in the Prandtl layer. The freshwater input into the North Sea is estimated using the daily averaged data of 5 rivers (i.e. the Rhine, Ems, Weser, Elbe, and Eider), obtained from river gauge observations. For the remaining rivers in the North Sea, the constant mean annual values of freshwater runoff are used (in total 80 rivers). The temperature of the river water is set equal to the temperature of the grid cell into which the river discharges. The salinity of the inflowing river water is assumed to be zero. The BSH model simulates tides based on 14 harmonic constituents which are provided at the open boundaries in the northern part of the North Sea (60°30′ N) and the western part of the English Channel (4°3′ E), as well as external surges, computed by a 2-D model of the north-eastern Atlantic (Brüning et al., 2014). Additionally, at the open boundaries, the model is forced by monthly mean T/S profile data of the climatology compiled by Janssen et al. (1999). Draining and flooding of tidal flats is also taken into account.

The FOAM AMM7 NEMO model
The AMM7 includes a 3-D hydrodynamic component based upon the NEMO model, which is included as part of the Met Office Forecasting Ocean Assimilation Model (FOAM) suite of forecast systems that run daily and include assimilation of in situ observations. The AMM7 system also contains the ERSEM ecosystem model (Baretta et al., 1995; Blackford et al., 2004; Siddorn et al., 2007).
The model domain encompasses the European north-western continental shelf on a regular lat-lon grid (42-65° N, 20° W-13° E) resolved on a 1/15° (lat) by 1/9° (lon) grid. To get the correct vertical resolution of the terrain, hybrid s-sigma terrain-following coordinates are applied with 50 levels (interpolated onto 24 geopotential levels for data distribution, e.g. for the MyOcean database).
The NEMO model itself is a community model particularly developed in Europe (Madec, 2008). Though it was originally developed for the deep ocean, it has since been modified for use in shelf seas. Details of the model and its implementation are given in O'Dea et al. (2012). Vertical mixing is resolved using the generic length scale (GLS) model and a second-moment algebraic closure model for the two dynamical equations of turbulent kinetic energy (TKE) and TKE dissipation.
The system assimilates observations using an optimal interpolation scheme (Martin et al., 2007), with updates described in Storkey et al. (2010) and adaptations to enable it to address the particular requirements for shelf applications (O'Dea et al., 2012; Siddorn et al., 2007). The assimilation system uses a first guess at appropriate time (FGAT) scheme to calculate model-observation differences (innovations) which are converted to model increments using an iterative method. A daily analysis window is used, with the model being rerun for the same day with an incremental analysis update (IAU) scheme to update the model state using these increments. Only sea surface temperature (SST) data are assimilated. Temperature and salinity profile assimilation along with sea surface height assimilation are technically more challenging in the shelf environment and will be implemented as future developments to the system.
Data assimilated include in situ data and level-2 satellite SST data provided by the Global High-Resolution Sea Surface Temperature project (GHRSST). In situ data are obtained from a variety of sources and include measurements taken by ships, moored buoys, and drifters. Satellite observations are obtained from the Advanced Microwave Scanning Radiometer-Earth observing system (AMSRE), the Advanced Along-Track Scanning Radiometer (AATSR), and the Advanced Very High Resolution Radiometer (AVHRR) instruments on board the NOAA and MetOp satellites. Also assimilated are data from the geostationary Spinning Enhanced Visible and Infrared Imager (SEVIRI). All data are quality controlled and a bias correction scheme, based on comparisons to in situ and AATSR data, is applied to the AMSRE, AVHRR, and SEVIRI observations. A full description of the satellite data types, and the scheme used to correct them, can be found in Donlon et al. (2012).
It is worth noting that although a number of SoO data were assimilated into the system, including reasonable data density in the southern North Sea, the FerryBox data used in this study were not available for assimilation and so were not included.
At the open boundaries, AMM7 is one-way nested into the Met Office operational FOAM 1/12° deep ocean model (Storkey et al., 2010). River flow is specified for 320 European rivers, whereby the temperature of the river water is specified as the SST of the model box at the river point and the river flow is specified by the river flow climatology. The river input is assumed to be of zero salinity. The method for obtaining and adjusting these monthly climatologies is described in Young and Holt (2007).
The data for the flux between the Kattegat and the Baltic are derived from the Danish Hydrographic Institutes' Dynamics of Connected Seas (DYNOCS) experiment and are applied as a monthly mean climatology of vertical temperature and salinity structure. The atmospheric forcing is provided by the Met Office Numerical Weather Forecast model.
For the present study, the AMM7 data set of the analyses is provided by the MyOcean database (McLaren et al., 2015) in hourly time resolution and 7 km grid resolution. Data are taken from the surface layer, which in the shallow waters of the southern North Sea represents approximately the uppermost metre or less of the water column.

Statistical measures
A variety of statistical measures were applied to evaluate the model performance. Since the time periods for the evaluated models are different, the statistical measures are valid for different but overlapping time periods.
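If the observations are denoted as obs and model predictions as sim, the bias can be described as the difference between the mean of simulations and the mean of observations, i.e.
$$\mathrm{bias} = \overline{\mathrm{sim}} - \overline{\mathrm{obs}}.$$
Thus, negative (positive) bias means model underestimation (overestimation). The standard deviation of error (stde) is calculated by
$$\mathrm{stde} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(\mathrm{sim}_i - \overline{\mathrm{sim}}\right) - \left(\mathrm{obs}_i - \overline{\mathrm{obs}}\right)\right]^2},$$
where $N$ is the number of paired values. The root mean square error (RMSE) is then calculated from bias and stde, namely,
$$\mathrm{RMSE} = \sqrt{\mathrm{bias}^2 + \mathrm{stde}^2},$$
and the skill variance (skvar) is the ratio of the standard deviation of the simulations to that of the observations,
$$\mathrm{skvar} = \frac{\sigma_{\mathrm{sim}}}{\sigma_{\mathrm{obs}}}.$$
The index of agreement (IOA) was first described by Willmott (1981), and it is described as
$$\mathrm{IOA} = 1 - \frac{\sum_{i=1}^{N}\left(\mathrm{sim}_i - \mathrm{obs}_i\right)^2}{\sum_{i=1}^{N}\left(\left|\mathrm{sim}_i - \overline{\mathrm{obs}}\right| + \left|\mathrm{obs}_i - \overline{\mathrm{obs}}\right|\right)^2}.$$
It is a standardized measure of the degree of model prediction error and varies between 0 and 1. A value of 1 indicates a perfect match, and 0 indicates no agreement at all. The index of agreement can detect additive and proportional differences in the observed and simulated means and variances; however, it is sensitive to extreme values due to the squared differences (Legates and McCabe, 1999).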
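The measures above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `evaluation_metrics` is hypothetical, stde is assumed to be the population standard deviation of the paired differences (which makes the stated RMSE relation hold exactly), and skvar is assumed to be the simulated divided by the observed standard deviation.

```python
import numpy as np

def evaluation_metrics(sim, obs):
    """Return bias, stde, RMSE, skvar and IOA for paired model/observation values."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Keep only pairs where both the model value and the observation exist.
    ok = np.isfinite(sim) & np.isfinite(obs)
    sim, obs = sim[ok], obs[ok]

    bias = sim.mean() - obs.mean()
    # Assumption: population standard deviation (ddof=0) of the error,
    # so that RMSE**2 == bias**2 + stde**2.
    stde = np.std(sim - obs)
    rmse = np.sqrt(bias**2 + stde**2)
    # Assumption: skvar is simulated over observed variability (optimum 1).
    skvar = np.std(sim) / np.std(obs)
    # Index of agreement in the form given by Willmott (1981).
    denom = np.sum((np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    ioa = 1.0 - np.sum((sim - obs) ** 2) / denom
    return {"bias": bias, "stde": stde, "rmse": rmse, "skvar": skvar, "ioa": ioa}
```

With this definition of stde, the returned RMSE equals the usual root of the mean squared difference between sim and obs, so either form can be used as a cross-check.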
The cost function (cf) field, introduced by Berntsen and Svendsen (1999) and later adapted by Søiland and Skogen (2000), is a measure for discrepancies of parameter F between model and observations, normalized by the standard deviation of the observations,
$$\mathrm{cf}_i = \frac{F_{\mathrm{sim},i} - F_{\mathrm{obs},i}}{\max\left(F_{\mathrm{SD}}, F_{\mathrm{SD\,min}}\right)},$$
where $F_{\mathrm{SD}}$ is the standard deviation of the observations and $F_{\mathrm{SD\,min}}$ denotes the minimum allowed amount of the standard deviation, which then prevents the cf from going into infinity. The cost function is the mean of the absolute cost function values of the field the analysis has been applied to. For example, a cf value of 0.5 means that the model error is on average 0.5 times the standard deviation of observations. So, the difference between model and observation is related to the normal variation of the field variable (Søiland and Skogen, 2000).
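As a hedged illustration of the cost function described above, the sketch below normalizes the model-observation differences by the observed standard deviation, with a lower bound on that standard deviation, and averages the absolute values. The function name `cost_function` and the default floor `sd_min=0.1` are assumptions made for this example; the study does not state the value it used.

```python
import numpy as np

def cost_function(sim, obs, sd_min=0.1):
    """Mean absolute model error in units of the observed standard deviation."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = np.isfinite(sim) & np.isfinite(obs)
    sim, obs = sim[ok], obs[ok]
    # The floor sd_min (hypothetical default) keeps the ratio finite when the
    # observations hardly vary.
    sd = max(np.std(obs), sd_min)
    return np.mean(np.abs(sim - obs) / sd)
```

A return value of 0.5 then reads exactly as in the text: the model error is on average half the standard deviation of the observations.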

Methods
The southern North Sea has different regions with different characteristics. To take that into account, three positions for detailed investigation have been selected for the time period of 2006-2013 (Fig. 1). Position p1 is situated near the coast and not far from the mouth of the Humber estuary. It is influenced both by the freshwater discharge from the Humber estuary and by the southerly flowing cold Scottish coastal water current (< 15 °C), originating from the North Atlantic (OSPAR, 2000) (Fig. 2).
The second point (p2) is located near the Oyster Ground, a region with water depths of up to 40 m. TorDania travels along the German and Dutch coasts to England and back. In Petersen et al. (2011)  this point. The region is thermally stratified in the summer season and belongs to a transition zone between the stratified central North Sea and the well-mixed coastal zones (Fig. 2). Due to spring algae bloom, stratification in the summer leads to low oxygen concentration, which is a serious problem considering the predicted warming climate and oceans (Queste et al., 2013). Together with salty water (> 35) from the English Channel, frontal zones form in this region, as has been observed e.g. by FerryBox measurements (Petersen et al., 2011).
The German Bight area, where p3 is situated, is influenced by the Continental Coastal Water current, the input of freshwater, and the exchange processes between the Wadden Sea and the North Sea (e.g. exchange of nutrients, suspended matter, tidal flow). The German Bight also has one of the highest tidal amplitudes of the North Sea (> 4 m) (OSPAR, 2000).
Model data have been taken from the HZG model archive (BSHcmod) and from the MyOcean database (AMM7). For BSHcmod, data from the surface box down to 5 m depth were taken, with instantaneous grid values at 15 min resolution. AMM7 is taken from the surface box of the model and has instantaneous values for a 7 km grid box mean every hour. As the model uses an S-sigma coordinate vertical discretisation, the depth of these data varies as a function of the total water depth from about 10 cm in the shallowest waters to approximately 1 m in the deeper parts of the southern North Sea.
FerryBox data with 10 s resolution have been taken from the HZG FerryBox database. For the detailed analysis of the three positions in the North Sea, an internal search routine of the HZG database has been applied with a search radius of 5 km around the fixed positions p1, p2 and p3 (5 km is the default search radius). The retrieved FerryBox time series have been interpolated using a nearest neighbour approach for model time steps with a time range of ±30 min. Also, the nearest model grid point of BSHcmod and AMM7 has been allocated to those fixed positions.
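The selection and matching steps described above could look roughly like the following sketch. It is not the HZG database routine: the function names, the use of the haversine great-circle distance and the representation of times as seconds since an arbitrary epoch are all assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for the great-circle distance

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def match_to_model_times(obs_time, obs_lat, obs_lon, obs_val,
                         model_time, lat0, lon0,
                         radius_km=5.0, max_dt_s=1800.0):
    """Nearest-neighbour match of observations near (lat0, lon0) to model times.

    Times are seconds since a common epoch. Model steps without an observation
    within max_dt_s (30 min by default) are returned as NaN.
    """
    obs_time = np.asarray(obs_time, dtype=float)
    obs_val = np.asarray(obs_val, dtype=float)
    obs_lat = np.asarray(obs_lat, dtype=float)
    obs_lon = np.asarray(obs_lon, dtype=float)
    model_time = np.asarray(model_time, dtype=float)

    # Step 1: keep observations inside the search radius around the position.
    near = haversine_km(obs_lat, obs_lon, lat0, lon0) <= radius_km
    t, v = obs_time[near], obs_val[near]

    # Step 2: for each model time step take the closest observation in time,
    # provided it lies within the allowed time window.
    out = np.full(model_time.shape, np.nan)
    if t.size == 0:
        return out
    for i, tm in enumerate(model_time):
        j = np.argmin(np.abs(t - tm))
        if abs(t[j] - tm) <= max_dt_s:
            out[i] = v[j]
    return out
```

The returned array is aligned with the model time axis, so it can be compared directly with the time series of the nearest model grid point.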
For the evaluation of the complete transect between the UK and Germany, FerryBox data from 5 m depth along the complete transect with a time resolution of 10 s have been sampled on a longitudinal grid with intervals of 0.05° length for 3 years. For each position of this track, a time series with hourly resolution has been created. Accordingly, model data of BSHcmod v4 and AMM7 have been interpolated on the same longitudinal track with intervals of 0.05°.
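One way to place 10 s FerryBox samples of a single crossing onto a 0.05° longitudinal grid is to average all samples that fall into each longitude interval. The sketch below does this; the function name `bin_by_longitude` and the choice of bin averaging (rather than, say, picking the nearest sample) are assumptions, since the text does not specify how the sampling was done.

```python
import numpy as np

def bin_by_longitude(lon, val, lon_min, lon_max, step=0.05):
    """Average values in longitude bins of width `step` (degrees).

    Returns the bin centres and the bin means; empty bins are NaN.
    Longitudes are assumed to be finite.
    """
    lon = np.asarray(lon, dtype=float)
    val = np.asarray(val, dtype=float)
    n_bins = int(round((lon_max - lon_min) / step))
    idx = np.floor((lon - lon_min) / step).astype(int)
    # Discard samples outside the grid and samples with missing values.
    ok = (idx >= 0) & (idx < n_bins) & np.isfinite(val)
    sums = np.bincount(idx[ok], weights=val[ok], minlength=n_bins)
    counts = np.bincount(idx[ok], minlength=n_bins)
    means = np.full(n_bins, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    centres = lon_min + (np.arange(n_bins) + 0.5) * step
    return centres, means
```

Applying the same function to model values interpolated along the ship track gives two series on an identical grid, which is what the transect statistics require.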

Validation of FerryBox data
A calibration with discrete samples was done to validate the FerryBox salinity measurements. On both ships, TorDania and Lysbris, water samples have been taken at fixed stations and analysed in the laboratory. Generally, it is not feasible to compare FerryBox water temperature measurements to water samples analysed in the laboratory, so instead a cross-check between the TorDania FerryBox and MARNET observations was done.
In Fig. 3, comparisons of FerryBox measurements and laboratory analyses of salinity for TorDania and Lysbris are shown. The water samples are taken regularly along the FerryBox transect from 2007 to 2011 and from 2009 to 2012, respectively. In both cases the data correspond very well, and only a few outliers were observed. Note the different scales of salinity in the graphs. In the case of Lysbris, a higher range of salinity values is covered. This is due to the included FerryBox route section in the Elbe River estuary up to the port of Hamburg. The correlation is 0.96 for TorDania and 0.99 for Lysbris, which indicates a high reliability of FerryBox salinity measurements. The RMSE for Lysbris salinity is slightly lower than for TorDania (0.68 compared to 0.79).
For the evaluation of water temperature accuracy, MARNET measurements were compared to FerryBox observations for the German Bight region. TorDania passes the MARNET station (p3) every second day on its way between Cuxhaven (GER) and Immingham (UK). Only TorDania data from 2007 until 2011, recorded less than 10 km away from MARNET, have been considered.
For both parameters, water temperature (left panel) and salinity (right panel), good agreement is seen in Fig. 4. TorDania water temperature measurements are higher than corresponding MARNET observations. The bias (FerryBox-MARNET) amounts to 0.37 °C. The stde of the FerryBox in regard to the "truth" that MARNET provides amounts to 0.42 °C. The bias is probably due to the relatively long time lag from the time water is pumped into the FerryBox to the time it reaches the FerryBox sensors. In comparison, the MARNET temperature sensors are submerged in the water. Due to this potential bias, FerryBox water temperature measurements have been corrected using a simple additive correction method (Sperna Weiland et al., 2010).
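The text does not spell out the additive correction. A minimal sketch, assuming that a single constant offset (the mean FerryBox-minus-reference difference over paired values) is subtracted from every FerryBox temperature, could look like this; the function name `additive_bias_correction` is hypothetical.

```python
import numpy as np

def additive_bias_correction(ferrybox, reference):
    """Subtract the mean FerryBox-minus-reference offset from the FerryBox series.

    Assumption: one constant offset is applied to the whole series.
    Returns the corrected series and the offset that was removed.
    """
    ferrybox = np.asarray(ferrybox, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Estimate the offset only from pairs where both values are present.
    ok = np.isfinite(ferrybox) & np.isfinite(reference)
    offset = np.mean(ferrybox[ok] - reference[ok])
    return ferrybox - offset, offset
```

For the comparison reported above, such an offset would correspond to the 0.37 °C bias between FerryBox and MARNET.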
Grayek et al. (2011) also compared FerryBox data to MARNET observations and the OSTIA satellite data package (Donlon et al., 2009) and found similar agreement between the temperature data sets.
At MARNET station, water temperatures are measured at several depths, including 3 and 6 m. To get a more concise picture of variation of water temperature in the surface layer, data at both depths have been analysed. The mean water temperature difference is 0.09 °C and the corresponding standard deviation is 0.27 °C. The standard deviation of the time series is 5.32 (3 m) and 5.4 (6 m). Together with the findings of Grayek et al. (2011), this suggests that the vertical mismatch between FerryBox intake depth and model layer depths has only a small influence on the outcome.
The time series of salinity are also in good agreement (Fig. 4). However, the figure shows a higher scattering than for water temperature, with the MARNET station observing higher values. The standard deviation of the difference amounts to 0.57; the determination coefficient amounting to 0.82 is, thus, not as high as for water temperatures. Overall, the agreement between FerryBox and MARNET salinity observations is good; the 10 km distance between FerryBox measurements and MARNET data, along with the large influence of tides and river discharge on salinity, may explain the lower correlation.
All in all, this suggests that different FerryBox sensor observations are reliable, and that there is high agreement between different measurement systems (FerryBox and MARNET). The FerryBox parameters water temperature and salinity are well suited for comparison with model data, which will be described in the next section.

Transect comparisons for the southern North Sea
Together with model output of BSHcmod v4 and AMM7, the complete TorDania transect between Germany and England has been analysed regarding differences in simulated and observed water temperatures and salinity.

Water temperature
In Fig. 5

BSHcmod
At first glance, water temperature differences range around ±1.5 °C for BSHcmod. Several spatial aspects can be determined in combination with Fig. 6a, which shows the temporal mean along the transect of bias, standard deviation of error (stde), RMSE, and skill variance (skvar). In Fig. 5a, differences are mainly positive in winter months and mainly negative in summer months. The bias in Fig. 6 ranges between −1 and +0.4 °C, while the stde varies around 1 °C. Thus, the RMSE is around 1 °C. The mean bias for the whole transect is −0.02 the variability of the data. The optimal value is 1. In Fig. 6, the overall skvar ranges between 0.8 and 1.1.
Close to the English coast, temperature differences are systematically negative, dropping down to −2 °C. In Fig. 6a the corresponding bias is also −2 °C for this region. The stde reaches 2 °C and the skvar reaches its minimum amounting to 0.8. Eastward, between 0.5 and 2° E, biases are around 0.5 °C, stde is 0.8 °C and the RMSE is 1 °C. Figure 5a indicates that there is a seasonal cycle in the bias around 1° E, with differences dropping below −1 °C in the late summer season, during all 3 years. Normalized differences are small and positive in the winter. This seems to be a systematic model error, probably caused by too weak vertical mixing in the Scottish coastal water current in this time of year. It could also mean that the flow of colder Atlantic water is overestimated by the model. It should be noted that the TorDania transect crosses the southern North Sea approximately along the transition zone between the stratified and well-mixed regions of the southern North Sea (Figs. 1 and 2.10 in OSPAR, 2000), and therefore small errors in the position of the seasonal front will cause biases in this region. In the central part of the transect, stde and RMSE range around the average values of 0.72, while in the German Bight (east of 7° E), both reach values of 1 °C. Near the English coast, a local maximum of 0.8 °C (stde) and 1.1 °C (RMSE) is visible, together with a local minimum of bias and skvar, amounting to −0.8 °C and 0.8, respectively. On the central parts of the transect, bias varies around ±0.3, while skvar ranges around 0.9 and 1.0. In the German Bight, skvar reveals an overestimation of simulated water temperature variability near the German coast.

AMM7
Results for AMM7 (Fig. 5b) show generally good agreement with FerryBox observations for April 2011 to April 2012, as the bias for the whole transect amounts to 0.19 °C (Table 1). However, some weaknesses are also revealed in AMM7 simulations of water temperatures off the eastern English coast near 0.5° E and in the German Bight in 2011. The differences are as high as 2 °C near the English coast, and between −1 and +1 °C in the German Bight, depending on the seasons (overestimation in summer, underestimation in winter).
The statistical measures for AMM7 are shown in Fig. 6b, confirming the results in Fig. 5b: stde and bias show two local extreme positions, near the English coast and in the German Bight. The skvar is around 1 or slightly higher, reflecting good model performance for water temperature variability.
Overestimation of water temperatures near the English coast in 2011 around 0.5° E indicates that the FerryBox observed a drop in temperature of around 1-2 °C in this area, which AMM7 did not catch entirely. In this region, the cooler Scottish coastal water current (characterized by temperature < 15 °C, OSPAR, 2000) seems to be underrepresented in AMM7. Keeping in mind that there is generally good agreement between AMM7 and observations, this suggests that the horizontal grid resolution of 7 km may not be sufficient to reflect the highly variable temperature field in this complex area. This holds also for BSHcmod near the English coast. While bias is around zero, the stde peaks to over 1 °C, reflecting the seasonal dependence of differences for AMM7 at the German Bight.

Salinity
As for water temperatures, the error in simulated salinity of BSHcmod and AMM7 has been calculated for the whole transect and is shown in Fig. 7a and b. Positive (negative) values show overly high (low) simulated salinity values. For both models, differences can be divided into three sectors along the TorDania transect. Both coastal zones (English eastern coast and the German Bight) are dominated by large negative differences, whereas in the central part absolute differences are considerably lower, negative for AMM7 in most parts, and positive for BSHcmod. However, the differences in the central part are not significant, as they are lower than twice the SD of the FerryBox data.
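
The significance criterion used here (differences beyond twice the FerryBox SD) can be sketched as a simple classification. The function name and the labels are hypothetical. The SD values of 0.4 for salinity and 0.42 °C for temperature are inferred from the thresholds of ±0.8 and ±0.84 °C given in the figure captions, not taken from a stated SD.

```python
def classify_differences(diffs, ferrybox_sd):
    """Label model-minus-observation differences against a 2 x SD threshold."""
    threshold = 2.0 * ferrybox_sd
    labels = []
    for d in diffs:
        if d > threshold:
            labels.append("significant overestimation")
        elif d < -threshold:
            labels.append("significant underestimation")
        else:
            labels.append("not significant")
    return labels

# ASSUMPTION: SD of 0.4 inferred from the +/-0.8 salinity threshold in the captions.
SALINITY_SD = 0.4
example_diffs = [1.0, -3.0, 0.3]  # hypothetical salinity differences
for d, label in zip(example_diffs, classify_differences(example_diffs, SALINITY_SD)):
    print(d, label)
```

With this rule, a central-transect difference of 0.3 is not significant, while a coastal difference of −3 is a significant underestimation.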

BSHcmod
For BSHcmod, positive differences occur in the western part of the transect between 0.5 and 5 °E, meaning an overestimation of salinity by BSHcmod v4. They range between 0 and 1. While model underestimation in the western part of the transect is restricted to the vicinity of the English coast, in the German Bight it extends westward to 6 °E. For the coastal parts of the transect, the bias in Fig. 8a is also negative and the stde rises above the mean value of 0.68, whereas for the central part of the transect the bias is around 0.3 and the stde only 0.2. The mean salinity bias is slightly negative (−0.17, Table 1). The skill variance of the simulated salinity is well below 1 in the western part from 0 to 3 °E, with a minimum of only 0.15 at 0.8 °E. The mean of this sector amounts to 0.52. East of this sector, the skill variance varies between 0.5 and 1.5.

AMM7
The salinity is generally underestimated by AMM7, except for the region between 3 and 6.5 °E [...] differences are significant. Between 5 and 7.5 °E, differences are only significant from September to December 2011. The mean bias of AMM7 amounts to −0.89. Near the coasts, the bias is larger, amounting to −3. This also holds for the stde, which exceeds 2 near the coasts (Fig. 8b). In the central parts of the transect, the bias (stde) shows low variation and is between −1 and 0 (between 0 and 1). The AMM7 salinity skvar is between 0.3 and 1.2 over the whole transect, so overall no spatial dependence could be found.
A combination of several factors seems to be responsible for the underestimation of salinity in the German Bight in both models. First of all, the runoff from the Elbe River, and thus the freshwater input into the region, seems to be overestimated, although BSHcmod v4 includes daily averaged runoff rates of German rivers. For AMM7, climatological runoff is provided. An underestimation of vertical mixing in the BSHcmod v4 simulation possibly contributes to the underestimation of salinity, because too little of the more saline bottom water is mixed into the top layer sampled by the FerryBox. In BSHcmod the western boundary of the high-resolution grid nested into the coarse North Sea grid is located at 6°10′25″ E, which coincides with the boundary of the region with underestimated salinity. An inconsistency in the two-way nesting scheme for current velocity, which has since been corrected, had a negative impact on the advection of salinity across the nesting boundary during the analysed simulation period and most probably contributed substantially to the underestimation of salinity in the German Bight. Further studies of vertical (and horizontal) mixing as well as investigations of the interactive coupling scheme have to be carried out. AMM7, on the other hand, is more limited near the coast in terms of spatial resolution than BSHcmod. The combination of poor representation of the river inputs along the German coastline, relatively coarse resolution and no representation of wetting and drying limits the AMM7 model in these regions.

Long-term measurement time series
In this section, time series of measurements and model simulations for the time period of 2009 to 2012 are presented. The observations have been recorded by the FerryBox of TorDania and Lysbris. To address the different results along the transect between the UK and Germany, described in the previous sections, three single positions in the southern North Sea have been selected.

English eastern coast
The time series of the water temperature difference at the English coast point (p1) for 2009 to 2011 is shown in Fig. 9a. The figure contains FerryBox data of TorDania and Lysbris, as well as model data of BSHcmod v4 and AMM7. The TorDania time series from 2009 to 2012 has some data gaps in 2009. The time series of Lysbris generally has many gaps, because the vessel is at the same position only every 2 weeks.
Both models show similar behaviour, except for their bias (Fig. 9b). The bias of AMM7 temperature amounts to 0.39 °C, which is surprising since this model assimilates SST. In most other evaluations of the SST against in situ observations, the bias has been an order of magnitude smaller. For example, McLaren et al. (2011) document a bias of 0.02 °C in the southern North Sea as a whole, and of −0.01 °C for a buoy in the German Bight. The BSHcmod v4 bias is below zero, amounting to −0.28 °C. The temperature variability is matched well by both models, as the simulated SD is nearly the same as the observed, and skvar is around 1. Seasonal variation is simulated well by AMM7. However, BSHcmod v4 slightly overestimated the low winter water temperatures in January 2011 and underestimated the summer temperature maximum, resulting in positive differences in winter and negative differences in summer. This also holds for 2010. Despite these mismatches, the IOA of BSHcmod v4 is 0.99, as high as for AMM7. This is also visually demonstrated by the high level of agreement shown in the scatterplot of Fig. 9b.
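
Why the IOA can stay near 1 despite a seasonal mismatch can be illustrated with a small sketch. It assumes that the IOA is Willmott's index of agreement, d = 1 − Σ(m − o)² / Σ(|m − ō| + |o − ō|)², and it uses a synthetic annual cycle. The function name, the 10 °C mean and the amplitudes of 6 and 5 °C are hypothetical illustration values, not data from this study.

```python
import math

def index_of_agreement(model, obs):
    """Willmott's index of agreement (ASSUMED to match the IOA used in the text)."""
    mean_o = sum(obs) / len(obs)
    numerator = sum((m - o) ** 2 for m, o in zip(model, obs))
    denominator = sum((abs(m - mean_o) + abs(o - mean_o)) ** 2
                      for m, o in zip(model, obs))
    return 1.0 - numerator / denominator

# Synthetic monthly temperatures: the "model" has a damped seasonal amplitude,
# so it is too warm in winter and too cold in summer (hypothetical values).
months = range(12)
obs = [10.0 + 6.0 * math.sin(2.0 * math.pi * k / 12.0) for k in months]
model = [10.0 + 5.0 * math.sin(2.0 * math.pi * k / 12.0) for k in months]

print(round(index_of_agreement(model, obs), 3))
```

Because the index normalises the squared differences by the large seasonal range, a model that misses 1 °C of a 6 °C amplitude still scores about 0.99 in this example. A high IOA for temperature therefore does not rule out systematic seasonal errors of the kind described for BSHcmod v4.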
Results of the comparison between salinity observations and simulations for the eastern England coast are shown in Fig. 9c, and statistical measures in Fig. 9d. In the time period of 2009-2012, observations range between 30 and 35, with a mean value of 33.03. Some low-salinity events with values below 30 occur, mainly in winter months. These low-salinity events are not entirely reproduced by BSHcmod in 2010 and 2011, resulting in large positive differences. Generally, BSHcmod v4 salinity ranges around 33.67, with a bias of 0.64.
AMM7, starting in April 2011, gives salinity values between 30 and 34, with a bias of −0.72. The mean FerryBox salinity for the AMM7 period is 33.38. The skvar for AMM7 is 0.94, which is better than for BSHcmod (0.46), but the IOA is slightly higher for BSHcmod (0.53) than for AMM7 (0.37). BSHcmod does not capture the high variability seen in the observations, with the variation mainly showing oscillatory changes as would be expected from water mass movements due to tidal fluctuations near the English coast. In conclusion, BSHcmod v4 results overestimate, and AMM7 results underestimate, salinity. This is also shown by the different signs of the cost function (cf) results (negative for BSHcmod v4, positive for AMM7).
The reduced level of agreement in both models can for the most part be explained by the model forcing concerning freshwater discharges. For most rivers entering the North Sea and the Baltic Sea, BSHcmod uses either river runoff data derived from measured water levels or runoff forecasts of a hydrological model of the Swedish Meteorological and Hydrological Institute (SMHI). For British rivers, BSHcmod uses constant annual mean values. Therefore, at the eastern coast of England, the BSH model shows only weak seasonal fluctuations and is not able to simulate the large observed fluctuations. The AMM7 model also uses climatological runoff data for British rivers, but monthly variations are included, and this is visible in Fig. 9c.

Oyster Ground
At Oyster Ground point p2, BSHcmod and AMM7 simulations of water temperatures match observations most of the time. The water temperatures differ mainly in summer (upper left panel), with differences ranging between −1.8 and 1.8 °C. The annual cycles of the two models are also similar (not shown). Agreement is apparent in the statistical analyses, shown in Fig. 10a. The statistical measures in Fig. 10b are in a similar range as for the English coast (p1), giving 0.99 for the IOA and near 1 for skill variance (skvar). The bias for both models is low, slightly negative for BSHcmod v4 (−0.02 °C) and positive for AMM7 (0.15 °C). In Fig. 10c, the time series of salinity difference for the Oyster Ground point p2 is shown. The mean level of observed salinity (mean value = 34.43) has been slightly overestimated by BSHcmod v4 (mean value = 34.68) and underestimated by AMM7 (mean value = 34.11). This is visible in Fig. 10c, which shows mainly positive differences for BSHcmod and negative differences for AMM7. The observed variability was not reproduced by either model. Although AMM7 skvar is around 1, the IOA is only 0.3 (for BSHcmod v4 0.53) (Fig. 10d).
As already described in Petersen et al. (2011), low-salinity intrusions can be observed in that North Sea region, often originating from the Rhine/Maas River estuary. The salinity dropped in 2011 to a level of 33.5. In 2008, an even more pronounced salinity drop to 32 was observed (not shown). The drop event of 2011 has been recognized by BSHcmod v4 and AMM7; however, the amplitude has been underestimated, resulting in large differences between model and in situ data. This is also visible in Fig. 7b as positive values between April and June 2011 for AMM7. However, subsequent to the observed salinity drop, AMM7 shows a second, even more pronounced drop in summer 2011, which was neither observed by the FerryBox nor simulated by BSHcmod v4 at that position.
Therefore, both models are able to simulate riverine influence in most of the North Sea, except near river outflows. However, mixing of coastal and estuarine water is probably underestimated in the models. It is known, for example, that the AMM7 model underestimates the tidal amplitudes in the German Bight (MyOcean QuID, McLaren et al., 2015), which will result in reduced flushing of the freshwater input to the region. This is likely to be partially responsible for the underestimates of salinity in the region.
Moreover, long-persisting low-salinity water masses, as reported by Petersen et al. (2011), seem to cover only small spatial scales and could be missed either by the model or by the FerryBox travelling along the route, resulting in larger discrepancies between model and FerryBox. In this context, the different spatial characteristics of model and FerryBox should be noted. Whereas the FerryBox samples data at spots along a track, the model represents means over an area of several tens of square kilometres.
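
The effect of these different spatial characteristics can be sketched with a purely synthetic example. The salinity values, the 1 km sampling interval and the width of the low-salinity patch below are hypothetical. Only the 7 km cell size corresponds to the AMM7 resolution mentioned earlier, and the example is reduced to one dimension along the track.

```python
def cell_mean(values):
    """Average of the point values falling inside one model grid cell."""
    return sum(values) / len(values)

# Synthetic along-track salinity at 1 km spacing (hypothetical values):
# a 3 km wide low-salinity patch embedded in more saline water.
track = [34.5, 34.5, 32.0, 32.0, 32.0, 34.5, 34.5]

point_minimum = min(track)          # what a point sampler can record
grid_value = cell_mean(track)       # what a 7 km cell average represents
mismatch = grid_value - point_minimum

print(point_minimum, round(grid_value, 2), round(mismatch, 2))
```

In this example the cell average stays about 1.4 above the lowest point value, although both describe the same water. Part of a model-minus-FerryBox difference near small-scale features can therefore arise from the sampling geometry alone.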

MARNET German Bight station
In Fig. 11a, the annual cycles of water temperatures for MARNET, FerryBox on TorDania and both models are shown. The highest water temperature amplitude of the analysed time period is observed in 2010, with an 18 °C seasonal water temperature range and summer water temperatures around 19 °C. In 2011, the summer water temperatures were lower, reaching only 16-18 °C. In general, BSHcmod v4 data are in agreement with the observations. In 2010 and 2011, BSHcmod v4 agrees well with MARNET and FerryBox observations, except for two episodes: in 2010, the observed temperature maximum in July is recognized correctly by the model, but the temperatures in September are too low. In 2011, the summer maximum in July and August is underestimated by up to 2 °C. It is clearly seen in Fig. 11a that not only the annual cycle but also smaller variations are present in the BSHcmod v4 model. In Fig. 11b the scatter of water temperatures reflects the overall good agreement with observations. There is also good agreement between AMM7 and observations for the year 2011 (Fig. 11a).
Consequently, the bias of AMM7 water temperatures is −0.24 °C and, thus, is in the range of p1 and p2. The bias of BSHcmod v4 is 0.02 °C; however, the stde of BSHcmod v4 is higher (1.2 °C) than for the English coast (0.99 °C) and for the Oyster Ground (0.87 °C), while for AMM7 the stde is slightly lower (0.98 °C). The skvar and the IOA are near the ideal value of 1 for both models. Also, the cost function is near zero for both models, meaning that simulations are well within the standard deviation of observations. However, both values are higher than for p1 and p2.
Figure 11c shows the time series of salinity, which features three large salinity drops below 31, in June 2010, January 2011 and May 2011. The first one lasts more than 1 month and is represented by BSHcmod v4, albeit later than observed. The next low-salinity event in January 2011 is also seen in the BSHcmod results, although the freshening is slightly underestimated. The third event is recognized by BSHcmod v4; however, whereas the observed salinity quickly returns to its previous level, BSHcmod salinity remains low for the summer period. AMM7 does not represent well the timing or variability shown in the observations.
In summer, the simulated salinity drops to below 32, while observations show values of around 33. This holds not only for the MARNET position, but also for the German Bight east of 6 °E. Salinity observations of MARNET have been analysed for two depths, 6 and 30 m. In summer, differences between the surface and the bottom show a (thermally induced) stratification, also apparent in salinity. However, the stratification is not stable throughout the summer. Several episodes of mixing are reflected by very low temperature differences between surface and bottom (< 0.2 °C). During stratification, temperature differences of 3.5 °C can occur. We assume that both models overestimate the thermal stratification (and, thus, underestimate vertical mixing) in summer, leading to fresher water masses. This would hold for 2010 and 2011.
The statistical measures are shown in Fig. 11d. The bias of the salinity simulation is negative for BSHcmod v4 (−0.33) at the German Bight, while it is positive at the English coast and Oyster Ground. The bias for AMM7 is on the same level as for BSHcmod v4, i.e. negative with a value of −0.39. This is in line with p1 and p2. It should be noted that the statistics for AMM7 and BSHcmod are calculated from time series of different lengths, so despite the differences in statistics shown here, both models behave similarly over the period in which data are available for both. Skvar and IOA are much lower for salinity than for water temperature. Yet, both models mostly perform better for the German Bight than for the other regions.

Discussion
The statistical tests indicate that AMM7 could be improved by reducing the offset of mean temperature levels (AMM7 0.19 °C, Table 1). We think that poor representation of river flows is a major contributor to the biases shown in the models. The bias in the AMM7 seems higher than one would expect in an assimilating model and is slightly at odds with previous results (e.g. O'Dea et al., 2012, who find a bias of around 0.1 °C). Given the higher than expected biases in this study compared to others, observational errors must be considered. For example, we calibrated the FerryBox data using data of a single MARNET station. This may have introduced errors into the calibration given that spatial gradients are high in the region.
There is a slight misfit in BSHcmod simulations of the annual cycle of water temperatures (too low in winter, too high in summer). The bias is near zero, but the RMSE is twice as high as for AMM7. Both models reveal deficits in the prediction of variations of water temperatures near the coasts, and in particular in the cold Scottish coastal current (only for BSHcmod v4). That is probably due to weak vertical mixing or overestimation of cold water currents in BSHcmod, especially at the end of summer. This particular circumstance has to be further investigated to deepen the understanding of the underlying processes. BSH is currently transitioning to a new model code (HBM, HIROMB-BOOS model) which uses a different vertical mixing scheme. We recommend further model evaluation to analyse the expected benefits from that transition.
Comparisons of salinity show much higher differences between observations and simulations and reveal geographical dependencies of the model performance. Altogether, both models show certain limitations.
BSHcmod does not properly capture the variability or the correct salinity range in the German Bight east of 6 °E. This may be due to a deficient model input of freshwater river forcing. Otherwise BSHcmod v4 generally captures salinity accurately for the open North Sea. AMM7 generally performs well in the central parts of the North Sea, but misrepresents the salinity distribution near the coasts.
Low-salinity events occurring in the southern North Sea are caught by BSHcmod v4 and AMM7 to some extent. In order to improve salinity values in the models, we recommend using validated daily freshwater input data for all main rivers entering the North Sea.
The models' representation of vertical and horizontal mixing as well as river boundary conditions should be further studied. In addition, for BSHcmod v4, the nesting process of different grid sizes also has to be further evaluated.
FerryBox measurements, routinely validated for accuracy and precision using external checks and laboratory analyses, can serve as a reliable proxy for the state of the surface temperature and salinity variations in the North Sea. The operational FerryBox measurements are routinely checked against water probes. Salinity measurements are validated against laboratory analyses and revealed good results. The FerryBox and the MARNET measurements are also in good agreement. There is a bias of 0.37 °C in the water temperature measurements from the FerryBox, most likely caused by warming inside the system. We recommend cross-checking the water temperature instruments with an additional certified temperature probe on board.
While FerryBox measurements are done along transects in European marginal seas, fixed stations provide longer time series at a particular site, but lack spatial information for the neighbouring regions. In this study, using the FerryBox and the MARNET data sets, both types of measurements were examined.
Previously, FerryBox transect data have been successfully assimilated in North Sea models, as has been demonstrated by Stanev et al. (2011) and Grayek et al. (2011). The latter have shown that FerryBox data provide reliable information with limited coverage. They could be analysed in parallel with satellite-derived SST data (extracted from the OSTIA data set, Donlon et al., 2009) and with other measurements from fixed stations to increase the information efficiency derived from the FerryBox data. For the Aegean Sea, Korres et al. (2009) also assimilated FerryBox sea surface salinity (SSS) data together with AVHRR sea surface temperature data into a hydrodynamic model. They showed that the assimilation of satellite SST data enhanced the model performance. Additional FerryBox salinity data helped to improve model results even more, by significantly decreasing the RMSE statistics for the southern Aegean Sea.
Data assimilation of FerryBox data is performed in most cases using a Kalman filter approach to extrapolate 1-D data onto 2-D fields. Since the influence of the assimilated FerryBox data is restricted to a rather shallow area around the FerryBox track, one method for data assimilation could be the use of particle tracking algorithms for (approximately) conservative parameters like temperature and salinity in combination with 2-D North Sea current fields, e.g. of operational BSHcmod. A data assimilation scheme for operational use is under development at BSH (Losa et al., 2012, 2014). It is based on the local singular evolutive interpolated Kalman (SEIK) filter algorithm, which has been coded within the Parallel Data Assimilation Framework (PDAF). So far this method has been tested during the assimilation of satellite-derived SST data along with vertical temperature and salinity profiles.
The AMM7 model already assimilates SST from ships of opportunity (SoO) managed under the Joint WMO-IOC Technical Commission for Oceanography and Marine Meteorology, where telecommunications have been established to transmit data via the Global Telecommunications System (GTS) in real time. FerryBox data, like the ones used in this study, could also be included relatively easily, if communications allowed it.
The operational implementation of FerryBox data is one of the next steps for completion of the scheme. An important next step is overcoming the delayed-mode limitation of FerryBox measurements for assimilation into operational forecast modelling systems. This has been partly achieved already, mainly at recently installed FerryBoxes using satellite communication. For operational assimilation, operational post-processing of FerryBox data for quality assessment is also necessary and has also been partly established. Recommendations for real-time FerryBox data processing have been formulated, e.g. in the Data Management, Exchange and Quality (DATA-MEQ) EuroGOOS working group, and are described in Petersen (2014).
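
The basic idea behind a Kalman-type correction can be sketched for a single surface value. This is a minimal scalar illustration under stated assumptions, not the local SEIK filter or the PDAF implementation: the function name, the forecast and observed salinities and both error variances are hypothetical.

```python
def kalman_update(forecast, forecast_var, obs, obs_var):
    """Scalar Kalman analysis step: blend forecast and observation by their variances."""
    gain = forecast_var / (forecast_var + obs_var)
    analysis = forecast + gain * (obs - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var, gain

# Hypothetical example: model salinity 33.0 (error variance 0.25),
# FerryBox salinity 34.0 (error variance 0.16).
analysis, analysis_var, gain = kalman_update(33.0, 0.25, 34.0, 0.16)
print(round(analysis, 2), round(analysis_var, 3), round(gain, 2))
```

The analysis is pulled towards the observation in proportion to the relative forecast uncertainty, and its variance is smaller than that of either input. An ensemble scheme such as SEIK estimates the forecast covariance from an ensemble and applies the same principle to neighbouring grid points, which is how track data can influence a 2-D field.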

Conclusions
In this study, we compared the hydrodynamic model simulations of BSHcmod and AMM7 to continuous operational FerryBox and MARNET in situ water temperature and salinity observations along the FerryBox route from England to Germany, as well as in detail for three positions also situated along the transect. For water temperatures, data assimilation gives a significant benefit for AMM7, reducing the RMSE to 0.44 °C compared to BSHcmod (no data assimilation, RMSE amounting to 0.72 °C). The bias in the AMM7 seems higher compared to other studies. This may be partly explained by the inability of the model to represent river input.
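
The size of this difference can be expressed as a relative reduction, as in the short calculation below. The function name is hypothetical. Note that the two RMSE values stem from different evaluation periods (2009-2011 for BSHcmod, April 2011 to April 2012 for AMM7), so the resulting number is indicative only.

```python
def relative_reduction(reference, value):
    """Fractional reduction of `value` relative to `reference`."""
    return (reference - value) / reference

rmse_bshcmod = 0.72  # degrees C, transect mean, no data assimilation
rmse_amm7 = 0.44     # degrees C, transect mean, with SST assimilation

reduction = relative_reduction(rmse_bshcmod, rmse_amm7)
print(f"{100.0 * reduction:.0f} % lower RMSE")
```

The AMM7 transect RMSE is thus roughly 39 % lower than that of BSHcmod v4.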
For salinity, model results reveal limitations, especially near the coasts, where river input, vertical mixing and tidal fluctuations are important features for the variability and general range of salinity.
The operational implementation of FerryBox data would be an important next step, as previous studies showed benefits of assimilation of FerryBox data in North Sea models. Also, the assimilation of SSS data would be beneficial for model performance of salinity simulation, as has been noted by Korres et al. (2009).
Especially near the coasts, weaknesses of the models are apparent. They could be affected by wrong mixing and stratification simulation as well as misfits in river runoff simulation. More realistic river runoff data could increase model performance of salinity simulation.

Figure 1. FerryBox routes and crossing points in the North Sea. The blue line marks the TorDania route Cuxhaven-Immingham and the red lines indicate the Lysbris route England-Norway-Germany. Specific analysis points of FerryBox routes are indicated by black dots and labelled p1, p2, and p3. p1 is situated at the English eastern coast. p2 marks the analysis point in the Oyster Ground area. At p3, MARNET station Deutsche Bucht is located. The hatched area marks the transition zone between well-mixed and stratified surface layers (adapted from OSPAR, 2000, and Becker, 1990).

Figure 3. Comparison of FerryBox salinity measurements and water sample analyses in the laboratory for TorDania (left) and Lysbris (right).

Figure 4. Comparison of water temperature (left) and salinity (right) measurements in the German Bight at geographical point p3 from 2007 to 2011.

Figure 5. Differences in water temperatures for the TorDania transect (left side BSH-TorDania 2009-2011, right side AMM7-TorDania 2011-2012). The eastern England coast is located on the left side, the German Bight on the right side. Positive values indicate model overestimation. Differences are statistically significant beyond ±0.84 K (2-fold SD of FerryBox, in red and purple colours).
In Fig. 5, the water temperature differences from 2009 to 2011 for BSHcmod (a) and from April 2011 to April 2012 for AMM7 (b) are shown. Note the different timescales of the model comparisons in both figures. Positive (negative) differences indicate overly high (low) simulated temperatures. The differences have been marked in the figure according to the double SD of the FerryBox data, which has been described in Sect. 2.7. Thus, differences beyond ±0.84 °C for water temperatures and beyond ±0.8 for salinity are statistically significant. Gaps in the data in 2009 and 2010 are due to FerryBox malfunction.

Figure 7. Differences in salinity for the TorDania transect (left side BSH-TorDania, right side AMM7-TorDania). The eastern England coast is located on the left side, the German Bight on the right side. Positive values indicate model overestimation. Differences are statistically significant beyond ±0.8 (2-fold SD of FerryBox, in red and purple colours).

Figure 9. Upper panels: time series of temperature differences and absolute FerryBox values (in green) (a) and scatterplot of temperatures (b) of FerryBox measurements of TorDania and Lysbris and model results at the eastern coast of England (p1).Lower panels: time series of salinity difference and absolute FerryBox values (in green) (c) and scatterplot of salinity (d) of FerryBox measurements of TorDania and Lysbris and model results at the eastern coast of England (p1).

Figure 10. Upper panels: time series of temperature differences and absolute FerryBox values (in green) (a) and scatterplot of temperatures (b) of FerryBox measurements of TorDania and Lysbris and model results at the Oyster Ground (p2). Lower panels: time series of salinity difference and absolute FerryBox values (in green) (c) and scatterplot of salinity (d) of FerryBox measurements of TorDania and Lysbris and model results at the Oyster Ground (p2).

[...] °C (Table 1); the mean RMSE is 0.72 °C. The skill variance evaluates the model's ability to reproduce [...]

Table 1. Statistical measures for performance analysis of BSHcmod v4 and AMM7.