Modelling temperature and salinity in Liverpool Bay and the Irish Sea: sensitivity to model type and surface forcing

Three shelf sea models are compared against observed surface temperature and salinity in Liverpool Bay and the Irish Sea: a 7 km NEMO (Nucleus for European Modelling of the Ocean) model, and 12 km and 1.8 km POLCOMS (Proudman Oceanographic Laboratory Coastal Ocean Modelling System) models. Each model is run with two surface forcing datasets of different resolution. Comparisons with a variety of observations from the Liverpool Bay Coastal Observatory show that increasing the surface forcing resolution improves the modelled surface temperature in all the models, in particular reducing the summer warm bias and winter cool bias. The response of surface salinity is more varied, with improvements in some areas and deterioration in others. The 7 km NEMO model performs as well as the 1.8 km POLCOMS model when measured by overall skill scores, although the sources of error in the models are different. NEMO is too weakly stratified in Liverpool Bay, whereas POLCOMS is too strongly stratified. The horizontal salinity gradient, which is too strong in POLCOMS, is better reproduced by NEMO, which uses a more diffusive horizontal advection scheme. This leads to improved semi-diurnal variability in salinity in NEMO at a mooring site located in the Liverpool Bay ROFI (region of freshwater influence).


Introduction
The Irish Sea is a semi-enclosed shelf sea located between Great Britain and Ireland; Liverpool Bay is an area of the eastern Irish Sea, bordered by the North Wales and Lancashire coasts. The Irish Sea is a typical coastal sea of the northwest European shelf and is subject to large tides, freshwater influence, high levels of suspended sediment, and human exploitation (e.g. wind farms, oil platforms, shipping). The eastern side of the sea is shallow, with depths generally less than 50 m. A deeper channel runs down the western side of the sea with typical depths of 80-100 m, reaching 150-200 m in parts (Fig. 1c).
Liverpool Bay in particular is considered to be a region of freshwater influence, or ROFI (Simpson, 1997), and receives input from several major river systems including the Conwy, Dee, Mersey, and Ribble estuaries (Fig. 1d). The freshwater input to Liverpool Bay is estimated to be 7.3 × 10⁹ m³ annually (Polton et al., 2011). The interaction of the resulting horizontal salinity gradient with strong tides leads to a cycle of stratified and mixed conditions known as strain-induced periodic stratification (SIPS) (Simpson et al., 1990; Howlett et al., 2011; Polton et al., 2011). Briefly, on the ebb tide, vertical shear in the tidal current (due to bed friction) advects surface fresh water over more saline water, creating stratified conditions. Excluding non-linear effects, on the flood tide the reversed shear flow restores the water column to its original profile.
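For a sense of scale, the annual freshwater input quoted above can be converted to a mean discharge. The short Python sketch below assumes a 365.25-day year and spreads the input uniformly in time, which ignores the strong seasonal and event-driven variability of real river flow; it gives roughly 230 m³ s⁻¹.

```python
# Convert the annual freshwater input to Liverpool Bay into a mean discharge.
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed 365.25-day year

annual_input_m3 = 7.3e9                  # Polton et al. (2011), as quoted in the text
mean_discharge_m3_s = annual_input_m3 / SECONDS_PER_YEAR

print(f"Mean freshwater discharge: {mean_discharge_m3_s:.0f} m^3/s")
```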
The complex dynamics of the Irish Sea, and especially Liverpool Bay, pose a difficult challenge to modellers, particularly in accurately modelling intermittently stratified waters. In this study we aim to quantify and compare the performance of three different models in this region. Two well-established POLCOMS (Proudman Oceanographic Laboratory Coastal Ocean Modelling System) models are used, together with a newer NEMO (Nucleus for European Modelling of the Ocean) model. The NEMO code is currently undergoing rapid development and validation by the modelling community.
Published by Copernicus Publications on behalf of the European Geosciences Union.

Models
The model domains and formulations used in this study were chosen because they are standard, widely used configurations, including in operational systems. Each was run with two atmospheric forcing datasets of different resolution. This combination of runs allows us to investigate the impact of surface forcing resolution on the models, and to evaluate the performance of the newer NEMO configuration against two well-established POLCOMS models.
All models were run for the year 2008, with temperature and salinity output written hourly so that tidal variability is included in the results. Because of this high temporal resolution, output was restricted to surface and bottom values only, which matches the vertical distribution of the majority of the available observations. A summary of all 6 model runs is given in Table 1.

POLCOMS
POLCOMS is a three-dimensional hydrostatic baroclinic model that was designed to be particularly suited to modelling shelf sea regions. A brief summary of notable model features is given here; the full description, including the equations, may be found in Holt and James (2001).
The model is formulated on the Arakawa B grid (Arakawa, 1972), in which both velocity components are defined at the same points, separated from the scalar points by half a grid box. This grid was chosen because it helps to preserve horizontal features such as fronts (Holt and James, 2001). The vertical coordinates are the s-coordinates of Song and Haidvogel (1994): terrain-following levels whose spacing is allowed to vary horizontally. Grid points where the water depth is less than a specified critical depth revert to evenly spaced σ-levels (Song and Haidvogel, 1994). The critical depth used here is 150 m, which is deeper than almost all points within the Irish Sea.
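The s-coordinate stretching can be illustrated with a minimal Python sketch of the Song and Haidvogel (1994) transformation, with the free surface set to zero. The stretching parameters `theta` and `b` below are illustrative assumptions, not the values used in the POLCOMS configurations, and the function name `s_levels` is hypothetical. Following the text, depths shallower than the critical depth h_c = 150 m fall back to evenly spaced σ-levels.

```python
import numpy as np

def s_levels(h, n_levels, h_c=150.0, theta=6.0, b=0.8):
    """Depths (m, negative downwards) of the n_levels + 1 level interfaces.

    Song and Haidvogel (1994) stretching with zero free-surface elevation.
    theta and b are illustrative values, not the POLCOMS settings.
    """
    s = np.linspace(-1.0, 0.0, n_levels + 1)   # s = -1 at the bed, 0 at the surface
    if h < h_c:
        # Shallower than the critical depth: evenly spaced sigma-levels.
        return s * h
    stretch = ((1.0 - b) * np.sinh(theta * s) / np.sinh(theta)
               + b * (np.tanh(theta * (s + 0.5)) - np.tanh(0.5 * theta))
               / (2.0 * np.tanh(0.5 * theta)))
    return h_c * s + (h - h_c) * stretch

z_shallow = s_levels(40.0, 32)        # typical eastern Irish Sea depth, IRS level count
z_deep = s_levels(200.0, 40)          # deepest western channel, AMM level count
print(np.diff(z_shallow)[:3])         # uniform spacing
print(np.diff(z_deep)[[0, 20, -1]])   # spacing near the bed, at mid-depth, near the surface
```

For most of the Irish Sea (depths under 150 m) the sketch therefore returns uniform σ-levels; only deeper points receive the stretched distribution, which for these parameter choices concentrates levels near the surface and bed.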
The advection scheme is the piecewise parabolic method (PPM) of Colella and Woodward (1984). The tracer concentration is assumed to vary across a grid box as a parabola, which is then integrated upwind. This method introduces little numerical diffusion, giving it good feature-preserving properties (James, 1996, 1997). The turbulence closure scheme is the k-ε scheme (Burchard and Baumert, 1995), implemented by coupling with the General Ocean Turbulence Model (GOTM) (http://www.gotm.net). The POLCOMS implementations used in this study do not apply explicit horizontal diffusion.
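To illustrate what low numerical diffusion means in practice, the Python sketch below implements a simplified one-dimensional PPM step (cell-edge reconstruction plus the Colella and Woodward (1984) monotonicity limiter, without their contact steepening or flattening) for constant-velocity advection on a periodic grid, and compares it with first-order upwind advection. The upwind scheme is only a deliberately diffusive reference; it is not the TVD scheme used in NEMO, and the sketch is not the POLCOMS implementation.

```python
import numpy as np

def ppm_step(a, c):
    """Advance cell averages `a` one step on a periodic 1D grid.

    Constant positive velocity with Courant number 0 < c <= 1.
    """
    am, ap = np.roll(a, 1), np.roll(a, -1)            # a[j-1], a[j+1]
    # Monotonised (van Leer limited) slope in each cell.
    da = 0.5 * (ap - am)
    lim = np.minimum(np.abs(da), 2.0 * np.minimum(np.abs(a - am), np.abs(ap - a)))
    slope = np.where((ap - a) * (a - am) > 0.0, np.sign(da) * lim, 0.0)
    # Interface value at j+1/2 (fourth-order where the data are smooth).
    face = a + 0.5 * (ap - a) - (np.roll(slope, -1) - slope) / 6.0
    a_l, a_r = np.roll(face, 1), face.copy()          # values at j-1/2 and j+1/2
    # Flatten the parabola where a[j] is a local extremum.
    flat = (a_r - a) * (a - a_l) <= 0.0
    a_l, a_r = np.where(flat, a, a_l), np.where(flat, a, a_r)
    # Reset one edge value so the parabola stays monotone inside the cell.
    d = a_r - a_l
    a6 = 6.0 * (a - 0.5 * (a_l + a_r))
    new_l = np.where(d * a6 > d * d, 3.0 * a - 2.0 * a_r, a_l)
    new_r = np.where(-d * d > d * a6, 3.0 * a - 2.0 * a_l, a_r)
    a_l, a_r = new_l, new_r
    d = a_r - a_l
    a6 = 6.0 * (a - 0.5 * (a_l + a_r))
    # Mean of the parabola over the fraction c of the cell upstream of j+1/2.
    f = a_r - 0.5 * c * (d - (1.0 - 2.0 * c / 3.0) * a6)
    return a - c * (f - np.roll(f, 1))

def upwind_step(a, c):
    """First-order upwind step: a deliberately diffusive reference scheme."""
    return a - c * (a - np.roll(a, 1))

n, c = 200, 0.5
x0 = np.zeros(n)
x0[50:100] = 1.0                        # a tracer patch with two sharp fronts
ppm, upw = x0.copy(), x0.copy()
for _ in range(int(n / c)):             # one full revolution of the periodic domain
    ppm, upw = ppm_step(ppm, c), upwind_step(upw, c)

err_ppm = np.abs(ppm - x0).mean()
err_upw = np.abs(upw - x0).mean()
print(f"mean abs error after one revolution: PPM {err_ppm:.3f}, upwind {err_upw:.3f}")
```

After one revolution the PPM profile should keep much sharper fronts than the upwind profile, while both conserve the total tracer amount up to rounding.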
Two well-established model domains are used in this study: the Atlantic Margin Model (AMM) (e.g. Wakelin et al., 2009), which covers the northwest European shelf, and a one-way nested Irish Sea Model (IRS) (e.g. Holt and Proctor, 2008). The horizontal resolution of the AMM grid is 1/9° latitude by 1/6° longitude, or approximately 12 km. The nested IRS model has a resolution of 1/60° latitude by 1/40° longitude, around 1.8 km. The geographic extent and bathymetry of the model domains are shown in Figs. 1a and c. The models have 40 (AMM) and 32 (IRS) internal vertical levels, with additional 'virtual' levels placed below the sea bed and above the sea surface for the calculation of flux boundary conditions (Holt and James, 2001). Typical baroclinic Rossby radius values in the western Irish Sea are 1-2 km (Holt and Proctor, 2003), and the semi-diurnal tidal excursion in Liverpool Bay (a more relevant length scale there) is 5-10 km (Hopkins and Polton, 2011). The 1.8 km IRS model is therefore eddy permitting in the western Irish Sea and can resolve the tidal excursion scale, whereas the 12 km AMM model resolves neither scale.
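The quoted resolutions can be checked with a small Python calculation of grid spacing on a sphere. The sketch assumes a spherical Earth of radius 6371 km and a representative latitude of 54° N (both assumptions), and expresses the Rossby radius and tidal excursion quoted above in units of the north-south grid interval.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_DEG = math.pi * EARTH_RADIUS_KM / 180.0   # about 111.2 km per degree of latitude
LAT_REF = 54.0                                   # assumed representative Irish Sea latitude

def grid_spacing_km(dlat_deg, dlon_deg, lat_deg=LAT_REF):
    """Approximate north-south and east-west spacing of a regular lat-lon grid."""
    dy = dlat_deg * KM_PER_DEG
    dx = dlon_deg * KM_PER_DEG * math.cos(math.radians(lat_deg))
    return dy, dx

# Length scales quoted in the text (km).
SCALES = {"Rossby radius": (1.0, 2.0), "tidal excursion": (5.0, 10.0)}

spacing = {name: grid_spacing_km(dlat, dlon)
           for name, dlat, dlon in [("POLCOMS-AMM", 1 / 9, 1 / 6),
                                    ("POLCOMS-IRS", 1 / 60, 1 / 40)]}
for name, (dy, dx) in spacing.items():
    print(f"{name}: dy = {dy:.1f} km, dx = {dx:.1f} km")
    for scale, (lo, hi) in SCALES.items():
        # Number of north-south grid intervals spanning the length scale.
        print(f"  {scale}: {lo / dy:.1f}-{hi / dy:.1f} grid cells")
```

At this latitude the east-west spacing is somewhat smaller than the north-south spacing, so the nominal 12 km and 1.8 km figures correspond most closely to the latitudinal grid interval.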
Open boundary conditions for the outer AMM model were provided by the Met Office FOAM system North Atlantic model (Bell et al., 2000). The models were initialised using restart files provided by daily pre-operational models that run at the National Oceanography Centre as part of the Liverpool Bay Coastal Observatory (Howarth and Palmer, 2011). River flow inputs are long-term climatological mean daily values from a database of over 300 rivers (Young and Holt, 2007). The surface heat and salt fluxes are calculated internally using bulk formulae (see Sect. 2.1.3).

NEMO
In contrast to POLCOMS, NEMO was originally designed as a global ocean model. Consequently, there are several fundamental differences between the models. A full description of all model equations and techniques is given in the NEMO manual, which is freely available from the NEMO website (Madec, 2008).
NEMO is implemented on an Arakawa C grid (Arakawa, 1972), in which the u and v velocity components are defined at separate points. NEMO does not currently include the PPM tracer advection scheme used in POLCOMS, though this is planned for a future release. Several other schemes are available, and the TVD (total variation diminishing) option described by Zalesak (1979) is used here. The TVD scheme would be expected to be more diffusive than the PPM scheme (James, 1996). Similarly to POLCOMS, the Canuto et al. (2001) k-ε turbulence closure scheme is used. Explicit horizontal diffusion is applied using a Laplacian operator for tracers, and Laplacian and bi-Laplacian operators for momentum.
The vertical coordinate system is similar to the s-coordinate system of Song and Haidvogel (1994) used in POLCOMS, but with a further modification that allows levels to run into the sea bed and be lost in areas with very steep slopes. The resulting hybrid z*-s system reduces the slope of the model levels and the associated horizontal pressure gradient errors (O'Dea et al., 2012), though this has little impact in Liverpool Bay.
The model domain used in this study is the 1/15° latitude by 1/9° longitude (approximately 7 km) Atlantic Margin Model, based on version 3.2 of the NEMO code (O'Dea et al., 2012). This model is run operationally at the UK Met Office, coupled to the ecosystem model ERSEM, providing forecast products to MyOcean users (http://www.myocean.eu). Edwards et al. (2012) find that this operational NEMO-ERSEM model performs better at modelling nutrient and chlorophyll distributions than the POLCOMS-ERSEM system it replaced. The horizontal extent of the domain is the same as the 12 km POLCOMS-AMM grid, although the NEMO model has 32 internal vertical levels rather than 40. As described above, these become regularly spaced σ-levels at most points within the Irish Sea. The horizontal resolution of 7 km is coarser than the Rossby radius in the Irish Sea and only approaches the scale of the tidal excursion within Liverpool Bay.
The model code was further modified for this study to use the same surface flux bulk formulae as POLCOMS, to allow a more direct comparison between the models. River input was the same climatological data used in the POLCOMS runs. The lateral boundary forcing was again provided by the Met Office FOAM system. This is the same source as used for the POLCOMS-AMM boundary forcing, though the files were provided separately as the models require different input formats. The runs were initialised from a restart file created at the end of a spin-up run.

Surface forcing
Surface heat fluxes were calculated internally in all three models using bulk formulae following the COARE v3 algorithm (Fairall et al., 2003). The required input parameters are air temperature, air pressure, wind speed, specific humidity, cloud cover, and precipitation. The input atmospheric data were provided by Met Office numerical weather prediction models.
The impact of surface forcing resolution was investigated by running two simulations with each model that are identical in every respect other than the surface forcing data. Two Met Office datasets were used: a global atmospheric model with a resolution of 6 h and 0.83° × 0.56°, and a northeast Atlantic model with a resolution of 3 h and 0.11° × 0.11°. The two forcing datasets have been checked for consistency and are well correlated with each other. The higher resolution dataset does not cover a wide enough geographic area to force the POLCOMS-AMM and NEMO-AMM models, so the outer areas were filled by interpolating the lower resolution global dataset in time and space. For the rest of this paper, the low and high resolution forcing datasets are labelled LO and HI, respectively. All three models interpolate the forcing data to their model grid at run time.
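The combination of the two datasets can be sketched in Python as follows: each dataset is interpolated linearly in time to the model time steps, and HI values are used wherever they exist, with LO values filling the area outside the HI domain. Spatial interpolation is omitted (both datasets are assumed to be already on the same points), the area outside the HI domain is assumed to be marked by NaN values, and the function names and numbers are hypothetical; the models' own run-time interpolation may differ.

```python
import numpy as np

def to_model_times(t_forcing_h, field, t_model_h):
    """Linearly interpolate a (time, point) forcing array to the model times."""
    return np.column_stack([np.interp(t_model_h, t_forcing_h, field[:, k])
                            for k in range(field.shape[1])])

def blend_forcing(hi, lo):
    """Take HI values where defined and fall back to LO outside the HI domain (NaN)."""
    return np.where(np.isnan(hi), lo, hi)

# Hypothetical air temperature (degC) at three points; the third is outside the HI domain.
t_lo = np.array([0.0, 6.0, 12.0])              # 6-hourly LO fields
t_hi = np.array([0.0, 3.0, 6.0, 9.0, 12.0])    # 3-hourly HI fields
lo = np.array([[10.0, 10.0, 10.0], [12.0, 12.0, 12.0], [14.0, 14.0, 14.0]])
hi = np.array([[10.0, 10.0, np.nan], [11.5, 11.5, np.nan], [12.5, 12.5, np.nan],
               [13.0, 13.0, np.nan], [14.0, 14.0, np.nan]])
t_model = np.arange(0.0, 13.0)                 # hourly model steps

forcing = blend_forcing(to_model_times(t_hi, hi, t_model),
                        to_model_times(t_lo, lo, t_model))
print(forcing[3])   # hour 3: HI values inside the HI domain, LO outside it
```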

Observations
The observational data we use to evaluate model performance were collected as part of the Liverpool Bay Coastal Observatory (CObs) and are freely available (Howarth and Palmer, 2011). Three sources of observational data are used in this study:
- A regularly sampled CTD (conductivity, temperature, depth) survey grid in Liverpool Bay.
- An instrumented ferry that runs across the Irish Sea from Liverpool to Dublin or Belfast.
- A mooring located at Site A in Liverpool Bay.
Using these different data sources allows us to compare model performance in the near-shore area (where the mooring and CTD grid are located) as well as in the offshore Irish Sea (the instrumented ferry), and, because the sources are independent of one another, provides confidence that the results are robust. We have not used the available satellite sea surface temperature data, as these are already assimilated into the atmospheric model that is used to force the ocean models.
Ocean Sci., 8, 903-913, 2012, www.ocean-sci.net/8/903/2012/

CTD survey
CTD profiles were taken on regular cruises about every 6 weeks through the year, with measurements at 0.5 dbar intervals. Cruise dates in 2008 are shown in Table 2. There are a total of 34 stations spaced approximately on a 5-nautical-mile grid (Howarth and Palmer, 2011), though not all were sampled on every cruise. There were 9 cruises within the study year, during which a total of 246 CTD profiles were taken; this included one 25-h station at Site A where measurements were taken half-hourly. Figure 2 shows the mean location of the stations and the total number of profiles taken at each one during 2008.

Ferry
As a joint initiative between CObs and the FerryBox project, a commercial ferry run by Norfolkline (later DFDS Seaways) was fitted with sensors measuring near-surface parameters including temperature, conductivity, turbidity and chlorophyll (Howarth and Palmer, 2011; Balfour et al., 2007). Measurements were taken every 10 s and sent remotely to data servers approximately every 15 min. The ferry ran daily from Liverpool to Dublin (or occasionally Belfast), effectively giving a repeated transect across the centre of the Irish Sea. The temporal coverage in 2008 is 2 January-22 October. Because the 10 s sample interval is much finer than the temporal resolution of the model output, the dataset was sub-sampled approximately every 10 min. In addition, any points where the recorded salinity was less than 20 were assumed to be within the dock and were not used. Points further north than 53.8° N were also excluded to keep the focus on the east-west transect. This gives a total of 15102 data points used in this study.
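The selection steps applied to the ferry record can be expressed as a short Python sketch. It assumes a time-sorted record with time in seconds, latitude in degrees and practical salinity; the function name `filter_ferry` and the choice of keeping the first sample in each 10-min interval are illustrative assumptions, as the text does not specify exactly how the sub-sampling was done.

```python
import numpy as np

def filter_ferry(time_s, lat, salinity, step_s=600.0, min_salinity=20.0, max_lat=53.8):
    """Return indices of ferry samples kept for model comparison."""
    # Sub-sample: keep the first record in each step_s-second interval.
    bins = np.floor((time_s - time_s[0]) / step_s)
    _, first = np.unique(bins, return_index=True)
    keep = np.zeros(time_s.size, dtype=bool)
    keep[first] = True
    keep &= salinity >= min_salinity   # lower salinities assumed to be in dock
    keep &= lat <= max_lat             # stay on the east-west transect
    return np.flatnonzero(keep)

# Hypothetical one-hour record at 10 s intervals: in dock for the first
# 10 min and north of 53.8 N for the last 10 min.
t = np.arange(0.0, 3600.0, 10.0)
sal = np.full(t.size, 33.0)
sal[:60] = 15.0
lat = np.full(t.size, 53.5)
lat[300:] = 54.0
kept = filter_ferry(t, lat, sal)
print(kept)
```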

Site A mooring
A Centre for Environment, Fisheries and Aquaculture Science (CEFAS) "SmartBuoy" mooring measuring near-surface temperature and conductivity, as well as various biogeochemical parameters, is located at 53°31.8′ N, 3°21.6′ W (referred to as Site A, and indicated by the dashed square in Fig. 2). Observations are recorded approximately hourly, and after accounting for data gaps there are 7599 temperature and 6591 salinity observations in 2008. Co-located with the mooring is a bed frame also measuring temperature and salinity. The number of observations where both surface and bottom data are present is 7027 for temperature and 4042 for salinity.
The location of Site A presents distinct challenges for model validation, since it lies within a region of strong lateral salinity gradients that are advected with the ebb and flood tide, leading to significant intratidal variability. These data are therefore filtered in one of two ways before making model comparisons:
1. Time series are presented graphically (Figs. 6, 7, 10) as running means with a moving window of 175 h (7 × 25 h), which acts as a simple tidal filter.
2. Before making statistical comparisons with the model results, the tidal signal was removed using a Doodson X0 filter (details in Pugh, 1987). This is a more robust way of removing the tidal signal, using the 19 h before and after the reference value. The residual data were then smoothed using a 6-h running average to remove any remaining high-frequency noise.
For consistent intercomparisons at this site, the same processing was applied to the model results after being spatially interpolated to the site A location.
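Both filters can be sketched in Python for a gap-free hourly series (an assumption: the real records contain gaps that need separate handling, and the series must be longer than the filter). The Doodson X0 weights below are those tabulated by Pugh (1987), applied 19 h either side of the reference hour, and the residual is then smoothed with a 6-h running average as described. The function names are hypothetical, and assigning each even-length 6-h average to the later of its two central hours is a choice made for this sketch.

```python
import numpy as np

# Doodson X0 weights F_m for m = 0..19 (Pugh, 1987). The filter is symmetric,
# spans 19 h either side of the reference hour, and its 39 weights sum to 30.
_HALF = np.array([0, 2, 1, 1, 2, 0, 1, 1, 0, 2, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1], dtype=float)
DOODSON_X0 = np.concatenate([_HALF[:0:-1], _HALF]) / 30.0

def running_mean(x, window):
    """Running mean over `window` hourly samples (NaN where the window is incomplete).

    Odd windows are centred; even windows are assigned to the later central hour.
    """
    out = np.full(x.size, np.nan)
    vals = np.convolve(x, np.ones(window) / window, mode="valid")
    start = window // 2
    out[start:start + vals.size] = vals
    return out

def doodson_x0(x):
    """Tidal residual of an hourly series using the Doodson X0 filter."""
    out = np.full(x.size, np.nan)
    out[19:x.size - 19] = np.convolve(x, DOODSON_X0, mode="valid")
    return out

def site_a_residual(x):
    """Doodson X0 residual followed by a 6-h running average."""
    return running_mean(doodson_x0(x), 6)

# Hypothetical 60-day hourly series: a constant mean plus an M2 tide.
hours = np.arange(24 * 60, dtype=float)
series = 5.0 + np.cos(2 * np.pi * hours / 12.4206)
smooth = running_mean(series, 7 * 25)     # plotting filter (175 h)
residual = site_a_residual(series)        # statistics filter
```

With this synthetic signal the Doodson X0 residual recovers the mean level almost exactly, because the filter has near-zero response at the principal tidal frequencies.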

Analysis
For each observation type, the model results were interpolated in space and time to the locations of the observations. Results were then compared on a point-by-point basis. This is a strict test of the models, as no account is taken of slight differences in the phase or location of features such as the Mersey river plume. Note that the CTD and ferry observations, and the model results compared against them, were not adjusted to remove the tide.
In order to compare the performance of different models objectively and systematically, it is necessary to use quantitative skill score metrics. In this paper we use the squared correlation coefficient r², and a cost function χ following Holt et al. (2005). The RMS error is also calculated to give an idea of the size of model errors in physical units:
$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(M_n - O_n\right)^2},$$
where $N$ is the number of data points, and $O_n$ and $M_n$ are the observed and modelled values, respectively.
The correlation coefficient r is defined by Eq. (1) and measures how well the modelled and observed data fit a linear least-squares relationship:
$$r = \frac{\sum_{n=1}^{N}\left(O_n - \bar{O}\right)\left(M_n - \bar{M}\right)}{\sqrt{\sum_{n=1}^{N}\left(O_n - \bar{O}\right)^2\,\sum_{n=1}^{N}\left(M_n - \bar{M}\right)^2}}, \tag{1}$$
where $\bar{O}$ and $\bar{M}$ are the means of the observed and modelled values. A perfect linear fit has r = 1, whereas model data uncorrelated with the observations give r = 0. The squared value, r², represents the proportion of observed variability that is reproduced by the model. Here we use the form r|r|, so that the distinction between positive and negative correlation is retained.
The cost function used in this study is defined by Eq. (2) (Holt et al., 2005):
$$\chi^2 = \frac{1}{N\sigma_o^2}\sum_{n=1}^{N}\left(M_n - O_n\right)^2, \tag{2}$$
where $\sigma_o^2$ is the variance of the observations. This type of cost function is recommended by Allen et al. (2007, p. 402) because its non-linearity rewards a good fit and penalises a poor one.
A cost function measures model error relative to the observed variability; this particular form may be thought of as the RMS error normalised by the observed standard deviation, χ = RMS/σ_o. We would expect a model with predictive skill to produce values of χ < 1 (Holt et al., 2005), i.e. an RMS error smaller than the standard deviation of the observations. Holt et al. (2005) additionally identify the threshold χ = 0.4 as representing a "well modelled" variable.
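The three metrics can be computed with a few lines of Python. The sketch follows Eq. (1) for r, returns r|r| as described above, and computes χ as the RMS error divided by the observed standard deviation, consistent with Eq. (2); using the population variance (dividing by N) for σ_o² is an assumption, since the normalisation is not stated. The function name is hypothetical.

```python
import numpy as np

def skill_scores(obs, mod):
    """Return (r|r|, chi, rms) for paired observed and modelled values."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    do, dm = obs - obs.mean(), mod - mod.mean()
    r = np.sum(do * dm) / np.sqrt(np.sum(do ** 2) * np.sum(dm ** 2))   # Eq. (1)
    rms = np.sqrt(np.mean((mod - obs) ** 2))
    chi = rms / obs.std()   # Eq. (2): RMS error over observed standard deviation (ddof=0)
    return r * abs(r), chi, rms

obs = np.array([1.0, 2.0, 3.0, 4.0])
print(skill_scores(obs, obs + 1.0))   # constant offset: perfect correlation, non-zero cost
```

The example illustrates why both scores are needed: a constant offset leaves the correlation perfect but still produces a non-zero cost function.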

Surface temperature
Figure 3 gives an overview of the results, with modelled values plotted against observations for every comparison point. All three models show a clear linear relationship between modelled and observed temperature. Closer inspection shows that all the models tend to overestimate warm summer temperatures and underestimate cold winter values. This bias was reduced when the HI surface forcing was used.
The ferry and CTD comparisons also provide information on the spatial variation in model errors, shown in Figs. 4 (ferry) and 5 (CTD). RMS errors relative to the ferry data show a significant east-west difference in all runs using LO forcing, with higher errors in the eastern half of the Irish Sea (Fig. 4). This spatial pattern remains in the lowest resolution POLCOMS-AMM model with HI forcing, whereas the POLCOMS-IRS and NEMO errors are more homogeneous (and lower) across the observed region. RMS errors for each CTD station (Fig. 5) show a less clear spatial signal than the ferry comparison, though the POLCOMS-AMM model appears to be generally worse in the east of Liverpool Bay, along the coast; it must be remembered, however, that at 12 km resolution only limited performance can be expected on these spatial scales. All three models show a clear improvement when using the HI forcing (Fig. 5, right-hand panels).
The r² correlation values (Table 3) show that all models capture the large-scale seasonal temperature cycle well, with r² values greater than 0.9. The values are similar across the models, although the correlation of the 12 km POLCOMS-AMM model is consistently slightly lower than that of the other models. There is very little difference between the LO and HI results.
All models produce a cost function χ < 1 (Table 4), indicating that they all have at least some predictive skill. However, there is more variation between the models than is indicated by the r² values. Again, the 12 km POLCOMS-AMM is consistently poorer than the higher resolution models, while the NEMO model performs as well as the higher resolution POLCOMS-IRS model on this score. There is a clear improvement in all three models with the HI forcing, which brings POLCOMS-IRS and NEMO below the χ = 0.4 "well modelled" threshold. In absolute terms, the RMS error (Table 5) is typically reduced by around 0.4 °C when using the higher resolution surface forcing.
The 7 × 25-h running mean temperature, and the difference from this running mean, for the mooring data and the models are shown in Fig. 6. The running mean surface temperature (panel a) is dominated by the very strong annual cycle and shows that all three models are too cold in winter and too warm in summer; this is improved when using the HI forcing dataset. The POLCOMS-IRS and NEMO results are very similar to each other, as is POLCOMS-AMM in the second half of the year. The deviation from the running mean (panel b) indicates that NEMO appears to better reproduce the high-frequency tidal advection of surface temperature.

Surface salinity
Figure 8 shows scatter plots for all surface model-observation pairs over the study year. None of the models predicts surface salinity as well as temperature, which is particularly striking in the comparison with the mooring data. Both POLCOMS models overestimate the salinity range, whereas NEMO generally matches or underestimates the observed range. Both POLCOMS runs display a clear spatial variation in RMS error, with high errors in the region of the Mersey plume, seen in both the CTD and ferry comparisons (Figs. 5 and 9). NEMO shows a similar, but much weaker, pattern. NEMO also shows lower errors than either of the POLCOMS runs in Liverpool Bay and the eastern Irish Sea, but is slightly worse than POLCOMS in the western Irish Sea.
The r² correlation values (Table 3) range from 0.3 to 0.6 against the CTD and from 0.6 to 0.8 against the ferry observations. This suggests the models do have some predictive skill, particularly in the open sea areas where most of the ferry measurements are taken, although they do not capture the seasonal variability as well as they do for temperature. The comparison with the mooring at Site A, on the other hand, is very poor in all models, with r|r| between −0.1 and 0.0. This suggests that none of the models correctly predicts the salinity variability within Liverpool Bay, which is confirmed by Fig. 10: on longer timescales, the modelled variability does not match the observed patterns at Site A.
The cost function values (Table 4) also show that model performance is significantly poorer (higher χ) than for temperature. However, the NEMO runs do produce values less than 1 against the CTD and ferry observations. The POLCOMS-AMM cost function is poor against all the observations, with values from 1.6 to 7.8, which is reflected in high overall RMS errors (Table 6). The response to the change in surface forcing is less homogeneous than for temperature, with improvements in some model-observation comparisons (e.g. POLCOMS-IRS vs. CTD) and little change in others (e.g. POLCOMS-AMM vs. CTD). There is a more consistent improvement in the overall statistics when the models are compared with the ferry data, which covers more open sea regions away from the Liverpool Bay ROFI.
The error distribution maps and statistics show that all the models display some skill at predicting salinity within the bulk of the Irish Sea. However, their performance is poor within the highly dynamic region close to the Mersey river plume. Again, it is important to remember that all model runs in this study were forced using climatological mean river flow data. Howarth and Palmer (2011) show a similar comparison between the Site A mooring data and results from the same POLCOMS Irish Sea model for the year 2010, where the model qualitatively appears to perform better than we find in this study.
Figure 10b shows the difference from the mean salinity for the mooring data and models at Site A. It is clear that both POLCOMS models significantly overestimate the tidal variability in salinity. On the longer timescales indicated by the running mean salinity (Fig. 10a), there is also a large variation between the models. Again, NEMO shows less variability than either of the POLCOMS models, consistent with the RMS errors shown in Table 6. Both POLCOMS models are generally too fresh, especially POLCOMS-AMM, which is well below the observed salinity at Site A, though this is improved when the HI surface forcing is used.

Discussion
The overall results show that all the models used in this study display significantly higher skill at predicting surface temperature than salinity, particularly in the near-coastal region of Liverpool Bay, where the principal driving influence is the poorly determined freshwater forcing (Polton et al., 2011). By comparing the different model runs, we can assess the relative importance of surface forcing resolution and model type for predictive skill.

Impact of surface forcing resolution
The surface temperature cost function against all three observation sources improved significantly when the higher resolution forcing data were used, from 0.43-0.47 to 0.32-0.33. This corresponds to a general improvement in the instantaneous temperatures throughout the record. There was, however, little increase in the r² values, which were already very high (greater than 0.95) in both runs: r² is dominated by the large seasonal cycle in temperature, which the models capture well overall, and because correlation is unchanged by a linear rescaling of the modelled values, it does not penalise an over-amplified seasonal cycle. Although r² changes little, Fig. 6a shows that the summer high and winter low biases are reduced in all the models when the HI forcing is used.
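The insensitivity of r² to an over-amplified seasonal cycle can be demonstrated with an idealised Python example. The annual cycle below (mean 11 °C, amplitude 4 °C, with a modelled amplitude 25 % too large) is entirely hypothetical and is not taken from the observations or model runs.

```python
import numpy as np

day = np.arange(366)
phase = 2 * np.pi * (day - 220) / 366
obs = 11.0 + 4.0 * np.cos(phase)   # idealised observed annual cycle (degC)
mod = 11.0 + 5.0 * np.cos(phase)   # same phase, too warm in summer and too cool in winter

r = np.corrcoef(obs, mod)[0, 1]
chi = np.sqrt(np.mean((mod - obs) ** 2)) / obs.std()
print(f"r^2 = {r**2:.3f}, chi = {chi:.2f}")   # prints: r^2 = 1.000, chi = 0.25
```

Because the modelled series is an exact linear rescaling of the observed one, r² stays at 1 while χ registers the warm-summer, cool-winter error.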
The spatial distribution of model errors (Figs. 4 and 5) shows a marked difference between the HI and LO forcing, particularly in comparison with the ferry data. With LO forcing, the temperature errors were generally larger in the eastern Irish Sea than in the west. With the increase in forcing resolution, the error was reduced across the domain, leading to a more homogeneous error distribution in POLCOMS-IRS and NEMO. In POLCOMS-AMM a strong east-west pattern remains, but the magnitude of the error is reduced. It should be remembered that the 12 km AMM model is not generally used for near-coastal work, and it performs well in most of the Irish Sea. The overall temperature RMS error (Table 5) was reduced by around 20-30 % when using the HI forcing.
The running mean temperature (Fig. 6a) clearly shows that the surface forcing resolution has more impact on the seasonal cycle than the choice of model, particularly the choice between POLCOMS-IRS and NEMO. An unpublished preliminary NEMO run forced by hourly surface flux data (rather than bulk parameters) produced better results than either of the runs discussed here, particularly in winter. The high-frequency variability (Fig. 6b), on the other hand, is generally similar in the LO and HI runs, especially in NEMO.
Surface salinity showed a somewhat weaker response to the change in forcing, with RMS errors reduced in some areas, particularly against the ferry data, but increased in others (Figs. 5 and 9). This is not entirely surprising, since we would expect surface temperature to be more sensitive to changes in the surface forcing, as it is directly forced by atmospheric heat fluxes. Salinity in the Liverpool Bay region, on the other hand, is predominantly influenced by the balance between riverine and oceanic freshwater inputs and is less strongly linked to surface fluxes. However, an unpublished preliminary NEMO run which used hourly surface flux forcing further improved results for salinity as well as temperature. Further work is needed to establish whether this was

Differences between models
It is clear from the spatial maps of model error and the objective metrics that the 12 km POLCOMS-AMM model does not perform as well in Liverpool Bay as either of the other models, particularly for surface salinity. This is not unexpected given the low resolution of the model, and it would perhaps be unfair to expect better performance in this region. However, it is important to note that in the western Irish Sea its results are comparable with those of the higher resolution models. Overall, POLCOMS-AMM was shown to have skill in modelling both the annual cycle of surface temperature, with r² > 0.9, and the tidal variability, with cost function χ < 1.
The salinity error map for the POLCOMS-AMM model, on the other hand, stands out as different from all the other models, with a very clearly defined area of large error in Liverpool Bay seen in both the ferry (Fig. 9) and CTD (Fig. 5) comparisons. The front associated with the freshwater plume from the River Mersey is known to move by up to 0.5° east-west over the spring-neap cycle (Polton et al., 2011; Hopkins and Polton, 2011). This is equivalent to at most 3 grid boxes in the POLCOMS-AMM model (0.5° divided by 1/6°), so large errors in the position of the front can be expected with this model.
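The grid-box comparison can be extended to all three models using exact fractions of a degree. The Python sketch takes the 0.5° excursion and the longitudinal grid intervals quoted in the text; it counts grid boxes only and says nothing about how well each model resolves the structure of the front.

```python
from fractions import Fraction

front_excursion_deg = Fraction(1, 2)   # up to 0.5 deg east-west over the spring-neap cycle
dlon_deg = {"POLCOMS-AMM": Fraction(1, 6),
            "NEMO-AMM": Fraction(1, 9),
            "POLCOMS-IRS": Fraction(1, 40)}

for model, dlon in dlon_deg.items():
    boxes = front_excursion_deg / dlon
    print(f"{model}: front moves across {float(boxes):g} grid boxes")
```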
The 7 km NEMO model is shown to perform as well overall as the 1.8 km POLCOMS-IRS model when using the same surface forcing: the correlation, cost function, and RMS error are all comparable for both surface temperature and salinity. Although the overall predictive skill scores are comparable, this does not mean that the sources of error in the two models are the same.
There are large differences in salinity between the models at Site A on both tidal and longer timescales (Fig. 10). Both POLCOMS models systematically overestimate the high-frequency tidal variability in salinity at Site A (Fig. 10b), whereas the variability in NEMO is much closer to the observations. Since all three models use the same river input data, this difference reflects the models' ability to simulate the strength of the front in Liverpool Bay. Figure 11 shows the variation with longitude of the horizontal salinity gradient in the models and in the ferry and CTD observations. All three models reproduce the weaker gradients in the bulk of the Irish Sea, but both POLCOMS models significantly overestimate the strength of the salinity gradient east of 4° W, in Liverpool Bay. The NEMO model reproduces the observed gradient in this area more accurately. As all the models in this study were forced by the same climatological river input data, the likely source of this difference is the more diffusive horizontal mixing scheme used in NEMO.
Surface-bottom differences in temperature and salinity at Site A are shown in Fig. 7. These are 25-h running mean values, to show the persistent stratification rather than intermittent SIPS conditions. Both POLCOMS models are frequently stratified for extended periods, generally more strongly than the observations show, particularly for salinity. However, the POLCOMS models do capture the broad annual cycle in the temperature difference, with warmer water over colder water in summer and vice versa in winter. NEMO, in contrast, is consistently well mixed and shows little variation over the year.
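A longitude-binned gradient like the one compared in Fig. 11 can be sketched in Python as follows. The 0.1° bin width, the conversion to east-west distance at the mean sample latitude, the synthetic salinity field and the function name are all assumptions for illustration; the paper does not describe the exact procedure used for Fig. 11.

```python
import numpy as np

KM_PER_DEG = 111.195   # km per degree of latitude on a 6371 km sphere

def gradient_by_longitude(lon, lat, sal, bin_width=0.1):
    """Bin surface salinity by longitude; return bin centres and |dS/dx| (per km)."""
    n_bins = max(2, int(np.ceil((lon.max() - lon.min()) / bin_width)))
    idx = np.minimum(((lon - lon.min()) / bin_width).astype(int), n_bins - 1)
    centres, mean_sal = [], []
    for k in range(n_bins):
        in_bin = idx == k
        if in_bin.any():   # skip longitude bins with no samples
            centres.append(lon.min() + (k + 0.5) * bin_width)
            mean_sal.append(sal[in_bin].mean())
    centres, mean_sal = np.array(centres), np.array(mean_sal)
    # Convert longitude to east-west distance at the mean latitude of the samples.
    x_km = centres * KM_PER_DEG * np.cos(np.radians(lat.mean()))
    return centres, np.abs(np.gradient(mean_sal, x_km))

# Hypothetical transect: salinity decreasing by 1 per degree towards the east.
lon = np.linspace(-6.0, -3.0, 3001)
lat = np.full(lon.size, 53.5)
sal = 34.5 - 1.0 * (lon + 6.0)
centres, grad = gradient_by_longitude(lon, lat, sal)
```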

Conclusions
Liverpool Bay, and the Irish Sea as a whole, is a dynamically complex area which poses a difficult challenge to models. Nevertheless, all the models utilised in this study, including […] metrics. It is clear that much work still needs to be done to improve our ability to model salinity accurately in this complex regime. Liverpool Bay is strongly influenced by the large freshwater input from several major river basins, so model performance may have been limited in this area by the use of climatological river flow forcing data.
The resolution of the surface forcing input to the models is found to be important: increasing the forcing resolution improved the skill of every model in simulating surface temperature. In particular, the 7 km NEMO model performed as well as the 1.8 km POLCOMS model when both used the same forcing dataset. This indicates the importance of matching the surface forcing resolution to the ocean model resolution; increasing ocean model resolution is costly, whereas improving the resolution of the forcing data could potentially bring as much benefit. This is particularly relevant for decadal modelling, where it is not practical to use high-resolution ocean models. Salinity in Liverpool Bay is less clearly linked to the surface fluxes, and surface salinity errors were not always reduced in the runs with high-resolution forcing data. Improvements in the river input forcing and in the background dynamics, such as stratification, have more impact on the models' skill in predicting salinity.
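The skill measures behind these conclusions (RMS error and its percentage change in Tables 5 and 6, correlation in Table 3, cost function χ in Table 4) can be sketched as below. This assumes model values already matched to the times and locations of the observations; the helper names and numbers are hypothetical, and the cost-function definition used here (mean absolute error divided by the standard deviation of the observations) is an assumption, not taken from this section.

```python
import numpy as np


def rms_error(model, obs):
    """Root-mean-square difference between matched model and observed values."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((model - obs) ** 2)))


def correlation(model, obs):
    """Pearson correlation coefficient r."""
    return float(np.corrcoef(model, obs)[0, 1])


def cost_function(model, obs):
    # ASSUMED definition: mean absolute error normalised by the
    # (population) standard deviation of the observations
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    return float(np.mean(np.abs(model - obs)) / np.std(obs))


def percent_change(rms_lo, rms_hi):
    """Percentage change in RMS error from the LO to the HI forcing run."""
    return 100.0 * (rms_hi - rms_lo) / rms_lo


obs = np.array([10.0, 12.0, 14.0, 16.0])  # hypothetical matched SST observations (deg C)
model_lo = obs + 1.0                      # hypothetical LO run: uniform 1 deg C warm bias
model_hi = obs + 0.5                      # hypothetical HI run: smaller bias

change = percent_change(rms_error(model_lo, obs), rms_error(model_hi, obs))
```

A uniform bias leaves r equal to 1 while the RMS error and χ are nonzero, which is one reason to report several metrics together; a negative percentage change means the higher-resolution forcing reduced the error.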
The POLCOMS and NEMO models differ in their ability to represent the density front in Liverpool Bay. POLCOMS is specifically designed to preserve sharp fronts, using a piecewise parabolic method (PPM) advection scheme, whereas NEMO is more diffusive. At fine resolution in Liverpool Bay, however, the less diffusive POLCOMS scheme appears not to be diffusive enough. While low diffusion is appropriate for coarser-resolution models such as the 12 km POLCOMS-AMM, or for deeper-water frontal environments such as the western Irish Sea, the more diffusive advection scheme used in NEMO better captures the tidal variability in Liverpool Bay.
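How strongly an advection scheme smooths a front can be illustrated with a one-dimensional toy problem. This is a generic textbook comparison, not the schemes actually used in NEMO or POLCOMS: a first-order upwind scheme stands in for a strongly diffusive scheme, and a van Leer flux-limited scheme stands in for a weakly diffusive, front-preserving one. The grid size, Courant number and tracer profile are hypothetical.

```python
import numpy as np


def van_leer(r):
    """Van Leer flux limiter; returns 0 where r <= 0 (falls back to upwind)."""
    return (r + np.abs(r)) / (1.0 + np.abs(r))


def advect(q, courant, steps, limited):
    """Advect q rightwards on a periodic grid with Courant number 0 < courant <= 1."""
    q = np.asarray(q, dtype=float).copy()
    for _ in range(steps):
        dq = np.roll(q, -1) - q              # q[i+1] - q[i], jump across face i+1/2
        if limited:
            dq_up = q - np.roll(q, 1)        # q[i] - q[i-1], upwind jump
            r = np.divide(dq_up, dq, out=np.zeros_like(dq), where=dq != 0)
            correction = 0.5 * (1.0 - courant) * van_leer(r) * dq
        else:
            correction = np.zeros_like(q)    # first-order upwind: no correction
        face = q + correction                # face value (flux / velocity) at i+1/2
        q = q - courant * (face - np.roll(face, 1))
    return q


n = 200
q0 = np.zeros(n)
q0[30:80] = 1.0                              # tracer band bounded by two sharp fronts

q_upwind = advect(q0, courant=0.5, steps=200, limited=False)
q_limited = advect(q0, courant=0.5, steps=200, limited=True)

# Sharpness of the fronts: largest jump between neighbouring cells
sharp_upwind = np.max(np.abs(np.diff(q_upwind)))
sharp_limited = np.max(np.abs(np.diff(q_limited)))
```

Both schemes conserve the tracer and stay within the initial range, but on the same grid the limited scheme keeps the fronts several times sharper. Which behaviour is preferable depends on how sharp the real front is; in Liverpool Bay the observed gradient was closer to the smoother NEMO result.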
Overall, the performance of the NEMO-AMM model is very promising, with objective skill scores as good as or better than those of the higher-resolution POLCOMS-IRS model. In particular, NEMO reproduces the horizontal salinity gradient in Liverpool Bay more accurately than POLCOMS. However, NEMO underestimates the stratification within Liverpool Bay (whereas POLCOMS is overstratified), and the reasons for this need further investigation. We look forward to the continuing development and improvement of NEMO, which may yield better results in future modelling of salinity in Liverpool Bay.

Fig. 1. Bathymetry (m) of the model domains. Note that panels (A) and (B) use a logarithmic colour scale. A minimum depth of 10 m is imposed in all the models. (A) POLCOMS-AMM; the inset box shows the location of the nested Irish Sea model. (B) NEMO-AMM. (C) POLCOMS-IRS, with the location of Liverpool Bay indicated by the box. (D) Close-up of Liverpool Bay showing the major rivers flowing into the bay; the box marks the area displayed in Fig. 2.

Fig. 2. Mean location of CTD survey stations and number of profiles collected in 2008. Site A (location of the mooring) is indicated by the dashed square.

Fig. 4. Model RMS error in surface temperature compared with ferry observations. Because of the large number of observations, the results have been spatially averaged within 3 by 1.2 bins to produce this plot. (A) POLCOMS-AMM LO; (B) POLCOMS-AMM HI; (C) POLCOMS-IRS LO; (D) POLCOMS-IRS HI; (E) NEMO LO; and (F) NEMO HI.
Fig. 6. (A) 175-h running average surface temperature at site A. (B) Deviation of surface temperature from the running average. Note that only the first 14 days are plotted in panel (B), for clarity.

Fig. 8 .
Fig. 7. 175-h running average of the surface-bottom difference in temperature (A) and salinity (B) from the mooring data and the HI model runs at site A. Note that only the first 130 days are plotted for salinity, for clarity.

Fig. 9. Model RMS error in surface salinity compared with ferry observations. Because of the large number of observations, the results have been spatially averaged within 3 by 1.2 bins to produce this plot. (A) POLCOMS-AMM LO; (B) POLCOMS-AMM HI; (C) POLCOMS-IRS LO; (D) POLCOMS-IRS HI; (E) NEMO LO; and (F) NEMO HI.
Fig. 10. (A) 175-h running average surface salinity at site A. (B) Deviation of surface salinity from the running average. Note that only the first 14 days are plotted in panel (B), for clarity.
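The running averages in Figs. 6, 7 and 10 separate slowly varying signals from tidal variability. The sketch below shows a centred boxcar running mean, assuming evenly spaced hourly samples so that the window length in samples equals its length in hours (for example 25 or 175). The function `running_mean`, the synthetic M2 record and the bottom series are hypothetical. A 25-h window spans roughly two M2 periods (about 12.42 h), so it strongly suppresses the semi-diurnal tide.

```python
import numpy as np


def running_mean(x, window):
    """Centred boxcar running mean over `window` samples (window must be odd).

    Points within window // 2 samples of either end are NaN, because the
    averaging window is incomplete there.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centred running mean")
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    half = window // 2
    kernel = np.ones(window) / window
    # mode="valid" returns only the len(x) - window + 1 fully covered positions
    out[half:x.size - half] = np.convolve(x, kernel, mode="valid")
    return out


# Hypothetical 30-day hourly record: slow salinity trend plus an M2 tidal signal
hours = np.arange(30 * 24, dtype=float)
trend = 32.0 + 0.002 * hours
tide = 0.5 * np.sin(2.0 * np.pi * hours / 12.42)   # M2 period of about 12.42 h
surface = trend + tide

smooth = running_mean(surface, 25)    # 25-h mean: semi-diurnal signal largely removed
deviation = surface - smooth          # high-frequency (tidal) residual, as in Fig. 10b

# Persistent stratification: filtered surface-minus-bottom difference
bottom = surface + 0.3                # hypothetical bottom record, 0.3 saltier
strat = running_mean(surface - bottom, 25)
```

A centred window avoids shifting the filtered signal in time; the price is that half a window at each end of the record cannot be filtered.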

Fig. 11. Magnitude of the horizontal salinity gradient for models and observations along the latitude of the site A mooring (53.53° N). Only HI forcing results are shown, for clarity.

Table 1. Summary of the model runs compared in this study.

Table 3. Correlation coefficient (r) of model surface temperature and salinity compared with observations. Columns labelled C are against CTD observations, F against ferry observations, and M against mooring observations.

Table 4. Cost function χ of model surface temperature and salinity compared with observations. Columns labelled C are against CTD observations, F against ferry observations, and M against mooring observations.

Table 5. Mean RMS error of model surface temperature compared with CTD, ferry, and mooring observations. The mean values are calculated by averaging in time over the entire year, and also horizontally for the CTD and ferry comparisons. The percentage change in RMS error after increasing the surface forcing resolution is also indicated.

Table 6. Mean RMS error of model surface salinity compared with CTD, ferry, and mooring observations. The mean values are calculated by averaging in time over the entire year, and also horizontally for the CTD and ferry comparisons. The percentage change in RMS error after increasing the surface forcing resolution is also indicated.