Operational systems operated by Mercator Ocean provide daily ocean forecasts,
and by combining these forecasts we can produce ensemble forecasts and
uncertainty estimates. This study focuses on the mixed-layer depth in the
Northeast Atlantic near the Porcupine Abyssal Plain for May 2013. This
period is of interest for several reasons: (1) four Mercator Ocean
operational systems provide daily forecasts at a horizontal resolution of
1/4, 1/12 and 1/36

Operational oceanography has developed since the end of the 1990s in several
countries with global-level partnerships under the GODAE Oceanview initiative
(

Main characteristics of the ocean forecasting systems.

The forecasts used in this study are provided by Mercator Ocean using four
different operational systems. Two global ocean systems, one at 1/4

Operational scheme for producing daily forecasts with all the Mercator Ocean systems. The ocean initial state is produced once a week, on Wednesdays. Starting from this state, a hindcast (H) is produced each day using analysed atmospheric forcing. Forecasts from the current day (F0) up to 4 days ahead (F4) are then run daily, forced by the atmospheric forecasts.

Left panel, position of the 74 profiles available in the area
during May 2013. Right panel, selection of profiles from 11 to 14 May during
the M1 mixing event and the first S1 re-stratification phase (large circles)
and from 24 to 26 May during the S2 stratification phase (small circles).
Colours show the mixed-layer depth computed for each profile with the
0.2

This study focuses on May 2013, when all four ocean forecasting systems
described above were providing forecasts and when the Coriolis in situ
database contains repetitive in situ profiles obtained from glider
observations in a 1/2

Available in situ profiles for three dates corresponding to three mixing events
(M1, M2 and M3) during May 2013. Note that for these three dates three
temperature profiles are available. The circles indicate the mixed-layer
depth computed using the temperature profiles and the 0.2

RMSE in metres for the mixed-layer depth computed with each system, with the mean value and with the mean after removing one system. The F0, F2 and F4 forecast lengths are shown. For each forecast length the best forecast is shown in bold and underlined; any other forecast whose error is within 1 m of the best is shown in bold.

The statistics computed are mean bias (not shown), temporal correlation,
error standard deviation for the Taylor diagram (Fig. 4), skill score
(Fig. 5) (Murphy, 1988) and root mean square error (RMSE; Table 2) for each
system and for the ensemble mean and median. For each forecasting system and
each forecast length, the skill score (SS) is computed as follows:
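
As an illustration, a minimal sketch of a mean-square-error skill score in the sense of Murphy (1988) is given here. It assumes the common form SS = 1 - MSE(forecast) / MSE(reference), with persistence of an earlier observation as the reference; the exact formulation used in this study may differ. The function names and the lag convention are hypothetical.

```python
def mean_square_error(estimates, observations):
    """Mean square error between two equal-length sequences."""
    if len(estimates) != len(observations) or not observations:
        raise ValueError("series must be non-empty and of equal length")
    return sum((e - o) ** 2 for e, o in zip(estimates, observations)) / len(observations)


def skill_score(forecast, reference, observations):
    """SS = 1 - MSE(forecast) / MSE(reference).

    Positive values mean the forecast beats the reference,
    zero means no improvement, negative means it is worse.
    """
    mse_reference = mean_square_error(reference, observations)
    if mse_reference == 0.0:
        raise ValueError("reference is perfect; skill score is undefined")
    return 1.0 - mean_square_error(forecast, observations) / mse_reference


def persistence_pairs(observations, lag):
    """Pair each verifying observation with the one made `lag` days earlier.

    Returns (reference, verifying): the persisted values and the
    observations they are verified against (assumed convention).
    """
    if lag < 1 or lag >= len(observations):
        raise ValueError("lag must be between 1 and len(observations) - 1")
    return observations[:-lag], observations[lag:]
```

With a daily series of observed mixed-layer depths, `persistence_pairs(obs, lag)` gives the reference and the verifying observations, and `skill_score(forecast, reference, verifying)` is then positive whenever the forecast has a smaller mean square error than persistence.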

Taylor diagram comparing all available systems (in colour) and forecast lengths (symbols). The black dot with a standard deviation equal to 1 and a correlation of 1 indicates observations.
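
The quantities plotted in a Taylor diagram can be sketched as follows. This is a minimal illustration, assuming population statistics and normalization by the standard deviation of the observations (so that the observations sit at standard deviation 1 and correlation 1); the function name is hypothetical and the processing actually used for the figure may differ.

```python
import math


def taylor_statistics(model, observations):
    """Return (correlation, normalized std, normalized centred RMS difference)."""
    n = len(observations)
    if n != len(model) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_m = sum(model) / n
    mean_o = sum(observations) / n
    # Anomalies with respect to each series' own mean.
    anom_m = [m - mean_m for m in model]
    anom_o = [o - mean_o for o in observations]
    std_m = math.sqrt(sum(a * a for a in anom_m) / n)
    std_o = math.sqrt(sum(a * a for a in anom_o) / n)
    if std_m == 0.0 or std_o == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    correlation = sum(a * b for a, b in zip(anom_m, anom_o)) / (n * std_m * std_o)
    # Centred RMS difference, i.e. the standard deviation of the error.
    centred_rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_m, anom_o)) / n)
    return correlation, std_m / std_o, centred_rms / std_o
```

The three values are linked by the law of cosines, E'^2 = 1 + s^2 - 2 s R (with s the normalized standard deviation, R the correlation and E' the normalized centred RMS difference), which is what allows them to be displayed on a single diagram.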

available at

Skill score for the mixed-layer depth computed for all the systems and the ensemble mean and median during May 2013. The skill score is computed with the persistence of the observation

which induces mixing in Glo4 and improves the

Top left: temporal evolution of the mixed layer simulated by the
ensemble with the standard deviation in blue, and observations with
associated uncertainties. Top right: wind speed time series. The analysis is
in black dashed line with red line and dots when wind speed is larger than
9 m s

scores for F3 and F4 with respect to H, F0 and F1. The response to a realistic atmospheric forcing is not as good in Glo4 as it is in Glo12, Atl12 or Ibi36, which will be discussed in more detail in Sects. 4.2 and 4.3. The ensemble mean gives the best result even though the Glo4 forecast is far worse than those of the other systems. One would expect the scores to decrease with the forecast length, but the results are very similar (except for Glo4) from H up to the 1-day forecast; the dispersion of the systems (illustrated by the colour) is small in the Taylor diagram (Fig. 4) for all the metrics (correlation, standard deviation or RMS). However, the forecast dispersion increases after the 2-day forecast; in particular, the correlation decreases significantly to under 0.79 for Glo12, whereas it remains around 0.85 for Ibi36. The RMSE (Table 2) confirms the previous results, with a smaller error for Ibi36 and the ensemble mean (between 15 and 18 m RMS) and a larger RMSE for Glo4 (between 27 and 30 m RMS).

The skill scores that measure the improvement of the forecast over persistence of the initial condition (not shown) are very similar to those measuring the improvement over persistence of the last observation. The latter (Fig. 5) are positive (meaning that the forecast is better than persistence) for all forecasts except F0 in the Glo4 system. As expected, the skill score increases with the forecast length, meaning that the 4-day forecast is more efficient than the 1-day forecast in beating persistence. The largest change in skill score lies between F0 and F1, meaning that after 1 day the forecast and the last analysis or observation available are nearly equivalent, while after 2 days the model has some significant predictive skill. Three "classes" of score can be seen, as in the Taylor diagram (Fig. 4): the best is obtained with Atl12, Ibi36 and the mean and median products; a second, with a significant decrease in the score, is obtained with Glo12; and a third is obtained with Glo4.

Combining the forecasts in another way, simply by removing one system from the statistics, quantifies the gain (or degradation) obtained with each individual system. Table 2 shows the RMSE for these combinations; the robust result is that the best forecast is obtained, over the whole forecast range, with the mean computed after removing the Glo4 system, and with the Ibi36 system. Once the Glo4 estimate is removed, the mean of the remaining forecasts is better than all the individual forecasts, showing that each of the remaining estimates of the ocean state contributes pertinent information to the forecast statistics. In the following, the analysis of the mixing and stratification events will provide additional physical interpretation for these statistical results.
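
The combination of forecasts discussed above (ensemble mean, ensemble median, and the mean after removing one system) can be sketched as follows. This is a minimal illustration under the assumption that each system provides one mixed-layer depth per day, co-located with the observations; the function names are hypothetical and any numbers passed to them are for the caller to supply, not results from this study.

```python
import math
from statistics import mean, median


def rmse(estimates, observations):
    """Root mean square error between two equal-length series (same units)."""
    if len(estimates) != len(observations) or not observations:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(
        sum((e - o) ** 2 for e, o in zip(estimates, observations)) / len(observations)
    )


def combine(members, reducer=mean):
    """Combine the member series day by day with `reducer` (mean or median).

    `members` maps a system name to its daily series.
    """
    return [reducer(values) for values in zip(*members.values())]


def leave_one_out_rmse(members, observations):
    """RMSE of the ensemble mean recomputed after removing each system in turn."""
    scores = {}
    for left_out in members:
        kept = {name: series for name, series in members.items() if name != left_out}
        scores[left_out] = rmse(combine(kept, mean), observations)
    return scores
```

Comparing `leave_one_out_rmse(members, obs)` with `rmse(combine(members), obs)` shows, for each system, whether its removal improves or degrades the ensemble mean.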

Mixed-layer depth evolution during May 2013. The black line is the hindcast and the coloured dots are the forecasts for several forecast lengths. The crosses are the means of the observations and the vertical black lines are error bars computed with the min and max values of the MLD estimated by the profiles during the day.
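
The daily observation summary described in this caption (mean of the profiles, with error bars from the minimum and maximum) can be sketched as follows. This is a minimal illustration assuming each profile has already been reduced to a (day, mixed-layer depth) pair; the function name and the input layout are hypothetical.

```python
from collections import defaultdict


def daily_mld_summary(profiles):
    """Group per-profile mixed-layer depths by day.

    `profiles` is an iterable of (day, mld_in_metres) pairs.
    Returns {day: (mean, minimum, maximum)} with days in sorted order.
    """
    by_day = defaultdict(list)
    for day, mld in profiles:
        by_day[day].append(mld)
    return {
        day: (sum(values) / len(values), min(values), max(values))
        for day, values in sorted(by_day.items())
    }
```

On days with a single profile the minimum and maximum coincide with the mean, so the error bar collapses to a point.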

Temporal evolution of atmospheric forcing for hindcast (black line) and forecasts (coloured dots). Top panels: evolution of wind speed for Glo4 (left) and Atl12 (right) systems. Bottom panels: heat flux (left) and fresh water flux (right) for the Atl12 system.

In our area of study (Fig. 2), centred on 16.25

The standard deviation of all mixed-layer depths available for all systems
and all forecast lengths computed in the same area centred on
16.25

A comparison of the hindcasts (hereafter referred to as H), shown in Fig. 7 for the ocean fields and in Fig. 8 for the atmospheric fields, indicates that all systems describe a stratified period at the beginning of the month with a mixed-layer depth of around 20 m, except for Glo12, where the mixed layer is deeper over the same period (around 40 m) and closer to the observations. This may result from the large-scale conditions being different in Glo12 on the one hand, and in Atl12 and Ibi36 on the other. All systems have their own dynamical regime, but Ibi36 is initialized with Atl12 analyses, which explains why the circulation features of Atl12 and Ibi36 are similar to each other but differ from the main circulation features of Glo12.

Mixed-layer depth (colour field) and sea surface height (contours)
simulated by the four systems for 13 May in the area surrounding the area of
interest (1/2

On 7 May all the systems simulate the beginning of the M1 mixing event, which
reaches its maximum after 4 days but with significantly different amplitudes.
The M1 event is too fast and too strong with Glo4 and Glo12 compared with the
observations whereas the Ibi36 and Atl12 hindcasts are much closer to the
observed values. Glo4 and Glo12 simulate mixed-layer depths greater than
100 m, while Atl12 simulates only 85 m and Ibi36 even less, only 70 m.
A re-stratification event (S1) follows; it is
completely missed by Glo4, while it is both observed and simulated by the
other systems. The strongest re-stratification takes place in Glo12, while
nothing happens in Glo4, where the mixed layer remains deeper than 100 m
for 8 days. This stratification event is present in the observations, and the
mixed-layer depths in Glo12, Atl12 and Ibi36 are very close to the
observations, even if the timing of the re-stratification differs, mainly owing to
differences during the first mixing event as noted above. Figures 9
and 10 show the spatial patterns of the mixed-layer depth for all
systems for 13 May and 16 May respectively. In our area of interest (white
squares in these figures) there is a strong gradient in the mixed-layer depth
with a mixed column in the northern part of the area, and a more stratified
ocean in the south. In this case the mean profile in this box is not fully
representative of the situation and the observations fail to capture this
kind of pattern. Nevertheless, as Figs. 9 and 10 show, the hindcast mixed layer
in this box fits well with the mean observed mixed-layer depth. Statistics
computed over a smaller box (taking into account only the northern part of
the box from 48.55 to 48.8

As in Fig. 9, but for 16 May.

The greatest forecast error is obtained with the Glo4 system during the M1
event. During this first period (between 9 and 12 May) the 1- and 2-day
forecasts are consistent with the hindcast (green and blue dots with respect
to the black line in Fig. 7) and are therefore deeper than the observations, but the 3-
and 4-day forecasts (red and purple dots in Fig. 7) are closer to the
observations, with a thinner mixed layer. One would have expected F0 to be
more accurate than F4. As already mentioned in Sect. 3, Glo4 is better at
long forecast lengths than at short ones, but for the wrong reasons. The
4-day wind forecast is less than the analysis wind (4 m s

As already discussed for the hindcast in Sect.

error in the atmospheric forecast (see Fig. 8);

rapid stratification/mixing change occurring over two days; in this case a short delay in the forecast gives a large error;

the M2 event occurs when the mixed layer is still thick; in the case of a shallow mixed layer, the uncertainty is naturally reduced;

there are well marked mesoscale structures which affect the mixed-layer depth, generating vertical mixing associated with vertical velocities along the front and around eddies.

The S2, M3, S3 time sequence is well forecast in all the systems, with good
temporal consistency with observations (Fig. 7). Maximum stratification
occurs on 25 May (S2). Then, the water column is mixed until 28 May (M3) and
quickly re-stratified until the end of the month (S3). All the forecast
lengths are close to the hindcast run except the 4-day forecast for 21 and
28 May. For these dates, all systems give consistent solutions with too rapid
a re-stratification for 21 May and a lack of mixing for 28 May. This is
explained by the error in the wind forecast (Fig. 8) taking into account a
1- or 2-day lag, which is the typical time taken to mix the water column.
For 19 and 20 May the forecast wind speed is too strong with wind speeds
exceeding 10 m s

Standard deviation of the forecast error normalized by the standard deviation of the observations. Top panel: atmospheric fields (wind in black, heat flux in blue and fresh water in red) where analyses are considered as observations. The solid line is for May 2013 and the dashed line considers only the mixing events (M1–M3). Bottom panel: ocean mixed-layer depth forecast (for all the systems), in black for May 2013, in blue only during the mixing events (M1–M3) and in red during the stratification events (S1–S3).
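
The ratio shown in this figure can be sketched as follows. This is a minimal illustration assuming co-located daily series and population standard deviations; for the atmospheric fields the analyses would be passed in place of the observations. The function name is hypothetical.

```python
from statistics import pstdev


def normalized_error_std(forecast, observations):
    """Standard deviation of (forecast - observation) divided by that of the observations.

    A value above 1 means the error variance exceeds the observed variance.
    """
    if len(forecast) != len(observations) or len(observations) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    obs_std = pstdev(observations)
    if obs_std == 0.0:
        raise ValueError("observations are constant; ratio is undefined")
    errors = [f - o for f, o in zip(forecast, observations)]
    return pstdev(errors) / obs_std
```

Because the error standard deviation removes the mean error, a forecast with a constant bias yields a ratio of 0; the bias has to be examined separately.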

The question of the significance or effect of atmospheric forcing vs. initial
state on the mixed-layer forecast has to be addressed. One diagnostic
computed to quantify these two aspects separately is based on the temporal
correlation between several time series. The first step is to compute the
temporal correlation between the same forecast lengths with all the available
systems. Correlations are thus computed for six ensembles of estimates (H, F0,
F1, F2, F3, F4), each ensemble being made of four different time series coming
from the four systems. If the initial state had a strong impact on error growth,
one would expect the mean correlation of the ensembles to decrease
significantly with the forecast length. In this case the mean correlation
decreases from 0.94 (for the Hindcast time series) to 0.91 (for the 4-day
forecast time series). This small decrease in correlation indicates that the
initial state has a small effect. In the second step the lag correlation
between the Hindcast (H time series) and the Forecast (F0 to F4 time series)
is computed independently for each system. In this case the mean correlation
decreases from 0.98 (correlation between H and F0 time series with 1-day lag)
to 0.83 (correlation between H and F4 with 5-day lag). Even though the
correlation is still high, this stronger decrease indicates that atmospheric
forcing has a greater effect in comparison with the initial state. A second
diagnostic is based on the error growth computed with the standard deviation
of the forecast error, normalized with the standard deviation of the
observations (Fig. 11). For the atmospheric variables, the main error is
displayed by the fresh water flux which does not drive the variability of the
mixed-layer depth in our case, as mentioned before. The normalized standard
deviation becomes greater than 1, signifying that for the 1-day forecast the
error variance is greater than the observation variance. An unexpected
decrease in the error occurs for F2, for which two explanations can be
offered. First, current numerical weather
prediction systems have difficulty producing realistic water fluxes over
the ocean, in analysis mode as well as in forecast mode. Second, water fluxes
may vary considerably within a given day, or between two instances of a weather
forecast. Consequently, errors in water fluxes averaged over 1 day may
behave this way through purely random effects, and a larger sample may be
necessary in order to derive robust statistics for this variable. For the
wind field, which in this case is the most important forcing, this ratio is smaller
than for the other forcing fields (heat and fresh water fluxes).
The difference between the forecast over the entire month and that only over
the mixing events (illustrated by the dashed line on the top panel in
Fig. 11) is small except for the 4-day forecast. For the mixed-layer forecast
(bottom panel in Fig. 11) considering the entire period there is a small
linear increase in the normalized standard deviation which generally remains
less than 1 even for the 4-day forecast. The link with the error growth for
the wind fields can be made by considering that the largest increase in the
error for the 4-day forecast will have an effect on the longer-length
forecast of the mixed layer (typically for the 5 or 6 days which are not
included in this study). Taking only the mixing events into account, the
normalized standard deviation is stable for the first 3 days and then
increases. It should be noted that during the stratification events the
normalized standard deviation for the mixed layer is greater than 1. This
is because, in a stratified ocean, the error and the mixed-layer depth have
the same amplitude, so a very small variation in the mixed
layer has a large effect on this ratio. As we see in Figs. 9 and
10, there is also a strong spatial variability in the mixed layer which is
not driven by atmospheric forcing, especially at small scales. Computing the
spatial standard deviation in the small 1/2
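
The two correlation diagnostics described in this paragraph can be sketched as follows. This is a minimal illustration which assumes that the "mean correlation of the ensembles" is the average of the pairwise Pearson correlations between systems, and that hindcast and forecast series have already been aligned in time for the chosen lag; both points, and all function names, are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def mean_intersystem_correlation(series_by_system):
    """Step 1: average pairwise correlation between systems for one forecast length."""
    pairs = list(combinations(series_by_system.values(), 2))
    if not pairs:
        raise ValueError("need at least two systems")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def mean_hindcast_forecast_correlation(hindcasts, forecasts):
    """Step 2: correlation between H and one forecast length, averaged over systems.

    Both arguments map a system name to a series already aligned in time.
    """
    return sum(pearson(hindcasts[name], forecasts[name]) for name in hindcasts) / len(hindcasts)
```

Applying the first function to each of H, F0, ..., F4 and the second to each pair (H, F0), ..., (H, F4) gives the two sequences of mean correlations whose decrease with forecast length is compared in the text.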

Mean SLA increment (in cm) computed over May 2013 for Glo4 (left), Atl12 (middle) and Glo12 (right) systems.

shows the SLA increments computed for the three systems (note that
there is no data assimilation in the Ibi36 system, which is not presented
here). Our area of interest (48.5

Mean temperature increment (solid line) and

This study focuses on a small area in the Northeast Atlantic during
May 2013. Several conditions are met to obtain robust results:

a large number of temperature profiles (74) in a small area with a high sampling frequency over the month (more than one per day);

available daily forecasts with four operational ocean forecasting systems
containing differences in horizontal resolution from 1/4 to 1/36

a strong variability in the mixed-layer depth during the month with alternating mixing and stratification events;

a strong link between atmospheric forcing and ocean response.

The availability of four systems providing daily forecasts gives the
opportunity to build an ensemble forecast associated with an estimate of the
uncertainty of the mixed-layer depth. These systems have been developed by
Mercator Ocean under the MyOcean project, the ocean part of the European
Copernicus programme, and have been operated in real time since the end of
April 2013 (V3 of MyOcean service). Other ocean forecast products could also
have been used to increase the number of members in the ensemble, but for
this study we chose to use only these four forecasts to separate the effects of
atmospheric forcing and initial state. First results show the benefit of the
mean or the median of the members as forecast. In our case this ensemble
estimate is close to the best forecast, and sometimes this estimate is the
best (for example the best correlation for the 1-day forecast is obtained
with the median state and with the mean for the 4-day forecast). Computing
the same statistics, removing each individual forecast one by one, is a good
way to estimate each contribution in the ensemble. We have shown that after
removing the worst forecast, which systematically degraded the mixed-layer
depth estimation, the mean is always better than each individual forecast for
every forecast length. It would now be
useful to introduce into the ensemble ocean estimates from other operational
forecasts computed with other
atmospheric forecasts, for example the product available in MyOcean
provided by the UK Met Office covering the Northwest shelf (O'Dea et al.,
2012), or other global high-resolution forecasts such as that provided by
the Naval Research Laboratory (NRL)
(Cummings, 2005). Uncertainty estimates in the mixed layer in this area based
on our four forecasting systems and the 4-day forecast length can reach 50 m during
this particular month. The spatial uncertainty for the model in such a small
area has the same order of amplitude (

Finally, we have shown that the temporal variability in the mixed-layer depth
when changing from the mixing to the stratification phase is driven by the
atmospheric forcing, but small-scale and mesoscale ocean structures also have a strong
local impact. At these smaller scales, resolution, parameterization and
assimilation play a role and can affect the forecast score, error or
uncertainty. Unfortunately, based on observations the mixing along fronts and
around eddies remains difficult to validate properly. The coverage of the in
situ observations and the resolution of satellite observations are not
sufficient even though the recovery of vertical velocity based on satellite
observations is promising (Buongiorno Nardelli et al., 2012) and though
observations of water colour provide high-resolution estimates of ocean
parameters directly affected by the vertical mixing. However, the effect of
horizontal circulation, particularly around eddies or along strong fronts, is
illustrated by Figs. 9 and 10. The Ibi36 model, having the highest horizontal
resolution, is able to resolve mesoscale eddies that induce patterns of
convergence and mixing that are not present in the coarser horizontal
resolution systems. The Ibi36 model benefits from improvements (GLS mixing
scheme, explicit tides, higher-resolution atmospheric forcing) that are not
yet implemented in the basin-scale and global model configurations such as
Atl12, Glo12 and Glo4. Further sensitivity studies would be necessary in
order to quantify the effect of each individual improvement of Ibi36 with
respect to Atl12 or Glo12. Ultimately, as a test platform for further
developments of a high-resolution global system, the Ibi36 forecasting system
proves to be successful in reproducing the mixed-layer depth and its response
to atmospheric forcing. Future development of the operational oceanic
forecasting systems will be crucial in improving forecasts of oceanic
parameters or processes such as the mixed-layer depth. Within the scientific
community, work is in progress to include data assimilation of new types of
observation (such as ocean colour and, in the near future, SWOT high-resolution sea surface height observations), to increase horizontal and
vertical resolution, to improve vertical mixing models and parameterizations,
to improve ocean–atmosphere interaction through coupling and to provide
better estimates of the uncertainties based on ensemble techniques. In the
short term, Mercator Ocean systems will be improved by adopting choices already
made for Ibi36, such as the full resolution of the atmospheric forcing at
1/8

This research was supported by the MyOcean2 European project and is based on MyOcean products. The authors wish to thank collaborators contributing to the development of the ocean forecasting systems under the MyOcean project, the NEMO consortium and the data centres at CORIOLIS and BODC which disseminate the in situ glider observations collected under the OSMOSIS project.

Edited by: A. Schiller