Ensemble quantification of short-term predictability of the ocean dynamics at kilometric-scale resolution: A Western Mediterranean test-case
 ^{1}Univ. Grenoble Alpes, CNRS, IRD, Grenoble INP, IGE, 38000 Grenoble, France
 ^{2}Ocean Next, Grenoble, France
Abstract. We investigate the predictability properties of the ocean dynamics using an ensemble of short-term numerical regional ocean forecasts forced by prescribed atmospheric conditions. For that purpose, we developed a kilometric-scale regional model for the Western Mediterranean Sea (MEDWEST60, at 1/60° horizontal resolution). A probabilistic approach is then followed, where a stochastic parameterization of model uncertainties is introduced in this model to initialize ensemble predictability experiments. A set of three ensemble experiments (20 members and 2 months) is performed, one with the deterministic model initiated with perturbed initial conditions, and two with the stochastic model, for two different amplitudes of stochastic model perturbations. In all three experiments, the spread of the ensemble is shown to emerge from the small scales (10 km wavelength) and to progressively upscale to the largest structures. After two months, the ensemble variance saturates over most of the spectrum, and the small scales (< 100 km) have become fully decorrelated across the ensemble members. These ensemble simulations are thus appropriate to provide a statistical description of the dependence between initial accuracy and forecast accuracy for time lags between 1 and 20 days.
The predictability properties are statistically assessed using a cross-validation algorithm (i.e. using alternatively each ensemble member as the reference truth and the remaining 19 members as the ensemble forecast) together with a given score to characterize the initial and forecast accuracy. From the joint distribution of initial and final scores, it is then possible to quantify the probability distribution of the forecast score given the initial score, or reciprocally to derive conditions on the initial accuracy to obtain a target forecast skill. The misfit between ensemble members is quantified in terms of overall accuracy (CRPS score), geographical position of the ocean structures (location score), and spatial spectral decorrelation of the Sea Surface Height 2-D fields (decorrelation score). With this approach, we estimate for example that, in the region and period of interest, the initial location accuracy required (necessary condition) with a perfect model (no model uncertainty) to obtain a location accuracy of the forecast of 10 km with a 95 % confidence is about 8 km for a 1-day forecast, 4 km for a 5-day forecast, and 1.5 km for a 10-day forecast, and that this requirement cannot be met with a 15-day or longer forecast.
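The cross-validation idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy Gaussian data, and the choice of averaging a pointwise sample CRPS over all grid points are assumptions made for the example. The CRPS is computed with the standard sample formula E|X - y| - 0.5 E|X - X'|.

```python
import numpy as np

def cross_validated_crps(ensemble):
    """Leave-one-out CRPS for an ensemble of shape (n_members, n_points).

    Each member is used in turn as the reference "truth"; the remaining
    members form the ensemble forecast. Returns one score per member.
    (Hypothetical helper; spatial averaging is an assumption.)
    """
    ensemble = np.asarray(ensemble, dtype=float)
    n_members = ensemble.shape[0]
    scores = np.empty(n_members)
    for i in range(n_members):
        truth = ensemble[i]
        others = np.delete(ensemble, i, axis=0)
        # Mean absolute distance between forecast members and the truth
        accuracy = np.mean(np.abs(others - truth[None, :]), axis=0)
        # Half the mean absolute distance between pairs of forecast members
        pairwise = np.abs(others[:, None, :] - others[None, :, :])
        spread = 0.5 * np.mean(pairwise, axis=(0, 1))
        # Average the pointwise CRPS over all grid points
        scores[i] = np.mean(accuracy - spread)
    return scores

# Toy data standing in for 20 members of a flattened 2-D field
rng = np.random.default_rng(0)
toy_ensemble = rng.normal(size=(20, 500))
scores = cross_validated_crps(toy_ensemble)
```

The 20 resulting scores give an empirical distribution of the score, from which quantiles (for example the 95 % level) could be read off at each lead time.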
Stephanie Leroux et al.
Status: final response (author comments only)

RC1: 'Comment on os-2022-11', Anonymous Referee #1, 13 May 2022
Review of os-2022-11
General comments:
This interesting manuscript focuses on the predictability of small scales in realistic ocean models kept "on track" by data assimilation (although the manuscript does not contain assimilation results). In particular, it proposes a rather novel methodological approach to relate forecast uncertainties to initial uncertainties in the fields, and presents some results quite convincingly in the context of a particular experimental protocol based on a set of 2D "displacements". The topic is scientifically relevant and important and the scientific quality is good, but the focus, clarity and precision could sometimes be greatly improved. I have no reservations about the statistical/probabilistic methodologies implemented, and the results are valid and interesting, but I am not convinced of their generality given the particular experimental protocol (type of uncertainties considered, seemingly "fixed" scale, number of members, etc.): the limits of the ensemble generation approach, and thus the scope and validity domain of the results, should become more apparent. This manuscript should eventually be accepted for publication, but perhaps not quite in its present form.
Specific comments:
The style of the introductory and methodological sections is sometimes rather "literary" and "rhetorical", convoluted to the point of being imprecise (an example: see the comment "lines 56-68" below); the approach is often introduced by invoking much more general and theoretical concepts than necessary. On other occasions, the text does not contain enough information or loses the reader. I would recommend (1) adopting a much more "direct", "factual", "scientific" style throughout the text, and (2) improving precision and conciseness. For example, when describing a methodology, the description of what was done in practice could be presented first, accurately and completely (and not in three different places, such as the perturbation scheme in subsections 2.2 and 3.1 and Appendix A); then the validity and scope of the approach, including the wider context, can be discussed, not the other way around.
However, as the ms. progresses, the style improves, especially in the description of the results, which is often adequate.
The definition of predictability scores (in particular CRPS and predictability diagrams), and the way in which statistical calculations are carried out using all members of the ensemble in turn as a reference (reminiscent of generalised cross-validation) are two aspects of the work that could be generalised to problems beyond the particular experimental protocol. I was particularly interested in the dispersion of the CRPS estimates across the 20 cases (Figures 7, 8). I would be curious to know what they look like with only the reliability CRPS component or only the resolution component (the latter possibly giving access to a form of feature-based predictability, i.e. based on whether a particular forecast eddy is present across the members). The decorrelation score is interesting and also seems to be quite general. The location score is of course more related to the particular type of uncertainties in the study.
While section 4 is solid, one should keep in mind that the predictability analyses are conducted within a very specific experimental protocol: that of pseudo-random perturbations based on 2D "displacement" at small scales (10 grid points), having a direct impact on horizontal advection and pressure gradient at these scales (and indirectly on other dynamical processes). (I know that "displacement" is probably not the right term as you are perturbing the metrics of the model operator, not the grid, but you could give this word that definition in your ms.) That's OK, but in retrospect I probably would have liked a more honest introduction and summary framing the study more clearly in the particular experimental protocol (e.g. as described in subsection 2.2 from line 118). Indeed, the results in Section 4 could be very different for other forms of uncertainty. The conclusion is not careful enough in this respect: its first sentence ("The overall aim of this study...") promises too much in relation to the very real and effective work that has been done.
In addition, a limitation of this work that is not mentioned in the conclusion is that the correlation scale of the displacements is set (if I understood correctly) at 10 grid points. So, if I understand correctly, this is a predictability analysis study for a 10 grid point noise. Would a smaller or larger scale noise behave in the same way? What about pseudo-random correlation scales? However, I'm not sure I understood correctly, since the conclusion quotes "10 km wavelength" and not "a scale of 10 grid points", which is quite confusing. Similarly, the tenfold use of a Laplacian filter is mentioned, which adds even more confusion.
Twenty members is a small size for an ensemble, again a topic not addressed by the conclusion. It is not clear whether we should interpret the discussion in 3.2.3 as evidence that 20 members are "sufficient" for the subsequent predictability study? What about the representation of spatial covariances with 20 members? (These generally converge more slowly than the variances). Also, what is the impact of the ensemble mean, and is it taken into account?
This is a scientific paper. Therefore, the emphasis on CMEMS, which is cited several times, and which also comes as the "last word" in the conclusion, seems out of place. Such a study is of interest to all ocean forecasting systems. If appropriate, CMEMS can be mentioned in the acknowledgements.
Individual comments:
lines 56-58: This appears as a purely rhetorical statement, but perhaps I did not understand what was meant. Models and assimilated observations have errors which do impact the forecasts, we know that. Also, how can model instabilities be used to produce a valuable forecast?
lines 62-63: "initial uncertainties because observation resources are limited": yes, but observations have errors too; and in an assimilating system initial errors are also due to the whole history of all types of errors up to then.
The introduction has no references on probabilistic skill scores.
line 98: "initiated" -> "initialised"
section 2.1: Which scales can be accurately modelled by MEDWEST60? It is important to have those in mind in relation with the perturbation scales which you will use. Also, in the Mediterranean the internal Rossby radii are quite small.
lines 109-110: "In this context..." -> "In a purely deterministic approach..." to improve clarity. But still, you are missing modelling errors here (parameterisation, numerical schemes, missing physics).
lines 148, 151, in Table 2, etc: "probabilistic model" -> "stochastic model"
line 154, legend of Figure 2, etc: "grid size", "size of the model grid" -> "grid spacing" or "mesh spacing". Also what is the distribution law used for the perturbations? (If a non-compact support law is used, did you use an upper bound for the displacement?)
lines 163-164: "It does rely...": I do not understand the sentence (you wrote the opposite two sentences before). Also: part of this paragraph is descriptive, and part is a discussion in anticipation of another discussion in chapter 4: it is not good to mix everything because you will lose the reader.
Table 2: I do not understand what "identical" initial conditions mean. I would have thought that the spun-up fields would be pseudo-randomly displaced using the 20 samples of the displacement fields (for each of 1%, 5% stdev), hence yielding 20 *different* initial conditions across the ensembles.
lines 182-183: I am a bit confused. The "displacement" is variable, with stdev = 1%-5% of the mesh spacing, but the displacement correlation scale is fixed to exactly 10 gridpoints. Therefore I do not understand the words "on the order of".
Figure 3: It might be interesting to have a zoomed version on the right (perhaps just for low wavenumbers) to be able to see something.
I did not have time for a full second reading and hence for further individual comments, sorry.
 AC1: 'Reply on RC1', Stéphanie Leroux, 29 Jul 2022

RC2: 'Too much focus on the diagnostics, too little on the perturbation method.', Anonymous Referee #2, 23 May 2022
Review of Leroux et al. “Ensemble quantification of short-term predictability of the ocean dynamics at kilometric-scale resolution: A Western Mediterranean test-case.”
The manuscript presents an analysis of an ensemble of ocean model simulations at very high resolution using a novel idea for intrinsic model errors based on concepts of location errors. The article uses very solid and interesting concepts and methodology and makes both a refreshing and useful contribution to the operational ocean forecasting community (where I belong).
The exploitation part of the research is very well developed and thoroughly explained, which will certainly help popularise probabilistic diagnostics in the oceanographic community, but is so extensive as to almost entirely eclipse the core stochastic model developments, which constitute the novel aspect of the paper. It is indeed seldom that one sees a theoretical advance (that from the papers from Mémin and Chapron) brought into a realistic ocean model, so it is of general interest to see for the first time the effects of the stochastic perturbations on the model solution. However, there is no discussion of the numerical effects of these perturbations and no visual from the perturbed model (illustrations are disappointingly always extracted from the CI control simulation without stochastic noise), leaving an uncomfortable impression that something is hidden from the readers. Another aspect that is not discussed is the somewhat binary response of the model to the amplitude of the stochastic noise. The 1% case corresponds to 15 m/d displacements (according to my own back-of-the-envelope calculation) and is most often indistinguishable from the CI (0%) case. On the contrary, the 5% perturbations correspond to 75 m/d, which also seems tiny, yet they turn out completely different from the other two cases and generate kilometres of feature location uncertainty within a single day. What happens between 1 and 5% that causes such a binary response? I believe that tidal amplification is the culprit and suggest an additional experiment in the detailed comments below, where the stochastic noise is turned off in the model nesting zone.
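The order of magnitude of the displacements quoted above can be checked with a short calculation. The sketch below is not the referee's own computation: it assumes a regular 1/60° grid, a representative latitude of 40°N and a mean Earth radius of 6371 km, and it takes the nominal spacing as the mean of the zonal and meridional spacings. The "per day" unit would further assume a perturbation timescale of about one day, which is inferred from the units used in the review, not stated here.

```python
import math

EARTH_RADIUS_M = 6.371e6   # mean Earth radius (assumption)
REF_LAT_DEG = 40.0         # representative latitude for the domain (assumption)

def grid_spacing_m(resolution_deg, lat_deg):
    """Zonal and meridional spacing (m) of a regular lat-lon grid."""
    dy = math.radians(resolution_deg) * EARTH_RADIUS_M
    dx = dy * math.cos(math.radians(lat_deg))
    return dx, dy

dx, dy = grid_spacing_m(1.0 / 60.0, REF_LAT_DEG)
nominal_spacing_m = 0.5 * (dx + dy)

# Standard deviation of the perturbation, in metres, for each amplitude (%)
displacement_m = {pct: pct / 100.0 * nominal_spacing_m for pct in (1, 5)}

for pct, value in displacement_m.items():
    print(f"{pct}% of grid spacing ~ {value:.0f} m")
```

Under these assumptions the two amplitudes come out at a few tens of metres at most, the same order of magnitude as the 15 m and 75 m figures quoted in the review.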
The doubts on the stochastic perturbation method do not impair the main findings of the paper, because the latter probably stand with the CI control ensemble alone, and the diagnostic methods can be applied to any stochastic model, but there is a risk that the manuscript is used to advocate for a stochastic model perturbation method that it does not truly validate.
Another general remark about the use of the probabilistic diagnostics is that some of them can be generalised to deterministic forecasts under an ergodicity assumption: spatially averaged statistics (CRPS, PSD) can be interpreted as expectations and could be applied to forecast systems that have invested in high model resolution rather than in ensembles.
Overall the paper is very good and makes a very enjoyable read. I am impressed by the enormous amount of thoughts and work that went into it. The structure, the style and the illustrations are all excellent, and will certainly make a splash in the operational community. So I recommend its publication after revisions that I would call “major” because of a possible problem in the implementation of the stochastic method.
The paper is maybe a little on the long side but I will suggest some reduction of the illustrations and point out a few repetitions in the text. Ideally the manuscript should be split into two separate papers, one demonstrating a new stochastic perturbation method and the other on the ensemble forecast diagnostics, but I will not insist on this if the authors can shed more light on the stochastic perturbation method without adding pages of text.
Detailed comments:
———————————
Title, abstract and introduction: no remark. All represent the actual contents of the paper well.
Section 2
 Figure 1: Why do you need to define as many as 3 subregions?
 Line 90: I understand that the eNATL60 configuration is not only a boundary condition but a baseline to which the different experiments should revert if there were no stochastic perturbations at all. Please make it explicit and come back to it whenever the different experiments are compared to eNATL60.
 Indicate which method is used to impose lateral boundary conditions (the Flather conditions?).
 Line 95-98: a) and c) are not strictly a “difference” and b) should not lead to any difference as long as the model is numerically stable. Please rephrase.
 Line 114-119: This argument is contorted. Any intrinsic or extrinsic errors (in the vertical mixing or winds for example) may as well affect the smallest scales of the ocean, if they are set up to do so. It would clarify the argument if you state upfront that you consider location errors exclusively and that other types of errors can be added at will.
 Line 134: Indicate the physical scales of 1% and 5% with respect to the temporal autocorrelation: displacements of 15 m/d and 75 m/d respectively.
 Line 139: “quite consistent” does not sound too good. Can you recall which conclusion of Mémin (2014) is supported by the present study?
Section 3
 L. 158: what does CI stand for in ENSCI? Control Integration?
 Figure 2b indicates that even after Laplacian smoothing, the square model grid is distorted and deviates from orthogonality, which may lead to numerical noise and eventually instabilities. The ROMS user community is advised to keep the grid cell orthogonality above 95% in practice, and especially at the lateral boundaries of the model, to avoid errors propagating inside the model grid. My recommendation would therefore be to dampen the model grid perturbations in the nesting zone of the model (in the first 5 or 10 grid cells) to avoid inconsistencies between the outer and inner model solutions, in particular the barotropic mode. I will come back to this at Figure 4.
 Table 2: Define e1 and e2 in relation to the appendix.
 Figure 3 shows indistinguishable lines, and no indication of what is good or bad. You could either plot the difference of PSD from the eNATL60 reference or solely indicate the maximum difference in the text and skip the figure altogether. If you keep the figure, I recommend removing the part for wavelengths > 250 km because of the small domain.
 Figure 4 exhibits an oscillatory signal in the ensemble spread, whereas intuitively I expect the spread to grow monotonically. The oscillations are most visible in the 5% case but also in the 1% case. I also noted that the oscillations peak at the same time in the 1% and the 5% cases, about 4 times a day. Unless you have used the same random seed in the 1% and the 5% case (which would be odd), the coherent oscillations indicate an amplified resonance of tidal signals, which brings me back to my previous remark about barotropic lateral boundary conditions: the nesting routines (radiation condition or Flather conditions, whichever you use) should allow tidal and other barotropic signals to be evacuated out of the domain, but if the perturbations make this boundary condition imprecise, the tides may be reflected at the lateral model boundary and resonate inside the nested model domain. I have a suspicion that this could be avoided if the perturbations were attenuated near the model boundaries (and maybe in shallow waters as well).
 Line 209: This claim could be confirmed by a look at the accuracy numbers from the MED MFC QuID document on the Copernicus Marine website.
 Figure 5 makes a stunning impression, but is uninformative. I would have preferred to see the 5% case to have a visual impression of the effect of random perturbations (there are otherwise none in the whole paper).
Section 4
 L. 268-280 is a nice introduction of the ensemble diagnostics, but seems like methodological overkill: the diagnostics are initially intended for location-dependent comparison to observations, but in the absence of observations like in the present study, some more basic diagnostics may be simpler to use than a cross-validation with each ensemble member. This is the case for the CRPS, which is aggregated spatially for all members to a single number and does not seem to add more information than a standard deviation. Please replace by the ensemble spread if this is a simpler diagnostic that provides the same insights.
 L. 298-299 are repeated in the figure caption.
 Figure 8. It would seem fair to mention that beyond 5 days of lead time, the 95% percentile is dependent on the model trajectory and does not make a robust statistic; a larger ensemble or a different perturbation method may improve that.
 The small lines in Figure 10 are not very informative. The three figures could be compressed into one by showing the three 95% quantiles only and plotting the differences from the initial CRPS.
 Section 4.2.1: I guess there are technical difficulties with the location score in the presence of islands or complex coastlines. This could be mentioned.
 Figure 11 (top against bottom) is nearly showing the same thing. You could remove the two lowermost panels by adding the 20 isolines in the top panels.
 L. 433: Why choose SSH this time?
 L. 460: scales above 150 km should be removed from the figure.
 L. 461: I would suspect that checkerboarding (numerical noise) would easily cause the correlation of small scales. Numerical noise is ubiquitous in all ocean models although viscosity makes it almost invisible. If the authors use a high-contrast colour scale (like “details” in Ncview), they would probably see some checkerboarding in the model output, which would inevitably appear coherent at the smallest wavelengths of the model output.
 Figure 18: Add the diagonal line for T=0.
 L. 485: The authors could indicate which SWOT revisit time would be necessary to maintain the small-scale structures (if the data assimilation were ideally good).
Appendix A1:
 L. 553: “Anamorphic transformation” is a pleonasm.
 L. 582: the link between the theoretical papers from Mémin and Chapron and this one is not obvious. How does the sigma value translate into the stochastic process P?
Appendix A2
 L. 595 to 599: “can be thought”, “can be be viewed” and “can be argued” make a very embarrassed logical chain to line 600, which I would promote upfront to motivate the approach.
 L. 610: Mention that a^2 + b^2 =1 to maintain the variance constant.
 L. 611: The “assumed independence” of the perturbation is later contradicted by the Laplacian filter in Line 620.
 L. 618: the citation to Garnier et al. (2016) is repeated.
 L. 620: does the Laplacian filter maintain the standard deviation?
 L. 620: is the value of sigma linked to the sigma in Mémin/Chapron?
 L. 629: Transformed to the other grids: do you mean a linear interpolation?
 L. 632: Only here is it possible for the reader to calculate the typical scale of the perturbations (about 15 m/day for 1%). This information is important to realise how much the model amplifies the location noise into location errors (roughly by a factor of 100 to 1000 in a single day, which is mind-boggling) and should be discussed in the main text.
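The remark on L. 610 (a^2 + b^2 = 1 keeps the variance constant) can be illustrated with a generic first-order autoregressive process x_{k+1} = a x_k + b w_k driven by unit-variance white noise w_k. The sketch below is an assumption-laden illustration, not the manuscript's perturbation scheme: the function name, the value a = 0.9 and the sample sizes are chosen only for the example.

```python
import numpy as np

def ar1_final_state(a, n_steps, n_samples, rng):
    """Iterate x <- a*x + b*w with b = sqrt(1 - a^2) and w ~ N(0, 1).

    Starting from unit variance, the variance after each step is
    a^2 * 1 + b^2 * 1 = 1, so it stays constant in expectation.
    (Hypothetical helper for illustration only.)
    """
    b = np.sqrt(1.0 - a**2)
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        x = a * x + b * rng.standard_normal(n_samples)
    return x

rng = np.random.default_rng(42)
final_state = ar1_final_state(a=0.9, n_steps=200, n_samples=100_000, rng=rng)
variance = final_state.var()
print(f"sample variance after 200 steps: {variance:.3f}")
```

With any other choice of b, the stationary variance would instead tend to b^2 / (1 - a^2), so the amplitude of the perturbation would drift away from its prescribed value.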
Typos:
———
 l. 133: remove the second “that”.
 L. 239: “characterizing”
 L. 343: Fussy -> Fuzzy
 Section 4.3.1: “pf” -> “of”
 L. 531: Beying -> Beyond
 AC2: 'Reply on RC2', Stéphanie Leroux, 29 Jul 2022