Comment on os-2021-87

I think the authors need to reorganize the paper. It reads like an incomplete exploratory analysis, without good justification or reasoning for the decisions made and the way the work was set up. Maybe this material was cut?
The abstract needs a lift: it is not complete and could be more focused.
Results: There are some good insights here and there, but the paper lacks important signposts along the sections to put the reader in the right mindset. Datasets are barely presented, and figures are not self-explanatory. What each simulation uses and how it is performed is rushed and not clear from the start. The assimilation aspects are not described with consistent formality. Although I'm not a specialist in the ECCO model, the description here lacks detail. Maybe the author wished to simplify, but I think it ended up cutting too much. Some de facto nomenclature in data assimilation is brushed aside, which only adds to the confusion for those not familiar with ECCO.
In short, I think the paper needs a major revision, mostly to clarify what has been done and to reduce the guesswork, open questions, and mental gymnastics required of the reader.
In general, I believe the paper's results may well be good, and there are some good insights, but the lack of attention to detail and to describing what was done puts me in the uncomfortable position of taking a step back in trusting them. The shallow, rushed discussion at the end, with all the caveats lumped together without any linkage, is disheartening.
Finally, there is an appendix containing more description, methods, and results which, although related to the paper, do not link well to the main text and results; there is only a single reference to it, out of context. The author does not explore how the appendix results relate to the main research topic, so it appears to be just a dump of information. I would either remove it or overhaul the text to point to the results presented in the appendix.

## Specific comments
Below you will find some informal notes and suggestions, reflecting my personal opinion, that I hope will help the author to be clearer and to better understand other points of view on the work. These notes were made mostly in reading order, so some aspects are discussed in a back-and-forth fashion. Although some are a matter of opinion, I'm certain that at least some of them can help improve the paper.
3-4: "is not sufficient to constraint Kp". I disagree: if Kp is being analysed, is part of the state vector, or changes at analysis time, it is being constrained. This is a lax use of "constraint", which doesn't bode well for a paper on assimilation.
80-85: I wonder whether this distinction is actually necessary here and in the paper in general (in sensitivity terms). For example, you say you use different Kp observation sources in the Kp experiment, but you apparently assign the same constant error to both.
83: Which method?
85-115: I understand you want to distinguish the observations and how they are derived, but for the assimilation what is important is the error you assign to each observation set. The nomenclature of Kp is a bit jarring too: Kp from a free run (missing), Kp from the state estimate at the start (missing), Kp after iteration 59 (which I assumed here and hereafter to be Kp_ECCO), Kp micro, Kp_w15, Kp_K17.
95/110-115: I miss a figure here showing these observational estimates in a simple way: you have 3 different estimates, 1 gridded and 2 scattered. This should be a figure. Better yet if it also shows the actual Kp from ECCO, which is absent, so we know nothing about its spatial variability. There is no good picture of observation coverage in the paper.
123: Describe the oxygen from WOA: units, coverage, perhaps a mean/standard deviation figure. Limitations? What about N2? It seems to me that observations here are treated like the holy grail, but this is hardly true when fitting and assigning errors.
125-130: Why the picture, if you are outsourcing the most important thing to the Whalen 2015 reference? Just include the picture here for the sake of your readers. Also, this "justification" is not accompanied by a discussion of the methods/results. You need to explain in more detail what the insight is here rather than outsource it.
175-180: Weak appendix link: you need to provide more material throughout the paper to entice the reader to read the appendix. I don't understand the "-rerun" here: you don't define a symbol for it and I can't see a reference anywhere. You are also being repetitive here, since you explain this better in 190-195.
185-190: Out of place; better around 150.
195-200: Can you please define more formally what a forward ECCOv4 simulation is? A free run? There is no earlier explanation of using N2 from WOA; I assume this is why you try to justify the do/dz presented earlier, right? This is badly connected. Is Kp_ECCO 10^-5 after the optimisations or before? You said it was spatially varying! Is Kp_ECCO from before or after the state estimate!? Is Kp_ECCO from E-CTRL? You are not making the reader's life easier by not naming simulations and parameters properly.
210-215: Explain better why the results are independent of the run length. Is that just because you are using data in a climatological mode (averaging everything into a climatological year!?). If you adjust fluxes and use observations, how can the length not matter, when by varying the length you vary the number of parameters to fit and hence the cost function? Does your Eo start from a different initial condition? Results in the methods? Sorry, but this part is a bit of a mess. I think you should explain the sensitivity analysis and the assimilation before this part, because it raises all sorts of questions (what the cost function looks like, background covariance, B/R/gain matrices, etc.).
Figure 2: This is a bit out of place and the reason for it is not clear. The picture could show much more, such as the standard deviation of both model and observations. The colorbar has strange fonts; was it just pasted on the side? I would shift the centre of the picture (in fact, of all of them) to 180E, on the Pacific, since this is where errors are larger and where more data points are present (landmasses are distracting here). What is the depth range of the averaging? I would at least expect you to follow the 250-500/500-1000/1000-2000 ranges used in other figures, to give us something to refer to when reading your results later.
225-230: I found the description here too simplistic and miss something more formal. This section falls apart without the a priori information. How is the background error covariance computed? Is it independent (apparently yes)!? What decorrelation length scale is used? Several a priori facts are important here and the lack of these details is very concerning. How the setup and the equations are optimized/solved is nowhere to be seen. This is the time to describe more fully how things work; so far all we have had are sprinkles of incomplete information. Use a clear equation with the full state vector in each experiment. Is the 4D-Var inner loop a whole year?
235: That 2% is optimistic. We need more details on how you specify the observation error. The lack of detail here is jarring and suggests a lack of attention to how the assimilation was set up. What is being corrected? Why are you not correcting Kp jointly with other observations? Just so your equations/W/J are simpler? If you don't use other state parameters in the equation, you are not using the full potential of the system (fitting the errors with T/S/SSH+Kp). I'm puzzled and can't see how these experiments are being conducted.
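To illustrate the level of formality I have in mind for 225-235 (a generic textbook form in my own notation, not necessarily the authors' formulation), a weighted least-squares cost function is usually written, up to a constant factor, as

$$
J(\tilde{x}) = (y - S\tilde{x})^{T} R^{-1} (y - S\tilde{x}) + (\tilde{x} - x_{b})^{T} B^{-1} (\tilde{x} - x_{b})
$$

Here $y$ holds the observations, $\tilde{x}$ is the full state/control vector being estimated, $S$ maps it to observation space, $x_{b}$ is the prior (background), $R$ is the observation error covariance and $B$ the background error covariance. All of these symbols are assumptions for illustration only. Spelling out, for each experiment, what enters $\tilde{x}$, how $R$ and $B$ are specified, and whether $J$ is actually minimised or only evaluated would answer most of the questions above.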

238: concerning Kp (and others).
240-245: What!? So you use model error/obs error as "sensitivity" for the Kp experiment? Explain why you decided to do this. I don't think you can use this equation: W is the solution, and you are imposing it? So you are just looking at the sign of the "forecast error" here and scaling it by the obs error!? Better wording or a better explanation would remove the guesswork.
245-250: "Short of assimilating ... we assess whether the assimilation of a particular dataset *could lead* to a more ..." !!! This should be the first line at 225. You are just confusing the reader: all of that text to say that you are only looking at the innovations and not performing the assimilation (apart from the oxygen experiment, I guess). I'm puzzled about what is being done; you need to clarify the whole of section 2. What are you optimizing here? W is usually the solution to the problem, but you are imposing it now?
240-245: All that worry about Kp from different observations having different origins, just to set the values like this? Also, the uncertainty here relates to model versus obs, not obs versus obs. This part doesn't bode well for a robust setup of the sensitivity task. Also, there is no discussion of these settings and their impact on the sensitivity; you just leave it for later, I assume? (But this never happens further down in the results...)
245-250: "Because the observations of Kp are not direct measurements...": Again, nothing is a direct measurement (maybe temperature is the closest thing), so this is not the reason to examine how the model Kp differs from observations... You just need to understand how the model errors are distributed in space/time. You just need to know what (y-Sxtilde) looks like; just say so.
250-255: "However we dont want to assimilate ... because of their uncertainties and still limited spatial coverage relatively to oxygen". Why not? Because your constraint is only on Kp and you would be overfitting? Or because the equations you are using cannot handle the log-normal statistical distribution of Kp? Why not try to solve the problem by assimilating Kp together with all the other ECCOv4 parameters? This sentence is probably tied to the methods you are using, so it would be better to back these insights with proper information. Again, some important concepts and insights are not fully described here. I'm surprised by how much this entire section confuses the reader.
256: I would assimilate both, since both probably provide information, but I'm not sure because you don't say what you are fitting here (boundary conditions? initial conditions? fluxes? model parametrizations?).
260-265: "is more than a factor of 3/ above". How does this choice relate to the errors specified in the fitting? No explanation or insight is provided; on first reading it looks like a match-fixing kind of problem.
265-270: I'm quite surprised by the lack of discussion of how fitting Kp will improve the model run, since this is a parametrisation; the practicalities and the impact on the dynamics are not discussed at all. The author mentions that Kp is fixed in ECCO, but how the model/analysis will perform after Kp is improved, jointly or alone, is left to the imagination. I understand now that the author is not looking for the analysis but just for the impact, but given the esoteric parameters, a mention of how this will flow down into the model run is important.

270-280: More methods in the results? "A geometric average is taken ...". New information about Kp being "log-normal". These need to be properly defined beforehand.
Figure 4: The right-hand panels are misleading relative to the other figures in the paper, since white is not where data is missing but where Kp_ECCO ≈ Kp_argo! I would recommend adding dots where this is the case, to distinguish it from a lack of data. Kp_argo is a better name than the Kp_w15 reference, just as Kp_ctd is better than Kp_K17. Finally, I would help the reader here and say that "red (blue) areas are where argo is smaller (higher) than ECCO", since this is a log10 ratio.
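As a minimal sketch of the two points above (my own illustration with made-up numbers, not the authors' code), the snippet below shows why a geometric rather than arithmetic average suits a log-normally distributed Kp, and how the sign of a log10 ratio reads. It assumes the plotted quantity is log10(Kp_ECCO/Kp_argo), which is only my reading of Fig. 4; the function names and sample values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean, i.e. exp of the arithmetic mean of the natural logs."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


def log10_ratio(kp_model, kp_obs):
    """log10(model/obs): > 0 where the model is larger, < 0 where it is smaller."""
    return math.log10(kp_model / kp_obs)


# Hypothetical diffusivities spanning two decades (illustrative values only).
kp_obs_samples = [1e-6, 1e-5, 1e-4]

# The geometric mean sits at the middle decade (close to 1e-5) ...
kp_obs_geometric = geometric_mean(kp_obs_samples)
# ... while the arithmetic mean (3.7e-5) is dominated by the largest sample.
kp_obs_arithmetic = sum(kp_obs_samples) / len(kp_obs_samples)

# Assumed sign convention: positive means the model exceeds the observations.
kp_model = 1e-4  # hypothetical model value
ratio = log10_ratio(kp_model, kp_obs_geometric)  # close to +1
```

Stating the averaging and the ratio convention this explicitly in the methods and in the Figure 4 caption would remove the ambiguity about what red, blue and white mean.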
295-300: I don't agree with how you define/use "constraint" here. Lack of agreement is not lack of constraint. What is a realistic constraint? A perfect match? The same order of magnitude? Define what you consider a good constraint. You can't in the paper, because no one knows what the initial Kp is at all or how it was improved from the base case.
300-305: This is just assigning blame without a proper discussion, which should have come earlier to explain the limitations of the state estimate. Although I agree that compressing all the required information into a short text is a challenge, this ending looks like shooting yourself in the foot: you can't give the reader proper insight into why Kp is not better constrained. You forgot to mention the numerical nature of the parametrisation and its limitations (plus the other parameters). Wouldn't there be leakage towards adjusting the other parameters instead of Kp? I miss some discussion around that.
310-315: "Because kp_ecco tends to be very large inside mixed layers": another sprinkle of missing information in the middle. Couldn't you show your readers what Kp_ECCO looks like beforehand?
311-312: Another way of saying it is "A positive adjoint sensitivity implies that the model overestimated Kp". But isn't the Kp less than an order of magnitude compared to micro in Fig. 3? Globally, the tendency of your "adjoint" here is to increase Kp (more red than blue in Fig. 4), which is akin to what your GMAO results (appendix) are doing, overshooting Kp beyond the microstructure. There is no discussion of this.
310-320: Why not a figure with adjoint sign profiles similar to Fig. 3? This would be more helpful than the whole description by region.
Figure 5: Why not include a comparison of signals here? That would be better than asking the reader to do it with these small figures and very gappy coverage. I also only now learn that the calculations are done for a single year, 1992, which is another surprise since this is not mentioned or discussed anywhere.
325-335: A global metric would be better here. Better metrics for computing the percentages would benefit the paper, since this is rather arbitrary. Maybe focus on one or two regions (e.g. one with a lot of observations, such as the Kuroshio/North Pacific, and the South Indian Ocean)? I found all the percentages a bit daunting without a clear focus on the process at hand and where it will have the most impact. The figures definitely need to be re-centred on the Pacific.
335-340: I think this deserves a new paragraph and more explanation, since it is an important result.
340-345: Asking readers to work out which white regions in one figure are not white in the other!? Tip: making the reader's life easier is the best way to make them happy. This is the most problematic aspect of using Fig. 5 and Fig. 6 together, since the white parts are misleading. You have already set in the reader's mind that white is missing data, and now some figures break this notion and are used to reason about the results. I would refit
350-355: I would also put this in a separate paragraph, since the insight here is important and related to the next paragraph at ~360.
362: Finally a good insight, but not without trouble. You don't describe anywhere how N2 is distributed, its statistics, where it is likely to dominate over Kp, or how it is related to it. Also, how this would be fitted together with Kp (and other parameters) is missing from the paper's discussion.
365-375: This ending is badly written and looks like a last-minute addition. I would rewrite it, since these are the concluding remarks.
375-400: I think it is too shallow here. There is no broad discussion of alternatives or of causes and effects.
390: Again, the uncertainties are not discussed in terms of model error, background covariances, inflation, or decorrelations.
400-435: Yes, yes, several things can have an effect, but you don't discuss the real deal: fencing in your results so that people can work out what needs to be done next or how to relate this paper to their own problem. IMO, at this stage, the reader is just tired of unlinked, broad-scope caveats and problems instead of a pinpointed, smaller-scope discussion.
440-eof: I will leave it to the other reviewers/editors to decide whether this is important enough to keep in the paper. Certainly, there are not enough references in the text to this section, although some results are interesting from the point of view of assimilation. Also, important references in the appendix are not mentioned in the text, and some of the discussion there of the results presented is even better than in the paper itself. It is puzzling why the author didn't include some of that in the actual sections!