Volume 21, issue 5
https://doi.org/10.5194/os-21-2579-2025
© Author(s) 2025. This work is distributed under the Creative Commons Attribution 4.0 License.
On the global reconstruction of ocean interior variables: a feasibility data-driven study with simulated surface and water column observations
- Final revised paper (published on 24 Oct 2025)
- Preprint (discussion started on 24 Feb 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-705', Anonymous Referee #1, 05 Apr 2025
- AC1: 'Reply on RC1', Aina Garcia, 05 Aug 2025
- RC2: 'Comment on egusphere-2025-705', Anonymous Referee #2, 17 Apr 2025
- AC2: 'Reply on RC2', Aina Garcia, 05 Aug 2025
Peer review completion
AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Aina Garcia on behalf of the Authors (05 Aug 2025): author's response, author's tracked changes, manuscript
ED: Referee Nomination & Report Request started (07 Aug 2025) by Bernadette Sloyan
RR by Anonymous Referee #1 (16 Aug 2025)
ED: Publish subject to minor revisions (review by editor) (27 Aug 2025) by Bernadette Sloyan
AR by Aina Garcia on behalf of the Authors (04 Sep 2025): author's response, author's tracked changes, manuscript
ED: Publish as is (05 Sep 2025) by Bernadette Sloyan
AR by Aina Garcia on behalf of the Authors (05 Sep 2025)
Summary
García-Espriu et al. conduct an observing system simulation experiment (OSSE) to evaluate the feasibility of reconstructing ocean interior temperature and salinity from in situ observations and satellite data products. The authors leverage output from the CMEMS Global Ocean Ensemble Reanalysis product to conduct this experiment, subsampling the product at the times and locations where Argo float profiles are available. They then use these subsampled synthetic profiles to train machine learning models, which they apply to satellite products to reconstruct ocean interior properties, and they compare these reconstructions against the reanalysis “truth” to evaluate the skill of the reconstruction methods.
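For concreteness, a minimal sketch of this subsampling step as I understand it (the file names, index format, and CMEMS-style variable names "thetao"/"so" are my assumptions, not details from the manuscript):

```python
# Hypothetical sketch of the subsampling step described above: extract
# synthetic profiles from the reanalysis at the times and positions of
# real Argo profiles. File names, the index format, and the variable
# names ("thetao", "so") are assumptions, not details from the paper.
import pandas as pd
import xarray as xr

reanalysis = xr.open_dataset("cmems_ensemble_reanalysis.nc")
argo_index = pd.read_csv("argo_profile_index.csv")  # columns: date, lat, lon
argo_index["date"] = pd.to_datetime(argo_index["date"])

profiles = []
for _, row in argo_index.iterrows():
    # Nearest-neighbour extraction of the full water column at the
    # profile's time and position.
    prof = reanalysis[["thetao", "so"]].sel(
        time=row["date"], latitude=row["lat"], longitude=row["lon"],
        method="nearest",
    )
    profiles.append(prof)

synthetic_profiles = xr.concat(profiles, dim="profile")
```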
The authors find that the more complex versions of their random forest regression (RFRv2) model and long short-term memory (LSTMv2) network reproduce ocean temperature with an R2 of around 0.85 and salinity with an R2 of around 0.95. They validate their models with synthetic profiles withheld from model training and with a regional subsection of the reanalysis dataset. They report validation statistics spatially and by depth, concluding that the RFRv2 model performed better in terms of evaluation statistics against the test dataset, while the LSTMv2 model better represented variability over time and space. The authors also use SHapley Additive exPlanations (SHAP) to interpret their trained models.
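For reference, per-depth skill of the sort summarized above can be computed along the following lines (a minimal sketch; the array shapes are assumptions):

```python
# Minimal sketch of a per-depth evaluation: R2 and RMSE of the
# reconstruction against the reanalysis "truth" at each depth level.
# Array shapes (n_profiles x n_depths) are assumptions.
import numpy as np
from sklearn.metrics import r2_score

def per_depth_scores(truth, recon):
    """truth, recon: arrays of shape (n_profiles, n_depths)."""
    r2 = np.array([r2_score(truth[:, k], recon[:, k])
                   for k in range(truth.shape[1])])
    rmse = np.sqrt(np.mean((truth - recon) ** 2, axis=0))
    return r2, rmse
```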
Overall, I support the approach this manuscript takes to the question of how in situ and satellite observing systems can be leveraged to reconstruct ocean interior properties. However, it falls short in its execution and interpretation of the analysis. Most importantly, the authors could attempt to remedy, or discuss more extensively, the models' shortcomings in predicting ocean interior variables from primarily surface data, and the results could be better placed into context among similar studies that reconstruct ocean interior properties from observational data.
General suggestions
One aspect that I think is missing from the manuscript is the contextualization of the authors’ results with similar methodologies that have been applied to map salinity and temperature from observations (a few of which are referenced in the introduction). Although not all studies that reconstruct ocean interior properties from observations include a reanalysis-based evaluation of mapping accuracy (as is the focus of this manuscript), many report error statistics of their reconstructions evaluated against independent data. Su et al. (2018), for example, evaluate their reconstructed subsurface temperature anomalies using root mean squared error and R2 as metrics, and the results of the OSSE reported here could be compared against those results.
In general, I was surprised to see such high disagreement with the test data at depth, where temperature and salinity should be more constant in space and time, and therefore relatively easier to reconstruct than at the surface. Buongiorno Nardelli (2020), for example, retrieves minimum errors for temperature and salinity at depth. This points, in my opinion, to an aspect of the methodology that can be significantly improved. It is not particularly surprising that a model based primarily on surface characteristics would struggle to estimate temperature and salinity at 1000 meters. I suspect a strategy of de-emphasizing the surface predictor datasets as depth increases might reduce these high offsets at depth. In any event, this is another instance where contextualization of the results of this OSSE would be helpful.
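To make this suggestion concrete, a hypothetical sketch of one such depth-dependent tapering (the exponential form and the 500 m scale are arbitrary illustrative choices, not tuned values):

```python
# One hypothetical way to prototype the suggestion above: taper the
# surface predictors with an e-folding weight before fitting a model
# at each depth level. The exponential form and the 500 m scale are
# arbitrary illustrative choices, not tuned values.
import numpy as np

def taper_surface_features(X_surface, depth_m, e_folding_m=500.0):
    """Down-weight surface predictor columns as target depth increases."""
    weight = np.exp(-depth_m / e_folding_m)
    return weight * X_surface
```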
Lastly, the authors miss an opportunity to incorporate uncertainties into their experiment, or at least to discuss their implications. OSSEs present an opportunity to mimic real-world conditions; in reality, satellite observations are not perfect, nor are temperature and salinity measurements from profiling floats. Incorporating measurement uncertainty estimates in the analysis would be an important piece for answering the central question of how feasible it is to use satellite and in situ data to reconstruct ocean interior properties.
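As a concrete illustration, the synthetic profiles could be perturbed with sensor-like noise before training; the noise levels below are nominal accuracies often quoted for Argo CTD sensors and are my assumptions, not values from the manuscript:

```python
# Illustrative sketch of perturbing the synthetic observations with
# measurement noise before training. The sigmas (~0.002 degC, ~0.01 in
# practical salinity) are nominal Argo sensor accuracies used here as
# illustrative assumptions, not manuscript values.
import numpy as np

rng = np.random.default_rng(42)

def perturb_profiles(T, S, sigma_T=0.002, sigma_S=0.01):
    """Add zero-mean Gaussian noise to synthetic T/S profiles."""
    return (T + rng.normal(0.0, sigma_T, size=T.shape),
            S + rng.normal(0.0, sigma_S, size=S.shape))
```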
Line-by-line comments
Abstract: I would suggest describing the simulated in situ measurement platforms as “Argo floats” or “profiling floats” rather than “buoys” in the abstract.
28: Presumably this should say “subsurface temperature and salinity”?
86: Awkward phrasing in reference to the equatorial region.
97: Punctuation issue here.
161-166: I’m not sure I understand the training and test split. Are you withholding some percentage of the dataset on a daily frequency (if so, what percentage?) for testing during model training? How does this differ from the ground truth dataset that is being used for evaluation?
173: It would be helpful to specify the metric you are referring to when discussing “accuracies”.
239: What is meant by “it does not overlap with the training dataset”? There are no Argo profiles from 2008-2009 in this region?
272: should be “…each of them with their own…”