Articles | Volume 20, issue 3
Research article
27 May 2024

Combining neural networks and data assimilation to enhance the spatial impact of Argo floats in the Copernicus Mediterranean biogeochemical model

Carolina Amadio, Anna Teruzzi, Gloria Pietropolli, Luca Manzoni, Gianluca Coidessa, and Gianpiero Cossarini

Biogeochemical-Argo (BGC-Argo) float profiles provide substantial information on key vertical biogeochemical dynamics and have been successfully integrated in biogeochemical models via data assimilation approaches. Although BGC-Argo assimilation results have been encouraging, data scarcity remains a limitation with respect to their effective use in operational oceanography.

To address availability gaps in the BGC-Argo profiles, an observing system experiment (OSE) that combines a neural network (NN) and data assimilation (DA) was performed here. A NN was used to reconstruct nitrate profiles, starting from oxygen profiles and associated Argo variables (pressure, temperature, and salinity), while a variational data assimilation scheme (3DVarBio) was upgraded to integrate BGC-Argo and reconstructed observations in the Copernicus Mediterranean operational forecast system (MedBFM). To ensure the high quality of oxygen data, a post-deployment quality control method was developed with the aim of detecting and, if necessary, correcting sensor drift.

The Mediterranean OSE features three different set-ups: a control run without assimilation; a multivariate run with assimilation of BGC-Argo chlorophyll, nitrate, and oxygen; and a multivariate run that also assimilates reconstructed observations.

The general improvement in the skill performance metrics demonstrated the feasibility of integrating new variables (oxygen and reconstructed nitrate). Major benefits have been observed with respect to reproducing specific biogeochemical-process-based dynamics such as the nitracline dynamics, primary production, and oxygen vertical dynamics.

The assimilation of BGC-Argo nitrate corrects a generally positive bias of the model in most of the Mediterranean areas, and the addition of reconstructed profiles makes the corrections even stronger. The impact of enlarged nitrate assimilation propagates to ecosystem processes (e.g. primary production) at a basin-wide scale, demonstrating the importance of the assimilation of BGC-Argo profiles in forecasting the biogeochemical ocean state.

1 Introduction

The Argo programme appears to be one of the better examples of the capacity of countries and human resources to work together to provide global data coverage (Miloslavich et al.2019) that supports the investigation of present (analysis), future (forecast), and past (reanalysis) ocean state conditions. Over the last 10 years, the increase in the in situ observations from autonomous platforms (Johnson et al.2013; Johnson and Claustre2016) has opened up new perspectives for biogeochemical oceanographers. Indeed, Biogeochemical-Argo (BGC-Argo; Argo2022) has yielded new insights into the interior of the global ocean (Le Traon2013) and key processes such as the deep chlorophyll maximum (Mignot et al.2014; Barbieux et al.2019; D’Ortenzio et al.2020; Ricour et al.2021; Barbieux et al.2022), nutrients' vertical fluxes (Taillandier et al.2020; Wang et al.2021b), carbon exports (Dall'Olmo and Mork2014; Wang and Fennel2023), and oxygen dynamics (Capet et al.2016).

With approximately 270 000 profiles worldwide (as of July 2023), oxygen (O2) is currently the most commonly measured variable. The count of O2 profiles is 2 times that of suspended particles and chlorophyll and more than 4 times that of nitrate, downwelling irradiance, and pH (, last access: 17 July 2023). Since 2019, the availability of nitrate and chlorophyll profiles has progressively decreased due to the high cost of the sensor (Giorgio Dall'Olmo, personal communication, 2023). In contrast, the number of oxygen profiles initially decreased (2019–2022), but it has been stable or has slightly increased since 2022. In the future, Argo Italy envisages mounting oxygen sensors on all Argo floats in the Mediterranean Sea (discussion in the workshop on “Copernicus Marine requirements for the in situ Observing Systems”, 14–15 September 2023).

The BGC-Argo data are distributed by the Global Data Assembly Centres (GDACs, e.g. Coriolis, NOAA) in real-time (RT), adjusted (AM), and delayed (DM) modes. The quality of AM data is controlled within 24 h using internationally agreed upon, automatic quality control (QC) procedures, while DM data are generally distributed later (after nearly 6 months) in a more rigorous form (Li et al., 2020). The QC tests, conducted across all of the data mode levels, aim to assign a quality flag to every observation. Data flagged as 1, 2, 5, and 8 are categorized as good, probably good, changed, and interpolated values, respectively. Flag 9 indicates missing data, while flags 3 and 4 denote probably bad and bad data, respectively.

In the case of oxygen, QC is mainly performed at the surface, along the entire vertical profiles, and along the trajectory (Thierry and Bittig2021), excluding specific tests at depth. The implementation of O2 QC tests is mainly devoted to improving the long-term reliability and accuracy of autonomous measurements (Sauzède et al.2017), particularly concerning sensor drift (the optode drift).

When sensor drift exists, it is higher during storage, out of the water, than during deployment. As described in Takeshita et al. (2013) and Maurer et al. (2021), raw oxygen data from floats may exhibit errors of up to 20 % in terms of oxygen saturation (at the surface) due to sensor drift occurring during storage. This drift is typically corrected by multiplying the oxygen concentrations by a gain factor derived from a reference dataset (Johnson et al., 2015). Although correcting the storage drift may enhance accuracy by 5 %–10 %, drift is likely to persist in situ (i.e. during deployment). For instance, Maurer et al. (2021) observed drift rates in about 25 % of the 126 floats analysed for the Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) project. These drift rates spanned a total range of −1.1 % to 1.2 % yr−1 with a standard deviation of 0.65 % yr−1. Similarly, Bushinsky et al. (2016) found drift in about 70 % of the floats deployed in the northern Pacific Ocean. Notably, both positive and negative drift rates were observed across various studies, including those by Johnson and Claustre (2016), Bushinsky et al. (2016), Bittig et al. (2018a), and Maurer et al. (2021).

The development and dissemination of a post-deployment oxygen QC aims to avoid spurious results (Wang et al., 2020) and to distinguish ocean signals or trends (e.g. deoxygenation) from potential drifts. This yields more robust datasets suitable for specific numerical modelling applications.

Aiming at optimally combining observations and model information to obtain a closer description of reality, data assimilation (DA) underpins decades of progress in ocean prediction (Geer2021). On the one hand, advancements began with an increase in the number of available observations over the past decade, encompassing both the number of measured variables and the total observations used for model tuning (Wang et al.2020; Yumruktepe et al.2023; Wang and Fennel2023) and validation (Terzić et al.2019; Salon et al.2019; Wang et al.2021a). On the other hand, DA schemes have been progressively updated to enable multivariate and multi-platform assimilation (Cossarini et al.2019; Teruzzi et al.2021; D'Ortenzio et al.2021), retrieve associated uncertainty in prediction models, and solve problems connected to uneven distribution and/or scarcity of the observations (Buizza et al.2022).

In recent years, data assimilation (DA) techniques have increasingly incorporated neural network (NN)-based tools. The main strength of NN algorithms lies in their ability to approximate continuous functions (Hornik et al.1989) with remarkably low computational times. These NN-based tools have been integrated into DA frameworks to tackle various DA challenges, such as bias correction (Kumar et al.2015; Zhou et al.2021), reformulation of observation operators (Storto et al.2021), and cross-calibration (Lary et al.2018). Furthermore, NN algorithms have frequently been used as independent tools, distinct from DA, to generate new products and/or reconstruct datasets (Lary et al.2018). The use of reconstructed datasets may compensate for potential gaps in observation availability, potentially enhancing the predictive skill of numerical models. As an example, ocean colour (OC) datasets were employed to test multi-layer perceptron (MLP), the most common type of NN, with respect to retrieving past and long-term biogeochemical (BGC) time series of phytoplankton and chlorophyll (Martinez et al.2020a, b; Roussillon et al.2023). Moreover, in Sauzède et al. (2016), MLP served to infer the chlorophyll vertical BGC distribution from OC. High predictive performance with respect to predicting BGC states (e.g. oxygen) from physical profiling float measurements was achieved in Stanev et al. (2022) for the Black Sea.

In Sauzède et al. (2017), a multi-layer perceptron neural network (hereafter MLP-NN) was used to approximate the nutrient concentration and carbonate system from physical Argo and BGC-Argo oxygen profiles. The updated version of the method presented in Bittig et al. (2018b) allows for further refinement of this approach with the so-called CANYON-b NN method. A configuration to adapt the global CANYON-b NN to the Mediterranean Sea region has been developed by Fourrier et al. (2020). A further update of the application of the MLP method to the Mediterranean Sea is provided in Pietropolli et al. (2023), which achieves lower error in the nutrient predictions through a larger training dataset, hyperparameter refinement, and two-step QC of the input data. Given its potential for predicting nutrient profiles, the MLP-NN model output is a valuable dataset that can be used to fill gaps in the availability of in situ observations in DA.

In the context of operational oceanography, the biogeochemical modelling component of the Copernicus Marine Service for the Mediterranean Sea (MedBFM) provides analysis, short-term forecast (Salon et al.2019), and long-term reanalysis (Cossarini et al.2021), including the assimilation of satellite OC and BGC-Argo observations (Salon et al.2019). In MedBFM, the 3DVarBio variational assimilation scheme has evolved over time by including a greater number of observation types and variables. Starting from the first release that included OC DA in the open ocean (Teruzzi et al.2014), the assimilation has progressively developed to handle coastal OC observations (Teruzzi et al.2018) and chlorophyll and nitrate profiles from BGC-Argo (Cossarini et al.2019, and Teruzzi et al.2021, respectively). Considering the growing availability of O2 from BGC-Argo, this paper presents an additional upgrade of MedBFM that includes BGC-Argo oxygen assimilation, with a novel post-deployment QC, and the integration of NN-reconstructed profiles in the assimilation scheme.

The constant evolution of observation networks and assimilation capacities requires an updated understanding of the impact of observations on the numerical model results (Gasparin et al., 2019). This can be achieved by using the numerical assimilative models in observing system experiments (OSEs), in which the impact of existing observations on the model performance is assessed (Le Traon et al., 2019). In this paper, the OSE, which combines DA and NN in a modular approach, aims to quantify how the Argo and BGC-Argo network can be exploited. The sequential use of NN and DA schemes provides the flexibility of using one module independently of the other, depending on the needs of the overall system (Buizza et al., 2022). The DA module used in this work is the 3DVarBio DA scheme described in Teruzzi et al. (2021) and updated to assimilate BGC-Argo oxygen profiles. The NN module is the NN-MLP described in Pietropolli et al. (2023) for the Mediterranean Sea (hereafter NN-MLP-MED).

The spatial and temporal impacts of the OSE have been evaluated using classic and new skill performance metrics in three 2-year (2017–2018) numerical experiments performed using MedBFM coupled with 3DVarBio: a control run (HIND) without assimilation; a multivariate run (DAfl) with assimilation of BGC-Argo chlorophyll, nitrate, and oxygen profiles; and a multivariate run that also assimilates the in situ observations and NN-reconstructed profiles (DAnn). Given its characterization as a miniature ocean suitable for climate studies (Bethoux et al.1999) and considering the density of BGC-Argo profiles, the Mediterranean Sea represents an ideal site to conduct OSE studies to assess the feasibility of assimilating BGC-Argo profiles and analysing their impacts.

Indeed, the Mediterranean Sea is an anti-estuarine, semi-enclosed sea (Pinardi et al., 2015) with a complex overturning circulation. This circulation consists of horizontal mesoscale and subbasin-scale gyre structures, transitional cyclonic and anticyclonic gyres, and eddies. These dynamics are influenced by bathymetric features interconnected by currents and jets (Oddo et al., 2009), along with vigorous vertical velocities. Furthermore, the shallow Strait of Sicily, with a depth of approximately 500 m, separates the western Mediterranean from the eastern Mediterranean. This geographical feature allows different processes to dominate in each of the two regions and limits exchanges to the surface and intermediate waters (Pinardi et al., 2015). Even from a BGC perspective, the Mediterranean Sea can be roughly subdivided into the western and eastern Mediterranean sectors, characterized by an oligotrophic west–east gradient. This gradient results in low nutrient availability at the surface, which is generally insufficient to sustain high phytoplankton biomass (Siokou-Frangou et al., 2010; Marañón et al., 2021). Additionally, there is a deeper nitracline in the east (> 120 m) compared with the west (< 100 m). Chlorophyll has a particular seasonal cycle, with pronounced winter/early-spring surface blooms only in the western part and a few locations in the eastern part. During summer, a deep chlorophyll maximum follows the stratified and oligotrophic conditions at increasing depth moving eastward (> 100 m in the east and < 100 m in the west) (Teruzzi et al., 2021). Dissolved oxygen has a subsurface maximum at about 50 m, with higher values in the west (partly due to the dependence of oxygen solubility on temperature). Noticeable differences are observed in the intermediate layers, where the depth of the oxygen minimum ranges between 300 m (west) and 1000 m (east) (Di Biagio et al., 2022).

While the general dynamics of BGC processes can be summarized by a two-basin gradient, it is important to note that mesoscale and sub-mesoscale events can impact the Mediterranean Sea at the subbasin scale. These events can create intense local dynamics, such as blooms and water column stratification, which are often associated with eddy activity and peculiar vertical circulation. Reproducing these phenomena in numerical model simulations can be more challenging, as they are prone to encountering high model bias or representativeness error.

The paper is organized as follows: after a brief presentation of the OSE approach, each component and the experimental set-up are described in detail (Sect. 2); in Sect. 3, we describe the results of the novel NN-MLP-MED and the assimilation simulations by using different skill metrics to assess the model capability with respect to reproducing the main BGC seasonal dynamics; a discussion of some key issues involved in the NN and DA is provided in Sect. 4; the paper then closes with some final remarks (Sect. 5).

2 Methods

A novel combined neural network (NN-MLP-MED) and data assimilation (3DVarBio) approach is included in the Mediterranean MedBFM model system to integrate BGC-Argo and NN-reconstructed profiles into BGC simulations of the Mediterranean Sea.

Our OSE is based on a sequential modular approach (Buizza et al., 2022) consisting of a post-deployment quality control method for O2 (hereafter the QC O2 procedure), a trained multi-layer perceptron NN (Pietropolli et al., 2023), and a DA scheme (the 3DVarBio variational scheme of MedBFM) (Fig. 1).

The first two modules, QC O2 and NN-MLP-MED, use BGC-Argo and Argo datasets as input. The 3DVarBio module takes the enhanced dataset, quality-checked O2 (QC O2) and reconstructed nitrate (recNO3) (Fig. 1), as input.

In the following sections, we introduce the components of the MedBFM system, including the transport model (OGSTM; Foujols et al.2000; Lazzari et al.2012, 2016) and the Biogeochemical Flux Model (BFM; Vichi et al.2007a, b). Additionally, we describe the novel modules, namely the QC O2 procedure and the NN-MLP-MED scheme. Furthermore, we outline the dataset, which comprises BGC-Argo and NN-reconstructed datasets, and discuss the revised 3DVarBio approach.

Figure 1. Flowchart of the NN-MLP-MED and DA approach. Green boxes represent the modules, plain boxes represent the datasets, and arrows refer to Argo (temperature and salinity) and BGC-Argo profiles of chlorophyll (Chl a), oxygen (QC O2), nitrate (NO3), and reconstructed nitrate (recNO3).


2.1 The regional model for the Mediterranean Sea (MedBFM)

The MedBFM consists of the tracer transport OGS Transport Model (OGSTM), based on the OPA 8.1 system (Foujols et al.2000) and updated according to the Lazzari et al. (2012) and Lazzari et al. (2016) versions; the BFM described in Vichi et al. (2007a) and Vichi et al. (2007b); and the 3DVarBio variational assimilation scheme as in Teruzzi et al. (2014) and Teruzzi et al. (2018).

OGSTM solves for advection, diffusion, and sinking terms as well as considering the effects of the free surface and variable volume-layer effects on tracer transport (Salon et al.2019). It is forced by output variables such as current, temperature (T), salinity (S), and sea surface height from the NEMO3.6 model (Clementi et al.2017). OGSTM and NEMO3.6 share the same bathymetry and z* grid configuration as well as the same open boundary and river conditions (Coppini et al.2023). Atmospheric forcing, including solar short-wave irradiance and wind stress, is acquired as 2D daily fields from the European Centre for Medium-Range Weather Forecasts (ECMWF), as detailed by Salon et al. (2019).

BFM is a biomass- and functional-group-based marine ecosystem model. It solves governing equations for nine living organic state variables (diatoms, autotrophic nanoflagellates, picophytoplankton, dinoflagellates, carnivorous and omnivorous mesozooplankton, bacteria, heterotrophic nanoflagellates, and microzooplankton); macro-nutrients (nitrate, phosphate, silicate, and ammonium); labile, semi-labile, and refractory organic matter; and oxygen. In addition, BFM includes a carbonate system model (Cossarini et al., 2015a; Canu et al., 2015).

2.2 3DVarBio data assimilation scheme

Based on 3DVarBio (Teruzzi et al.2014, 2018; Cossarini et al.2019; Teruzzi et al.2021), the assimilation module adopted in the present work integrates oxygen, chlorophyll, and nitrate to update all of the assimilated variables as well as all of the phytoplankton biomass and phosphate.

The 3DVarBio is a variational DA scheme (Teruzzi et al., 2014) based on the minimization of a cost function ($J$). This function comprises two terms: (i) the misfit between the model background ($x_b$) and the model control state variable or analysis (i.e. the assimilation result $x_a$) and (ii) the mismatch between the observations ($y$) and the analysis ($x_a$). Both terms are weighted by their respective error covariance matrices ($B$ and $R$) as follows:

$$J(x_a) = (x_a - x_b)^{\mathrm{T}} B^{-1} (x_a - x_b) + (y - H(x_a))^{\mathrm{T}} R^{-1} (y - H(x_a)). \tag{1}$$

Here, the observation operator ($H$) maps the values of the model background state in the observation space. Following Dobricic et al. (2007), the background error covariance matrix, $B$, is factorized as $B = VV^{\mathrm{T}}$ with $V = V_V V_H V_B$. The $V$ operators describe different aspects of the error covariances: the vertical error covariance ($V_V$), the horizontal error covariance ($V_H$), and the state variable error covariance ($V_B$). $V_V$ is defined by a set of reconstructed profiles evaluated by means of an empirical orthogonal function (EOF) decomposition applied to a validated multi-year (1998–2015) run (Teruzzi et al., 2018). EOFs are computed for 12 months and 30 coastal and open-sea subregions in order to account for the variability in BGC anomaly fields. $V_H$ is built using a Gaussian filter whose correlation radius modulates the smoothing intensity. As in Cossarini et al. (2019), the correlation radius in this work is non-uniform, direction-dependent, and ranges between 12 and 20 km (16 km on average). The $V_B$ operator consists of prescribed monthly and subregion varying covariances among the BGC variables (e.g. nitrate to phosphate). Specifically, for the assimilation of chlorophyll, the $V_B$ operator includes a balance scheme that maintains the ratio among the phytoplankton groups and preserves the physiological status of the phytoplankton cells (i.e. preserves the internal ratios between the chlorophyll, carbon, and nutrients, as described in Teruzzi et al., 2014).
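The balance expressed by Eq. (1) can be illustrated with a minimal scalar sketch: a single state value, a single direct observation (so that the observation operator reduces to the identity), and error variances in place of the covariance matrices. The background value, the observation, and the background error below are hypothetical numbers chosen only for illustration; they are not taken from MedBFM, and the full 3DVarBio scheme works on factorized matrices rather than on scalars.

```python
# Toy scalar version of Eq. (1): one state value, one direct observation (H = identity).
# All numbers are illustrative assumptions, not MedBFM settings.


def cost(x_a, x_b, y, b_var, r_var):
    """Cost J for a scalar state, with background variance b_var and observation variance r_var."""
    background_term = (x_a - x_b) ** 2 / b_var
    observation_term = (y - x_a) ** 2 / r_var
    return background_term + observation_term


def analysis(x_b, y, b_var, r_var):
    """Value of x_a that minimizes the scalar cost (from setting dJ/dx_a = 0)."""
    weight = b_var / (b_var + r_var)  # fraction of the innovation applied to the background
    return x_b + weight * (y - x_b)


x_b = 2.0          # background nitrate (mmol m-3), hypothetical
y = 1.5            # observed nitrate (mmol m-3), hypothetical
b_var = 0.5 ** 2   # background error variance, hypothetical
r_var = 0.24 ** 2  # observation error variance, using a 0.24 mmol m-3 error

x_a = analysis(x_b, y, b_var, r_var)
print(f"analysis = {x_a:.3f} mmol m-3, J = {cost(x_a, x_b, y, b_var, r_var):.3f}")
```

Because the assumed background error is larger than the observation error, the analysis lies closer to the observation than to the background; swapping the two variances would reverse this.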

The operators $V_V$ and $V_B$ of 3DVarBio have been updated for the assimilation of oxygen. $V_V$ involved the calculation of specific EOF profiles for oxygen, including a localization function to avoid unrealistic corrections due to possible spurious error covariances in the deepest part of the water column.

$V_B$ included only a new direct relation for oxygen (i.e. oxygen assimilation updated only the oxygen itself), given that it has been shown that it barely affects other variables (Skakala et al., 2021). In the BFM model equations, few formulations depend on the oxygen concentration (e.g. nitrification). Indeed, when the euphotic zone of the open ocean is well oxygenated, oxygen dynamics have a limited impact on the BGC cycles.

The assimilated observations consist of the quality-controlled BGC-Argo dataset listed in Table 1. Oxygen and nitrate profiles in the 0–600 m layer are used in the assimilation, while chlorophyll is assimilated in the 0–200 m layer.

The observation error covariance matrix R is diagonal with a monthly varying error for chlorophyll (Cossarini et al., 2019). For both the BGC-Argo nitrate profiles and the reconstructed nitrate profiles, the observation error is constant over time and increases with depth. Within the 0–450 m layer, the error is set at 0.24 mmol m−3, as in Mignot et al. (2019), and it then increases linearly up to 0.35 mmol m−3 between 450 and 600 m (the maximum assimilation depth). This adjustment aims to prevent inconsistencies between the lower part of the assimilated layer (450–600 m) and the deeper layer of the water column (below 600 m). Although the accuracy of the reconstructed profiles is 0.87 mmol m−3 (Pietropolli et al., 2023), we decided not to use different error values for the two nitrate subsets in order to show the highest potential impact of the OSE.

Observation error for oxygen is set to 5 mmol m−3 in the upper 200 m of depth and gradually increases to 20 mmol m−3 at the maximum assimilation depth. These values correspond to the uncertainty associated with the oxygen dataset described in Feudale et al. (2022).
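The depth dependence of the nitrate and oxygen observation errors can be sketched as piecewise-linear functions of depth. The function names are hypothetical. The nitrate ramp follows the values given above; for oxygen, the text only states that the error increases gradually, so the linear shape between 200 m and the 600 m maximum assimilation depth is an assumption of this sketch.

```python
def linear_ramp(depth_m, z_top, z_bottom, err_top, err_bottom):
    """Error equal to err_top above z_top, err_bottom below z_bottom, linear in between."""
    if depth_m <= z_top:
        return err_top
    if depth_m >= z_bottom:
        return err_bottom
    fraction = (depth_m - z_top) / (z_bottom - z_top)
    return err_top + (err_bottom - err_top) * fraction


def nitrate_obs_error(depth_m):
    """Nitrate error (mmol m-3): 0.24 down to 450 m, rising linearly to 0.35 at 600 m."""
    return linear_ramp(depth_m, 450.0, 600.0, 0.24, 0.35)


def oxygen_obs_error(depth_m):
    """Oxygen error (mmol m-3): 5 in the upper 200 m, 20 at 600 m.

    The linear increase between 200 and 600 m is an assumption.
    """
    return linear_ramp(depth_m, 200.0, 600.0, 5.0, 20.0)


for depth in (0.0, 200.0, 450.0, 525.0, 600.0):
    print(depth, round(nitrate_obs_error(depth), 3), round(oxygen_obs_error(depth), 3))
```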

2.3 The architecture of the neural network module and the reconstructed nitrate dataset

NN-MLP-MED (Pietropolli et al.2023) is the evolution of previous MLP architectures developed to predict variables sampled with low frequency (e.g. nutrients) starting from variables sampled with high frequency (e.g. temperature) (Sauzède et al.2017; Bittig et al.2018b; Fourrier et al.2020).

NN-MLP-MED is a deterministic feed-forward neural network based on an MLP structure. It merges 10 different MLP architectures, each with the same input and output features and composed of two hidden layers with varying numbers of neurons per layer. The final prediction of NN-MLP-MED is the mean of the predictions of these components. The data flow of the MLP-based approach follows the forward direction from the input to the output layers through the neurons that compose the layers. In our OSE, the trained NN-MLP-MED reconstructs nitrate profiles from sets of temperature, salinity, oxygen, date, latitude, and longitude BGC-Argo profiles.
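The ensemble structure described above (10 MLPs with two hidden layers each, averaged into one prediction) can be sketched as follows. This is only a structural illustration: the weights are random and untrained, and the hidden-layer sizes, the tanh activation, and the normalized input sample are assumptions rather than the settings of Pietropolli et al. (2023).

```python
import math
import random


def make_mlp(n_in, n_hidden1, n_hidden2, rng):
    """Random (untrained) weights for an MLP with two hidden layers and one output."""
    def layer(n_rows, n_cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
    # The extra column in each layer holds the bias of each neuron.
    return [layer(n_hidden1, n_in + 1), layer(n_hidden2, n_hidden1 + 1), layer(1, n_hidden2 + 1)]


def forward(mlp, features):
    """Feed-forward pass: tanh on hidden layers (an assumption), linear output."""
    values = list(features)
    for index, weights in enumerate(mlp):
        is_output = index == len(mlp) - 1
        new_values = []
        for row in weights:
            total = row[-1] + sum(w * v for w, v in zip(row[:-1], values))
            new_values.append(total if is_output else math.tanh(total))
        values = new_values
    return values[0]


def ensemble_predict(mlps, features):
    """Mean of the predictions of all ensemble members."""
    predictions = [forward(mlp, features) for mlp in mlps]
    return sum(predictions) / len(predictions)


rng = random.Random(0)
hidden_sizes = [(8 + k, 4 + k) for k in range(10)]  # hypothetical layer sizes
ensemble = [make_mlp(6, h1, h2, rng) for h1, h2 in hidden_sizes]

# Normalized temperature, salinity, oxygen, date, latitude, longitude (hypothetical values).
sample = [0.1, -0.3, 0.5, 0.0, 0.2, -0.1]
prediction = ensemble_predict(ensemble, sample)
print(prediction)
```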

NN-MLP-MED introduces several innovative features compared with the mentioned methods (e.g. CANYON-Med; Fourrier et al.2020), thereby leading to improved results.

Firstly, the input dataset encompasses a larger sample size and broader coverage of the Mediterranean Sea region. The EMODnet (European Marine Observation and Data Network) data collection, as described by Buga et al. (2018), consists of multi-platform data gathered from different research cruises and monitoring activities in Europe's marine waters and global oceans. This dataset is characterized by its multivariate nature, including various BGC observations, such as chlorophyll, nitrate, phosphate, dissolved oxygen, dissolved inorganic carbon, and alkalinity, collected between 1999 and 2018. Additionally, this dataset is further enriched with in situ observations spanning the period from 1999 to 2016, as detailed in Lazzari et al. (2016) and Cossarini et al. (2015b).

Secondly, the input dataset benefits from a two-step QC process, which removes noisy and unreliable samples. The NN architecture was also modified to enhance prediction performance by carefully selecting a well-performing non-linear function, optimizing the number of neurons for each layer of the MLP model, and choosing a different optimization strategy to train the algorithm. NN-MLP-MED also includes a vertical smoothing step (running mean with a 5–10 m window) and a climatological adjustment at depth (600 m) derived from the EMODnet dataset (Salon et al., 2019).

The uncertainty in the reconstructed nitrate associated with the EMODnet validation dataset is 0.5 mmol m−3, while it reaches 0.87 mmol m−3 when predicting the BGC-Argo dataset (Pietropolli et al., 2023).

After incorporating the NN-reconstructed profiles (recNO3), the nitrate dataset used for assimilation expands to 2146 profiles from the initial 938 nitrate (NO3) profiles (Table 1). Generated by the NN-MLP-MED module, the reconstructed dataset offers broad spatial coverage across the 16 regions of the Mediterranean Sea (Fig. 2) as well as a quite balanced distribution of nitrate data throughout the seasons (Fig. 3), with the addition of 218 NN-reconstructed profiles of nitrate in winter and 361 in summer.

2.4 BGC-Argo data and the post-deployment QC O2 module

BGC-Argo profiles from 2017 to 2018 were downloaded from the Coriolis GDAC (Argo, 2022; last visited in July 2022). We collected both AM and DM data for oxygen and chlorophyll. For nitrate, we selected DM data, while AM data were incorporated after undergoing correction via the CANYON-b NN method or using the World Ocean Atlas (WOA18) collection (Garcia et al., 2019), as explained in Johnson et al. (2021). For the three variables, we use data flagged as good, probably good, changed, and interpolated values (flags 1, 2, 5, and 8, respectively).
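The flag-based selection can be sketched as a simple filter. The function name and the sample values are hypothetical, and the sketch assumes that the quality flags have already been parsed into integers.

```python
# Flags retained for assimilation: good, probably good, changed, interpolated.
ACCEPTED_FLAGS = {1, 2, 5, 8}


def keep_accepted(values, flags):
    """Keep only the observations whose quality flag is in the accepted set."""
    return [value for value, flag in zip(values, flags) if flag in ACCEPTED_FLAGS]


# Hypothetical oxygen values (mmol m-3) with their quality flags.
oxygen = [210.0, 208.5, 999.0, 205.1, 204.0]
flags = [1, 2, 4, 8, 9]

filtered = keep_accepted(oxygen, flags)
print(filtered)
```

In this example, the value flagged 4 (bad) and the value flagged 9 (missing) are discarded.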

Table 1 reports the total number of BGC-Argo profiles, characterized by a high number of oxygen and chlorophyll data against the relative paucity of nitrate. Figure 2 shows the spatial distribution of BGC profiles of chlorophyll and nitrate across the Mediterranean Sea. The oxygen coverage can be approximated by merging nitrate and reconstructed nitrate profile locations.

To provide more clarity with respect to analysing the data availability, the Mediterranean Sea has been divided into the following 16 subbasins:

  • the Alboran Sea (alb), south-western Mediterranean west (swm1), south-western Mediterranean east (swm2), north-western Mediterranean (nwm), northern Tyrrhenian (tyr1), and southern Tyrrhenian (tyr2) in the western Mediterranean Sea;

  • the northern Adriatic (adr1), southern Adriatic (adr2), western Ionian (ion1), eastern Ionian (ion2), northern Ionian (ion3), western Levantine (lev1), northern Levantine (lev2), southern Levantine (lev3), eastern Levantine (lev4), and Aegean Sea (aeg) in the eastern Mediterranean Sea.

All three BGC variables have a fairly homogeneous spatial coverage between the western and eastern Mediterranean Sea regions, except for a few subbasins that are not covered (alb, ion1, and adr1; see Fig. 2), and a general 5 d temporal sampling frequency. Higher sampling frequencies (< 5 d) are registered for 20 % of profiles.

Figure 2. BGC-Argo profiles of chlorophyll (Chl, in white), in situ nitrate (NO3, in red), and reconstructed nitrate (recNO3, in blue) assimilated in the Mediterranean Sea (2017–2018). The Mediterranean domain was subdivided into subbasins for the validation. According to data availability and to ensure the consistency and robustness of the metrics, different subsets of the subbasins or some combinations of them are used for the different metrics: lev comprises lev1, lev2, lev3, and lev4; ion comprises ion1, ion2, and ion3; tyr comprises tyr1 and tyr2; adr comprises adr1 and adr2; and swm comprises swm1 and swm2.

Figure 3. Nitrate and reconstructed nitrate profiles' seasonal availability. Light grey (autumn and spring), cyan (winter), and yellow (summer) bars represent the availability of nitrate in situ data (used in the DAfl run). Grey (autumn and spring), light blue (winter), and orange (summer) striped bars indicate the availability of reconstructed nitrate (used in the DAnn run).


As oxygen sensors may drift and lose accuracy over time, the accurate determination of dissolved oxygen is typically more challenging and requires some form of correction (Johnson et al.2015). The loss of accuracy, expressed as a percentage per year, is observed over time, particularly 12 months after deployment (, last access: 17 July 2023).

Deep ocean drift is considered to be a proxy for oxygen sensor drift because of the lack of seasonal and annual signals for oxygen at depth (Takeshita et al.2013). Here, the optode drift is evaluated using nonparametric methods (the random sample consensus, RANSAC, and Theil–Sen methods) at two different depths (600 and 800 m) to avoid possible fake drift detection because of changes in the water masses. Tests are applied when the life of a float is longer than 1 year. Conversely, if the available float time series is less than 1 year, the profiles are not corrected because the float lifetime is considered to be too short to account for in situ sensor drift.

Used for linear and non-linear regression problems, the RANSAC and Theil–Sen methods automatically partition the oxygen dataset into inliers and outliers. In order to avoid possible biases (Dang et al.2008; Fischler and Bolles1981), these methods calculate the drift based on the data subset identified as inliers.
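As an illustration of a robust slope estimate, the Theil–Sen estimator can be written as the median of the slopes between all pairs of points. The sketch below shows only this estimator; RANSAC and any explicit inlier selection are omitted, and the time series is hypothetical. It is not the implementation used in the QC O2 module.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(times_yr, values):
    """Median of the slopes between all pairs of points with distinct times."""
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(zip(times_yr, values), 2)
        if t2 != t1
    ]
    return median(slopes)


# Hypothetical oxygen series at a fixed depth (mmol m-3): a steady decrease of
# 2 mmol m-3 per year with one anomalous value at 1.5 years.
times_yr = [0.0, 0.5, 1.0, 1.5, 2.0]
oxygen = [200.0, 199.0, 198.0, 230.0, 196.0]

slope = theil_sen_slope(times_yr, oxygen)
print(slope)
```

The anomalous value affects only a minority of the pairwise slopes, so the median still recovers the underlying trend.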

In our approach, the presence of a drift is established when all four drift estimates (RANSAC at 600 and 800 m and Theil–Sen at 600 and 800 m) agree with respect to their sign and their average value (D_avg) exceeds 1 mmol m−3 yr−1. This threshold is chosen on the basis of results in Bittig et al. (2018a). Subsequently, the identified drift is removed from the oxygen profiles. This is achieved by setting the D_avg at 600 m and linearly interpolating toward the surface, where drift is set equal to zero. As highlighted by Thierry and Bittig (2021), there is a lack of specific tests at depth, although several tests are performed near the surface by the GDACs. The presence of near-surface tests motivates our decision to mitigate the correction's impact at the surface.
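The decision rule and the depth-dependent correction can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: the threshold is applied to the absolute value of D_avg (so that negative drifts are also detected), the drift rate below 600 m is held at D_avg, and the correction subtracts the drift rate multiplied by the time elapsed since deployment. All function names are hypothetical.

```python
THRESHOLD = 1.0            # mmol m-3 yr-1
REFERENCE_DEPTH_M = 600.0  # depth at which the full drift D_avg is applied


def detect_drift(estimates):
    """Return the average drift if all estimates share a sign and |average| > threshold."""
    all_positive = all(d > 0 for d in estimates)
    all_negative = all(d < 0 for d in estimates)
    d_avg = sum(estimates) / len(estimates)
    if (all_positive or all_negative) and abs(d_avg) > THRESHOLD:
        return d_avg
    return None


def drift_at_depth(d_avg, depth_m):
    """Drift rate: zero at the surface, d_avg at 600 m, constant below (assumption)."""
    return d_avg * min(depth_m, REFERENCE_DEPTH_M) / REFERENCE_DEPTH_M


def correct_profile(depths_m, oxygen, d_avg, years_since_deployment):
    """Remove the accumulated drift from each level of an oxygen profile."""
    return [
        o2 - drift_at_depth(d_avg, z) * years_since_deployment
        for z, o2 in zip(depths_m, oxygen)
    ]


# Four hypothetical estimates: RANSAC and Theil-Sen at 600 and 800 m (mmol m-3 yr-1).
d_avg = detect_drift([-2.2, -1.8, -2.1, -1.9])
if d_avg is not None:
    corrected = correct_profile([0.0, 300.0, 600.0], [220.0, 200.0, 180.0], d_avg, 2.0)
    print(d_avg, corrected)
```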

2.5 Design of numerical experiments

Three numerical experiments are performed to analyse the impact of different assimilation set-ups. The simulated period is 1 January 2017–31 December 2018, and the MedBFM module set-up mostly corresponds to the standard adopted in the Mediterranean Analysis and Forecast biogeochemical system of the Copernicus Marine Service. This set-up includes the following: open boundary conditions in the Atlantic; climatological input of nutrients, carbon, and alkalinity for 39 rivers and the Dardanelles Strait; initial conditions from the EMODnet dataset (details are provided in Salon et al., 2019); and a 3-year spin-up using the 2017 forcings in perpetual mode.

Our experimental set-up differs from the standard set-up with respect to the physical forcing, which is sourced from the Mediterranean Copernicus reanalysis (Escudier et al., 2021), and with respect to the initial oxygen conditions. The latter are derived from the BGC-Argo dataset by generating 16 climatological profiles of oxygen after performing the QC O2 procedure and then uniformly assigning them to each grid point of the 16 subbasins shown in Fig. 2.

The three simulations, which share the same set-up except for the assimilated datasets, are as follows: (1) control run without assimilation (HIND); (2) assimilation of BGC-Argo chlorophyll, nitrate, and oxygen (DAfl); and (3) assimilation of additional reconstructed nitrate profiles used to enhance the DAfl assimilative set-up (DAnn).

Table 1. Summary of the numerical experiments and assimilated BGC-Argo profiles.

Before integrating data in 3DVarBio, the same pre-assimilation assessment described in Teruzzi et al. (2021) is applied to the chlorophyll profiles. Nitrate profiles are rejected if the concentration at the surface is higher than 3 mmol m−3. At the surface, the oxygen profile exclusion is evaluated by calculating the difference between the uppermost oxygen measurement and the oxygen saturation (derived from temperature and salinity data from the Argo dataset, as in Garcia et al., 2019). Profiles are excluded when this difference reaches the threshold of 10 mmol m−3. At 600 m, the difference between oxygen and a climatological reference oxygen at depth is calculated. Profiles are excluded when the difference reaches the threshold of 2 times the standard deviation of the same reference dataset. As a reference dataset, we chose the EMODnet2018_int data collection, which integrates the in situ aggregated EMODnet data (Buga et al., 2018) and the datasets listed in Lazzari et al. (2016) and Cossarini et al. (2015b). The EMODnet2018_int dataset is available for 16 subbasins in the Mediterranean Sea (Fig. 2).
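The nitrate and oxygen pre-assimilation checks listed above can be expressed as two simple predicates. The sketch below is illustrative only: the function and argument names are hypothetical, the oxygen saturation and the climatological mean and standard deviation are assumed to be computed elsewhere, and the differences are assumed to be taken in absolute value.

```python
def accept_nitrate_profile(surface_nitrate, max_surface=3.0):
    """Reject a nitrate profile whose surface value exceeds 3 mmol m-3."""
    return surface_nitrate <= max_surface


def accept_oxygen_profile(
    o2_surface,
    o2_saturation,
    o2_600m,
    clim_mean_600m,
    clim_std_600m,
    surface_threshold=10.0,
    n_std=2.0,
):
    """Apply the surface and 600 m checks to one oxygen profile.

    Surface: |uppermost O2 - saturation| must stay below 10 mmol m-3.
    600 m:   |O2 - climatology| must stay below 2 standard deviations.
    Assumption: both differences are evaluated in absolute value.
    """
    surface_ok = abs(o2_surface - o2_saturation) < surface_threshold
    deep_ok = abs(o2_600m - clim_mean_600m) < n_std * clim_std_600m
    return surface_ok and deep_ok
```

With these rules, a profile that differs from saturation by 12 mmol m−3 at the surface is excluded even when its 600 m value matches the climatology.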

During DA, profiles are excluded when innovation exceeds specific threshold rules. For chlorophyll, the threshold is set at 2 mg m−3. For nitrate, the thresholds are 1 and 2 mmol m−3 for the 0–50 and 250–600 m layers, respectively (as in Teruzzi et al., 2021). Oxygen thresholds are 30 and 50 mmol m−3 for the 0–150 and 150–600 m layers, respectively (thresholds are roughly 3 times the standard deviation of the climatology computed on EMODnet data for the different subbasins). Exceeding values have to be found in at least five vertical levels within the specified layers. These exclusions aim to prevent corrections that could trigger unstable dynamics after the assimilation (Teruzzi et al., 2021; Storto et al., 2011; Sakov and Sandery, 2017; Waller et al., 2018). The excluded profiles range from 0.1 % for chlorophyll to less than 1 % for nitrate.
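The innovation-based exclusion rule can be illustrated as follows. This is a sketch under stated assumptions rather than the 3DVarBio code: the names are hypothetical, the innovation is taken in absolute value, and the "at least five vertical levels" criterion is assumed to be evaluated separately within each layer.

```python
# Layer limits (m) and innovation thresholds (mmol m-3), as quoted in the text.
OXYGEN_RULES = [((0.0, 150.0), 30.0), ((150.0, 600.0), 50.0)]
NITRATE_RULES = [((0.0, 50.0), 1.0), ((250.0, 600.0), 2.0)]


def count_exceedances(depths_m, innovations, layer, threshold):
    """Count the levels inside a layer where |innovation| exceeds the threshold."""
    top, bottom = layer
    return sum(
        1
        for z, d in zip(depths_m, innovations)
        if top <= z < bottom and abs(d) > threshold
    )


def reject_profile(depths_m, innovations, rules, min_levels=5):
    """Reject the profile if any layer has at least min_levels exceedances.

    Assumption: the five-level criterion is applied layer by layer.
    """
    return any(
        count_exceedances(depths_m, innovations, layer, threshold) >= min_levels
        for layer, threshold in rules
    )
```

A profile with five levels above 30 mmol m−3 in the upper 150 m would be rejected by `reject_profile(depths, innovations, OXYGEN_RULES)`, whereas four such levels would not trigger the exclusion.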

3 Results

3.1 The post-deployment QC O2 module

The product of our QC O2 module is a quality-controlled dataset (Amadio et al., 2023).

The QC O2 module enabled the automatic correction of in situ sensor drifts. Of the 40 floats available between 2017 and 2018, we performed the drift analysis on 16 floats, while 24 floats remained unanalysed due to the limited length of the time series. Of these 16 floats, we found a drift in 13: 4 with a positive drift and 9 with a negative drift. For the remaining three floats, the drift values were below the prescribed threshold (Sect. 2.4). At a depth of 600 m, the absolute average correction for the 13 floats is approximately 4.3 mmol m−3 yr−1. This value aligns with the ranges expressed in terms of sensor drift percentage in Bittig et al. (2018a) (1 %–1.5 %).

Figure 4 shows the evolution of oxygen profiles for a quasi-stationary float (6902687) after applying the drift correction. Consistent with findings in various studies (e.g. Bittig et al., 2018a; Maurer et al., 2021), the detection of drift by our QC O2 suggests a possible tendency of the optode to slowly degrade over time. After 2 years, the bias due to the drift reaches approximately 5 mmol m−3 (profiles from 1 December 2017 in Fig. 4).

The removal of drift brings the oxygen concentration at 600 m closer to the EMODnet climatological data (as shown by the green star in Fig. 4). This leads us to infer that our drift correction enables the inclusion of more profiles in the assimilated oxygen datasets.

Figure 4. Depiction of the original (black) and corrected (blue) oxygen profiles for float 6902687 across four selected dates (yyyy-mm-dd). The green star refers to the EMODnet O2 climatological value in the nwm subbasin, while the horizontal line refers to the EMODnet O2 standard deviation at 600 m.

3.2 Validation using satellite and BGC-Argo datasets

The performance skill of the simulations listed in Table 1 is evaluated by comparing model results with (i) the satellite Copernicus Marine Service OC product of chlorophyll (i.e. the non-gap-filled L3 product OCEANCOLOUR_MED_BGC_L3_MY_009_143; last access: 17 July 2023) and (ii) BGC-Argo profiles of chlorophyll, nitrate, and oxygen (Argo, 2022). The OC L3 satellite products downloaded from the Copernicus Marine Service catalogue are interpolated from their 1 km resolution to the 1/24° model resolution.

Specifically, we compared the daily model output with the satellite dataset and the model's first guess (i.e. the model state at 13:00 UTC before assimilation) with the BGC-Argo profiles. While the use of the first guess is a common practice in DA (Hollingsworth et al., 1986), it is worth reiterating that this comparison should be considered to be a semi-independent validation, given that two consecutive profiles of the same BGC-Argo float can share a certain degree of correlation in their errors.

The root-mean-square error (RMSE) metric is chosen to quantify the model's capability to reproduce seasonal variability in the main BGC processes at the surface (satellite dataset) or along the vertical column (BGC-Argo dataset), such as phytoplankton surface bloom and dynamics during water column stratification.

The RMSE is evaluated during winter (from February to April, FMA) and summer (from June to August, JJA) 2017 and 2018 within 16 subbasins of the Mediterranean Sea (as described in Sect. 2.4 and in Fig. 2) or in an aggregated combination of them. The latter includes six macro-basins: the south-western Mediterranean Sea (Swm), consisting of swm1 and swm2; the north-western Mediterranean (Nwm), represented solely by the nwm; the Tyrrhenian Sea (Tyr), consisting of tyr1 and tyr2; the Ionian Sea (Ion), consisting of ion1, ion2, and ion3; the Adriatic Sea (Adr), consisting of adr1 and adr2; and the Levantine Sea (Lev), consisting of lev1, lev2, lev3, and lev4.
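The seasonal RMSE and the macro-basin aggregation can be sketched as below. The macro-basin composition and the season definitions are taken from the text; the function name and the data layout (parallel lists of dates, model values, and observations) are assumptions made for illustration.

```python
import math
from datetime import date

# Months defining the two seasons used for the skill assessment.
SEASONS = {"winter": (2, 3, 4), "summer": (6, 7, 8)}

# Composition of the six aggregated macro-basins.
MACRO_BASINS = {
    "Swm": ["swm1", "swm2"],
    "Nwm": ["nwm"],
    "Tyr": ["tyr1", "tyr2"],
    "Ion": ["ion1", "ion2", "ion3"],
    "Adr": ["adr1", "adr2"],
    "Lev": ["lev1", "lev2", "lev3", "lev4"],
}


def seasonal_rmse(dates, model, obs, season):
    """RMSE between model and observations restricted to one season.

    Returns NaN when no match-up falls within the season.
    """
    months = SEASONS[season]
    squared_errors = [
        (m - o) ** 2 for d, m, o in zip(dates, model, obs) if d.month in months
    ]
    if not squared_errors:
        return math.nan
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Match-ups from several subbasins can be pooled into one macro-basin (e.g. all entries of `MACRO_BASINS["Ion"]`) before calling `seasonal_rmse`, which mirrors the aggregation used for the BGC-Argo statistics.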

The winter RMSE concerning the OC chlorophyll in HIND spans between approximately 0.09 and 0.21 mg m−3 with a maximum in the alb region (Fig. 5). The inclusion of multivariate DA (in DAfl) positively impacts the model performance, reducing surface errors by 6.5 %, as mainly observed in the eastern subbasins. A further reduction in the RMSE (up to 10 %) with respect to HIND is then obtained with DAnn, highlighting that enlarging the nitrate float network leads to improvements in the reproduction of surface phytoplankton dynamics. Except for alb and swm1, where no nitrate data (in situ or reconstructed) were available, all of the Mediterranean subbasins exhibit a reduction in the RMSE during winter. In the nwm, the RMSE in the DAfl assimilative set-up is higher than in the HIND run. However, in DAnn (light blue striped bar for nwm in Fig. 3) the enlarged nitrate dataset positively affects the chlorophyll dynamics at the surface.

A slight worsening of the assimilated runs can generally be observed during the summer stratification period, especially in the eastern subbasins. From DAfl to DAnn, the RMSE value slightly increases in all subbasins. These values correspond to an average worsening of about 6 % in DAfl and 7.5 % in DAnn compared with the HIND run. Despite the introduction of a high number of reconstructed nitrate profiles in some subbasins (e.g. orange striped lines for nwm and ion2 in Fig. 3), this inclusion does not positively impact the summer chlorophyll RMSE at the surface. The RMSE values in summer are an order of magnitude lower than in winter, reflecting the seasonal chlorophyll variability in the Mediterranean Sea (i.e. the very low values of chlorophyll at the surface).

Figure 5. Seasonal chlorophyll RMSE values of the model runs with respect to satellite OC observations: winter bloom and summer stratification seasons in the Mediterranean Sea subbasins for the HIND run (light blue), the DAfl run (orange), and the DAnn run (green with dots). The black vertical line represents the subdivision of the Mediterranean Sea into the western and eastern sectors.


The RMSE metrics based on BGC-Argo are computed for the six selected aggregated macro-basins and in selected layers (0–10, 10–30, 30–60, 60–100, 100–150, 150–300, and 300–600 m), and they are shown for nitrate (Fig. 6a, b), chlorophyll (Fig. 6c, d), and oxygen (Fig. 6e, f). The statistics computed over the aggregate basin provide more-robust results (e.g. they are computed over a larger number of profiles), even if possible spatial patterns of the errors can be damped. Thus, this choice might limit the analysis on whether/how different nitrate assimilation set-ups affect chlorophyll and oxygen dynamics (see Sect. 3.3).

As expected, the assimilation of in situ BGC-Argo considerably improves the quality of modelled nitrate with respect to the HIND run. During winter, the average RMSE reduction is 40 % in DAfl and increases to 46 % in DAnn, whereas the average reduction reaches 59 % in DAfl and 63 % in DAnn in summer (Fig. 6a, b). The most significant RMSE reduction in the DAnn run compared with DAfl is observed in Nwm and Tyr (0–450 m) during winter and in Ion (0–100 m) in summer. This impact can be directly ascribed to profile availability (Fig. 3), and additional profiles generate more persistent corrections.

As the DAfl and DAnn simulations share the same chlorophyll assimilation set-up, the RMSE improvements in terms of chlorophyll assimilation can be evaluated by comparing the HIND with the DAfl or DAnn simulation (Fig. 6c, d). We observe slight enhancements with respect to simulating chlorophyll in Nwm (0–100 m) and Lev (0–200 m) during winter and in Tyr, Ion, and Lev (50–200 m) during summer (Fig. 6c, d). Even if phytoplankton dynamics depend on nutrient dynamics, the positive impact of DAnn on the nitrate RMSE does not transfer to the vertical chlorophyll statistics in DAnn.

Assimilating oxygen profiles enables the reduction of the model–BGC float RMSE by about 30 % during winter and summer. In winter, the correction involves the whole water column in the east (Lev and Ion) and deeper layers (150–600 m) in the west (Swm, Nwm) and Adr (Fig. 6e, f). In summer, the impact is mainly observed in Tyr, Ion, and Lev. The integration of NN-reconstructed profiles in the DAnn simulation does not significantly affect oxygen dynamics compared with the DAfl simulation, given that oxygen has already been markedly modified by the O2 assimilation occurring at the same location as NN-reconstructed nitrate profiles.

Figure 6. Seasonal nitrate (a, b), chlorophyll (c, d), and oxygen (e, f) profiles of the RMSE of the model runs with respect to BGC-Argo observations for the bloom (a, c, e) and stratification (b, d, f) seasons in the aggregated Mediterranean Sea subbasins for the HIND run (light blue), DAfl run (orange), and DAnn run (green).


3.3 Integration of NN-MLP-MED and DA modules: the impact

3.3.1 Impacts on biogeochemical vertical dynamics

To assess the impact of profile assimilation on changing the vertical gradients of BGC variables, Figs. 7, 8, 9, and 10 show the Hovmöller diagrams of the spatial averages of nitrate, phosphate, chlorophyll, and oxygen for two selected subbasins (first and second columns for nwm and ion2, respectively, with boundaries indicated in the map of Fig. 2) and for the entire Mediterranean Sea (third column). This representation offers additional details on the vertical impact of the reconstructed nitrate profile assimilation with respect to the validation of Fig. 6, which considers only model points corresponding to the location of BGC-Argo profiles. The nwm and ion2 subbasins represent distinct trophic conditions in the Mediterranean Sea and are also characterized by a high number of assimilated reconstructed nitrate profiles (Fig. 3). The north-western Mediterranean has a higher level of nutrient concentrations and more intense surface blooms in winter (Siokou-Frangou et al., 2010; Di Biagio et al., 2022). During summer, nwm exhibits a shallow nitracline, a higher chlorophyll concentration at the deep chlorophyll maximum (DCM), and a shallow subsurface oxygen maximum (SOM) (first column in Figs. 7, 8, 9, and 10). Conversely, the eastern subbasin is characterized by a deeper nitracline and DCM as well as more oligotrophic conditions (ion2, second column of Figs. 7, 8, 9, and 10).

Considering nitrate, the multivariate assimilation (DAfl) reduces a general positive bias of the model in all of the Mediterranean areas (blue pattern in Fig. 7). The addition of NN-reconstructed profiles makes the corrections stronger. On average, the nitrate concentration below the nitracline (the depth at which nitrate concentration is 2 mmol m−3) decreases by 8 % and 11 % in the DAfl and DAnn runs, respectively. Both the assimilation runs also exhibit changes in the nitracline depth with more intense deepening in the DAnn simulation. Differences between the assimilation and the HIND run accumulate over time. The rate of this accumulation is highest during the first year and decreases during the second year. These differences remain almost constant in subbasins with a high number of BGC-Argo and NN-reconstructed profiles (e.g. nwm in Fig. 7). On the other hand, considering the ion2 and the whole Mediterranean Sea, which comprises some undersampled areas (e.g. ion1 and ion3), the effect of DA corrections is still propagating after the 2 years (third column of Fig. 7).

Very similar patterns are also observed in the Hovmöller diagrams of phosphate (Fig. 8), which is an updated variable of the multivariate variational assimilation scheme through nitrate–phosphate covariance. In fact, the general negative corrections on phosphate fields are linked to the high positive values of the covariance matrix between nitrate and phosphate (Teruzzi et al., 2021).

Considering chlorophyll (Fig. 9), the main difference between DAfl and HIND is a slight reduction in the DCM chlorophyll concentration (e.g. variation smaller than 5 % with respect to HIND simulation) and a correction of the timing of the surface winter blooms (second row in Fig. 9). Even if the chlorophyll validation (Fig. 6) does not show strong differences between DAfl and DAnn, the basin-wide averages of DAnn display more intense corrections with respect to DAfl in terms of the DCM depth and chlorophyll intensity and the overall chlorophyll concentration (Fig. 9). Over the 0–200 m layer of the whole Mediterranean Sea, the chlorophyll decreases with respect to HIND are 4 % and 5 % for DAfl and DAnn, respectively.

Corrections on oxygen dynamics after the multivariate assimilation (DAfl, second row in Fig. 10) are either positive or negative depending on the area and the period of the year. In particular, corrections are mostly positive in ion2, while the nwm subbasin shows negative corrections in the subsurface layer and positive ones in the upper layer in the second year. At a basin-wide scale with respect to the Mediterranean, the average correction is 0.2 % for the 0–200 m layer. The addition of the reconstructed nitrate profiles does not alter the correction pattern, with an average correction of 0.3 %. However, the largest differences between the two assimilation runs can be spotted in areas with a high density of NN-reconstructed profiles during summer (e.g. nwm, first column in Fig. 10). As observed in the nitrate and chlorophyll Hovmöller diagrams, the assimilation of NN-reconstructed profiles causes a decrease in the summer productivity in the DCM layer. Consequently, less oxygen is produced, generating the negative changes in the DCM layer in the bottom left panel of Fig. 10. Because of the smaller amount of subsequent sinking organic matter, less oxygen is consumed in the remineralization processes in layers below the DCM in late summer and autumn, and positive oxygen changes are generated, particularly during 2018.

Figure 7. Hovmöller diagram of nitrate for the HIND simulation (a, b, c) and differences between the respective DAfl and DAnn assimilation runs and HIND (d–f and g–i, respectively) for two subbasins (nwm and ion2) and the Mediterranean Sea (Med). The evolution of the depth of the nitracline (the depth at which nitrate concentration is 2 mmol m−3) is also shown for the three runs: HIND (red lines) and DAfl and DAnn (black lines). The averages of the 0–200 m concentration and of the nitracline for the simulated period are reported.


Figure 8. Hovmöller diagram of phosphate for the HIND simulation (a, b, c) and differences between the respective DAfl and DAnn assimilation runs and HIND (d–f and g–i, respectively) for two subbasins (nwm and ion2) and the Mediterranean Sea (Med). The evolution of the depth of the phosphocline (the depth at which phosphate concentration is 0.1 mmol m−3) is also shown for the three runs: HIND (red lines) and DAfl and DAnn (black lines). The averages of the 0–200 m concentration and of the phosphocline for the simulated period are reported.


3.3.2 Impact on ecosystem indicator (net primary production)

Net primary production (NPP) integrates phytoplankton growth and respiration processes, which are at the base of the marine trophic food web. The assimilation of chlorophyll and nitrate as well as the updates of phosphate directly and indirectly affect primary production, as they influence both phytoplankton biomass and nutrient availability. Thus, the comparison of primary production among the three simulations reveals how the assimilation impacts a key indicator that integrates several marine ecosystem processes. Seasonal maps of NPP integrated over the 0–200 m layer in the HIND, DAfl, and DAnn simulations (Fig. 11) confirm that the assimilation's impact varies spatially and temporally.

In the DAfl simulation, the most evident differences in primary production compared with the HIND simulation are located in the eastern Mediterranean Sea, with a decrease in NPP of nearly 10 % in the Levantine macro-basin and in the Ionian Sea close to the Greek coast (first and second row of Fig. 11). This reduction is particularly pronounced during winter. In the western Mediterranean, the impacts on primary production are less evident in both seasons, with a slight reduction (5 %) in winter in the Tyrrhenian Sea.

The DAnn simulation shows more pronounced impacts on primary production compared with the DAfl simulation (second and third rows of Fig. 11). The main differences between the DAnn and DAfl runs are highlighted by the black contour line in Fig. 11 (differences larger than 15 mgC m−2 d−1). Specifically, during winter, a decrease in NPP is mainly observed in Nwm, Ion, and Tyr, whereas reductions in NPP are observable in Nwm and Ion in summer.

Figure 9. Hovmöller diagram of chlorophyll for the HIND simulation (a, b, c) and differences between the respective DAfl and DAnn assimilation runs and HIND (d–f and g–i, respectively) for two subbasins (nwm and ion2) and the Mediterranean Sea (Med). The evolution of the depth of the deep chlorophyll maximum (DCM) is also shown for the three runs: HIND (red lines) and DAfl and DAnn (black lines). The averages of the 0–200 m concentration and of the DCM depth for the simulated period are reported.


Figure 10. Hovmöller diagram of oxygen for the HIND simulation (a, b, c) and differences between the respective DAfl and DAnn assimilation runs and HIND (d–f and g–i, respectively) for two subbasins (nwm and ion2) and the Mediterranean Sea (Med). The evolution of the depth of the subsurface oxygen maximum (SOM) is also shown for the three runs: HIND (red lines) and DAfl and DAnn (black lines). The averages of the 0–200 m concentration and of the SOM for the simulated period are reported.


As shown in Fig. 3, the lev1 and lev4 basins have a high number of reconstructed nitrate profiles during both the winter and summer seasons. This abundance of NN-reconstructed profiles increases the impact on the simulated NPP dynamics, although the impact remains spatially localized. Conversely, lev2 and lev3, the subbasins dividing lev1 from lev4, contain in situ nitrate but lack reconstructed nitrate profiles. This lack may spatially limit the impacts that the assimilation of reconstructed nitrate profiles could have on NPP throughout the entire Levantine region (Lev).

In general, the impact on primary production is greater in areas where nitrate observations or reconstructed nitrate observations are assimilated (Fig. 3), suggesting a dynamic, bottom-up control on primary production. In fact, the weaker fertilization of the surface layer in DAnn, which occurs for both macronutrients after assimilation (Figs. 7, 8), causes a reduction in NPP.

Figure 11. Maps of winter (FMA) and summer (JJA) net primary production (NPP, mgC m−2 d−1) in the three simulations: HIND (a, b), DAfl (c, d), and DAnn (e, f). Seasonal averages were calculated for the period from 2017 to 2018. The black contour lines in panels (e) and (f) encompass areas in which the NPP difference between DAnn and DAfl exceeds 15 mgC m−2 d−1.

3.3.3 Impact on the Argo observing system design

Analysing the departure of an assimilated simulation from a reference solution provides insights into the impact of the observing system design, and several data impact indicators can be used (Ford, 2021; Teruzzi et al., 2021; Raicich and Rampazzo, 2003). In this work, we adopted the impact indicator I_ij(t), as described in Teruzzi et al. (2021). This indicator supports the quantification of the vertically integrated response resulting from the assimilation of BGC-Argo profiles compared with the non-assimilation run:

$$
I_{ij}(t) = \frac{\left| \mathrm{Sim}_{ij}(t) - \mathrm{HIND}_{ij}(t) \right|_{0-\mathrm{maxdepth}}}{\left( \mathrm{HIND}_{0-\mathrm{maxdepth}} \right)_{\mathrm{mean}}} \qquad (2)
$$

Here, HIND is the reference, while Sim refers to one of the different DA set-ups (DAfl or DAnn). |Sim_ij(t) − HIND_ij(t)| is the absolute difference between the two simulations (for each day and grid point), while the subscript “maxdepth” indicates the vertically integrated layer of 0–300 and 0–600 m for chlorophyll and nitrate, respectively.

The indicator I_ij(t) quantifies the departure of an assimilated run (DAfl or DAnn) from the reference simulation (HIND) for every grid point within the Mediterranean Sea domain over time, while the 95th percentile of I_ij(t) highlights the areas in which the assimilation markedly increases this difference between model runs.

To compare the spatial extent of the 95th percentile of I_ij(t) between the two pairs of runs (HIND–DAfl and HIND–DAnn), we choose a threshold value corresponding to the mean value of the HIND–DAfl and HIND–DAnn maps in Figs. 12 and 13 (i.e. 0.1 and 0.4 for nitrate and chlorophyll, respectively) and calculate the areas with values above the threshold.
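The chain from the impact indicator to the impacted-area fraction can be illustrated with a small pure-Python sketch. It is not the code used in the study and rests on several assumptions: the names are hypothetical, the vertical integral is applied to the absolute difference level by level, the normalization uses the time mean of the vertically integrated HIND field at the same grid point, and the 95th percentile follows the nearest-rank definition.

```python
import math


def vertical_integral(profile, layer_thickness_m):
    """Integrate a concentration profile over depth (sum of value * thickness)."""
    return sum(c * dz for c, dz in zip(profile, layer_thickness_m))


def impact_indicator(sim, hind, layer_thickness_m):
    """I_ij(t) at one grid point.

    sim, hind: lists over time of profiles (lists over depth, 0-maxdepth).
    Assumption: normalization by the time mean of the integrated HIND field.
    """
    hind_integrals = [vertical_integral(p, layer_thickness_m) for p in hind]
    hind_mean = sum(hind_integrals) / len(hind_integrals)
    return [
        vertical_integral(
            [abs(s - h) for s, h in zip(p_sim, p_hind)], layer_thickness_m
        )
        / hind_mean
        for p_sim, p_hind in zip(sim, hind)
    ]


def percentile_nearest_rank(values, q=95):
    """q-th percentile using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def impacted_area_fraction(p95_map, cell_areas, threshold):
    """Fraction of the total area where the 95th percentile exceeds the threshold."""
    impacted = sum(a for v, a in zip(p95_map, cell_areas) if v > threshold)
    return impacted / sum(cell_areas)
```

Applying `percentile_nearest_rank` to the time series returned by `impact_indicator` at every grid point gives a map, and `impacted_area_fraction` with a threshold of 0.1 (nitrate) or 0.4 (chlorophyll) then yields percentages comparable to those quoted for DAfl and DAnn.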

Figures 12 and 13 show the 95th percentile of the seasonal indicator I_ij(t) for nitrate and chlorophyll, respectively, in winter (panels a and c) and in summer (panels b and d) in the DAfl (panels a and b) and DAnn (panels c and d) simulations.

In DAfl, the extent of the nitrate I_ij(t) 95th percentile above the threshold of 0.1 is 16.5 % and 18.7 % in winter and in summer, respectively, with a clear spatial distribution mapping the density of BGC-Argo floats. The introduction of NN-reconstructed profiles in DAnn makes it possible to increase the nitrate-impacted areas up to about 35 % and 39 % in winter and summer, respectively. The DAnn impact increase is mainly localized in the western Mediterranean Sea and in Ion, while the less-evident impact in Lev, especially in summer, is mainly due to the low number of NN-reconstructed nitrate profiles in the area.

Chlorophyll impact maps (Fig. 13) show that, besides the direct impact of chlorophyll profile assimilation, phytoplankton is also affected by the reconstructed nitrate assimilation. Compared to the threshold of 0.4, the impacted areas increase from 18.2 % to 29.8 % in winter and from 10.8 % to 14.5 % in summer in the DAfl and DAnn runs. These results suggest that the inclusion of reconstructed nitrate assimilation has the potential to extend its impact across the majority of the 16 subbasins of the Mediterranean Sea. However, the scarcity or absence of available data for assimilation prevents us from observing an impact in the marginal seas (Adr and Aeg), the southern part of the Ionian (ion1), and western subbasins (alb and swm1).

Oxygen impact maps (not shown) are very similar to the nitrate DAnn maps and do not show differences between the two DA simulations, as the same QC oxygen dataset was assimilated in DAfl and DAnn and the oxygen assimilation largely overcomes any other potential model adjustment after nitrate assimilation.

Figure 12. Maps of the I_ij(t) 95th percentiles for nitrate in winter (a, c) and summer (b, d) in the DAfl (a, b) and DAnn (c, d) runs. White contour lines identify the areas within three correlation radii of the float profiles.

Figure 13. Maps of the I_ij(t) 95th percentiles for chlorophyll in winter (a, c) and summer (b, d) in the DAfl (a, b) and DAnn (c, d) runs. White contour lines identify the areas within three correlation radii of the float profiles.

4 Discussion

Our quality check procedure (QC O2) for oxygen drift detection and comparison with a reference dataset successfully integrates the official BGC-Argo information (Argo, 2022), making oxygen BGC-Argo a robust and valuable dataset (Amadio et al., 2023) for initial conditions, data assimilation, validation, and reconstruction of new datasets. Even if the distinction between real oxygen depletion signals and optode drift can remain problematic without high-quality in situ data, we believe that the literature and prior knowledge can be used as a baseline for distinguishing drift.

In particular, the oxygen concentration in the mesopelagic layer of the Mediterranean Sea can exhibit basin-scale variability (Mavropoulou et al., 2020) as well as local intense multi-year variability (Sisma-Ventura et al., 2021). For example, one of the most evident signals was the early-1990s Eastern Mediterranean Transient (EMT) associated with variations in thermohaline circulation. The EMT caused both negative and positive variations (e.g. about 10 mmol m−3 on a decadal timescale) in oxygen levels in the western and eastern Mediterranean Sea (Mavropoulou et al., 2020). However, in the last decades, a much smaller inter-annual variability in oxygen in the mesopelagic layer has been observed in both the western and eastern basins (Coppola et al., 2018; Mavropoulou et al., 2020). Therefore, the threshold of 1 mmol m−3 yr−1 at 600 and 800 m appears to be a prudent limit for the discrimination of sensor drift from real long-term signals for our specific application.

To date, visual checks by oceanographers have been necessary to distinguish ocean signals from sensor drift (Wang et al., 2020), and the debate regarding the replacement of visual checks by automatic statistical procedures is still open. Our work seeks to contribute to this topic by proposing a new tool designed to automatically handle deep-ocean signal or optode drift issues. This method can be further developed by applying oxygen drift analysis at fixed isopycnals, in conjunction with analysis at constant isobaths. This approach might allow us to filter out potential oxygen concentration changes caused by floats moving across different water masses.

The assimilation of vertical profiles provides complementary information to satellite OC assimilation (Verdy and Mazloff, 2017; Cossarini et al., 2019), which remains the most commonly used method in operational systems (Fennel et al., 2019). In fact, the effectiveness of the profile assimilation process, which has the capability to constrain vertical BGC dynamics in subsurface layers (Kaufman et al., 2018; Teruzzi et al., 2021; Ford, 2021; Skakala et al., 2021; Wang et al., 2022), depends on the availability of BGC-Argo data, which are generally insufficient to constrain a basin-wide simulation. Previous findings (Teruzzi et al., 2021) have primarily demonstrated the efficiency of OC assimilation in constraining chlorophyll dynamics, especially during winter, and the advantages of assimilating BGC-Argo profiles in summer. Our work highlights the larger and more extensive benefits of profile assimilation during summer due to the incorporation of reconstructed nitrate profiles.

Through the integration of NN and DA, the count of nitrate profiles ingested can potentially be as high as that from BGC-Argo equipped with an oxygen sensor (i.e. more than double the nitrate profiles), which corresponds to a density of one profile in each 2.5°×2.5° box every 10 d for the 2017–2018 period. This means that seasonal subbasin-scale dynamics (e.g. bloom or stratification) can effectively be constrained, whereas the mesoscale dynamics can be only locally constrained (D'Ortenzio et al., 2021).

Apart from an increase in the number of floats, the area impacted by float assimilation can be further extended by redefining horizontal covariance errors in the DA scheme. Indeed, benefits of a non-uniform correlation radius on the horizontal scale have previously been investigated (Cossarini et al., 2019), and additional improvements could be provided by a 3D varying correlation radius (Storto et al., 2014).

Looking at the recent evolution in the availability of BGC-Argo sensors (Fig. 14), our combined NN and DA approach would allow us to maintain the benefits of the BGC-Argo observing system in the Mediterranean operational system. Even if nitrate and chlorophyll profiles have dramatically decreased after 2020, the assimilation of NN-reconstructed profiles can potentially overcome this lack. Nevertheless, as shown in our OSE (Figs. 12 and 13), there are still areas that are undersampled by the Argo and oxygen sensors, such as the Alboran and southern Ionian seas and the marginal seas (northern Adriatic and northern Aegean Sea), which would require specific deployments.

Figure 14. The monthly availability of BGC-Argo profiles (number of profiles per month) from 2013 to 2022 for nitrate (green), chlorophyll (grey), and oxygen (yellow).


With respect to previous BGC observing system simulation experiments (Yu et al., 2018; Ford, 2021), we show how to exploit the current Argo and BGC-Argo networks to reconstruct BGC variables.

MLP feed-forward methods reconstruct BGC variables with sufficient accuracy for our purposes (Bittig et al., 2018b; Fourrier et al., 2021; Pietropolli et al., 2023; Sauzède et al., 2020), even if their application to generate smooth and consistent profiles still has some limitations (Pietropolli et al., 2023). The MLP-NN-MED method exhibits a validation error of 0.50 mmol m−3 for nitrate when used to predict nitrate from the EMODnet dataset, whereas this value is 0.87 mmol m−3 when used to predict nitrate from BGC-Argo data (Pietropolli et al., 2023). These uncertainties related to the reconstructed nitrate dataset are higher than the error used in our study (0.24 mmol m−3) for both the BGC-Argo and reconstructed profiles.

Thus, while it is reasonable to assign a higher observation error to NN-reconstructed nitrate, applying the same error to both in situ and NN-reconstructed datasets has resulted in a potential overestimation of the assimilation impact that can be achieved. On the other hand, using a possibly underestimated error could unbalance the assimilation results toward observation overfitting, and we recognize the potential benefits of using different error values for BGC-Argo and reconstructed profiles. Overfitting effects on observations may similarly stem from our choice of not explicitly including the nitrate representation error. However, our nitrate error definition is an evolution of the approach used in Teruzzi et al. (2021), who demonstrated a well-established balance between assimilation impacts and overfitting towards the observations.
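The trade-off between assimilation impact and overfitting can be made concrete with the textbook scalar analysis update, in which the weight given to an observation is σb²/(σb² + σo²). The sketch below is a minimal illustration and not the 3DVarBio scheme: the background error of 0.5 mmol m−3 and the background and observed concentrations are hypothetical, whereas the two observation errors (0.24 and 0.87 mmol m−3) are the values quoted above.

```python
def analysis_update(background, observation, sigma_b, sigma_o):
    """Scalar optimal-interpolation update.

    Returns the analysis value and the weight (gain) given to the observation.
    """
    gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    analysis = background + gain * (observation - background)
    return analysis, gain


# Hypothetical nitrate values (mmol m-3): the model background is biased high.
background = 2.0
observation = 1.0
sigma_b = 0.5  # hypothetical background error, not taken from the paper

# Same observation error for in situ and reconstructed profiles (as in the study).
analysis_uniform, gain_uniform = analysis_update(background, observation, sigma_b, sigma_o=0.24)

# Larger error for reconstructed profiles (MLP-NN-MED validation error on BGC-Argo).
analysis_nn, gain_nn = analysis_update(background, observation, sigma_b, sigma_o=0.87)
```

Under these assumed numbers the observation weight drops from roughly 0.81 to roughly 0.25 when the larger error is used, which shows why assigning a distinct error to reconstructed profiles would reduce both their impact and the risk of overfitting.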

The larger error in the MLP-NN-MED prediction of BGC-Argo profiles stems from the fact that the MLP methods, which are trained on individual data points and produce pointwise outputs, are unaware of the vertical structure (e.g. typical shape) of the profiles of the BGC variables that they seek to infer. This can lead to irregularities and a lack of smoothness in the predicted profiles (Pietropolli et al., 2023), which we partly addressed by adding a smoothing operator. However, one way to increase the reliability of profile reconstruction would be to include information with a physical meaning from observed data (Buizza et al., 2022). One-dimensional convolutional neural networks represent a viable alternative approach, considering their ability to treat the coherence of 1D signals (e.g. typical shapes of profiles), as shown in Li et al. (2021).
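To illustrate how a smoothing operator can reduce the irregularities of pointwise predictions, the sketch below applies a centred moving average to a synthetic profile. The text does not specify which operator was used, so the moving average, the helper names `smooth_profile` and `roughness`, and the synthetic profile are assumptions chosen only for illustration.

```python
import numpy as np


def smooth_profile(values, window=3):
    """Centred moving average with edge padding (hypothetical smoothing operator)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = np.asarray(values, dtype=float)
    half = window // 2
    # Repeat the end values so the output has the same length as the input.
    padded = np.pad(values, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def roughness(profile):
    """Sum of absolute second differences, a simple measure of irregularity."""
    return float(np.sum(np.abs(np.diff(profile, n=2))))


# Synthetic nitrate-like profile: a smooth nitracline plus alternating
# point-to-point jitter mimicking independent pointwise predictions.
depth = np.linspace(0.0, 300.0, 31)
base = 4.0 / (1.0 + np.exp(-(depth - 100.0) / 20.0))
jitter = 0.3 * (-1.0) ** np.arange(depth.size)
raw = base + jitter

smoothed = smooth_profile(raw, window=3)
```

A moving average treats every depth level equally and can flatten sharp gradients such as the nitracline, which is one reason why architectures that learn the profile shape directly may be preferable.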

The integration of NN and DA has been tested in several geoscience applications (Buizza et al., 2022; Brajard et al., 2021; Stanev et al., 2022) to infer unresolved spatial scales or reproduce missing data. In our application, the integration of NN, which retrieves a large number of profiles (Pietropolli et al., 2023), and DA, which can apply the correction to all nutrients through error covariances (Teruzzi et al., 2021), allows spatial and multivariate changes to be captured at both the local and basin scale to constrain Mediterranean productivity (Fig. 11). Although the corrections take time to extend to the entire basin (Fig. 7), our simulations have shown that constraining bottom-up ecosystem processes (e.g. productivity and the organic matter sink) is effective and might be used in conjunction with the classical OC correction to phytoplankton biomass.

Any plan to learn directly from observations will be faced with some challenges, such as the use of observations with uneven spatiotemporal coverage or issues related to specific processes (Geer, 2021). The modular approach followed in this work represents a successful example of exploiting the strengths of NNs and DA to enhance the observing system impact in the operational BGC system of the Mediterranean Sea.

5 Conclusions

Combining a deterministic feed-forward neural network and data assimilation to design an observing system experiment has enabled us to demonstrate the enhanced positive impact of profile assimilation in the Copernicus Mediterranean operational forecast system (MedBFM).

The development of the oxygen QC procedure allowed us to statistically deal with optode in situ drift and to derive accurate reconstructed profiles of nitrate, thereby keeping the number of assimilated observations at a much higher level despite the current negative trend in BGC-Argo availability.

The achieved density of BGC profiles provides valuable and additional information to complement ocean colour in the description of seasonal phytoplankton blooms and stratification dynamics at the subbasin scale.

The assimilation of BGC-Argo nitrate corrects a general positive bias of the model in several Mediterranean areas, and the addition of reconstructed profiles makes the correction stronger.

Along with nitrate assimilation, the phosphate update through error covariances sustains spatial and multivariate changes that are capable of correcting key BGC processes (e.g. nitracline and deep chlorophyll maximum) and constraining ecosystem processes (e.g. productivity) at a basin-wide scale.

Code availability

The MedBFM model system comprises three open and accessible codes: the transport model (OGSTM; Bolzon et al., 2023a), the Biogeochemical Flux Model (BFM; Lazzari et al., 2023), and the data assimilation model (3DVarBio; Teruzzi et al., 2023). The Python “Bit.Sea” package used to quality check the BGC-Argo dataset is available via Bolzon et al. (2023b).

Data availability

The original BGC-Argo dataset was downloaded from the Coriolis GDAC in August 2022 (Argo, 2022). The quality-checked BGC-Argo dataset used for assimilation and validation after Bit.Sea processing is accessible from Zenodo (Amadio et al., 2023).

Author contributions

CA, AT, and GianpC conceived of the study. CA and AT updated the 3DVarBio code. GP and LM developed the MLP-NN-MED model. CA and GianlC performed the simulations. CA, AT, and GianpC conducted the analysis of the simulation results. CA, AT, GP, and GianpC wrote the manuscript. All authors approved the manuscript and agreed to its submission.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Special issue statement

This article is part of the special issue “Special Issue for the 54th International Liège Colloquium on Machine Learning and Data Analysis in Oceanography”. It is a result of the 54th International Liège Colloquium on Ocean Dynamics Machine Learning and Data Analysis in Oceanography, Liège, Belgium, 8–12 May 2023.


Acknowledgements

The authors are grateful to Giorgio Bolzon (OGS) for technical support with the model implementation and to Giorgio Dall'Olmo for a fruitful discussion concerning the BGC-Argo data management. The authors also wish to thank Julien Brajard (the editor of the paper) and the anonymous reviewers for their very useful comments.

Financial support

This research has been partly supported by the MED-MFC (Mediterranean – Monitoring Forecasting Centre) of the Copernicus Marine Service, which is implemented by Mercator Ocean International within the framework of a delegation agreement with the European Union (reference no. 21002L5-COP-MFC MED-5500).

Review statement

This paper was edited by Julien Brajard and reviewed by two anonymous referees.


References

Amadio, C., Teruzzi, A., Feudale, L., Bolzon, G., Di Biagio, V., Lazzari, P., Álvarez, E., Coidessa, G., Salon, S., and Cossarini, G.: Mediterranean Quality checked BGC-Argo 2013–2022 dataset, Zenodo [data set], 2023.

Argo: Argo float data and metadata from Global Data Assembly Centre (Argo GDAC), SEANOE [data set], 2022.

Barbieux, M., Uitz, J., Gentili, B., Pasqueron de Fommervault, O., Mignot, A., Poteau, A., Schmechtig, C., Taillandier, V., Leymarie, E., Penkerc'h, C., D'Ortenzio, F., Claustre, H., and Bricaud, A.: Bio-optical characterization of subsurface chlorophyll maxima in the Mediterranean Sea from a Biogeochemical-Argo float database, Biogeosciences, 16, 1321–1342, 2019.

Barbieux, M., Uitz, J., Mignot, A., Roesler, C., Claustre, H., Gentili, B., Taillandier, V., D'Ortenzio, F., Loisel, H., Poteau, A., Leymarie, E., Penkerc'h, C., Schmechtig, C., and Bricaud, A.: Biological production in two contrasted regions of the Mediterranean Sea during the oligotrophic period: an estimate based on the diel cycle of optical properties measured by BioGeoChemical-Argo profiling floats, Biogeosciences, 19, 1165–1194, 2022.

Bethoux, J., Gentili, B., Morin, P., Nicolas, E., Pierre, C., and Ruiz-Pino, D.: The Mediterranean Sea: a miniature ocean for climatic and environmental studies and a key for the climatic functioning of the North Atlantic, Prog. Oceanogr., 44, 131–146, 1999.

Bittig, H. C., Körtzinger, A., Neill, C., Van Ooijen, E., Plant, J. N., Hahn, J., Johnson, K. S., Yang, B., and Emerson, S. R.: Oxygen optode sensors: principle, characterization, calibration, and application in the ocean, Front. Mar. Sci., 4, 429, 2018a.

Bittig, H. C., Steinhoff, T., Claustre, H., Fiedler, B., Williams, N. L., Sauzède, R., Körtzinger, A., and Gattuso, J.-P.: An alternative to static climatologies: Robust estimation of open ocean CO2 variables and nutrient concentrations from T, S, and O2 data using Bayesian neural networks, Front. Mar. Sci., 5, 328, 2018b.

Bolzon, G., Lazzari, P., Salon, S., Teruzzi, A., Coidessa, G., and Cossarini, G.: ogstm (4.1), Zenodo [code], 2023a.

Bolzon, G., Teruzzi, A., Salon, S., Di Biagio, V., Feudale, L., Amadio, C., Coidessa, G., and Cossarini, G.: bit.sea (1.7), Zenodo [code], 2023b.

Brajard, J., Carrassi, A., Bocquet, M., and Bertino, L.: Combining data assimilation and machine learning to infer unresolved scale parametrization, Philos. T. R. Soc. A, 379, 20200086, 2021.

Buga, L., Sarbu, G., Fryberg, L., Magnus, W., Wesslander, K., Gatti, J., Leroy, D., Iona, S., Larsen, M., Koefoed Rømer, J., Østrem, A. K., Lipizer, M., and Giorgietti, A.: EMODnet Chemistry Eutrophication and Acidity aggregated datasets v2018, EMODnet Thematic Lot no. 4/SI2.749773, 2018.

Buizza, C., Casas, C. Q., Nadler, P., Mack, J., Marrone, S., Titus, Z., Le Cornec, C., Heylen, E., Dur, T., Ruiz, L. B., Heaney, C., Díaz Lopez, J. A., Kumar, K. S. S., and Arcucci, R.: Data learning: integrating data assimilation and machine learning, J. Comput. Sci., 58, 101525, 2022.

Bushinsky, S. M., Emerson, S. R., Riser, S. C., and Swift, D. D.: Accurate oxygen measurements on modified Argo floats using in situ air calibrations, Limnol. Oceanogr.-Method., 14, 491–505, 2016.

Canu, D. M., Ghermandi, A., Nunes, P. A., Lazzari, P., Cossarini, G., and Solidoro, C.: Estimating the value of carbon sequestration ecosystem services in the Mediterranean Sea: An ecological economics approach, Glob. Environ. Change, 32, 87–95, 2015.

Capet, A., Stanev, E. V., Beckers, J.-M., Murray, J. W., and Grégoire, M.: Decline of the Black Sea oxygen inventory, Biogeosciences, 13, 1287–1297, 2016.

Clementi, E., Oddo, P., Drudi, M., Pinardi, N., Korres, G., and Grandi, A.: Coupling hydrodynamic and wave models: first step and sensitivity experiments in the Mediterranean Sea, Ocean Dynam., 67, 1293–1312, 2017.

Coppini, G., Clementi, E., Cossarini, G., Salon, S., Korres, G., Ravdas, M., Lecci, R., Pistoia, J., Goglio, A. C., Drudi, M., Grandi, A., Aydogdu, A., Escudier, R., Cipollone, A., Lyubartsev, V., Mariani, A., Cretì, S., Palermo, F., Scuro, M., Masina, S., Pinardi, N., Navarra, A., Delrosso, D., Teruzzi, A., Di Biagio, V., Bolzon, G., Feudale, L., Coidessa, G., Amadio, C., Brosich, A., Miró, A., Alvarez, E., Lazzari, P., Solidoro, C., Oikonomou, C., and Zacharioudaki, A.: The Mediterranean forecasting system. Part I: evolution and performance, EGUsphere [preprint], 2023.

Coppola, L., Legendre, L., Lefevre, D., Prieur, L., Taillandier, V., and Riquier, E. D.: Seasonal and inter-annual variations of dissolved oxygen in the northwestern Mediterranean Sea (DYFAMED site), Prog. Oceanogr., 162, 187–201, 2018.

Cossarini, G., Lazzari, P., and Solidoro, C.: Spatiotemporal variability of alkalinity in the Mediterranean Sea, Biogeosciences, 12, 1647–1658, 2015a.

Cossarini, G., Querin, S., and Solidoro, C.: The continental shelf carbon pump in the northern Adriatic Sea (Mediterranean Sea): Influence of wintertime variability, Ecol. Model., 314, 118–134, 2015b.

Cossarini, G., Mariotti, L., Feudale, L., Mignot, A., Salon, S., Taillandier, V., Teruzzi, A., and d'Ortenzio, F.: Towards operational 3D-Var assimilation of chlorophyll Biogeochemical-Argo float data into a biogeochemical model of the Mediterranean Sea, Ocean Model., 133, 112–128, 2019.

Cossarini, G., Feudale, L., Teruzzi, A., Bolzon, G., Coidessa, G., Solidoro, C., Di Biagio, V., Amadio, C., Lazzari, P., Brosich, A., Cossarini, G., Di Biagio, V., Lazzari, P., and Coidessa, G.: High-Resolution Reanalysis of the Mediterranean Sea Biogeochemistry (1999–2019), Front. Mar. Sci., 8, 741486, 2021.

Dall'Olmo, G. and Mork, K. A.: Carbon export by small particles in the Norwegian Sea, Geophys. Res. Lett., 41, 2921–2927, 2014.

Dang, X., Peng, H., Wang, X., and Zhang, H.: The Theil-Sen Estimators in a Multiple Linear Regression Model, (last access: 17 July 2023), 2008.

Di Biagio, V., Salon, S., Feudale, L., and Cossarini, G.: Subsurface oxygen maximum in oligotrophic marine ecosystems: mapping the interaction between physical and biogeochemical processes, Biogeosciences, 19, 5553–5574, 2022.

Dobricic, S., Pinardi, N., Adani, M., Tonani, M., Fratianni, C., Bonazzi, A., and Fernandez, V.: Daily oceanographic analyses by Mediterranean Forecasting System at the basin scale, Ocean Sci., 3, 149–157, 2007.

D'Ortenzio, F., Taillandier, V., Claustre, H., Prieur, L. M., Leymarie, E., Mignot, A., Poteau, A., Penkerc'h, C., and Schmechtig, C. M.: Biogeochemical Argo: The test case of the NAOS Mediterranean array, Front. Mar. Sci., 7, 120, 2020.

D'Ortenzio, F., Taillandier, V., Claustre, H., Coppola, L., Conan, P., Dumas, F., Durrieu du Madron, X., Fourrier, M., Gogou, A., Karageorgis, A., Lefevre, D., Leymarie, E., Oviedo, A., Pavlidou, A., Poteau, A., Poulain, P. M., Prieur, L., Psarra, S., Puyo-Pay, M., Ribera d'Alcalà, M., Schmechtig, C., Terrats, L., Velaoras, D., Wagener, T., and Wimart-Rousseau, C.: BGC-Argo floats observe nitrate injection and spring phytoplankton increase in the surface layer of Levantine Sea (Eastern Mediterranean), Geophys. Res. Lett., 48, e2020GL091649, 2021.

Escudier, R., Clementi, E., Cipollone, A., Pistoia, J., Drudi, M., Grandi, A., Lyubartsev, V., Lecci, R., Aydogdu, A., Delrosso, D., Omar, M., Masina, S., Coppini, G., and Pinardi, N.: A High Resolution Reanalysis for the Mediterranean Sea, Front. Earth Sci., 9, 702285, 2021.

Fennel, K., Gehlen, M., Brasseur, P., Brown, C. M., Ciavatta, S., Cossarini, G., Crise, A., Edwards, C. A., Ford, D., Friedrichs, M. A. M., Gregoire, M., Jones, E., Kim, H., Lamouroux, J., Murtugudde, R., Perrucheet, C., and the GODAE OceanView Marine Ecosystem Analysis and Prediction Task Team: Advancing marine biogeochemical and ecosystem reanalyses and forecasts as tools for monitoring and managing ecosystem health, Front. Mar. Sci., 6, 89, 2019.

Feudale, L., Bolzon, G., Lazzari, P., Salon, S., Teruzzi, A., Di Biagio, V., Coidessa, G., Alvarez Suarez, E., Amadio, C., and Cossarini, G.: Mediterranean Sea Biogeochemical Analysis and Forecast, Copernicus Marine Service MED-Biogeochemistry, MedBFM4 system [data set], (last access: 17 July 2023), 2022.

Fischler, M. A. and Bolles, R. C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM, 24, 381–395, 1981.

Ford, D.: Assimilating synthetic Biogeochemical-Argo and ocean colour observations into a global ocean model to inform observing system design, Biogeosciences, 18, 509–534, 2021.

Foujols, M.-A., Lévy, M., Aumont, O., and Madec, G.: OPA 8.1 Tracer model reference manual, France, Institut Pierre-Simon Laplace (IPSL), 45 pp., (last access: 17 July 2023), 2000.

Fourrier, M., Coppola, L., Claustre, H., D'Ortenzio, F., Sauzède, R., and Gattuso, J.-P.: A regional neural network approach to estimate water-column nutrient concentrations and carbonate system variables in the Mediterranean Sea: CANYON-MED, Front. Mar. Sci., 7, 620, 2020.

Fourrier, M., Coppola, L., Claustre, H., D'Ortenzio, F., Sauzède, R., and Gattuso, J.-P.: Corrigendum: A regional neural network approach to estimate water-column nutrient concentrations and carbonate system variables in the Mediterranean Sea: CANYON-MED, Front. Mar. Sci., 8, 650509, 2021.

Garcia, H. E., Boyer, T. P., Baranova, O. K., Locarnini, R. A., Mishonov, A. V., Grodsky, A., Paver, C. R., Weathers, K. W., Smolyar, I. V., Reagan, J. R., Seidov, D., and Zweng, M. M.: World Ocean Atlas 2018: Product Documentation, NOAA, Technical Editor, Mishonov, A., retrieved from: (last access: 17 July 2023), 2019.

Gasparin, F., Guinehut, S., Mao, C., Mirouze, I., Rémy, E., King, R. R., Hamon, M., Reid, R., Storto, A., Le Traon, P.-Y., Martin, M. J., and Masina, S.: Requirements for an Integrated in situ Atlantic Ocean Observing System From Coordinated Observing System Simulation Experiments, Front. Mar. Sci., 6, 83, 2019.

Geer, A.: Learning earth system models from observations: machine learning or data assimilation?, Philos. T. R. Soc. A, 379, 20200089, 2021.

Hollingsworth, A., Shaw, D., Lönnberg, P., Illari, L., Arpe, K., and Simmons, A.: Monitoring of observation and analysis quality by a data assimilation system, Mon. Weather Rev., 114, 861–879, 1986.

Hornik, K., Stinchcombe, M., and White, H.: Multilayer feedforward networks are universal approximators, Neural Networks, 2, 359–366, 1989.

Johnson, K. and Claustre, H.: Bringing biogeochemistry into the Argo age, Eos, Trans. Am. Geophys. Union, 2016.

Johnson, K., Maurer, T., Plant, J., Bittig, H., Schallenberg, C., and Schmechtig, C.: BGC-Argo quality control manual for nitrate concentration, 2021.

Johnson, K. S., Coletti, L. J., Jannasch, H. W., Sakamoto, C. M., Swift, D. D., and Riser, S. C.: Long-term nitrate measurements in the ocean using the In Situ Ultraviolet Spectrophotometer: sensor integration into the Apex profiling float, J. Atmos. Ocean. Technol., 30, 1854–1866, 2013.

Johnson, K. S., Plant, J. N., Riser, S. C., and Gilbert, D.: Air oxygen calibration of oxygen optodes on a profiling float array, J. Atmos. Ocean. Technol., 32, 2160–2172, 2015.

Kaufman, D. E., Friedrichs, M. A. M., Hemmings, J. C. P., and Smith Jr., W. O.: Assimilating bio-optical glider data during a phytoplankton bloom in the southern Ross Sea, Biogeosciences, 15, 73–90, 2018.

Kumar, S. V., Peters-Lidard, C. D., Santanello, J. A., Reichle, R. H., Draper, C. S., Koster, R. D., Nearing, G., and Jasinski, M. F.: Evaluating the utility of satellite soil moisture retrievals over irrigated areas and the ability of land data assimilation methods to correct for unmodeled processes, Hydrol. Earth Syst. Sci., 19, 4463–4478, 2015.

Lary, D. J., Zewdie, G. K., Liu, X., et al.: Machine learning applications for earth observation, Earth observation open science and innovation, Springer, Cham, 165–218, 2018.

Lazzari, P., Solidoro, C., Ibello, V., Salon, S., Teruzzi, A., Béranger, K., Colella, S., and Crise, A.: Seasonal and inter-annual variability of plankton chlorophyll and primary production in the Mediterranean Sea: a modelling approach, Biogeosciences, 9, 217–233, 2012.

Lazzari, P., Solidoro, C., Salon, S., and Bolzon, G.: Spatial variability of phosphate and nitrate in the Mediterranean Sea: A modeling approach, Deep-Sea Res. Pt. I, 108, 39–52, 2016.

Lazzari, P., Bolzon, G., Salon, S., Teruzzi, A., Di Biagio, V., Amadio, C., Alvarez, E., and Cossarini, G.: BFM (5.0), Zenodo [code], 2023.

Le Traon, P. Y.: From satellite altimetry to Argo and operational oceanography: three revolutions in oceanography, Ocean Sci., 9, 901–915, 2013.

Le Traon, P. Y., Reppucci, A., Alvarez Fanjul, E., et al.: From observation to information and users: The Copernicus Marine Service perspective, Front. Mar. Sci., 6, 234, 2019.

Li, Z., Liu, F., Yang, W., Peng, S., and Zhou, J.: A survey of convolutional neural networks: analysis, applications, and prospects, IEEE Transactions on Neural Networks and Learning Systems, 33, 6999–7019, 2021.

Li, Z. Q., Liu, Z. H., and Lu, S. L.: Global Argo data fast receiving and post-quality-control system, IOP Conference Series: Earth and Environmental Science, 502, 012012, IOP Publishing, 2020.

Marañón, E., Van Wambeke, F., Uitz, J., Boss, E. S., Dimier, C., Dinasquet, J., Engel, A., Haëntjens, N., Pérez-Lorenzo, M., Taillandier, V., and Zäncker, B.: Deep maxima of phytoplankton biomass, primary production and bacterial production in the Mediterranean Sea, Biogeosciences, 18, 1749–1767, 2021.

Martinez, E., Brini, A., Gorgues, T., Drumetz, L., Roussillon, J., Tandeo, P., Maze, G., and Fablet, R.: Neural network approaches to reconstruct phytoplankton time-series in the global ocean, Remote Sens., 12, 4156, 2020a.

Martinez, E., Gorgues, T., Lengaigne, M., Fontana, C., Sauzède, R., Menkes, C., Uitz, J., Di Lorenzo, E., and Fablet, R.: Reconstructing global chlorophyll-a variations using a non-linear statistical approach, Front. Mar. Sci., 7, 464, 2020b.

Maurer, T. L., Plant, J. N., and Johnson, K. S.: Delayed-mode quality control of oxygen, nitrate, and pH data on SOCCOM biogeochemical profiling floats, Front. Mar. Sci., 8, 683207, 2021.

Mavropoulou, A.-M., Vervatis, V., and Sofianos, S.: Dissolved oxygen variability in the Mediterranean Sea, J. Mar. Syst., 208, 103348, 2020.

Mignot, A., Claustre, H., Uitz, J., Poteau, A., d'Ortenzio, F., and Xing, X.: Understanding the seasonal dynamics of phytoplankton biomass and the deep chlorophyll maximum in oligotrophic environments: A Bio-Argo float investigation, Global Biogeochem. Cy., 28, 856–876, 2014.

Mignot, A., D'Ortenzio, F., Taillandier, V., Cossarini, G., and Salon, S.: Quantifying observational errors in Biogeochemical-Argo oxygen, nitrate, and chlorophyll a concentrations, Geophys. Res. Lett., 46, 4330–4337, 2019.

Miloslavich, P., Seeyave, S., Muller-Karger, F., Bax, N., Ali, E., Delgado, C., Evers-King, H., Loveday, B., Lutz, V., Newton, J., Nolan, G., Peralta Brichtova, A. C., Traeger-Chatterjee, C., and Urban, E.: Challenges for global ocean observation: the need for increased human capacity, J. Operat. Oceanogr., 12, S137–S156, 2019.

Oddo, P., Adani, M., Pinardi, N., Fratianni, C., Tonani, M., and Pettenuzzo, D.: A nested Atlantic-Mediterranean Sea general circulation model for operational forecasting, Ocean Sci., 5, 461–473, 2009.

Pietropolli, G., Manzoni, L., and Cossarini, G.: Multivariate Relationship in Big Data Collection of Ocean Observing System, Appl. Sci., 13, 5634, 2023.

Pinardi, N., Zavatarelli, M., Adani, M., Coppini, G., Fratianni, C., Oddo, P., Simoncelli, S., Tonani, M., Lyubartsev, V., Dobricic, S., and Bonaduce, A.: Mediterranean Sea large-scale low-frequency ocean variability and water mass formation rates from 1987 to 2007: A retrospective analysis, Prog. Oceanogr., 132, 318–332, 2015.

Raicich, F. and Rampazzo, A.: Observing System Simulation Experiments for the assessment of temperature sampling strategies in the Mediterranean Sea, Ann. Geophys., 21, 151–165, 2003.

Ricour, F., Capet, A., D'Ortenzio, F., Delille, B., and Grégoire, M.: Dynamics of the deep chlorophyll maximum in the Black Sea as depicted by BGC-Argo floats, Biogeosciences, 18, 755–774, 2021.

Roussillon, J., Fablet, R., Gorgues, T., Drumetz, L., Littaye, J., and Martinez, E.: A Multi-Mode Convolutional Neural Network to reconstruct satellite-derived chlorophyll-a time series in the global ocean from physical drivers, Front. Mar. Sci., 10, 1077623, 2023.

Sakov, P. and Sandery, P.: An adaptive quality control procedure for data assimilation, Tellus A, 69, 1318031, 2017.

Salon, S., Cossarini, G., Bolzon, G., Feudale, L., Lazzari, P., Teruzzi, A., Solidoro, C., and Crise, A.: Novel metrics based on Biogeochemical Argo data to improve the model uncertainty evaluation of the CMEMS Mediterranean marine ecosystem forecasts, Ocean Sci., 15, 997–1022, 2019.

Sauzède, R., Claustre, H., Uitz, J., Jamet, C., Dall'Olmo, G., d'Ortenzio, F., Gentili, B., Poteau, A., and Schmechtig, C.: A neural network-based method for merging ocean color and Argo data to extend surface bio-optical properties to depth: Retrieval of the particulate backscattering coefficient, J. Geophys. Res.-Ocean., 121, 2552–2571, 2016.

Sauzède, R., Johnson, J. E., Claustre, H., Camps-Valls, G., and Ruescas, A. B.: Estimation of oceanic particulate organic carbon with machine learning, ISPRS Annals of the Photogrammetry, Remote Sens. Spat. Inf. Sci., 2, 949–956, 2020.

Sauzède, R., Bittig, H. C., Claustre, H., Pasqueron de Fommervault, O., Gattuso, J.-P., Legendre, L., and Johnson, K. S.: Estimates of water-column nutrient concentrations and carbonate system parameters in the global ocean: a novel approach based on neural networks, Front. Mar. Sci., 4, 128, 2017.

Siokou-Frangou, I., Christaki, U., Mazzocchi, M. G., Montresor, M., Ribera d'Alcalá, M., Vaqué, D., and Zingone, A.: Plankton in the open Mediterranean Sea: a review, Biogeosciences, 7, 1543–1586, 2010.

Sisma-Ventura, G., Kress, N., Silverman, J., Gertner, Y., Ozer, T., Biton, E., Lazar, A., Gertman, I., Rahav, E., and Herut, B.: Post-Eastern Mediterranean Transient oxygen decline in the deep waters of the southeast Mediterranean Sea supports weakening of ventilation rates, Front. Mar. Sci., 7, 598686, 2021.

Skakala, J., Ford, D., Bruggeman, J., Hull, T., Kaiser, J., King, R. R., Loveday, B., Palmer, M. R., Smyth, T., Williams, C. A., and Ciavatta, S.: Towards a multi-platform assimilative system for North Sea biogeochemistry, J. Geophys. Res.-Ocean., 126, e2020JC016649, 2021.

Stanev, E. V., Wahle, K., and Staneva, J.: The synergy of data from profiling floats, machine learning and numerical modeling: Case of the Black Sea euphotic zone, J. Geophys. Res.-Ocean., 127, e2021JC018012, 2022.

Storto, A., Dobricic, S., Masina, S., and Di Pietro, P.: Assimilating along-track altimetric observations through local hydrostatic adjustment in a global ocean variational assimilation system, Mon. Weather Rev., 139, 738–754, 2011.

Storto, A., Masina, S., and Dobricic, S.: Estimation and impact of nonuniform horizontal correlation length scales for global ocean physical analyses, J. Atmos. Ocean. Technol., 31, 2330–2349, 2014.

Storto, A., De Magistris, G., Falchetti, S., and Oddo, P.: A neural network–based observation operator for coupled ocean–acoustic variational data assimilation, Mon. Weather Rev., 149, 1967–1985, 2021.

Taillandier, V., Prieur, L., D'Ortenzio, F., Ribera d'Alcalà, M., and Pulido-Villena, E.: Profiling float observation of thermohaline staircases in the western Mediterranean Sea and impact on nutrient fluxes, Biogeosciences, 17, 3343–3366, 2020.

Takeshita, Y., Martz, T. R., Johnson, K. S., Plant, J. N., Gilbert, D., Riser, S. C., Neill, C., and Tilbrook, B.: A climatology-based quality control procedure for profiling float oxygen data, J. Geophys. Res.-Ocean., 118, 5640–5650, 2013.

Teruzzi, A., Dobricic, S., Solidoro, C., and Cossarini, G.: A 3-D variational assimilation scheme in coupled transport-biogeochemical models: Forecast of Mediterranean biogeochemical properties, J. Geophys. Res.-Ocean., 119, 200–217, 2014.

Teruzzi, A., Bolzon, G., Salon, S., Lazzari, P., Solidoro, C., and Cossarini, G.: Assimilation of coastal and open sea biogeochemical data to improve phytoplankton simulation in the Mediterranean Sea, Ocean Model., 132, 46–60, 2018.

Teruzzi, A., Bolzon, G., Feudale, L., and Cossarini, G.: Deep chlorophyll maximum and nutricline in the Mediterranean Sea: emerging properties from a multi-platform assimilated biogeochemical model experiment, Biogeosciences, 18, 6147–6166, 2021.

Teruzzi, A., Bolzon, G., and Cossarini, G.: 3DVarBio (3.3), Zenodo [code], 2023.

Terzić, E., Lazzari, P., Organelli, E., Solidoro, C., Salon, S., D'Ortenzio, F., and Conan, P.: Merging bio-optical data from Biogeochemical-Argo floats and models in marine biogeochemistry, Biogeosciences, 16, 2527–2542, 2019.

Thierry, V. and Bittig, H.: Argo quality control manual for dissolved oxygen concentration, Report (qualification paper (procedure, accreditation support)), France, 2021.

Verdy, A. and Mazloff, M. R.: A data assimilating model for estimating Southern Ocean biogeochemistry, J. Geophys. Res.-Ocean., 122, 6968–6988, 2017.

Vichi, M., Pinardi, N., and Masina, S.: A generalized model of pelagic biogeochemistry for the global ocean ecosystem, Part I: Theory, J. Mar. Syst., 64, 89–109, 2007a.

Vichi, M., Pinardi, N., and Navarra, A.: A generalized model of pelagic biogeochemistry for the global ocean ecosystem, Part II: numerical simulations, J. Mar. Syst., 64, 110–134, 2007b.

Waller, J. A., García-Pintado, J., Mason, D. C., Dance, S. L., and Nichols, N. K.: Technical note: Assessment of observation quality for data assimilation in flood models, Hydrol. Earth Syst. Sci., 22, 3983–3992, 2018.

Wang, B. and Fennel, K.: An assessment of vertical carbon flux parameterizations using backscatter data from BGC Argo, Geophys. Res. Lett., 50, e2022GL101220, 2023.

Wang, B., Fennel, K., Yu, L., and Gordon, C.: Assessing the value of biogeochemical Argo profiles versus ocean color observations for biogeochemical model optimization in the Gulf of Mexico, Biogeosciences, 17, 4059–4074, 2020.

Wang, B., Fennel, K., and Yu, L.: Can assimilation of satellite observations improve subsurface biological properties in a numerical model? A case study for the Gulf of Mexico, Ocean Sci., 17, 1141–1156, 2021a.

Wang, S., Flipo, N., Romary, T., and Hasanyar, M.: Particle filter for high frequency oxygen data assimilation in river systems, Environ. Model. Softw., 151, 105382, 2022.

Wang, T., Chai, F., Xing, X., Ning, J., Jiang, W., and Riser, S. C.: Influence of multi-scale dynamics on the vertical nitrate distribution around the Kuroshio Extension: An investigation based on BGC-Argo and satellite data, Prog. Oceanogr., 193, 102543, 2021b.

Yu, L., Fennel, K., Bertino, L., El Gharamti, M., and Thompson, K. R.: Insights on multivariate updates of physical and biogeochemical ocean variables using an Ensemble Kalman Filter and an idealized model of upwelling, Ocean Model., 126, 13–28, 2018.

Yumruktepe, V. Ç., Mousing, E. A., Tjiputra, J., and Samuelsen, A.: An along-track Biogeochemical Argo modelling framework: a case study of model improvements for the Nordic seas, Geosci. Model Dev., 16, 6875–6897, 2023.

Zhou, H., Yue, X., Lei, Y., Tian, C., Ma, Y., and Cao, Y.: Large contributions of diffuse radiation to global gross primary productivity during 1981–2015, Global Biogeochem. Cy., 35, e2021GB006957, 2021.

Short summary
Forecasting of marine biogeochemistry can be improved via the assimilation of observations. Floating buoys provide multivariate information about the status of the ocean interior. Information on the ocean interior can be expanded/augmented by machine learning. In this work, we show the enhanced impact of assimilating new in situ variables (oxygen) and reconstructed variables (nitrate) in the operational forecast system (MedBFM) model of the Mediterranean Sea.