Articles | Volume 17, issue 4
Research article
26 Aug 2021

Can assimilation of satellite observations improve subsurface biological properties in a numerical model? A case study for the Gulf of Mexico

Bin Wang, Katja Fennel, and Liuqian Yu

Given current threats to ocean ecosystem health, there is a growing demand for accurate biogeochemical hindcasts, nowcasts, and predictions. Provision of such products requires data assimilation, i.e., a comprehensive strategy for incorporating observations into biogeochemical models, but current data streams of biogeochemical observations are generally considered insufficient for the operational provision of such products. This study investigates to what degree the assimilation of satellite observations in combination with a priori model calibration by sparse BGC-Argo profiles can improve subsurface biogeochemical properties. The multivariate deterministic ensemble Kalman filter (DEnKF) has been implemented to assimilate physical and biological observations into a three-dimensional coupled physical–biogeochemical model, the biogeochemical component of which has been calibrated by BGC-Argo float data for the Gulf of Mexico. Specifically, observations of sea surface height, sea surface temperature, and surface chlorophyll were assimilated, and profiles of both physical and biological variables were updated based on the surface information. We assessed whether this leads to improved subsurface distributions, especially of biological properties, using observations from five BGC-Argo floats that were not assimilated. An alternative light parameterization that was tuned a priori using BGC-Argo observations was also applied to test the sensitivity of data assimilation impact on subsurface biological properties. Results show that assimilation of the satellite data improves model representation of major circulation features, which translate into improved three-dimensional distributions of temperature and salinity. The multivariate assimilation also improves the agreement of subsurface nitrate through its tight correlation with temperature, but the improvements in subsurface chlorophyll were modest initially due to suboptimal choices of the model's optical module. 
Repeating the assimilation run by using the alternative light parameterization greatly improved the subsurface distribution of chlorophyll. Therefore, even sparse BGC-Argo observations can provide substantial benefits for biogeochemical prediction by enabling a priori model tuning. Given that, so far, the abundance of BGC-Argo profiles in the Gulf of Mexico and elsewhere has been insufficient for sequential assimilation, updating 3D biological properties in a model that has been well calibrated is an intermediate step toward full assimilation of the new data types.

1 Introduction

Given the multiple and increasing pressures of ocean warming, acidification, deoxygenation, and changes in primary productivity on ocean ecosystem health, accurate model simulations are urgently needed to assess past and current states of marine ecosystems, forecast future trends, and predict the ocean's response to different scenarios of climate change and management policies. In practice, numerical models are imperfect representations of the natural system, and their accuracy is limited by many factors including insufficient model resolution, inaccuracies in discretization schemes and model formulations, parameterization of unresolved processes, and uncertainties in model inputs. Data assimilation is a practical approach used to compensate for these model deficiencies. It is a statistical method to interpolate and extrapolate sparse observations into the regular model space in a dynamically consistent way. Its success critically depends on well-resolved observations. While any practice to constrain a model by observations can be referred to as data assimilation, in this paper we specifically refer to state estimation, i.e., sequential updates of the model state.

Data assimilation is well developed in physical oceanography (Edwards et al., 2015) but less mature in biogeochemical ocean modeling, largely due to insufficient observations (Fennel et al., 2019). Thus far, satellite data on ocean color (e.g., chlorophyll) have been the major source of observations to be assimilated (e.g., Ford and Barciela, 2017; Gregg, 2008; Hu et al., 2012; Mattern et al., 2013; Pradhan et al., 2019; Teruzzi et al., 2018) because of their relatively high resolution and routine availability. More recent advances have focused on the incorporation of other satellite-derived products including size-fractionated chlorophyll (e.g., Ciavatta et al., 2018, 2019; Pradhan et al., 2020; Skákala et al., 2018) and optical properties (e.g., Ciavatta et al., 2014; Gregg and Rousseaux, 2017; Jones et al., 2016; Shulman et al., 2013; Skákala et al., 2020). However, these measurements are limited to the surface ocean and provide little information about the subsurface and ocean interior. In addition, it has been acknowledged that assimilating satellite data on ocean color often fails to improve and even degrades simulation of unobserved biological variables (Ciavatta et al., 2018; Fontana et al., 2013; Ford and Barciela, 2017; Skákala et al., 2018; Teruzzi et al., 2018). Problems also remain in accounting for the codependencies or covariances between biological variables. For instance, Fontana et al. (2013) found that subsurface nitrate was barely impacted by assimilating the satellite surface chlorophyll because of its weak correlations with surface chlorophyll. Although BGC-Argo floats may ultimately provide us with abundant subsurface measurements of multiple key biogeochemical properties (Biogeochemical-Argo Planning Group, 2016; Chai et al., 2020; Roemmich et al., 2019), the profiling observations will likely remain insufficient for three-dimensional data assimilation for a number of years, making satellite data the main observation streams for sequential data assimilation in biogeochemical models (Ford, 2021).

The insufficient availability of subsurface and interior ocean biogeochemical observations is reflected not only in the immaturity of biogeochemical data assimilation but also in its skill assessment. When compared with the surface, the subsurface has received less attention in skill assessments of biogeochemical data assimilation systems. Although there have been studies that compared vertical structures with in situ observations and/or climatological datasets (e.g., Fontana et al., 2013; Ford and Barciela, 2017; Mattern et al., 2017; Ourmières et al., 2009; Teruzzi et al., 2014), these validations were often limited to low spatiotemporal resolution. The recent growth of autonomous observation systems, especially BGC-Argo floats and gliders, makes it possible to evaluate biogeochemical data assimilation systems below the surface at high resolution (e.g., Cossarini et al., 2019; Salon et al., 2019; Skákala et al., 2021; Verdy and Mazloff, 2017).

Finally, since physical processes affect biological properties through advection and diffusion of biological tracers as well as some temperature-dependent biological activities (e.g., phytoplankton growth), deficiencies in biological models can arise from imperfect simulation of the physics (Doney, 1999; Doney et al., 2004; Oschlies and Garçon, 1999). Although there have been studies demonstrating a positive effect of physical data assimilation on biological properties (Fiechter et al., 2011; Ourmières et al., 2009), often this approach degrades biological distributions because of elevated vertical velocities and violation of consistency between physical and biological properties (Anderson et al., 2000; Raghukumar et al., 2015; Yu et al., 2018). To address these issues, joint assimilation of physical and biological observations (Song et al., 2016a, b) or multivariate updates based on the cross-covariances between physical and biological properties (Goodliff et al., 2019; Yu et al., 2018) have been suggested.

In this study, a multivariate physical–biological data assimilation scheme is applied to a coupled physical–biological model in the Gulf of Mexico. The rationale for choosing the Gulf of Mexico is that the dominant circulation, including the Loop Current and its associated mesoscale eddies, is stochastic and can influence the subsurface biological distributions, e.g., the deep chlorophyll maximum (Fommervault et al., 2017). In addition, we test how data assimilation impacts depend on model calibration when using two alternative light parameterizations. By comparing forecast results from the assimilative model with independent observations from five BGC-Argo floats that are not assimilated but used in a priori tuning of the biogeochemical model, we rigorously evaluate whether the main biological observation stream (satellite estimates of surface chlorophyll) in combination with physical observations (satellite estimates of sea surface height and sea surface temperature) can inform the 3D ocean distributions at high spatial and temporal resolution.

Figure 1. Bathymetric map of the Gulf of Mexico with a schematic pattern of the Loop Current (black curve with arrows) and Loop Current eddies (black circle with arrows). Trajectories of five BGC-Argo floats (colored lines) in 2015 are also shown. The model domain is represented by the red rectangle.

2 Tools and methods

2.1 Coupled physical and biological model

The coupled physical and biological model used in this study is based on the Regional Ocean Modeling System (ROMS; Haidvogel et al., 2008) configured for the Gulf of Mexico (the red rectangle in Fig. 1 shows the model domain) with a horizontal resolution of about 5 km and 36 vertical sigma levels (Wang et al., 2020; Yu et al., 2019). The model uses the multidimensional positive definite advection transport algorithm (MPDATA; Smolarkiewicz and Margolin, 1998) to solve the horizontal and vertical advection of tracers, a Smagorinsky-type formula (Smagorinsky, 1963) to parameterize horizontal viscosity and diffusivity, and the Mellor–Yamada 2.5-level closure scheme (Mellor and Yamada, 1982) to calculate vertical turbulent mixing. Atmospheric forcing is provided by the European Centre for Medium-Range Weather Forecasts ERA-Interim product (ECMWF reanalysis; Dee et al., 2011) with a horizontal resolution of 1/8° (approximately 12 km × 14 km) and is used to calculate the surface wind stress as well as the net heat and freshwater fluxes.

The biological component is a nitrogen-based model (Fennel et al., 2006) that simulates the transport and transformation of seven pelagic variables, i.e., nitrate (NO3), ammonium (NH4), chlorophyll (Chl), phytoplankton (Phy), zooplankton (Zoo), small detritus (SDet), and large detritus (LDet). As a separate state variable, chlorophyll accounts for photo-acclimation based on Geider et al. (1997). In our coupled model, the biological tracers are advected and diffused as part of the 3D circulation but provide no feedback to the physical model. Biological parameters are from the parameter optimization study of Wang et al. (2020), except that the half-saturation constant of nitrate was subjectively re-tuned, based on the BGC-Argo float data, from 0.5 mmol N m−3 to about 1.4 mmol N m−3 because the previous model underestimated nitrate in the euphotic zone.

The coupled model receives freshwater and nutrient inputs from rivers. Inputs from the Mississippi–Atchafalaya river system are specified from daily measurements at US Geological Survey river gauges, while inputs from other major rivers use climatological estimates (Xue et al., 2013). To ensure a dynamically consistent biological field, a 1-year spin-up is performed for 2014, in which the physical model is initialized from the output of the 1/12° data-assimilative global HYCOM/NCODA (Chassignet et al., 2005) and the biological model starts from a regressed 3D field of nitrate based on its climatological relationship with temperature (see Fig. S1). A semi-prognostic method is used during the spin-up period to reduce model drift by replacing model density with a linear combination of model and climatological density fields when calculating the horizontal pressure gradient (Greatbatch et al., 2004; Sheng et al., 2001). After the spin-up, experiments are performed for 1 year, from January 2015 to December 2015.

2.2 Data assimilation technique

In this study, the data assimilation scheme uses the deterministic formulation of the ensemble Kalman filter (DEnKF), which was first introduced by Sakov and Oke (2008). The approach consists of two steps: (1) the forecast step in which an ensemble of state variables is integrated forward in time by the model and (2) the analysis step in which observations are assimilated to update the forecasted ensemble following the Kalman filter equations:

$$\mathbf{x}^{a} = \mathbf{x}^{f} + \mathbf{K}\left(\mathbf{d} - \mathbf{H}\mathbf{x}^{f}\right), \tag{1}$$

$$\mathbf{K} = \mathbf{P}\mathbf{H}^{T}\left(\mathbf{H}\mathbf{P}\mathbf{H}^{T} + \mathbf{R}\right)^{-1}, \tag{2}$$

where $\mathbf{x}$ represents the model state estimate, $\mathbf{d}$ represents the available observations, $\mathbf{H}$ represents the observation operator mapping the model state onto observations, and $\mathbf{K}$ represents the Kalman gain matrix, which is determined by the model error matrix $\mathbf{P}$ and observation error matrix $\mathbf{R}$ (Eq. 2). Superscripts “a” and “f” represent analysis (i.e., updated) and forecast (i.e., prior to the update) estimates, and $T$ represents the matrix transpose. Unlike the original stochastic EnKF, which updates each ensemble member with perturbed observations, the DEnKF updates the ensemble mean ($\bar{\mathbf{x}}$) and anomalies ($\mathbf{A} = \mathbf{x} - \bar{\mathbf{x}}$) separately without perturbing observations; i.e., the former is updated as in Eq. (1), while the latter is updated by

$$\mathbf{A}^{a} = \mathbf{A}^{f} - \frac{1}{2}\mathbf{K}\mathbf{H}\mathbf{A}^{f}. \tag{3}$$

More details on the DEnKF can be found in Sakov and Oke (2008) and Yu et al. (2018).
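The analysis step can be illustrated with a minimal sketch for the simplest possible case: a single scalar observation of one element of the state vector, so that the observation operator just selects that element. The function name `denkf_update`, the list-of-lists ensemble layout, and the toy numbers are assumptions made for illustration; this is not the code used in the study, which updates full 3D fields with localization.

```python
def denkf_update(ensemble, obs_index, obs_value, obs_var):
    """DEnKF analysis for ONE scalar observation of state element `obs_index`.

    ensemble : list of members, each a list of state values (hypothetical layout)
    obs_var  : observation error variance (the scalar R)
    """
    m = len(ensemble)
    n = len(ensemble[0])

    # Ensemble mean and anomalies A = x - mean
    mean = [sum(member[i] for member in ensemble) / m for i in range(n)]
    anomalies = [[member[i] - mean[i] for i in range(n)] for member in ensemble]

    # H A^f: anomalies of the observed element
    ha = [a[obs_index] for a in anomalies]

    # P H^T (covariance of every state element with the observed one)
    pht = [sum(a[i] * h for a, h in zip(anomalies, ha)) / (m - 1) for i in range(n)]
    hpht = pht[obs_index]  # H P H^T, a scalar here

    # Kalman gain: one weight per state element
    gain = [p / (hpht + obs_var) for p in pht]

    # Mean update with the full gain
    innovation = obs_value - mean[obs_index]
    mean_a = [mean[i] + gain[i] * innovation for i in range(n)]

    # Anomaly update with half the gain, no perturbed observations
    anomalies_a = [
        [a[i] - 0.5 * gain[i] * a[obs_index] for i in range(n)] for a in anomalies
    ]

    return [[mean_a[i] + a[i] for i in range(n)] for a in anomalies_a]


if __name__ == "__main__":
    # Toy ensemble: element 0 is observed, element 1 is unobserved but correlated.
    toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    for member in denkf_update(toy, obs_index=0, obs_value=3.0, obs_var=1.0):
        print(member)
```

In the toy case the unobserved second element is shifted purely through its ensemble covariance with the observed one, which is the mechanism by which surface observations can update subsurface variables.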

The data assimilation framework and configurations are the same as in Yu et al. (2019), wherein twin experiments were performed in the same model domain. In this study, we extend the work to jointly assimilate physical and biological observations into a coupled model. To keep our data assimilation experiments computationally affordable, we chose an ensemble size of 20, which has been successfully used in previous studies including an idealized channel (Yu et al., 2018), the Middle Atlantic Bight (Hu et al., 2012; Mattern et al., 2013), and the Gulf of Mexico (Yu et al., 2019). Spurious correlations, which can arise with relatively small ensembles, are avoided here by applying a distance-based localization with a radius of 50 km (Evensen, 2003). Vertical localization is not applied. Ensemble anomalies are inflated by 1.05 in each update step to account for unrepresented sources of model uncertainty (Anderson and Anderson, 1999). Values of the localization radius and inflation factor were determined in Yu et al. (2019).
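As a rough illustration of these two safeguards, the sketch below inflates ensemble anomalies by a constant factor and computes a distance-dependent localization weight. The text does not state which taper function is used; the Gaspari–Cohn fifth-order function shown here is a common choice and is an assumption, as is treating 50 km as its half-width (the weight then reaches zero at twice that distance). The function names are hypothetical.

```python
def inflate(anomalies, factor=1.05):
    """Multiply every ensemble anomaly by the inflation factor."""
    return [[factor * a for a in member] for member in anomalies]


def gaspari_cohn(distance_km, half_width_km=50.0):
    """Compactly supported taper: 1 at zero distance, 0 beyond 2 * half_width.

    Assumption: this taper and the meaning of the 50 km radius are illustrative.
    """
    r = abs(distance_km) / half_width_km
    if r >= 2.0:
        return 0.0
    if r <= 1.0:
        return (-0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3
                - (5.0 / 3.0) * r**2 + 1.0)
    return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3
            + (5.0 / 3.0) * r**2 - 5.0 * r + 4.0 - 2.0 / (3.0 * r))


if __name__ == "__main__":
    for d in (0, 25, 50, 75, 100):
        print(d, "km ->", round(gaspari_cohn(d), 4))
```

A weight of this kind would multiply the gain (or the covariance) between an observation and a model grid point, so that observations far from a grid point have no influence on it.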

To account for uncertainties in the model's initial conditions, boundary conditions, atmospheric forcing, and biological parameters, the ensemble is initialized from 20 different daily outputs, centered on the initial date of 1 January 2015, from a previous deterministic model simulation (as described in Sect. 2.1) and is forced by open boundary conditions that are lagged by up to ±10 d for the different ensemble members. Furthermore, each ensemble member is forced by a perturbed version of the wind forcing. Specifically, the wind forcing from the deterministic run is decomposed into empirical orthogonal functions (EOFs), and the first four EOFs are then perturbed by multiplying them by random numbers with zero mean and a variance of 1, as in Li et al. (2016) and Thacker et al. (2012). In addition, four sensitive biological parameters, namely the mortality rate of phytoplankton, the maximum ratio of chlorophyll to carbon, the grazing rate of zooplankton, and the growth rate of phytoplankton at 0 °C, were identified in sensitivity experiments. Specifically, a 1D version of the model, described in Wang et al. (2020), was run multiple times while perturbing one parameter at a time by factors ranging from 0.25 to 1.75 in increments of 0.25. The four sensitive parameters were selected based on the normalized absolute differences between the perturbed and unperturbed runs. In the data assimilation experiments, these parameters are subject to a Gaussian perturbation with a relative variance of 75 %, but they are not updated. The parameters are resampled from their distributions before each forecast step to prevent extreme parameter values from being used throughout the whole data assimilation experiment.
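The one-at-a-time sensitivity screening described above can be sketched as follows. The `toy_model` function is a hypothetical stand-in for the 1D model, the parameter names are placeholders, and the exact form of the normalized absolute difference (here, summed absolute differences divided by the summed absolute reference values) is an assumption, since the text does not define it.

```python
def perturbation_factors(start=0.25, stop=1.75, step=0.25):
    """Factors 0.25, 0.50, ..., 1.75 applied to one parameter at a time."""
    count = int(round((stop - start) / step)) + 1
    return [start + k * step for k in range(count)]


def normalized_abs_difference(perturbed, reference):
    """Assumed metric: sum |perturbed - reference| / sum |reference|."""
    numerator = sum(abs(p - r) for p, r in zip(perturbed, reference))
    denominator = sum(abs(r) for r in reference)
    return numerator / denominator


def toy_model(params):
    """Hypothetical stand-in for the 1D model; returns a short output series."""
    return [params["growth"] * day / (1.0 + params["mortality"] * day)
            for day in range(1, 6)]


def rank_parameters(model, nominal):
    """Rank parameters by the largest response to any single perturbation."""
    reference = model(nominal)
    scores = {}
    for name in nominal:
        responses = []
        for factor in perturbation_factors():
            trial = dict(nominal)          # perturb one parameter only
            trial[name] = nominal[name] * factor
            responses.append(normalized_abs_difference(model(trial), reference))
        scores[name] = max(responses)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_parameters(toy_model, {"growth": 1.0, "mortality": 0.1}))
```

The factor list includes 1.0, which reproduces the unperturbed run and contributes a zero difference.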

2.3 Observations

In this study, physical and biological observations are jointly assimilated to constrain the coupled model. The observations assimilated include sea surface height (SSH), sea surface temperature (SST), Argo TS profiles, and satellite estimates of surface chlorophyll.

The SSH observations for assimilation are obtained by adding the 1/4° mapped sea level anomaly (SLA) from Archiving, Validation and Interpretation of Satellite Oceanographic Data (AVISO) to a mean dynamic topography (MDT) from Rio et al. (2013). They are then adjusted by removing the spatially averaged mismatch between the assimilated (observed) and forecasted SSH to account for differences in reference time between the SLA data (1993–2012) and our coupled model (2015) (Haines et al., 2011; Song et al., 2016b; Xu et al., 2012). This is equivalent to assimilating the SSH gradient into the model, as it is the only dynamically meaningful quantity for driving the geostrophic component of ocean currents and adjusting subsurface thermohaline structures. The SST observations are the Advanced Very High Resolution Radiometer (AVHRR; Martin et al., 2012) product with a horizontal resolution of 0.01°. Observation errors are specified as 0.02 m for SSH and 0.3 °C for SST (Song et al., 2016b; Yu et al., 2018, 2019).
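The offset removal applied to SSH can be sketched in a few lines. The function name and the assumption that observed and forecasted values are already co-located, equally weighted pairs are illustrative simplifications.

```python
def adjust_ssh_observations(observed_ssh, forecast_ssh):
    """Remove the spatially averaged observation-minus-forecast offset.

    Both inputs are assumed to be co-located values (in metres) at the
    same points; only the spatial pattern of the observations is retained.
    """
    offsets = [o - f for o, f in zip(observed_ssh, forecast_ssh)]
    mean_offset = sum(offsets) / len(offsets)
    return [o - mean_offset for o in observed_ssh]


if __name__ == "__main__":
    observed = [0.50, 0.70, 0.90]   # toy values
    forecast = [0.10, 0.30, 0.50]
    print(adjust_ssh_observations(observed, forecast))
```

Because a constant is subtracted everywhere, differences between neighbouring points (and hence SSH gradients) are unchanged, which is why the adjustment amounts to assimilating the gradient.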

The surface chlorophyll is provided by the Ocean-Colour Climate Change Initiative project (OC-CCI; Sathyendranath et al., 2018) at a daily frequency with a spatial resolution of 1/24°. However, a large portion of the daily chlorophyll field can be missing due to cloud cover and inter-orbit gaps. In 2015 for the Gulf of Mexico, the spatial coverage of surface chlorophyll varies from 0 % to 63 % with a mean coverage of 9.5 ± 9.0 %. Hence, to increase the availability of observations, an asynchronous data assimilation method (Sakov et al., 2010) is applied so that not only the daily records of surface chlorophyll on the date of the update but also the daily records within the preceding 7 d are assimilated. Errors associated with the surface chlorophyll are set to 35 % of the measured concentrations, a value commonly used in previous applications (e.g., Fontana et al., 2013; Ford, 2021; Ford and Barciela, 2017; Hu et al., 2012; Mattern et al., 2017; Santana-Falcón et al., 2020; Song et al., 2016b; Yu et al., 2018). In this study, the update is performed on actual chlorophyll concentrations because our prior tests showed that this outperforms assimilating log-chlorophyll in the open gulf (with depth >1000 m). There are previous examples in which the actual chlorophyll values have been assimilated successfully (e.g., Hu et al., 2012; Yu et al., 2018), although we note that assimilating the actual chlorophyll values is theoretically suboptimal because of their non-Gaussian distribution.
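The observation window and error assignment for surface chlorophyll can be sketched as below. The record layout (date, value) with `None` marking cloud or orbit gaps, the function name, and the reading of the 35 % error as a standard deviation are assumptions for illustration only.

```python
from datetime import date, timedelta


def select_chlorophyll_obs(records, update_day, window_days=7, relative_error=0.35):
    """Keep valid records from the update day and the preceding `window_days`.

    records : iterable of (day, chlorophyll) pairs; None marks missing data
    Returns (day, chlorophyll, error) triples, with error = 35 % of the value.
    """
    earliest = update_day - timedelta(days=window_days)
    selected = []
    for day, value in records:
        if value is None:
            continue  # cloud cover or inter-orbit gap
        if earliest <= day <= update_day:
            selected.append((day, value, relative_error * value))
    return selected


if __name__ == "__main__":
    toy_records = [
        (date(2015, 1, 1), 0.20),
        (date(2015, 1, 5), None),
        (date(2015, 1, 8), 0.10),
        (date(2015, 1, 9), 0.30),   # after the update day, not used
    ]
    for obs in select_chlorophyll_obs(toy_records, date(2015, 1, 8)):
        print(obs)
```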

Profiling observations are from the International Argo project (hereafter referred to as Argo floats) and five BGC-Argo floats, which were funded by the Bureau of Ocean Energy Management (hereafter referred to as BOEM floats). In 2015, the Argo floats provided nearly 800 TS profiles extending from the surface to a depth of 2000 m in the Gulf of Mexico. These are treated either as independent observations for model skill assessment or, in the DAargo experiment (see Sect. 2.4), assimilated with uncertainties of 0.3 °C for temperature and 0.01 for salinity. The BOEM floats collected more than 500 profiles of temperature, salinity, chlorophyll, and backscatter at a biweekly frequency from 2011 to 2015, 114 of which were collected in 2015 (see Fig. 1 for their locations) and are used as independent observations. Backscatter is converted into phytoplankton and particulate organic carbon (POC) concentrations following Wang et al. (2020). In the absence of direct measurements of nitrate, we estimate it along the BOEM float trajectories based on its climatological relationship with temperature (Fig. S1).

2.4 Simulation strategy

We performed five 1-year simulations for 2015. The first is a deterministic model simulation without data assimilation (henceforth referred to as the Free simulation). The second is an ensemble run assimilating satellite data only (SSH, SST, and satellite surface chlorophyll; henceforth DAsat), and the third is an ensemble run assimilating Argo TS profiles in addition to satellite data (henceforth DAargo). The light attenuation calculation used in these three simulations (Att = 0.04 + 0.025 × Chl) is from the literature (e.g., Fennel et al., 2006, 2011); with it, the light attenuation coefficient, Att, is strongly determined by water depth and not very sensitive to chlorophyll concentrations. The Free and DAsat runs are repeated using an alternative light parameterization (henceforth referred to as the Free-alt and DAsat-alt simulations, respectively) to evaluate its effect on the data assimilation impact on subsurface biological properties. This alternative light parameterization (Att = 0.027 + 0.075 × Chl^1.2) is subjectively tuned based on the BGC-Argo observations and emphasizes the self-shading effect of chlorophyll on light attenuation.
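The difference between the two parameterizations is easiest to see by evaluating them over a range of chlorophyll concentrations. In the sketch below the coefficients are those quoted above; the function names, the assumed units (chlorophyll in mg m−3, Att in m−1), and the uniform-layer exponential decay in `light_fraction` are illustrative assumptions rather than a description of the model's optical module.

```python
from math import exp


def attenuation_original(chl):
    """Att = 0.04 + 0.025 * Chl (Free, DAsat, and DAargo runs)."""
    return 0.04 + 0.025 * chl


def attenuation_alternative(chl):
    """Att = 0.027 + 0.075 * Chl**1.2 (Free-alt and DAsat-alt runs)."""
    return 0.027 + 0.075 * chl ** 1.2


def light_fraction(att, depth_m):
    """Fraction of surface light left at depth, assuming a uniform layer."""
    return exp(-att * depth_m)


if __name__ == "__main__":
    for chl in (0.05, 0.1, 0.5, 1.0):
        a_orig = attenuation_original(chl)
        a_alt = attenuation_alternative(chl)
        print(f"Chl={chl:4.2f}  original={a_orig:.4f}  alternative={a_alt:.4f}  "
              f"light at 100 m: {light_fraction(a_orig, 100):.4f} vs "
              f"{light_fraction(a_alt, 100):.4f}")
```

At low chlorophyll the alternative formula attenuates less than the original one, and at high chlorophyll it attenuates more, so its response to changes in chlorophyll is considerably stronger.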

A two-step update is used on a weekly data assimilation cycle in the assimilative experiments: the physical observations are assimilated first to update both physical and biological state variables through the multivariate covariance, and the chlorophyll observations are assimilated next to update only biological state variables. Although the DEnKF can update all state variables based on their cross-covariance, we limit updates to two physical variables (temperature and salinity) and four biological variables (nitrate, chlorophyll, phytoplankton, and zooplankton) that are key to the coupled physical–biogeochemical system. As the circulation features in the open gulf (the Loop Current and its associated mesoscale eddies) are primarily in geostrophic balance, an update of temperature and salinity can effectively improve three-dimensional circulation features at large scales, as shown in the twin experiments in Yu et al. (2019). All these state variables are updated throughout the whole water column, while other variables are adjusted by internal model dynamics.

To evaluate the prediction skill, we calculate the root mean square error (RMSE), the bias, and the correlation coefficient (Corr) of the model forecast (M) with respect to assimilated and independent observations (O):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(M_{i} - O_{i}\right)^{2}}, \tag{4}$$

$$\mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}\left(M_{i} - O_{i}\right), \tag{5}$$

where N represents the number of model–data pairs available. To account for the overestimation of nitrate in warm waters, which typically occurs in the euphotic zone (Fig. S1), an unbiased root mean square error (unbiased RMSE) is used to quantify the model–data misfit of nitrate:

$$\mathrm{unbiased\ RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(M_{i} - O_{i} - \mathrm{bias}\right)^{2}}. \tag{6}$$
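These skill metrics are straightforward to compute from paired model and observation values. The sketch below is a minimal pure-Python version; the function names are hypothetical, the bias follows the model-minus-observation sign convention implied by Eq. (6), and the correlation is assumed to be the Pearson coefficient, which the text does not state explicitly.

```python
from math import sqrt


def bias(model, obs):
    """Mean model-minus-observation difference."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root mean square error of the model-data pairs."""
    return sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def unbiased_rmse(model, obs):
    """RMSE after removing the mean offset (Eq. 6)."""
    b = bias(model, obs)
    return sqrt(sum((m - o - b) ** 2 for m, o in zip(model, obs)) / len(obs))


def correlation(model, obs):
    """Pearson correlation coefficient (assumed definition of Corr)."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / sqrt(var_m * var_o)


if __name__ == "__main__":
    model_values = [1.0, 2.0, 3.0]   # toy values
    obs_values = [2.0, 3.0, 4.0]
    print(bias(model_values, obs_values), rmse(model_values, obs_values),
          unbiased_rmse(model_values, obs_values),
          correlation(model_values, obs_values))
```

The three error measures are linked by RMSE² = bias² + (unbiased RMSE)², so a model with a constant offset but the right pattern has a zero unbiased RMSE.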

Figure 2. Monthly averaged Loop Current and Loop Current eddies based on the 10 cm SSH contour from satellite data (black), the Free run (blue), the DAsat run (orange), and the DAargo run (yellow). The gray contours represent the isobaths of 200, 1000, and 3000 m.

Figure 3. Spatial map of root mean square error (RMSE) in the Free run (a, d) and its differences between the Free run and the two data-assimilative runs for SSH and SST (b, c, e, f). Positive values represent improvements, while negative values represent deteriorations by data assimilation. Gray contours represent the 300, 1000, and 3000 m isobaths.

3 Results

3.1 Assimilation impacts on physical properties

As the biological model provides no feedback to the physical model, the alternative light parameterization does not affect physical properties. The physical results from Free-alt and DAsat-alt runs are thus not displayed in this section.

The dominant circulation features in the Gulf of Mexico, the Loop Current and Loop Current eddies, are assessed by comparing their fronts, defined here as the 10 cm SSH contour, from satellite data, the Free run, and the two data-assimilative runs (i.e., the DAsat and DAargo runs). In the first 2 months, all model estimates of the Loop Current differ from satellite observations due to the influence of initial conditions (Fig. 2). After March, the SSH fields of the two assimilative runs show a northward and westward extension of the Loop Current intrusion similar to that in the satellite observations, but large deviations from observations remain in the Free run. In addition, all estimates except for the Free run reproduce the satellite-observed timing of eddy shedding well, as well as the size, shape, and position of Loop Current eddies.

For a more quantitative assessment, the daily output of SSH and SST fields from the three runs is compared with the satellite estimates. The spatial distribution of RMSE from the Free run and the RMSE changes in the two data-assimilative runs are shown in Fig. 3. In the regions influenced by the Loop Current and Loop Current eddies, this figure shows high RMSE for SSH in the Free run (Fig. 3a) and large RMSE reductions in the two data-assimilative runs (Fig. 3b–c). In contrast, the reductions in SST RMSEs are more spatially homogeneous. A summary of the overall RMSE, the bias, and the correlation coefficient (Corr) for physical variables from the Free run and the two data-assimilative runs is shown in Table 1. In general, both data-assimilative runs significantly improved SSH and SST, with reduced RMSEs and increased correlation coefficients. Although the two data-assimilative runs tend to underestimate the satellite observations of SST, the bias (0.06 °C) is relatively small.

The correction of mesoscale features by data assimilation was not limited to the surface but extended to the subsurface and even deep waters. Specifically, the two assimilative runs corrected the position, the amplitude, and the polarity of mesoscale eddies and hence better represented the elevated and depressed thermoclines within these eddies (Fig. 4). The most noticeable improvement (by 60 %–61 %) was found along the trajectory of float 287, which captured a newly detached Loop Current eddy with features of high SSH and depressed thermoclines during July and October. In addition, assimilation of Argo TS profiles in the DAargo run led to slight further improvements in the subsurface temperature distributions when compared with the DAsat run. For instance, although the DAsat run greatly improved subsurface temperature distributions along the trajectory of float 285, an underestimation of temperature at a depth of about 200 m remains within the peak of the anticyclonic eddy. Corrections imposed by assimilating Argo profiles increased temperature here and decreased the bias relative to observations. These small, localized further improvements are also seen along other float trajectories, e.g., in July–October for float 289 and in February for float 290.

In general, assimilating the satellite data in the DAsat run resulted in large reductions in RMSEs of 3D temperature (by 46 %–48 %; Table 1) and salinity (by 36 %–39 %; Table 1) with respect to Argo floats and BOEM floats (Fig. 5). The reductions extend to depths of more than 1000 m for temperature and about 800 m for salinity. It should be noted again that data from both Argo and BOEM floats are independent in the DAsat run. Although assimilating the Argo profiles in the DAargo run yields only marginal further improvements in RMSEs of temperature (about 3 %) and salinity (about 5 %), it notably reduces the overestimation of temperature that occurs below the surface in the DAsat run (Table 1).

Table 1. The root mean square error (RMSE), bias, and correlation coefficient (Corr) for SSH and SST, as well as vertical profiles of temperature and salinity from Argo and BOEM floats. Percentages in the parentheses represent the relative reductions in RMSE values. Since the spatial and temporal average of mismatch between the modeled and observed SSH is removed, the bias of SSH is not shown here.


Figure 4. Vertical distributions of temperature from BOEM floats, the Free run, the DAsat run, and the DAargo run. Gray lines represent isotherms with an interval of 2 °C. Thick black lines represent SSH. The observed SSH is obtained from the matching record of altimeter observations.

Figure 5. Vertical profiles of root mean square error (RMSE) for temperature and salinity with respect to Argo and BOEM floats.


3.2 Assimilation impacts on biological properties

Assimilating satellite observations in the DAsat run reduced RMSEs of surface chlorophyll almost everywhere, with only 3 % of the model domain experiencing degradation (Fig. 6b). Although large reductions in RMSE were achieved in the coastal regions, e.g., in the northern Gulf of Mexico, on Campeche Bank, and in Campeche Bay, the simulated chlorophyll concentrations remained much lower than the satellite estimates because of high observational uncertainties and a large background misfit in the Free run (Fig. 6a). This was expected because the biological model was optimized for the open gulf (Wang et al., 2020). Table 2 shows the RMSE, the bias, and the correlation coefficient for biological variables from the Free run and the data-assimilative runs. A relative reduction in RMSE equal to or exceeding 10 % is considered a significant improvement. In the open gulf, delimited by the 1000 m isobath, the overall RMSE of surface chlorophyll was reduced by 19 % from 0.13 mg m−3 in the Free run to 0.11 mg m−3 in the DAsat run (Table 2). In addition, the correlation coefficient increased from 0.52 to 0.68. Assimilating Argo TS profiles in the DAargo run led to smaller reductions in the overall RMSEs of surface chlorophyll (Table 2) and to more widespread degradation (Fig. 6c).

Figure 6. The same as Fig. 3 but for surface chlorophyll.

To evaluate the impacts of data assimilation on subsurface biological properties, the temporal evolution of nitrate in the different model experiments is shown in Fig. 7 in comparison to nitrate estimated from its climatological relationship with temperature. The temperature-based nitrate tends to be overestimated in the upper layers (Fig. S1). Because of its high correlation with temperature, the nitrate distribution was modulated in the two assimilative runs along with the improvement in temperature fields. For instance, the two assimilative runs reproduce the Loop Current eddy observed by float 287 and hence capture the depressed thermoclines that are not present in the Free run (Fig. 4). At the same time, the nitraclines are also depressed and the nitrate concentrations become lower within this Loop Current eddy (Fig. 7). As a result, the unbiased RMSE of nitrate along the trajectory of this float is reduced by 40 % in the DAsat run and 38 % in the DAargo run. These depressed (uplifted) nitraclines due to the increase (decrease) in SSH by data assimilation can also be observed elsewhere, e.g., in August for float 285, in April–July for float 286, in January–April for float 287, and in August–October for float 290, although the amplitude of these mesoscale eddies is smaller. In general, data assimilation improved the overall agreement of subsurface nitrate, with increased correlation coefficients and RMSEs decreased by 28 % and 30 % in the DAsat and DAargo runs, respectively, relative to the Free run (Table 2).

Figure 7. Vertical distributions of nitrate, which are estimated based on its climatological relationship with temperature and modeled by different experiments, superimposed with the SSH (thick black lines).

Figure 8. Same as Fig. 4 but for chlorophyll. Gray contours represent the simulated isolumes, and red lines represent the depth of the deep chlorophyll maximum. Thick black lines represent SSH.


The impacts of assimilation on subsurface chlorophyll are more complicated because of the high nonlinearity of the model with regard to chlorophyll. Although the mean vertical profiles of chlorophyll are well reproduced in all three experiments (Fig. S2), all failed to resolve the high spatiotemporal variability in subsurface chlorophyll, which is at least partly due to the presence of mesoscale eddies (Fig. 8). As a result, assimilation improved subsurface chlorophyll RMSEs only marginally, even in the Loop Current eddy sampled by float 287, for which the most noticeable improvements in temperature (about 60 %) and nitrate (about 40 %) RMSEs were obtained. Results for phytoplankton and POC are similar to those for chlorophyll, although the reductions in their RMSEs are larger because assimilating the satellite data reduces their biases, especially in the upper layer (Fig. S2, Table 2).

The model's inability to reproduce the spatiotemporal variability of subsurface chlorophyll is also reflected by the positions of the deep chlorophyll maximum (DCM, denoted by red lines in Fig. 8). As a ubiquitous phenomenon in oligotrophic regions, a distinct DCM is observed throughout the whole year in the open Gulf of Mexico, and its depth is inversely correlated with SSH (correlation coefficient −0.6). Although the mean position and magnitude of the DCM are well reproduced by the model with and without data assimilation (Fig. S2), the simulated DCM depth is much more stable and less sensitive to SSH variations. As a result, the reduction in the RMSE of DCM depth is limited to 18 % in the DAsat run, although this still counts as significant under the 10 % threshold (Table 2).
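As a hedged illustration of the diagnostic discussed above, the sketch below takes the DCM depth to be the depth of the chlorophyll maximum in each profile and correlates it with SSH. The helper name, the Gaussian profiles and the linear deepening with SSH are hypothetical choices made for the example; the study's exact DCM detection criterion is not given in this section.

```python
import numpy as np


def dcm_depth(depth, chl):
    """Depth (m, positive downward) of the chlorophyll maximum in one profile."""
    depth = np.asarray(depth, dtype=float)
    chl = np.asarray(chl, dtype=float)
    return depth[np.nanargmax(chl)]


# Synthetic profiles (not study data): a Gaussian chlorophyll peak that
# deepens as SSH rises, mimicking a depressed DCM inside an anticyclonic eddy.
depth = np.arange(0.0, 201.0, 5.0)           # m
ssh = np.array([0.1, 0.3, 0.5, 0.7])         # m, hypothetical values
peak = 60.0 + 100.0 * ssh                    # hypothetical peak depths: 70, 90, 110, 130 m
profiles = [np.exp(-((depth - p) / 20.0) ** 2) for p in peak]

dcm = np.array([dcm_depth(depth, c) for c in profiles])
r = np.corrcoef(ssh, dcm)[0, 1]
print(dcm, r)
```

Note that the sign of such a coefficient depends on the depth convention: with depth counted positive downward, a DCM that deepens under rising SSH gives a positive value, whereas a vertical coordinate that is negative downward gives a negative value of the same magnitude.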

Figure 9. Mean vertical profiles of nitrate, light intensity (photosynthetically active radiation, PAR), chlorophyll, and phytoplankton within the center of the newly detached Loop Current eddy from the Free run, the DAsat run, the Free-alt run, and the DAsat-alt run.


3.3 Sensitivity of subsurface chlorophyll to the light attenuation parameterization

Both with and without data assimilation, the alternative parameterization led to higher correlations between simulated SSH and DCM depth, with correlation coefficients of 0.60 in the Free-alt run and 0.67 in the DAsat-alt run. As a result, the alternative parameterization produces slightly lower RMSEs and a higher correlation coefficient for DCM depth (Table 2) and yields larger improvements in chlorophyll within the Loop Current eddy for float 287 (Fig. 8). To illustrate the underlying reasons, the mean vertical profiles of nitrate, photosynthetically active radiation (PAR) intensity, chlorophyll, and phytoplankton within the center of this Loop Current eddy are shown in Fig. 9. With the original parameterization, assimilating the satellite data deepens the DCM from 70 m in the Free run to 90 m in the DAsat run but with a considerable bias of 20 m when compared to the observations. Moreover, the chlorophyll is underestimated in the DAsat run, and as a result its RMSEs are barely improved. In contrast, in the DAsat-alt run the DCM depth is corrected to 120 m, in agreement with the observations, and the vertical chlorophyll distribution is represented more accurately, although the nitrate profile is almost the same as in the DAsat run. This is because the alternative parameterization accounts for the elevated PAR intensity that results from reduced chlorophyll concentrations in the upper layer, which in turn facilitates the synthesis of chlorophyll and hence corrects its concentrations toward the observations.

Figure 10. Histogram of increments in nitrate (mmol N m−3), chlorophyll (mg m−3), and DCM depth (m) obtained by assimilating physical and biological observations.


4 Discussion

We implemented a coupled data assimilation scheme for jointly assimilating physical and biological observations in a biogeochemical model and evaluated to what degree satellite observations can inform subsurface distributions, especially of biological properties. The degree to which the data assimilation impact can depend on model calibration was tested by using an alternative light parameterization. Although biological data assimilation has received much attention in recent years, observations that are assimilated and used in skill assessment are typically limited to the surface ocean. The increasing availability of BGC-Argo data now makes it possible to validate and improve model performance below the surface (Cossarini et al., 2019; Salon et al., 2019; Terzić et al., 2019; Wang et al., 2020), but so far these observations have been too sparse for sequential assimilation in three dimensions; hence, relevant applications are limited to idealized twin experiments (Ford, 2021; Yu et al., 2018) and a few specific regions with high float densities, e.g., the Mediterranean Sea (Cossarini et al., 2019). In addition, since a biogeochemical model is coupled to a physical model, assimilating physical observations theoretically should confer improvements on the biological model by correcting the circulation (e.g., Fiechter et al., 2011; Raghukumar et al., 2015; Song et al., 2016a, b) and potentially by providing additional constraints via multivariate updates to biological variables (e.g., Goodliff et al., 2019; Yu et al., 2018). This is particularly important when the physical model is biased (Yu et al., 2018).

Our study shows that assimilating satellite data (DAsat run) can constrain the main circulation features in the Gulf of Mexico, i.e., the Loop Current and its associated mesoscale eddies. Temperature and salinity are also improved down to a depth of about 1000 m because of the correction of mesoscale eddies. When calculating the reductions in RMSE for SSH and each single profile of temperature and salinity, we find that the improvement in SSH is highly correlated with those in temperature (correlation coefficient 0.96) and salinity (correlation coefficient 0.92, Fig. S3). Assimilating the satellite data also improves subsurface nitrate because it is tightly correlated with the density structure expressed by SSH and temperature profiles. However, improvements in temperature and nitrate do not necessarily yield better simulations of chlorophyll or phytoplankton because they tend to be light-limited below the surface. In our biogeochemical model, the light intensity is attenuated by water and chlorophyll and is not directly updated by the data assimilation scheme but only adjusted indirectly through changes in chlorophyll during forecast steps. This, in turn, impacts the synthesis of chlorophyll and growth of phytoplankton. However, in the original parameterization, light attenuation is mainly controlled by water depth and is much less sensitive to chlorophyll concentrations than it appears to be in reality. By applying an alternative light parameterization with more pronounced self-shading by chlorophyll, the subsurface chlorophyll and phytoplankton distributions are further improved after assimilating the satellite data. These results show that the biological variables can be improved through the model's dynamical response to data assimilation. However, the efficiency of this mechanism depends on the accuracy of the biological model. That is why data assimilation generally benefits from a well-calibrated model. For example, the usage of suboptimal biological parameters can yield a substantial degradation of data assimilation efficiency, especially with respect to unobserved variables (Song et al., 2016a). Although BGC-Argo profiles have so far been insufficient for sequential assimilation, they can provide substantial benefits for biogeochemical prediction by enabling a priori model tuning, e.g., of biological parameter values (Wang et al., 2020) and the key parameterization schemes (Terzić et al., 2019).
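The role of chlorophyll self-shading described above can be sketched with a generic Beer-Lambert attenuation, in which the local attenuation coefficient is the sum of a water term and a chlorophyll-proportional term. This is an illustrative assumption rather than the model's actual optical module, and the function name, coefficient values and uniform chlorophyll profiles are all hypothetical.

```python
import numpy as np


def par_profile(depth, chl, par0, k_water, k_chl):
    """PAR at each depth for local attenuation k_water + k_chl * chl (1/m)."""
    depth = np.asarray(depth, dtype=float)
    chl = np.asarray(chl, dtype=float)
    k = k_water + k_chl * chl
    # Trapezoidal integral of the attenuation from the surface to each depth.
    layer = 0.5 * (k[1:] + k[:-1]) * np.diff(depth)
    optical_depth = np.concatenate(([0.0], np.cumsum(layer)))
    return par0 * np.exp(-optical_depth)


# Idealized case (not study data): chlorophyll is lowered from 0.20 to
# 0.05 mg m-3 throughout the water column, e.g., after an update.
depth = np.arange(0.0, 151.0, 1.0)
chl_before = np.full_like(depth, 0.20)
chl_after = np.full_like(depth, 0.05)


def par_gain_at_100m(k_water, k_chl):
    """Factor by which PAR at 100 m rises when chlorophyll is reduced."""
    before = par_profile(depth, chl_before, 1.0, k_water, k_chl)
    after = par_profile(depth, chl_after, 1.0, k_water, k_chl)
    return after[100] / before[100]


# Hypothetical coefficients: water-dominated versus stronger self-shading.
gain_weak = par_gain_at_100m(k_water=0.04, k_chl=0.005)
gain_strong = par_gain_at_100m(k_water=0.03, k_chl=0.05)
print(gain_weak, gain_strong)
```

With a water-dominated attenuation, the light field at depth barely responds to the change in chlorophyll, whereas the stronger self-shading term lets the same chlorophyll change raise subsurface PAR substantially, which is the feedback the alternative parameterization is meant to capture.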

Table 2. The root mean square error (RMSE), bias, and correlation coefficient (Corr) for surface chlorophyll in the open gulf, along with vertical profiles of NO3, chlorophyll, phytoplankton, and POC, as well as the depth of the deep chlorophyll maximum with respect to observations from BOEM floats. Percentages in the parentheses represent the relative reductions in RMSE values. Only a reduction in RMSE larger than or equal to 10 % is considered a significant improvement. The NO3 is estimated based on its climatological relationship with temperature. Since the estimated NO3 tends to be overestimated in warm regions, the unbiased RMSE of NO3 is reported and the bias is not shown here.


In addition to the model's dynamical response, the biological fields can be directly updated by physical and biological observations through multivariate covariances. To distinguish their influence, we show the increments obtained from assimilating each observation type in the DAsat run (Fig. 10). The increment of DCM depth is defined analogously to other state variables as changes due to the update. As shown in Fig. 10a and b, assimilating physical observations has a much stronger impact than biological observations on nitrate, and therefore we conclude that the improvement of nitrate in this study is mainly obtained from assimilating physical observations. This is consistent with previous studies (e.g., Ciavatta et al., 2018; Skákala et al., 2018; Teruzzi et al., 2018) wherein assimilating surface chlorophyll had little impact on nitrate and even degraded it in both variational and sequential data assimilation. In variational data assimilation, it is hard to define the background errors accurately (Mattern et al., 2017; Teruzzi et al., 2018), and the biological model can fit itself to observed chlorophyll through many different pathways, e.g., direct changes in biomass or an indirect way through nitrate. However, observations are often insufficient to provide this information (Mattern et al., 2017). In sequential data assimilation, the multivariate covariance between surface chlorophyll and subsurface nitrate can be considered, but typically this covariance is not linear or constant. For instance, Fontana et al. (2013) assimilated satellite surface chlorophyll into a biological model in the North Atlantic and found that subsurface nitrate was barely influenced because it was weakly correlated with surface chlorophyll, leading the authors to suggest that it is impossible to fully constrain a 3D biogeochemical model by only assimilating the surface chlorophyll. This issue remains when assimilating the surface chlorophyll to update other biological variables (Yu et al., 2018), e.g., phytoplankton functional groups (Ciavatta et al., 2018).
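How a surface observation reaches unobserved variables through the ensemble covariance can be sketched for a single scalar observation. The example follows the deterministic EnKF idea of Sakov and Oke (2008), in which the ensemble mean is updated with the full Kalman gain and the anomalies with half of it. It omits localization, inflation and any variable transformation, and the three-variable state, the function name and all numbers are hypothetical; it is not the implementation used in the study.

```python
import numpy as np


def denkf_update_scalar_obs(ens, obs_index, obs_value, obs_var):
    """DEnKF analysis for one observation of state element obs_index.

    ens has shape (n_state, n_members). Returns the analysis ensemble and
    the increment applied to the ensemble mean.
    """
    ens = np.asarray(ens, dtype=float)
    n_members = ens.shape[1]
    mean = ens.mean(axis=1)
    anom = ens - mean[:, None]
    hx_anom = anom[obs_index]                       # anomalies of the observed variable
    cov_x_hx = anom @ hx_anom / (n_members - 1)     # covariance of every variable with it
    var_hx = hx_anom @ hx_anom / (n_members - 1)
    gain = cov_x_hx / (var_hx + obs_var)            # Kalman gain (one column)
    increment = gain * (obs_value - mean[obs_index])
    anom_a = anom - 0.5 * np.outer(gain, hx_anom)   # DEnKF: half gain on anomalies
    return (mean + increment)[:, None] + anom_a, increment


# Hypothetical 3-variable state with 5 members (not study data):
# row 0 is the observed variable, row 1 is strongly anticorrelated with it,
# and row 2 has zero ensemble covariance with it.
ens = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [10.0, 8.0, 6.0, 4.0, 2.0],
    [1.0, -1.0, 0.0, -1.0, 1.0],
])
analysis, increment = denkf_update_scalar_obs(ens, obs_index=0, obs_value=4.0, obs_var=2.5)
print(increment)
```

In this toy case the variable with no ensemble covariance to the observed one receives no increment at all, which mirrors the point that a weak surface-chlorophyll to subsurface-nitrate covariance leaves nitrate essentially unconstrained by chlorophyll observations.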

In contrast to nitrate, the physical and biological satellite observations have a comparable influence on subsurface chlorophyll (Fig. 10c–f). Specifically, both can change chlorophyll concentrations even below a depth of 100 m and alter the vertical structure of chlorophyll by adjusting the DCM depth; e.g., the DCM depth changes by more than ± 20 m in 10 % and 5 % of profiles due to the updates from physical and biological observations, respectively. Because BGC-Argo profiles are currently sparse, i.e., only 14 profiles are available at all update steps, it is hard to draw definitive conclusions about these impacts on chlorophyll and DCM depth.

Assimilating Argo TS profiles in the DAargo run yields slight further improvements with respect to independent profiles of temperature and salinity, similar to the twin experiments in Yu et al. (2019). To diagnose this, we calculate the root mean square difference (RMSD) of temperature between the two data-assimilative runs with respect to each profile from the BOEM floats. In general, the RMSD between the two data-assimilative runs decreases with distance to the nearest Argo profiles that have been assimilated recently but shows no significant decreasing trend with the number of days after the update (Fig. S4). This means that the differences induced by assimilating Argo profiles are sustained locally by model dynamical adjustments. The overall similarities between the two data-assimilative runs (i.e., DAsat and DAargo runs) in Fig. 4 can be explained to some extent by the large distances between BOEM and Argo profiles. However, this does not mean that increasing the localization radius would necessarily improve the data assimilation performance. We note that the current localization radius was determined in Yu et al. (2019). The additional benefits for physical properties obtained by assimilating Argo TS profiles are also translated into the simulation of subsurface nitrate but not into other biological fields, i.e., chlorophyll, phytoplankton, and POC. Moreover, assimilating the Argo TS profiles can even degrade surface chlorophyll because of spurious correlations. This issue has also been reported in a recent study (Goodliff et al., 2019) that assimilated sea surface temperature to update both physical and biological variables; there, the issue was alleviated by muting the multivariate update of phytoplankton, zooplankton, and detritus.

In general, coupled data assimilation of both physical and biological satellite observations can improve subsurface biological properties because it benefits from the high correlations of some biological distributions, especially nutrients, with the vertical density structure and because of the dynamical responses to improvements in circulation in the forecast step. However, this is preconditioned on the coupled model being well calibrated a priori. Therefore, this study provides an intermediate step toward 3D updates of biological properties before the BGC-Argo profiles ultimately become more abundant.

5 Conclusions

In this study, a coupled data assimilation scheme for both physical and biological satellite observations was implemented to investigate whether these observations can inform subsurface distributions. In addition, Argo TS profiles were assimilated to assess their impact beyond satellite observations. The multivariate update was applied by using the covariance structure between physical and biological variables. The Gulf of Mexico was selected as the study region because the dominant physical features, the Loop Current and its associated mesoscale eddies, are stochastic and can substantially influence the biological properties in three dimensions. Our results show that assimilating satellite data leads to significant improvements in the simulation of SSH and SST and also projects these improvements from the surface to a depth of about 1000 m for temperature and salinity, as shown by an assessment of the independent BGC-Argo profiles. With respect to biological fields, the subsurface nitrate distribution benefits greatly from the tight correlation with density and the improved fidelity of mesoscale features. However, initially there were only slight improvements in other biological variables below the surface, i.e., chlorophyll, phytoplankton, and POC, because a suboptimal light parameterization did not react to the changed chlorophyll concentrations appropriately and failed to provide accurate feedbacks on the synthesis of chlorophyll and growth of phytoplankton. We tested an alternative light parameterization with a larger relative contribution from chlorophyll to light attenuation. As a result, the subsurface chlorophyll and phytoplankton were further improved. This highlights the importance of a priori tuning to achieve better assimilation performance. 
Finally, assimilating the Argo TS profiles on top of satellite observations yields slight further improvements with respect to independent vertical profiles of temperature and salinity, which also translated into improvements in subsurface nitrate.

Code and data availability

The ROMS code can be accessed at, last access: 16 June 2016 (Haidvogel et al., 2008). HYCOM data can be downloaded at, last access: 16 August 2018 (Chassignet et al., 2005). Profiling data from the BGC-Argo floats are available at the National Oceanographic Data Center (NOAA): (Hamilton and Leidos, 2017).


The supplement related to this article is available online at:

Author contributions

BW and KF conceived the study. BW carried out data assimilation experiments and analyses. LY provided data assimilation techniques. BW and KF discussed the results and wrote the paper with contributions from LY.

Competing interests

The authors declare that they have no conflict of interest.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Special issue statement

This article is part of the special issue “Biogeochemistry in the BGC-Argo era: from process studies to ecosystem forecasts (BG/OS inter-journal SI)”. It is not associated with a conference.

Financial support

This research has been supported by the Gulf of Mexico Research Initiative (grant no. GoMRI-V-487).

Review statement

This paper was edited by Paolo Lazzari and reviewed by four anonymous referees.


Anderson, J. L. and Anderson, S. L.: A Monte Carlo Implementation of the Nonlinear Filtering Problem to Produce Ensemble Assimilations and Forecasts, Mon. Weather Rev., 127, 2741–2758, 1999.

Anderson, L. A., Robinson, A. R., and Lozano, C. J.: Physical and biological modeling in the Gulf Stream region: I. Data assimilation methodology, Deep-Sea Res. Pt. I, 47, 1787–1827, 2000.

Biogeochemical-Argo Planning Group: The scientific rationale, design and implementation plan for a Biogeochemical-Argo float array, Report, 2016.

Chai, F., Johnson, K. S., Claustre, H., Xing, X., Wang, Y., Boss, E., Riser, S., Fennel, K., Schofield, O., and Sutton, A.: Monitoring ocean biogeochemistry with autonomous platforms, Nat. Rev. Earth Environ., 1, 315–326, 2020.

Chassignet, E., Hurlburt, H., Smedstad, O., Barron, C., Ko, D., Rhodes, R., Shriver, J., Wallcraft, A., and Arnone, R.: Assessment of Data Assimilative Ocean Models in the Gulf of Mexico Using Ocean Color, Geophys. Monogr. Ser., 161, 87–100, 2005.

Ciavatta, S., Torres, R., Martinez-vicente, V., Smyth, T., Olmo, G. D., Polimene, L., and Allen, J. I.: Assimilation of remotely-sensed optical properties to improve marine biogeochemistry modelling, Prog. Oceanogr., 127, 74–95, 2014.

Ciavatta, S., Brewin, R. J. W., Skakala, J., Polimene, L., de Mora, L., Artioli, Y., and Allen, J. I.: Assimilation of Ocean-Color Plankton Functional Types to Improve Marine Ecosystem Simulations, J. Geophys. Res.-Ocean., 123, 834–854, 2018.

Ciavatta, S., Kay, S., Brewin, R. J. W., Cox, R., Di Cicco, A., Nencioli, F., Polimene, L., Sammartino, M., Santoleri, R., Skákala, J., and Tsapakis, M.: Ecoregions in the Mediterranean Sea Through the Reanalysis of Phytoplankton Functional Types and Carbon Fluxes, J. Geophys. Res.-Ocean., 124, 6737–6759, 2019.

Cossarini, G., Mariotti, L., Feudale, L., Mignot, A., Salon, S., Taillandier, V., Teruzzi, A., and D'Ortenzio, F.: Towards operational 3D-Var assimilation of chlorophyll Biogeochemical-Argo float data into a biogeochemical model of the Mediterranean Sea, Ocean Model., 133, 112–128, 2019.

Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A., Balsamo, G., Bauer, P., Bechtold, P., Beljaars, A. C. M., van de Berg, L., Bidlot, J., Bormann, N., Delsol, C., Dragani, R., Fuentes, M., Geer, A. J., Haimberger, L., Healy, S. B., Hersbach, H., Hólm, E. V., Isaksen, L., Kållberg, P., Köhler, M., Matricardi, M., McNally, A. P., Monge-Sanz, B. M., Morcrette, J.-J., Park, B.-K., Peubey, C., de Rosnay, P., Tavolato, C., Thépaut, J.-N., and Vitart, F.: The ERA-Interim reanalysis: configuration and performance of the data assimilation system, Q. J. Roy. Meteor. Soc., 137, 553–597, 2011.

Doney, S. C.: Major challenges confronting marine biogeochemical modeling, Global Biogeochem. Cy., 13, 705–714, 1999.

Doney, S. C., Lindsay, K., Caldeira, K., Campin, J.-M., Drange, H., Dutay, J.-C., Follows, M., Gao, Y., Gnanadesikan, A., Gruber, N., Ishida, A., Joos, F., Madec, G., Maier-Reimer, E., Marshall, J. C., Matear, R. J., Monfray, P., Mouchet, A., Najjar, R., Orr, J. C., Plattner, G.-K., Sarmiento, J., Schlitzer, R., Slater, R., Totterdell, I. J., Weirig, M.-F., Yamanaka, Y., and Yool, A.: Evaluating global ocean carbon models: The importance of realistic physics, Global Biogeochem. Cy., 18, 2004.

Edwards, C. A., Moore, A. M., Hoteit, I., and Cornuelle, B. D.: Regional Ocean Data Assimilation, Ann. Rev. Mar. Sci., 7, 21–42, 2015.

Evensen, G.: The Ensemble Kalman Filter: theoretical formulation and practical implementation, Ocean Dynam., 53, 343–367, 2003.

Fennel, K., Wilkin, J., Levin, J., Moisan, J., Reilly, J. O., and Haidvogel, D.: Nitrogen cycling in the Middle Atlantic Bight: Results from a three-dimensional model and implications for the North Atlantic nitrogen budget, Global Biogeochem. Cy., 20, 1–14, 2006.

Fennel, K., Hetland, R., Feng, Y., and Dimarco, S.: A coupled physical-biological model of the Northern Gulf of Mexico shelf: Model description, validation and analysis of phytoplankton variability, Biogeosciences, 8, 1881–1899, 2011.

Fennel, K., Gehlen, M., Brasseur, P., Brown, C. W., Ciavatta, S., Cossarini, G., Crise, A., Edwards, C. A., Ford, D., Friedrichs, M. A. M., Gregoire, M., Jones, E., Kim, H.-C., Lamouroux, J., Murtugudde, R., Perruche, C., the GODAE OceanView Marine Ecosystem Analysis, and Prediction Task Team: Advancing Marine Biogeochemical and Ecosystem Reanalyses and Forecasts as Tools for Monitoring and Managing Ecosystem Health, Front. Mar. Sci., 6, 1–9, 2019.

Fiechter, J., Broquet, G., Moore, A. M., and Arango, H. G.: A data assimilative, coupled physical–biological model for the Coastal Gulf of Alaska, Dynam. Atmos. Ocean., 52, 95–118, 2011.

Fommervault, O. P. D., Perez-brunius, P., Damien, P., Camacho-ibar, V. F., and Sheinbaum, J.: Temporal variability of chlorophyll distribution in the Gulf of Mexico: bio-optical data from profiling floats, Biogeosciences, 14, 5647–5662, 2017.

Fontana, C., Brasseur, P., and Brankart, J. M.: Toward a multivariate reanalysis of the North Atlantic Ocean biogeochemistry during 1998–2006 based on the assimilation of SeaWiFS chlorophyll data, Ocean Sci., 9, 37–56, 2013.

Ford, D.: Assimilating synthetic Biogeochemical-Argo and ocean colour observations into a global ocean model to inform observing system design, Biogeosciences, 18, 509–534, 2021.

Ford, D. and Barciela, R.: Global marine biogeochemical reanalyses assimilating two different sets of merged ocean colour products, Remote Sens. Environ., 203, 40–54, 2017.

Geider, R. J., MacIntyre, H. L., and Kana, T. M.: Dynamic model of phytoplankton growth and acclimation: responses of the balanced growth rate and the chlorophyll a: carbon ratio to light, nutrient-limitation and temperature, Mar. Ecol. Prog. Ser., 148, 187–200, 1997.

Goodliff, M., Bruening, T., Schwichtenberg, F., Li, X., Lindenthal, A., Lorkowski, I., and Nerger, L.: Temperature assimilation into a coastal ocean-biogeochemical model: assessment of weakly and strongly coupled data assimilation, Ocean Dynam., 69, 1217–1237, 2019.

Greatbatch, R. J., Sheng, J., Eden, C., Tang, L., Zhai, X., and Zhao, J.: The semi-prognostic method, Cont. Shelf Res., 24, 2149–2165, 2004.

Gregg, W. W.: Assimilation of SeaWiFS ocean chlorophyll data into a three-dimensional global ocean model, J. Mar. Syst., 69, 205–225, 2008.

Gregg, W. W. and Rousseaux, C. S.: Simulating PACE Global Ocean Radiances, Front. Mar. Sci., 4, 60, 2017.

Haidvogel, D. B., Arango, H., Budgell, W. P., Cornuelle, B. D., Curchitser, E., Lorenzo, E. D., Fennel, K., Geyer, W., Hermann, A., Lanerolle, L., Levin, J., McWilliams, J. C., Miller, A. J., Moore, A. M., Powell, T. M., Shchepetkin, A. F., Sherwood, C. R., Signell, R. P., Warner, J. C., and Wilkin, J.: Ocean forecasting in terrain-following coordinates: Formulation and skill assessment of the Regional Ocean Modeling System, J. Comput. Phys., 227, 3595–3624, 2008.

Haines, K., Johannessen, J., Knudsen, P., Lea, D., Rio, M.-H., Bertino, L., Davidson, F., and Hernandez, F.: An ocean modelling and assimilation guide to using GOCE geoid products, Ocean Sci., 7, 151–164, 2011.

Hamilton, P. and Leidos: Ocean currents, temperatures, and others measured by drifters and profiling floats for the Lagrangian Approach to Study the Gulf of Mexico Deep Circulation project 2011-07 to 2015-06 (NCEI Accession 0159562), Version 1.1, NOAA National Centers for Environmental Information, Tech. Rep. [data set], available at:, last access: 25 October 2017, 2017.

Hu, J., Fennel, K., Mattern, J. P., and Wilkin, J.: Data assimilation with a local Ensemble Kalman Filter applied to a three-dimensional biological model of the Middle Atlantic Bight, J. Mar. Syst., 94, 145–156, 2012.

Jones, E. M., Baird, M. E., Mongin, M., Parslow, J., Skerratt, J., Lovell, J., Margvelashvili, N., Matear, R. J., Wild-Allen, K., Robson, B., Rizwi, F., Oke, P., King, E., Schroeder, T., Steven, A., and Taylor, J.: Use of remote-sensing reflectance to constrain a data assimilating marine biogeochemical model of the Great Barrier Reef, Biogeosciences, 13, 6441–6469, 2016.

Li, G., Iskandarani, M., Hénaff, M. L., Winokur, J., Le Maître, O. P., and Knio, O. M.: Quantifying initial and wind forcing uncertainties in the Gulf of Mexico, Comput. Geosci., 20, 1133–1153, 2016.

Martin, M., Dash, P., Ignatov, A., Banzon, V., Beggs, H., Brasnett, B., Cayula, J.-F., Cummings, J., Donlon, C., Gentemann, C., Grumbine, R., Ishizaki, S., Maturi, E., Reynolds, R. W., and Roberts-Jones, J.: Group for High Resolution Sea Surface temperature (GHRSST) analysis fields inter-comparisons, Part 1: A GHRSST multi-product ensemble (GMPE), Deep-Sea Res. Pt. II, 77–80, 21–30, 2012.

Mattern, J. P., Dowd, M., and Fennel, K.: Particle filter-based data assimilation for a three-dimensional biological ocean model and satellite observations, J. Geophys. Res.-Ocean., 118, 2746–2760, 2013.

Mattern, J. P., Song, H., Edwards, C. A., Moore, A. M., and Fiechter, J.: Data assimilation of physical and chlorophyll a observations in the California Current System using two biogeochemical models, Ocean Model., 109, 55–71, 2017.

Mellor, G. L. and Yamada, T.: Development of a turbulence closure model for geophysical fluid problems, Rev. Geophys. Space Phys., 20, 851–875, 1982.

Oschlies, A. and Garçon, V.: An eddy-permitting coupled physical-biological model of the North Atlantic: 1. Sensitivity to advection numerics and mixed layer physics, Global Biogeochem. Cy., 13, 135–160, 1999.

Ourmières, Y., Brasseur, P., Lévy, M., Brankart, J.-M., and Verron, J.: On the key role of nutrient data to constrain a coupled physical–biogeochemical assimilative model of the North Atlantic Ocean, J. Mar. Syst., 75, 100–115, 2009.

Pradhan, H. K., Völker, C., Losa, S. N., Bracher, A., and Nerger, L.: Assimilation of Global Total Chlorophyll OC-CCI Data and Its Impact on Individual Phytoplankton Fields, J. Geophys. Res.-Ocean., 124, 470–490, 2019.

Pradhan, H. K., Völker, C., Losa, S. N., Bracher, A., and Nerger, L.: Global Assimilation of Ocean-Color Data of Phytoplankton Functional Types: Impact of Different Data Sets, J. Geophys. Res.-Ocean., 125, e2019JC015586, 2020.

Raghukumar, K., Edwards, C. A., Goebel, N. L., Broquet, G., Veneziani, M., Moore, A. M., and Zehr, J. P.: Impact of assimilating physical oceanographic data on modeled ecosystem dynamics in the California Current System, Prog. Oceanogr., 138, 546–558, 2015.

Rio, M.-H., Mulet, S., and Picot, N.: New global mean dynamic topography from a goce geoid model, altimeter measurements and oceanographic in-situ data, ESA Living Planet Symposium, Proceedings of the conference held on 9–13 September 2013 at Edinburgh in United Kingdom, ESA SP-722, 2–13, 2013.

Roemmich, D., Alford, M. H., Claustre, H., Johnson, K., King, B., Moum, J., Oke, P., Owens, W. B., Pouliquen, S., Purkey, S., Scanderbeg, M., Suga, T., Wijffels, S., Zilberman, N., Bakker, D., Baringer, M., Belbeoch, M., Bittig, H. C., Boss, E., Calil, P., Carse, F., Carval, T., Chai, F., Conchubhair, D. Ó., D'Ortenzio, F., Dall'Olmo, G., Desbruyeres, D., Fennel, K., Fer, I., Ferrari, R., Forget, G., Freeland, H., Fujiki, T., Gehlen, M., Greenan, B., Hallberg, R., Hibiya, T., Hosoda, S., Jayne, S., Jochum, M., Johnson, G. C., Kang, K., Kolodziejczyk, N., Körtzinger, A., Traon, P.-Y. L., Lenn, Y.-D., Maze, G., Mork, K. A., Morris, T., Nagai, T., Nash, J., Garabato, A. N., Olsen, A., Pattabhi, R. R., Prakash, S., Riser, S., Schmechtig, C., Schmid, C., Shroyer, E., Sterl, A., Sutton, P., Talley, L., Tanhua, T., Thierry, V., Thomalla, S., Toole, J., Troisi, A., Trull, T. W., Turton, J., Velez-Belchi, P. J., Walczowski, W., Wang, H., Wanninkhof, R., Waterhouse, A. F., Waterman, S., Watson, A., Wilson, C., Wong, A. P. S., Xu, J., and Yasuda, I.: On the Future of Argo: A Global, Full-Depth, Multi-Disciplinary Array, available at:, last access: 8 August 2019.

Sakov, P. and Oke, P. R.: A deterministic formulation of the ensemble Kalman filter: an alternative to ensemble square root filters, Tellus A, 60, 361–371, 2008.

Sakov, P., Evensen, G., and Bertino, L.: Asynchronous data assimilation with the EnKF, Tellus A, 62, 24–29, 2010.

Salon, S., Cossarini, G., Bolzon, G., Feudale, L., Lazzari, P., Teruzzi, A., Solidoro, C., and Crise, A.: Novel metrics based on Biogeochemical Argo data to improve the model uncertainty evaluation of the CMEMS Mediterranean marine ecosystem forecasts, Ocean Sci., 15, 997–1022, 2019.

Santana-Falcón, Y., Brasseur, P., Brankart, J. M., and Garnier, F.: Assimilation of chlorophyll data into a stochastic ensemble simulation for the North Atlantic Ocean, Ocean Sci., 16, 1297–1315, 2020.

Sathyendranath, S., Grant, M., Brewin, R., Brockmann, C., Brotas, V., Chuprin, A., Doerffer, R., Dowell, M., Farman, A., Groom, S., Jackson, T., Krasemann, H., Lavender, S., Martinez Vicente, V., Mazeran, C., Mélin, F., Moore, T., Müller, D., and Platt, G.: ESA Ocean Colour Climate Change Initiative (Ocean_Colour_cci): Version 3.1 Data, Centre for Environmental Data Analysis, 4 July 2018, Tech. rep., 2018.

Sheng, J., Greatbatch, R. J., and Wright, D. G.: Improving the utility of ocean circulation models through adjustment of the momentum balance, J. Geophys. Res.-Ocean., 106, 16711–16728, 2001.

Shulman, I., Frolov, S., Anderson, S., Penta, B., Gould, R., Sakalaukus, P., and Ladner, S.: Impact of bio-optical data assimilation on short-term coupled physical, bio-optical model predictions, J. Geophys. Res.-Ocean., 118, 2215–2230, 2013.

Skákala, J., Ford, D., Brewin, R. J. W., McEwan, R., Kay, S., Taylor, B., de Mora, L., and Ciavatta, S.: The Assimilation of Phytoplankton Functional Types for Operational Forecasting in the Northwest European Shelf, J. Geophys. Res.-Ocean., 123, 5230–5247, 2018.

Skákala, J., Bruggeman, J., Brewin, R. J. W., Ford, D. A., and Ciavatta, S.: Improved Representation of Underwater Light Field and Its Impact on Ecosystem Dynamics: A Study in the North Sea, J. Geophys. Res.-Ocean., 125, e2020JC016122, 2020.

Skákala, J., Ford, D., Bruggeman, J., Hull, T., Kaiser, J., King, R. R., Loveday, B., Palmer, M. R., Smyth, T., Williams, C. A. J., and Ciavatta, S.: Towards a Multi-Platform Assimilative System for North Sea Biogeochemistry, J. Geophys. Res.-Ocean., 126, e2020JC016649, 2021.

Smagorinsky, J.: General circulation experiments with the primitive equations: I. the basic experiment, Mon. Weather Rev., 91, 99–164, 1963.

Smolarkiewicz, P. K. and Margolin, L. G.: MPDATA: A Finite-Difference Solver for Geophysical Flows, J. Comput. Phys., 140, 459–480, 1998.

Song, H., Edwards, C. A., Moore, A. M., and Fiechter, J.: Data assimilation in a coupled physical-biogeochemical model of the California Current System using an incremental lognormal 4-dimensional variational approach: Part 2 – Joint physical and biological data assimilation twin experiments, Ocean Model., 106, 146–158,, 2016a. a, b, c

Song, H., Edwards, C. A., Moore, A. M., and Fiechter, J.: Data assimilation in a coupled physical-biogeochemical model of the California current system using an incremental lognormal 4-dimensional variational approach: Part 3 – Assimilation in a realistic context using satellite and in situ observations, Ocean Model., 106, 159–172,, 2016b.  a, b, c, d, e

Teruzzi, A., Dobricic, S., Solidoro, C., and Cossarini, G.: A 3-D variational assimilation scheme in coupled transport-biogeochemical models: Forecast of Mediterranean biogeochemical properties, J. Geophys. Res.-Ocean., 119, 200–217,, 2014. a

Teruzzi, A., Bolzon, G., Salon, S., Lazzari, P., and Solidoro, C.: Assimilation of coastal and open sea biogeochemical data to improve phytoplankton simulation in the Mediterranean Sea, Ocean Model., 132, 46–60,, 2018. a, b, c, d

Terzić, E., Lazzari, P., Organelli, E., Solidoro, C., Salon, S., D'Ortenzio, F., and Conan, P.: Merging bio-optical data from Biogeochemical-Argo floats and models in marine biogeochemistry, Biogeosciences, 16, 2527–2542, 2019.

Thacker, W. C., Srinivasan, A., Iskandarani, M., Knio, O. M., and Hénaff, M. L.: Propagating boundary uncertainties using polynomial expansions, Ocean Model., 43/44, 52–63, 2012.

Verdy, A. and Mazloff, M. R.: A data assimilating model for estimating Southern Ocean biogeochemistry, J. Geophys. Res.-Ocean., 122, 6968–6988, 2017.

Wang, B., Fennel, K., Yu, L., and Gordon, C.: Assessing the value of biogeochemical Argo profiles versus ocean color observations for biogeochemical model optimization in the Gulf of Mexico, Biogeosciences, 17, 4059–4074, 2020.

Xu, D., Zhu, J., Qi, Y., Li, X., and Yan, Y.: The impact of mean dynamic topography on a sea-level anomaly assimilation in the South China Sea based on an eddy-resolving model, Acta Oceanol. Sin., 31, 11–25, 2012.

Xue, Z., He, R., Fennel, K., Cai, W., Lohrenz, S., and Hopkinson, C.: Modeling ocean circulation and biogeochemical variability in the Gulf of Mexico, Biogeosciences, 10, 7219–7234, 2013.

Yu, L., Fennel, K., Bertino, L., Gharamti, M. E., and Thompson, K. R.: Insights on multivariate updates of physical and biogeochemical ocean variables using an Ensemble Kalman Filter and an idealized model of upwelling, Ocean Model., 126, 13–28, 2018.

Yu, L., Fennel, K., Wang, B., Laurent, A., Thompson, K. R., and Shay, L. K.: Evaluation of nonidentical versus identical twin approaches for observation impact assessments: an ensemble-Kalman-filter-based ocean assimilation application for the Gulf of Mexico, Ocean Sci., 15, 1801–1814, 2019.

Short summary
We demonstrate that even sparse BGC-Argo profiles can substantially improve biogeochemical prediction via a priori model tuning. By assimilating satellite surface chlorophyll and physical observations, subsurface distributions of physical properties and nutrients were improved immediately. The improvement of subsurface chlorophyll was modest initially but was greatly enhanced after adjusting the parameterization for light attenuation through further a priori tuning.