Estimation of positive sum-toone constrained zooplankton grazing preferences with the DEnKF : a twin experiment

The section 2 has been rewritten to take into accounts the comments of the first referee. In order to make the approach less specific, the revised version of the manuscript includes the two systems to solve in order to obtain any set of prior expected values and variances for the (πi)i=1:N and not only equal expected values. It results in a change of content of Appendix B which now describes the derivation of the system of equations required to choose the variances of the parameters (π)i=1:N . Furthermore, we discovered during the reviewing process a work of Nurmela (1995) introducing the hyperspherical coordinate system to remove constraints of sum in geometrical applications (spherical code). A reference to this work has been added in §2.3.


Introduction
The development of numerical ocean biogeochemical models over the last two decades has led to more and more complex representations of the interactions between the different trophic levels, notably between different plankton species at the base of the food chain.While the diet of zooplankton is relatively simply represented in the earliest NPZD models -the unique zooplankton group (Z) is feeding only on the unique phytoplankton group (P) (see for example Evans and Parslow (1985)) -the addition of multiple plankton functional types (PFT) for the phyto-and zooplankton aiming at representing different plankton functional groups in the ecosystem (e.g.diatoms, calcifying algae or microzooplankton) leads to more complex diets and grazing preferences must be added.These parameters are always positive, and, although not compulsory, usually add up to one.We refer to the review of Gentleman et al. (2003) for more details concerning the common mathematical formulations of the zooplankton grazing in ocean biological models and their impact on model dynamics.
Grazing preferences specify the direction of the feeding in the space of foods, and so the direction of the transfer from PFTs (the food) to the zooplankton PFTs (the feeder).Therefore, their impact on the distribution of the different PFTs obtained from a model simulation can be significant.For example, Buitenhuis et al. (2010) observed in their global biogeochemical model that "the phytoplankton functional type distributions and the proportions of primary production that are exported or remineralized" were sensitive to the microzooplankton grazing preferences.In the same way, Buitenhuis et al. (2006) conclude their work by suggesting that the representation of mesozooplankton would notably benefit from the improvement of their grazing preferences by taking into account the food quality.For large-scale applications like configurations covering a whole ocean basin, this results in the potential need of a fine spatial tuning of the grazing preferences in order to take into account the adaptation of zooplankton species to their local environments (Gentleman et al., 2003).Direct measurements of grazing preferences for the different zooplankton species would help to optimize the model parameters representing these preferences.However, field data are sparse; the information provided by Published by Copernicus Publications on behalf of the European Geosciences Union.
the experiments realized in the laboratory does not cover the large spectrum of conditions found in nature (Buitenhuis et al., 2010), and available observations might not be consistent with each other (Buitenhuis et al., 2006).
Multivariate data assimilation methods like ensemblebased Kalman filters make possible the estimation of variables and parameters that are not observed.State variables and parameters can be estimated simultaneously simply by augmenting the state vector with the parameters to estimate (Anderson, 2001;Evensen, 2009).However, the efficient application of ensemble-based data assimilation methods like the ensemble Kalman filter (EnKF; Evensen, 1994Evensen, , 2003) ) to ocean ecosystem models is a challenging issue.Beside the nonlinearity of the model, most variables and parameters are strictly positive, producing non-Gaussian state and parameter distributions thereby breaking an important assumption of the linear analysis, and leading to a loss of optimality of Kalman filters.A solution to perform Kalman filter estimation of non-Gaussian variables is the introduction of nonlinear changes of variables -called anamorphosis functionsin order to realize the analysis step with Gaussian distributed transformed variables (Bertino et al., 2003).This approach has proven to be easily applicable in realistic configurations (Simon and Bertino, 2009) and allows the estimation of biased parameters (Doron et al., 2011;Zhou et al., 2011;Simon and Bertino, 2012).
In this study, we focus on the problem of estimating positive sum-to-one constrained parameters.Our aim is to assess the ability of ensemble-based Kalman filters to estimate zooplankton grazing preferences in ocean biogeochemical models.To overcome the issues that ensemble-based Kalman filters cannot guarantee the sum-to-one constraint when a constraint of positiveness applies on the parameters, we investigate two reformulations for which these two constraints are implicit.
The outline of the paper is as follows.We present the different changes of variables for the estimation of positive sum-to-one constrained parameters in Sect. 2. We describe our experimental framework in Sect.3. Results of the methods are discussed in Sect.4, and we present our conclusion in Sect. 5.

Estimation of positive sum-to-one constrained parameters with ensemble-based Kalman filters
In this section, we describe the general problem of estimating positive sum-to-one constrained parameters with ensemble-based Kalman filters and the issues raised by these constraints.We present a formulation previously suggested by Gelman (1995) to estimate positive sum-to-one constrained parameters in the framework of pharmacokinetics (Gelman et al., 1996).Since the number of food preferences that need to be calibrated can be large in complex ocean biological models (numerous different feeding and fed species), we aim at reducing the number of parameters to estimate.For this reason, we suggest a new formulation that introduces a change of variables based on hyperspherical coordinates.

Definition of the problem
Let (π i ) i=1:N be the N parameters that we wish to estimate.They are positive: and their sum is equal to one: They can be estimated with ensemble-based Kalman filter by augmenting the analysis state vector with these parameters.Unfortunately, the conservation of linear properties intrinsic to the ensemble Kalman filter (Evensen, 2003) is not guaranteed for the parameters due to the constraint of positiveness.The truncation of negative values that results from the Kalman analysis can lead to parameter estimates that do not respect the linear sum-to-one property (Eq.2).Even if the Gaussian anamorphosis extension of ensemble-based Kalman filters makes the estimation possible of positive parameters (Simon and Bertino, 2012), nonlinear transformations do not ensure that they still sum to one.

Dirichlet distribution and Gelman's formulation
A prior distribution for N positive random parameters with the sum-to-one constraint is the Dirichlet distribution of order N.The (π i ) i=1:N can be obtained from N independent gamma distributed random variables (φ i ) i=1:N as follows: (3) Then, the parameters (φ i ) i=1:N are estimated by assimilating observation with ensemble-based Kalman filters, and the values of the original parameters (π i ) i=1:N are obtained from equation (3).Because the parameters (φ i ) i=1:N are not Gaussian distributed, we suggest to transform them with the Gaussian anamorphosis during the analysis.
Another possibility is to substitute the gamma distribution by the log-normal distribution as suggested by Gelman (1995): (4) Ocean Sci., 8, 587-602, 2012 www.ocean-sci.net/8/587/2012/ In that case, the (φ i ) i=1:N fulfill the Kalman filtering assumption of Gaussian distributed variables and do not require anamorphosis.
Due to the symmetrical roles played by the parameters (φ i ) i=1:N in both formulations, the estimation of the parameters (π i ) i=1:N is insensitive to the mapping between the parameters (φ i ) i=1:N and (π i ) i=1:N in the change of variables.However, these approaches do not allow for parameters (π i ) i=1:N equal to zero, meaning that one food type could not be completely removed by assimilation.This might be undesirable in large-scale configurations for which the diet composition can significantly change from one region to another.

The hyperspherical coordinate system
The (π i ) i=1:N can be seen as a position vector in the Cartesian coordinates of a point π in R N .A natural idea is to represent this point in another coordinate system.We suggest to introduce N − 1 angles (φ i ) i=1:N−1 to represent π in the hyperspherical coordinate system that generalizes the spherical coordinate in dimension N .The use of this coordinate system to remove constraints of sum has also been introduced for geometrical applications (Nurmela, 1995).An analogy with a coordinate system describing a point on a sphere shows that 2 angles, longitude and latitude, are required to characterize the position of a point on the surface in 3 dimensions.
with (φ i ) i=1:N−1 N − 1 random variables distributed on the segment line [0, 1].By definition, the (π i ) i=1:N are positive and it can be easily shown that their sum is equal to one.Again, we suggest to transform the parameters (φ i ) i=1:N−1 with the Gaussian anamorphosis functions during the Kalman filter analysis.
One benefit of this approach is the reduction of the number of parameters to estimate from N to N − 1, the (φ i ) i=1:N−1 instead of the (π i ) i=1:N .This is certainly useful for complex systems involving numerous unknown parameters to estimate.However, the estimated values of the (π i ) i=1:N can be sensitive to the choice of the mapping to the (φ i ) i=1:N−1 due to the asymmetry of the transformation.For our specific problem of estimating zooplankton grazing preferences, it means that the results might depend on the choice of assigning the types of food to the (π i ) i=1:N .

2.4
Prior distribution of the (φ i ) i=1:N−1 in the hyperspherical coordinate system A significant issue lies in the choice of the distributions of the parameters (φ i ) i=1:N−1 .When the distributions of the parameters (π i ) i=1:N are known or samples are available, the inversion of the hyperspherical coordinate system can provide prior values for the parameters (φ i ) i=1:N−1 .This can be done recursively starting from φ 1 = 2 π arccos( √ π 1 ) and so on.This approach can also be applied to estimate positive sum-to-one constrained state variables evolving in time accordingly to a model dynamics.The variable transformations before and after the analysis using the inverse of Eq. ( 5) and Eq. ( 5) guarantee that analyzed variables fulfill both constraints.
Another approach consists in focusing on the direct modeling of the (φ i ) i=1:N −1 .This can be an option when too little information on the (π i ) i=1:N is available.We suggest to base this choice on the ability to specify prior values and uncertainties for the (π i ) i=1:N -their prior expected value ]) i=1:N -rather than focusing on their distributions.This leads to the choice of prior values and uncertainties of the (π i ) i=1:N , for which the parameters of the distributions of the (φ i ) i=1:N−1 will be tuned accordingly.For example, in our particular framework, it would be interesting to start the estimation process with (π i ) i=1:N that have the same expected value 1 N , because this case corresponds to no particular feeding preferences in the diet of the zooplankton species.
We assume that the parameters (φ i ) i=1:N−1 are independent and follow marginal distributions (D i ( i )) i=1:N−1 , which can differ.The prior values for the expectation and variances of the parameters (π i ) i=1:N are obtained by an adequate tuning of the N − 1 parameter sets ( i ) i=1:N−1 .Let (m i ) i=1:N and (σ 2 i ) i=1:N be the target means and variances of the (π i ) i=1:N .Due to the sum-to-one constrain, the expected value and variance of one parameter (π N without loss of generality) are determined by the choice of the values for the N − 1 other parameters (π i ) i=1:N −1 .

Specification of the expected values of the
The specification of the expected values (m i ) i=1:N−1 leads to the resolution of N − 1 nonlinear equations (S i ) i=1:N−1 : www.ocean-sci.net/8/587/2012/Ocean Sci., 8, 587-602, 2012 with φ i the characteristic function of the parameter φ i .The derivation of these equations is detailed in Appendix A.
The existence of solutions to this system of equations (S i ) i=1:N −1 depends on the chosen distributions (D i ) i=1:N −1 and can be found numerically.

Specification of the variances of the (π i ) i=1:N
The specification of the variances (σ 2 i ) i=1:N −1 leads to the resolution of N − 1 nonlinear equations ( i ): )) i=1:N−1 depend on the (m i ) i=1:N −1 only and are given by Eq. ( 6).The derivation of these equations is detailed in Appendix B.
Again, the existence of solutions to this system of equations ( i ) i=1:N−1 depends on the chosen distributions (D i ) i=1:N −1 and can be found numerically.

Example with the triangular distribution
In order to illustrate the approaches described above, we specify an equal expected value for the (π i ) i=1:N following §2.4.1.It corresponds to the strategy applied for estimating the grazing zooplankton preferences in the numerical experiments shown in §3 and §4.First, we must choose a distribution for the parameters (φ i ) i=1:N−1 and we assume that they follow a triangular distribution: with θ i ∈ [0, 1] the mode of the distribution.The probability density function reads The characteristic function φ i is given by ∀i with j 2 = −1.Then, a prior equal value for the parameters (π i ) i=1:N is obtained by an adequate tuning of the N − 1 modes (θ i ) i=1:N−1 .For that particular case, one has And Eq. ( 6) reads ∀i = 1 : N − 1, The parameters (φ i ) i=1:N−1 distributed according to (T (0, 1, θ i )) i=1:N −1 , with the (θ i ) i=1:N−1 solutions of the system defined by Eq. ( 12), lead to equal expected values for the (π i ) i=1:N .
However, it can be shown that solutions exist if and only if It means that only the equations (S N−1 ) and (S N−2 ) admit a solution.In practice, it will not be possible to obtain equal prior values of the (π i ) i=1:N for N 4 when using triangular distributed parameters Finally, it is worthy to note that it is not possible to choose the variances of the (π i ) i=1:N after having specified their expected values, because the triangular distribution has only one parameter: its mode.

Truncated Gaussian distribution and linear inequality constraints
Another strategy to reduce the number of parameters to estimate consists in formulating the problem simply using linear inequality constraints.Without loss of generality, we choose to only estimate the first N − 1 parameters (π i ) i=1:N−1 and the last parameter is given by the sumto-one constraint: This can be done with the truncated Gaussian filter suggested by Lauvernet et al. (2009) under the assumption that the (π i ) i=1:N −1 have a truncated Gaussian distribution.However, the application of this filter is not as easy in practice than the simple variable transformations suggested in the previous sections and can be computationally expensive for large systems due to the use of a Gibbs sampler to sample the truncated-Gaussian distribution and to estimate its location vector and scale matrix.

The 1-D ocean ecosystem model
The experiments were performed in a 1-D vertical configuration of the coupled model GOTM-NORWECOM representative of the station Mike (66 • N, 2 • W) in the North Sea.
The 1-D ocean water column model is the General Ocean Turbulence Model (GOTM; Burchard et al., 1999Burchard et al., , 2005; Umlauf and Burchard, 2005) that transports physical quantities with hydrodynamic primitive equations and turbulence schemes.A relaxation towards temperature, salinity and horizontal velocity profiles from the TOPAZ1 system (Bertino and Lisaeter, 2008) is used with a relaxation time of 14 days.The vertical advection velocity is specified to zero.The depth is 2034 m, and the model uses a Cartesian grid of 55 vertical levels with a minimum thickness of 1 m at the top level, increasing exponentially towards the bottom.
The NORWegian ECOlogical Model system (NORWE-COM; Aksnes et al., 1995;Skogen and Søiland, 1998) is coupled to GOTM.The current version of this model includes two classes of phytoplankton (diatom and flagellates), two classes of zooplankton (meso-and microzooplankton) derived with the same formulation from the model ECOHAM4 (Pätsch et al., 2009), three types of nutrients (inorganic nitrogen, phosphorus and silicon) and detritus (nitrogen, phosphorus), biogenic silica, and oxygen, so that the ecosystem state vector is made of 11 variables.The chlorophyll a concentration (CHLA) is computed from the model diatoms and flagellates concentrations (DIA and FLA) by Eq. ( 14): The constant conversion factor 0.8 mmol N mg −1 Chl a is added to obtain the chlorophyll concentration in mg m −3 , the standard unit of data produced from satellite, from the phytoplankton concentration in mmol N m −3 .The mesozooplankton (MES) feed on diatoms (one assumes that the flagellates are too small to be fed on by mesozooplankton), detritus (DEN) and microzooplankton (MIC).The microzooplankton feed on both classes of phytoplankton (flagellates and diatoms) and on detritus.Both classes of zooplankton have the choice of their food among three variables of the model and compete against each other for feeding on detritus and diatoms.For both classes of zooplankton, the formulation of the grazing G i=1:3 on the variable i = 1 : with Z the concentration of meso-or microzooplankton feeder, (X k ) k=1:3 the concentration of the different variables they feed on, (π k ) k=1:3 the grazing preferences, K 1/2 the half-saturation constant for ingestion by zooplankton and g the zooplankton maximum growth rate.The second order modified Patankar-Runge-Kutta scheme is used for the source and sinks dynamics.
The dynamics of phytoplankton blooms in the first 100 m in the reference solution is illustrated in Fig. 1.

Data assimilation experiments
In order to assess the performances of the two formulations, twin experiments have been conducted: the true state and the observations are produced by a deterministic simulation of the model involving meso-and microzooplankton grazing preferences that differ from equal preferences.The values of preferences used to build the reference solution will be called "true" values in the following.These values have been arbitrary chosen and are summarized in Table 1.
The observations are the chlorophyll in the two first layers of the model and are defined as follows: We construct the observations by multiplying the true surface chlorophyll with a gamma distributed observation error with a standard deviation around 30 % (average should be 1).
The observation are assimilated with a Gaussian anamorphosis extension of the deterministic ensemble Kalman filter (DEnKF).This method is based on the DEnKF (Sakov and Oke, 2008)  true state and the observations are produced by a deterministic simulation of the model involving meso-and microzooplankton grazing preferences that differ from equal preferences.The values of preferences used to build the reference solution will be called "true" values in the following.These values have been arbitrary chosen and are summarized in table 1.
The observations are the chlorophyll in the two first layers of the model and are defined as follows We construct the observations by multiplying the true surface chlorophyll with a Gamma distributed observation error with a standard deviation around 30% (average should be 1).
The observation are assimilated with a Gaussian anamorphosis extension of the deterministic ensemble Kalman filter (DEnKF).This method is based on the DEnKF (Sakov and Oke, 2008) and consists in introducing Gaussian anamorphosis functions in order to realize the analysis step with Gaussian distributed transformed variables.More details can be found in Simon and Bertino (2012).The state and parameter estimations are conducted jointly by augmenting the state vector with the parameters that are estimated.In this study, the state vector is made of all the vertical components of the ten state variables (the oxygen is not corrected during the analysis) and the parameters (φ i ) i=1:n , n depending on the formulation that is chosen.In the Gelman formulation, we take six parameters (three parameters controlling the preferences times two zooplankton types).In the spherical formulation, we take four parameters (two parameters controlling the preferences times two zooplankton types).
The ensemble contains 100 members.The background state ensemble is generated by adding a truncated-Gaussian perturbation to the solution x(t = 0).
. σ b is chosen to be equal to 0.3 for all the state variables.In the Gelman formulation, the parameter ensemble is initialized by assuming that the parameters (φ i ) i=1:3 are normally distributed according to N (0, σ = 2).In the spherical formulation, we assume that the parameters (φ i ) i=1:2 follow a triangular distribution: with θ i ∈ [0, 1] the mode of the distribution.The triangular distribution is simulated from the uniform distribution thanks to the MINMAX method suggested by Stein and Keblis (2009).The prior values for the parameters (π i ) i=1:3 are obtained by an adequate tuning of the 2 modes (θ i ) i=1:2 .Equal preferences are obtained by solving the two nonlinear equations: A solution to the equations (S i ) i=1:2 exists and can be found numerically: θ 1 = 0.8905 and θ 2 = 0.5.The mapping of the preferences in eq. 5 is as follows: This choice is motivated by our wish to respect the symmetry between the two classes of phytoplankton found in Simon and Bertino (2012).The state and parameter estimations are conducted jointly by augmenting the state vector with the parameters that are estimated.In this study, the state vector is made up of all the vertical components of the ten state variables (the oxygen is not corrected during the analysis) and the parameters (φ i ) i=1:n , n depending on the formulation that is chosen.In the Gelman formulation, we take six parameters (three parameters controlling the preferences times two zooplankton types).In the spherical formulation, we take four parameters (two parameters controlling the preferences times two zooplankton types).
The ensemble contains 100 members.The background state ensemble is generated by adding a truncated-Gaussian perturbation to the solution x(t = 0): . σ b is chosen to be equal to 0.3 for all the state variables.In the Gelman formulation, the parameter ensemble is initialized by assuming that the parameters (φ i ) i=1:3 are normally distributed according to N (0, σ = 2).In the spherical formulation, we assume that the parameters (φ i ) i=1:2 follow a triangular distribution: with θ i ∈ [0, 1] the mode of the distribution.The triangular distribution is simulated from the uniform distribution thanks to the MINMAX method suggested by Stein and Keblis (2009).The prior values for the parameters (π i ) i=1:3 are obtained by an adequate tuning of the 2 modes (θ i ) i=1:2 .Equal preferences are obtained by solving the two nonlinear equations: A solution to the equations (S i ) i=1:2 exists and can be found numerically: θ 1 = 0.8905 and θ 2 = 0.5.The mapping of the preferences in Eq. ( 5) is as follows: This choice is motivated by our wish to respect the symmetry between the two classes of phytoplankton in the definition of the observed variable (same weight for the FLA and DIA variables when computing the chlorophyll concentration).Since mesozooplankton only eat one type of phytoplankton (diatoms), π 1 is associated with π DIA .The microzooplankton feed on the two classes of phytoplankton respectively; we associate π 2 and π 3 with π DIA and π FLA , respectively.This asymmetry, which appears only in the spherical formulation, may create a dependence of the results on the parameter assignment.However, experiments with different assignments between the microzooplankton grazing preferences and the (π i ) i=1:3 led to similar results (not shown).We noted a slight decrease (increase) in the RMS errors in the estimates of the microzooplankton (mesozooplankton) grazing preferences compared to the results shown in the following.Nevertheless, the observed robustness of the estimation to the asymmetry of the transformation could be applicationdependent and further experiments might be required in a different framework.
Gaussian anamorphosis functions are applied to state variables and parameters except for parameters transformed by the Gelman formulation.In the latter case, the (φ i ) i=1:3 are already normal-distributed and Gaussian anamorphosis is not necessary (see Sect. 2).The strategy to build the anamorphosis functions differs between the chlorophyll and the other state variables and parameters (if necessary) and is a variation of the hybrid approach described in Simon and Bertino (2012).Since the chlorophyll concentration in the ocean is usually assumed to have a log-normal distribution Ocean Sci., 8, 587-602, 2012 www.ocean-sci.net/8/587/2012/(Campbell, 1995), its anamorphosis function is the logarithmic function.For other state variables and the parameters, anamorphosis functions are built from the empirical marginal distributions of the variables.The empirical anamorphosis functions are computed from a sample of the forecast ensemble and are then piecewise linearly interpolated to obtain the Gaussian anamorphosis functions.Their tails are linear and their last segments extrapolated towards specified biological minimum and maximum values.The spherical formulation introduces parameters that are bounded on both sides and for which the odds to reach the bounds during the assimilation are not null.The succession of analysis steps can build-up discontinuities ("atoms") of the distribution at the bounds which are not handled by the piecewise linear anamorphosis function -zero slopes are not invertible (Simon and Bertino, 2012).Extending the first and last segments until they include the first values outside of the atoms seems to resolve the issue.The observation error o is assumed to have a lognormal distribution: log( o ) ∼ N (0, σ 2 o ) with σ o = 0.3.It results in a normal-distributed observation error for the transformed observations with a standard deviation equal to 0.3.
The model includes perturbations on the phyto-and zooplankton components of the state variables.Similarly to the generation of the background state variables, truncated-Gaussian random variables are added every twelve hours.The standard deviation of these perturbations decreases linearly towards zero in the eight deepest layers in order to obtain a smooth transition between the deep layers and the bottom layer where no perturbations are applied.Furthermore, no perturbations are added to the parameters during the model integration, so they remain constant between two analysis steps.
Starting from the background state variables and parameters, a one-year ensemble simulation is performed without assimilation.Assimilation cycles are then performed over four years with a frequency of one analysis step every seven days.This frequency for observing the system is relatively low considering the short time scales of the bloom phenomenon.Figure 2 represents the time evolution of the chlorophyll concentration in the two first top layers in the reference solution and in the assimilated observations.We note that during the blooms the 7-day sampling of the reference run leads to only one observation during the first peak (diatom bloom) and only one or two observations during the second peak (flagellate bloom).Furthermore, the maximum values reached by the concentrations in the reference solution during these two peaks are generally not captured by the observations.Blooms are mostly represented in the observations as two Dirac pulses with highly uncertain amplitude and timing.This usually results in difficulties for the ensemble-based Kalman filter methods to correctly estimate the state of the system, and notably to estimate some parameters.This is a real issue for ocean ecosystem models: the weak production (apart from the bloom periods) results in low innovations and spread of the chlorophyll concentration in the ensemble, and weak corrections by the filters during most of the year.An increase of the sampling frequency to four days would be enough to obtain a good representation of blooms in this simple 1-D configuration, more specifically the transition phases, and potentially improve the quality of the estimation.Nevertheless, assimilating observations more frequently might not be affordable in realistic 3-D configurations due to the computational costs that it implies.The use of an asynchronous version of the EnKF (Sakov et al., 2010) would be a solution to tackle these issues but is out of the scope of this study.
In order to check the robustness of the estimation against random initial conditions and observation errors, we repeated the experiment 20 times.That is, 20 initial ensembles (combined state-parameter background) and 20 sets of observations were generated.Nevertheless, the different assimilation systems used the same state component of the background ensemble and observations for each of the 20 realizations.The diagnostics shown in Sect. 4 are averaged over these 20 experiments.

Overall error evolution
We are interested in the time evolution of the relative root mean square error (RMS) and the relative ensemble standard deviations (STD) of the solution of the two different formulations.These diagnostics are averaged over 20 experiments.The expression at time t n of these two quantities is as follows: where is the domain of computation, N is the number of members, x m is the forecast member m, N exp is the number of experiments, x t is the true state, and x is the mean of the forecast ensemble.
Figure 3 represents the evolution of the relative RMS and standard deviation over five years for the diatoms, flagellates and the micro-and mesozooplankton.These diagnostics are averaged over the whole water column, and represents the 55 vertical layers.The evolution of the spatial average of the true state is plotted (green dashed line) in order to provide www.ocean-sci.net/8/587/2012/Ocean Sci., 8, 587-602, 2012 information on the yearly dynamics of the variables.No assimilation is performed during the first year.First, we note that both formulations lead to a reduction of the RMS error and the standard deviation for flagellates and their grazers, the microzooplankton.The peaks in the error for flagellates occur at the end of the flagellates blooms, which are too short in the assimilated solution, notably around 25 m depth.The evolutions of the standard deviation and RMS error are in agreement during the last bloom both for microzooplankton and flagellates, which highlights a good representation of the error by the ensemble during that period.An improvement of the detritus component of the solutions is also observed (not shown).
The impact of data assimilation on the diatoms is mixed.We note a large increase of the standard deviation after all the diatoms blooms and a large peak in the RMS error during the first year with assimilation associated with a too long bloom.The RMS error decreases year after year for both formulations and reaches its lowest values during the fourth year.However, a large peak is still present in the error during the final bloom for the spherical formulation.This is due to the presence of a strong subsurface chlorophyll maximum at a 70 m depth.Because the silica cycle depends only on the diatom concentration, these large peaks in the error result in a low increase of the RMS error for both silicate components during the bloom every year leading to final error around 10 % (not shown).In the same way, data assimilation cannot significantly reduce the RMS errors for the mesozooplankton.On average, the solutions obtained with the Gelman formulation present a lower error than the ones obtained with the spherical formulation.Finally, nitrate and phosphate are not significantly impacted during the assimilation (not shown).On average, their RMS errors are low (less than 5 %) and exhibit low oscillations during the blooms.

Evolution of the parameters
Figure 4 represents the time evolution of the mean and standard deviation of the ensemble for the meso-and microzooplankton grazing preferences.First, we note that both formulations lead on average to reasonably good final estimates of the microzooplankton grazing preferences.The largest corrections occurring during the first two blooms result in a convergence of the estimation towards the true values of the preferences in less than two years for both formulations.However, we note larger corrections during the last bloom with the Gelman formulation that can be explained by a larger spread for the preferences in the ensemble inherited from the initial ensembles.The Gelman formulation introduces a distribution with two parameters -the mean and the variance of the normal distribution (see Eq. 4) -which makes it possible to choose the mean and the standard deviation of the prior preferences.Our use of a distribution with one parameter -the mode of the triangular distribution (see Eq. 18) -allows only for the choice of the mean for the prior preferences.In these experiments, the prior variances chosen for the normal-distributed parameters in the Gelman formulation lead to initial variances for the prior preferences that are larger than the ones obtained with the spherical formulation.
The mean and standard deviation of the 20 means of the preferences in the ensemble obtained at the end of the experiments are specified in both formulations lead to the same estimate of the preferences for flagellates.Both formulations lead to a similar decrease of the RMS error in the estimate of the three preferences.However, the ternary plots of the final estimates of the preferences for the 20 experiments in Fig. 5 show that the number of experiments, for which the assimilation provides corrections in the direction of the true value for the three preferences, is larger with the spherical formulation than with Gelman's: only two points do not belong to the shaded area representing the subspace of preferences defined by 0 π DET 1/3, 1/3 π FLA 1 and 0 π DIA 1/3 (decrease of the preferences for the diatoms and detritus and increase of the preference for the flagellates) with the spherical formulation compared to four points with the Gelman formulation.
The estimation of the mesozooplankton grazing preferences is less successful.On average, we note in Fig. 4 that the corrections are very weak during the first two years of assimilation.The reduction of the standard deviation of the three preferences is very low for both formulations suggesting a weaker sensitivity of the surface chlorophyll to the mesozooplankton grazing preferences compared to the microzooplankton grazing preferences.This is highlighted in Fig. 6 by the Pearson correlation coefficients between the surface chlorophyll and the microzooplankton that are much larger than the ones between the surface chlorophyll and the mesozooplankton.
On average, the spherical formulation leads to slightly better final estimates of the preferences than the Gelman formulation (see Table 1).The assimilation tends to strongly correct the preference for the detritus to the detriment of microzooplankton.It results in larger RMS errors in the final estimates of these two preferences compared to the prior values (see Table 2).The ternary plots in Fig. 5 show that in 45 % of the experiments the estimation with the Gelman formulation does not jointly improve the three preferences.For most of these experiments, this is due to an erroneous increase of the preference for the microzooplankton.The rate of failure decreases to 30 % of the experiments with the spherical formulation.For most cases, this is due to an erroneous increase of the preference for the detritus to the detriment of diatoms.This is also highlighted by the increase of the RMS error in the final estimate of the preference for the detritus (see Table 2).However, experiments done with different assignments of the microzooplankton grazing preferences in the transformation led to higher RMS errors, notably in the preference for microzooplankton, and a rate of failure equal to 45 % (not shown).This suggests that the performances of both approaches do not significantly differ.Furthermore, we think that these difficulties faced by the DEnKF to correctly estimate the mesozooplankton grazing preferences are related to the configuration of the experiments rather than the variable transformations.As stated earlier, the surface chlorophyll seems to be more sensitive to the microzooplankton than to the mesozooplankton in the model.Furthermore, improvements could be obtained by changing the experimental framework, for example the observation frequency, the specified observation error, etc.

Conclusions
In this study, we investigated the problem of estimating N positive sum-to-one constraint parameters with ensemblebased Kalman filters with the purpose of estimating zooplankton grazing preferences that are commonly used in ocean ecosystem models.
We have suggested a new formulation of the grazing preferences introducing a change of variables based on the hyperspherical coordinate system.This formulation results in the estimation of a reduced number (N − 1) of bounded parameters.Issues raised by estimating non-Gaussian distributed parameters with Kalman filters can be tackled by using the Gaussian anamorphosis.Furthermore, the two systems of  N − 1 nonlinear equations to be solved, in order to obtain target prior values and variances of the preferences, are also exhibited.
The performances of this approach and the one suggested by Gelman (1995) based on the Dirichlet distribution have been assessed in the framework of twin experiments realized in a 1-D configuration of the coupled model GOTM-NORWECOM.Both approaches lead to improved estimates of the microzooplankton grazing preferences.They present the same difficulties to estimate the mesozooplankton grazing preferences that can be explained by the configuration of the experiments: the observed variable, the chlorophyll, constitutes only one type of food (diatoms) for the mesozooplankton diet compared to two (diatoms and flagellates) for the microzooplankton diet.Furthermore, the results obtained with the spherical formulation for the mesozooplankton are not significantly better and cannot be guaranteed for more complex realistic configurations.
Both approaches present theoretical and practical advantages.The Gelman formulation leads to the estimation of Gaussian distributed parameters, a property that presents theoretical advantages in the context of Kalman filtering.Furthermore, this formulation is naturally symmetric with regards to the mapping of parameters.This formulation is straightforward to apply for any number of preferences.However, it can require to estimate a large number of parameters in complex systems.The spherical formulation reduces the number of parameters to estimate but can require a choice of their prior distribution and to solve nonlinear systems of equations accordingly if the inversion of the hyperspherical coordinate system cannot be applied.
In this study, we have used the triangular distribution for its simplicity and its applicability in our ecosystem model.But this distribution is not suitable to obtain equal prior values for more than three preferences and does not allow the tuning of their variances.From five preferences onwards -N equal four can be solved via the introduction of the Hopf coordinate system -the questions of the choice of the distribution and the resolution of the systems remain open.However, the inversion of the hyperspherical coordinate system could provide a prior ensemble for the (φ i ) i=1:N−1 if an ensemble for the preferences is available.This suggests that the Gelman formulation is more suitable in the framework of few zooplankton species with a diet involving numerous types of food, while the spherical formulation could be more suitable in the framework of numerous zooplankton species with a diet involving few types of food.where h( 1) is given by Eq. (A9).Now, let i be an integer between 2 and N − 1.One has Following the same strategy as in Appendix A, it leads to where h( i−1 ) and h( i ) are given by Eq.A15.

Fig. 1 .
Fig. 1.Reference solution: time evolution of chlorophyll and nitrate in the upper 100m from 1 January 2000 to 31 December 2004.

Fig. 1 .
Fig. 1.Reference solution: time evolution of chlorophyll and nitrate in the upper 100m from 1 January 2000 to 31 December 2004.

Fig. 2 .
Fig. 2. Time evolution of the chlorophyll concentration at the surface (two first layers) in the reference solution (blue line), in the reference solution at observation times (green circles) and in the observations (red circles) for one experiment.

Fig. 5 .
Fig. 5. Ternary plots of the final estimate (mean of the ensemble) of the grazing preferences parameters for the 20 experiments.The estimates obtained after assimilation are plotted with grey circles, the true set of parameters with a black square and the mean of the background set of parameters with a black diamond.

Fig. 6 .For
Fig.6.Time evolution of the Pearson correlation coefficients between the surface chlorophyll and the meso-and microzooplankton grazing preferences during the first year (no assimilation) and averaged over the 20 experiments, for the spherical formulation.The evolution of the averaged surface chlorophyll concentration due to diatoms (resp.flagellates) is plotted with a dark dotted line (resp.with a dark dash-dotted line).

Ocean Sci., 8, 587-602, 2012 www.ocean-sci.net/8/587/2012/Table 1 .
Table 1 and the RMS error in Table2.On average, the Gelman formulation produces slightly better estimates of the preferences for diatoms and detritus, while Zooplankton grazing preferences (π i ) i=1:3 : mean and standard deviation (computed over the 20 experiments) of the means of preferences obtained at the final time.

Table 2 .
Zooplankton grazing preferences (π i ) i=1:3 : relative RMS error (computed over the 20 experiments) of the means of preferences obtained at the final time.