the Creative Commons Attribution 4.0 License.

the Creative Commons Attribution 4.0 License.

# Potential artifacts in conservation laws and invariants inferred from sequential state estimation

### Carl Wunsch

### Patrick Heimbach

In sequential estimation methods often used in oceanic and general climate calculations of the state and of forecasts, observations act mathematically and statistically as source or sink terms in conservation equations for heat, salt, mass, and momentum. These artificial terms obscure the inference of the system's variability or secular changes. Furthermore, for the purposes of calculating changes in important functions of state variables such as total mass and energy or volumetric current transports, results of both filter and smoother-based estimates are sensitive to misrepresentation of a large variety of parameters, including initial conditions, prior uncertainty covariances, and systematic and random errors in observations. Here, toy models of a coupled mass–spring oscillator system and of a barotropic Rossby wave system are used to demonstrate many of the issues that arise from such misrepresentations. Results from Kalman filter estimates and those from finite interval smoothing are analyzed. In the filter (and prediction) problem, entry of data leads to violation of conservation and other invariant rules. A finite interval smoothing method restores the conservation rules, but uncertainties in all such estimation results remain. Convincing trend and other time-dependent determinations in “reanalysis-like” estimates require a full understanding of models, observations, and underlying error structures. Application of smoother-type methods that are designed for optimal reconstruction purposes alleviate some of the issues.

- Article
(7985 KB) - Full-text XML
- BibTeX
- EndNote

Intense scientific and practical interest exists in understanding the time-dependent behavior in the past and future of elements of the climate system, especially those represented in a reanalysis. Expert practitioners of the methodology of reanalysis, particularly on the atmospheric side (e.g., Dee, 2005; Cohn, 2010; Janjić et al., 2014; and Gelaro et al., 2017), clearly understand the pitfalls of the methodologies, but many of these discussions are couched in the mathematical language of continuous space–time (requiring the full apparatus of functional analysis) and/or the specialized language of atmospheric sciences. Somewhat controversial, contradictory results in the public domain (e.g., Hu et al., 2020 or Boers, 2021) suggest that, given the technical complexities of a full reanalysis computation, some simple examples of the known difficulties with sequential analysis methods could be helpful. For scientists interested in the results but not fully familiar with the machinery being used, it is useful to have a more schematic, simplified set of examples so that the numerous assumptions underlying reanalyses and related calculations can be fully understood. Dee (2005) is close in spirit to what is attempted here. As in geophysical fluid dynamics methods, two “toy models” are used to gain insight into issues applying to far more realistic systems.

Discussion of even the simplified systems considered below requires much notation. Although the Appendix writes out the fuller notation and its applications, the basic terminology used is defined more compactly here. Best estimates of past, present, and future invoke knowledge of both observations and models, and both involve physical–dynamical, chemical, and biological elements.

To understand and interpret the behavior of physical systems one must examine long-term changes in quantities that are subject to system invariants or conservation rules, e.g., energy, enstrophy, total mass, or tracer budgets. Conservation rules in physical systems imply that any changes in the quantity are attributable to identifiable sources, sinks, or dissipation in the interior as well as in the boundary conditions and that they are represented as such in the governing equations. In the absence of that connection in, e.g., mass or energy conservation, claims of physical understanding must be viewed with suspicion. In climate science particularly, violations undermine the ability to determine system trends in physical quantities such as temperature or mass, as well as domain-integrated diagnostics (integrated heat and mass content) over months, decades, or longer.

Two major reservoirs of understanding of systems such as those governing the ocean or climate overall lie with observations of the system and with the equations of motion (e.g., Navier–Stokes) believed applicable. Appropriate combination of the information from both reservoirs generally leads to improvement over estimates made from either alone but should never degrade them. Conventional methods for combining data with models fall into the general category of control theory in both mathematical and engineering forms, although full understanding is made difficult in practice by the need to combine major sub-elements of different disciplines, including statistics of several types, computer science, numerical approximations, oceanography, meteorology, climate, dynamical systems theory, and the observational characteristics of very diverse instrument types and distributions. Within the control theory context, distinct goals include “filtering” (what is the present system state?), “prediction” (what is the best estimate of the future state?), and “interval smoothing” (what was the time history over some finite past interval?) and their corresponding uncertainties.

In oceanography, as well as climate physics and chemistry more generally, a central tool has become what meteorologists call a “reanalysis,” – a time-sequential estimation method ultimately based on long experience with numerical weather prediction. Particular attention is called, however, to Bengtsson et al. (2004), who showed the impacts of observational system shifts on apparent climate change outcomes arising in some sequential methods. A number of subsequent papers (see, for example, Bromwich and Fogt, 2004; Bengtsson et al., 2007; Carton and Giese, 2008; and Thorne and Vose, 2010) have noticed difficulties in using reanalyses for long-term climate properties, sometimes ending with advice such as “minimize the errors” (see Wunsch, 2020, for one global discussion).

For some purposes, e.g., short-term weather or other prediction, the failure of the forecasting procedure (consisting of cycles producing analysis increments from data followed by model forecast) to conserve mass, energy, or enstrophy may be of no concern – as the timescale of emergence for detectable consequences of that failure can greatly exceed the forecast time. In contrast, for reconstruction of past states, those consequences can destroy any hope of physical interpretation. In long-duration forecasts with rigorous models, which by definition contain no observational data at all, conservation laws and other invariants of the model are preserved, provided their numerical implementation is accurate. Tests, however, of model elements and in particular of accumulating errors are then not possible until adequate data do appear.

We introduce notation essential for the methods used throughout the paper in Sect. 2. Experiments that examine the impact of data on reconstruction of invariants in a mass–spring oscillator system are discussed in Sect. 3. This section includes the impact of data density and sparsity on reconstructions of energy, position, and velocity and ends with a discussion of the structure of the covariance matrix. Section 4 covers the Rossby wave equation and examines a simplified dynamical system resembling a forced Rossby wave solution. Here a combination of the Kalman filter and the Rauch–Tung–Striebel smoother (both defined fully in Bryson and Ho, 1975 and Wunsch, 2006) is used to reconstruct generalized energy and the time-independent transport of a western boundary current. Results and conclusions are discussed in Sect. 5.

All variables, independent and dependent, are assumed to be discrete. Notation is similar to that in Wunsch (2006). Throughout the paper italic lowercase bold variables are vectors and uppercase bold variables are matrices.

Let ** x**(t) be a state vector for time $\mathrm{0}\le t\le {t}_{\text{f}}={N}_{t}\mathrm{\Delta}t$, where Δ

*t*is a discrete time step. A state vector is one that completely describes a physical system evolving according to a model rule (in this case, linear),

where Δ*t* is the constant time step. **A**(t) is
a square “state transition matrix”, and **B**(t)** q**(t) is any imposed
external forcing, including boundary conditions, with

**B**(t) a matrix distributing disturbances

**(t) appropriately over the state vector. Generally speaking, knowledge of**

*q***(t) is sought by combining Eq. (1) with a set of linear observations,**

*x*
Here, **E**(t) is another known matrix, which typically
vanishes for most values *t* and represents how the observations measure
elements of ** x**(t). The variable

**(t) is the inevitable noise in any real observation and for which some basic statistics will be assumed. Depending upon the nature of**

*n***E**(t), Eq. (2) can be solved by itself for an estimate of

**(t). (As part of the linearization assumption, neither**

*x***E**(t) nor

**(t) depends upon the state vector.)**

*n*Estimates of the (unknown) true variables ** x**(t) and

**(t) are written with tildes: $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\right)\mathbf{,}\phantom{\rule{0.25em}{0ex}}\stackrel{\mathrm{\u0303}}{\mathit{q}}\mathbf{\left(}\mathit{t}\mathbf{\right)}\mathbf{,}\phantom{\rule{0.25em}{0ex}}\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\mathbf{,}\mathbf{-}\right)\mathbf{,}\phantom{\rule{0.25em}{0ex}}\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\mathbf{,}\mathbf{+}\right)$. As borrowed from control theory convention, the minus sign in $\stackrel{\mathrm{\u0303}}{\mathit{x}}\mathbf{(}\mathit{t}\mathbf{,}\mathbf{-}\mathbf{)}$ denotes a prediction of**

*q***(t)**

*x**not*using any data at time

*t*but possibly using data from the past. If no data at

*t*are used then $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(t\right)=\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(t,-\right)$. Similarly, the plus sign in $\stackrel{\mathrm{\u0303}}{\mathit{x}}(t,+)$ indicates an estimate at time

*t*where data

*future*to time

*t*have also been employed. In what follows, the “prediction” model is always of the form in Eq. (1), but usually with deviations in

**(0) and in**

*x***(t), which must be accounted for.**

*q*In any estimation procedure, knowledge of the initial state elements and resulting
uncertainties is required. A bracket is used to
denote expected value, e.g., the variance matrix of any variable ** ξ**(t) is denoted,

and where ** a** is usually the true value of

**(t) or some averaged value. (When**

*ξ**ξ*is omitted in the subscript,

**P**refers to $\stackrel{\mathrm{\u0303}}{\mathit{x}}$.)

Together, Eqs. (1) and (2) are a set of linear
simultaneous equations for ** x**(t) and possibly

**(t), which, irrespective of whether overdetermined or underdetermined, can be solved by standard inverse methods of linear algebra. For systems too large for such a calculation and/or ones in which data continue to arrive (e.g., for weather) following a previous calculation, one moves to using sequential methods in time.**

*q*Suppose that, starting from *t*=0, a forecast is made using
only the model Eq. (1) until time *t*, resulting in $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\mathbf{,}\mathbf{-}\right)$ and a model-alone forecast $\mathbf{A}\left(t\right)\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\mathbf{,}\mathbf{-}\right)+\mathbf{B}\left(t\right)\stackrel{\mathrm{\u0303}}{\mathit{q}}\left(t\right)$. Should a measurement ** y**(

*t*+Δ

*t*) exist, a weighted average of the difference between

**(**

*y**t*+Δ

*t*) and its value predicted from $\mathit{x}(t+\mathrm{\Delta}t,-)$ provides the “best” estimate, where the relative weighting is by the inverse of their separate uncertainties. In the present case, this best estimate at one time step in the future is given by

As written, this operation is known as the innovation form of the “Kalman filter” (KF), and **K**(*t*+Δ*t*) is the “Kalman gain.” Embedded in this form are the matrices $\mathbf{P}(\mathbf{t}+\mathbf{\Delta}\mathbf{t},-)$ and **R**(*t*+Δ*t*), which denote the uncertainty of the pure model prediction at time *t* and the covariance of the observation noise (usually
assumed to have zero mean error), respectively. The uncertainties $\mathbf{P}(\mathbf{t},-)$ and **P**(**t**) evolve with time according to a matrix Riccati equation; see Appendix A, Wunsch (2006), or numerous textbooks such as Stengel (1986) and Goodwin and Sin (1984)
for a fuller discussion. Although possibly looking unfamiliar, Eq. (4) is simply a convenient rewriting of the
matrix-weighted average of the model forecast at *t*+Δ*t* with that
determined from the data. If no data exist, both ** y**(

*t*+Δ

*t*) and

**E**(

*t*+Δ

*t*) vanish, and the system reduces to the ordinary model prediction.

Without doing any calculations, some surmises can be made about system
behavior from Eq. (4). Among them are the following
(a) if the initial condition (with uncertainty **P**(0)) has errors, the time evolution will propagate initial condition errors forward. Similarly, however obtained, any error in $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\mathbf{,}\mathbf{-}\right)$ with uncertainty covariance $\mathbf{P}(t,-)$
will be propagated into $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\mathbf{+}\mathbf{\Delta}\mathit{t}\right)$. (b) The importance of the data versus the model evolution depends directly on the ratio of the norms of $\mathbf{E}\left(\mathit{\tau}\right)\mathbf{P}(\mathit{\tau},-)\mathbf{E}(\mathit{\tau}{)}^{T}$ and **R**(*τ*). Lastly (c), and most important for this paper, the data disturbances
appear in the time evolution equation (Eq. 4) fully analogous to the external source–sink or boundary condition term.
Conservation laws implicit in the model alone will be violated in the
time evolution, and ultimately methods to obviate that problem must be found.

Although here the true Kalman filter is used for toy models to predict time series, in large-scale ocean or climate models this is almost never the case in practice. Calculation of the **P** matrices from
Eq. (A1 in the Appendix) is computationally so burdensome that **K**(t) is replaced by some very approximate or intuitive version, usually constant in time, potentially leading to further major errors beyond what is being discussed here.

Consider the intuitively accessible system of a mass–spring oscillator, following
McCuskey (1959), Goldstein (1980), Wunsch (2006), or Strang (2007), initially in the
conventional continuous time formulation of simultaneous differential
equations. Three identical masses, *m*=1, are connected to each other and to
a wall at either end by springs of identical constant, *k* (Fig. 1). Movement is damped by a Rayleigh friction
coefficient *r*. Generalization to differing masses, spring constants, and
dissipation coefficients is straightforward. Displacements of each mass are *ξ*_{i}(t), with $i=\mathrm{1},\mathrm{2},\mathrm{3}$. The linear Newtonian equations of
coupled motion are

This second-order system is reduced to a canonical form of coupled first-order equations by introduction of a continuous time state vector, the column vector,

where the superscript *T* denotes the transpose. Note the mixture of dimensional
units in the elements of *x*_{c}(t), identifiable with
the Hamiltonian variables of position and momentum. Then Eq. (6)
becomes (setting *m*=1 or dividing through by it)

where

defining the time-invariant 3×3 block matrices **K**_{c} and **R**_{c}, symmetric and diagonal, respectively.
The structure of **B**_{c} depends on which masses are forced.
For example, if only *ξ*_{2}(*t*) is forced, then **B**_{c} would be the
unit vector in the second element.

Assuming $r,\phantom{\rule{0.25em}{0ex}}k\ne \mathrm{0}$, then **A**_{c} is full-rank with three pairs of complex conjugate eigenvalues but non-orthonormal right eigenvectors.
**A**_{c} and **B**_{c} are both assumed to be time-independent. Discussion
of the physics and mathematics of small oscillations can be found in
most classical mechanics textbooks and is omitted here. What follows is left
in dimensional form to make the results most intuitively accessible.

## 3.1 Energy

Now consider an energy principle. Let $\mathit{\xi}=\left({\mathit{\xi}}_{\mathrm{1}}\right(t),\phantom{\rule{0.25em}{0ex}}{\mathit{\xi}}_{\mathrm{2}}(t),\phantom{\rule{0.25em}{0ex}}{\mathit{\xi}}_{\mathrm{3}}(t){)}^{T}$ be the position sub-vector. Define, without
dissipation (**R**_{c}=**0**) or forcing,

Here, ℰ_{c} is the sum of the kinetic and potential energies (the
minus sign compensates for the negative definitions in **K**_{c}).
The non-diagonal elements of **K**_{c} redistribute
the potential energy amongst the masses through time.

With finite dissipation and forcing,

If the forcing and dissipation vanish then $d{\mathcal{E}}_{\text{c}}\left(t\right)/\mathrm{d}t=\mathrm{0}$ (see Cohn, 2010, for a formal discussion of such continuous time systems.)

## 3.2 Discretization

Equation (1) is discretized at time intervals Δ*t* using an
Eulerian time step,

for *n*≥0. The prediction model is unchanged except now

and without the *c* subscript.

For this choice of the discrete state vector, the energy rate of change is formally analogous to that in the continuous case,

where ℰ(*t*) is computed as before in Eq. (10) except now
using the discretized ** x**(

*t*). An example solution for the nearly dissipationless, unforced oscillator is provided in Fig. 2, produced by the discrete formulation ℰ(t). The potential and kinetic energies through time are shown in Fig. 2c, along with elements and derived quantities of the state vector in Fig. 2a and b. Nonzero values here arise only from the initial condition $\mathit{x}\left(\mathrm{0}\right)=[\mathrm{1},\mathrm{0},\mathrm{0},\mathrm{0},\mathrm{0},\mathrm{0}{]}^{T}$ necessarily specifying both initial positions and their rates of change. A small amount of dissipation (

*r*=0.5) was included to stabilize the particularly simple numerical scheme. The basic oscillatory nature of the state vector elements is plain, and the decay time is also visible.

The total energy declines over the entire integration time, but with small oscillations persisting after 5000 time steps. Kinetic energy is oscillatory as energy is exchanged with the potential component.

## 3.3 Mass–spring oscillator with observations

Note that if the innovation form of the evolution in Eq. (4) is used, the energy change becomes

explicitly showing the influence of the observations. With intermittent
observations and/or with changing structures in **E**(t), then ℰ(t) will undergo forced abrupt changes that are a consequence of the sequential innovation.

Given the very large number of potentially erroneous elements in any choice of model, data, and data distributions, as well as the ways in which they interact when integrated through time, a comprehensive discussion even of the six-element state vector mass–spring oscillator system is difficult. Instead, some simple examples primarily exploring the influence of data density on the state estimate and of its mechanical energy are described. Numerical experiments are readily done with the model and its time constants, model time step, accuracies, and corresponding covariances of initial conditions, boundary conditions, and data. The basic problems of any linear or linearized system already emerge in this simple example.

The “true” model assumes the parameters $k=\mathrm{30},\phantom{\rule{0.25em}{0ex}}r=\mathrm{0.5}$, and Δ*t*=.001 and is forced by

shown in Fig. 3a. That is, only mass one is forced in position, and with a low frequency not
equal to one of the natural frequencies. In this case, $\mathbf{B}=[\mathrm{1},\mathrm{0},\mathrm{0},\mathrm{0},\mathrm{0},\mathrm{0}{]}^{T}$ and ** q**(

*t*)=

*q*

_{1}(

*t*), a scalar. The dissipation time is ${T}_{\text{diss}}=\mathrm{1}/r$, and

*ε*(t) is a white noise element with standard deviation 0.1. The initial condition is

*ξ*

_{1}(0)=1 with all other elements vanishing; see Fig. 3b and c for an example of a forced solution of positions, velocities, and derived quantities. Accumulation of the influence of the stochastic element in the forcing depends directly upon details of the model timescales and, if

*ε*(t) were not white noise, on its spectrum as well. In all cases the cumulative effect of a random forcing will be a random walk – with details dependent upon the forcing structure, as well as on the various model timescales.

The prediction model here has fully known initial conditions and **A** and **B** matrices, but the stochastic component of the forcing is being treated as
fully unknown, i.e., *ε*(*t*)=0 in the prediction.
Added white noise in the data has a standard deviation of 0.01 in all
calculations.

The experiments and their parameters are outlined in Table 1, where “–” is used to indicate that the same conditions as the nominal truth are used.

### 3.3.1 Accurate observations: two times and multiple times

To demonstrate the most basic problem of estimating energy, consider highly
accurate observations of all six generalized coordinates (i.e., positions
and velocity) at two times *τ*_{1} and *τ*_{2} as displayed in Fig. 4 with **E**=**I**_{6}, i.e., having no
observational null space. The forecast model has the correct
initial conditions of the true state but incorrect forcing: the deterministic component has half the amplitude of the true forcing and *ε*(*t*) is completely unknown. Noise with standard deviation 0.01 is added to the observations. Although the new
estimate of the KF reconstructed state vector is an improvement over that
from the pure forecast, any effort to calculate a true trend in the energy
of the system, $\stackrel{\mathrm{\u0303}}{\mathcal{E}}\left(t\right)$, will fail unless careful attention
is paid to correcting for the conservation violations at the times of the
observation.

Until the first data point (denoted with a vertical line in Fig. 4 in all three panels) is introduced the KF and prediction energies are identical, as expected.
Energy discontinuities occur at each introduction of a data point (*t*=5000 Δ*t* and *t*=7300 Δ*t*), seen as the vertical jumps in Fig. 4a. After the first data point the KF energy tends to remain lower than the true energy, but the KF prediction is nonetheless an improvement over that from the prediction model alone. This change is shown in Fig. 4b, where the differences between true and predicted energies are plotted alongside the differences of the energy of the true and KF estimate. Even if the observations are made perfect ones (not shown), this bias error in the energy persists (see, e.g., Dee, 2005).
Figure 4c offers insight into the KF prediction via the covariance matrix **P**(*t*).
The uncertainty in the predicted displacement is small once new data are inserted via the analysis increment, but it quickly grows as the model is integrated beyond the time of analysis.

Figure 5 shows the results when observations
occur in clusters having different intervals between the measurements, the
first being sparser observations (300 time steps between data points) and the
second being denser observations (125 time steps between data points).
Visually, the displacement and energy have a periodicity imposed by the
observation time intervals and readily confirmed by Fourier analysis. Again,
the KF solution is the pure model prediction until data are available, at
which point multiple discontinuities occur, one for every *t* where data are
introduced.

A great many further specific calculations can provide insight as is apparent in the above examples and as inferred from the innovation equation, Eq. (4). For example, the periodic appearance of observations introduces periodicities in $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(t\right)$ and hence in properties such as the energy derived from it. Persistence of the information in these observations at future times will depend upon model time constants including dissipation rates.

### 3.3.2 A fixed position

Exploration of the dependencies of energies of the mass–spring system is interesting and a great deal more can be said. Turn, however, to a somewhat different invariant: suppose that one of the mass positions is fixed, but with the true displacement unknown to the analyst. Significant literature exists devoted to finding changes in scalar quantities such as global mean atmospheric temperatures or oceanic currents, with the Atlantic Meridional Overturning Circulation (AMOC) being a favorite focus. These quantities are typically sub-elements of complicated models involving very large state vectors.

The true model is now adjusted to include the constraint that ${x}_{\mathrm{3}}\left(t\right)={\mathit{\xi}}_{\mathrm{3}}\left(t\right)=\mathrm{2}$ and thus ${x}_{\mathrm{6}}\left(t\right)=d{\mathit{\xi}}_{\mathrm{3}}\left(t\right)/\mathrm{d}t=\mathrm{0}$. That is, a fixed displacement in mass 3 (and
consequently a zero velocity in mass 3) is used in computing the true state vector. The forecast model has the correct initial condition and incorrect forcing: with a deterministic component again having half the true amplitude and fully unknown noise *ε*(*t*). Observations are assumed to be those of all displacements and velocities, with the added noise having standard deviation 0.01. Results are shown in Fig. 6. The question is whether one can accurately infer that *ξ*_{3}(t) is a constant through time. A KF estimate for the fixed
position, ${\stackrel{\mathrm{\u0303}}{x}}_{\mathrm{3}}\left(t\right)={\stackrel{\mathrm{\u0303}}{\mathit{\xi}}}_{\mathrm{3}}\left(t\right)$, is shown in Fig. 6a and includes a substantial error in its value (and its
variations or trends) at all times. Exceptions occur when data are introduced at the vertical lines in Fig. 6a and b. Owing to the noise in the observations, the KF cannot reproduce a perfect result.

Looking at Fig. 6a, one sees that variations in the position of mass 3 occur even during the data-dense period. The variations arise from both the entry of the data and the noise in the observations. An average taken over the two halves of the observation interval might lead to the erroneous conclusion that a decrease has taken place. Such an incorrect inference can be precluded by appropriate use of computed uncertainties. Also note the impact of the KF on the energy (Fig. 6c, d), producing artificial changes as in the previous experiment.

### 3.3.3 Observations of averages

Now consider a set of observations of the average of the position of masses 2 and 3, as well as of the average velocity of masses 1 and 2, mimicking the types of observations that might be available in a realistic setting. Again for optimistic simplicity, the observations are relatively accurate (including noise with standard deviation 0.01) and occur in the two different sets of periodic time intervals. The first interval has observations every 300 time steps and the second every 125 time steps. Prediction begins with the correct initial conditions, and again the forcing has half the correct amplitude with fully unknown random forcing. Figure 7 displays the results. Position estimates shown in Fig. 7c are good, but not perfect, as is also true for the total energy seen in Fig. 7a. The energy estimate carries oscillatory power with the periodicity of the oncoming observation intervals and appears in the spectral estimate (not shown) with excess energy in the oscillatory band and energy that is somewhat too low at the longest periods. Irregular observation spacing would generate a more complicated spectrum in the result.

A general discussion of null spaces involves that of the column-weighted $\mathbf{P}\left(\mathit{\tau},-\right){\mathbf{E}}^{T}$ appearing in the Kalman gain. If **E** is the identity (i.e., observations of all positions and
velocities) and **R**(τ) has a sufficiently small
norm, all elements of ** x**(τ) are resolved. In the
present case, with

**E**having two rows corresponding to observations of the averages of two mass positions and of two velocity positions, the resolution analysis is more structured than the identity, with

A singular value decomposition $\mathbf{E}={\mathbf{USV}}^{T}={\mathbf{U}}_{\mathrm{2}}{\mathbf{S}}_{\mathrm{2}}{\mathbf{V}}_{\mathrm{2}}^{T}$ produces two nonzero singular values, where **U**_{2} denotes the first two columns of the matrix. At rank 2, the resolution matrices, **T**_{U} and **T**_{V}, based on the **U** and **V** vectors, respectively, and the standard solution
covariances are easily computed (Wunsch, 2006). **A**
distributes information about the partially determined *x*_{i} throughout
all masses via the dynamical connections as contained in **P**(τ). Bias errors require specific, separate analysis.

The impact of an observation on future estimated values tends to decay in time, dependent upon the model timescales. Insight into the future influence of an observation can be obtained from the Green function discussion in the Appendix.

## 3.4 Uncertainties

In a linear system, a Gaussian assumption for the dependent variables is
commonly appropriate. Here the quadratic-dependent energy variables become *χ*^{2}-distributed. Thus, ${\mathit{\xi}}_{i}^{\mathrm{2}}$ and ${\dot{\mathit{\xi}}}_{i}^{\mathrm{2}}$ have such
distributions, but with differing means and variances and with potentially
very strong correlations, so they cannot be regarded as independent
variables. Determining the uncertainties of the six covarying elements
making up $\stackrel{\mathrm{\u0303}}{\mathcal{E}}\left(t\right)$ involves some intricacy. A formal
analysis can be made of the resulting probability distribution for the sum
in $\stackrel{\mathrm{\u0303}}{\mathcal{E}}\left(t\right)$, involving non-central *χ*^{2}
distributions (Imhof, 1961; Sheil and O'Muircheartaigh, 1977; Davies, 1980).
As an example, an estimate of the uncertainty
could be made via a Monte Carlo approach by generating *N* different versions of the observations, differing in the particular choice of noise value in each and tabulating the resulting range. These uncertainties can be used to calculate, e.g., the formal significance of any apparent trend in $\stackrel{\mathrm{\u0303}}{\mathcal{E}}\left(t\right)$. Implicit in such calculations is having adequate knowledge of the probability distribution from which the random variables are obtained. An important caveat is that once again bias errors such as those seen in the energy estimates in Fig. 7a must be separately determined.

The structure of the uncertainty matrix **P** depends upon both the model and the
detailed nature of the observations (see Appendix A; Eq. A2). Suppose observations only provide knowledge of the velocity of mass 2, *x*_{5}(*t*). Consider **P**(*t*=7124), just before observations become available (i.e., the model has mimicked a true prediction until this point), and **P**(*t*=8250), after 10 observations of *x*_{5} have been incorporated with the Kalman filter. The resulting **P**(t) following the observations produces highly inhomogeneous variances (the diagonals of **P**). In this
particular case, one of the eigenvalues of **P**(τ) for *τ* just beyond the time of any observation is almost zero, meaning
that **P**(*τ*) is nearly singular (Fig. 8). The corresponding eigenvector
has a value near 1 in position 5 and is near zero elsewhere. Because
numerous accurate observations were made of *x*_{5}(*t*), its uncertainty
almost vanishes for that element, and a weighting of values by **P**(t)^{−1} gives it a nearly infinite weight at that time.

## 3.5 A fixed interval smoother

The Kalman filter and various approximations to it produce an estimate at
any time, *τ*, taking account of data only at *τ* or in the
past – with an influence falling as the data recede into the past at a rate dependent on the
model timescales. But in many problems, such as those addressed (for one example) by the Estimating the Circulation and Climate of the Ocean (ECCO) project
(Stammer et al., 2002; Wunsch and Heimbach, 2007),
the goal is to find a best estimate over a finite interval, nominally, $\mathrm{0}\le t\le {t}_{\text{f}}$, and accounting for all of the observations, whether past or future
to any *τ*. Furthermore, as already noted above, physical sense requires
satisfaction of the generalized (to account for sources and sinks) energy, mass,
and other important conservation rules. How can that be done?

In distinction to the filtering goal
underlying a KF best prediction, the fixed interval problem is generally
known as that of smoothing. Several
approaches exist. One of the most interesting, and one leading to the
ability to parse data versus model structure impact over the whole interval,
is called the Rauch–Tung–Striebel (RTS) smoothing algorithm. In that
algorithm, it is assumed that a true KF has already been run over the full
time interval and that the resulting $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\right),$ **P**(t), $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(\mathit{t}\mathbf{,}\mathbf{-}\right)$, and $\mathbf{P}\left(t,-\right)$ remain available. The basic idea is subsumed in the
algorithm

with

This new estimate, using data *future* to *t*, depends upon a weighted
average of the previous best estimate $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(t\right)$ with its
difference between the original pure prediction, $\stackrel{\mathrm{\u0303}}{\mathit{x}}(t+\mathrm{\Delta}t,-)$, and the improvement (if any) made of the later estimate at *t*+Δ*t*. The latter uses any data that occurred *after* that time. Thus a
backwards-in-time recursion of Eq. (19) is done – starting from the
best estimate at the final KF time, *t*=*t*_{f}, beyond which no future data
occur. The RTS coefficient matrix, **L**(*t*+Δ*t*), has a particular
structure accounting for the correlation between $\stackrel{\mathrm{\u0303}}{\mathit{x}}(t+\mathrm{\Delta}t,+)$ and $\stackrel{\mathrm{\u0303}}{\mathit{x}}(t+\mathrm{\Delta}t,-)$ generated by the KF.
Equation (A5) calculates the new uncertainty, $\mathbf{P}\left(t,+\right)$.

In this algorithm a correction is also necessarily made to the initial assumptions
concerning ** q**(t), producing a new set of vector
forcings, $\stackrel{\mathrm{\u0303}}{\mathit{q}}\left(\mathit{t}\mathbf{,}\mathbf{+}\right)=\mathit{q}\left(\mathit{t}\right)+\stackrel{\mathrm{\u0303}}{\mathit{u}}\left(t\right)$, such that the new estimate, $\stackrel{\mathrm{\u0303}}{\mathit{x}}(t,+)$, exactly satisfies Eq. (1) with $\stackrel{\mathrm{\u0303}}{\mathit{q}}$ at

*all*times over the interval. If the true model satisfies energy conservation, so will the new estimate. That the smoothed estimate satisfies the free-running but adjusted model parameters and thus all of its implied conservation laws is demonstrated by Bryson and Ho (1975) in Chap. 13. Estimated $\stackrel{\mathrm{\u0303}}{\mathit{u}}\mathbf{(}\mathit{t}\mathbf{,}\mathbf{+}\mathbf{)}$, often called the “control vector”, has its own computable uncertainty found from Eq. (A5b). In many problems, improved knowledge of the forcing field and/or boundary conditions may be equally as or more important than improvement of the state vector. Application of the RTS algorithm is made in the following section to a slightly more geophysical example.

Consider the smoothing problem in a geophysical fluid dynamics toy model. Realism is still not the goal, which remains as the demonstration of various elements making up estimates in simplified settings.

## 4.1 Rossby Wave normal modes

A flat-bottom, linearized *β*-plane Rossby wave system has a two-dimensional governing equation for the streamfunction, *ψ*,

in a square *β*-plane basin of horizontal dimension *L*. The parameter $\mathit{\beta}=\mathrm{d}f/\mathrm{d}y$ is the variation of the Coriolis parameter, *f*, with the latitude coordinate, *y*.
This problem is representative of those involving both space and time
structures, including boundary conditions (spatial variables *x*,*y* should
not be confused with the state vector or data variables). Equation (21) and other geophysically important ones are not self-adjoint, and
the general discussion of quadratic invariants inevitably leads to adjoint
operators (see Morse and Feshbach, 1953, or for bounding problems, see
Sewell et al. (1987), Chap. 3, 4).

The closed-basin problem was considered by Longuet-Higgins (1964). Pedlosky (1965) and LaCasce (2002) provide helpful
discussions of normal modes. Relevant real observational data are discussed
by Luther (1982), Woodworth et al. (1995), Ponte (1997), Thomson and Fine (2021), and others. The
domain here is $\mathrm{0}\le x\le L$ and $\mathrm{0}\le y\le L$ with the boundary condition *ψ*=0 on all four boundaries.

Introduce non-dimensional primed variables ${t}^{\prime}=ft$, $x=L{x}^{\prime}$, $y=L{y}^{\prime}$, and ${\mathit{\psi}}^{\prime}=({a}^{\mathrm{2}}/f)\mathit{\psi}$. Letting *a* be the radius of the Earth and $\mathit{\beta}={\mathit{\beta}}^{\prime}f/a=\mathrm{1.7}$, Eq. (21) is non-dimensionalized as

Further choosing *L*=*a* (since *a* is of the same order as the width of the Pacific) and then omitting the primes from here on except for *β*^{′},

Haier et al. (2006) describe numerical solution methods that specifically conserve invariants, but these methods are not used here. Gaspar and Wunsch (1989) employed this system for a demonstration of sequential estimation with altimetric data. Here a different state vector is used.

An analytical solution to Eq. (23) is

along with the dispersion relation,

where *c*_{nm} is a coefficient dependent only upon initial conditions in
the unforced case.

A state vector is then

where *p* is a linear ordering of *n* and *m*. Total dimension is then *N*⋅*M*, with *N* and *M* the upper limits in Eq. (24). State transition can be written
in the now familiar form

For numerical examples with the KF and RTS smoother, a random forcing *q*_{j}(*t*) is introduced at every step so that

Note that with the introduction of a forcing the solution described by coefficients *c*_{p} does not strictly satisfy Eq. (23) but is rather a simplified version of a forced solution. Discussion of this dynamical system is still useful for understanding the difficulties that arise in numerical data assimilation. An example of a true forced solution to Eq. (21) is examined in Pedlosky (1965).

The problem is now made a bit more interesting by addition of a *steady* component to *ψ*: namely, the solution *ψ*_{s}(*x*,*y*) from Stommel (1948), whose governing equation is

where *R*_{a} is a Rayleigh friction coefficient.

An approximate solution, written in the simple boundary layer interior form is (e.g., Pedlosky, 1965)

which leads to a small error in the eastern boundary condition (numerical
calculations that follow used the full Stommel, 1948, solution). The sin *π**y* arises from Stommel's assumed time-independent wind stress curl.

The new state vector becomes

now of total dimension $N\cdot M+\mathrm{1}$.

The state transition matrix **A** is diagonal with the first *N*⋅*M* diagonal elements given by diag(exp (−*i**σ*_{j}Δ*t*)) and the final diagonal element ${\mathbf{A}}_{N\cdot M+\mathrm{1},N\cdot M+\mathrm{1}}=\mathrm{1}$ the overall square of dimension $N\cdot M+\mathrm{1}$. A small numerical
dissipation is introduced, multiplying **A** by exp (−*b*) for *b*>0,
to accommodate loss of memory, e.g., as a conventional Rayleigh dissipation. The constant *b* is chosen as $\mathrm{1.8}\times {\mathrm{10}}^{-\mathrm{3}}$ in the numerical code. The matrix operator **B** is diagonal with the first *N*⋅*M* diagonal elements all equal to 1, and ${B}_{N\cdot M+\mathrm{1},N\cdot M+\mathrm{1}}=\mathrm{0}$ (no forcing is added to the steady solution). Some special care in computing covariances must be taken when using complex
state vectors and transition matrices (Schreier and Scharf, 2010).

Consistent with the analysis in Pedlosky (1965), no westward intensification exists in the normal modes, which decay as a whole. Rayleigh friction of the time-dependent modes is permitted to be different from that in the time-independent mean flow – a physically acceptable situation.

If ** q**(t)=0 and with no dissipation, then Eq. (23) has several useful conservation
invariants including the quadratic invariants of the kinetic energy and of the
variance in

*ψ*,

(conjugate transpose), as well as the linear invariant of the vorticity or circulation – when integrated over the entire basin domain. As above, estimates of the quadratic and linear conservation rules will depend explicitly on the initial conditions, forces, distribution, and accuracy of the data, as well as the covariances and bias errors assigned to all of them.

### 4.1.1 System with observations

Using the KF plus RTS smoother for sequential estimation, estimates of Φ(*t*) as well as the transport of the western boundary current (WBC), *T*_{WBC}(t), are calculated; the latter is constant in time, although that is unknown to the analyst. Random noise in *T*_{WBC}(t) exists from both physical noise – the normal modes – and that of
the observations ** y**(

*t*_{j}), as would be the situation in nature.

Equation (1) with the above **A** and **B** is used to generate the true fields. Initial conditions, ** x**(0), in the modal components are

Modal periods are shown in Fig. 9. Parameters are fixed as Δ*t*=29, $b=\mathrm{1.8}\cdot {\mathrm{10}}^{-\mathrm{3}}$, and *f*=30, and the random forcing *q*_{j} has standard deviation 0.002.

The prediction model uses 0.5*q*_{p}(*t*) as a first guess for *q*_{p}(t), where *q*_{p}(*t*) represents the true random forcing values, and the initial conditions are too large as 1.5** x**(0). Noisy observations

*y***(**

*t***)**are assumed to exist at the positions denoted with red dots in Fig. 10, and the measurement noise has a standard deviation of 0.001.

The field *ψ*(*t*=167Δ*t*) as given by the true model is
shown in Fig. 10, keeping in mind that apart from
the time mean *ψ*, the structure seen is the result of a particular set
of random forcings.

## 4.2 Aliasing

In isolation, the observations will time-alias the field if not taken at a minimum interval of $\mathrm{1}/\mathrm{2}$ the shortest period present, here 4Δ*t*. A
spatial alias occurs if the separation between observations
is larger than one-half of
the shortest wavelength present (here $\mathrm{\Delta}y=\mathrm{1}/\mathrm{9})$. Both these phenomena
are present in what follows, but their impact is minimized by the presence
of the time evolution model.

## 4.3 Results: Kalman filter and RTS smoother

For the KF and RTS algorithms the model is run for *t*_{f}=2000 time steps with the above parameters.

### 4.3.1 Energy estimates

A KF estimate is computed and the results stored. As in Sect. 3.1.1, observations are introduced in
two intervals, each with a different density of observations: initially data
are introduced with 50Δ*t* between them and subsequently
reduced to 25Δ*t* between observations. Observations cease prior to *T*_{f}, mimicking a pure prediction interval following the observations.

The estimated values of the quadratic Φ(t) are shown in
Fig. 11 for the true values, KF estimates, RTS smoother estimates, and the pure model prediction. The KF and prediction estimates agree until the first observation time, at which point a clear discontinuity is seen. As additional observations accumulate, Φ_{KF}(t) jumps by varying amounts depending upon the
particulars of the observations and their noise. Over the entire observation
interval the energy reconstructed by the KF remains low – a systematic error
owing to the sparse observations and null space of **E**. Here the forcing amplitude overall dominates the effects of the incorrect
initial conditions. Uncertainty estimates for energy would once again come
from summations of correlated *χ*^{2} variables of differing means. In
the present case, important systematic errors are visible as the offsets
between the curves in Fig. 11.

This system can theoretically be overdetermined by letting the number of
observations at time *t* exceed the number of unknowns – should the null
space of **E**(t) then vanish. As expected, with 14
covarying observations and 18 time-varying unknowns *x*_{i}(t),
rank-12 time-independent **E**(t)=**E** has a
null space, and thus energy in the true field is missed even if the
observations are perfect. As is well-known in inverse methods, the
smaller eigenvalues of **E** and their corresponding eigenvectors are
most susceptible to noise biases. The solution null space of this particular **E**(t) is found from the solution eigenvectors of the
singular value decomposition, $\mathbf{U}\mathrm{\Lambda}{\mathbf{V}}^{T}=\mathbf{E}.$
The solution resolution matrix at rank *K*=12, ${\mathbf{V}}_{K}{\mathbf{V}}_{K}^{T},$ is shown in Fig. 12. Thus the observations carry no
information about modes (as ordered) 3, 6, 9, 12, 15, and 18. In a real
situation, if control over positioning of the observations was possible,
this result could sensibly be modified and/or a strengthening of the weaker
singular values could be achieved. Knowledge of the null space structure is
important in the interpretation of results.

A more general discussion of null spaces involves that of the weighted $\mathbf{P}\left(\mathit{\tau},-\right){\mathbf{E}}^{T}$ appearing in the Kalman gain
(Eq. 5). If $\mathbf{P}{\left(\mathit{\tau},-\right)}^{\mathrm{1}/\mathrm{2}}$ is the Cholesky factor
of $\mathbf{P}\left(\mathit{\tau},-\right)$ (Wunsch, 2006, page 56),
then $\mathbf{EP}{\left(\mathit{\tau},-\right)}^{\mathrm{1}/\mathrm{2}}$ is the conventional
column weighting of **E** at time *τ*, and the resolution
analysis would be applied to that combination. A diagonal **A**
does not distribute information from any covariance amongst the elements *x*_{j}(*τ*) and which would be carried in $\mathbf{P}\left(\mathit{\tau},-\right)$.

Turning now to the RTS smoother, Fig. 11 shows that
the energy in the smoothed solution, Φ_{RTS}(t), is
continuous (up to the model time-stepping changes). The only
information available to the prediction prior to the observational interval lies in the initial conditions, which were given incorrect values, leading to an initial uncertainty. Estimated unknown elements **u**(t), the control vector of ** q**(t), in this interval also have a large variance.

One element through time of the
estimated control vector $\stackrel{\mathrm{\u0303}}{\mathbf{u}}\left(t\right)$ and its
standard error are shown in Fig. 13. The complex
result of the insertion of data is apparent. As with the KF, discussion of
any systematic errors has to take place outside of the formalities leading
to the smoothed solution. This RTS solution does conserve **Φ**(t) as well as other properties (circulation).

### 4.3.2 Western boundary current estimates

Consider now determination of ${\stackrel{\mathrm{\u0303}}{T}}_{\text{WBC}}\left(t\right)$, the north–south
transport across latitude *y*_{0} at each time step, whose true value is
constant. *T*_{WBC}(t) is computed from the velocity or stream
function as

the stream-function difference between a longitude pair, $x=\mathrm{0},\phantom{\rule{0.25em}{0ex}}{x}_{\mathrm{0}}$. From
the boundary condition, $\mathit{\psi}\left(t,\mathrm{0},{y}_{\mathrm{0}}\right)=\mathrm{0}$. The
horizontal line segment in Fig. 10 indicates the
location of the zonal section for the experiment at *y*_{0}=0.5, extending
from *x*=0 to *x*_{0}=0.2. In the present context, five different values of *T*_{WBC}(t) are relevant: (a) the true, steady, time-invariant
value, computed from the Stommel solution; (b) the true time-dependent value including mode contributions
from Eq. (27); (c) the estimated
value from the prediction model; (d) the estimate from the KF; and (e) the
estimate from the RTS smoother. Figure 14a
displays the transport computed from the Stommel solution *ψ*_{s}
alongside the transport computed from the KF estimate. Panel (b) in Fig. 14 displays the same steady transport alongside the RTS estimate. Values here are dominated by the variability induced by the normal modes, leading to
a random walk. Note that the result can depend sensitively on positions *x*_{0} and *y*_{0}, as well as the particular spatial structure of any given normal
mode.

In the KF reconstruction (Fig. 14a),
observations move the WBC transport values closer to the steady solution, seen via the
jump at *t*=400, but remain noisy.
Transport value uncertainties are derived from the **P** of the state vector using Eq. (A1)
and shown in Fig. 14c. Within the observation interval the
estimates are indistinguishable from the true value but still have a wide uncertainty with timescales present from both the natural variability and the regular injection times of the data. The magnitude of the uncertainty during the observation intervals is still roughly 10 % of the magnitude of the KF estimate.

Figure 14b shows the behavior of the estimate of *T*_{WBC}(t) after the RTS smoother has been applied. Most noticeably, the discontinuity that occurs at the onset of the observations has been removed. A test of the null hypothesis that the transport computed from the RTS smoother was indistinguishable from a steady value is based upon an analysis using the uncertainty (not shown).

The very large uncertainty prior to the onset of data, even with use of a smoothing algorithm, is a central reason that the state estimate produced by the Estimating the Circulation and Climate of the Ocean (ECCO) project (e.g., Stammer et al., 2002; Fukumori et al., 2018) is confined to the interval following 1992 when the data become far denser than before through the advent of ocean-dedicated satellite altimetry and nascent Argo arrays. Estimates prior to a dense data interval depend greatly upon the time durations built into the system, which in the present case are limited by the longest normal mode period. The real ocean does include some very long memory (Wunsch and Heimbach, 2008), but the estimation skill will depend directly on the specific physical variables of concern (ECCO estimates are based upon a different algorithm using iterative least-squares and Lagrange multipliers; see Stammer et al., 2002). For a linear system, results are identical to those using a sequential smoother (see Fukumori et al., 2018).

Some understanding of the impact via the smoother of later observations on KF time estimates
can be found from the operator **L**(t). Figure 15 shows the norm of the operator **L** (Eq. 20) controlling the correction to earlier state estimates, along with
the time dependence of one of its diagonal elements. As always, the temporal
structure of **L**(t) depends directly upon the time
constants embedded in **A** and the compositions of **P**(t) and $\mathbf{P}\left(t+\mathrm{\Delta}t,-\right)$. In turn, the
latter are determined by any earlier information, including initial
conditions, as well as the magnitudes and distributions of later forcing and
data accuracies. Generalizations are not easy.

The norm of the gain matrix **M**(t), used for computation of the control vector in Eq. (A4), provides a measure of its importance relative to the prior estimate and is displayed in Fig. 16. Here the dependence is directly upon the a priori
known control variance ** q**(t), the data distributions, and $\mathbf{P}\left(t+\mathrm{\Delta}t,-\right)$. The limiting cases discussed above for the state vector also provide insights here.

### 4.3.3 Spectra

Computation of the Fourier spectral estimates of the various estimates of any state
vector element or combination is straightforward, and the *z* transforms in the
Appendix provide an analytic approach. What is
not so straightforward is the interpretation of the result in this
non-statistically stationary system. Care must be taken to account for the
nonstationarity.

In sequential estimation methods, the behavior of dynamical system invariants and conservation laws, including energy or circulation or scalar inventories, or derived ones such as a thermocline depth, depend, as shown using toy models, upon a number of parameters. These parameters include the timescales embedded in the dynamical system, the temporal distribution of the data used in the sequential estimation relative to the embedded timescales, and the accuracies of initial conditions, boundary conditions, sources and sinks, and data, as well as the accuracy of the governing time evolution model. Errors in any of these parameters can lead to physically significant errors in estimates of the state and control vectors as well as any quantity derived from them.

Estimates depend directly upon the accuracies of the assumed and calculated uncertainties in all of the elements making up the estimation system. Impacts of data insertions can range from very short time intervals to those extending far into the future.
Because of model–data interplay, the only easy generalization
is that the user must check the accuracies of all of these elements,
including the appearance of systematic errors in any of them
(e.g., Dee, 2005), or of periodicities arising solely from data distributions.
When feasible, a strong clue to the presence of
systematic errors in energies, as one example, lies in determining the
null space of the observation matrices coupled with the structure of the
state evolution matrices, **A**. Analogous examples have been
computed for advection–diffusion systems (not shown, but see Marchal and Zhao, 2021) with results concerning, in one example, estimates
of fixed total tracer inventories.

Use of Kalman filters, and the simple analogs most often used in weather prediction or “reanalyses”, produces results that are always suboptimal when the goal is reconstruction because data future to the estimate have not been used. A second consequence is the failure of the result to satisfy any particular dynamical time evolution model, implying loss of energy, mass, etc. conservation laws. In the Rossby wave example, reconstructions of the constant western boundary current transport improved as expected from that of the KF by using *future* data and the Rauch–Tung–Striebel (RTS) smoother. Estimates can nonetheless still contain large uncertainties – quantifiable from the accompanying algorithmic equations.

All possible error sources have not been explored. In particular, only linear systems were analyzed, assuming exact knowledge of the state transition matrix, **A**, and the data distribution matrix, **B**. Notation was simplified by using only time-independent versions of them. Nonlinear problems arising from errors in **A** and **B** will be described elsewhere. Other interesting nonlinear problems
include those wherein the observations are derived quantities of the state vector,
or the observations – such as a speed – are
nonlinear in the state vector.

Unless, as in weather prediction, short-term predictions are almost immediately testable by comparison with the observed outcome, physical insights into the system behavior are essential, along with an understanding of the structure of the imputed statistical relationships. As considerable literature cited above has made clear, the inference of trends in properties and understanding of the physics (or biology or chemistry) in the presence of time-evolving observation systems require particular attention. At a minimum, one should test any such system against the behavior of a known result – for example, treating a general circulation model as the “truth” and then running the smoothing algorithms to test whether that truth is forthcoming.

The RTS algorithm is only one choice from several approaches to the finite interval estimation problem. Alternatives include the least-squares and/or Lagrange multiplier approach of the ECCO project, which in the linear case can be demonstrated to produce identical results. This approach guarantees exact dynamical and kinematic consistency of the state estimate (Stammer et al., 2002; Stammer et al., 2016; Wunsch and Heimbach, 2007), a key requirement when seeking physical understanding of the results. Such consistency is ensured by restricting observation-induced updates to those that are formally independent inputs to the conservation laws, i.e., initial, surface, or – where relevant – lateral boundary conditions. This consistency ensures no artificial source or sink terms in the conservation equations. The ECCO project has conducted this approach with considerable success over the past 2 decades and demonstrated the merits of accurate determination of heat, freshwater, and momentum budgets and their constituents (Heimbach et al., 2019). The Lagrange multiplier framework similarly provides a general inverse modeling framework, which addresses several estimation problems either separately or jointly: (i) inference of optimal initial conditions, such as produced by incremental so-called four-dimensional variational data assimilation practiced in numerical weather prediction; (ii) inference of updated (or corrected) boundary conditions, such as practiced in flux inversion methods; (iii) inference of optimal model parameters, such as done in formal parameter calibration problems; or (iv) any combination thereof. More details of this more general framework in the context of similar toy models will be presented in a sequel paper.

## A1 Kalman filter

The model state transition equation is that in Eq. (1), and the
weighted averaging equation is Eq. (4), with the gain matrix **K**(t) defined in Eq. (5). Time
evolution of the covariance matrix of $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(t\right)$
is governed by

and

with **E** and **R** defined in the text. The matrix symbol **Γ** is
introduced for a situation in which the control distribution over the state
differs from that in **B**. Otherwise they are identical. Because **P** is square of the state vector length, calculating it is normally
the major computational burden in the use of a Kalman filter.

Under some circumstances in which a system including observation injection
reaches a steady state, the time index may be omitted in both the KF and the
RTS smoother. Time independence is commonly assumed when the rigorous formulation for the
KF is replaced by an ad hoc constant gain matrix **K**.

## A2 RTS smoother

In addition to Eqs. (19) and (20), the Rauch–Tung–Striebel smoother estimates

for the updated control **u**(t); ** q**(

*t*) is the assumed covariance of

**u**(t) (the uncertainty in

**(t)), and**

*q***Γ**is again often equal to

**B**. Then the corresponding uncertainties of the smoothed estimates are

One can gain insight into this filter–smoother machinery by considering its operation on a scalar state vector with scalar observations (not shown here).

## B1 KF response

The impact at other times of having data at time *t* can lend important
physical insight into the sequential analyses. Define an innovation vector,

that is, *d*_{δ} is a vector of Kronecker deltas representing
the difference ${D}_{ij}\left(\mathit{\tau}\right)={\mathit{\delta}}_{t,\mathit{\tau}}{\mathit{\delta}}_{ij}={y}_{j}\left(\mathit{\tau}\right)-{\sum}_{r}{E}_{ir}\left(\mathit{\tau}\right){x}_{r}\left(\mathit{\tau}\right)$. Solutions to the
innovation Eq. (4)
are the columns of the Green function matrix,

**K**, now fixed in time, is sought as an indication of delta
impulse effects of observations on the prediction model at time *τ*.

Define the scalar complex variable,

where *s* is the frequency. Then the discrete Fourier transform of Eq. (B7) (the *z* transform – a matrix polynomial in *z*) is

The norm of the variable ${\left(\mathbf{I}-z\mathbf{A}\right)}^{-\mathrm{1}}$
defines the “resolvent” of **A**
in the full complex plane (see Trefethen and Embree, 2005), but here, only $\left|z\right|=\mathrm{1}$ is of direct interest: that is, only on the unit circle. The full
complex plane carries information about the behavior of **A**,
including stability.

Here $\widehat{\mathbf{D}}\left(z\right)=\mathbf{I}{z}^{\mathit{\tau}}$ and

If a suitably defined norm of **A** is less than 1,

and the solution matrix in time is the causal vector sequence (no
disturbance before *t*=*τ*) of columns of

where **G** can be obtained without the *z* transform, but the frequency
content of these results is of interest.

## B2 Green function of smoother innovation

As with the innovation equation for filtering, Eq. (19) introduces
a disturbance into the previous estimate, $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left(t\right)$, in which the structure of **L**(t) determines the
magnitude and timescales of observational disturbances propagated *backwards* in time. It provides direct insight into the extent to which later
measurements influence earlier ones. As an example, suppose that the KF has
been run to time *T*_{f} so that $\stackrel{\mathrm{\u0303}}{\mathit{x}}\left({T}_{\text{f}},+\right)=\stackrel{\mathrm{\u0303}}{\mathit{x}}\left({T}_{\text{f}}\right)$, which is the only measurement. Let
the innovation, $\stackrel{\mathrm{\u0303}}{\mathit{x}}({T}_{\text{f}},+)-\stackrel{\mathrm{\u0303}}{\mathit{x}}({T}_{\text{f}},-)$,
be a matrix of *δ* functions in separate columns,

Then a backwards-in-time matrix Green function is

The various timescales embedded in **L** depend upon those in $\mathbf{A},\mathbf{P}\left(t,-\right)$, and **P**(t), as well as with many
observations including those of the observation intervals and any structure
in the observational noise. Similarly, the control modification will be
determined by $\mathbf{P}(t+\mathrm{\Delta}t,-{)}^{-\mathrm{1}}$ if ** q**(

*t*)

**Γ**(

*t*)

^{T}values are constant in time.

MATLAB codes used here are available directly from Sarah Williamson.

No data sets were used in this article.

CW designed the study, performed initial calculations, and did the initial writing. Calculations were fully redone and updated by SW. PH advised on, provided updates to, and checked accuracies of the paper.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Work by Carl Wunsch was originally done at home during the Trump–Covid apocalypse period. We would like to thank the two anonymous reviewers for providing detailed, helpful comments on the first copy of this paper. Detlef Stammer provided many useful suggestions for an earlier version of the paper.

Supported in part by NASA grant ECCO Connections (via JPL/Caltech sub-awards) and NSF CSSI grant no. 2103942 (DJ4Earth).

This paper was edited by Ismael Hernández-Carrasco and reviewed by two anonymous referees.

Bengtsson, L., Hagemann, S., and Hodges, K. I.: Can climate trends be calculated from reanalysis data?, J. Geophys. Res.-Atmos., 109, https://doi.org/10.1029/2004JD004536, 2004. a

Bengtsson, L., Arkin, P., Berrisford, P., Bougeault, P., Folland, C. K., Gordon, C., Haines, K., Hodges, K. I., Jones, P., Kallberg, P., et al.: The need for a dynamical climate reanalysis, B. Am. Meteorol. Soc., 88, 495–501, 2007. a

Boers, N.: Observation-based early-warning signals for a collapse of the Atlantic Meridional Overturning Circulation, Nat. Clim. Change, 11, 680–688, 2021. a

Bromwich, D. H. and Fogt, R. L.: Strong trends in the skill of the ERA-40 and NCEP–NCAR reanalyses in the high and midlatitudes of the Southern Hemisphere, 1958–2001, J. Climate, 17, 4603–4619, 2004. a

Bryson, A. E. and Ho, Y.-C.: Applied optimal control, revised printing, Hemisphere, New York, https://doi.org/10.1201/9781315137667, 1975. a, b

Carton, J. A. and Giese, B. S.: A reanalysis of ocean climate using Simple Ocean Data Assimilation (SODA), Mon. Weather Rev., 136, 2999–3017, 2008. a

Cohn, S. E.: The principle of energetic consistency in data assimilation, in: Data Assimilation, Springer, 137–216, 2010. a, b

Davies, R. B.: Algorithm AS 155: The distribution of a linear combination of
*χ* 2 random variables, Appl. Stat., 29, 323–333, https://doi.org//10.2307/2346911, 1980. a

Dee, D. P.: Bias and data assimilation, Q. J. Roy. Meteor. Soc., 131, 3323–3343, 2005. a, b, c, d

Fukumori, I., Heimbach, P., Ponte, R. M., and Wunsch, C.: A dynamically consistent, multivariable ocean climatology, B. Am. Meteorol. Soc., 99, 2107–2128, 2018. a, b

Gaspar, P. and Wunsch, C.: Estimates from altimeter data of barotropic Rossby waves in the northwestern Atlantic Ocean, J. Phys. Ocean., 19, 1821–1844, 1989. a

Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., et al.: The modern-era retrospective analysis for research and applications, version 2 (MERRA-2), J. Climate, 30, 5419–5454, 2017. a

Goldstein, H.: Classical Mechanics, Addison-Wesley, Reading, Mass, ISBN 9780201657029, 1980. a

Goodwin, G. and Sin, K.: Adaptive Filtering Prediciton and Control, Prentice-Hall, Englewood Cliffs, N.J., ISBN 10 013004069X, 1984. a

Haier, E., Lubich, C., and Wanner, G.: Geometric Numerical integration: structure-preserving algorithms for ordinary differential equations, Springer, ISBN 978-3-540-30663-4, 2006. a

Heimbach, P., Fukumori, I., Hill, C. N., Ponte, R. M., Stammer, D., Wunsch, C., Campin, J.-M., Cornuelle, B., Fenty, I., Forget, G., et al.: Putting it all together: Adding value to the global ocean and climate observing systems with complete self-consistent ocean state and parameter estimates, Front. Mar. Sci., 6, https://doi.org/10.3389/fmars.2019.00055, 2019. a

Hu, S., Sprintall, J., Guan, C., McPhaden, M. J., Wang, F., Hu, D., and Cai, W.: Deep-reaching acceleration of global mean ocean circulation over the past two decades, Sci. Adv., 6, eaax7727, https://doi.org/10.1126/sciadv.aax7727, 2020. a

Imhof, J.-P.: Computing the distribution of quadratic forms in normal variables, Biometrika, 48, 419–426, 1961. a

Janjić, T., McLaughlin, D., Cohn, S. E., and Verlaan, M.: Conservation of mass and preservation of positivity with ensemble-type Kalman filter algorithms, Mon. Weather Rev., 142, 755–773, 2014. a

LaCasce, J.: On turbulence and normal modes in a basin, J. Mar. Res., 60, 431–460, 2002. a

Longuet-Higgins, H. C.: Planetary waves on a rotating sphere, P. Roy. Soc. A-Math. Phy., 279, 446–473, 1964. a

Luther, D. S.: Evidence of a 4–6 day barotropic, planetary oscillation of the Pacific Ocean, J. Phys. Ocean., 12, 644–657, 1982. a

Marchal, O. and Zhao, N.: On the Estimation of Deep Atlantic Ventilation from Fossil Radiocarbon Records, Part II: (In) consistency with Modern Estimates, J. Phys. Ocean., 51, 2681–2704, 2021. a

McCuskey, S.: An Introduction to Advanced Dynamics, Addison-Wesley, Reading, Mass., ISBN 0201045605, 1959. a

Morse, P. and Feshbach, H.: Methods of Theoretical Physics, McGraw-Hill, New York, ISBN 10 007043316X, 1953. a

Pedlosky, J.: A study of the time dependent ocean circulation, J. Atmos. Sci., 22, 267–272, 1965. a, b, c, d

Ponte, R. M.: Nonequilibrium response of the global ocean to the 5-day Rossby–Haurwitz wave in atmospheric surface pressure, J. Phys. Ocean., 27, 2158–2168, 1997. a

Schreier, P. J. and Scharf, L. L.: Statistical signal processing of complex-valued data: the theory of improper and noncircular signals, Cambridge university press, ISBN 1139487620, 2010. a

Sewell, M. J., Crighton, C., et al.: Maximum and minimum principles: a unified approach with applications, Vol. 1, CUP Archive, ISBN 0521332443, 1987. a

Sheil, J. and O'Muircheartaigh, I.: Algorithm AS 106: The distribution of non-negative quadratic forms in normal variables, J. R. Stat. Soc., 26, 92–98, 1977. a

Stammer, D., Wunsch, C., Giering, R., Eckert, C., Heimbach, P., Marotzke, J., Adcroft, A., Hill, C., and Marshall, J.: Global ocean circulation during 1992–1997, estimated from ocean observations and a general circulation model, J. Geophys. Res.-Oceans, 107, https://doi.org/10.1029/2001JC000888, 2002. a, b, c, d

Stammer, D., Balmaseda, M., Heimbach, P., Köhl, A., and Weaver, A.: Ocean data assimilation in support of climate applications: status and perspectives, Annu. Rev. Mar. Sci., 8, 491–518, 2016. a

Stengel, R. F.: Stochastic optimal control: theory and application, John Wiley & Sons, Inc., ISBN 10 0471864625, 1986. a

Stommel, H.: The westward intensification of wind-driven ocean currents, EOS T. Am. Geophys. Un., 29, 202–206, 1948. a

Strang, G.: Computational science and engineering, Vol. 791, Wellesley-Cambridge Press Wellesley, ISBN-10 0961408812, 2007. a

Thomson, R. E. and Fine, I. V.: Revisiting the ocean’s nonisostatic response to 5-day atmospheric loading: New results based on global bottom pressure records and numerical modeling, J. Phys. Ocean., 51, 2845–2859, 2021. a

Thorne, P. and Vose, R.: Reanalyses suitable for characterizing long-term trends, B. Am. Meteorol. Soc., 91, 353–362, 2010. a

Woodworth, P., Windle, S., and Vassie, J.: Departures from the local inverse barometer model at periods of 5 days in the central South Atlantic, J. Geophys. Res.-Oceans, 100, 18281–18290, 1995. a

Wunsch, C.: Discrete inverse and state estimation problems: with geophysical fluid applications, Cambridge University Press, https://doi.org/10.1017/CBO9780511535949, 2006. a, b, c, d, e, f

Wunsch, C.: Is the ocean speeding up?, Ocean surface energy trends, J. Phys. Ocean., 50, 3205–3217, 2020. a

Wunsch, C. and Heimbach, P.: Practical global oceanic state estimation, Physica D, 230, 197–208, 2007. a, b

Wunsch, C. and Heimbach, P.: How long to oceanic tracer and proxy equilibrium?, Quat. Sci. Rev., 27, 637–651, 2008. a

- Abstract
- Introduction
- Notation and some generic inferences
- Example 1: mass–spring oscillators
- Example 2: barotropic Rossby waves
- Discussion and conclusions
- Appendix A: Notation and equations
- Appendix B: Green function analysis of estimates
- Code availability
- Data availability
- Author contributions
- Competing interests
- Disclaimer
- Acknowledgements
- Financial support
- Review statement
- References

climateincludes all sub-elements (ocean, atmosphere, ice, etc.). A common form of combination arises from sequential estimation theory, a methodology susceptible to a variety of errors that can accumulate through time for long records. Using two simple analogs, examples of these errors are identified and discussed, along with suggestions for accommodating them.

- Abstract
- Introduction
- Notation and some generic inferences
- Example 1: mass–spring oscillators
- Example 2: barotropic Rossby waves
- Discussion and conclusions
- Appendix A: Notation and equations
- Appendix B: Green function analysis of estimates
- Code availability
- Data availability
- Author contributions
- Competing interests
- Disclaimer
- Acknowledgements
- Financial support
- Review statement
- References