Observation operators (OOs) are a central component of any data assimilation system. As they project the state variables of a numerical model into the space of the observations, they also provide an ideal opportunity to correct for effects that are not described or are insufficiently described by the model. In such cases a dynamical OO, an OO that interfaces to a secondary and more specialised model, often provides the best results. However, given the large number of observations to be assimilated in a typical atmospheric or oceanographic model, the computational resources needed for using a fully dynamical OO mean that this option is usually not feasible. This paper presents a method, based on canonical correlation analysis (CCA), that can be used to generate highly efficient statistical OOs that are based on a dynamical model. These OOs can provide an approximation to the dynamical model at a fraction of the computational cost.

One possible application of such an OO is the modelling of the diurnal cycle of sea surface temperature (SST) in ocean general circulation models (OGCMs). Satellites that measure SST measure the temperature of the thin uppermost layer of the ocean. This layer is strongly affected by atmospheric conditions, and its temperature can differ significantly from the water below. This causes a discrepancy between the SST measurements and the upper layer of the OGCM, which typically has a thickness of around 1 m. The CCA OO method is used to parameterise the diurnal cycle of SST. The CCA OO is based on an input dataset from the General Ocean Turbulence Model (GOTM), a high-resolution water column model that has been specifically tuned for this purpose. The parameterisations of the CCA OO are found to be in good agreement with the results from the GOTM and improve upon existing parameterisations, showing the potential of this method for use in data assimilation systems.

Data assimilation (DA) strives to improve the forecast skill of a numerical model by combining the model with observations. Observations are incorporated into the model by applying a series of corrections to the internal state of the model. As the state variables of a numerical model are usually not observed directly, this procedure requires an observation operator (OO) to project the model state variables onto the variable that is observed. The difference between the observation and the model prediction, the so-called innovation, forms the basis for calculating the correction to the model state. The accuracy of the OO is paramount in this process: any bias in the projection will lead to a bias in the innovation and therefore result in a biased correction to the model state. For this reason, bias correction procedures have been built that consider systematic errors not only in observations but also in observation operators (see e.g.

Many different types of OO exist. In its simplest form, an OO could just select one of the state variables at a point near the observation or, perhaps, perform an interpolation. More complex OOs may include corrections for processes that influence the observation but are not modelled or are insufficiently modelled. Ultimately, one could even consider a dynamical OO that wraps a second numerical model to locally refine the results of the parent model. The latter solution may very well provide the most accurate results, but the vast number of observations that need to be assimilated in a typical atmospheric or oceanographic model means that this approach would require a prohibitive amount of computing resources. This limits OOs in most practical applications to relatively simple parameterisations in terms of the model state variables. Moreover, variational data assimilation requires observation operators to be linearised around the background within the inner loops (tangent-linear approximation). This translates into a need to construct OOs that can be formally and practically differentiated.
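The differentiability requirement can be illustrated with a minimal sketch. It assumes a purely linear OO represented by a hypothetical matrix `H` filled with random numbers; the dimensions, names, and values are illustrative only and are not taken from any DA system. For such an operator the tangent-linear is the matrix itself and the adjoint is its transpose, which the two standard consistency checks below confirm numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 10 model levels projected onto 2 observed quantities.
n_state, n_obs = 10, 2
H = rng.normal(size=(n_obs, n_state))  # illustrative linear OO, random values


def obs_operator(x):
    """Project a model state vector into observation space."""
    return H @ x


def tangent_linear(dx):
    """Tangent-linear of a linear operator: the operator itself."""
    return H @ dx


def adjoint(dy):
    """Adjoint of a linear operator: its transpose."""
    return H.T @ dy


x_background = rng.normal(size=n_state)
dx = rng.normal(size=n_state)
dy = rng.normal(size=n_obs)

# Tangent-linear test: H(x + dx) - H(x) must equal the tangent-linear applied to dx.
tl_error = np.max(
    np.abs(obs_operator(x_background + dx) - obs_operator(x_background) - tangent_linear(dx))
)

# Adjoint (dot-product) test: <H dx, dy> must equal <dx, H^T dy>.
adj_error = abs(np.dot(tangent_linear(dx), dy) - np.dot(dx, adjoint(dy)))

print(f"tangent-linear error: {tl_error:.2e}, adjoint error: {adj_error:.2e}")
```

For a non-linear OO the first check would only hold to first order in `dx`, which is why operators that reduce to a matrix multiplication are convenient in variational schemes.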

This paper presents a method of parameterising the results of a specialised model in such a way that it can be efficiently used within an OO. The parameterisation is based on canonical correlation analysis (CCA), a well-established mathematical method for finding cross-correlations between datasets. A new pseudo-dynamical OO is generated using the canonical correlation between the inputs and outputs of the specialised model on a large and representative dataset. Once this correlation has been calculated, the application of the pseudo-dynamical OO involves only a matrix multiplication that can be performed at a fraction of the computational cost of the dynamical OO. A similar method has been used previously to build reduced-order OOs in atmospheric data assimilation

This work is part of the SOSSTA (Statistical-dynamical observation Operator for SST data Assimilation) project, funded by the EU Copernicus Marine Environment Monitoring Service (CMEMS) through the Service Evolution grants. The aim of SOSSTA is to formulate an efficient OO for sea surface temperature (SST) DA that accounts for the diurnal variability of the ocean skin temperature. The results of the project are presented in multiple publications. The modelling of the diurnal cycle of SST is described in

The paper is organised as follows: Sect.

CCA

The structure of

The calculation of the matrices

The orthogonality requirement of Eq. (

As QR decomposition and SVD are common matrix operations that are efficiently implemented in most numerical libraries, this algorithm is straightforward to implement in most programming languages.
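As an illustration, the following NumPy sketch computes canonical correlations using one QR decomposition per dataset followed by a single SVD. It assumes the standard QR/SVD formulation of CCA and may differ in normalisation or notation from the equations of this paper; the function name `cca_qr_svd`, the variable names, and the synthetic data are all hypothetical.

```python
import numpy as np


def cca_qr_svd(X, Y):
    """Canonical correlation analysis via QR decomposition and SVD.

    X has shape (n_samples, p) and Y has shape (n_samples, q); both are
    assumed to have full column rank. Returns the weight matrices A and B
    and the canonical correlations r, such that the columns of Xc @ A and
    Yc @ B are orthonormal and (Xc @ A).T @ (Yc @ B) = diag(r), where Xc
    and Yc are the column-centred datasets.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Reduced QR: Q has orthonormal columns, R is square upper triangular.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)

    # The singular values of Qx^T Qy are the canonical correlations.
    U, r, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)

    # Map the singular vectors back to weights on the original variables.
    A = np.linalg.solve(Rx, U)
    B = np.linalg.solve(Ry, Vt.T)
    return A, B, np.clip(r, 0.0, 1.0)


# Synthetic demonstration data (not from any model): Y depends linearly on X.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
Y = X @ rng.normal(size=(4, 2)) + 0.1 * rng.normal(size=(500, 2))

A, B, r = cca_qr_svd(X, Y)
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
print("canonical correlations:", r)
```

In this sketch the canonical variates are normalised to unit norm rather than unit variance; rescaling by the square root of the number of samples minus one converts between the two conventions.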

The CCA method can be used to construct an OO. Let

Assuming that

During the training phase of the CCA OO, the datasets

One possible application of the new CCA OO is the assimilation of SST in ocean general circulation models (OGCMs). In recent years OGCMs have seen significant improvements in vertical resolution, particularly near the surface, where the first layer has been reduced to a thickness of the order of 1 m or less. At this resolution, the diurnal cycle of SST should be taken into account. Although diurnal variability is included to some extent

This issue becomes particularly evident when assimilating satellite SST observations. The different types of sensors used on satellites probe the ocean temperature at different depths. Infrared (IR) sensors measure the temperature at about 10

Representation errors have been extensively discussed within ocean applications

An important source of SST observational data is the Spinning Enhanced Visible and Infrared Imager (SEVIRI) instrument on board the Meteosat Second Generation satellites. As these are geostationary satellites, SEVIRI can provide continuous measurements of the same area with a 15 min temporal resolution. Although the IR imager is sensitive to skin temperature, the calibration algorithm of SEVIRI corrects for the cool-skin bias, and the resulting SST products should be considered the subskin temperature

This section will discuss how to use the output of a water column model specifically tuned for modelling the diurnal cycle of SST together with the CCA OO to build an observation operator for SST that accounts for the diurnal variability.

The SST diurnal cycle is modelled using the General Ocean Turbulence Model (GOTM). The GOTM is a one-dimensional water column model that includes multiple turbulence closure schemes

The subskin SST represents the temperature at the base of the conductive laminar sub-layer of the ocean surface; for practical purposes it is represented by the temperature of the top model layer of the GOTM (1.5 cm). The conductive sub-layer of the air–sea interface, associated with the cool-skin effect, is parameterised and dynamically computed within the GOTM to produce a modelled skin SST. Further details are provided in

The aim for the CCA OO is to parameterise the IR and MW satellite SST observations as a function of temperature in the water column below. While the dataset of

The magnitude of the diurnal warming at the subskin level as a function of the time of the day for different wind and insolation categories. The diurnal warming is measured with respect to the SST at local sunrise. The wind categories are represented by the different panels, while the insolation categories are shown as different curves within each panel.

The magnitude of the diurnal signal depends strongly on the atmospheric conditions, most importantly the insolation and wind speed. Insolation causes the ocean skin to heat up during the course of the day, while wind mixes the upper layers of the ocean, leading to the dissipation of the heat. Due to latent heat loss, the ocean skin may even cool down below the bulk temperature. To accommodate a non-linear dependence on the different insolation and wind scenarios in the CCA OO, the GOTM dataset is divided into 12 insolation and 8 wind categories. Insolation and wind are defined at each location as the daily mean values in local mean time (LMT). The category boundaries were chosen to equally divide the dataset. The magnitude of the diurnal warming for the different categories is shown in Fig.

The GOTM dataset has been compared to SEVIRI data at the skin level in

The correlation coefficients between the model variables and observations

For each category of wind and insolation, and at hourly time resolution, the CCA OO is calculated to project the 10 uppermost levels of the MFS model onto the skin and subskin SST. The 10 levels extend down to a depth of approximately 40 m, which was chosen to be well below the depth influenced by the diurnal cycle of temperature. Figure

The CCA OO is validated by comparing its performance to that of the full GOTM. To use the operator effectively in a DA system, it should be able to provide an accurate approximation of the GOTM results. The validation is performed against GOTM profiles that are withheld from the CCA OO calculation. The GOTM dataset is split in two, withholding every other profile in the zonal direction from the calculation. The validation then uses the withheld profiles and extracts the depths corresponding to the MFS levels, mimicking the use of the operator inside a DA system. The CCA OO, based on the atmospheric category and closest time, is subsequently applied to project the model temperature onto the skin and subskin SST. The projected SST values are then compared to the values in the original GOTM profile.
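The hold-out procedure described above can be sketched as follows. This is a simplified illustration under several loudly stated assumptions: the profiles are synthetic (an exponential warming layer with a constant cool-skin offset, not GOTM output), a single zonal index stands in for the two-dimensional grid, the coarse level depths are hypothetical and not the MFS levels, and an ordinary least-squares fit is used as a stand-in for the CCA OO. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Synthetic fine-resolution profiles along a zonal index (not GOTM data) ---
n_profiles = 400
fine_depths = np.linspace(0.015, 40.0, 300)            # depth in m
bulk_temp = rng.normal(18.0, 1.0, n_profiles)          # deep temperature, deg C
warming = rng.uniform(0.0, 2.0, n_profiles)            # surface warming amplitude
profiles = bulk_temp[:, None] + warming[:, None] * np.exp(-fine_depths[None, :] / 2.0)
profiles += rng.normal(0.0, 0.01, profiles.shape)      # small level-wise noise
# Hypothetical skin SST: top level minus a constant cool-skin offset plus noise.
skin_sst = profiles[:, 0] - 0.2 + rng.normal(0.0, 0.02, n_profiles)

# --- Extract the levels a coarser parent model would provide (hypothetical depths) ---
coarse_depths = np.array([1.5, 4.5, 8.0, 12.0, 16.0, 20.0, 25.0, 30.0, 35.0, 40.0])
level_idx = np.abs(fine_depths[:, None] - coarse_depths[None, :]).argmin(axis=0)
coarse = profiles[:, level_idx]

# --- Split: every other profile is withheld from the operator calculation ---
train = slice(0, None, 2)
held_out = slice(1, None, 2)


def fit_linear_operator(x, y):
    """Least-squares linear map (with intercept) from levels to the target."""
    design = np.column_stack([x, np.ones(len(x))])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def apply_linear_operator(coeffs, x):
    """Apply the fitted map: a single matrix-vector product per batch."""
    return np.column_stack([x, np.ones(len(x))]) @ coeffs


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


coeffs = fit_linear_operator(coarse[train], skin_sst[train])
predicted = apply_linear_operator(coeffs, coarse[held_out])

# Compare against simply taking the uppermost coarse level as the SST.
rmse_operator = rmse(predicted, skin_sst[held_out])
rmse_top_level = rmse(coarse[held_out][:, 0], skin_sst[held_out])
print(f"RMSE operator: {rmse_operator:.3f}, RMSE top level: {rmse_top_level:.3f}")
```

In a setting with atmospheric categories, one such operator would be fitted per category and time, and the operator matching the conditions of each withheld profile would be selected before it is applied.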

Examples of temperature profiles in various conditions and at different times. The GOTM profiles are shown by the red curve, while the filled circles indicate the values used as input to the CCA OO. The output of the CCA OO is shown by the black triangles.

Some examples of the validation are shown in Fig.

Skill score of the CCA OO compared to the OGCM upper layer for all wind and insolation categories at midnight

The performance of the GOTM-based CCA OO for SST is compared to other commonly used methods. For this comparison the GOTM dataset is again split along the zonal direction using every other profile to calculate the CCA OO. The remaining profiles are matched to SEVIRI subskin retrievals using only profiles matched to a measurement with an acceptable (4) or good (5) quality control level. The performance can be conveniently expressed in terms of the skill score (

Skill score of the CCA OO compared to the parameterisation of

The simplest method of assimilating satellite SST observations in a model that insufficiently describes the diurnal cycle of SST is to assimilate only at night or during high wind; see, for example,

A more advanced solution is the parameterisation of

Using the CCA OO to improve the description of SST has many potential applications. For example, the CCA OO could be used as a parameterisation of diurnally varying skin SST within an OGCM as part of the air–sea flux calculations. The skin SST is the true interface temperature for air–sea fluxes, so this approach should result in improved air–sea heat transfer in OGCMs and coupled ocean–atmosphere models. See, for example,

Due to the way in which it is constructed, the CCA OO is an inherently linear operator. This makes it straightforward to implement in DA schemes that require linearised and differentiable OOs. However, non-linear effects can be accommodated to some extent by constructing a series of CCA OOs conditioned on such a non-linear dependency. For example, in the case of SST, this method has been used to condition the CCA OO on insolation, wind, and time. The only requirement in this case is that the datasets

The minimum size of the input dataset required ultimately depends on the number of model variables used (

Observation operators (OOs) form a central component in any data assimilation (DA) system, as they transform the state variables of a numerical model into real-world observable variables. Often, an OO also needs to correct for processes that are not fully described by the parent model. Such processes may be best modelled by interfacing the OO to a specialised model, but this is generally not feasible due to computational constraints.

The assimilation of satellite sea surface temperature (SST) in ocean general circulation models (OGCMs) is a prime example of a situation in which insufficiently modelled processes play an important role. The diurnal cycle of SST causes a discrepancy between the temperature of the very thin upper layer measured by a satellite and that of the rather coarse upper layer in a typical OGCM. On a clear summer day with low wind, this discrepancy can amount to as much as 2

This paper has presented a method, based on canonical correlation analysis (CCA), to build parameterisations from an output dataset of a specialised model. These parameterisations, referred to as the CCA OO, can provide an efficient approximation to the results of the specialised model and are therefore well suited for use in DA systems.

The case of SST assimilation has been used to demonstrate the new CCA OO. Using an output dataset of the General Ocean Turbulence Model (GOTM), a high-resolution water column model specifically tuned for modelling the diurnal cycle of SST, a new CCA OO has been derived. Subsequently, the operator has been applied to reduced-resolution temperature profiles from the GOTM to simulate its use in a DA system. The approximations provided by the CCA OO are found to be in good agreement with the GOTM at various times of the day and across all atmospheric conditions. The results indicate that the CCA OO could be used to enable the assimilation of SST in conditions under which this was previously not possible. Moreover, the atmospheric categories that were introduced in the construction of the CCA OO for SST show that the linear assumption implicit in CCA can be partially relaxed, which allows the CCA OO to be applied across a wide range of conditions. Compared to commonly used methods for SST assimilation, the CCA OO can provide substantial improvements. This is especially true for measurements of the skin SST, since the CCA OO profits from the modelling of the cool-skin effect that is included in the GOTM.

The ability of the CCA OO to handle complicated physical models in a relatively simple way is attractive for a large number of problems in DA, for which reduced-order OOs are desirable due to computational constraints. Remotely sensed data are the obvious target given the complexity of their relationships with state variables. Observations in coupled assimilations (e.g. ocean–atmosphere, ocean–sea ice, or ocean–biogeochemistry) are examples of challenging problems that could be investigated in the future with the CCA OO.

The GOTM dataset used in Sects. 4 and 5 is available as described in Pimentel et al. (2019). The code for calculating the CCA OO is available from the authors upon request.

EJ designed and implemented the CCA OO software. SP and WHT performed the modelling of the diurnal cycle. DD, GK, and IM evaluated the OO in different DA systems and provided feedback on the modelling and the software. AS was the PI of the project and coordinated the work. EJ prepared the paper with input from all co-authors.

The authors declare that they have no conflict of interest.

This article is part of the special issue “The Copernicus Marine Environment Monitoring Service (CMEMS): scientific advances”. It is not associated with a conference.

This work forms part of the SOSSTA project, which has been funded by the EU Copernicus Marine Environment Monitoring Service (CMEMS) through the Service Evolution grants.

This paper was edited by Pierre-Yves Le Traon and reviewed by Salvatore Marullo and one anonymous referee.