<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">OS</journal-id><journal-title-group>
    <journal-title>Ocean Science</journal-title>
    <abbrev-journal-title abbrev-type="publisher">OS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Ocean Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1812-0792</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/os-15-1023-2019</article-id><title-group><article-title>Using canonical correlation analysis to produce dynamically based and highly efficient statistical observation operators</article-title><alt-title>Using CCA to produce dynamically based, highly efficient statistical OOs</alt-title>
      </title-group><?xmltex \runningtitle{Using CCA to produce dynamically based, highly efficient statistical OOs}?><?xmltex \runningauthor{E. Jansen et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Jansen</surname><given-names>Eric</given-names></name>
          <email>eric.jansen@cmcc.it</email>
        <ext-link>https://orcid.org/0000-0002-8689-7685</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff3">
          <name><surname>Pimentel</surname><given-names>Sam</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-7753-1748</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff3">
          <name><surname>Tse</surname><given-names>Wang-Hung</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-5422-7508</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Denaxa</surname><given-names>Dimitra</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Korres</surname><given-names>Gerasimos</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Mirouze</surname><given-names>Isabelle</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Storto</surname><given-names>Andrea</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Ocean Predictions and Applications (OPA) division, Euro-Mediterranean Center on Climate Change (CMCC), Lecce, Italy</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Ocean Modelling and Data Assimilation (ODA) division, Euro-Mediterranean Center on Climate Change (CMCC), Bologna, Italy</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Trinity Western University (TWU), Langley, BC, Canada</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>Hellenic Centre for Marine Research (HCMR), Athens, Greece</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Eric Jansen (eric.jansen@cmcc.it)</corresp></author-notes><pub-date><day>2</day><month>August</month><year>2019</year></pub-date>
      
      <volume>15</volume>
      <issue>4</issue>
      <fpage>1023</fpage><lpage>1032</lpage>
      <history>
        <date date-type="received"><day>31</day><month>December</month><year>2018</year></date>
           <date date-type="rev-request"><day>23</day><month>January</month><year>2019</year></date>
           <date date-type="rev-recd"><day>28</day><month>May</month><year>2019</year></date>
           <date date-type="accepted"><day>18</day><month>June</month><year>2019</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2019 Eric Jansen et al.</copyright-statement>
        <copyright-year>2019</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019.html">This article is available from https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019.html</self-uri><self-uri xlink:href="https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019.pdf">The full text article is available as a PDF file from https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019.pdf</self-uri>
      <abstract><title>Abstract</title>
    <p id="d1e155">Observation operators (OOs) are a central component of any data assimilation system. As they project the state variables of a numerical model into the space of the observations, they also provide an ideal opportunity to correct for effects that are not described or are insufficiently described by the model. In such cases a dynamical OO, an OO that interfaces to a secondary and more specialised model, often provides the best results. However, given the large number of observations to be assimilated in a typical atmospheric or oceanographic model, the computational resources needed for using a fully dynamical OO mean that this option is usually not feasible. This paper presents a method, based on canonical correlation analysis (CCA), that can be used to generate highly efficient statistical OOs that are based on a dynamical model. These OOs can provide an approximation to the dynamical model at a fraction of the computational cost.</p>
    <p id="d1e158">One possible application of such an OO is the modelling of the diurnal cycle of sea surface temperature (SST) in ocean general circulation models (OGCMs). Satellites that measure SST observe the temperature of the thin uppermost layer of the ocean. This layer is strongly affected by atmospheric conditions, and its temperature can differ significantly from that of the water below. This causes a discrepancy between the SST measurements and the temperature of the upper layer of the OGCM, which typically has a thickness of around 1 m. The CCA OO method is used to parameterise the diurnal cycle of SST. The CCA OO is based on an input dataset from the General Ocean Turbulence Model (GOTM), a high-resolution water column model that has been specifically tuned for this purpose. The parameterisations of the CCA OO are found to be in good agreement with the results from the GOTM and improve upon existing parameterisations, showing the potential of this method for use in data assimilation systems.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

      <?xmltex \hack{\allowdisplaybreaks}?>
<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e172">Data assimilation (DA) strives to improve the forecast skill of a numerical model by combining the model with observations. Observations are incorporated into the model by applying a series of corrections to the internal state of the model. As the state variables of a numerical model are usually not observed directly, this procedure requires an observation operator (OO) to project the model state variables onto the variable that is observed. The difference between the observation and the model prediction, the so-called innovation, forms the basis for calculating the correction to the model state. The accuracy of the OO is paramount in this process: any bias in the projection will lead to a bias in the innovation and therefore result in a biased correction to the model state. For this reason, bias correction procedures have been built considering not only systematic errors in observations but also in observation operators (see e.g. <xref ref-type="bibr" rid="bib1.bibx7" id="altparen.1"/>, for satellite radiance data).</p>
      <?pagebreak page1024?><p id="d1e178"><?xmltex \hack{\newpage}?>Many different types of OO exist. In its simplest form, an OO could just select one of the state variables at a point near the observation or, perhaps, perform an interpolation. More complex OOs may include corrections for processes that influence the observation but are not modelled or are insufficiently modelled. Ultimately, one could even consider a dynamical OO that wraps a second numerical model to locally refine the results of the parent model. The latter solution may well provide the most accurate results, but the vast number of observations that need to be assimilated in a typical atmospheric or oceanographic model means that this approach would require a prohibitive amount of computing resources. In most practical applications, this limits OOs to relatively simple parameterisations in terms of the model state variables. Moreover, variational data assimilation requires observation operators to be linearised around the background within the inner loops (tangent-linear approximation). This translates into a need to construct OOs that can be formally and practically differentiated.</p>
      <p id="d1e182">This paper presents a method of parameterising the results of a specialised model in such a way that it can be efficiently used within an OO. The parameterisation is based on canonical correlation analysis (CCA), a well-established mathematical method for finding cross-correlations between datasets. A new pseudo-dynamical OO is generated using the canonical correlation between the inputs and outputs of the specialised model on a large and representative dataset. Once this correlation has been calculated, the application of the pseudo-dynamical OO involves only a matrix multiplication that can be performed at a fraction of the computational cost of the dynamical OO. A similar method has been used previously to build reduced-order OOs in atmospheric data assimilation <xref ref-type="bibr" rid="bib1.bibx6" id="paren.2"/>.</p>
      <p id="d1e188">This work is part of the SOSSTA (Statistical-dynamical observation Operator for SST data Assimilation) project, funded by the EU Copernicus Marine Environment Monitoring Service (CMEMS) through the Service Evolution grants. The aim of SOSSTA is to formulate an efficient OO for sea surface temperature (SST) DA that accounts for the diurnal variability of the ocean skin temperature. The results of the project are presented in multiple publications. The modelling of the diurnal cycle of SST is described in <xref ref-type="bibr" rid="bib1.bibx19" id="text.3"/>, while the current paper focuses on the method for constructing the OO. The project includes pilot studies in the Mediterranean Sea and the Aegean Sea that will be described in forthcoming publications.</p>
      <p id="d1e195">The paper is organised as follows: Sect. <xref ref-type="sec" rid="Ch1.S2"/> provides a quick review of CCA; Sect. <xref ref-type="sec" rid="Ch1.S3"/> discusses how CCA can be used to construct the OO matrix; Sect. <xref ref-type="sec" rid="Ch1.S4"/> applies the CCA OO to the modelling of satellite sea surface temperature (SST) measurements in oceanographic models; and Sect. <xref ref-type="sec" rid="Ch1.S5"/> discusses the performance of the method and other possible applications. Conclusions are presented in Sect. <xref ref-type="sec" rid="Ch1.S6"/>.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>The CCA method</title>
      <p id="d1e216">CCA <xref ref-type="bibr" rid="bib1.bibx8" id="paren.4"/> is a method to find cross-correlations between two datasets <inline-formula><mml:math id="M1" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M2" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>. The datasets are considered to be matrices structured such that the columns represent different variables and the rows represent the measurements of these variables. CCA then aims to find transformation matrices <inline-formula><mml:math id="M3" display="inline"><mml:mi mathvariant="bold">A</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M4" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> that transform the anomaly of the variables of <inline-formula><mml:math id="M5" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M6" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>, denoted <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>, into the set of canonical variables <inline-formula><mml:math id="M9" display="inline"><mml:mi mathvariant="bold">F</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M10" display="inline"><mml:mi mathvariant="bold">G</mml:mi></mml:math></inline-formula>:

              <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M11" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="bold">F</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mi mathvariant="bold">G</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p id="d1e338">The structure of <inline-formula><mml:math id="M12" display="inline"><mml:mi mathvariant="bold">F</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M13" display="inline"><mml:mi mathvariant="bold">G</mml:mi></mml:math></inline-formula> matches that of <inline-formula><mml:math id="M14" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M15" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>. The canonical variables are constructed such that the variable <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is maximally correlated with the variable <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">G</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. At the same time, both <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">G</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are uncorrelated with <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">G</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for <inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>≠</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:math></inline-formula>; therefore, each additional canonical variable describes the maximal 
remaining correlation between the two datasets. The number of canonical variables that can be obtained with this procedure is limited to the smaller of the numbers of variables in <inline-formula><mml:math id="M23" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M24" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>.</p>
      <p id="d1e463">The calculation of the matrices <inline-formula><mml:math id="M25" display="inline"><mml:mi mathvariant="bold">A</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M26" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> is relatively straightforward using the algorithm of <xref ref-type="bibr" rid="bib1.bibx2" id="text.5"/>. Writing the requirements outlined above in equation form yields

              <disp-formula id="Ch1.E2" specific-use="align" content-type="subnumberedsingle"><mml:math id="M27" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E2.3"><mml:mtd><mml:mtext>2a</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mi mathvariant="bold">F</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi mathvariant="bold">F</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">G</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi mathvariant="bold">G</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:mi mathvariant="bold">I</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E2.4"><mml:mtd><mml:mtext>2b</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mi mathvariant="bold">F</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi mathvariant="bold">G</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:mi mathvariant="bold">D</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          with <inline-formula><mml:math id="M28" display="inline"><mml:mi mathvariant="bold">I</mml:mi></mml:math></inline-formula> the unit matrix and <inline-formula><mml:math id="M29" display="inline"><mml:mi mathvariant="bold">D</mml:mi></mml:math></inline-formula> a diagonal matrix. The algorithm uses a QR decomposition to decompose both <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> into an orthogonal matrix <inline-formula><mml:math id="M32" display="inline"><mml:mi mathvariant="bold">Q</mml:mi></mml:math></inline-formula> and an upper-triangular matrix <inline-formula><mml:math id="M33" display="inline"><mml:mi mathvariant="bold">R</mml:mi></mml:math></inline-formula>:

              <disp-formula id="Ch1.E5" content-type="numbered"><label>3</label><mml:math id="M34" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="bold">R</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="bold">R</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

        The algorithm proceeds by applying a singular value decomposition (SVD) on the product <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>x</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>:

              <disp-formula id="Ch1.E6" content-type="numbered"><label>4</label><mml:math id="M36" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msubsup><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>x</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">USV</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

        With the ansatz

              <disp-formula id="Ch1.E7" content-type="numbered"><label>5</label><mml:math id="M37" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="bold">A</mml:mi><mml:mo>≡</mml:mo><mml:msubsup><mml:mi mathvariant="bold">R</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msubsup><mml:mi mathvariant="bold">U</mml:mi><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mi mathvariant="bold">B</mml:mi><mml:mo>≡</mml:mo><mml:msubsup><mml:mi mathvariant="bold">R</mml:mi><mml:mi>y</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msubsup><mml:mi mathvariant="bold">V</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        the orthonormality requirement of Eq. (<xref ref-type="disp-formula" rid="Ch1.E2.3"/>) becomes
          <disp-formula id="Ch1.E8" content-type="numbered"><label>6</label><mml:math id="M38" display="block"><mml:mtable rowspacing="0.2ex" columnspacing="1em" class="aligned" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mi mathvariant="bold">F</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi mathvariant="bold">F</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msup><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mi>T</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mi mathvariant="bold">A</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="bold">U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">R</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:mfenced><mml:mfenced open="(" close=")"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">R</mml:mi><mml:mi>x</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msubsup><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>x</mml:mi><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:mfenced><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="bold">R</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mfenced open="(" close=")"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">R</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msubsup><mml:mi mathvariant="bold">U</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:mi mathvariant="bold">I</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
        and an analogous result follows for <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">G</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi mathvariant="bold">G</mml:mi></mml:mrow></mml:math></inline-formula>.</p>
      <?pagebreak page1025?><p id="d1e880">The orthogonality requirement of Eq. (<xref ref-type="disp-formula" rid="Ch1.E2.4"/>) becomes
          <disp-formula id="Ch1.E9" content-type="numbered"><label>7</label><mml:math id="M40" display="block"><mml:mtable class="aligned" columnspacing="1em" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="bold">D</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">F</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi mathvariant="bold">G</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msup><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mi>T</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mi mathvariant="bold">B</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="bold">U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">R</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:mfenced><mml:mfenced close=")" open="("><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">R</mml:mi><mml:mi>x</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msubsup><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>x</mml:mi><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:mfenced><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Q</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="bold">R</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mfenced 
open="(" close=")"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">R</mml:mi><mml:mi>y</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msubsup><mml:mi mathvariant="bold">V</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="bold">USV</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:mfenced><mml:mi mathvariant="bold">V</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="bold">S</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
        Therefore, the ansatz of Eq. (<xref ref-type="disp-formula" rid="Ch1.E7"/>) is a valid solution for the matrices <inline-formula><mml:math id="M41" display="inline"><mml:mi mathvariant="bold">A</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M42" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula>. Moreover, by counting the number of degrees of freedom in these matrices and the number of constraints provided by Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>), it can be shown that all solutions are permutations of Eq. (<xref ref-type="disp-formula" rid="Ch1.E7"/>) <xref ref-type="bibr" rid="bib1.bibx20" id="paren.6"/>. The canonical basis is therefore uniquely defined. In the case that <inline-formula><mml:math id="M43" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M44" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> contain different numbers of variables <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, the SVD of Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) selects the <inline-formula><mml:math id="M47" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> largest correlations, with <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mo>min⁡</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d1e1130">As QR decomposition and SVD are common matrix operations that are efficiently implemented in most numerical libraries, this algorithm is straightforward to implement in most programming languages.</p>
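      <p>As an illustration, the steps of Eqs. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) to (<xref ref-type="disp-formula" rid="Ch1.E7"/>) can be sketched in a few lines of Python with NumPy. This is a minimal sketch and not the implementation used in this work: the function name <monospace>cca_qr_svd</monospace> and all variable names are hypothetical, and both anomaly matrices are assumed to have full column rank so that the triangular factors are invertible.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def cca_qr_svd(X, Y):
    """Sketch of CCA via QR decomposition and SVD.

    X has shape (n_samples, N_x) and Y has shape (n_samples, N_y);
    rows are measurements, columns are variables.
    Assumes both anomaly matrices have full column rank.
    Returns A, B and the diagonal of D (canonical correlations).
    """
    # Anomalies X' and Y': subtract the mean of every column.
    Xa = X - X.mean(axis=0)
    Ya = Y - Y.mean(axis=0)

    # Eq. (3): reduced QR decompositions X' = Qx Rx and Y' = Qy Ry.
    Qx, Rx = np.linalg.qr(Xa)
    Qy, Ry = np.linalg.qr(Ya)

    # Eq. (4): reduced SVD of Qx^T Qy; it keeps N = min(N_x, N_y)
    # singular values, sorted from largest to smallest.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)

    # Eq. (5): A = Rx^{-1} U and B = Ry^{-1} V, obtained by solving
    # linear systems instead of forming the inverses explicitly.
    A = np.linalg.solve(Rx, U)
    B = np.linalg.solve(Ry, Vt.T)
    return A, B, s
```
      </preformat>
      <p>With this sketch, a dataset with four variables in <monospace>X</monospace> and two in <monospace>Y</monospace> yields a 4×2 matrix <monospace>A</monospace>, a 2×2 matrix <monospace>B</monospace> and two canonical correlations. Multiplying the anomalies by <monospace>A</monospace> and <monospace>B</monospace> gives canonical variables that can be checked numerically against the requirements of Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>).</p>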
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Using CCA to construct an OO</title>
      <p id="d1e1141">The CCA method can be used to construct an OO. Let <inline-formula><mml:math id="M49" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> be a set of (possibly) relevant model state variables and <inline-formula><mml:math id="M50" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> the corresponding observation values. Here <inline-formula><mml:math id="M51" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> could be obtained from a specialised model or from a historical dataset of real observations. Applying the algorithm of Sect. <xref ref-type="sec" rid="Ch1.S2"/> yields the matrices <inline-formula><mml:math id="M52" display="inline"><mml:mi mathvariant="bold">A</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M53" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M54" display="inline"><mml:mi mathvariant="bold">D</mml:mi></mml:math></inline-formula>. The first two convert the mean-subtracted model states <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and observation values <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> into their canonical counterparts <inline-formula><mml:math id="M57" display="inline"><mml:mi mathvariant="bold">F</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M58" display="inline"><mml:mi mathvariant="bold">G</mml:mi></mml:math></inline-formula>.
The diagonal matrix <inline-formula><mml:math id="M59" display="inline"><mml:mi mathvariant="bold">D</mml:mi></mml:math></inline-formula> holds, for each pair of canonical variables <inline-formula><mml:math id="M60" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, the best-fit slope of the correlation: <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="normal">d</mml:mi><mml:msub><mml:mi mathvariant="bold">G</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mi mathvariant="normal">d</mml:mi><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d1e1272">Assuming that <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>≥</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> – i.e. the number of model state variables is at least equal to the number of observed variables – it is possible to calculate <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> from <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> by passing through canonical space and applying the fitted slope <inline-formula><mml:math id="M65" display="inline"><mml:mi mathvariant="bold">D</mml:mi></mml:math></inline-formula>,

              <disp-formula id="Ch1.E10" content-type="numbered"><label>8</label><mml:math id="M66" display="block"><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msup><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mi mathvariant="bold">ADB</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>≡</mml:mo><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mi mathvariant="bold">M</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        defining the CCA OO matrix,

              <disp-formula id="Ch1.E11" content-type="numbered"><label>9</label><mml:math id="M67" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="bold">M</mml:mi><mml:mo>≡</mml:mo><mml:msup><mml:mi mathvariant="bold">ADB</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        of size <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>×</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. As the CCA considers only the anomaly of <inline-formula><mml:math id="M69" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M70" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>, an additional offset term needs to be considered to accommodate the mean values of <inline-formula><mml:math id="M71" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M72" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> in the input dataset. However, the mean values of <inline-formula><mml:math id="M73" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M74" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> can be combined by applying the matrix <inline-formula><mml:math id="M75" display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula>:
          <disp-formula id="Ch1.E12" content-type="numbered"><label>10</label><mml:math id="M76" display="block"><mml:mtable class="aligned" columnspacing="1em" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi mathvariant="bold">Y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="bold">X</mml:mi><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi mathvariant="bold">X</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mrow></mml:mfenced><mml:mi mathvariant="bold">M</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="bold">Y</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:mi mathvariant="bold">XM</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="bold-italic">K</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
        with

              <disp-formula id="Ch1.E13" content-type="numbered"><label>11</label><mml:math id="M77" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="bold-italic">K</mml:mi><mml:mo>≡</mml:mo><mml:mover accent="true"><mml:mi mathvariant="bold">Y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi mathvariant="bold">X</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mi mathvariant="bold">M</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        a combined offset vector of length <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
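      <p>As an illustration of this training step, the following Python sketch computes <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">K</mml:mi></mml:math></inline-formula> from a pair of datasets. It assumes that the CCA is carried out with a QR decomposition of each anomaly matrix followed by an SVD, normalised such that the canonical variables have orthonormal columns; under this assumption the fitted slope of each canonical pair coincides with its canonical correlation. The function name <monospace>train_cca_oo</monospace> and the use of NumPy are illustrative choices and are not part of the method description.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def train_cca_oo(X, Y):
    """Return (M, K) such that Y is approximated by X @ M + K.

    X: (N, Nx) model states, Y: (N, Ny) observation values.
    Assumes Nx >= Ny, N >= Nx, and full column rank of both anomaly matrices.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xp = X - x_mean  # anomalies X'
    Yp = Y - y_mean  # anomalies Y'

    # Assumed CCA step: reduced QR of both anomaly matrices,
    # then SVD of the product Qx^T Qy.
    Qx, Rx = np.linalg.qr(Xp)
    Qy, Ry = np.linalg.qr(Yp)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)

    A = np.linalg.solve(Rx, U)     # (Nx, Ny): maps X' to canonical F
    B = np.linalg.solve(Ry, Vt.T)  # (Ny, Ny): maps Y' to canonical G

    # F and G have orthonormal columns in this normalisation, so the
    # fitted slope dG_i/dF_i equals the canonical correlation s_i.
    D = np.diag(s)

    M = A @ D @ np.linalg.inv(B)   # Eq. (9)
    K = y_mean - x_mean @ M        # Eq. (11)
    return M, K
```
]]></preformat>
      <p>When all canonical pairs are retained and the anomaly matrices have full column rank, the matrix obtained in this way coincides with the ordinary least-squares regression of the observation anomalies on the model anomalies, which offers a convenient consistency check for an implementation.</p>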
      <p id="d1e1544">During the training phase of the CCA OO, the datasets <inline-formula><mml:math id="M79" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M80" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> are used to calculate the matrix <inline-formula><mml:math id="M81" display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula> and the offset <inline-formula><mml:math id="M82" display="inline"><mml:mi mathvariant="bold-italic">K</mml:mi></mml:math></inline-formula>. Once computed, they can be used to form an observation operator <inline-formula><mml:math id="M83" display="inline"><mml:mi mathvariant="normal">H</mml:mi></mml:math></inline-formula> that transforms a state <inline-formula><mml:math id="M84" display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula> as

              <disp-formula id="Ch1.E14" content-type="numbered"><label>12</label><mml:math id="M85" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">H</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="bold">M</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="bold-italic">K</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

        Furthermore, the tangent-linear approximation used in variational DA schemes requires that

              <disp-formula id="Ch1.E15" content-type="numbered"><label>13</label><mml:math id="M86" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">H</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>)</mml:mo><mml:mo>∼</mml:mo><mml:mi mathvariant="normal">H</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> is the tangent-linear version of the OO, <inline-formula><mml:math id="M88" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> the background state, and <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow></mml:math></inline-formula> the deviation from the background. The CCA OO is straightforward to implement in this scheme, since for <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and its adjoint <inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mi/><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> it follows that

              <disp-formula id="Ch1.E16" content-type="numbered"><label>14</label><mml:math id="M92" display="block"><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">M</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mi/><mml:mi>T</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mi mathvariant="bold">M</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
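      <p>The relations above can be checked numerically. The sketch below, which uses hypothetical function names and randomly generated values in place of a trained operator, applies the operator of Eq. (12) to a row-vector state and verifies two properties: that the tangent-linear expansion of Eq. (13) is exact for an affine operator and that the tangent-linear and adjoint operators of Eq. (14) pass the standard dot-product test.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def apply_oo(x, M, K):
    """Eq. (12): observation equivalent of a state vector x."""
    return x @ M + K


def tangent_linear(dx, M):
    """H' = M^T applied to a state increment dx."""
    return M.T @ dx


def adjoint(dy, M):
    """H'^T = M applied to an observation-space vector dy."""
    return M @ dy


# Hypothetical sizes: 10 model levels, 2 observed variables.
rng = np.random.default_rng(1)
M = rng.normal(size=(10, 2))
K = rng.normal(size=2)
xb = rng.normal(size=10)   # background state
dx = rng.normal(size=10)   # deviation from the background
dy = rng.normal(size=2)    # arbitrary observation-space vector

# For an affine operator the expansion of Eq. (13) holds exactly.
full = apply_oo(xb + dx, M, K)
linearised = apply_oo(xb, M, K) + tangent_linear(dx, M)
tl_exact = bool(np.allclose(full, linearised))

# Dot-product test: the inner product of H'dx with dy must equal
# the inner product of dx with the adjoint applied to dy.
adjoint_ok = bool(np.isclose(tangent_linear(dx, M) @ dy,
                             dx @ adjoint(dy, M)))
```
]]></preformat>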
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Use case: satellite SST</title>
      <p id="d1e1768">One possible application of the new CCA OO is the assimilation of SST in ocean general circulation models (OGCMs). In recent years OGCMs have seen significant improvements in vertical resolution, particularly near the surface, where the first layer has been reduced to a thickness of the order of 1 m or less. At this resolution, the diurnal cycle of SST should be taken into account. Although diurnal variability is included to some extent <xref ref-type="bibr" rid="bib1.bibx12" id="paren.7"/>, the vertical resolution of OGCMs is still insufficient to fully resolve the variability of the skin and subskin ocean temperature.</p>
      <?pagebreak page1026?><p id="d1e1774">This issue becomes particularly evident when assimilating satellite SST observations. The different types of sensors used on satellites probe the ocean temperature at different depths. Infrared (IR) sensors measure the temperature at about 10 <inline-formula><mml:math id="M93" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m, a layer that is referred to as the ocean skin. Microwave (MW) sensors, on the other hand, measure the temperature of the layer below that, the subskin, with a depth of about 1 mm. This is much shallower than the vertical resolution of a typical OGCM, while these layers are strongly affected by the atmospheric conditions. The ocean skin cools due to thermodynamic processes at the air–sea interface, while the absorption of solar heat causes a warming of the subskin. At the same time, wind can mix the skin and subskin with the water below, smoothing the temperature variations. During days of low wind and/or high insolation conditions the amplitude of the SST diurnal cycle can be larger than the combined accuracy of the model and observations, causing a straightforward assimilation of SST to degrade the performance of the model <xref ref-type="bibr" rid="bib1.bibx13" id="paren.8"/>. Under favourable conditions this amplitude is typically of the order of a few degrees (see e.g. <xref ref-type="bibr" rid="bib1.bibx5" id="altparen.9"/>), but values as high as 6 <inline-formula><mml:math id="M94" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C have been observed <xref ref-type="bibr" rid="bib1.bibx14" id="paren.10"/>.</p>
      <p id="d1e1803">Representation errors have been extensively discussed within ocean applications <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx9" id="paren.11"/> and generally include errors due to, for example, limited spatial resolution or unrepresented processes. However, the diurnal variability of skin SST represents a potentially systematic error that requires proper treatment rather than simply an increase in the representation component of the observational error.</p>
      <p id="d1e1809"><?xmltex \hack{\newpage}?>An important source of SST observational data is the Spinning Enhanced Visible and Infrared Imager (SEVIRI) instrument on board the Meteosat Second Generation satellites. As these are geostationary satellites, SEVIRI can provide continuous measurements of the same area with a 15 min temporal resolution. Although the IR imager is sensitive to skin temperature, the calibration algorithm of SEVIRI corrects for the cool-skin bias, and the resulting SST products should be regarded as subskin temperatures <xref ref-type="bibr" rid="bib1.bibx21" id="paren.12"/>. For wind speeds greater than 6 m s<inline-formula><mml:math id="M95" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> the skin temperature may be calculated as <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">skin</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">subskin</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.17</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M97" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C <xref ref-type="bibr" rid="bib1.bibx4" id="paren.13"/>, but this is only an approximation.</p>
      <p id="d1e1863">This section will discuss how to use the output of a water column model specifically tuned for modelling the diurnal cycle of SST together with the CCA OO to build an observation operator for SST that accounts for the diurnal variability.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>General Ocean Turbulence Model</title>
      <p id="d1e1873">The SST diurnal cycle is modelled using the General Ocean Turbulence Model (GOTM). The GOTM is a one-dimensional water column model that includes multiple turbulence closure schemes <xref ref-type="bibr" rid="bib1.bibx3 bib1.bibx23" id="paren.14"/>. It has been successfully adapted to model the near-surface variability of ocean temperature, including both the diurnal cycle and the cool-skin effect <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx18" id="paren.15"/>. Recently it has been used to systematically simulate the atmospheric and oceanographic conditions in the Mediterranean Sea <xref ref-type="bibr" rid="bib1.bibx19" id="paren.16"/>. The latter study has resulted in a multi-year dataset modelling the diurnal cycle in the Mediterranean Sea on a grid of <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.75</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup><mml:mo>×</mml:mo><mml:mn mathvariant="normal">0.75</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> resolution with hourly time resolution. For this dataset the GOTM is configured with the <inline-formula><mml:math id="M99" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-<inline-formula><mml:math id="M100" display="inline"><mml:mi mathvariant="italic">ε</mml:mi></mml:math></inline-formula> turbulent kinetic energy parameterisation with internal waves. The top 75 m of the water column is resolved using 122 vertical layers with fine resolution near the surface and gradually becoming coarser with depth. The uppermost 1 m contains a total of 21 layers, with the highest level at 1.5 cm of depth. This dataset is used in the present paper to build the CCA OO for SST.</p>
      <p id="d1e1920">The subskin SST represents the temperature at the base of the conductive laminar sub-layer of the ocean surface; for practical purposes it is represented by the temperature of the top model layer of the GOTM (1.5 cm). The conductive sub-layer of the air–sea interface, associated with the cool-skin effect, is parameterised and dynamically computed within the GOTM to produce a modelled skin SST. Further details are provided in <xref ref-type="bibr" rid="bib1.bibx19" id="text.17"/>.</p>
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Operator setup</title>
      <p id="d1e1934">The aim for the CCA OO is to parameterise the IR and MW satellite SST observations as a function of temperature in the water column below. While the dataset of <xref ref-type="bibr" rid="bib1.bibx19" id="text.18"/> uses a fine vertical resolution to calculate the SST<?pagebreak page1027?> observations, the CCA OO will consider only the levels of a typical OGCM. Within the SOSSTA project this OGCM is the CMEMS Mediterranean Forecasting System (MFS) <xref ref-type="bibr" rid="bib1.bibx22" id="paren.19"/>, but the parameterisation can be performed for any vertical distribution of levels.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><label>Figure 1</label><caption><p id="d1e1945">The magnitude of the diurnal warming at the subskin level as a function of the time of the day for different wind and insolation categories. The diurnal warming is measured with respect to the SST at local sunrise. The wind categories are represented by the different panels, while the insolation categories are shown as different curves within each panel.</p></caption>
          <?xmltex \igopts{width=441.017717pt}?><graphic xlink:href="https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019-f01.png"/>

        </fig>

      <p id="d1e1954">The magnitude of the diurnal signal depends strongly on the atmospheric conditions, most importantly the insolation and wind speed. Insolation causes the ocean skin to heat up during the course of the day, while wind mixes the upper layers of the ocean, leading to the dissipation of the heat. Due to latent heat loss, the ocean skin may even cool down below the bulk temperature. To accommodate a non-linear dependence on the different insolation and wind scenarios in the CCA OO, the GOTM dataset is divided into 12 insolation and 8 wind categories. Insolation and wind are defined in each location as the daily mean value in local mean time (LMT). The category boundaries were chosen to equally divide the dataset. The magnitude of the diurnal warming for the different categories is shown in Fig. <xref ref-type="fig" rid="Ch1.F1"/>.</p>
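      <p>A division into equally populated categories can be sketched as follows. The example assumes that the boundaries are taken as quantiles of each variable independently, which is one possible reading of the procedure; the function names and the synthetic daily mean values are hypothetical and serve only to illustrate the binning.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def equal_population_edges(values, n_categories):
    """Interior boundaries that split `values` into equally populated bins."""
    probs = np.linspace(0.0, 1.0, n_categories + 1)[1:-1]
    return np.quantile(values, probs)


def categorise(values, edges):
    """Category index in the range 0 .. len(edges) for each value."""
    return np.searchsorted(edges, values, side="right")


# Synthetic daily means, for illustration only (made-up distributions).
rng = np.random.default_rng(2)
insolation = rng.uniform(50.0, 350.0, size=9600)   # W m-2
wind = rng.gamma(shape=2.0, scale=3.0, size=9600)  # m s-1

ins_edges = equal_population_edges(insolation, 12)
wind_edges = equal_population_edges(wind, 8)

ins_cat = categorise(insolation, ins_edges)
wind_cat = categorise(wind, wind_edges)
```
]]></preformat>
      <p>Each sample is then assigned to the operator trained for its pair of category indices.</p>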
      <p id="d1e1960">The GOTM dataset has been compared to SEVIRI data at the subskin level in <xref ref-type="bibr" rid="bib1.bibx19" id="text.20"/> and was found to be in good agreement over the whole period of 2013 and 2014. However, after dividing the dataset into atmospheric categories, it was found that categories with high diurnal warming may have a warm bias of up to 0.5 <inline-formula><mml:math id="M101" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C and categories with low diurnal warming a cold bias of typically 0.1–0.2 <inline-formula><mml:math id="M102" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C. This category bias is corrected for by subtracting the mean difference between SEVIRI and GOTM at subskin level for each category.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><label>Figure 2</label><caption><p id="d1e1986">The correlation coefficients between the model variables and observations <bold>(a)</bold>, with the canonical equivalent of these variables <bold>(b)</bold>.</p></caption>
          <?xmltex \igopts{width=469.470472pt}?><graphic xlink:href="https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019-f02.png"/>

        </fig>

      <p id="d1e2001">For each category of wind and insolation, and at hourly time resolution, the CCA OO is calculated to project the 10 uppermost levels of the MFS model onto the skin and subskin SST. The 10 levels extend down to a depth of approximately 40 m, which was chosen to be well below the depth influenced by the diurnal cycle of temperature. Figure <xref ref-type="fig" rid="Ch1.F2"/>a shows the correlation between the model temperature at various depths and the two SST observation types. As expected, the SST is strongly correlated with the uppermost levels and the correlation decreases with depth. It is important to note that in this case the various levels are also strongly correlated with each other. Figure <xref ref-type="fig" rid="Ch1.F2"/>b shows the correlation after transforming to canonical coordinates. It can be seen that the strongest correlation has not significantly changed, as the first canonical variable is very similar to the uppermost model level. The second pair of canonical variables <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold">G</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, however, describes an additional correlation of around 60 % between model water temperature and SST.</p>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Validation</title>
      <p id="d1e2038">The CCA OO is validated by comparing its performance to that of the full GOTM. To use the operator effectively in a DA system, it should be able to provide an accurate approximation of the GOTM results. The validation is performed against GOTM profiles that are withheld from the CCA OO calculation. The GOTM dataset is split in two, withholding every other profile in the zonal direction from the calculation. The validation then uses the withheld profiles and extracts the depths corresponding to the MFS levels, mimicking the use of the operator inside a DA system. The CCA OO, based on the atmospheric category and closest time, is subsequently applied to project the model temperature onto the skin and subskin SST. The projected SST values are then compared to the values in the original GOTM profile.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><?xmltex \currentcnt{3}?><label>Figure 3</label><caption><p id="d1e2043">Examples of temperature profiles in various conditions and at different times. The GOTM profiles are shown by the red curve, while the filled circles indicate the values used as input to the CCA OO. The output of the CCA OO is shown by the black triangles. <bold>(a)</bold> Low wind, high insolation, early morning; <bold>(b)</bold> low wind, high insolation, afternoon; <bold>(c)</bold> high wind, high insolation, afternoon; <bold>(d)</bold> high wind, low insolation, afternoon.</p></caption>
          <?xmltex \igopts{width=469.470472pt}?><graphic xlink:href="https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019-f03.png"/>

        </fig>

      <p id="d1e2064">Some examples of the validation are shown in Fig. <xref ref-type="fig" rid="Ch1.F3"/>. Each panel shows a profile from the GOTM dataset, together with the model levels that were used as input to the CCA OO. The output of the CCA OO is superimposed onto the GOTM profile so that a comparison can be made. Figure <xref ref-type="fig" rid="Ch1.F3"/>a shows a temperature profile in the early morning, during a day of low wind and high insolation. At this time, diurnal warming is limited, and due to the clear-sky conditions the skin and subskin temperatures have cooled down slightly below the temperature of the first model level. Figure <xref ref-type="fig" rid="Ch1.F3"/>b shows an afternoon profile on a similar day. At this time, diurnal warming is around its maximum, and the skin temperature has increased about 1 <inline-formula><mml:math id="M104" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C above the first level of the model. In the case of high wind speed, the increased mixing of the upper layer of the ocean can completely cancel the effect of the high insolation, as shown in Fig. <xref ref-type="fig" rid="Ch1.F3"/>c. In this situation the temperature in the upper 10 m of the ocean is almost constant. When high wind conditions coincide with low insolation, the surface can also cool quite significantly, as shown in Fig. <xref ref-type="fig" rid="Ch1.F3"/>d. The CCA OO is able to correctly reproduce the GOTM skin and subskin temperature under different atmospheric conditions. 
The atmospheric categories with strong diurnal warming have a root mean square error (RMSE) of up to <inline-formula><mml:math id="M105" display="inline"><mml:mn mathvariant="normal">0.4</mml:mn></mml:math></inline-formula> <inline-formula><mml:math id="M106" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C; for all other categories the RMSE is around <inline-formula><mml:math id="M107" display="inline"><mml:mn mathvariant="normal">0.1</mml:mn></mml:math></inline-formula> <inline-formula><mml:math id="M108" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C. The bias of the CCA OO compared to the GOTM was found to be negligible.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><label>Figure 4</label><caption><p id="d1e2122">Skill score of the CCA OO compared to the OGCM upper layer for all wind and insolation categories at midnight <bold>(a)</bold> and in the afternoon <bold>(b)</bold>.</p></caption>
          <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019-f04.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Performance and discussion</title>
      <p id="d1e2146">The performance of the GOTM-based CCA OO for SST is compared to other commonly used methods. For this comparison the GOTM dataset is again split along the zonal direction using every other profile to calculate the CCA OO. The remaining profiles are matched to SEVIRI subskin retrievals using only profiles matched to a measurement with an acceptable (4) or good (5) quality control level. The performance can be conveniently expressed in terms of the skill score (<inline-formula><mml:math id="M109" display="inline"><mml:mi mathvariant="normal">SS</mml:mi></mml:math></inline-formula>), defined by <xref ref-type="bibr" rid="bib1.bibx15" id="text.21"/> as

              <disp-formula id="Ch1.E17" content-type="numbered"><label>15</label><mml:math id="M110" display="block"><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="normal">SS</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi mathvariant="normal">MSE</mml:mi><mml:mi mathvariant="normal">model</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">MSE</mml:mi><mml:mi mathvariant="normal">reference</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

        The skill score is based on the mean square error (MSE) of the model under testing and of a reference model. Specifically, it expresses the difference in MSE as a fraction of the reference MSE. The skill score is straightforward to interpret: a perfect model (<inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mi mathvariant="normal">MSE</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>) results in a skill score of<?pagebreak page1028?> 1, while a model that shows no improvement over the reference model receives a skill score of 0. Negative skill scores indicate that the model performs worse and its MSE has increased with respect to the reference. In this case the CCA OO will be used as the model and the reference will be another commonly used OO. The MSE is calculated with respect to the SEVIRI subskin temperature.</p>
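      <p>The skill score of Eq. (15) can be evaluated with a few lines of Python. In this minimal sketch the function name <monospace>skill_score</monospace> and its argument names are hypothetical; the third argument stands for the values against which both MSEs are computed, here the SEVIRI subskin temperature.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def skill_score(model, reference, truth):
    """Eq. (15): 1 - MSE_model / MSE_reference, both relative to `truth`."""
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mse_model = np.mean((model - truth) ** 2)
    mse_reference = np.mean((reference - truth) ** 2)
    return 1.0 - mse_model / mse_reference
```
]]></preformat>
      <p>The function is undefined when the reference reproduces the comparison values exactly, since the reference MSE then vanishes.</p>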

      <?xmltex \floatpos{t}?><fig id="Ch1.F5" specific-use="star"><?xmltex \currentcnt{5}?><label>Figure 5</label><caption><p id="d1e2204">Skill score of the CCA OO compared to the parameterisation of <xref ref-type="bibr" rid="bib1.bibx1" id="text.22"/> in the afternoon <bold>(a)</bold> and early evening <bold>(b)</bold>.</p></caption>
        <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://os.copernicus.org/articles/15/1023/2019/os-15-1023-2019-f05.png"/>

      </fig>

      <?pagebreak page1030?><p id="d1e2222">The simplest method of assimilating satellite SST observations in a model that insufficiently describes the diurnal cycle of SST is to assimilate only at night or during high wind; see, for example, <xref ref-type="bibr" rid="bib1.bibx24" id="text.23"/>. During the night the cycle of SST is close to its minimum value and the temperature of the upper layer of an OGCM forms a reasonable approximation for the skin temperature. In this situation the assimilation is performed without additional corrections. Figure <xref ref-type="fig" rid="Ch1.F4"/>a shows the skill score of the CCA OO at midnight local time using the temperature of the OGCM upper layer as a reference method. Figure <xref ref-type="fig" rid="Ch1.F4"/>b shows the same situation, but in the afternoon. For high wind and low insolation the CCA OO performs, as expected, similarly to using the upper OGCM layer. However, for low wind speeds and high insolation the CCA OO shows a clear improvement, even at midnight. This can be explained by the fact that at midnight some diurnal signal still remains and, even using the wind and insolation values of the next day, this is correctly modelled by the CCA OO.</p>
      <p id="d1e2233">A more advanced solution is the parameterisation of <xref ref-type="bibr" rid="bib1.bibx1" id="text.24"/>, which estimates the diurnal signal as a function of wind, insolation, and time. This is a commonly used parameterisation; for example, it is included with the NEMO ocean model <xref ref-type="bibr" rid="bib1.bibx11" id="paren.25"/>. Figure <xref ref-type="fig" rid="Ch1.F5"/> shows the skill score for the CCA OO compared to the parameterisation of <xref ref-type="bibr" rid="bib1.bibx1" id="text.26"/> at the peak of the diurnal cycle (a) and in the early evening (b). It can be seen that for high insolation and low wind, conditions for which the diurnal warming is largest, both methods perform similarly. However, the CCA OO is better at accommodating different atmospheric conditions and shows significant improvements for the intermediate insolation and wind categories. Moreover, Fig. <xref ref-type="fig" rid="Ch1.F5"/>b shows that the CCA OO is able to better parameterise the cooling of the subskin in the late afternoon–evening after the peak of the diurnal warming has passed.</p>
      <p id="d1e2249">Using the CCA OO to improve the description of SST has many potential applications. For example, the CCA OO could be used as a parameterisation of diurnally varying skin SST within an OGCM as part of the air–sea flux calculations. The skin SST is the true interface temperature for air–sea fluxes, so this approach should result in improved air–sea heat transfer in OGCMs and coupled ocean–atmosphere models. See, for example, <xref ref-type="bibr" rid="bib1.bibx13" id="text.27"/>. Another possibility would be the use of the CCA OO as a parameterisation of diurnally varying SST within a climate model. The diurnal cycle is a fundamental signal of the climate system, yet for climate models the lack of vertical structure (and temporal resolution) is even more critical. See, for example, <xref ref-type="bibr" rid="bib1.bibx10" id="text.28"/>.</p>
      <p id="d1e2258">Because of the way in which it is constructed, the CCA OO is an inherently linear operator. This makes it straightforward to implement in DA schemes that require linearised and differentiable OOs. Non-linear effects can nevertheless be accommodated to some extent by constructing a series of CCA OOs, each conditioned on a range of the variable that introduces the non-linearity. In the case of SST, for example, this method has been used to condition the CCA OO on insolation, wind, and time. The only requirement is that the datasets <inline-formula><mml:math id="M112" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M113" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> of Sect. <xref ref-type="sec" rid="Ch1.S3"/> are sufficiently large to be subdivided according to such a conditioning variable.</p>
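      <p>The conditioning described above can be sketched in a few lines of Python. The following is a minimal illustration only and not the software used in this study: ordinary least squares is used as a stand-in for the CCA-based construction of Sect. <xref ref-type="sec" rid="Ch1.S3"/>, and the function names, the two synthetic categories, and all numerical values are hypothetical. The sketch fits a separate matrix M and offset K for each category, so that the combined operator is piecewise linear while each individual operator remains linear.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def fit_linear_operator(X, Y):
    """Least-squares fit of Y ~ X @ M.T + K (stand-in for the CCA construction)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])   # column of ones carries the offset
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # shape (N_x + 1, N_y)
    M = coef[:-1].T                                # shape (N_y, N_x)
    K = coef[-1]                                   # shape (N_y,)
    return M, K


def fit_conditioned_operators(X, Y, category):
    """Fit one linear operator (M, K) per category label."""
    labels = np.unique(category).tolist()
    return {c: fit_linear_operator(X[category == c], Y[category == c])
            for c in labels}


def apply_conditioned(operators, x, c):
    """Apply the operator that belongs to category c to one state vector x."""
    M, K = operators[c]
    return M @ x + K


# Synthetic data (hypothetical): the response is linear within each
# category, but slope and offset differ between the two categories.
rng = np.random.default_rng(0)
n, N_x = 400, 3
X = rng.normal(size=(n, N_x))
category = rng.integers(0, 2, size=n)
slopes = np.array([[1.0, 0.0, 0.5], [-2.0, 1.0, 0.0]])
offsets = np.array([0.1, 3.0])
Y = (np.einsum("ij,ij->i", X, slopes[category]) + offsets[category])[:, None]

# A single operator for all data versus one operator per category
M_all, K_all = fit_linear_operator(X, Y)
err_global = float(np.sqrt(np.mean((X @ M_all.T + K_all - Y) ** 2)))

ops = fit_conditioned_operators(X, Y, category)
pred = np.array([apply_conditioned(ops, x, c)
                 for x, c in zip(X, category.tolist())])
err_cond = float(np.sqrt(np.mean((pred - Y) ** 2)))

print(f"single operator RMSE:       {err_global:.3f}")
print(f"conditioned operators RMSE: {err_cond:.3e}")
```
]]></preformat>
      <p>In this sketch, selecting the operator for a given category is a dictionary lookup followed by one matrix–vector product, so a set of conditioned operators costs little more to apply than a single one. Each category must, however, contain enough entries for its own fit.</p>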
      <p id="d1e2277">The minimum size of the input dataset required ultimately depends on the number of model variables used (<inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) and the number of observation variables to predict (<inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>). The number of free parameters in the CCA OO matrix <inline-formula><mml:math id="M116" display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula> and the offset <inline-formula><mml:math id="M117" display="inline"><mml:mi mathvariant="bold-italic">K</mml:mi></mml:math></inline-formula> equals <inline-formula><mml:math id="M118" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. As each entry in the input dataset also provides <inline-formula><mml:math id="M119" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> observation values, Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) requires a minimum of <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> entries to be mathematically solvable. However, at this point the CCA OO will be overfitted. It will simply be able to memorise the input datasets rather than being based on general characteristics of the data. 
Care has to be taken to avoid this situation by making sure that the input dataset contains a number of entries <inline-formula><mml:math id="M121" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> with <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>≫</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Whether a given size <inline-formula><mml:math id="M123" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is sufficient should be tested using independent data. One possible method for this test is to withhold part of the input dataset from the CCA OO calculation and then use this subset to evaluate the CCA OO performance.</p>
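      <p>The parameter count and the hold-out test described above can be illustrated with a short Python sketch. It is not the software used in this study: ordinary least squares stands in for the CCA construction, the data are synthetic, and the function names, dataset sizes, and noise level are assumptions made for illustration. With only <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> entries the fit should reproduce its own training data almost exactly while performing poorly on the withheld entries, whereas a much larger training set should give similar errors on both.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def fit_linear_operator(X, Y):
    """Least-squares fit of Y ~ X @ M.T + K (stand-in for the CCA construction)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])   # column of ones carries the offset
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef[:-1].T, coef[-1]                   # M: (N_y, N_x), K: (N_y,)


def rmse(M, K, X, Y):
    """Root-mean-square error of the linear operator on the data (X, Y)."""
    return float(np.sqrt(np.mean((X @ M.T + K - Y) ** 2)))


def holdout_scores(X, Y, n_train):
    """Fit on the first n_train entries, evaluate on those and on the rest."""
    M, K = fit_linear_operator(X[:n_train], Y[:n_train])
    train_err = rmse(M, K, X[:n_train], Y[:n_train])
    test_err = rmse(M, K, X[n_train:], Y[n_train:])
    return train_err, test_err


# Synthetic, noisy linear data (all values hypothetical)
rng = np.random.default_rng(1)
N_x, N_y, n_total = 10, 2, 2000
true_M = rng.normal(size=(N_y, N_x))
X = rng.normal(size=(n_total, N_x))
Y = X @ true_M.T + 0.5 * rng.normal(size=(n_total, N_y))

# Free parameters in M and K, as counted in the text
n_free = (N_x + 1) * N_y

# Minimum solvable size (N_x + 1 entries) versus n much larger than N_x
train_min, test_min = holdout_scores(X, Y, N_x + 1)
train_big, test_big = holdout_scores(X, Y, 1000)

print(f"free parameters: {n_free}")
print(f"n = {N_x + 1}: train RMSE {train_min:.2e}, hold-out RMSE {test_min:.3f}")
print(f"n = 1000: train RMSE {train_big:.3f}, hold-out RMSE {test_big:.3f}")
```
]]></preformat>
      <p>A large gap between the training and hold-out errors is the signature of overfitting in this sketch. Repeating the comparison for increasing training sizes gives a simple way to judge when a dataset is large enough.</p>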
</sec>
<sec id="Ch1.S6" sec-type="conclusions">
  <label>6</label><title>Conclusions</title>
      <p id="d1e2406">Observation operators (OOs) form a central component in any data assimilation (DA) system, as they transform the state variables of a numerical model into real-world observable variables. Often, an OO also needs to correct for processes that are not fully described by the parent model. Such processes may be best modelled by interfacing the OO to a<?pagebreak page1031?> specialised model, but this is generally not feasible due to computational constraints.</p>
      <p id="d1e2409">The assimilation of satellite sea surface temperature (SST) in ocean general circulation models (OGCMs) is a prime example of a situation in which insufficiently modelled processes play an important role. The diurnal cycle of SST causes a discrepancy between the temperature of the very thin upper layer measured by a satellite and that of the rather coarse upper layer in a typical OGCM. On a clear summer day with low wind, this discrepancy can amount to 2 <inline-formula><mml:math id="M124" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C or more <xref ref-type="bibr" rid="bib1.bibx19" id="paren.29"/>.</p>
      <p id="d1e2424">This paper has presented a method, based on canonical correlation analysis (CCA), to build parameterisations from an output dataset of a specialised model. These parameterisations, referred to as the CCA OO, can provide an efficient approximation to the results of the specialised model and are therefore well suited for use in DA systems.</p>
      <p id="d1e2427">The case of SST assimilation has been used to demonstrate the new CCA OO. Using an output dataset of the General Ocean Turbulence Model (GOTM), a high-resolution water column model specifically tuned for modelling the diurnal cycle of SST, a new CCA OO has been derived. Subsequently, the operator has been applied to reduced-resolution temperature profiles from the GOTM to simulate its use in a DA system. The approximations provided by the CCA OO are found to be in good agreement with the GOTM at various times of the day and across all atmospheric conditions. The results indicate that the CCA OO could be used to enable the assimilation of SST in conditions under which this was previously not possible. Moreover, the atmospheric categories that were introduced in the construction of the CCA OO for SST show that the linear assumption implicit in CCA can be partially relaxed. This allows the CCA OO to be applied across a wide range of conditions. Compared to commonly used methods for SST assimilation, the CCA OO can provide substantial improvements. This is especially true for measurements of the skin SST, since the CCA OO profits from the modelling of the cool-skin effect that is included in the GOTM.</p>
      <p id="d1e2431">The ability of the CCA OO to handle complicated physical models in a relatively simple way is attractive for a large number of problems in DA, for which reduced-order OOs are desirable due to computational constraints. Remotely sensed data are the obvious target given the complexity of their relationships with state variables. Observations in coupled assimilations (e.g. ocean–atmosphere, ocean–sea ice, or ocean–biogeochemistry) are examples of challenging problems that could be investigated in the future with the CCA OO.</p>
</sec>

      
      </body>
    <back><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d1e2438">The GOTM dataset used in Sects. 4 and 5 is available as described in Pimentel et al. (2019). The code for calculating the CCA OO is available from the authors upon request.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e2444">EJ designed and implemented the CCA OO software. SP and WHT performed the modelling of the diurnal cycle. DD, GK, and IM evaluated the OO in different DA systems and provided feedback on the modelling and the software. AS was the PI of the project and coordinated the work. EJ prepared the paper with input from all co-authors.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e2450">The authors declare that they have no conflict of interest.</p>
  </notes><notes notes-type="sistatement"><title>Special issue statement</title>

      <p id="d1e2456">This article is part of the special issue “The Copernicus Marine Environment Monitoring Service (CMEMS): scientific advances”. It is not associated with a conference.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e2462">This work forms part of the SOSSTA project, which has been funded by the EU Copernicus Marine Environment Monitoring Service (CMEMS) through the Service Evolution grants.</p></ack><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e2468">This paper was edited by Pierre-Yves Le Traon and reviewed by Salvatore Marullo and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Bernie et al.(2007)Bernie, Guilyardi, Madec, Slingo, and
Woolnough</label><?label Bernie:2007aa?><mixed-citation>Bernie, D. J., Guilyardi, E., Madec, G., Slingo, J. M., and Woolnough, S. J.:
Impact of resolving the diurnal cycle in an ocean–atmosphere GCM. Part 1: a
diurnally forced OGCM, Clim. Dynam., 29, 575–590,
<ext-link xlink:href="https://doi.org/10.1007/s00382-007-0249-6" ext-link-type="DOI">10.1007/s00382-007-0249-6</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx2"><?xmltex \def\ref@label{{Bj{\"{o}}rck and Golub(1973)}}?><label>Björck and Golub(1973)</label><?label Bjorck:1973aa?><mixed-citation>Björck, Å. and Golub, G. H.: Numerical Methods for Computing Angles
Between Linear Subspaces, Math. Comput., 27, 579–594,
<ext-link xlink:href="https://doi.org/10.2307/2005662" ext-link-type="DOI">10.2307/2005662</ext-link>, 1973.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Burchard et al.(1999)Burchard, Bolding, and
Ruiz-Villarreal</label><?label Burchard:1999aa?><mixed-citation>
Burchard, H., Bolding, K., and Ruiz-Villarreal, M.: GOTM, a general ocean
turbulence model. Theory, implementation and test cases, Tech. Rep. EUR
18745 EN, European Commission, Brussels, Belgium, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Donlon et al.(2002)Donlon, Minnett, Gentemann, Nightingale, Barton,
Ward, and Murray</label><?label Donlon:2002aa?><mixed-citation>Donlon, C. J., Minnett, P. J., Gentemann, C., Nightingale, T. J., Barton,
I. J., Ward, B., and Murray, M. J.: Toward Improved Validation of Satellite
Sea Surface Skin Temperature Measurements for Climate Research, J. Climate, 15, 353–369,
<ext-link xlink:href="https://doi.org/10.1175/1520-0442(2002)015&lt;0353:TIVOSS&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0442(2002)015&lt;0353:TIVOSS&gt;2.0.CO;2</ext-link>,
2002.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Flament et al.(1994)Flament, Firing, Sawyer, and
Trefois</label><?label Flament:1994aa?><mixed-citation>Flament, P., Firing, J., Sawyer, M., and Trefois, C.: Amplitude and Horizontal
Structure of a Large Diurnal Sea Surface Warming Event during the Coastal
Ocean Dynamics Experiment, J. Phys. Oceanogr., 24, 124–139,
<ext-link xlink:href="https://doi.org/10.1175/1520-0485(1994)024&lt;0124:AAHSOA&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0485(1994)024&lt;0124:AAHSOA&gt;2.0.CO;2</ext-link>,
1994.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx6"><label>Haddad et al.(2015)Haddad, Steward, Tseng, Vukicevic, Chen, and
Hristova-Veleva</label><?label Haddad:2015aa?><mixed-citation>Haddad, Z. S., Steward, J. L., Tseng, H. C., Vukicevic, T., Chen, S. H., and
Hristova-Veleva, S.: A data assimilation technique to account for the
nonlinear dependence of scattering microwave observations of precipitation,
J. Geophys. Res.-Atmos., 120, 5548–5563,
<ext-link xlink:href="https://doi.org/10.1002/2015JD023107" ext-link-type="DOI">10.1002/2015JD023107</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Harris and Kelly(2001)</label><?label Harris:2001aa?><mixed-citation>Harris, B. A. and Kelly, G.: A satellite radiance-bias correction scheme for
data assimilation, Q. J. Roy. Meteor. Soc.,
127, 1453–1468, <ext-link xlink:href="https://doi.org/10.1002/qj.49712757418" ext-link-type="DOI">10.1002/qj.49712757418</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Hotelling(1936)</label><?label Hotelling:1936aa?><mixed-citation>
Hotelling, H.: Relations Between Two Sets of Variates, Biometrika, 28,
321–377, 1936.</mixed-citation></ref>
      <ref id="bib1.bibx9"><?xmltex \def\ref@label{{Janji{\'{c}} et~al.(2018)Janji{\'{c}}, Bormann, Bocquet, Carton, Cohn,
Dance, Losa, Nichols, Potthast, Waller, and Weston}}?><label>Janjić et al.(2018)Janjić, Bormann, Bocquet, Carton, Cohn,
Dance, Losa, Nichols, Potthast, Waller, and Weston</label><?label Janjic:2018aa?><mixed-citation>Janjić, T., Bormann, N., Bocquet, M., Carton, J. A., Cohn, S. E., Dance,
S. L., Losa, S. N., Nichols, N. K., Potthast, R., Waller, J. A., and Weston,
P.: On the representation error in data assimilation, Q. J. Roy. Meteor. Soc., 144, 1257–1278, <ext-link xlink:href="https://doi.org/10.1002/qj.3130" ext-link-type="DOI">10.1002/qj.3130</ext-link>,
2018.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Large and Caron(2015)</label><?label Large:2015aa?><mixed-citation>Large, W. G. and Caron, J. M.: Diurnal cycling of sea surface temperature,
salinity, and current in the CESM coupled climate model, J. Geophys. Res.-Oceans, 120, 3711–3729, <ext-link xlink:href="https://doi.org/10.1002/2014JC010691" ext-link-type="DOI">10.1002/2014JC010691</ext-link>,
2015.</mixed-citation></ref>
      <ref id="bib1.bibx11"><?xmltex \def\ref@label{{Madec et~al.(1998)Madec, Delecluse, Imbard, and
L{\'{e}}vy}}?><label>Madec et al.(1998)Madec, Delecluse, Imbard, and
Lévy</label><?label Madec:1998aa?><mixed-citation>
Madec, G., Delecluse, P., Imbard, M., and Lévy, C.: OPA 8.1 Ocean General
Circulation Model Reference Manual, Tech. Rep. 11, Institut Pierre Simon
Laplace des Sciences de l'Environnement Global, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx12"><?xmltex \def\ref@label{{Marullo et~al.(2014)Marullo, Santoleri, Ciani, Borgne, P{\'{e}}r{\'{e}},
Pinardi, Tonani, and Nardone}}?><label>Marullo et al.(2014)Marullo, Santoleri, Ciani, Borgne, Péré,
Pinardi, Tonani, and Nardone</label><?label Marullo:2014aa?><mixed-citation>Marullo, S., Santoleri, R., Ciani, D., Borgne, P. L., Péré, S.,
Pinardi, N., Tonani, M., and Nardone, G.: Combining model and geostationary
satellite data to reconstruct hourly SST field over the Mediterranean Sea,
Remote Sens. Environ., 146, 11–23,
<ext-link xlink:href="https://doi.org/10.1016/j.rse.2013.11.001" ext-link-type="DOI">10.1016/j.rse.2013.11.001</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Marullo et al.(2016)Marullo, Minnett, Santoleri, and
Tonani</label><?label Marullo:2016aa?><mixed-citation>Marullo, S., Minnett, P. J., Santoleri, R., and Tonani, M.: The diurnal cycle
of sea-surface temperature and estimation of the heat budget of the
Mediterranean Sea, J. Geophys. Res.-Oceans, 121, 8351–8367,
<ext-link xlink:href="https://doi.org/10.1002/2016JC012192" ext-link-type="DOI">10.1002/2016JC012192</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx14"><?xmltex \def\ref@label{{Merchant et~al.(2008)Merchant, Filipiak, Le~Borgne, Roquet, Autret,
Pioll{\'{e}}, and Lavender}}?><label>Merchant et al.(2008)Merchant, Filipiak, Le Borgne, Roquet, Autret,
Piollé, and Lavender</label><?label Merchant:2008aa?><mixed-citation>Merchant, C. J., Filipiak, M. J., Le Borgne, P., Roquet, H., Autret, E.,
Piollé, J. F., and Lavender, S.: Diurnal warm-layer events in the western
Mediterranean and European shelf seas, Geophys. Res. Lett., 35, L04601,
<ext-link xlink:href="https://doi.org/10.1029/2007GL033071" ext-link-type="DOI">10.1029/2007GL033071</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Murphy(1988)</label><?label Murphy:1988aa?><mixed-citation>Murphy, A. H.: Skill Scores Based on the Mean Square Error and Their
Relationships to the Correlation Coefficient, Mon. Weather Rev., 116,
2417–2424, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1988)116&lt;2417:SSBOTM&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1988)116&lt;2417:SSBOTM&gt;2.0.CO;2</ext-link>, 1988.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Oke and Sakov(2008)</label><?label Oke:2008aa?><mixed-citation>Oke, P. R. and Sakov, P.: Representation Error of Oceanic Observations for Data
Assimilation, J. Atmos. Ocean. Tech., 25, 1004–1017,
<ext-link xlink:href="https://doi.org/10.1175/2007JTECHO558.1" ext-link-type="DOI">10.1175/2007JTECHO558.1</ext-link>,
2008.</mixed-citation></ref>
      <ref id="bib1.bibx17"><?xmltex \def\ref@label{{Pimentel et~al.(2008{\natexlab{a}})Pimentel, Haines, and
Nichols}}?><label>Pimentel et al.(2008a)Pimentel, Haines, and
Nichols</label><?label Pimentel:2008aa?><mixed-citation>Pimentel, S., Haines, K., and Nichols, N. K.: Modeling the diurnal variability
of sea surface temperatures, J. Geophys. Res.-Oceans, 113,
C11004, <ext-link xlink:href="https://doi.org/10.1029/2007JC004607" ext-link-type="DOI">10.1029/2007JC004607</ext-link>, 2008a.</mixed-citation></ref>
      <ref id="bib1.bibx18"><?xmltex \def\ref@label{{Pimentel et~al.(2008{\natexlab{b}})Pimentel, Haines, and
Nichols}}?><label>Pimentel et al.(2008b)Pimentel, Haines, and
Nichols</label><?label Pimentel:2008ab?><mixed-citation>Pimentel, S., Haines, K., and Nichols, N. K.: The assimilation of
satellite-derived sea surface temperatures into a diurnal cycle model,
J. Geophys. Res.-Oceans, 113, C09013,
<ext-link xlink:href="https://doi.org/10.1029/2007JC004608" ext-link-type="DOI">10.1029/2007JC004608</ext-link>, 2008b.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Pimentel et al.(2019)Pimentel, Tse, Xu, Denaxa, Jansen, Korres,
Mirouze, and Storto</label><?label Pimentel:2019aa?><mixed-citation>Pimentel, S., Tse, W.-H., Xu, H., Denaxa, D., Jansen, E., Korres, G., Mirouze,
I., and Storto, A.: Modeling the near-surface diurnal cycle of sea surface
temperature in the Mediterranean Sea, J. Geophys. Res.-Oceans, 124, 171–183, <ext-link xlink:href="https://doi.org/10.1029/2018JC014289" ext-link-type="DOI">10.1029/2018JC014289</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Press(2011)</label><?label Press:2011aa?><mixed-citation>Press, W. H.: Canonical Correlation Clarified by Singular Value Decomposition, available
at: <uri>http://numerical.recipes/whp/workingpapers.html</uri> (last
access: 12 June 2019), 2011.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Saux Picart and Legendre(2018)</label><?label Saux-Picart:2018aa?><mixed-citation>Saux Picart, S. and Legendre, G.: MSG/SEVIRI Sea Surface Temperature data
record Product User Manual, Tech. Rep. OSI-250, EUMETSAT, OSI SAF,
<ext-link xlink:href="https://doi.org/10.15770/EUM_SAF_OSI_0004" ext-link-type="DOI">10.15770/EUM_SAF_OSI_0004</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Simoncelli et al.(2014)Simoncelli, Fratianni, Pinardi, Grandi, Drudi,
Oddo, and Dobricic</label><?label Simoncelli:2014aa?><mixed-citation>Simoncelli, S., Fratianni, C., Pinardi, N., Grandi, A., Drudi, M., Oddo, P.,
and Dobricic, S.: Mediterranean Sea physical reanalysis (MEDREA 1987–2015)
(Version 1), Tech. rep., EU Copernicus Marine Service Information,
<ext-link xlink:href="https://doi.org/10.25423/medsea_reanalysis_phys_006_004" ext-link-type="DOI">10.25423/medsea_reanalysis_phys_006_004</ext-link>, 2014.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx23"><label>Umlauf et al.(2005)Umlauf, Burchard, and Bolding</label><?label Umlauf:2005aa?><mixed-citation>
Umlauf, L., Burchard, H., and Bolding, K.: General Ocean Turbulence Model,
Scientific Documentation v3.2., Tech. Rep. 63, Institute for Baltic Sea
Research Warnemünde, Rostock-Warnemünde, Germany, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Waters et al.(2015)Waters, Lea, Martin, Mirouze, Weaver, and
While</label><?label Waters:2015aa?><mixed-citation>Waters, J., Lea, D. J., Martin, M. J., Mirouze, I., Weaver, A., and While, J.:
Implementing a variational data assimilation system in an operational 1/4
degree global ocean model, Q. J. Roy. Meteor. Soc., 141, 333–349, <ext-link xlink:href="https://doi.org/10.1002/qj.2388" ext-link-type="DOI">10.1002/qj.2388</ext-link>, 2015.</mixed-citation></ref>

  </ref-list></back>
</article>
