Formulation and demonstration of an extended-3DVAR multi-scale data assimilation system for the SWOT altimetry era

. A state-of-the-art data assimilation system for a high-resolution model has been developed to address the opportunities and challenges posed by the upcoming Surface Water and Ocean Topography (SWOT) satellite mission. A new ‘extended’ three-dimensional variational data assimilation scheme (extended-3DVAR) is formulated to assimilate observations over a time interval, and integrated using a multi-scale approach (hereafter MSDA). The new MSDA scheme 10 specifically enhances the efficacy of the assimilation of satellite along-track altimetry observations, which are limited by large repeat time intervals. This developed system is computationally highly efficient, and thus can be applied to a very high-resolution model. A crucial consideration of the system is first to assimilate routinely available observations, including satellite altimetry, sea surface temperature (SST) and temperature/salinity vertical profiles, to constrain large scales and large mesoscales. High-resolution (dense) observations and future SWOT measurements can then be effectively and seamlessly 15 assimilated to constrain the smaller scales. Using this system, a reanalysis dataset was produced for the SWOT pre-launch field campaign that took place in the California Current System from September through December, 2019. An evaluation of this system with assimilated and withheld data demonstrates its ability to effectively utilize both routine and campaign observations. These results suggest a promising avenue for data assimilation development in the SWOT altimetry era, which will require the capability to efficiently assimilate large-volume datasets resolving small-scale ocean processes.


Introduction
The Surface Water and Ocean Topography (SWOT) satellite mission will launch in late 2022. SWOT will carry a newgeneration altimeter -a Ka-band Radar Interferometer (KaRIn) -measuring sea surface height (SSH) in two-dimensions (2D) at unprecedented spatial resolution (Durand et al., 2010;Fu & Ubelmann, 2014). KaRIn has lower instrument noise (~2 cm 2 /cycle/km) than conventional nadir-looking altimeters (~100 cm 2 /cycle/km). The low noise allows for an effective 25 spatial resolution down to a scale of 15 km. The effective spatial resolution is the minimum spatial wavelengths that can be resolved. The SWOT resolution is a significant improvement over the ~150-200 km 2D resolution of altimetry measurements from a constellation of conventional nadir-looking altimeters (Fu & Ubelmann, 2015;Dufau et al., 2016;Wang et al., 2019). The KaRIn instrument measures SSH over a nominal 120-km wide swath with a 20-km gap around the satellite's nadir track.
The SWOT satellite has a repeat orbit of about 21 days with global coverage. While the SWOT mission will provide unprecedentedly high spatial resolution measurements, it is not matched by high temporal resolution. The SWOT satellite will fly at 7 km/s, the same as conventional altimetric satellites. In 1-min, the measurements can cover a region of 420-km along-track, and thus can effectively provide a synoptic map of 2D SSH structures. The temporal resolution is limited by a 21-day repeat time. Fast evolving dynamical processes may develop, decay, or move unobserved between two passes. The 35 current mapping methods (e.g., Le Traon, et al., 1998;Archer et al., 2020) for traditional altimetry observations will not be appropriate for SWOT measurements (Morrow et al., 2019).
When the SWOT satellite launches, a 90-day fast-sampling phase has been designated for Calibration and Validation (CalVal) to support understanding of the novel measurements. During this phase, the SWOT satellite will fly on a one-day repeat orbit to gather high-temporal resolution data at specific locations; two overpasses will be made every day at 40 the crossover points. An 'Adopt-a-Crossover' consortium has been organized (Morrow et al., 2019), in which researchers around the globe will deploy in-situ instruments to augment the SWOT daily observations, to study small-scale ocean features and understand the nuances of SWOT's novel measurements. These observing systems can be coupled to data assimilation systems that combine the measurements with high-resolution models to produce best-estimates of the oceanstate. However, there are challenges in applying existing DA systems to the novel SWOT data. 45 We have developed a multi-scale data assimilation (MSDA-SWOT) system with a high resolution-model to specifically address some of the challenges posed by the SWOT satellite mission, which include: i.
A 21-day repeat cycle that limits our ability to directly observe the time-evolution of the fast-moving small-scale ocean variability, requiring a dynamical model to fill-in the time gaps ii. SWOT measurements should be effectively combined with measurements from conventional nadir-looking 50 altimeters measurements, which have a fundamentally lower spatial resolution but relatively higher temporal resolution. iii.
Measurements of small-scale ocean features embedded in larger-scale structures, which requires an ability to effectively constrain different scales of variability For MSDA-SWOT to successfully address the specific challenges of SWOT, it is fundamentally important that the 55 system has the capability of assimilating all routinely available observations from operational observing networks. During the last two decades, global routine observing networks have been established, enhanced and sustained for operational purposes. There are more than five satellite altimeters presently in operation, from which SSH measurements are routinely available in near real time and may be merged to resolve eddies in two-dimensions down to almost 100 km in size at the mid-latitudes (Archer et al., 2020). Global temperature/salinity vertical profiles are provided by the operational Argo float 60 network and augmented by mooring arrays, glider lines and a variety of other observing platform networks. Although temperature/salinity vertical profiles are spatially sparse and temporally infrequent, they have allowed the production of monthly-average data sets, which have an effective resolution of a few hundred kilometers (e.g, Good et al., 2013). daily basis. The MSDA-SWOT algorithm has been formulated to effectively constrain mesoscale variability using 65 observations from these routine observing networks. We argue that the effectively constrained mesoscale variability down to about 100 km is a prerequisite for the dense observations from the field campaigns and future high resolution SWOT measurements to be effectively and seamlessly assimilated, and the MSDA methodology is formulated to achieve such a goal (Li et al., 2015a;Li et al., 2019).
MSDA-SWOT is a three-dimensional variational data assimilation system (MS-3DVAR) that has been documented in Li 70 et al. (2015bLi 70 et al. ( , 2019. The model used is the Regional Ocean Modeling System (ROMS, Shchepetkin & McWilliams 2005, 2011. The MS-3DVAR has been extended to address the particular challenges of the SWOT mission. In MSDA-SWOT, a new 3DVAR formulation termed 'extended-3DVAR' is implemented. Extended-3DVAR is formulated to assimilate observations over a time interval and thus has the objective similar to the First Guess at Appropriate Time (FGAT) (e.g. Martin et al., 2015). In extended 3DVAR, a formulation has been given to account for the error due to the difference 75 between data assimilation and observing time, which is termed as sampling time error (see section 3.1). The extended 3DVAR is essential to MSDA-SWOT. First, it is as computationally efficient as a 3DVAR algorithm, so that it can be implemented with a very high-resolution model. And second, by taking into account the sampling time error, it allows the selection of a relatively large time interval so that many temperature/salinity vertical profiles and along-track altimetry measurements can be assimilated jointly to constrain large-scale and mesoscale circulation. 80 The MSDA-SWOT system is based on the MSDA system described in Li et al. (2019b), but with the implementation of extended 3DVAR. There are other major differences in the treatment of observations, and accordingly, in background error covariances, discussed later. It has been implemented at the primary SWOT CalVal site in a region encompassing the California Current System (Fig. 1). This eastern boundary current site has been selected for the moderate tides, and moderate mesoscale and sub-mesoscale dynamics (e.g., Hickey, 1998;Capet et al., 2008). 85 In Li et al. (2019), we presented a set of twin experiments, also known as OSSE (observing system simulation experiment). The results showed that the MSDA system could produce appropriately accurate SSH for CalVal with a dedicated glider array on top of the routine observations available from operational observing networks. A pre-launch field campaign that was dedicated to SWOT CalVal took place from September through December, 2019 (for more details, see Wang et al., 2021). MSDA-SWOT has been used to produce analyses and forecasts by assimilating measurements from the 90 field campaign and routine observing networks. We here evaluate and illustrate the performance of MSDA-SWOT to produce appropriately accurate SSH as the Li et al. (2019) twin-experiment OSSE demonstrated. We focus on the fine-scale performance of MSDA-SWOT in comparison to withheld glider and mooring data in a companion paper (Archer et al. 2021).
The paper is organized as follows. Section 2 summarizes observations that are assimilated and used for evaluating the 95 system. The MSDA-SWOT algorithm is presented in section 3. Implementation strategies and practical considerations are described in section 4. Section 5 presents an evaluation of the performance of MSDA-SWOT with an emphasis on the illustration of the algorithm and implementation strategies. Finally, a brief summary of the key results is given in Section 6.

Data
In the context of this paper, distinct spatial scales are considered. For clarity, they are defined as follows: the large-scale is 100 greater than 400 km, the large mesoscale is from 400 km down to 150 km, the small mesoscale is from 150 km down to 50 km, and the submesoscale is smaller than 50 km. These scale definitions are not used for dynamical analysis, but they are associated with the ability of currently existing observing networks to resolve spatial scales. SWOT measurements aim to accurately resolve scales from 150 km down to 15 km in two-dimensions.
In the following, we briefly describe the observations that are collected during the pre-launch field campaign as well as 105 those from routine observing networks. We also provide information on a global reanalysis dataset we use to benchmark our system.

Pre-launch field campaign 115
The pre-launch field campaign observations provide information for the design of the post-launch in-situ observing system. Three different moorings were deployed. The first (south) is a hybrid mooring with a profiling Conductivity, Temperature, and Depth (CTD) instrument called a WireWalker installed on top of fixed CTD sensors. The WireWalker measures vertical profiles of temperature/salinity down to a depth of about 500 m with high temporal resolution. The second (north) mooring comprises a GPS altimeter installed on top of a line of CTDs. And the third (middle) is a GPS/Prawler mooring, on which 120 was installed a GPS altimeter above a Prawler. Like the WireWalker, a Prawler is a profiling CTD that measures T/S vertical profiles at very high resolution and frequency down to a depth of about 500 m. Another observing platform deployed was a Slocum glider that measures T/S vertical profiles. Three moorings were deployed along a Sentinel-3A altimetry track, and the glider flew along a track perpendicular to the mooring line. (Fig. 1).

Temperature/salinity vertical profiles 125
Temperature/salinity vertical profiles are measured by the global Argo float network, local glider networks, moorings, and during various surveys. The profiles are spatially and temporally sparse and heterogeneous, but they have allowed the production of monthly gridded products at a low spatial resolution (e.g. Good et al., 2013). We assimilate individual T/S profiles processed and quality controlled by the EN4 program at the Met Office Hadley Centre (Good et al., 2013). Figure 2 shows the locations of all the vertical profiles from September 1 through Nov 30, 2019. An appropriate assimilation of those 130 profiles should help constrain the large-scale circulation in the model.

Along-track and gridded altimetry data
In this study, both along-track and gridded altimetry data are assimilated. The altimetry data of absolute dynamic topography is assimilated although it is called SSH for convenience. We use the level 3 (L3) along-track data and level 4 gridded data that are publicly available through the Copernicus website (http://marine.copernicus.eu/services-portfolio/access-to-135 products/). We use 5 altimeters that were in orbit: Jason-3 (J3), Sentinal-3A (S3A) and Sentinal-3B (S3B), SARAL-DP/AltiKa (ALD), and Cryosat-2 (C2). Table 1 presents the main characteristics of each satellite altimeter, and details of the L3 dataset (see Pujol et al. (2016) for more information on altimeter standards and the DUACS processing chain). DUACS-DT2018 provides the 1 Hz along-track data in two resolutions; unfiltered (~7 km spacing between along-track grid points), and filtered (~14 km spacing, low passed with a 65 km cut-off). The 65 km threshold was chosen based on signal-to-noise 140 ratio of one in the wavenumber spectra (Dufau et al.,2016). Even with five altimeters, the long repeat intervals limit coverage for one day to only a few tracks over the model 145 domain (Fig. 3a). It is challenging to assimilate such highly heterogeneous observations into a model with a spatial resolution on the order of 1 km. Our experience has shown that the assimilation of such few tracks of altimetry observations often causes problems -distorting and/or mis-locating eddies, and even generating spurious eddies. As such, we focused in previous studies on assimilating only the gridded data products that are highly smoothed. The high-resolution information from along-track altimetry was thus not used in our previous DA systems (e.g., Li et al., 2019a, b). To address this 150 limitation, we have formulated the MSDA-SWOT data assimilation scheme to assimilate multiple days of along-track altimetry observations, by extending the framework of 3DVAR. As we will show, MSDA-SWOT can effectively and reliably assimilate along-track altimetry observations into a high-resolution regional model (Section 3).
Using the measurements from multiple conventional satellite altimeters, gridded SSH products can now resolve SSH length-scales larger than approximately 150-km wavelength in the region near the CalVal site (e.g., Chelton et al., 155 2007;Pujol et al., 2016;Dufau et al., 2016;Archer et al., 2020). By merging SSH measurements from all available altimetry satellites, daily SSH maps can be generated even though they involve smoothing with a time window much longer than one day. Daily gridded maps by AVISO (Archiving, Validation and Interpretation of Satellite Oceanographic) have been extensively used. The root-mean-square difference between the AVISO gridded and filtered along-track data is around 2.5 cm in this region (Archer et al., 2020). 160

170
Microwave AMSR SST has a resolution of 25 km. No microwave SST is available near shore because of land contamination. VIIRS SST is not assimilated but used as independent data for evaluation.

Satellite sea surface temperature
Satellite remote sensing has provided accurate global sea surface temperatures (SST) for decades. SST measured by satellite microwave (MW) sensors has been gridded at a spatial resolution of 25 km with an error of 0.6°C. SST measured by infrared 175 (IR) sensors that are carried on polar orbit satellites has a spatial resolution of about 1 km with an error of 0.7°C, and they are well into the submesoscale. There are several satellites measuring MW and IR SST, jointly producing SST observations a few times daily. In Li et al. (2019b), we assimilate nighttime and morning (Local 12 am-9 am) satellite SSTs, and daytime SSTs with wind speeds higher than 3 m/s to minimize the impact of skin temperature differences that arise in low wind daytime conditions. In this study, we will not assimilate IR SST but use them as independent data for evaluation. 180

HYCOM
We benchmark MSDA-SWOT against the Global HYbrid Coordinate Ocean Model (HYCOM), which is coupled with the Navy Coupled Ocean Data Assimilation (NCODA) to produce a global reanalysis product (Cummings, 2005;Cummings and Smedstad, 2013). This is one of the leading global reanalysis products available. NCODA uses the 24-hour model forecast as a background field in a 3DVAR scheme. It assimilates satellite SSH, in situ and satellite SST, vertical profiles of temperature 185 and salinity from XBTs, Argo floats and moored buoys. It can be considered an approximate equivalent to the routine DA run of MSDA. More information on the system is available at https://www.hycom.org/. We here use a high-resolution run provided by the Navy, at 0.04 º longitude by 0.02º latitude, and hourly time sampling, for the pre-launch field campaign period. To remove the tides from the SSH field, we take a daily average and detrend the SSH in two-dimensions around the CalVal site. 190

An extended three-dimensional multiscale algorithm
The MSDA-SWOT system has been developed to address new challenges relating to the fine-scale resolution and data density of satellite observations in the SWOT altimetry era. As such, the numerical ocean model used should be high resolution with a grid spacing on the order of 1 km or finer. This imposes a major computational challenge to formulate MSDA-SWOT. Both the field campaign and SWOT measurements are localized to a limited area. Another major challenge 195 is to assimilate these localized measurements seamlessly along with the broad routine observations. Otherwise, a spurious circulation surrounding the observing area may develop, and data assimilation could even fail. To address these two major challenges, we have formulated a particular MSDA scheme, which is based on a 3DVAR algorithm that is mathematically extended to assimilate observations over a time window.
For our purposes, to explore ocean fine-scale variability and assimilate dense observations, we work with a more 205 computationally-efficient 3DVAR system. For a very high-resolution model, 3DVAR is still the predominant scheme in both meteorological and oceanic applications (e.g., Gustafsson, et al., 2018). A 3DVAR scheme formally uses observations taken only at an instantaneous time. In practice, observations over a time window are assimilated with the assumption that they are taken at the DA time. This time difference between the observations and the DA time inputs error to the system, and is termed 'sampling time error'. In practice, a short time window is selectively chosen as a compromise between incorporating 210 more observational information, and keeping the sampling time error at an acceptable level.
As a solution to this time window limitation in 3DVAR, we here formulate a scheme to extend the ability of 3DVAR to assimilate observations in a longer time window. This is made possible by explicitly quantifying the sampling time error that results from the time difference between observation and DA time, and taking this into account in the cost function. We term this 'extended-3DVAR'. It has the same objective as First Guess at Appropriate Time (FGAT), another 215 scheme used in conjunction with 3DVAR (e.g. Martin et al., 2015). This is discussed further in Archer et al. (2021).

Formulation of extended-3DVAR
A variational data assimilation scheme can be described as a method to use imperfect observations to correct the error in the forecast from a numerical prediction model. In discrete form, the prognostic state of a model at time ! , also known as state variables, can be denoted as an -vector, ! . Then the forecast through a numerical model from !"# to ! can be written as 220 The -vector % , the state at % , is an initial condition. Here !"#→! is known as an operator or propagator from time !"# to ! .
A conventional or strong-constraint 4DVAR algorithm is formulated to find an estimate that minimizes the cost function through changing controls (initial and boundary conditions, forcing) and use of an adjoint model. In this discussion, we focus 225 on the time dependence in the 4DVAR cost function only, not the minimization process. The cost function to be minimized is: where ! = %→! ( % ). Here is the background error covariance, and Suppose that at time ! , a number of observations (note that could vary in time) are available and placed into an - By definition, the observational errors can be written as where ! is often called an observation operator, which maps the state variable to ! )-, and % is the unknown true state. The 235 observational error given in equation (3) is important. It involves the measurement error and representation error, and these two types of observation errors are well known (e.g., Lorenc, 1986). However, equation (3) involves an additional error due to the error in %→! , which will be discussed further later. For a 4DVAR algorithm, a prediction model (1) is required. In equations (2) and (3), however, the prediction model is not required to be the same model as in (3), but a different prediction model can be chosen instead. We choose to use a 240 persistence model as = . (4) It is also called a persistence forecast. With (4), the 4DVAR cost function becomes This cost function does not involve the prediction model, and thus equates to a cost function of 3DVAR. However, it uses all 245 the observations over a time window.
From (3), the observational error can be expanded where ! is the unknown true state. The first term is the measurement error, and the second term the representation error due to the inaccurate observation operator. The last term is a new type of observational error. This error incurs due to the use of the persistence model (4), rather than the accurate model (1). We can understand this error in the following way. If all the observations are taken at % , this type of error does not occur. We thus dub this error as a sampling time error. This sampling time error provides a mathematical formulation to account for the error arising from the difference between the observation 255 time and the DA time.
The MSDA-SWOT system is based on (5) and (6). Because is allowed to be negative in (5), (6) thus allows negative too. In the MSDA-SWOT implementation, we assimilate observations prior to and after the assimilation time.

Multiscale Framework
We have stated that data assimilation is a method to use imperfect observations to correct the error in the forecast. The 260 essence of a multi-scale data assimilation framework is to correct forecast errors sequentially from large to small spatial scales in multiple steps, where the number of steps depends on the characteristics of the observing networks. An MSDA formulation hinges on the fact that the scales of forecast error that are corrected can be defined and determined by the background error covariance (Li et al., 2015(Li et al., , 2016. The background error covariance can be decomposed as = , where is a diagonal matrix whose elements 265 are the background root-mean-square error (RMSE) associated with % ' , and is the correlation matrix whose elements consist of the spatial correlations.
This error covariance decomposition allows us to separately examine RMSE and correlation. The background error amplitude is given by , which plays a role of weight as we can see in the cost function. It is well known that the correlation plays a key role in spreading the observation innovations to the surrounding areas. 270 However, a more important role in MSDA is its filtering effect. The larger the correlation scale is, the stronger the filtering effect that data assimilation imposes on observation innovations (Li et al., 2016;Jacobs et al., 2020). A rule of thumb is that all the scales smaller than twice the correlation length scale are filtered out. This filtering effect dictates that the data assimilation can correct only those scales larger than twice the correlation length scales. By defining a proper background state and associated correlation length scales, MSDA allows for specifying scales that are corrected. In MSDA-SWOT, we 275 implement a three-step MSDA for constraining different spatial scales (section 4.2).

Implementation
The implementation of the scheme formulated above leverages the MSDA system described in Li et al. (2019b). The difference here is in the use of extended-3DVAR scheme, as well as in the treatment of observations, and accordingly, in the adjustment of the multiscale background error covariances, which will be described in this section. 280

Modeling System
The MSDA-SWOT configuration of the modeling system is similar to the one described in Li et al. (2019). Here we give a description. The modeling system is based on ROMS (Shchepetkin & McWilliams, 2005, 2011. A one-way nesting procedure is used as described in Mason et al. (2010) with successive, nearly isotropic grid resolutions, varying from 9 km covering a large region of the Northeast Pacific, 3-km for an extended region of the California coast. The bathymetry for 285 three domains is constructed from the ETOPO1 1 arc-minute dataset of Amante & Eakins (2009). The model domain and its bathymetry are shown in Fig.1.
The lateral boundary conditions for the 9-km domain are derived from version GOFS3.1 of the global analysis of HYCOM (https://www.hycom.org/dataserver/gofs-3pt1/analysis). The temperature, salinity, velocity components and SSH fields are all used. To smooth out possible unrealistic variability, a 15-day average is applied. The three-hourly atmospheric 290 forcing uses a bulk flux formula (Fairall et al., 2003). The required atmospheric fields (10m-wind speed and direction, net shortwave radiation, downward longwave radiation, 2m-air temperature and relative humidity) are obtained by interpolation from the 25-km resolution of the NCEP GFS (National Centers for Environmental Prediction Global Forecast System) operational atmospheric model 3-hourly outputs. In the calculation of wind stresses, ocean surface currents are subtracted.

The model is forced by barotropic tides at the open boundaries of the 3-km resolution domain, and the Flather 295
boundary condition (Flather, 1976;Wang et al., 2009) is used. The tidal sea level and barotropic velocity amplitudes and phases for the 10 dominant tidal constituents (M2, S2, N2, K2, O1, K1, P1, Q1, Mf, and Mm) were extracted from the 1/12th degree resolution tidal model solution for the Pacific basin, which was constrained by the assimilation of satellite altimetry in the Oregon State University Inverse Tidal Software (OTIS, Egbert and Erofeeva, 2002). An evaluation against mooring observations shows that the model realistically produces baroclinic tides in the region near the CalVal site (Li et al., 2019b).

Three-Step MSDA
Following the definition given in section 2, a three-step MSDA is configured respectively to constrain three scales: 1) large and large mesoscale DA for scales larger than 150 km, 2) small mesoscale DA for scales larger than 50 km, and 3) submesoscale DA for scales larger than 15 km.
As discussed in section 3.3, the scales that are constrained are dictated by background error correlation length scales. We 305 define the background correlation using a Gaussian function, "0 ! &2 " ! 3 , where is a spatial distance between grid points and 4 is a decorrelation length scale, known as the Daley correlation scale (Daley, 1991). Following the rule of thumb that only the scales larger than two times of the correlation length scale are constrained, we use a decorrelation length scale 4 of 75 km, 25 km and 9 km.
We note that these scales are defined empirically and will benefit from a further tuning. Those three scales are specified 310 mainly by considering the observations assimilated, which are listed in Table 2. An important consideration is that the observation error should be specified consistently with the decorrelation length scale and the assimilation window for the use of observations.
For the large mesoscale DA, a decorrelation scale of 75 km is given mainly because the AVISO gridded data and daily mean microwave SST can have an effective constraint on about 150 km (Archer et al., 2020), roughly two times the 315 decorrelation length scale. Their observation errors given in section 2 are used. Since the boundary conditions are derived from the 15-day average of the HYCOM analysis, we also assimilate the same 15-day average T/S profiles twice a month to ensure that the state inside the model domain is not drifted away from the lateral boundary conditions. For the small mesoscale DA, a decorrelation scale of 25 km is used for maximizing the impact of along-track altimetry observations, because the along track observations are filtered with a cut-off of 65 km (Pujot et al., 2016). Leveraging the 320 extended 3DVAR formulation, we assimilate along-track altimetry and routine T/S profile observations over a time window of 11 days. We choose 11 days because the Jason-3 has the smallest repeat interval among five altimetry satellites, and it is 10 days. It is not uncommon that a few days of observations are assimilated in 3DVAR. For example, an assimilation windows of 5 days for SSH and 12 days for T/S profiles is used in Jacobs et al. (2014). The cost function (5) allows the use of a larger assimilation window, but the sampling error given in (6) should be added to the observation error. 325 In implementation, a 11-day time window comprises 5 days prior to and 5 days succeeding the DA Day. To add the sampling time error, we simply assume that the sampling time error has the form as where ! ≥ 0 and = 1,2, ⋯ ,5 in day. With (7), the observational error given in (6) becomes ! % = (1 + ! )( ! / + ! 0 ).
Thus, the observational error is inflated. The value of ! can be estimated using observations or model simulation outputs. 330 We note that the assumption (7) is not necessary, since ! /1 can be directly estimated. We use (7) , so that ! can be used as an adjustable parameter to account for the possible inaccurate representation error. The formulation (7)  For the submesoscale DA, a decorrelation length scale of 9 km is tentatively given. This is because of the SWOT CalVal baseline requirement resolution of 15 km. This scale will smooth out spatial structures smaller than 18 km. We emphasize that DA smooths out the observation innovation, but not the background state. The small scales generated by the model during the forecasting stage will remain, although no correction will be made to them by DA. We do not use a smaller decorrelation length scale because the model grid space is 3 km. The decorrelation length scale has been as small as the 340 three-grid point size. When a smaller model grid spacing is used, this decorrelation length scale can be reduced accordingly.
This three-step implementation is computationally efficient. In the configuration described above, the wall-clock time for the execution of MSDA is close to that required to carry out the model forecast using the same CPUs. This implies that MSDA can be implemented on a very high-resolution model if the configuration is computationally feasible for carrying out simulations without data assimilation; that is -the MSDA implementation does not require any compromise in the 345 configuration of the forecast model.

11-day routine T/S profile 11-day along-track SSH
Campaign T/S profile AVISO gridded SSH Daily mean campaign T/S profile Satellite IR SST* Daily mean MW SST Daily mean HF radar velocities* Other observations* Daily mean MW SSS*

Evaluation and Illustration of Performance
The MSDA-SWOT outlined here is largely based on the DA system presented in Li et al. (2019). As described in previous sections, we have implemented three major new capabilities: (1) assimilation of observations over a longer time window via 350 extended-3DVAR; (2) assimilation of multi-satellite along-track altimetry observations; and, (3) optimization of the MSDA implementation. The evaluation presented in this section primarily serves to illustrate the effectiveness of these new capabilities. A systematic evaluation related to the SWOT pre-launch field campaign is presented Archer et al. (2021).
The evaluation here focuses on data assimilation analyses and forecasts for four months from August 10 to December 10, 2019. The model is initialized on July 30 using a 15-day average of the HYCOM analyses that are also used to derive the 355 lateral boundary condition as described in section 4. 1. The data assimilation analyses and forecasts are generated by four experiments: 1) Routine DA, which assimilates the routine observations; 2) NODA, which has no data assimilation; 3) Routine DA without along-track altimetry observations; and, 4) DA Cal, which assimilates routine observations and the SWOT pre-launch field campaign observations. We emphasize that interpretation of the evaluation results must take into account observation errors, since most 360 observation errors are close to the analysis and forecast error. Therefore, when comparing analyses and forecasts with the observations, we use the terminology root-mean-square difference (RMSD) rather than use the term root-mean-square error (RMSE).  .1 cm, 2.8 cm, 2.6 cm, and 3.4 cm, respectively.

DA Analysis Evaluation
Since the assimilation of along-track altimetry data is a new implementation, we examine whether they are assimilated effectively. Figure 5 shows the RMSD between filtered Jason-3 along-track data and the DA analysis. We use Jason-3 data 370 because of its spatially fixed ground tracks and a short repeat time of 10 days. The mean RMSD over the entire model domain is 2.6 cm. As a reference, the RMSD between the filtered Jason-3 along-track data AVISO gridded data is also shown and has a domain mean RMSD of 2.1 cm. The RMSD with the DA analysis is larger than that with the AVISO gridded data. This result is expected, because the spatial resolution of the model is higher than the AVISO grid and thus retains smaller scales. This result is consistent with the same analysis using Sentinel-3A and 3B (not shown). The RMSD 375 with HYCOM is 3.4 cm. This indicates that MSDA-SWOT performs well for this region in which it is optimized. To explore the variability across different length scales, we compute wavenumber power spectral densities, where 385 the DA and AVISO data is interpolated to the Jason-3 ground tracks (Fig. 6). For wavelengths larger than 120 km, the spectral density of the DA analysis is very close to that of the Jason-3 along-track observations. For wavelengths from 120 km down to round 65 km, the spectral density of the DA analysis is smaller than that of the Jason-3 along-track data. We hypothesize this may be due to internal tidal residuals in the along-track altimetry observations (e.g., Zhao et al., 2016). For wavelengths smaller than 65 km, the spectral density of the DA analysis is much larger, because the scales smaller than 65 390 km are significantly filtered in the along-track altimetry observations (Pujol et al, 2016). The results indicate that the DA has the capability of excluding along-track altimetry observation errors due to internal tidal contamination and smoothness. It is desirable that the spectral density of the DA analysis is larger at wavelengths smaller than 65 km. It is actually an 400 objective of MSDA-SWOT. The MSDA-SWOT system is formulated to allow the assimilation of smoothed data without smoothing the DA analysis. This is demonstrated in Fig. 6. For example, the AVISO gridded data is assimilated, and Figure   6 shows that the spectral density of the gridded AVISO data is much smaller than that of the along-track altimetry data, but the spectral density of the DA analysis closely follows the along-track data. This shows that assimilating the highly smoothed AVISO gridded data does not smooth the DA analysis. In contrast, the HYCOM power spectrum, while similar in 405 character to MSDA-SWOT, does not exhibit small-scale variability, but rather follows the filtered along-track power level.
To quantify the impact of along-track altimetry observations in the analysis, they are withheld in a DA experiment. The RMSD increases by a substantial amount (Fig. 5 top right), with a space-time mean RMSD increase from 2.6 cm to 2.8 cm.

DA Forecast Evaluation
As stated in the introduction, the MSDA-SWOT system is developed not for prediction but for state estimation. One question 410 for DA analysis is whether observations are assimilated with dynamical consistency. In particular, the SSH increment from the assimilation of altimetry observations must be consistent with the increment in T/S vertical profiles, otherwise the SSH increment could create external gravity waves that may propagate away in a few hours. Also, all observations can be considered as independent data when we evaluate forecasts. We thus here focus on the evaluation of short-time forecasts.

Comparison with altimetry data 415
The gridded AVISO data can resolve wavelengths down to 150-200 km in this region (Pujol, 2016;Taburet et al., 2019;Archer et al., 2020). Therefore, the AVISO gridded data can be reliably used to evaluate large mesoscale eddies from the forecast. Here we compare the mean of the forecast from 24 h to 48 h to AVISO daily gridded data As an example, Fig. 7 shows the 24-48 hr mean of the forecast and the AVISO gridded data on November 1. The model forecast reproduces the large mesoscale eddies and filaments. Because of the high resolution of the model, there are finer 420 scale eddies and other circulation features in the forecast. To quantify the similarity between the forecast and the AVISO data, we compute RMSD and spatial correlation over the entire model domain (Fig. 7 bottom). The spatial correlation ranges from 0.96 to 0.97 day-by-day, indicating a high resemblance in spatial structure. The RMSD ranges from 1.8 cm to 2.9 cm, while the mean RMSD over this time period is 2.3 cm. We conclude that the eddies that the AVISO gridded data resolves are realistically reproduced. 425 As discussed, the MSDA-SWOT system is based on the previous MSDA system described in Li et al. (2019b), but with a set of new implemented formulations. In Li et al. (2019b), a similar comparison with the AVISO data was made. The average correlation was 0.91 and RMSD 3.3 cm. Although the experiment was for a different period of time and thus the results are not directly comparable, the significant difference in both correlation and RMSD indicate that the MSDA-SWOT shows a significant improvement in SSH prediction.
In subsection 5.1, the DA analysis was evaluated against the Jason-3 along-track altimetry data. The RMSD with the 2day forecast is given in Fig. 8a. The RMSD shows an increase over the majority of the model domain. The domain average RMSD is 2.9 cm. As we can see in Fig. 7, mesoscale eddies have an amplitude ranging from about 10 cm to 40 cm in the region. The relatively small RMSD implies that mesoscale eddies can be reproduced in the forecast.
To further illustrate the accuracy of the forecast, we compare the RMSD to the experiment without DA. Without DA, the 435 RMSD is much larger (not shown), and the domain average RMSD is as large as 10.0 cm, more than 3 times larger than the forecast with DA. Thus, the DA effectively reduces the forecast error.
We also evaluate the extent to which assimilating along-track altimetry data reduces the error (Figure 8). Without assimilating along-track altimetry data the domain-averaged forecast error is 3.3 cm. After assimilating along-track data, it is 2.9 cm, so the RMSD is reduced by as much as 14 %. Therefore, improvement in the DA analysis from along-track altimetry 440 assimilation remains in the 24-48-hr forecast.

Comparison with independent satellite SST
Since microwave (MW) SST measurements are not affected by clouds, they provide more consistent spatial coverage than infrared (IR) SST (compare Fig. 4a to 4b). We thus compute RMSDs and spatial correlations over the entire model domain day-by-day. MW SST is assimilated at UTC 03 using observations within a time interval of 12 h from UTC 21 to UTC 09 of 450 the subsequent day. Therefore, the observations from UTC 9 to UTC 21 are not assimilated and so are independent observations that can be used for evaluating the DA system. The RMSD is around 0.5° C, and the spatial correlation over the entire model domain higher than 0.95 day-by-day (Figure 9). Given expected differences between the model upper-bin temperature and MW satellite-observed skin temperature, this RMSD is very acceptable.
VIIRS SST is a high-resolution product. Its resolution of 0.7 km allows for resolving submesoscale features down 455 to a few kilometers. Unfortunately, the CCS region is prone to low-elevation cloud coverage, and the VIIRS IR sensor cannot measure SST over cloudy areas. Over the majority of the model domain VIIRS has no observations. We thus interpolate the model data to the time and location of VIIRS SST observations. Figure 10 shows the distribution of the temperature difference. More than 75% of the difference is smaller than 0.75 °C. The overall RMSD is 0.78 °C, which is close to the measurement error. 460 Figure 9. Daily forecast RMSD and spatial correlation against MW SST that is not assimilated.

Evaluation of subsurface temperature and salinity profiles 465
The MSDA-SWOT goal is to be able to assimilate observations concentrated in a very limited area, and avoid the so-called spurious "campaign area circulation" that can occur when small-scale observations are assimilated into the background field of a numerical model that may have a bias.
The strategy used in MSDA-SWOT is to assimilate all routine observations to minimize the model bias and constrain mesoscale variability down to about 100 km or further, then the innovations of the dense but localized field 470 campaign observations become small enough to be effectively assimilated without being smoothed, via the multiscale data assimilation methodology (Li et al., 2015a). The evaluation of the SSH and SST forecasts has shown that the forecast errors are reduced to less than 3 cm and 1°C, respectively. These errors are a few times smaller than the amplitudes of their signal standard deviation (not shown). Compared against the T/S profiles from the middle mooring CTD ( Fig. 1 and 11), which are independent 475 observations for the routine DA, there is a significant reduction in the forecast errors at all depths for both temperature and salinity when compared to the NODA run (not shown). For depths below 150 m, the temperature error is reduced to 0.25ºC from about 1ºC in the experiment without DA, while the maximum error located near the mixed layer base reduces from 4.0ºC to 1.2ºC. Overall, the assimilation of routine observations reduces the error to less than one third of the error without DA. Except the depth between 200 m and 300 m, in which the NODA experiment shows an error as small as 0.07 psu and is 480 close to the Routine DA, the impact of the routine observations is also significant in salinity. The largest error reduction occurs at a depth of 100 m, from 0.55 PSU in the NODA experiment to 0.14 PSU.

485
The comparison of HYCOM to the middle mooring shows similar performance to the routine DA run. There are some differences however, most notably in salinity. HYCOM exhibits largest error at the top of the halocline, but then the error quickly falls away. In contrast, the routine MSDA has less error above the halocline, but the error more slowly reduces with depth than HYCOM. Both the routine MSDA and HYCOM have errors greater than the signal variance for depths 490 above ~100-m.
In the DA Cal experiment, the T/S profiles from the south and middle mooring CTD observations (Fig. 1) are assimilated in the small mesoscale and small-scale DA (Table 2). Compared to the middle mooring CTD observations, the assimilation of the campaign T/S profiles further reduces the forecast error (Fig. 11). The subsurface error reduces to below 0.2℃, except near the bottom of the mixing layer depth. The largest error is 0.5℃ and located at a depth of around 40 m. The 495 salinity error reduces to be smaller than 0.02 psu below 200 m, and the largest error is 0.1 psu and occurs at a depth of   Fig. 12 shows the SSH at the middle mooring location (Fig. 1). All DA experiments and the AVISO product can 505 effectively represent the longer period variability exhibited by the mooring. However, DA Cal shows a much improved ability to resolve the shorter-scale fluctuations. The analysis by Archer et al. (2021) shows that the forecast steric height can reach the basic SWOT CalVal requirement of an accuracy of the order of 1 cm.
We have cautioned that the assimilation of observations from a localized area may create a spurious "campaign area circulation". Following Fig. 12, we can articulate it further. The RMSD with the AVISO gridded data is as large as 16.0 cm 510 for the NODA experiment (not shown). This large RMSD arises because the mooring is located in the high SSH ridge extended from the open ocean toward the coast, but in the lower SSH zone extended offshore (Fig. 7) in the NODA vertical profiles may correct the SSH locally, create large spurious SSH gradients or geostrophic velocities that may be as large as 0.32 m/s, and thus a sporous "observing system circulation". In contrast, after the routine observations are 515 assimilated, the SSH error has been reduced to around 1.5 cm. The localized observations can be aggressively assimilated, and no intense spurious "observing system circulation" would be generated.

Conclusions and Discussion
A data assimilation system for a high-resolution model has been developed to address the opportunities and challenges posed by the upcoming SWOT satellite mission. This system, dubbed MSDA-SWOT, is based on a multi-scale data assimilation scheme documented in Li et al. (2015bLi et al. ( , 2019b. Three major changes have been implemented. First, the conventional 525 3DVAR formulation has been extended to effectively assimilate observations over a longer time window, which is 11 days for the large and small mesoscale DA in the present configuration (Table 2). This extended 3DVAR uses a persistence (forecast model at DA time) as the background field, and the error due to the difference between the observation and DA time is explicitly accounted for. This imitates a 4DVAR approach, which assimilates observations over a time window but uses a model forecast as background rather than persistence. Second, the assimilation of multi-satellite along-track altimetry 530 observations has been implemented, and the effectiveness has been demonstrated. And third, the multi-scale data assimilation configuration has been optimized for inputs, such as background error covariances.
The system has been used to produce a reanalysis for the SWOT pre-launch field campaign that took place at the been preliminarily evaluated against assimilated and independent observations. MSDA-SWOT showed a significantly 535 improved performance on top of the MS-3DVAR (Li et al., 2019b). Archer et al. (2021) present a comprehensive evaluation of the system based on the pre-launch campaign moorings and glider observations.
As we reiterated, the most important consideration in the formulation and configuration of MSDA-SWOT is the assimilation of all routine observations to minimize the model bias and constrain mesoscale variability down to about 100 km or further. The background error is relatively dominated by small scale errors. As a consequence, the background error 540 correlation length scale becomes small. A small correlation length scale ensures that the dense but localized field campaign observations are effectively assimilated without being smoothed (Li et al., 2015a, b). Further, the evaluation of the SSH, SST and subsurface vertical profile forecasts has shown that the forecast errors can be a few times smaller than the amplitudes of their signal standard deviation (STD). Mathematically, only in a pure linear system can observations be assimilated fully, reduce the error in the analysis that is the initial condition for the subsequent forecast, and ensure a 545 reduction in forecast errors (e.g., Li & Navon, 2001). With nonlinearity in the model, a reduction in the initial condition does not ensure a reduction in forecast errors. In MSDA-SWOT, the forecast error has been significantly reduced by assimilation of routine observations. The innovation associated with the campaign observations and thus the analysis increment has been small. With a small analysis increment, the model performs more linearly in the time evolution related to the increment.
The impact of routine observations in MSDA-SWOT has an important implication for observing system design. It 550 strongly supports the suggestion proposed in Li et al. (2019a) that a field campaign should be designed to leverage routine observing networks. A field campaign should make observations that augment the routine observing networks for resolving smaller scales and higher frequency variability, and/or measure variables that the routine observing networks do not provide.