Any use of observational data for data assimilation requires adequate information about their representativeness in space and time. This is particularly important for sparse, non-synoptic data, which comprise the bulk of oceanic in situ observations in the Arctic. To quantify the spatial and temporal scales of temperature and salinity variations, we estimate the autocorrelation function and associated decorrelation scales for the Amerasian Basin of the Arctic Ocean. For this purpose, we compile historical measurements from 1980 to 2015. Assuming spatial and temporal homogeneity of the decorrelation scale in the basin interior (abyssal plain area), we calculate autocorrelations as a function of spatial distance and temporal lag. Examination of the functional form of the autocorrelation in each depth range reveals that it is well described by a Gaussian function in both space and time. We derive decorrelation scales of 150–200 km in space and 100–300 days in time. These scales are directly applicable to quantifying the representation error, which is essential for the use of ocean in situ measurements in data assimilation. We also describe how the estimated autocorrelation function and decorrelation scale should be applied to cost function calculation in a data assimilation system.

Any use of observational data requires assumptions about, or better, knowledge of, the representativeness of each measurement in space and time. This holds even more for in situ observations from data-sparse regions, such as the Arctic Ocean. Interpolation guided by the statistical properties of observed quantities can provide Arctic-wide fields, while data assimilation using comprehensive dynamical models and assimilation methods can, in addition, provide fields that are consistent with the modeled physics. Sampling strategies also have to take the representativeness of a point measurement into account. The temporal and spatial scales for which a single measurement is representative depend on local dynamics, external forcing, and the influence of lateral water–mass influxes. Here, we attempt to estimate these length scales and timescales in the Arctic Ocean based on observational data from the period 1980–2015. This is achieved by estimating the autocorrelation function and decorrelation scales of temperature and salinity.

Autocorrelation functions and associated decorrelation scales are useful
measures to characterize physical phenomena occurring in the ocean (Stammer,
1997; Eden, 2007). These functions describe spatial and temporal ranges over
which ocean properties coherently vary, and the scales provide a measure of
the spatial and temporal extent of the variations. The functional form of the
autocorrelation depends on the physical properties, the considered scales
(e.g., synoptic versus mesoscale) and the area. Many studies have estimated
autocorrelation functions through analysis of in situ ocean measurements
(e.g., Meyers et al., 1991; Chu et al., 2002; Delcroix et al., 2005) and
satellite observations (e.g., Kuragano and Kamachi, 2000; Hosoda and
Kawamura, 2004; Tzortzi et al., 2016). Generally, the estimated autocorrelation
functions have exponential or Gaussian form (Molinari and Festa, 2000). The
decorrelation scales are usually given by the
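As a concrete illustration of the two empirical forms mentioned above, the sketch below evaluates a Gaussian and an exponential autocorrelation at the e-folding lag, where both drop to 1/e (the conventional definition of the decorrelation scale); the scale value used here is hypothetical.

```python
import numpy as np

def gaussian_acf(r, L):
    """Gaussian autocorrelation: rho(r) = exp(-(r / L)**2)."""
    return np.exp(-(r / L) ** 2)

def exponential_acf(r, L):
    """Exponential autocorrelation: rho(r) = exp(-r / L)."""
    return np.exp(-r / L)

L = 175.0  # km; hypothetical decorrelation (e-folding) scale

# At the lag r = L, both forms have decayed to exactly 1/e, which is
# the conventional definition of the decorrelation scale.
rho_g = gaussian_acf(L, L)
rho_e = exponential_acf(L, L)
```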

Estimated decorrelation scales have been applied in a variety of ocean studies. In dynamical studies, the decorrelation scale is used as a measure of the scale of prevailing phenomena and to relate dynamical processes to the observed signals (e.g., Stammer, 1997; Ito et al., 2004; Kim and Kosro, 2013). In optimal interpolation and objective mapping, the decorrelation scale gives a measure of the radius of influence of a point measurement; the autocorrelation function, together with the associated decorrelation scale, provides the weight of a point measurement in mean field estimates (Meyers et al., 1991; Chu et al., 1997; Davis, 1998; Wong et al., 2003; Böhme and Send, 2005). For observation network design, decorrelation scales are one guide to estimating optimal sampling intervals in space and time (Sprintall and Meyers, 1991; White, 1995; Delcroix et al., 2005).

One of the prevalent and growing applications of decorrelation scales is data assimilation. Data assimilation synthesizes observed data and modeled physics based on statistical theories. This is an effective approach to fill the gap between observation and modeling studies (Wunsch, 2006; Blayo et al., 2015). Generally, data assimilation minimizes a model–data misfit with an assessment of errors; the autocorrelation function and the decorrelation scale are necessary for these error assessments (Carton et al., 2000; Forget and Wunsch, 2007). For a model–data misfit calculation, the difference of the spatial (and temporal) scales represented by a model and by the observations should be taken into account. Physical properties simulated in general circulation models (GCMs) represent mean values over each grid cell for a certain temporal period, whereas those from in situ measurements represent values at a localized point in space and in time. The error resulting from the difference of the scales represented by these two approaches is referred to as representation error (see van Leeuwen, 2015 for a summary). The autocorrelation function and the decorrelation scales provide a direct measure of the representation error. In ocean data assimilation, an assessment of the representation error is particularly important, since it is generally an order of magnitude larger than the measurement (instrument) error (Ingleby and Huddleston, 2007).

The need for a decorrelation scale in ocean data assimilation also arises from the sparseness of ocean measurements. An autocorrelation function is necessary to constrain locations distant from a measurement. Li et al. (2003) pointed out that assimilating sparsely distributed data into an eddy-permitting model, without taking their radius of influence into account, causes serious problems around the locations where the data are assimilated. Artificial eddies appear around the location of the data, since the density at the data location differs from the densities at the surrounding grid points in the model. They also pointed out that the assimilated information disappears on the timescale determined by the model's local advection and diffusion. Note that this situation cannot be resolved by applying advanced data assimilation techniques (e.g., 4DVar, EnKF), since the artificial eddies are dynamically consistent with the modeled physics. The autocorrelation function and decorrelation scale provide the information necessary to solve such problems by imposing a spatial and temporal radius of influence for each measurement (Forget and Wunsch, 2007; Zuo et al., 2011).

Practically, autocorrelation functions are used to define an “observation operator” in data assimilation systems. The observation operator maps modeled variables onto observational points. If the operator is properly defined, a point measurement will constrain the model not only at the location where the measurement exists but also in areas distant from it. Implementing such an observation operator makes it possible to fully exploit the potential of sparsely distributed measurements, and can solve problems such as those reported by Li et al. (2003). This is of particular importance as the ocean models used for assimilation become eddy-permitting. An additional important role of the autocorrelation function is to constrain the scale of temporally varying fluctuations. Unlike static interpolation approaches, data assimilation provides a four-dimensional analysis field. In order to appropriately assimilate observed temporal fluctuations, the temporal scale of the fluctuations should be implemented in the observation operator.

In the midlatitude and equatorial regions, there are a number of
decorrelation scale estimates (e.g., White and Meyers, 1982; Chu et al.,
1997, 2002; Deser et al., 2003; Martins et al., 2015), and these have been
applied for a variety of studies including data assimilation (see the papers
mentioned above). On the other hand, while a few studies have examined scales
of temperature and salinity variability in the Arctic Ocean (e.g., Timmermans
and Winsor, 2013; Marcinko et al., 2015), there has been no assessment of
basin-wide decorrelation scales of

The objective of the present study is to estimate the autocorrelation functions and decorrelation scales of temperature and salinity in the Arctic Ocean at different depths. Few modelling studies have focused on applications of ocean in situ measurements in the Arctic, owing to the absence of comprehensive historical archives and representation error estimates. Only the climatology (PHC3.0; Steele et al., 2001) has been widely applied for model validation (e.g., Ilıcak et al., 2016). In recent years, however, assimilations of in situ measurements in the Arctic Ocean have started (Panteleev et al., 2004, 2007; Nguyen et al., 2011; Zuo et al., 2011; Sakov et al., 2012). To promote and enhance the ongoing ocean data assimilations, archiving historical measurements and estimating decorrelation scales are indispensable. To achieve this objective, we (1) compile historical observations of temperature and salinity in the Arctic Ocean, (2) construct a background mean field necessary for the decorrelation scale estimate, (3) examine the functional form of the autocorrelation in temporal- and spatial-lag space, and finally (4) provide an autocorrelation function, decorrelation scales, and a representation error covariance, which are directly applicable to error assessment in ocean data assimilation. Note that the estimated autocorrelation quantifies basin-scale variability; smaller-scale variability (e.g., mesoscale eddies on the deformation scale; Zhao et al., 2014) remains unresolved and is an intrinsic part of the autocorrelation function. The study area is the Amerasian Basin. As will be described in Sect. 3, the second step mentioned above requires a different approach for other regions of the Arctic Ocean. The vertical range of the analysis is limited to 0 to 400 m depth due to data availability.

The rest of the paper is organized as follows: Sect. 2 describes the compilation of historical data and quality-control procedures applied prior to the analysis. Section 3 describes the background temperature and salinity field construction and trend analyses. Section 4 describes examination of two-dimensional autocorrelation functions in spatial- and temporal-lag space, and provides decorrelation scale and error covariance estimates. Section 5 gives conclusions.

List of observational data.

Since there is no comprehensive in situ ocean data archive for the Arctic,
we compile historical temperature and salinity measurements with the
objective not only to use the data for the present decorrelation scale
estimate but also to prepare an archive for future applications in model
validation and data assimilation. Since the existing archived data from the
Arctic Ocean are widely dispersed in various datasets with different
formats, we compile these data into one archive with a standard format
focusing on the Arctic and northern North Atlantic Ocean (Table 1). The
original data (Table 1) were acquired from various observational platforms
(e.g., research vessels, moorings, ITPs, and Argo floats) by
conductivity–temperature–depth (CTD) sensors and expendable CTDs (XCTDs). The archiving
effort of this study originates from the data compilation described by Rabe et
al. (2011, 2014) and Somavilla et al. (2013), and is ongoing thanks to support from
many oceanographers. The archived data will be available online
(

The topographic features of the Arctic Ocean

The archived information for each measurement profile includes cruise name,
station number, data type, time stamp, geographical location, bottom depth
(if available), measurement depth (pressure is converted to depth by the
method described by Saunders, 1981), temperature, salinity, data quality
information provided in the original dataset (if available), and data source
information. The spatial coverage of the archived data ranges from
45

Since data obtained from various sources are prone to duplication issues, it
is necessary to identify and remove duplicated data from the archive. A
number of past studies, which compiled large oceanographic datasets, have
suggested various automated procedures to deal with duplicate profiles
(e.g., Ingleby and Huddleston, 2007; Gronell and Wijffels, 2008; Good et al., 2013).
In this study, we apply a simple duplication-check algorithm suitable for the present application. Since we are concerned
only with basin-scale variability in this analysis, we count profiles that
have small spatial and temporal separations as duplicates. The threshold
applied for time difference between profiles is 1 day (date coincidence) and
that applied for geographical location difference is 0.05
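The duplicate check described above can be sketched as follows; the profile records and field names are illustrative and do not reflect the archive's actual format. Only the 1-day (date coincidence) and 0.05-degree thresholds are taken from the text.

```python
from datetime import date

def is_duplicate(p, q, max_days=1, max_deg=0.05):
    """Two profiles count as duplicates when their dates coincide and
    their positions differ by less than max_deg in latitude and longitude."""
    same_day = abs((p["date"] - q["date"]).days) < max_days
    close = (abs(p["lat"] - q["lat"]) < max_deg
             and abs(p["lon"] - q["lon"]) < max_deg)
    return same_day and close

def deduplicate(profiles):
    """Keep only the first occurrence of each duplicate group."""
    kept = []
    for p in profiles:
        if not any(is_duplicate(p, q) for q in kept):
            kept.append(p)
    return kept

profiles = [
    {"date": date(2010, 8, 1), "lat": 75.00, "lon": -150.00},
    {"date": date(2010, 8, 1), "lat": 75.01, "lon": -150.02},  # duplicate
    {"date": date(2010, 8, 9), "lat": 75.00, "lon": -150.00},  # later revisit
]
unique = deduplicate(profiles)
```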

Since the archive contains a number of data that have not been quality
controlled, we apply an additional quality-control procedure (QC) before our
analyses. Note that although we describe the QC procedure as it is applied
to the entire raw dataset in this section, we will use only data from 0 to 400 m
depth (after the QC) in the present scale analysis as mentioned in the
introduction. The QC is composed of two steps: the first step is a grid-based
screening; the second step is an area-based screening. Both steps are
based on statistics of the data samples in discretized depth ranges. We
divide the vertical profiles of temperature (

First, we apply a grid-based screening. The grid-based screening takes the
difference in statistics (mean and standard deviation) in different
locations into account. We define 111 km

Second, we apply an area-based screening for the data deeper than 750 m
depth. In this step, we apply more rigorous statistics calculated from the
entire basin and shelf area. This step is necessary to remove problematic
data in data-sparse areas and data-sparse depth ranges, since the
grid-based screening cannot provide good statistics in these areas due to the
small sample size (no ITP data below 750 m). We classify the archived data
into six subdomains based on the characteristics of dynamical regimes (Nurser
and Bacon, 2014): (1) Amerasian Basin, (2) Amerasian shelf and shelf slope,
(3) Siberian Shelf and shelf slope, (4) Eurasian Basin, (5) Barents and Kara
seas including their shelf slopes, and (6) Nordic Seas (Fig. 2b). Mean and
standard deviation are calculated in individual subdomains. Then, data
outside 5 times the standard deviation (
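The area-based screening step can be sketched as a per-subdomain 5-standard-deviation filter, as below; the data values and subdomain labels are synthetic.

```python
import numpy as np

def area_screen(values, subdomains, n_sigma=5.0):
    """Boolean mask: True where a value lies within n_sigma standard
    deviations of the mean of its own subdomain."""
    values = np.asarray(values, dtype=float)
    subdomains = np.asarray(subdomains)
    keep = np.zeros(values.shape, dtype=bool)
    for dom in np.unique(subdomains):
        sel = subdomains == dom
        mu, sd = values[sel].mean(), values[sel].std()
        keep[sel] = np.abs(values[sel] - mu) <= n_sigma * sd
    return keep

rng = np.random.default_rng(0)
temp = rng.normal(-1.0, 0.1, size=1000)  # synthetic deep temperatures
temp[0] = 10.0                           # one spurious value
doms = np.ones(1000, dtype=int)          # all points in one subdomain
mask = area_screen(temp, doms)           # flags only the spurious value
```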

Temperature

The result of the statistical screening in the Amerasian Basin is shown in Fig. 3. The combined statistical screening successfully removes spurious data in deep depth ranges, while retaining the relatively larger variability in shallow depth ranges. After the combined statistical screening, the vertically discretized data are used for the analyses in the following section.

In this section, we describe the construction of a background mean field of

To derive the scale for the background field construction, we examine the
spatial scale of variation in each depth range (the vertical layers defined
in Fig. 2b are used throughout this study to provide decorrelation scales
directly applicable to data assimilation systems using

To estimate the spatial scale of variation, we introduce a structure function
(Davis et al., 2008; Todd et al., 2013) with the assumption of spatial and temporal isotropy of
variation,
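A minimal sketch of a binned structure-function estimate under this isotropy assumption: pairwise squared differences are averaged in bins of spatial separation and then, following the variogram relation described in the Appendix, normalized by the sill, square-rooted, and subtracted from 1. The synthetic data and bin widths are illustrative only.

```python
import numpy as np

def structure_function(x, y, values, bin_edges):
    """Mean squared pairwise difference, binned by spatial separation
    (isotropy assumed: only the distance between points matters)."""
    ii, jj = np.triu_indices(len(values), k=1)
    dist = np.hypot(x[ii] - x[jj], y[ii] - y[jj])
    sqdiff = (values[ii] - values[jj]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        sel = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        if sel.any():
            gamma[k] = sqdiff[sel].mean()
    return gamma

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 500.0, 400)   # km; synthetic station positions
y = rng.uniform(0.0, 500.0, 400)
vals = np.sin(x / 100.0) + 0.1 * rng.normal(size=400)  # smooth field + noise
edges = np.arange(0.0, 320.0, 40.0)
gamma = structure_function(x, y, vals, edges)

# Normalize by the sill (large-separation level), take the square root,
# and subtract from 1, as described in the Appendix.
sill = gamma[-1]
scaled = 1.0 - np.sqrt(gamma / sill)
```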

The function

Function

In order to examine the functional form of

The 0- to 90-day temporal average of the function

To closely examine the functional form of

Vertical profile of spatial scale of variation (

The

To take the seasonal variation into account, we divide the observed data into
four seasons (January–March, April–June, July–September, and
October–December), and construct the background mean

Background mean field of

A summary of linear temporal trend in the Amerasian Basin: the
spatial pattern of

For the present anomaly derivation, we also take the temporal trend from 1980
to 2015 into account. The trend is estimated in each
111 km
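The per-cell trend estimate amounts to a least-squares line fit within each grid cell, as sketched below with synthetic annual values; the cell indexing is omitted for brevity.

```python
import numpy as np

def cell_trend(years, values):
    """Least-squares linear trend (units per year) for one grid cell."""
    slope, intercept = np.polyfit(years, values, deg=1)
    return slope, intercept

years = np.arange(1980.0, 2016.0)        # one synthetic value per year
temp = -1.5 + 0.02 * (years - 1980.0)    # imposed warming of 0.02 per year
slope, intercept = cell_trend(years, temp)
```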

The warming and freshening trend in the Pacific-water layer has already been reported by many studies (e.g., Proshutinsky et al., 2009; Jackson et al., 2010; Giles et al., 2012; Timmermans et al., 2014). The cooling trend in the central Canada Basin and the warming trend along its southern perimeter are a consequence of deepening of the warm Atlantic water in the central basin and concurrent upwelling of warm Atlantic water at the boundaries, a manifestation of an intensification of the anticyclonic Beaufort Gyre in recent years (e.g., McLaughlin et al., 2009; Karcher et al., 2012; Zhong and Zhao, 2014). Although similar trends can be found in other seasons (from winter to spring), they are not statistically significant.

The temporal trend in each location is used to define a time-varying
background field. Since the temporal distribution of the archived data is not
spatially uniform, the representative time (i.e., the time that the temporal
mean value represents) of the background field
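A minimal sketch of the resulting time-varying background value, assuming the background is the local temporal mean plus the local linear trend applied relative to the representative time of that mean; the variable names and numbers are illustrative.

```python
def background(cell_mean, trend_per_yr, t_yr, t_rep_yr):
    """Background value at time t_yr, given the cell's temporal mean,
    its linear trend, and the representative time of the mean."""
    return cell_mean + trend_per_yr * (t_yr - t_rep_yr)

# Hypothetical cell: mean of -1.5 representative of the year 2000,
# trend of 0.02 per year, evaluated in 2010.
bg = background(-1.5, 0.02, 2010.0, 2000.0)
```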

Decorrelation scales used in oceanographic studies are generally defined by
an

The anomaly dataset

Spatial autocorrelation function of temperature

Temporal and spatial averages of the autocorrelation are calculated to
identify its functional form by fitting a suitable empirical function.
Figure 10a and b show the temporal average of the spatial autocorrelation
functions of
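The fitting step can be sketched as a one-parameter least-squares fit of the Gaussian form to a lag-averaged autocorrelation curve; the sample values below are synthetic and noise-free.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(r, L):
    """Gaussian autocorrelation form with a single scale parameter L."""
    return np.exp(-(r / L) ** 2)

lags = np.arange(0.0, 400.0, 25.0)   # km; spatial-lag bins
true_L = 180.0
rho = gauss(lags, true_L)            # synthetic, noise-free curve

(L_fit,), _ = curve_fit(gauss, lags, rho, p0=[100.0])
```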

Temporal autocorrelation function of temperature

The temporal autocorrelation is also examined by taking spatial-lag averages
(0–20 km) of the two-dimensional autocorrelations of

The spatial and temporal decorrelation scales of

Vertical profiles of zero-lag autocorrelation

Figure 12 summarizes the vertical profiles of the spatial and temporal
decorrelation scales (

Vertical profile of the background mean variance, var

Note that the

The autocorrelation function derived in Sect. 4.1 can be related to an error
covariance by Eq. (9). Since the variance in Eq. (9) used to normalize the
covariance does not depend on spatial and/or temporal separation in
principle (see the assumption in Sect. 4.1), it can be represented by a
variance calculated from all the data in the Amerasian Basin. Therefore, the
error covariance associated with the representation error is given by a
function of spatial and temporal separations,
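Under the Gaussian forms derived above, such an error covariance can be sketched as the basin-wide variance multiplied by Gaussian decay in spatial and temporal separation; the variance and scale values below are placeholders chosen within the ranges quoted in the text.

```python
import numpy as np

def rep_error_cov(dr_km, dt_days, var, L_km=175.0, T_days=200.0):
    """Representation error covariance as a function of spatial and
    temporal separation: variance times Gaussian decay in each."""
    return var * np.exp(-(dr_km / L_km) ** 2 - (dt_days / T_days) ** 2)

var0 = 0.25                             # hypothetical basin-wide variance
c0 = rep_error_cov(0.0, 0.0, var0)      # zero separation -> full variance
c1 = rep_error_cov(175.0, 200.0, var0)  # one scale in each dimension
```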

We examined spatial and temporal scales of

The estimated function and the scales, together with the associated error
covariance, are directly applicable to model–observation misfit calculation
in data assimilation systems, which intend to assimilate a spatially and
temporally varying field. A cost function measuring the model–observation
misfit is given by
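A hedged sketch of such a misfit term, with the error covariance taken as diagonal purely for illustration; all values are synthetic.

```python
import numpy as np

def misfit_cost(model_at_obs, obs, err_var):
    """Sum of squared model-observation differences, each weighted by
    the inverse of its (diagonal) error variance."""
    d = np.asarray(model_at_obs) - np.asarray(obs)
    return float(np.sum(d ** 2 / np.asarray(err_var)))

model_vals = np.array([1.0, 2.0, 3.0])
obs_vals = np.array([1.1, 1.9, 3.0])
err = np.array([0.25, 0.25, 0.25])      # combined error variances
J = misfit_cost(model_vals, obs_vals, err)
```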

The present scale estimates impose a requirement on sampling strategies for basin-scale data assimilation. Static interpolation approaches (e.g., optimal interpolation (Gandin, 1965; Reynolds and Smith, 1994), objective mapping (Wong et al., 2003; Böhme and Send, 2005; Böhme et al., 2008), and data-interpolating variational analyses (Troupin et al., 2010, 2012; Korablev, 2014)) exploit statistical information of the data to derive a mean analysis field. Data assimilation approaches, in addition, exploit modeled physics and provide temporally and spatially varying four-dimensional analysis fields. The former need a scale representing the mean field, while the latter also need spatial and temporal scales representing the anomaly field to fully exploit the information embedded in in situ data. For Arctic Ocean studies, statistical interpolation has used decorrelation scales of 300–500 km (Steele et al., 2001; Proshutinsky et al., 2009; Rabe et al., 2011, 2014), whereas the present study suggests the necessity of smaller measurement intervals (150–200 km in space and 100–300 days in time) to describe the anomaly field in a basin-scale data assimilation.

Further studies are necessary to interpret the decorrelation scale of

The data in the Amerasian Basin were collected and made
available by the following research programs: Arctic Switchyard project
(

Since data analysis software based on geostatistical approaches (e.g.,
Isatis, SURFER) has come into use in oceanographic studies in recent
years, it is useful to summarize the relation between the present
approach and geostatistical approaches. The spatial scale of variation
estimated in Sect. 3.1 is an alternative formulation of the variogram
concept used in geostatistics. In the present formulation, we normalize
the variance by the sill of the variogram and take the square root.
This is because a variogram deals with a variance (i.e., the spatial
scale of the squared difference between two measurements), whereas we
intend to quantify the spatial scale of the difference between two
measurements. We also subtract this value from 1 in order to obtain a
function that decays to zero at infinity. This is done for mathematical
convenience, in order to obtain a Gaussian-like function, which is
preferable for the framework of the best linear unbiased estimator
(BLUE) that constitutes the basis of data assimilation theories. Since
the spatial scale of variation originates from the same concept as
variograms, it can be related to the terminology used in geostatistical
approaches. The function

Woods Hole Oceanographic Institution provides ITP temperature and salinity
data at different levels of processing; here, we use both level-3 (final
processed data) and uncalibrated level-2 data when level-3 data are not
available (see Krishfield et al., 2008b). Profile-by-profile conductivity
calibration (not applied to the level-2 data) accounts for conductivity
sensor drift. The calibration method applied to level-3 data is to adjust the
potential conductivity of each profile to the value derived from
bottle-calibrated CTD stations on the deep 0.4

Vertical profiles of standard deviation of

As a measure of the uncertainty of the uncalibrated ITP level-2 data, we
calculate deviations of the ITP level-2 data from the background mean field
(Sect. 3.2). We assume that the standard deviations of the background field
derived from all data represent the natural variability of

To understand the source of the second peaks found around the 200–300 km lag in
the spatial autocorrelation functions, we examine their relation to the
background mean fields. The second peaks in the autocorrelation functions are
always found where the corresponding

Vertical profile of the spatial decorrelation scales estimated from the second peak of the spatial autocorrelation function (see Fig. 10a, b). The scale is obtained from a Gaussian function fitting with two points: zero-lag autocorrelation value from Fig. 12a and the second peak. The second peak is defined by the highest autocorrelation value, the spatial lag of which is larger than 150 km. A three-layer vertical filter is applied to eliminate noise.

The coincidence between the second peak and the circular structure of the
Beaufort Gyre indicates that the peak captures a coherent variation of
isothermal (isohaline) depth. We employ level depth surfaces for the present
analysis; bowl-shaped isosurfaces of

The supplement related to this article is available online at:

The authors declare that they have no conflict of interest.

The authors sincerely thank three anonymous reviewers for their
thorough reviews of and criticism on the first and second versions of the
manuscript, and two anonymous reviewers for their constructive comments on
the third version.
Funding by the Helmholtz Climate Initiative REKLIM (Regional Climate Change),
a joint research project of the Helmholtz Association of German research
centers (HGF), is gratefully acknowledged. This work has been supported in
part by the European Commission as part of the FP7 project Ice, Climate, and
Economics – Arctic Research on Change (ICE-ARC, project no. 603887). We would
also like to
express our gratitude towards the German Federal Ministry of Education and
Research (BMBF) for the support of the project “RACE II – Regional Atlantic
Circulation and Global Change” (03F0729E) and various observational efforts
listed in Table 1. The GFD-DENNOU library
(