The CORA 5.2 dataset for global in-situ temperature and salinity measurements: data description and validation

We present the Copernicus in-situ ocean dataset of temperature and salinity (version V5.2). The ocean subsurface sampling varied widely from 1950 to 2017, as a result of changes in instrument technology and of the development of in-situ observational networks (in particular, tropical moorings and the Argo program). Thus the global ocean temperature data coverage on an annual basis grows from 10% in 1950 (30% for the North Atlantic basin) to 25% in 2000 (60% for the North Atlantic basin) and reaches a plateau exceeding 80% (95% for the North Atlantic Ocean) after the deployment of the Argo program. The average depth reached by the profiles also increased from 1950 to 2017. The validation framework is presented, and an objective analysis-based method is developed to assess the quality of the dataset validation process. Analyses of the ocean variability are calculated without taking into account the data quality flags (raw dataset OA), with the near real-time quality flags (NRT dataset OA) and with the delayed time mode quality flags (CORA dataset OA). The comparison of the objective analysis variability shows that the near real-time dataset managed to detect and flag most of the large measurement errors, reducing the analysis error bar compared to the raw dataset error bar. It also shows that the ocean variability of the delayed time mode validated dataset is almost free of random-error-induced variability.


Introduction
Estimating the temperature and salinity state of the ocean is critical for documenting the evolution of the ocean and its role in the present climate. To do so, the scientific community relies on in-situ measurements collected at a global scale and gathered into global datasets.
Among the global datasets, one can cite the World Ocean Database (Boyer et al, 2013, hereafter WOD) and the EN4 database (Good et al. 2013, www.metoffice.org) distributed by the UK Meteorological Office. Here, we present CORA (Coriolis Ocean dataset for ReAnalysis), a dataset distributed by the Copernicus Marine Service (hereafter CMEMS) and produced by Coriolis. CORA differs from these earlier datasets in the choices made for the construction and the production of the dataset. Indeed, WOD is validated with the highest quality control methods at 102 vertical levels, whereas EN4 profiles are limited to a maximum of 400 vertical levels and are automatically validated (Ingleby and Huddleston, 2007). CORA, conversely, retains data at the highest vertical resolution. Reducing the number of levels in the data validation and in the dataset construction makes it quick to add new measurements to the dataset and yields easy-to-handle datasets. On the other hand, these methodologies result in a loss of measurements potentially available to the scientific community, through the vertical subsampling of the profiles or through the data validation. In the construction of CORA, all available measurements are kept. An automatic validation is performed first, followed by a manual/individual check (Gaillard et al. 2009, Cabanes et al, 2013). This validation framework requires the production of two datasets: a near real-time validated dataset, distributing the profiles within days after collection, and a delayed-time validated dataset, covering in year n the historical period up to year n-1. This choice, made in the early versions of CORA, has been retained in the latest one that we describe here.
The increase of the global ocean heat content (GOHC) has been observed on decadal time scales in several layers. It is seen in the upper layers of the ocean (Domingues et al, 2008, Ishii and Kimoto, 2009, Levitus et al, 2009), below the thermocline (Von Schuckmann and Le Traon, 2011) and in the abyss (Purkey and Johnson, 2010). The mapping method and the baseline climatology influence these estimates (Abraham et al, 2013, Cheng and Zhu, 2015, Boyer et al. 2016, Gouretski, 2018). Besides these factors, the data validation performed on in-situ measurements has a direct influence on the estimation of global ocean indicators such as the GOHC, the global freshwater content and the sea level height (Abraham et al, 2013, Gouretski, 2018). As an example, differences between the GOHC estimates of the Johnson et al. (2010) analysis and the Lyman et al. (2010) analysis have been shown to result from quality control issues. The influence of XBT measurements on the GOHC estimation is a particularly well documented case (Levitus et al, 2009, Cheng et al, 2009). Systematic errors in other instrument types may also bias the GOHC estimation (Lyman et al, 2006, Willis et al, 2011). The validation of a quality control method is thus a critical task. It ensures that the dataset flags are accurate enough to flag erroneous measurements without biasing the dataset. The uncertainty surrounding the quality assessment of large oceanographic datasets is a critical topic in ocean climate studies. We therefore propose here a method for assessing the quality of a global dataset, and we apply it to the near real-time validated and delayed time mode validated datasets.

We will first list the data sources of the CORA measurements in section 2. The distribution of the CORA data in space and time will be described in section 3. Then, the quality control procedure will be described in section 4. Next, gridded temperature and salinity fields are calculated using an objective mapping that is presented in section 5. The results of the dataset validation and quality assessment are finally discussed in section 6.

Data providers
The CORA 5.2 dataset is an incremental version of the previous CORA datasets, covering the period from 1950 to the present and distributed by CMEMS. Most of the CORA profiles are first collected by the Coriolis data center and validated in near real-time mode. Coriolis is a Global Data Assembly Centre (GDAC) for the Argo program (Roemmich et al. 2009). It collects Argo profiles from the regional Data Assembly Centers (DACs) and distributes them to the community. Coriolis also collects XBT, CTD and XCTD measurements from French and European research programs. It also collects measurements from the Global Telecommunication System (GTS), the Voluntary Observing Ship (VOS) program and the tropical mooring networks (TAO/TRITON/RAMA/PIRATA programs from PMEL). A major effort has also been made to add smaller datasets available in delayed time mode to the Coriolis dataset. These include the ITP and CTD profiles from the ICES program, sea mammal measurements from MEOP (http://www.meop.net) and validated surface drifter data. Delayed time mode measurements have also been loaded from the World Ocean Database (WOD13) and from the French Service Hydrographique de la Marine (SHOM). A profile may be distributed both by Coriolis in real-time mode and by one of these datasets in delayed time mode. In that case, the delayed time mode validated profile replaces the real-time mode profile in the CORA database.
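The replacement rule for duplicated profiles can be sketched as follows. This is a minimal illustration, not the Coriolis implementation. It assumes that duplicates have already been matched to a common key, here a hypothetical (platform, cycle) pair. It also assumes that each record carries a hypothetical `mode` field ("R" for real-time, "D" for delayed time mode).

```python
from dataclasses import dataclass

@dataclass
class Profile:
    # Hypothetical record structure, for illustration only.
    platform: str   # platform or cruise identifier
    cycle: int      # cycle or station number
    mode: str       # "R" = real-time, "D" = delayed time mode validated
    source: str     # data provider, e.g. "Coriolis" or "WOD13"

def merge_profiles(profiles):
    """Keep one profile per (platform, cycle) key, preferring delayed time mode."""
    merged = {}
    for prof in profiles:
        key = (prof.platform, prof.cycle)
        current = merged.get(key)
        # A delayed time mode profile replaces a real-time one; otherwise keep the first seen.
        if current is None or (current.mode == "R" and prof.mode == "D"):
            merged[key] = prof
    return list(merged.values())

profiles = [
    Profile("6901234", 12, "R", "Coriolis"),
    Profile("6901234", 12, "D", "WOD13"),
    Profile("6901234", 13, "R", "Coriolis"),
]
for p in merge_profiles(profiles):
    print(p.platform, p.cycle, p.mode, p.source)
# 6901234 12 D WOD13
# 6901234 13 R Coriolis
```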
Lastly, a recent comparison of the CORA profile positions with the EN4 dataset (metoffice.gov.uk) has shown that some of the profiles distributed in EN4 were missing from previous CORA versions. A partnership with the EN4 team allowed us to detect and import most of those profiles: 5069864 profiles, covering the period 1950-2015, have been imported in this way. However, unlike the other measurements, the profiles from the EN4 database are reported with depth rather than pressure, and with a maximum of 400 levels per profile. The resulting inhomogeneity of the dataset with respect to vertical sampling will be discussed.

Dataset description
The CORA dataset aims to provide a comprehensive dataset of in-situ temperature and salinity measurements from 1950 to 2017. However, the instruments used to measure oceanic temperature and salinity have changed radically during the last 70 years. As a result, the origin and characteristics of the data distributed in the CORA dataset vary widely in time (Fig. 1). Most of the profiles collected prior to 1965 are mechanical bathythermograph (MBT) measurements or Nansen casts. From the late 1960s to 1990, the most common profiles are from expendable bathythermographs (XBT), developed during the 1960s and widely used by navies. Most of the XBT profiles collected during this period come from T-4 probes, which measure temperature down to 460 m depth.

The development of the Sippican T-7 probe, with a maximum depth of 1000 m, slowly increased the number of measurements between 460 m and 1000 m during the 1980s (see Fig. 2 for the distribution of the dataset measurements with depth). An instrument measuring conductivity, temperature and pressure (CTD) was developed in the 1960s, allowing an accurate estimation of sea temperature and salinity. The yearly number of CTD profiles in the CORA dataset then slightly increased, reaching a plateau of about 20000 profiles in the early 1990s.
During this period, the largest density of profiles is found in the North Atlantic Ocean. Its coverage ratio, calculated on a 3° by 3° grid with a one-year time step, increases from 30% in 1950 to a plateau of 60-70% in the 1970s (Fig. 3). The mean North Pacific sampling rate is lower than 10% before 1965. Most of the collected profiles are located close to the Japanese and North American coasts and along a transect connecting the US West Coast to the Hawaiian archipelago (not shown). The rate then increases quickly from 1965 to 1970 and reaches about 50% in the early 1980s, with a more homogeneous spatial distribution. Before 1974, in the other ocean basins, most of the collected profiles are found in the coastal zone and along a few ship tracks. The coverage then slightly increases in the western part of the Indian Ocean and in the eastern part of the South Pacific Ocean. This raises the associated basin sampling rate from 10% in 1965 to 20-25% in 1990. The Southern Ocean sampling rate, however, remains around 5% during the whole period.
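One way to compute such a coverage ratio is to count, for each year, the 3° by 3° cells of a basin that contain at least one profile. That count is then divided by the number of ocean cells in the basin. The sketch below assumes that definition. The function names are hypothetical, and the land mask (hence `n_ocean_cells`) has to be provided separately.

```python
import math
from collections import defaultdict

def cell_index(lat, lon, res=3.0):
    """Return the (row, col) index of the res-by-res degree cell containing a point."""
    n_rows = int(round(180.0 / res))
    row = min(int(math.floor((lat + 90.0) / res)), n_rows - 1)  # put lat = 90 in the last row
    col = int(math.floor(((lon + 180.0) % 360.0) / res))
    return row, col

def coverage_ratio(positions, n_ocean_cells, res=3.0):
    """Yearly fraction of ocean cells containing at least one profile.

    positions: iterable of (year, lat, lon) profile positions inside one basin.
    n_ocean_cells: number of ocean cells in that basin (land masking is left to the caller).
    """
    sampled = defaultdict(set)
    for year, lat, lon in positions:
        sampled[year].add(cell_index(lat, lon, res))
    return {year: len(cells) / n_ocean_cells for year, cells in sorted(sampled.items())}

positions = [(1950, 10.5, -30.2), (1950, 11.0, -31.0), (1950, 40.0, -20.0), (1951, 10.5, -30.2)]
print(coverage_ratio(positions, n_ocean_cells=10))  # {1950: 0.2, 1951: 0.1}
```

The first two 1950 profiles fall in the same cell, so they count once. This is what makes the ratio a coverage measure rather than a profile count.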
During the 1990s, the yearly number of XBT profiles strongly decreases while the numbers of bottle and CTD profiles slightly increase. This counter-intuitive behavior is mostly caused by a lack of 1990s XBT profiles in the Coriolis database. The yearly number of XBTs would be expected to decrease only slowly during the 1990s and to reach the CORA level by the end of the decade. This problem should be fixed in the next version of CORA.
The measurements provided are, however, deeper than in the previous decade, leading to a better coverage below […]
It must be emphasized that part of the increase in the number of profiles in the early 2000s comes from high-frequency measurement devices. These include ocean drifters and thermosalinographs (TSGs), both operating near the ocean surface, and undulating CTDs, towed or not (Scanfish, SeaSoar, gliders, ...). Indeed, each undulating CTD profile and each independent TSG or drifter measurement is treated as an independent profile, whereas one could also cluster them by instrument or by cruise. The dataset structure we retained is, however, easier to handle for the ocean reanalysis community and leads to a more homogeneous file structure. This structure is also adopted for the mooring measurements, some of which also collect data at high frequency. The large number of mooring data induces a large increase in the number of measurements at specific depths, such as 250 m and 500 m. At the surface, the large increase is due to data from TSGs and drifting buoys.

Near real time validation
The near real-time dataset validation tests are mostly taken from the Argo real-time quality control tests (Wong et al. 2009). The goal is to distinguish spurious measurements from good measurements and to flag them quickly. The tests are designed to detect well-known types of errors. A global range test and a regional range test are performed to detect obvious errors with respect to the known ocean variability. The bounds of these two tests are very wide compared with the known ocean variability, to ensure that no good measurement is incorrectly flagged as bad.
A spike test and a gradient test are performed to detect measurement spikes in the temperature and salinity fields.
These tests compare the vertical gradients of temperature and salinity with a threshold. The thresholds are set large enough to limit the number of spurious spike detections caused by a sharp, yet correct, thermocline or halocline. The stuck value test aims to detect temperature or salinity profiles that inaccurately report a constant value over the whole vertical.
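The spike and gradient checks can be written in terms of a point V2 and its neighbours V1 and V3, in the form used by the Argo real-time QC tests as we understand them. The spike value |V2 - (V1 + V3)/2| - |(V3 - V1)/2| stays small for a steep but monotonic thermocline. The gradient value |V2 - (V1 + V3)/2| makes no such allowance. The sketch below is illustrative. The thresholds are placeholders rather than the Coriolis settings (the operational tests also make them depend on pressure), and the function names are hypothetical.

```python
def spike_value(v1, v2, v3):
    """Argo-style spike test value for the middle point v2."""
    return abs(v2 - (v1 + v3) / 2.0) - abs((v3 - v1) / 2.0)

def gradient_value(v1, v2, v3):
    """Argo-style gradient test value for the middle point v2."""
    return abs(v2 - (v1 + v3) / 2.0)

def flag_interior(values, test, threshold):
    """Indices of interior levels whose test value exceeds the threshold."""
    return [k for k in range(1, len(values) - 1)
            if test(values[k - 1], values[k], values[k + 1]) > threshold]

def stuck_value(values):
    """True when every reported level carries the same value."""
    return len(values) > 1 and all(v == values[0] for v in values)

temps = [20.0, 28.0, 19.0, 15.0, 10.0]  # °C, one spike at level 1
print(flag_interior(temps, spike_value, threshold=6.0))     # [1]
print(flag_interior(temps, gradient_value, threshold=9.0))  # []
print(stuck_value([3.5, 3.5, 3.5]))                         # True
```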
A second step of the near real-time quality control is performed daily on the Argo profiles distributed by Coriolis, using an objective mapping detection method (Gaillard et al. 2009). Following the framework developed by Bretherton et al. (1976), the residual of the objective analysis depends on the data-to-data covariance. This second step thus aims at detecting measurements that depart from other data in their vicinity. The correlation scale in the objective analysis varies with depth and latitude. Spurious detections can, however, occur when profiles located on both sides of a frontal zone are within a correlation radius. Therefore, detected profiles are visually checked by a PI to distinguish erroneous measurements from correct ones.
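The idea behind the objective-analysis check can be illustrated with a leave-one-out optimal interpolation. Each observation is compared with the estimate built from the other observations and is flagged when the residual exceeds a multiple of the expected error. This is a simplified sketch, not the Gaillard et al. (2009) implementation. It assumes anomalies with respect to a climatology, a single isotropic Gaussian covariance with a fixed length scale, and uniform observation noise. The function and parameter names are hypothetical. A large outlier also contaminates the estimates at neighbouring points, which is one more reason for reviewing detections visually.

```python
import numpy as np

def gaussian_cov(a, b, signal_var, length_km):
    """Gaussian covariance between two sets of (x, y) positions given in km."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-d2 / (2.0 * length_km ** 2))

def oa_outliers(xy_km, anomalies, signal_var, noise_var, length_km, k=3.0):
    """Leave-one-out optimal interpolation check.

    Returns a boolean array that is True where an observation departs from the
    estimate built from all other observations by more than k expected standard errors.
    """
    xy_km = np.asarray(xy_km, dtype=float)
    anomalies = np.asarray(anomalies, dtype=float)
    n = len(anomalies)
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        others = np.arange(n) != i
        xo, yo = xy_km[others], anomalies[others]
        # Data-data covariance with observation noise on the diagonal.
        c_dd = gaussian_cov(xo, xo, signal_var, length_km) + noise_var * np.eye(len(yo))
        # Covariance between the tested position and the other data, shape (1, n - 1).
        c_md = gaussian_cov(xy_km[i:i + 1], xo, signal_var, length_km)
        weights = np.linalg.solve(c_dd, c_md.T)  # shape (n - 1, 1)
        estimate = (weights[:, 0] @ yo).item()
        # Expected variance of (observation - estimate): analysis error plus noise.
        err_var = signal_var - (c_md @ weights).item() + noise_var
        flags[i] = abs(anomalies[i] - estimate) > k * np.sqrt(err_var)
    return flags

xy = [[0, 0], [30, 0], [0, 30], [30, 30], [15, 15]]  # positions in km
flags = oa_outliers(xy, [0.1, 0.0, -0.1, 0.05, 3.0], signal_var=0.25, noise_var=0.01, length_km=300.0)
print(flags[4])  # True: the 3.0 anomaly departs from its neighbours
```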

Lastly, a quality control based on comparisons with altimetry is also performed on a quarterly basis to improve the real-time validated dataset (Guinehut et al. 2009). Profiles flagged as suspicious by comparison with the altimetric sea level are also investigated by a PI.

Delayed time mode validation tests
The delayed time mode validation is performed on a yearly basis. It relies on tests that are more stringent than those of the near real-time validation process and that therefore require a systematic visual check by an oceanographer. Only the profiles that were not controlled for the previous version of CORA are checked. This study has shown that most of the profiles flagged in EN4 and not in CORA were detected by these three tests. It has also shown that applying a visual control to the profiles detected in this way results in more accurate flags. The tests have been described in Ingleby and Huddleston, 2007.
The stability test detects density inversions in profiles where both temperature and salinity are available. Density inversions with $0 > d\rho > -0.03\ \mathrm{kg\,m^{-3}}$ are dismissed. Both temperature and salinity are inspected visually for profiles with larger density inversions. Experience has shown, however, that most of the density inversions detected in this way are caused by small spikes in the salinity measurements. These are probably a consequence of anomalies in the conductivity measurement or of misalignment with temperature when estimating salinity.
The spike test is designed to detect temperature and salinity spikes and steps. Its threshold decreases from 5 °C at the surface to 1.5 °C below 600 m depth for temperature, and from 1 PSU at the surface to 0.2 PSU below 300 m depth for salinity. These tests differ from the real-time QC tests in that their thresholds are lower. They sometimes produce false positive detections, either by detecting the wrong point on a spurious profile or by detecting a correct measurement. A PI then systematically inspects each detected profile and selects the flags.
Level disorder and duplicated levels: profiles with a non-monotonic PRES or DEPTH vector are detected, and the offending PRES or DEPTH values are flagged so that the remaining vector is monotonic.
This test was requested by the CORA end users, the ocean reanalysis community, to obtain a user-friendly dataset. Most of the detected profiles are indeed measurements with a very slow sinking speed near the surface, which produces pressure inversions when the instrument is exposed to the sea surface swell. Most of the detections are thus confined to the surface layer. Exceptions may, however, occur for Black Sea Argo floats. For these floats, a recurrent problem of slow sinking speed is found at subsurface levels because of the low salinity, and hence low density, of the Black Sea water. Lastly, this test detects "hedgehog" profiles, with very spiky temperature, salinity and pressure vectors, which are often caused by transmission errors on Argo floats.
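The stability test, the depth-dependent spike thresholds and the level-disorder check described above can be sketched as follows. This is illustrative only, and all function names are hypothetical:

- Density is computed with a simplified linear equation of state. Its reference values and coefficients are assumptions; operational QC would use a full seawater equation of state.
- The spike threshold is assumed to decrease linearly between its surface and deep values, since the text only gives the end points.
- Keeping the longest strictly increasing subsequence of the pressure vector is one possible way of choosing which levels to flag. It is not necessarily the CORA choice.

```python
from bisect import bisect_left

# Simplified linear equation of state (assumption, for illustration only).
RHO0, T0, S0 = 1027.0, 10.0, 35.0  # kg m-3, °C, PSU
ALPHA, BETA = 2.0e-4, 7.6e-4       # thermal expansion (1/°C), haline contraction (1/PSU)

def density(temp, sal):
    return RHO0 * (1.0 - ALPHA * (temp - T0) + BETA * (sal - S0))

def density_inversions(temps, sals, tolerance=0.03):
    """Indices k where density decreases from level k to level k + 1 by at least tolerance.

    Inversions with 0 > d_rho > -tolerance are dismissed, as in the text.
    Levels are assumed ordered from the surface downwards.
    """
    rho = [density(t, s) for t, s in zip(temps, sals)]
    return [k for k in range(len(rho) - 1) if rho[k + 1] - rho[k] <= -tolerance]

def spike_threshold(depth_m, surface, deep, depth_deep_m):
    """Threshold decreasing linearly (assumed) from `surface` at 0 m to `deep` at depth_deep_m."""
    if depth_m >= depth_deep_m:
        return deep
    return surface + (deep - surface) * depth_m / depth_deep_m

def level_disorder_flags(pressure):
    """Indices of levels to flag so that the kept pressures are strictly increasing.

    Keeps one longest strictly increasing subsequence (patience-sorting method)
    and flags every other level, covering both inversions and duplicated levels.
    """
    tail_idx = []   # tail_idx[j]: index ending the best increasing run of length j + 1
    tail_val = []   # pressure at tail_idx[j], kept sorted for bisection
    prev = [-1] * len(pressure)
    for i, p in enumerate(pressure):
        j = bisect_left(tail_val, p)  # bisect_left: an equal pressure cannot extend a run
        if j > 0:
            prev[i] = tail_idx[j - 1]
        if j == len(tail_idx):
            tail_idx.append(i)
            tail_val.append(p)
        else:
            tail_idx[j] = i
            tail_val[j] = p
    keep = set()
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        keep.add(i)
        i = prev[i]
    return [k for k in range(len(pressure)) if k not in keep]

print(density_inversions([20.0, 19.0, 18.0], [35.0, 35.0, 34.5]))  # [1]
print(spike_threshold(300.0, 5.0, 1.5, 600.0))  # 3.25 (temperature, °C, at 300 m)
print(level_disorder_flags([0.0, 1.2, 0.8, 2.5, 2.5, 3.9]))  # [1, 3]
```

A simpler greedy pass that drops every level shallower than the deepest one kept so far would also produce a monotonic vector. However, a single spurious deep value near the top of a profile would make it discard most of the profile.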

The global range test aims to detect obvious measurement errors. Temperature measurements below -3 °C or above 43 °C and salinity measurements below 0 PSU or above 46 PSU are detected. This test has a very low detection rate, but it still detects some erroneous profiles each year. Most of them are profiles with an unusual shape that escape detection by the overlapping tests (MinMax test or climatological test). A recent example was an Argo float grounded near Mogadishu, Somalia, which measured a temperature exceeding 43 °C. The corresponding pressure was just above 0 decibar, so the measurement escaped the other NRT and delayed time mode tests, which are confined to depths between 0 and 2000 m.
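The global range test reduces to a bounds check. The sketch uses the bounds quoted above. The variable keys TEMP and PSAL follow the Argo naming convention and, like the function name, are only illustrative.

```python
# Bounds quoted in the text for the delayed time mode global range test.
GLOBAL_RANGE = {
    "TEMP": (-3.0, 43.0),  # °C
    "PSAL": (0.0, 46.0),   # PSU
}

def global_range_flags(values, variable):
    """Indices of values strictly below the lower bound or strictly above the upper bound."""
    low, high = GLOBAL_RANGE[variable]
    return [k for k, v in enumerate(values) if v < low or v > high]

print(global_range_flags([25.0, 44.1, -3.5, 10.0], "TEMP"))  # [1, 2]
```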
The next step of the CORA data validation is performed at the Coriolis data center to detect profiles diverging from the known ocean variability. Each temperature and salinity profile is compared with reference profiles of the minimum and maximum measured values. Those profiles originate from reference fields on a gridded mesh […]
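Since the description of the reference fields is incomplete here, the comparison can only be sketched generically. Each value is checked against the reference minimum and maximum at the same level. The sketch assumes those reference profiles have already been interpolated onto the profile levels, with `None` where no reference exists. The optional `margin` and the function name are hypothetical.

```python
def minmax_flags(values, ref_min, ref_max, margin=0.0):
    """Indices of levels falling outside [ref_min - margin, ref_max + margin]."""
    flagged = []
    for k, (v, lo, hi) in enumerate(zip(values, ref_min, ref_max)):
        if lo is None or hi is None:  # no reference value at this level
            continue
        if v < lo - margin or v > hi + margin:
            flagged.append(k)
    return flagged

print(minmax_flags([20.0, 15.0, 30.0], [18.0, 12.0, 8.0], [26.0, 20.0, 16.0]))  # [2]
```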

The relevance of ocean climate studies strongly depends on the accuracy of ocean measurements. Systematic data errors may thus bias the estimation of ocean state indicators such as the GOHC, the global ocean freshwater content or the global mean steric height (Levitus et al. 2009). Furthermore, random measurement and data errors may lead to an overestimation of the ocean variability. One can therefore indirectly assess the reliability of the global dataset by estimating the influence of the quality control on global metrics. Such metrics include the ocean mean temperature and salinity and the associated variability.
Two mappings of ocean temperature and salinity based on the CORA dataset measurements are calculated. The first is a raw estimation ($\mathrm{GOHC}_{\mathrm{raw}}$), which considers every measurement regardless of the data quality flags. The second is a flagged estimation ($\mathrm{GOHC}_{\mathrm{flg}}$), which only considers measurements flagged as good or probably good.
Interpolated fields are calculated following the method presented by Forget and Wunsch, 2007, which has the advantage of neither biasing the mean fields nor relying on specifying them. The global ocean is divided into 1° by 1° grid cells with 10 m vertical layers from the surface to 1500 m depth. A first estimate of the mean of each parameter for a given month is obtained by averaging the temperature or salinity data measured in each cell.

The variance field is estimated by taking the variance of the measurements located in a given cell, if the number of available measurements is greater than 4.
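The per-cell first guess can be sketched as a simple binning of one month of measurements into 1° by 1° by 10 m cells. The sketch returns each cell's mean and, when at least five measurements are available, its variance. The sample variance (n - 1 denominator), the cell indexing and the record layout are assumptions, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, variance

def bin_key(lat, lon, depth_m, res_deg=1.0, dz_m=10.0):
    """(row, col, layer) index of the cell containing a measurement."""
    return (int(math.floor((lat + 90.0) / res_deg)),
            int(math.floor(((lon + 180.0) % 360.0) / res_deg)),
            int(depth_m // dz_m))

def cell_statistics(obs, min_count_for_variance=5, max_depth_m=1500.0):
    """obs: iterable of (lat, lon, depth_m, value) for one month.

    Returns {cell: (mean, variance)}, with variance set to None when the cell
    holds fewer than min_count_for_variance measurements ("greater than 4").
    """
    cells = defaultdict(list)
    for lat, lon, depth, value in obs:
        if 0.0 <= depth < max_depth_m:
            cells[bin_key(lat, lon, depth)].append(value)
    stats = {}
    for key, vals in cells.items():
        var = variance(vals) if len(vals) >= min_count_for_variance else None
        stats[key] = (mean(vals), var)
    return stats

obs = [(10.2, 20.3, 5.0, v) for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
print(cell_statistics(obs))  # {(100, 200, 0): (3.0, 2.5)}
```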

A spatial weighting function is defined in Eq. (2) […]. The combined variance is estimated with a similar operator.

The objective analysis is performed on three versions of the global dataset. A first analysis is performed on the raw dataset, considering all available profile measurements, with all QC flags treated as good. A second analysis is performed on the same profiles using the QC flags available in NRT mode. A third one is performed on the same profiles using the QC flags available in delayed time mode.
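The three data selections can be sketched as a filter on the quality flags. The sketch assumes the Argo/Coriolis flag convention in which 1 and 2 mean good and probably good, the flags retained for the flagged estimation. It also assumes a hypothetical record holding a value with its NRT and delayed time mode flags.

```python
from typing import NamedTuple

GOOD_FLAGS = {1, 2}  # good, probably good

class Measurement(NamedTuple):
    # Hypothetical record structure, for illustration only.
    value: float
    qc_nrt: int  # flag from the near real-time validation
    qc_dm: int   # flag from the delayed time mode validation

def select(measurements, mode):
    """Values entering the raw, NRT or delayed time mode (CORA) analysis."""
    if mode == "raw":  # every flag treated as good
        return [m.value for m in measurements]
    if mode == "nrt":
        return [m.value for m in measurements if m.qc_nrt in GOOD_FLAGS]
    if mode == "dm":
        return [m.value for m in measurements if m.qc_dm in GOOD_FLAGS]
    raise ValueError(f"unknown mode: {mode}")

ms = [Measurement(10.0, 1, 1), Measurement(35.0, 1, 4), Measurement(99.0, 4, 4)]
print(select(ms, "raw"), select(ms, "nrt"), select(ms, "dm"))  # [10.0, 35.0, 99.0] [10.0, 35.0] [10.0]
```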

The ocean data coverage is sometimes insufficient to perform the monthly objective analysis on the whole ocean.
As a result, we have limited this study to latitudes between 60°S and 60°N. Outside these limits the ocean data coverage is too sparse, leading to random anomalies in the temperature and salinity variability.
A striking feature is the corresponding spike visible in the NRT analysis and in the raw dataset analysis in late 2010, which suggests that major data errors were not flagged during the NRT validation. Further exploration of this anomaly has shown that part of the larger error bar in the NRT analysis is caused by an issue in the update of delayed time mode processed Argo profiles. In a few cases, when salinity measurements present large drifts, the Argo PIs can decide that the salinity drift is too large to be adjusted. In these cases, the PI provides the global DAC with a delayed time version of the profiles. This version has an adjusted temperature field, but its practical salinity field is filled with fill values and its salinity QC field with "4" values (bad measurement status).
In some cases, the Coriolis data center updated the profiles by retrieving the adjusted temperature field without creating an adjusted salinity field. The salinity field and QC available in the Coriolis data center are therefore the original ones, which might not have been flagged as "4". In this study, a handful of these profiles produced large error bars in the NRT analysis fields. They are often associated with large salinity drifts (for instance salinity values of order 20 PSU in the Indian Ocean). This issue will soon be addressed in the Coriolis database.
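A possible guard against this issue is to propagate the bad flag when a delayed time mode profile carries an adjusted salinity made only of fill values with QC "4". The sketch below assumes the Argo fill value 99999.0 and plain lists for the salinity and QC vectors. It is a hypothetical illustration, not the fix planned for the Coriolis database.

```python
FILL_VALUE = 99999.0  # assumed fill value for the adjusted salinity

def salinity_to_use(psal, psal_qc, psal_adj, psal_adj_qc):
    """Return the (salinity, qc) lists to store for a delayed time mode profile.

    If the PI filled the adjusted salinity with fill values and QC 4, keep the
    original values but flag them all as bad instead of keeping the original QC.
    """
    if psal_adj and all(v == FILL_VALUE for v in psal_adj) and all(q == 4 for q in psal_adj_qc):
        return list(psal), [4] * len(psal)
    if psal_adj:  # a usable adjusted field exists
        return list(psal_adj), list(psal_adj_qc)
    return list(psal), list(psal_qc)  # no delayed time mode salinity available

print(salinity_to_use([20.1, 20.3], [1, 1], [FILL_VALUE, FILL_VALUE], [4, 4]))  # ([20.1, 20.3], [4, 4])
```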
The CORA analysis salinity standard deviation varies slowly in time. It is of order 0.15 PSU in the surface layer, 0.1 PSU in the 75-125 m layer and 0.08 PSU in the 275-325 m layer, and below 0.05 PSU in the deeper layers. This behavior is a consequence of the delayed time mode validation process, which strongly reduces the number of random errors in the dataset. This variability is probably a function of the local data resolution, the oceanic variability and the measurement errors. The slow variability of the CORA salinity standard deviation and its reasonable range suggest that the errors remaining in the dataset are of limited importance. This product is thus likely to present a low error amplitude.

A closer look at the vertical profiles of the temperature and salinity mean variability (Figs. 9 and 10) shows that the CORA analysis temperature and salinity variability is far smaller than in the RAW and NRT analyses. Moreover, the vertical structure of this variability is closer to the expected oceanic variability. It peaks at or just below the surface and decreases below the ocean mixed layer. However, we lack a high-quality reference dataset to demonstrate that the CORA dataset does not reduce the global ocean variability by over-flagging good data. One should keep in mind that most of the flags applied to these profiles are set manually by physical oceanographers after receiving a detection alert. Moreover, the rate of flagged profiles in the CORA analysis is lower than the rate reported for a reference dataset and analysis based on automatic quality control tests (Gouretski et al. 2018).

The CORA dataset is an extensive dataset of temperature and salinity measurements. Efforts have been made to provide the scientific community with information as close as possible to the physical measurement and to perform a strict quality control on all profiles. The CORA dataset stands out from the EN4 dataset because its delayed time mode validation relies on automatic detections followed by a systematic PI decision. This reduces the number of mistaken bad flags. In addition, the profiles are not subsampled and the time series (TSGs and drifters) are distributed. It also stands out from the WOD dataset since all measurements within a profile are validated in delayed time mode, reducing the number of erroneous measurements.
Moreover, this study develops an innovative method to assess the overall quality of a dataset. This method shows the improvement of the dataset quality flags brought by the Coriolis real-time QC and the CORA delayed time mode QC frameworks. However, the method lacks a comparison with analyses based on other datasets. Such a comparison would ensure that the CORA validation framework does not constrain its description of the ocean variability by over-flagging good measurements. This discussion should be pursued further. The method is based on mapping the ocean variability. It therefore implicitly assumes that the ocean sampling is homogeneous and sufficient to perform a monthly analysis.
These conditions have been met at a global scale, from the surface to 2000 m depth, since the full deployment of the Argo network. However, the ocean data coverage is insufficient for a global analysis before 2005 (see Fig. 3 for the ocean basin data coverage ratio). This is especially true at depths greater than 1000 m between 1990 and 2005 and at depths greater than 500 m before 1990, as seen in Fig. 2. The method will thus have to be adapted to the ocean data coverage to provide a synoptic view of the dataset quality.