A turbulence data reduction scheme for autonomous and expendable profiling floats

. Autonomous and expendable profiling float arrays such as deployed in the Argo Program require the transmission of reliable data from remote sites. However, existing satellite data transfer rates preclude complete transmission of rapidly sampled turbulence measurements. It is therefore necessary to reduce turbulence data onboard. Here we propose a scheme for onboard data reduction and test it with existing turbulence data obtained with a newly developed version of a (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)

In this 2019 experiment, one of the shear probes on one of the two units malfunctioned. Hence, the dataset for this paper contains approximately 25% fewer shear data than fast thermistor data.
3 Conversion of measured voltages to physical units 95 The core of our data reduction scheme uses power law fits of voltage spectra that are calculated onboard, and subsequently converted to meaningful turbulence quantities in post-processing. Additional voltage quantities are also recorded to determine temperature, pressure, and profiling speed.
3.1 Nomenclature and conventions -All quantities measured by FCS that are discussed in this paper are sampled at 100 Hz.

100
-All voltage spectra are frequency spectra and denoted Ψ x ( f ) (where x is a label such as s for shear) with units of V 2 Hz −1 .
-Physical spectra of shear and temperature gradient are wavenumber spectra and denoted Φ x (k) with units of s −2 cpm −1 and K 2 m −2 cpm −1 , respectively. Figure 4 is an exception in which shear spectra are frequency spectra: Φ s ( f ).

105
-Wavenumber k has the unit cycles per meter (cpm). Expressions quoted from other papers may differ by factors of 2π for wavenumbers in radians per meter.
-The Kraichnan model spectrum Φ Kr primarily depends on the dissipation rates of turbulent kinetic energy and temperature variance (ε and χ), but it also depends on the molecular viscosity ν and molecular thermal diffusivity D T . For brevity, we write Φ Kr (k, ε, χ) rather than the more complete 110 Φ Kr (k, ε, χ, ν, D T ). Similarly, the Nasmyth spectrum is written Φ Na (k, ε) rather than Φ Na (k, ε, ν).
In cases where the arguments are unambiguous or unimportant, we simply write Φ Na and Φ Kr .
-A pair of angle brackets, ⟨·⟩, denotes the mean value over a segment of length N seg = 512 data points. This equates to ∼1 m at our nominal profiling speed of 0.2 m s −1 .

125
The voltage reported by the shear probe V s is linearly proportional to shear: where W is the flow speed past the sensor. The overall engineering calibration α includes the seawater density ρ, the analog circuit gain G s (equal to 1 for FCS circuitry), the probe sensitivity S s (∼0.25 × 130 10 −3 V m 2 N −1 ) and the differentiator time constant T s (∼1 s).
The linearity in V s admits a simple link between the physical and voltage spectra: where H 2 s (k) is the transfer function that accounts for (i) spatial averaging by the shear probe of highwavenumber motions and (ii) analog and digital filtering of the raw voltage signal (see Appendix A).

135
Note also the use above of the following relation:
The gradient of this ::: the :::::::::::: temperature calibration is Over 5 s time scales, we consider V T to be constant. Consequently, the small-scale vertical temperature 150 gradient T z is linearly proportional to the differentiated voltage V Tt . To demonstrate, we first rewrite T z in terms of more directly measured quantities: The first quantity on the right-hand side is Eq. (7), the last is 1/W , and the second is where C Tt is the gain of the analog differentiator.

180
No smoothing is necessary before calculating ∆V P because its magnitude is so much larger than the quantization of the signal (this being the limiting factor for precision of pressure recorded by FCS). In physical units, P is precise to 0.003 dbar, which is O(300) times smaller than ∆P.
Wave orbitals can introduce variability when W is small (≲0.15 m s −1 ). As a diagnostic we calculate and record the minimum value of W for each segment. This also helps to identify the beginning and end of 185 profiles as shown in Appendix B. In standard processing, we would derive W (t) from the pressure signal low passed at 2 Hz. To avoid the need to low-pass filter the signal onboard, we instead make 10 estimates of W (t) per segment and take the minimum of these: where t i = 1, 51, 101, . . . , 501. Even with this sampling of every 50th element, which follows from sub-190 sampling a 100 Hz signal at 2 Hz, ∆V P (t i ) is large enough that smoothing is unnecessary.

Summarizing
Nasmyth spectra with f 1/3 fits 205 Shear measurements ideally capture both the inertial and viscous subranges and hence use a wide band of the measured spectrum to derive values for ε. In practice, noise and sensor resolution limit how well the true environmental spectrum is resolved. Conventional work-arounds exploit the Nasmyth model spectrum Φ Na (k, ε) (Nasmyth 1970;Oakey 1982). One approach is to iterate toward a solution in which the integral of Φ Na over a specific wavenumber band matches that of the measured spectrum Φ u z (e.g., Moum 210 et al. 1995). Another is to find the best fit of Φ u z to Φ Na by using maximum likelihood estimation together with a model of the expected statistical distribution of the spectral coefficients being fitted (e.g., Bluteau et al. 2016).
Here we develop a new and simpler two-stage approach to fitting shear spectra to Φ Na . In the first stage, we use an f 1/3 power law fit over a fixed frequency range of f l to f h = 1-5 Hz, where f 1/3 follows from 215 the assumption that we are fitting over the inertial subrange. In this ::: the ::::::: second :::::: stage, ::: we :::::::: correct ::: for :::::: when ::: this ::::::::::::: often-invalid :::::::::::: assumption. : :: In :::: the ::::::: inertial : subrange, shear spectra are proportional to k 1/3 and hence also f 1/3 since f = W k. With N fft = 256 and 100 Hz sampling (Sect. 3.1), spectral coefficients are separated by frequency increments of use bounding frequencies of 0.98 and 4.88 Hz as these are half-integer multiples of 0.39 Hz, but for brevity we will write these as 1 and 5 Hz throughout.) Our choice of f l = 1 Hz is dictated by a requirement that we avoid low frequency contamination induced by (i) advection by wave orbital motion and (ii) pitch and roll motions of the profiler. Together, these dominate below 0.3 Hz. Setting f l = 0.5 Hz would add only one more spectral coefficient. Our choice of 225 f h = 5 Hz is a trade-off between maximizing the bandwidth of the fit and minimizing how much measured spectra are subject to either noise or viscous roll off. Other profilers may benefit from different frequency bounds (see Sect. 8).
Our inertial subrange assumption is often false. Indeed, 'assumption' is perhaps a misnomer as we do not expect it to be true; we know that viscous roll off will often occur at frequencies lower than 5 Hz

230
(25 cpm for a nominal value of W = 0.2 m s −1 ). However, because there exists an analytical expression for the viscous roll off, we are able to derive an exact expression that quantifies how much ε is underestimated.
This is the second stage of our approach. We derive an expression for the correction function F Na in such a way that it can be calculated in post-processing. The benefits of this approach are that (i) we can fit uncalibrated (i.e., voltage) spectra and (ii) it simplifies the actual onboard fitting routine (Sect. 4.2).
Nasmyth spectra can be flattened to unity over the inertial subrange with the normalization 8.05k 1/3 ε 2/3 ( Fig. 1b). Values of F Na are based on the mean of these flattened spectra over the wavenumber range k l -k h To remove the dependence of the true value of ε, we substitute using Eq. (17) . Further, to account for the H 2 s (k) factor in Eq.
(3), we make the substitution Think of this substitution as inverting the conventional way that H 2 s (k) is invoked. Usually, a measured shear spectrum is amplified at high wavenumbers by 1/H 2 s (k) and then fit to the model spectrum Φ Na .

265
Here, instead of amplifying the measured spectrum, we reduce the model spectrum. With this latter approach, H 2 s (k) is calculated and applied only during the post-processing stage. (It changes F Na by only ∼5% since we fit over relatively low wavenumbers.) Altogether, the substitutions result in ::::::::: Eq. (17) :: to :::::::: produce an implicit function for F Na , which can be solved numerically: init /F Na :::::::::::::::::  Figure 1. Calculation of the correction function F Na for two values of ε. For ε = 1 × 10 −6 W kg −1 , a k +1/3 power law is a good approximation of the Nasmyth spectrum over the frequency range f l -f h (1-5 Hz) for a profiling speed of W = 0.2 m s −1 . Although the same is not true for ε = 1 × 10 −9 W kg −1 , we can account for the roll off with a factor of F Na . F Na can be defined in terms of either ε (Eq. (18)) or ε init (Eq. (19)). Panel b takes the former approach.
In practice, we must take the latter approach since we do not know ε until after it is derived from ε init and F Na .
(3) and Eq. (16) gives where we have left out H 2 s (k) since it has been incorporated into F Na . Rearranging and substituting k = f /W gives Then, to solve for ε init , we use a least-squares fit (see Appendix C): where the sums are understood to be over the range f l -f h . The quantities α, W , and ε init are calculated in post-processing.

Quality control of the shear spectral fits
Measured shear spectra are often quality controlled either by manual visual inspection or, more objectively, by quantifying the level of mismatch between them and their associated model. Possible mismatch 300 quantities include the mean absolute deviation or the variance of the ratio Φ u z /Φ Na (e.g., Ruddick et al. 2000;Bluteau et al. 2016). We cannot calculate such quantities with our reduced scheme because we do not know what each spectrum should look like until we calculate its ε value in the post-processing stage.
(Recall that Φ Na is a function of ε.) By this stage, we have lost information about the spectral shape through the summing operation in Eq. (24).

305
To retain at least some information about the shape of each voltage spectrum, we will split the 1-5 Hz range and compute two fits rather than one. Doing so allows for a first-order check that the spectrum over the 1-5 Hz range approximately follows the expected shape.
When ε is small (≲ 10 −9 W kg −1 ), the fit score may be consistently low if spectral coefficients in the f m -f h range are affected by noise and consequently ε m-h ≫ ε l-m . For such cases, we choose to use only the lower-frequency fit. We would rather have a more accurate estimate of ε and forgo the fit score than 330 have a biased-high ε value with a biased-low fit score. (Either way, the small values of ε in question will have minimal effect on any averages given that turbulence is approximately lognormal :::::::::::: distributions ::::: have :::: high ::::::::: kurtosis, ::: so ::::: high :::::: values :::::::::: dominate ::::::: means.) Specifically, where the threshold is equivalent to k < 0.1/η with η the Kolmogorov length scale estimated from ε l-m .

Test of the reduction scheme for ε
To test the accuracy of the shear reduction scheme described in the previous section, we apply it retrospectively to the dataset from the 2019 test cruise (Sect. 2). We compare the results to those obtained with the standard processing scheme. This standard scheme (Appendix D) features a more sophisticated 340 despiking routine than used for our reduced scheme, which employs a three standard deviation threshold filter (Appendix E).
(c) As for panel a, but uncorrected (F Na = 1).

Summarizing Kraichnan spectra with f 1 fits 360
Here we take the Kraichnan spectrum Φ Kr (Kraichnan 1968) as our model and for its low-wavenumber approximation we use the viscous-convective subrange, which scales as k +1 . In units of K 2 m −2 cpm −1 , Φ Kr and its approximation are as follows (e.g., Peterson and Fer 2014):

365
where the Batchelor length scale λ B = (νD 2 T /ε) 1/4 and q is a constant taken to be 5.26. This expression does not include a k +1/3 inertial-convective subrange, which we ignore here as it increases the integral of the temperature gradient spectrum from k = 0 to k = ∞ by less than 1% and therefore has negligible effect on our results. function F Kr as F Kr is not raised to a power like F Na (Eq. (17)). For small values of k, Φ Kr ∝ χ whereas Φ Na ∝ ε 2/3 .
The derivation of F Kr is equivalent to F Na . We therefore present only the result:

375
Note that F Kr (ε, χ init ,W ) depends on the underestimate χ init , but the 'true' or 'corrected' value of ε calculated in Sect. 4.

Quality control of the temperature gradient spectral fits
The approach to quality controlling the fast thermistor data is the same as that for shear (Sect. 4.3). That is, we fit Ψ Tt over f l -f m and f m -f h (1-3 and 3-5 Hz). This ultimately provides two estimates of χ for each spectrum, which are combined as follows: We do not apply a low χ threshold equivalent to Eq. (27).
7 Test of the reduction scheme for χ Profiles of χ from the reduced scheme compare well to the standard processing, albeit with a small bias 395 in one direction for low values and in the other direction for high values (Fig. 8). Across all values, the two approaches agree within a factor of two 78% of the time (Fig. 9). By comparison, 82% of segments exhibit a factor-of-two agreement between χ values from the two fast thermistors on the same unit.
Compared to shear spectra, non-dimensionalized temperature gradient spectra have lower fit scores.

All frequencies
Kraichnan f l -f h Figure 10. As for Fig. 7, but for temperature gradient rather than shear.

Setting the scheme's parameters 410
Our scheme requires a few user-defined parameters: f l , f h , N seg , and N fft . For this paper, we based these partly on the profiling speed and scientific goals of FCS. For a different profiler, we suggest the following: -Choose f h based on a typical profiling speed such that k h = f h /W ≈ 25 cpm for a nominal profiling speed W . For a wide range of ε values, 25 cpm is close to, or beyond, the peak of the Nasmyth spectrum (Fig. 1). Further, ε can be sensitive to the wavenumber fitting range, but centering the fit 415 near k ≈ 10-20 cpm minimizes this sensitivity (Bluteau et al. 2016). As an example, if FCS profiled at ∼0.5 m s −1 , we would consider setting f h ≈ 12 Hz.
-Keep f l in the range 0.5-1 Hz. Although our scheme uses the inertial subrange (i.e., low frequencies/wavenumb as its starting point, there is little to be gained by including frequencies of O(0.1) Hz. A possible exception, albeit tangential to this paper, is if the turbulence measurements come from a platform 420 that effectively measures horizontally. In such cases, FFT segments may be many minutes or more and thereby contain useful low-frequency information (e.g., Bluteau et al. 2011;Moum 2015).
-Define f l and f h separately for shear and temperature gradient if appropriate. Although we set them equal here, this is not necessary.
-Reasonable choices for N fft are N seg /2 or N seg /4, which correspond to three or seven half-overlapping 440 subsegments, respectively. There is little to be gained by diving a segment into even more subsegments so as to produce smoother spectra before fitting. As Ruddick et al. (2000) notes, the task is analogous to fitting a line to 20 points at once or first clumping them in groups of, say, five and then fitting the four averaged points.
First, we fit spectra over relatively low frequencies (1-5 Hz) that are unlikely to be affected by noise or vibration. Second, we reduce the data in a way that uses as little arithmetic as possible. Obviously, we cannot reverse-engineer the raw signals, but by making the onboard calculations simple we give ourselves the best chance to later fix or identify any unforeseen issues.

480
Although the onboard reduction eliminates possibilities in how we process turbulence data, it opens up possibilities in how we obtain turbulence data. By visualizing how turbulence evolves over successive dives in near-real-time, we can concentrate on regions of interest by adapting the dive schedule to profile more frequently or to different depths. If instead we encounter quiescent periods, we might consider profiling less frequently and thereby conserving battery life. Our ultimate objective is to treat FCS floats 485 as expendable.
Reshape raw voltage signals Convert each 1D signal to a 2D array (N blk , N seg ) Discard non-profiling data Use W min threshold (∝ min |∆V P (t i )|, Eq. 14, Appendix B) Record T and P voltage quantities for each block Despike shear voltages Apply 3σ threshold to V s1 and V s2 (Appendix E) Fit shear spectra over two ranges Fit Tt spectra over two ranges (Eq. 34) Calibrate averaged voltages T 1 and T 2 (Eq. 6), P (Eq. 12), and W (Eq. 13) Derive viscosity and thermal diffusivity Use measured T and P together with S from SOLO-II CTD Calculate the correction factors F Na (Eq. 21) and F Kr (Eq. 31) Correct initial estimates ε init → ε (Eq. 17) and χ init → χ (Eq. 30) Combine Calculate four 'initial' turbulent dissipation values For S 1 and S 2 , get ε init for f l -f m and f m -f h (Eq. 24) Calculate goodness of fit of spectra ε fit score (Eq. 26) and χ fit score (Eq. 36) Figure 11. Summary of the data reduction scheme. Each 512-element segment of data is ultimately compressed down to the 15 :: 12 highlighted quantities that are then transmitted. These are calibrated and/or converted into turbulence quantities in post-processing.
Voltage signals from shear probes and thermistors are a smoothed representation of the true environmental signal. If the smoothing is a spatial effect, it is described by a transfer function H 2 (k). If the smoothing is a temporal effect, it is more natural to use H 2 ( f ). We can use these interchangeably because f = W k and 490 therefore H 2 ( f ) = H 2 (W k). For FCS, there are three components to the transfer function for each sensor: where we have used the following shorthand: SP = shear probe, FT = fast thermistor, AA = anti-aliasing, and D = digital. We describe each of these in turn.

495
Shear probes built and calibrated by the Ocean Mixing Group are very close in dimension to those examined by Ninnis (1984) who measured their wavenumber response and represented it as where a 0 = 1.000, a 1 = −0.164, a 2 = −4.537, a 3 = 5.503, a 4 = −1.804, and k 0 = 170 cpm.
filter (two-pole Butterworth) with an f c = 40 Hz cut-off: After the analog signal is anti-aliased, it is digitized at 400 Hz. Before subsampling to the final 100 Hz output, the signal is digitally filtered. For the 2019 FCS cruise, the signal was convolved with a symmetric 29-element kernel in which the first 15 elements were 52,221,393,427,174,0,0,0,0,0,1970,5054,8202,10558,11433].
This is a sinc kernel but with negative values set to zero. (We are currently investigating better choices for future implementations). The filter has a half-power (−3 db) point at 25 Hz.
Appendix B: Identifying the start and end of a profile 520 Early in our processing routine, we partition the raw voltage signals into 512-element segments. In order to discard the segments in which FCS was not profiling, we need robust (yet simple) criteria that demarcate the start and end of a profile. For the start, we search for the first three consecutive segments in which W min > 0.05 m s −1 . For the end, we swap the inequality.
A drawback of this approach is the appearance of a quantity in physical units (0.05 m s −1 ). This is the 525 one instance where we hard code a calibration coefficient in the onboard software, rather than apply it in post-processing. Fortunately, the relevant coefficient can be approximated as constant: C 2P = 76.7 psi V −1 (barring a redesign of the circuitry or the use of a different brand or model of pressure sensor). For the two units already built, C 2P = 76.81 psi V −1 and 76.53 psi V −1 . By comparison, among the four shear probes on the two units, the calibration coefficients vary by 30%.

530
At least for the initial implementation of our scheme, we do not include an algorithm to detect the surface to within centimeters. Doing so would let us work backward to put our uppermost depth bin as close to the surface as possible. However, we expect that this could be a fragile part of the scheme. Further, FCS lacks a micro-conductivity sensor, which is likely the sensor best suited for identifying the air-sea interface (e.g., Ward et al. 2014).
Without surface detection, the depths of the uppermost bins will be realized randomly. In the worst cases, we would discard the top ∼1 m (5 s at ∼0.2 m s −1 ). To alleviate this, we may use half-overlapping bins near the surface. The exact implementation will be determined later in the development.
Appendix C: Least-squares fitting of power laws In this paper, we use power law fits to derive turbulence quantities: Ψ s = A ε f 1/3 and Ψ Tt = A χ f 1 , where 540 A ε and A χ are substitutes for the expressions in Eqs. 23 and 33. With only a single parameter for each fit, implementing a least-squares fit is easy.
Assume we are fitting the vector Ψ i to the function A f n i where n is either 1/3 or 1. The sum of squared residuals is therefore The minimum with respect to A is where the derivative is zero: Hence, (C3) We had originally intended to find A by following Becherer and Moum (2017), who were fitting f 1/3 550 spectra. Their simpler method, A = ∑(Ψi/ f n i ), is equivalent to a least-squares fit except that the quantity minimized is the sum of the squares of the adjusted residuals, where adjusted means divided by f n .
Differences can be ignored when n = 1/3, but not when n = 1.

Appendix D: Standard processing of FCS turbulence measurements
The standard processing of FCS turbulence data differs from the reduced scheme in three ways. First, 555 raw data are despiked differently (Appendix E). Second, the 100 Hz raw voltage signals are calibrated into physical quantities right away. Hence, means and spectra are calculated in physical units, not voltage units. Third, the integration of spectra occurs over a variable wavenumber band, which is found iteratively.
When integrating shear spectra (after correction; Appendix A) to find ε, we follow the approach used for the Chameleon profiler (Moum et al. 1995). A first estimate of ε is made by integrating over k = 4-560 10 cpm. This value provides a first estimate of the Kolmogorov wavenumber k s = (ε/ν 3 ) 1/4 /2π. (The lower limit for Chameleon is 2 cpm, but we increase this for FCS given its slower profiling speed and hence the possibility of contamination by waves at lower wavenumbers.) The upper integral limit is then set to 0.5k s (with a minimum of 10 cpm and a maximum of 45 cpm). The Nasmyth spectra (Eq. (15)) is integrated over the same wavenumber range. If the measured and Nasmyth integrals are within 1%, then ε 565 is set equal to the integral of the Nasmyth spectrum over all k. Otherwise, ε and k s are adjusted iteratively until the two integrals agree.
A similar approach is used for integrating T t spectra to find χ. The model spectrum is the Kraichnan spectrum (Eq. (28)) and, again, the lower limit of integration is 4 cpm. The upper limit is the Batchelor wavenumber k b = (ε/νD 2 T ) 1/4 /2π (with a maximum defined by kW = 15 Hz).

570
Appendix E: Identifying and removing noise and spikes in the shear signals To properly despike the raw output of a shear probe requires several steps. Lueck et al. (2018) describe a process in which the signal is high-passed, then rectified, and then low-passed to derive a measure of the local variance. A value is defined as a spike if it is more than eight times (or similar threshold) above the local variance. Spikes are replaced with an average based on surrounding points. This process is then 575 repeated on the new signal, and so on until no spikes are identified.
In our standard processing of FCS data, we use the Lueck et al. (2018) despiking routine. For our data reduction scheme, we use an approach that is easier to implement and quicker to compute, albeit less precise. For each 512-element segment of data, a spike is defined as any data point larger than three standard deviations from the mean. These spikes are replaced by the mean of remaining values in the 580 segment.