High-frequency radar (HFR) is a cost-effective monitoring technique that provides continuous, high-resolution surface currents, giving new insights into small-scale transport processes in the coastal ocean. In recent years, the use of Lagrangian metrics to study mixing and transport properties has been growing in importance. A common condition among all the Lagrangian techniques is that complete spatial and temporal velocity data are required to compute trajectories of virtual particles in the flow. However, hardware or software failures in the HFR system can compromise the availability of data, resulting in fields with incomplete spatial coverage or periods without data. In this regard, several methods have been widely used to fill spatiotemporal gaps in HFR measurements. Despite the growing relevance of these systems, there are still many open questions concerning the reliability of gap-filling methods for the Lagrangian assessment of coastal ocean dynamics. In this paper, we first develop a new methodology to reconstruct HFR velocity fields based on self-organizing maps (SOMs). Then, we compare this method with two other available gap-filling techniques, namely open-boundary modal analysis (OMA) and data interpolating empirical orthogonal functions (DINEOFs). The performance of each approach is quantified in the Lagrangian frame through the computation of finite-size Lyapunov exponents, Lagrangian coherent structures and residence times. We determine the limit of applicability of each method using four experiments based on the typical temporal and spatial gap distributions observed in HFR systems, as revealed by a K-means clustering analysis. Our results show that even when a large number of data are missing, the Lagrangian diagnoses still give an accurate description of oceanic transport properties.

Knowledge of the spatial and temporal complexity of coastal processes is one
of the major challenges in oceanography. Understanding coastal
hydrodynamics is crucial for quantifying the contribution of coastal
margins to the world ocean's biological productivity

HFR systems were originally developed in

In particular, HFR observations are crucial for applications associated
with transport processes, not only addressing the study of marine ecosystems
but also for a wide range of coastal activities. These include search and
rescue operations

The Lagrangian approach addresses the effects of the velocity field on
transported substances, which is of utmost relevance for studying transport
processes. This approach has the advantage of exploiting both the spatial and
temporal variability of a given velocity field. They can even unveil sub-grid
filaments generated by chaotic stirring, providing a more complete
description of transport phenomena. In particular, the concept of Lagrangian
coherent structure (LCS; see the review by

Lagrangian diagnoses require complete spatial and temporal velocity data to compute trajectories of synthetic particles. However, HFR can fail due to external circumstances such as environmental factors, hardware or software malfunction, power failure and communication disruptions at radar stations, leading to incomplete measurements in the form of data gaps in both space and time. In general, the occurrence of gaps in the data follows a pattern that can be associated with a particular cause (for instance instrumentation malfunction, sea state, signal interference or antenna configuration), which facilitates their interpretation and simulation.

In this regard, different methodologies have been developed in recent years
with the aim of filling velocity fields derived from HFR measurements. The
most widely used technique is open-boundary modal analysis (OMA)

Here we introduce a new method based on unsupervised neural networks, self-organizing maps (SOMs). Then, we perform an intercomparison of this new methodology with OMA and DINEOF. We analyze the effect of these gap-filling methods on the Lagrangian computations derived from reconstructed HFR velocities in the SE region of the Bay of Biscay (BoB). Their robustness under different scenarios of data gaps is discussed.

The paper is organized as follows. After this introduction, we first describe
the data. Gap-filling methods used in the comparison, including the new
method developed here, are described in Sect.

Two long-range CODAR Ocean Sensors SeaSonde HFR sites have been operational in the
SE BoB since 2009. Both antennas transmit with a 40 kHz bandwidth centered at 4.5 MHz and are part of the Basque in situ operational
oceanography observational network owned by the Directorate of Emergency
Attention and Meteorology of the Basque government's security department.
The sites are located at Cape Higer and Cape Matxitxako (Fig.

Map showing the location of the two antennas (Matxitxako and Higer) of the BoB HFR systems. The grid points of the total HFR velocity field are plotted in blue.

Some methods, i.e., open-boundary modal analysis (OMA) and data interpolating empirical orthogonal functions (DINEOFs), are nowadays widely used to fill spatiotemporal gaps in HFR measurements. We briefly explain these two methods below.

Open-boundary modal analysis (OMA,

The data interpolating EOF (DINEOF) is an iterative methodology
used to interpolate gaps or missing data in geophysical datasets

DINEOF was introduced by

Here a new methodology has been developed in order to reconstruct HF radar
velocity fields in the statistical framework of the self-organizing map
(SOM) analysis. SOM is a powerful visualization technique based on an
unsupervised learning neural network, which is especially suited to extract
patterns in large datasets

For typical remote sensing imagery, SOM can be applied in both the space and time domains. Here, since we are interested in the reconstruction of HFR currents, we carry out the analysis in the spatial domain. In this case each input row vector is built from the radial velocity map at one time, so each neuron corresponds to a characteristic radial velocity spatial pattern over the coverage area of the HFR. Since each sample has an associated time, we can find when a particular spatial pattern occurs by computing the best-matching unit (BMU) for each time, which provides a time series of the corresponding spatial patterns.

From these ideas, we can deduce the following simple algorithm for
reconstructing missing values in the HFR velocity field from the available
HFR data.
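As a rough illustration of the idea (not the exact procedure used in this work), the following Python sketch assumes that a SOM has already been trained and that its neurons are stored as rows of a hypothetical `codebook` array. The function names and the use of a root-mean-square distance restricted to the observed points are assumptions made for this example. The BMU is searched using only the grid points with valid data, and the missing points are then taken from that BMU.

```python
import numpy as np

def best_matching_unit(sample, codebook):
    """Return the index of the neuron closest to `sample`.

    Only the observed (non-NaN) entries of `sample` enter the distance,
    so a map with gaps can still be matched against the trained patterns.
    `codebook` has shape (n_neurons, n_points) and is assumed gap-free.
    """
    valid = ~np.isnan(sample)
    if not valid.any():
        raise ValueError("sample has no valid observations")
    diff = codebook[:, valid] - sample[valid]
    dist = np.sqrt(np.mean(diff ** 2, axis=1))  # RMS distance per neuron
    return int(np.argmin(dist))

def fill_gaps(sample, codebook):
    """Fill the NaN entries of `sample` with the values of its BMU."""
    bmu = best_matching_unit(sample, codebook)
    filled = sample.copy()
    missing = np.isnan(sample)
    filled[missing] = codebook[bmu, missing]  # observed values are kept
    return filled, bmu

# Toy example: three "patterns" over four grid points.
codebook = np.array([[0.0, 0.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0, 1.0],
                     [2.0, 2.0, 2.0, 2.0]])
sample = np.array([1.1, np.nan, 0.9, np.nan])
filled, bmu = fill_gaps(sample, codebook)
```

In this sketch the observed values are left untouched and only the gaps are replaced, so the quality of the result depends entirely on how well the selected BMU represents the missing dynamics.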

The skill of this method relies on correctly identifying the BMU that best
describes the missing dynamics. We try to optimize the
algorithm by using as an input vector a concatenation of three maps of HFR
velocities on three different dates. Thus we force the method to match
three maps instead of one, which reduces the risk of selecting an unsuitable BMU, in
particular when the HFR velocity map has a large number of missing points at
time

The initialization, training process and final output of the SOM algorithm have
to be tuned in order to optimize the results and the computational cost by
selecting particular control parameters. For instance, the optimal size of
the neural network (number of neurons) depends on the number of samples and
the complexity of the patterns to be analyzed. We choose the map size as
[

Compared to conventional statistical methods like EOF and K-means,
SOM is able to introduce nonlinear correlations and it does not require any
particular functional relationship or distribution assumptions about the
data, i.e., distribution normality or equality of the variance

Example of the four prototypes of KMA groups of gap distribution
scenarios obtained from the KMA analysis applied to the HFR availability–absence
matrix for 2014. Panel

The most common real scenarios for spatial gaps in HFR data are mainly
represented by individual antenna failures, range and/or bearing reduction.
The radio signal emitted by an HFR travels along the ocean surface and back
owing to the conductivity of the ocean, and the current velocity is
measured based on the Bragg scattering phenomenon

In order to characterize the most typical and realistic gap types observed in
the Basque HFR system, a K-means classification algorithm

Since the goal of this work is to evaluate different gap-filling
methodologies in real situations, the different groups representing observed
gap types are used to introduce artificial gaps in April 2012, April 2013 and
April 2014. Randomly generated spatial gaps are also included to
cover different failures that are not related to the typical situations but
could occur. For all the cases data gaps are introduced in 50 % of the time
series. The proposed gap scenarios are the following: (A) bearing gaps generated by
randomly distributing 10 of the elements of KMA group 6 (Fig.

These gap scenarios (from now on referred to as experiments A, B, C and D) are used to test the SOM, DINEOF and OMA gap-filling methodologies. For OMA, total maps are generated using OMA directly on radial data. For DINEOF and SOM, hourly radials are gap-filled first and then totals are generated using the same least mean square algorithm (spatial interpolation radius of 10 km) used to build the reference data series (i.e., from the reference radial files with no gaps).
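To make the radial-to-total step more concrete, the sketch below shows an unweighted least-squares combination of radial velocities around a single grid point. It is an illustration under simplifying assumptions, not the processing chain used to build the reference series: the function name, the flat Cartesian coordinates in kilometers and the angle convention (direction of the radial unit vector, in radians counterclockwise from east) are all hypothetical choices, and operational constraints such as requiring radials from both sites or limiting the geometric dilution of precision are left out.

```python
import numpy as np

def total_from_radials(grid_xy, rad_xy, rad_speed, rad_angle, radius_km=10.0):
    """Estimate (u, v) at one grid point from nearby radial velocities.

    Each radial satisfies  speed = u*cos(angle) + v*sin(angle),
    so (u, v) is the least-squares solution of an overdetermined system
    built from all valid radials within `radius_km` of the grid point.
    """
    rad_xy = np.asarray(rad_xy, dtype=float)
    rad_speed = np.asarray(rad_speed, dtype=float)
    rad_angle = np.asarray(rad_angle, dtype=float)

    dist = np.hypot(rad_xy[:, 0] - grid_xy[0], rad_xy[:, 1] - grid_xy[1])
    near = (dist <= radius_km) & ~np.isnan(rad_speed)
    if near.sum() < 2:
        return np.nan, np.nan  # not enough data to resolve two components

    A = np.column_stack([np.cos(rad_angle[near]), np.sin(rad_angle[near])])
    if np.linalg.matrix_rank(A) < 2:
        return np.nan, np.nan  # all radials are parallel
    solution, *_ = np.linalg.lstsq(A, rad_speed[near], rcond=None)
    return float(solution[0]), float(solution[1])

# Toy example: three radials near the origin and one too far away to be used.
angles = np.array([0.0, np.pi / 2, np.pi / 4, 1.0])
positions = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [50.0, 50.0]])
true_u, true_v = 0.3, -0.2
speeds = true_u * np.cos(angles) + true_v * np.sin(angles)
speeds[3] = 9.9  # bogus value outside the interpolation radius
u, v = total_from_radials((0.0, 0.0), positions, speeds, angles)
```

The default radius of 10 km mirrors the interpolation radius quoted above; everything else in the sketch is illustrative.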

Percentage of spatial coverage for each time step and experiment.
Higer data are shown in the upper panels and Matxitxako data in the lower
panels in all cases. Black circles represent the percentage of good data for
the original fields. Red dots denote the percentage of good data for each
experiment. The

Maps of temporal coverage (percent of the total number of hours in
the 3-month period analyzed) for Higer

In this study we compute different Lagrangian quantities using the HFR
velocity field filled by the three different methods explained in Sect.

We use conventional statistical metrics to measure differences between
gap-filled and observation data to quantify methodology skill: absolute
relative error (ARE), mean bias (MB) and root mean squared (RMS) error.
By denoting a set of reference observational values as

Scatterplot of the SOM

Scatterplot of the SOM model

Scatterplots of zonal (

Root mean square (RMS) error, mean bias (MB) and correlation
coefficient (

Root mean square (RMS) error, mean bias (MB) and correlation
coefficient (

Synthetic trajectories are computed by advecting particles in the original HFR
velocity field (reference trajectories) and also in the OMA, SOM and DINEOF
gap-filled currents. A total of 868 particles are initially uniformly
distributed over the HF radar grid with an initial distance between particles
of 5 km. Lagrangian particles are released every hour from 2 to
26 April 2012, 2013 and 2014 and advected for 72 h.
Trajectories are computed using a fourth-order Runge–Kutta integration scheme
and a bilinear interpolation of the gridded velocity field in space and
linear in time. The mean distance of separation (

Separation distance between particle trajectories advected by the
filled HFR velocity field and by the reference velocity field as a function
of time and averaged over all the pairs of trajectories. Different limits in
the vertical axis of the figure have been used for a better data
representation. The error bar is the confidence interval (1 standard
deviation) of the spatial average of

The spatial distribution of separation distances between real and simulated
trajectories has also been analyzed in order to detect any anisotropy in the
differences between them (Fig.

Furthermore, we analyze the effect of the different gap-filling methodologies
on the FSLE computations. FSLE was originally introduced in dynamical
system theory to characterize the growth of non-infinitesimal perturbations
in turbulence

Time average of the separation distance between trajectories computed from the filled HFR using the three methodologies and reference HFR velocities initiated in the same pixel.
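The trajectory computation described in this section (fourth-order Runge–Kutta integration with bilinear interpolation in space and linear interpolation in time) and the separation distance between a reference and a reconstructed trajectory can be sketched as follows. This is a minimal illustration on a regular Cartesian grid, assuming positions in kilometers, time in hours and velocities in kilometers per hour; the function names are hypothetical, and particles leaving the grid are simply extrapolated rather than stopped, which a real implementation would need to handle.

```python
import numpy as np

def interp_velocity(field, t_axis, y_axis, x_axis, t, y, x):
    """Interpolate `field` (shape T x Y x X) linearly in time and
    bilinearly in space on regularly spaced axes."""
    def cell(axis, value):
        f = (value - axis[0]) / (axis[1] - axis[0])
        i = int(np.clip(np.floor(f), 0, len(axis) - 2))
        return i, f - i  # lower index and fractional weight

    it, wt = cell(t_axis, t)
    iy, wy = cell(y_axis, y)
    ix, wx = cell(x_axis, x)
    cube = field[it:it + 2, iy:iy + 2, ix:ix + 2]
    plane = (1 - wt) * cube[0] + wt * cube[1]   # linear in time
    row = (1 - wy) * plane[0] + wy * plane[1]   # then in y
    return (1 - wx) * row[0] + wx * row[1]      # then in x

def advect_rk4(u, v, t_axis, y_axis, x_axis, x0, y0, t0, dt, n_steps):
    """Advect one particle with a fourth-order Runge-Kutta scheme.
    Returns an array of shape (n_steps + 1, 2) with (x, y) positions."""
    def vel(t, p):
        return np.array([
            interp_velocity(u, t_axis, y_axis, x_axis, t, p[1], p[0]),
            interp_velocity(v, t_axis, y_axis, x_axis, t, p[1], p[0]),
        ])

    pos = np.array([x0, y0], dtype=float)
    track = [pos.copy()]
    t = t0
    for _ in range(n_steps):
        k1 = vel(t, pos)
        k2 = vel(t + dt / 2, pos + dt / 2 * k1)
        k3 = vel(t + dt / 2, pos + dt / 2 * k2)
        k4 = vel(t + dt, pos + dt * k3)
        pos = pos + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
        track.append(pos.copy())
    return np.array(track)

def separation(track_a, track_b):
    """Distance between two trajectories at each time step."""
    d = np.asarray(track_a) - np.asarray(track_b)
    return np.hypot(d[:, 0], d[:, 1])

# Toy example: a uniform, steady flow of (1.0, 0.5) km/h.
t_axis = np.arange(0.0, 24.0, 1.0)
y_axis = np.arange(0.0, 50.0, 5.0)
x_axis = np.arange(0.0, 50.0, 5.0)
shape = (t_axis.size, y_axis.size, x_axis.size)
u = np.full(shape, 1.0)
v = np.full(shape, 0.5)
track = advect_rk4(u, v, t_axis, y_axis, x_axis, 5.0, 5.0, 0.0, 1.0, 10)
```

Running the same advection in a reference field and in a gap-filled field from the same starting point, and passing both tracks to `separation`, gives the kind of separation curve that is averaged over all particle pairs in the analysis above.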

Comparison of scale-dependent FSLE curves computed from the
different filled HFR velocities averaged over 240 virtual particle-pair
deployments homogeneously distributed through April 2012, 2013 and 2014.
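For readers unfamiliar with scale-dependent FSLE curves, the sketch below uses the standard definition λ(δ) = ln(r) / ⟨τ(δ)⟩, where τ(δ) is the time a particle pair takes to grow from a separation δ to rδ and the average is taken over pairs. The function name, the amplification factor r = 2 and the first-crossing rule are assumptions for this example and may differ from the settings used for the figure.

```python
import numpy as np

def fsle_curve(times, separations, deltas, r=2.0):
    """Scale-dependent FSLE from pair-separation time series.

    times:        array of shape (T,)
    separations:  array of shape (P, T), one row per particle pair
    deltas:       initial scales at which the exponent is evaluated
    Returns one value of lambda per scale (NaN if no pair spans the scale).
    """
    times = np.asarray(times, dtype=float)
    separations = np.asarray(separations, dtype=float)
    lam = np.full(len(deltas), np.nan)
    for k, delta in enumerate(deltas):
        taus = []
        for sep in separations:
            start = np.flatnonzero(sep >= delta)
            if start.size == 0:
                continue  # this pair never reaches the scale delta
            i0 = start[0]
            end = np.flatnonzero(sep[i0:] >= r * delta)
            if end.size == 0:
                continue  # this pair never reaches r * delta
            tau = times[i0 + end[0]] - times[i0]
            if tau > 0:
                taus.append(tau)
        if taus:
            lam[k] = np.log(r) / np.mean(taus)
    return lam

# Toy example: one pair separating exponentially at a rate of 0.5 per unit time.
times = np.arange(0.0, 20.0, 0.01)
separations = 0.1 * np.exp(0.5 * times)[np.newaxis, :]
lam = fsle_curve(times, separations, deltas=[0.5, 1.0, 2.0])
```

For purely exponential growth the estimated exponent is independent of scale, as in this toy case; in observed flows the dependence of λ on δ is what characterizes the dispersion regime. Note that the time resolution of the separation series limits the accuracy of τ at small scales.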

The contribution of the scales captured by HFR in ocean surface
dynamics can be analyzed by computing particle dispersion at different
spatial scales,

We evaluate how the gap-filling methodologies impact the dynamical scales
captured by the HFR. Figure

Comparing the FSLE slopes obtained from the three gap-filling methodologies
with the REF HFR velocity field we find that the SOM methodology yields the most
similar values, with slopes of

Compared with other observational studies using HFR the

In general the

Next we analyze the effect of the reconstructed velocities on the LCS. FSLE
is used to obtain the LCS by computing the minimum growth time of
particle-pair separations from

First we compare some snapshots of the LCS derived from the three
reconstructed HFRs in order to see differences in the dynamical structures
(Fig.

The time evolution of the spatial average over the HFR domain of the absolute
relative error (ARE; Sect. 5) of the attracting LCS computed
from the reconstructed velocity fields with respect to the REF-LCS is plotted
in Fig.

Snapshots of attracting LCS computed from the three filled and the
reference HFR currents for experiment A

Time series of the spatial average of the pixel-by-pixel absolute relative error of the LCS computed from the three filled HFR velocities with respect to LCS obtained from the reference velocity field.


Further analysis is conducted by performing a regional characterization of the
impact of the gap-filling methods on the LCS looking at the spatial
distribution of the time-averaged values of ARE (Fig.

Finally, we quantify the difference between the LCS computed from the filled
velocities and the REF-LCS by computing the spatial and temporal mean of the
relative error defined in Eqs. (

Maps of the time average of the pixel-by-pixel absolute relative error of the LCS computed from the three filled velocity fields with respect to the LCS obtained from the reference currents. White denotes points at which LCSs have not been identified.

Our results show that even when a large number of pixels is missing the FSLEs still give an accurate picture of oceanic transport properties, and the velocity field filled by the three methods analyzed does not introduce artifacts in the Lagrangian computations.

Another Lagrangian quantity suitable to describe transport processes is
residence time (RT)

Mean absolute relative error (ARE) and mean bias (MB) of the LCS obtained using reconstructed HFR velocities from the three methodologies with respect to the REF-LCS for the four experiments.

As done for the LCS, a first comparison is performed looking at the spatial
distribution of RT obtained from the three different filled HFR datasets. In
general, RT maps from the three methodologies are similar to the REF-RT (maps
of RT obtained from the reference HFR velocity field). As an example,
Fig.

To further analyze the periods when the gap-filling methods introduce more
errors in the RT computations we plot in Fig.

Snapshots of RT computed from the three filled velocity fields and the reference HFR currents for experiments A and C corresponding to 4 April 2013, 18:00 UTC.

Time series of the spatial average of the pixel-by-pixel absolute relative error of the RT computed with the reconstructed velocities with respect to the REF-RT.

Maps of the time average of the pixel-by-pixel absolute relative error of the RT computed from the three filled velocities with respect to RT obtained from the reference velocity field.

In order to reveal regions where the effect of the reconstruction on the RT
is higher we compute the time-averaged relative error for all experiments
(Fig.

To finish this section, we summarize in Table

We have investigated the performance of HFR gap-filling methodologies by studying the reliability of some basic Lagrangian metrics for the assessment of coastal dynamical properties when they are computed using reconstructed velocity fields. A sensitivity test has been carried out through four experiments including the most common scenarios of data gaps observed in HFR systems, i.e., hardware failures, communication interruptions, particular environmental conditions affecting the detection of the signal and sea state.

Mean absolute relative error (ARE) and mean bias (MB) of the RT obtained using reconstructed HFR velocities from the three methodologies with respect to the REF-RT for all the experiments. A total of 156 240 RT values are used in the computations.

In contrast to other comparative studies of different gap-filling
methodologies based on Eulerian differences

The four experiments based on different groupings of missing values
demonstrate that even for spatially severe and persistent gaps, the
Lagrangian diagnoses obtained from FSLE fields and RT are robust
representations of the surface dynamics in coastal basins. Our results show
that even when a large number of pixels is missing the FSLE computations
still give an accurate picture of oceanic transport properties. The
velocity fields filled by the three methods analyzed do not introduce
artifacts in the Lagrangian computations. The robustness of the Lagrangian
diagnoses against errors in the velocity data could explain the low relative
error in the FSLE and LCS computations from the reconstructed velocities

While DINEOF presents the lowest errors in the Eulerian frame, SOM is the method with the lowest errors in the trajectory, LCS and RT computations. The DINEOF and SOM reconstruction methods are based on patterns of the velocity field extracted from statistical concepts. These methods strongly depend on the number of modes and/or neurons used in the reconstruction. The greater the number of modes and/or neurons, the less smoothed the pattern will be. However, even when using a large neural network or number of modes, these methods tend to filter the velocity field, removing some small-scale dynamical features, and in some cases the resulting patterns are a smoothed representation of the real dynamics. DINEOF and SOM also depend on the choice of the modes or patterns. This could explain the large number of points outside the main cloud in the scatterplot shown in Fig. 5 for DINEOF in experiment C. These outliers can propagate errors into the trajectory computations, which would explain the difference between the Eulerian and Lagrangian comparisons of the DINEOF and SOM methods. In the SOM methodology the fact that each neuron is composed of velocity fields at three different times could increase the probability of using a more suitable pattern for the reconstruction, in particular in the cases in which there is a large number of missing points (i.e., experiment C, failure of antenna).

On the other hand, OMA is a geometrical approach that uses a combination of
irrotational and incompressible field configurations and the spatial shape of
the HFR domain to decompose the HFR velocity field. This technique also
depends on the choice of the modes and the number of modes. In general, since
only a finite number of modes is computed, an arbitrariness is introduced in
the selection of the modes; the real velocity is projected into a
subspace of modes that are either tangent to the coastline or to the open
boundary. Using a small number of modes could affect the results of the
reconstruction, obtaining a velocity field far away from the HFR data but with
very simple features. In situations in which the flow patterns are simple and
quasi-permanent, the use of a few modes is able to capture these main
features. In contrast, if a large number of modes is used, the reconstructed
field matches the HFR data, but the data are not sufficiently filtered and
smoothed, introducing some dynamical features or even artifacts, which may
not exist in the real flow. In our case, the coastline in the area of the HFR
coverage is relatively long, which increases the number of modes
tangent to the coast to the detriment of the OMA open-boundary modes and
therefore conditions the resulting inferred velocity field. As explained
in Sect.

In general, regions less affected by missing points are located in the middle of the HFR domain. The large coverage of data can explain this spatial distribution. Moreover, large errors are concentrated in the western part of the HFR domain, likely induced by the high variability of the flow in this region, which increases the effect of velocity errors on the trajectories. The low number of gaps in the central part of the HFR domain in experiment B and the separation of the randomly distributed missing points in experiment D can explain the similarity in the Lagrangian diagnoses obtained in these experiments for the three gap-filling methodologies. Experiment C, simulating the failure of one of the two antennas, is the worst-case gap scenario, yielding the largest errors in all the Lagrangian computations, consistent with the errors found in the Eulerian comparisons.

The time period selected (April) represents the dynamical conditions during
springtime. This is a transition period between winter (when the
persistent Iberian Poleward Current dominates) and summer, when low-energy
and stable conditions are observed

Concerning the temporal distribution of gaps, we have introduced gaps in 50 % of the time period, which is much higher than the 10 %–20 % failures accepted in a well-functioning system. Nevertheless, this study is more focused on the spatial gaps and further analysis should be done to analyze the limit of applicability of each method regarding the temporal horizon.

The results reported in this study cannot necessarily be extended to HFR systems working at different frequencies and resolutions. HFR working at different frequencies captures dynamical features at different scales that could be altered during the reconstruction process. For instance, some SOM and DINEOF modes used in the reconstruction are smoothed representations of the real dynamics and they could remove some small-scale dynamical processes. Further studies using data from different HFR systems and regions characterized by different dynamical conditions should be performed to address this question. However, these results are illustrative of the high performance of the gap-filling methodologies in providing reliable HFR velocities for a Lagrangian assessment of ocean coastal dynamics.

Contrary to SOM and, to a lesser extent, to DINEOF, one of the main advantages of using OMA is that this method does not need long time series of data and therefore it can be used immediately after installing the HFR system. Also, OMA allows for the reconstruction of total velocities in areas of large GDOP error, which also represents an advantage since it enables a larger spatial coverage. DINEOF also has the advantage of not requiring long training datasets and of not having subjective parameters. On the other hand, SOM is able to introduce nonlinear correlations in the computation of the patterns.

These experiments also show that the developed SOM methodology is suitable to properly encode the dynamical patterns present in turbulent flows. This is a very promising result and opens up new possibilities for applying this methodology to the inference of HFR currents in periods when data are not available. Although the performance of the SOM and DINEOF methods is high, it is worth noting that in this study we have used a simple algorithm based only on the HFR dataset, which will be improved in future work by analyzing HFR velocities coupled with other oceanic variables. Moreover, a more rigorous sensitivity analysis is required to determine the temporal coverage needed to obtain reliable reconstructed velocity fields and to optimize the algorithm in terms of errors and computational cost.

A good approximation to obtain an optimal reconstruction of the velocity field could be the combination of these methodologies. For instance, first filling the gaps of the radial velocities through SOM or DINEOF and then reconstructing the total velocities using OMA would allow us to obtain a wider coverage without removing small-scale features.

Emergency Attention and
Meteorology of the Basque Government and AZTI and can be downloaded from

The supplement related to this article is available online at:

IHC conceived the idea of the study with the support of AO, AR and LS; IHC developed the SOM methodology with the support of AO; IHC produced the SOM HFR velocities and conducted all the Lagrangian calculations and comparisons; AO performed the Eulerian comparison; LS and AR developed the method for grouping the gap scenarios, introduced the gaps in the HFR data and provided the OMA HFR velocities; GE provided the DINEOF HFR velocities; IHC wrote the paper with contributions made by LS, AO, AR and GE.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Coastal marine infrastructure in support of monitoring, science, and policy strategies”. It is not associated with a conference.

This study has been supported by the JERICO-NEXT project funded by the European Union's Horizon 2020 research and innovation program under grant agreement no. 654410. Ismael Hernández-Carrasco acknowledges the Juan de la Cierva contract funded by the Spanish government. The work of Anna Rubio was partially supported by the LIFE-LEMA project (LIFE15 ENV/ES/000252), the Directorate of Emergency Attention and Meteorology of the Basque Government and the Department of Environment, Regional Planning, Agriculture and Fisheries of the Basque Government (Marco Program).

Edited by: Stefania Sparnocchia
Reviewed by: two anonymous referees