Surface circulation properties in the Eastern Mediterranean emphasized using machine learning methods

The Eastern Mediterranean surface circulation is highly energetic and composed of structures that interact stochastically. However, some of its main features are still debated, and the behavior of some fine-scale dynamics and their role in shaping the general circulation remain unknown. In this paper, we use an unsupervised neural network clustering method to analyze the long-term variability of the different mesoscale structures. We decompose 26 years of altimetric data into five clusters reflecting different circulation patterns of weak and strong flows with either strain- or vortex-dominated velocities. The vortex-dominated cluster is more persistent in the western part of the basin, which is more active than the eastern part because the strong flow along the coast interacts with the extended bathymetry and engenders continuous instabilities. The cluster that reflects a weak flow dominates the middle of the basin, including the Mid-Mediterranean Jet (MMJ) pathway. However, the temporal analysis shows a frequent and intermittent occurrence of a strong flow in the middle of the basin, which could explain the previous contradictory assessments of the MMJ's existence based on in-situ observations. Moreover, we show that the Levantine Sea is becoming increasingly energetic, as the activity of the main mesoscale features shows a positive trend.

high-resolution numerical models with radar observations (Ren et al., 2020), or radar with ADCP datasets, such as in the West Florida Shelf (Liu et al., 2007) and the Long Island Sound tidal estuary (Mau et al., 2007). In the Mediterranean, SOM was applied in the Adriatic Sea using HF radar measurements (Mihanović et al., 2011), and in the Sicily Channel using 46 years of high-resolution model output (Jouini et al., 2016). This approach made it possible to decompose the surface circulation in the Sicily Channel into modes reflecting the variability of the circulation in space and time at seasonal and inter-annual scales. SOM was also able to provide a prediction of the surface current in a shallow coastal area (Kalinić et al., 2017) and to identify phytoplankton functional types in the Mediterranean Sea using a bioregionalization approach (El Hourany et al., 2019; Basterretxea et al., 2018).
In the light of the major gaps in characterizing the surface currents of the Levantine Sea and previous contradictory assess-
The daily geostrophic surface velocity fields between 1993 and 2018, from the Herodotus abyssal plain to the easternmost part of the Levantine Sea, form the input layer of the SOM. In addition to the zonal and meridional components of the geostrophic velocities, the Okubo-Weiss parameter (OW) is included in the input layer (see fig. 1B). OW measures the relative importance of deformation and rotation at a given point. Positive OW values indicate strain-dominated regions, while negative OW values indicate vortex-dominated regions. Accordingly, OW is a physical criterion widely used in eddy detection methods.
$$OW = s_n^2 + s_s^2 - w^2, \qquad (1)$$
where $s_n$ and $s_s$ are the normal and the shear components of strain and $w$ is the relative vorticity of the flow.
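As an illustration of Eq. (1), the following minimal sketch evaluates OW on a regular grid with centred finite differences. It assumes the standard definitions $s_n = \partial U/\partial x - \partial V/\partial y$, $s_s = \partial V/\partial x + \partial U/\partial y$ and $w = \partial V/\partial x - \partial U/\partial y$, which are not reproduced in the text above. The function name, the nested-list layout and the uniform spacings `dx` and `dy` are illustrative choices, not the processing chain used in this study.

```python
def okubo_weiss(u, v, dx, dy):
    """Return OW = s_n^2 + s_s^2 - w^2 at the interior points of a regular grid.

    u, v : 2-D nested lists indexed [j][i] (j along y, i along x).
    dx, dy : uniform grid spacings (assumed constant, e.g. in metres).
    Boundary points are left as None because centred differences need neighbours.
    """
    ny, nx = len(u), len(u[0])
    ow = [[None] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            du_dx = (u[j][i + 1] - u[j][i - 1]) / (2 * dx)
            du_dy = (u[j + 1][i] - u[j - 1][i]) / (2 * dy)
            dv_dx = (v[j][i + 1] - v[j][i - 1]) / (2 * dx)
            dv_dy = (v[j + 1][i] - v[j - 1][i]) / (2 * dy)
            s_n = du_dx - dv_dy   # normal strain (assumed standard definition)
            s_s = dv_dx + du_dy   # shear strain (assumed standard definition)
            w = dv_dx - du_dy     # relative vorticity
            ow[j][i] = s_n ** 2 + s_s ** 2 - w ** 2
    return ow


# Two idealised 3x3 test flows with unit spacing (x = i, y = j).
rotation_u = [[-float(j) for i in range(3)] for j in range(3)]  # u = -y
rotation_v = [[float(i) for i in range(3)] for j in range(3)]   # v = x
strain_u = [[float(i) for i in range(3)] for j in range(3)]     # u = x
strain_v = [[-float(j) for i in range(3)] for j in range(3)]    # v = -y
```

With these idealised fields, solid-body rotation gives a negative OW at the central point (vortex-dominated) and pure strain gives a positive one (strain-dominated), matching the sign convention stated above.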

The Self-Organizing Map (SOM)
SOM is an unsupervised neural network method used for data visualization. It projects high-dimensional data onto a lower-dimensional space while preserving topological similarity. With this method, multidimensional data are clustered into neurons that are automatically arranged in an ordered grid, where similar neurons are adjacent and less similar neurons lie far from each other. This arrangement gives insight into the topographic relationships of the initial dataset (Kohonen, 2013).
The SOM is structured in two layers: the input layer (in our case, a 3-D input layer composed of the zonal and meridional velocity components and the Okubo-Weiss parameter) and the resulting neuron grid. Each neuron, representing a cluster of data with common characteristics, is associated with a referent vector obtained from a learning data set. Each vector of the input layer is attributed to the neuron whose referent vector is closest in Euclidean distance. This referent vector is called the best matching unit (BMU), and its associated neuron is the so-called winning neuron. The referent vectors and the topological order of the SOM map are determined by minimizing a cost function in which $c \in \mathrm{SOM}$ represents the neuron index in the SOM and $X(z_i)$ represents the allocation function that assigns each element $z_i$ of the input $D$ to the corresponding referent vector $w_{X(z_i)}$. $\delta(c, X(z_i))$ represents the discrete distance on the SOM between a neuron $c$ and the neuron allocated to observation $z_i$. $K_T$ is a kernel function parameterized by $T$ that weights the discrete distance on the map and decreases during the minimization process. During the minimization of the cost function, the topological order is preserved, so that similar neurons are adjacent and less similar neurons are situated far from each other.
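The allocation and cost described above can be sketched as follows. The cost is written here in the form often used in the SOM literature, $J_T = \sum_{z_i \in D} \sum_{c \in \mathrm{SOM}} K_T(\delta(c, X(z_i)))\,\lVert z_i - w_c \rVert^2$, which is an assumption since the expression itself is not reproduced in the text. The exponential kernel, the Manhattan distance on a rectangular map and all function names are likewise illustrative choices rather than the configuration used in this study.

```python
import math


def bmu_index(z, referents):
    """Index of the referent vector closest to z (squared Euclidean distance)."""
    return min(
        range(len(referents)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(z, referents[c])),
    )


def grid_distance(c1, c2, n_cols):
    """Discrete (Manhattan) distance between two neurons of a rectangular map.

    Neurons are numbered row by row; n_cols is the number of map columns.
    """
    r1, k1 = divmod(c1, n_cols)
    r2, k2 = divmod(c2, n_cols)
    return abs(r1 - r2) + abs(k1 - k2)


def som_cost(data, referents, n_cols, T):
    """Kernel-weighted quantization cost with the assumed kernel exp(-delta / T)."""
    total = 0.0
    for z in data:
        winner = bmu_index(z, referents)  # allocation function X(z)
        for c, w_c in enumerate(referents):
            weight = math.exp(-grid_distance(c, winner, n_cols) / T)
            total += weight * sum((a - b) ** 2 for a, b in zip(z, w_c))
    return total
```

As $T$ decreases, the kernel narrows and the cost approaches the plain quantization error of the winning neurons, which is why the neighbourhood influence fades during training.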
To give the input parameters equal weight, the variables were normalized by their variances. Several tests were conducted to determine the SOM map size that best represents the data. These tests assessed the capacity of each map to reproduce the initial data space of the input with the smallest possible error. On that basis, we opted for a large SOM map of 1400 neurons.
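One common reading of this normalization is to rescale each input variable to zero mean and unit variance, so that U, V and OW contribute comparably to the Euclidean distance. The sketch below follows that reading, which is an assumption: the text does not state whether the mean was removed or whether the division was by the variance or by its square root. The function name is hypothetical.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale one variable to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mean) / std for x in values]
```

Each of the three input variables would be passed through such a function separately before training the map.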

HAC method
The SOM classified the velocity field into neurons that represent the different circulation patterns of the targeted grid, based on U, V, and OW. To simplify the representation of the physical processes captured by each neuron, we applied the HAC to group these neurons into a reduced number of clusters. HAC is a cluster analysis that seeks to build a bottom-up hierarchy of clusters. From the initial partition containing the neuron groups of the SOM map, two neurons of the same neighborhood were clustered at each iteration. The criterion used was Ward's minimum variance method, which provides a partition that minimizes the within-cluster inertia
(Menna et al., 2012; Mauri et al., 2019). Similar to MME, ShE is an area where previously formed eddies tend to accumulate and/or merge (Hamad et al., 2005). Another important mesoscale feature in the eastern part is the Cyprus Eddy (CE) (see fig. 1). It is an intense dynamic feature occurring in the open sea. Unlike in the MME and ShE, eddies are formed in this area and do not accumulate (Zodiatis et al., 2005). Note that other mesoscale structures exist in the eastern Levantine but are less frequently observed. Among these is the "Lattakia Eddy" (LE), located between Cyprus and Syria. LE is a cyclonic eddy generated by the interaction of the northward current along the Lebanese and Syrian coasts with the Mid-Mediterranean Jet (Zodiatis et al., 2003), and/or between ShE and the coastline (Hamad et al., 2005), and/or by the topography (Gerin et al., 2009).

Definition of main mesoscale features regions
Eddies usually appear as elevations (anticyclones) or depressions (cyclones) of the sea surface. Accordingly, and after decomposing the Levantine surface circulation (e.g. 2G), we delimited the main mesoscale eddy areas by using an approach similar to that

Results and Discussion
In this section, we present the results of decomposing the surface circulation of the Levantine basin into a daily time-series of five clusters obtained by the HAC and SOM methods.
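The HAC step that reduces the SOM neurons to five clusters can be sketched as a greedy agglomeration under Ward's criterion: at each iteration, the pair of groups whose merge increases the within-cluster inertia the least is fused. This is a minimal illustration under simplifying assumptions. It ignores the neighbourhood constraint on the SOM map mentioned in the methods, uses a naive search that would be slow for 1400 neurons, and its function name and data layout are hypothetical.

```python
def ward_hac(points, n_clusters):
    """Agglomerate points (lists of floats) into n_clusters with Ward's criterion.

    Returns one integer label per input point.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na = len(clusters[a]["members"])
                nb = len(clusters[b]["members"])
                d2 = sum(
                    (x - y) ** 2
                    for x, y in zip(clusters[a]["centroid"], clusters[b]["centroid"])
                )
                # Increase in within-cluster inertia if a and b were merged.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            "centroid": [
                (na * x + nb * y) / (na + nb)
                for x, y in zip(ca["centroid"], cb["centroid"])
            ],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    labels = [0] * len(points)
    for label, cluster in enumerate(clusters):
        for i in cluster["members"]:
            labels[i] = label
    return labels
```

In this sketch, `points` would hold the referent vectors of the SOM neurons and `n_clusters` would be 5. Each daily pixel then inherits the cluster of its winning neuron.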

Temporal variation
The frequency variation of the five clusters in each of the selected boxes (Bei, Shik, MME, AMC, Nile, and CE) is shown in fig. 4. This frequency is the percentage of pixels assigned to each of the five clusters in a designated box. Except for C4 in the AMC, all the clusters occurred permanently, in proportions that varied strongly with time in each box. Moreover, cluster frequencies varied significantly from one box to another. Although the strain-dominated clusters (C1 and C2) were not frequent everywhere, C1 and C2 were frequently observed in MME and AMC, respectively, throughout the period. This high frequency occurred at the expense of other clusters, especially the weak-flow cluster C3, which was less observed in these two boxes. Regarding the vortex-dominated clusters (C4 and C5), C5 was the more frequent. C4 was quasi-absent in AMC and scarce in the other boxes. Among the boxes, C5 was most frequent in the MME. Overall, the occurrences of all clusters fluctuated strongly with time. To
C3 was the main cluster in the CE, Nile, Bei, and Shik boxes. Moreover, between 1993 and 2000, C3 dominated CE almost exclusively. On the other hand, C3 dominance was rare or quasi-nonexistent in the MME and AMC, where C1 and C2, respectively, were the most frequent clusters. While C2 dominance was not observed in MME, an increasing periodic C1 dominance was observed in AMC starting from 2000. C5 dominated all the boxes frequently but intermittently. This dominance was rarer in the CE, Nile, and AMC than in the MME, Bei, and Shik. Indeed, in these last three boxes, C5
These results show that MME and AMC are two zones with a distinct flow regime, represented by the intense-current clusters C1 and C2. The other boxes are zones of relatively weaker currents.
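The box-wise frequency used in this section, that is, the percentage of pixels of a box assigned to each cluster on a given day, can be computed as in the following minimal sketch. The function names and the flat list of per-pixel labels are hypothetical, and picking the dominant cluster as the one with the highest percentage is an assumed reading of "dominance".

```python
from collections import Counter


def cluster_frequencies(labels, n_clusters=5):
    """Percentage of pixels assigned to each cluster (1..n_clusters) in one box."""
    if not labels:
        raise ValueError("the box contains no pixels")
    counts = Counter(labels)
    total = len(labels)
    return {c: 100.0 * counts.get(c, 0) / total for c in range(1, n_clusters + 1)}


def dominant_cluster(labels, n_clusters=5):
    """Cluster with the highest pixel percentage in the box on that day."""
    freqs = cluster_frequencies(labels, n_clusters)
    return max(freqs, key=freqs.get)
```

Applying `cluster_frequencies` to every day of the record yields, for each box, five daily time series of the kind summarized in fig. 4.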
In all the boxes, there were sporadic events of intense eddy activity, shown by the intermittent periods of C5 dominance. The daily mean kinetic energy of the mean flow per unit mass (MKE) computed in each box (see fig. A1 in appendix) shows that the lowest MKE values were observed in the boxes where C3 dominated the most (CE, Nile, Shik, and Bei
and 1997, C5 was rare in Bei before being more frequently observed as a dominant cluster. Figure 6 shows the seasonal variation of the C1, C3, and C5 average frequencies in all the boxes between 1993 and 2018. The C5 frequency generally increased with time. The strongest positive C5 trend was in MME, where C5 increased by 10 percentage points in 26 years: across all seasons, the average C5 frequency in MME rose from 25 % in 1993 to 35 % in 2018. There was a similar increase in all

Spatial analysis
The spatial variation of the cluster frequencies is shown in fig. 7. The intensity of the along-slope coastal flow varied spatially. Indeed, the high kinetic energy clusters, C1 and C2, frequently occurred off the Libyo-Egyptian coasts and between Turkey and Cyprus, respectively, while the weak current cluster, C3, dominated the easternmost part of the basin. The C5 predominance is mostly situated in the western part, revealing that the mesoscale activity is more intense in that area. No clear jet was observed in the middle of the basin, including the MMJ pathway, where C1 dominance was interrupted by high C3 occupancy, more specifically between 30 and 32° E (see C1 in fig. 7).

Mid-Mediterranean jet (MMJ)
To further investigate the time evolution of the potentially existing MMJ, we present in fig. 8A a Hovmoeller diagram (along longitude 31.5625° E) that shows the temporal variation of the clusters across its potential path. The longitude selection

Vortex-dominated cluster analysis
Both C4 and C5 are vortex-dominated clusters. However, the previous results showed that C4 is a peripheral, scarcely observed cluster that dominates only very few pixels close to the coast. C5 was the main cluster reflecting the presence of eddies.
Here we present a more detailed analysis of the C5 evolution, which reveals the eddy activity in the Levantine Sea. Figure 9A shows the spatial distribution of the pixels where the C5 occurrence exceeded 40 % (C5P). The largest number of persistent C5P was at the borders of the Herodotus plain. Another group of C5P occurred in areas of extended continental shelf, more precisely the shelf offshore Egypt and between Cyprus and Turkey in the northern part of the basin. A small number of C5P followed the 1000 m isobath in the western part of the basin. On the other hand, C5P was absent in the eastern part of the Levantine. In panel B, we present the variation of the number of C5P with distance to the closest main bathymetric structures (isolines of 1000, 2000, and 3000 m). More than 80 % of C5P were located less than 60 km from these main features. In the zones of extended continental shelf, such as offshore the Libyo-Egyptian coasts, there is a strong flow dominated by C2 and C1. Previous studies have shown that a current becomes unstable when wider than the bathymetry, which favors eddy formation (Wolfe and Cenedese, 2006). The absence of C5P off the Lebanese coasts is explained by the narrow, almost absent continental shelf. Indeed, the weak current dominated by C3 near the coastline is not strong enough to permanently create instabilities. The group of C5P observed in MME is due to the impact of the Herodotus abyssal plain. The vertically extended eddies pinch off from the coast and propagate to the east before being trapped, so that eddies accumulate (Alhammoud et al., 2005; Elsharkawy et al., 2017). These results show that the main bathymetric features could influence eddy creation and persistence in the Levantine basin.
Figure 10. Panel A shows the average velocity field obtained before (blue) and after (red) assimilating the drifters' trajectories, represented by the grey lines, circulating in CE from the start of March until late July. Panel B represents the percentage of pixels assigned to clusters 1 to 5 before (dark bars) and after assimilation (colored bars).
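The selection of persistent C5 pixels (C5P) and the share of them lying close to a bathymetric isoline, as used in the vortex-dominated cluster analysis, can be sketched as follows. The per-pixel distances to the nearest isobath are assumed to be precomputed and are a hypothetical input here, as are the function names and the list-of-days data layout.

```python
def persistent_pixels(daily_labels, cluster=5, threshold=40.0):
    """Flag pixels where `cluster` occurred on more than `threshold` % of days.

    daily_labels : list of days, each a list with one cluster label per pixel.
    Returns one boolean per pixel.
    """
    n_days = len(daily_labels)
    n_pixels = len(daily_labels[0])
    mask = []
    for p in range(n_pixels):
        hits = sum(1 for day in daily_labels if day[p] == cluster)
        mask.append(100.0 * hits / n_days > threshold)
    return mask


def share_within(distances_km, mask, max_km=60.0):
    """Percentage of flagged pixels closer than max_km to the nearest isobath."""
    selected = [d for d, flagged in zip(distances_km, mask) if flagged]
    if not selected:
        return 0.0
    return 100.0 * sum(1 for d in selected if d < max_km) / len(selected)
```

The strict inequality mirrors the wording "exceeded 40 %", so a pixel at exactly 40 % is not counted as C5P in this sketch.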
Llull et al., 2021), will improve the decomposition of the surface circulation by our method, without changing the main conclusions.

Conclusions
In this study, we analyzed the surface circulation of the Levantine basin using the SOM + HAC method, which decomposes a 26-year data set of surface geostrophic velocities into five clusters representing the different types of surface flow.
By tracking the variability of the clusters, we showed that the surface circulation is complex and divided into several energetic boxes. We highlighted the increasing mesoscale activity in the basin, where these eddy-rich boxes show a positive trend with time. The cluster of weak flow is being progressively replaced by those of higher kinetic energy and vorticity, and thus the Levantine Sea is becoming increasingly energetic. We were able to show the sporadic occurrence of the MMJ, which could explain the contradictory statements about the MMJ's existence. We highlighted the crucial role of bathymetry and of the coastal flow intensity in increasing instabilities and eddy formation in the Levantine Sea. Accordingly, the most persistent eddies occurred in areas characterized by a strong coastal flow and an extended continental shelf, or around the Herodotus abyssal plain, which explains the uneven distribution of eddy frequency and persistence in the Levantine. The SOM + HAC approach is a promising method that will undoubtedly benefit from the more accurate and higher-resolution data of future altimetric missions. It could also be combined with other parameters, such as sea surface temperature and chlorophyll, to study the interactions between the physical and biogeochemical water properties. Further work should expand the studied area to the entire Mediterranean to investigate whether these increasing trends are only observed in the eastern Levantine or extend to a larger scale.
Figure A2. The seasonal variation of the C2 and C4 average in each box and their resulting linear regression.