These authors contributed equally to this work.
In this work, we explore the performance of a statistical forecasting system for marine-litter concentration in the Mediterranean Sea. In particular, we assess the potential skill of a system based on the analogues method. The system uses a historical database of marine-litter concentration simulated by a high-resolution realistic model and is trained to identify past meteorological situations that are similar to the forecasted ones. The marine-litter concentrations of those past analogue days are then used to construct the marine-litter concentration forecast. Due to the scarcity of observations, the forecasting system has been validated against a synthetic reality (i.e., the outputs from a marine-litter-modeling system). Different approaches have been tested to refine the system, and the results show that using integral definitions for the similarity function, based on the history of the meteorological situation, improves the system performance. We also find that the system accuracy depends on the domain of application, being better for larger regions. The method performs well in capturing the spatial patterns but worse in capturing the temporal variability, especially the extreme values. Despite the inherent limitations of using a synthetic reality to validate the system, the results are promising, and the approach has the potential to become a suitable cost-effective forecasting method for marine-litter concentration.
The ubiquity of plastic waste pollution in seas and oceans worldwide raises great concern in society and the scientific community, as it poses a significant environmental and socioeconomic threat (UNEP, 2009). Consequently, the analysis of the impacts of marine-litter pollution on marine life and ecosystems has become a hot topic in marine research in recent years (Maximenko et al., 2019; Van Sebille et al., 2020; Lebreton et al., 2019; Lebreton and Andrady, 2019; Soto-Navarro et al., 2021). Marine-litter particles accumulate in both shallow and deep waters and particularly in enclosed basins such as the Mediterranean Sea (Soto-Navarro et al., 2020; Cózar et al., 2015), where the observed concentrations are in the same range as those measured in the great plastic patches formed in the subtropical gyres of the open oceans (Cózar et al., 2015; Law et al., 2014; Van Sebille et al., 2015). Moreover, risk analyses have shown that marine organisms in the Mediterranean basin can be highly impacted by marine-litter pollution (Compa et al., 2019; Soto-Navarro et al., 2021). The starting point to analyze those impacts and to establish suitable mitigation strategies is to understand the spatial distribution and temporal evolution of the marine-litter particles. Unfortunately, carrying out that analysis solely on the basis of observations is not feasible. The large spatial and temporal heterogeneities of the field campaigns, along with the lack of standardized observational protocols, do not allow a synoptic representation of the marine-litter distribution (see Maximenko et al., 2019, for a thorough analysis of the marine-litter-observation problems and proposed improvements). For these reasons, numerical modeling emerges as a fundamental tool for achieving a synoptic description of marine-litter dispersion patterns and as the basis for forecasting systems that reproduce the spatial variability and time evolution of marine litter.
Marine-litter-forecasting systems are usually based on the combination of two different numerical models (Lebreton et al., 2012; Van Sebille et al., 2015; Maximenko et al., 2012). On the one hand, an ocean circulation forecasting system is implemented to provide ocean currents. On the other hand, a Lagrangian model uses those currents to simulate the advection and diffusion of passive particles in the ocean that mimic the evolution of marine litter. In the Mediterranean, several studies using this methodology have been carried out using current fields from high-resolution regional models covering the whole basin (Liubartseva et al., 2018; Macias et al., 2019; Mansui et al., 2015; Soto-Navarro et al., 2020) or specific regions such as the Adriatic, the Tyrrhenian or the Aegean (Politikos et al., 2017; Fossi et al., 2017; Liubartseva et al., 2016; Palatinus et al., 2019). This modeling approach is considered to be the most accurate choice for marine-litter forecasting (Van Sebille et al., 2020), provided the marine-litter inputs are correctly prescribed (Liubartseva et al., 2018; Soto-Navarro et al., 2020).
The downside of developing a forecasting system based on the direct modeling approach is that it involves a high technical complexity and computational cost. To overcome these limitations, it might be possible to develop a fast and light forecasting system based on statistical methods. One choice would be the so-called statistical downscaling methods (SDMs), which rely on determining statistical relationships between large-scale variables (usually atmospheric patterns) and local variables. They are broadly used in atmospheric modeling to derive the evolution of local variables from the output of large-scale atmospheric models. The advantage of the SDMs is that the mathematical relationship derived between the local and the large-scale variables is not only valid for the present climate but can also be used to estimate the future evolution of the local variables. In summary, the SDMs provide a simplified static methodology to forecast the evolution of local variables without running complex dynamical models. There are numerous downscaling methodologies based on different statistical properties. Among them, the analogues method (Lorenz, 1969) is the most broadly used due to its simplicity and accuracy (Grouillet et al., 2016). This technique assumes that similar (or analogous) atmospheric patterns over a given region, represented by large-scale atmospheric variables or predictors, lead to similar local meteorological outcomes (or predictands) in a particular location. This assumption provides a simple algorithm to downscale the local occurrence of the variable of interest from a given large-scale atmospheric pattern (see Sect. 2.1 for a detailed description). In general, it has been shown that the analogues method performs as well as other more sophisticated downscaling techniques (Zorita and von Storch, 1999), indicating that it is an efficient alternative for many downscaling problems.
Its main advantages are that it is non-parametric (i.e., no assumptions are made about the distribution of the variables used as predictors), non-linear (i.e., it can take into account the non-linearity of the relationships between predictors and predictands), and spatially coherent (i.e., it preserves the spatial covariance structure of the local variables). The analogues method has been satisfactorily applied in the Mediterranean region not only for the downscaling of meteorological or hydrological variables, such as precipitation or river runoff (Grouillet et al., 2016; Wu et al., 2012; Caillouet et al., 2016), but also for the reconstruction of sea surface temperature in the glacial period (Hayes et al., 2005), the assimilation of satellite-derived sea surface height (Lopez-Radcenco et al., 2019) and the projection of complex climatic impact indices such as the fire weather index or the physiological equivalent temperature (Casanueva et al., 2014).
In this study, we explore the feasibility of a marine-litter-concentration forecasting system based on the analogues method. In particular, the surface marine-litter concentration is linked to the atmospheric patterns of a reference period. Then, during the forecasting phase, the forecasted atmospheric situation is compared with those that occurred during the reference period to identify analogue situations. The marine-litter concentration during those analogue situations is considered to be a good approximation of the marine-litter concentration that will occur on the forecasted date. As this approach has never been tested before for marine-litter dispersion, the first step was to run several tests to fine-tune the methodology and to characterize the limits of its validity. Ideally, the tuning and validation of the method would have been done using in situ observations, but the available marine-litter concentration datasets are too scarce for this. Therefore, in this exploratory study, we have used numerically simulated marine-litter concentration fields for the development and validation of the system.
The rest of the paper is organized as follows. In Sect. 2, the statistical method, the datasets used and the different choices tested are introduced. In Sect. 3, the model results are presented and discussed, and finally, some conclusions about the capabilities of this new approach are outlined in Sect. 4.
The implementation of the analogues method requires two sets of data. First, we need a reference dataset of the variables that describe the atmospheric patterns over the region of study, the so-called predictors (
In our case, the predictors used to characterize the atmospheric conditions are the sea-level pressure (SLP) and the wind speed (
The first step to implement the analogues method is to define a cost function, JM, that measures the similarity between different meteorological situations. Then, for the forecast day (
In a second step, we identify the analogue dates as those with the lowest values of JM. We keep those dates in which JM is lower than the 1st percentile of all JM values. Then the marine-litter concentration maps (
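The two steps above can be illustrated with a short Python sketch. The exact form of JM is not reproduced here, so the sketch assumes, for illustration only, that JM is the mean squared difference between the standardized predictor fields (e.g., SLP and wind speed) of each reference day and those of the forecast day, and that the forecast is the unweighted average of the marine-litter maps of the selected analogue days. The function names `cost_function` and `analogue_forecast` are hypothetical.

```python
import numpy as np


def cost_function(predictors, target_idx):
    """Hypothetical JM: mean squared difference between the predictor fields
    of every reference day and those of the forecast day.

    predictors: array (n_days, n_vars, ny, nx), each variable standardized
    beforehand so that SLP and wind speed contribute comparably.
    """
    diff = predictors - predictors[target_idx]
    # Average over variables and grid points; NaNs (e.g., land) are ignored.
    return np.nanmean(diff ** 2, axis=(1, 2, 3))


def analogue_forecast(jm, litter_maps, candidates, pct=1.0):
    """Average the marine-litter maps of the candidate days whose JM is at or
    below the pct-th percentile of JM over the candidates.

    candidates: integer array of day indices allowed as analogues.
    """
    jm_c = jm[candidates]
    threshold = np.percentile(jm_c, pct)
    # Using <= guarantees that at least the best analogue is kept.
    selected = candidates[jm_c <= threshold]
    return litter_maps[selected].mean(axis=0), selected
```

With `pct=1.0` the selection corresponds to the 1st-percentile criterion; the `candidates` array is where days too close to the forecast day can be excluded before the percentile is computed.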
The period considered for the implementation of the analogues method is 2003–2013, which coincides with the period simulated by the marine-litter dispersion model (as described in the following section). The atmospheric dataset required for the reference period is based on the ERA5 reanalysis, available at the Copernicus Climate Change Service (C3S) web platform (
Two variables have been considered for the characterization of atmospheric patterns forcing the marine-litter dispersion, namely the wind speed at 10 m height (
The marine-litter concentration data are obtained from the simulations performed by Soto-Navarro et al. (2020), as they are considered to be among the most realistic for the Mediterranean Sea. Due to the relevance of the quality of the marine-litter concentration data, some details on the modeling system are presented below, and more information can be found in Soto-Navarro et al. (2020).
The system is based on the following two components: a regional high-resolution circulation model (RCM) that reproduces the 3D current velocity field in the Mediterranean (NEMOMED36) and a Lagrangian model that simulates the evolution of floating particles (Ichthyop 3.3).
The hydrodynamical model used to simulate the Mediterranean current field is an implementation of the NEMO model, with a spatial resolution of
The individual-based model (IBM) Ichthyop 3.3 (
The results of the numerical experiments are processed to produce average marine-litter concentration maps over the Mediterranean basin. These maps are computed by dividing the Mediterranean basin into a regular grid of
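The gridding step can be sketched as binning particle positions into a regular longitude–latitude grid and dividing the mass in each cell by the cell area. The grid spacing is not reproduced here, so `lon_edges` and `lat_edges` are free parameters; the assumption that each particle carries a mass in kilograms, the spherical cell-area formula and the function name `concentration_map` are illustrative choices, not necessarily those of the original processing.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def concentration_map(lon, lat, mass_kg, lon_edges, lat_edges):
    """Particle mass per unit area (kg km^-2) on a regular lon-lat grid.

    Returns an array of shape (n_lat_cells, n_lon_cells); particles outside
    the grid edges are discarded by np.histogram2d.
    """
    mass, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges],
                                weights=mass_kg)
    # Area of a spherical cell: R^2 * dlon * (sin(lat_north) - sin(lat_south)).
    dlon = np.deg2rad(np.diff(lon_edges))
    dsin = np.diff(np.sin(np.deg2rad(lat_edges)))
    area_km2 = EARTH_RADIUS_KM ** 2 * np.outer(dsin, dlon)
    return mass / area_km2
```

Daily maps produced this way can be stacked along a time axis to form the reference marine-litter dataset used by the analogues method.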
Average SLP (in Pa) for the year 2013 in the region, computed from the ERA5 dataset. The red line at the Strait of Sicily marks the boundary between the western and eastern basins. The red rectangles limit the sub-basins of the Balearic Islands, the Gulf of Lions and the Aegean Sea, where specific analyses were carried out.
As mentioned before, there are no suitable observational datasets to validate the forecasting system. Homogenized datasets covering a long period of time would be required for this task. Although there have been some efforts to develop new databases (Maximenko et al., 2019), to our knowledge, there are no such datasets in the Mediterranean yet. Thus, in order to have a first assessment of the quality of this methodology, we have to use the marine-litter concentration maps from the database as a virtual reality and compare the forecast (
To define the forecast day, we pick any date from the reference period and forecast the marine litter for that day using all the available data except those from the week before and after the forecast day, to avoid spuriously good results due to autocorrelation. This has been repeated for all the days in the reference period (3 650), and several statistical metrics have been computed to assess the skill of the method.
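The leave-one-out procedure with a ±7 d exclusion window can be sketched as follows. `hindcast` and `candidate_days` are hypothetical names, and `cost` stands for any function returning JM between the forecast day and every reference day (whichever definition of JM is used); the 1st-percentile selection and the unweighted average of the analogue maps are assumptions made for illustration.

```python
import numpy as np


def candidate_days(n_days, target, exclusion=7):
    """Indices usable as analogues for `target`: every day except those
    within +/- `exclusion` days of it (the target itself included)."""
    days = np.arange(n_days)
    return days[np.abs(days - target) > exclusion]


def hindcast(cost, litter_maps, pct=1.0, exclusion=7):
    """Forecast each day of the reference period in turn.

    cost(t) must return a 1-D array with JM between day t and every day.
    """
    n_days = litter_maps.shape[0]
    forecasts = np.empty(litter_maps.shape, dtype=float)
    for t in range(n_days):
        cand = candidate_days(n_days, t, exclusion)
        jm = cost(t)[cand]
        # Keep the candidates at or below the pct-th percentile of JM.
        selected = cand[jm <= np.percentile(jm, pct)]
        forecasts[t] = litter_maps[selected].mean(axis=0)
    return forecasts
```

Excluding the neighbouring days matters because consecutive days are strongly autocorrelated: without the window, the best "analogues" would simply be the days adjacent to the forecast day.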
To test whether the skill of the method depends on the domain of application, we have applied it to the following seven regions: the whole Mediterranean, the eastern and western basins, the Gulf of Lions, the region around the Balearic Islands, the Adriatic Sea, and the Aegean Sea (see Fig. 2). In each case, the analogue days have been defined using only data from the selected region.
Additionally, we have tested whether the skill of the method depends on the timescales of the marine-litter concentration variability. In addition to the full marine-litter concentration dataset, we have used two filtered versions of it, separating the processes above and below 15 d (
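The filter used to separate the two timescales is not specified here, so the following sketch assumes, for illustration, a centred 15 d running mean as the low-pass filter, with the high-frequency component taken as the residual. A running mean has a rather leaky frequency response; a Lanczos or Butterworth filter would give a sharper cut-off. `split_timescales` is a hypothetical name.

```python
import numpy as np


def split_timescales(series, window=15):
    """Split a daily 1-D series into low-frequency (periods longer than about
    `window` days) and high-frequency parts.

    A centred running mean is used as a simple low-pass filter; the series is
    reflected at both ends so that the output keeps the input length.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centred running mean")
    kernel = np.ones(window) / window
    padded = np.pad(series, window // 2, mode="reflect")
    low = np.convolve(padded, kernel, mode="valid")
    high = series - low
    return low, high
```

For gridded fields of shape (time, ny, nx), the low-frequency part can be obtained along the time axis with `np.apply_along_axis(lambda s: split_timescales(s)[0], 0, conc)`; the first and last seven days are affected by the end reflection.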
Finally, for completeness, we propose three additional models for the forecasting. First, we forecast the concentration change over 7 d (
In summary, we have tested four configurations of the model over seven different regions to forecast
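The four forecasting configurations can be summarized in a few lines. Following the description of models 1–4 given with the results, the sketch assumes that model 1 averages the concentration maps of the analogue days, model 2 averages their 7 d concentration changes, model 3 uses the concentration 7 d before the forecast day (persistence), and model 4 adds the model 2 change to the model 3 persistence. Using the same analogue days for models 1 and 2 is an assumption here, and the function names are hypothetical.

```python
import numpy as np

LAG = 7  # forecast horizon in days


def seven_day_change(conc, lag=LAG):
    """Concentration change C(t) - C(t - lag); NaN for the first `lag` days."""
    change = np.full(conc.shape, np.nan)
    change[lag:] = conc[lag:] - conc[:-lag]
    return change


def forecast_models(conc, t, analogue_days, lag=LAG):
    """Return the forecasts of models 1-4 for day t (requires t >= lag).

    Note that model 2 forecasts the 7 d change, not the concentration.
    """
    if t < lag:
        raise ValueError("persistence needs at least `lag` days of history")
    change = seven_day_change(conc, lag)
    # Model 2 can only use analogue days whose lagged change is defined.
    valid = analogue_days[analogue_days >= lag]
    m1 = conc[analogue_days].mean(axis=0)  # analogue concentration
    m2 = change[valid].mean(axis=0)        # analogue 7 d change
    m3 = conc[t - lag]                     # 7 d persistence
    m4 = m3 + m2                           # persistence + analogue change
    return m1, m2, m3, m4
```

Model 3 needs no cost function at all, which is why its results do not depend on how the analogues are defined.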
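Several diagnostics are used to characterize the quality of the forecasts in the different experiments. The first one is the root-median-square error (RMEDSE):

$$\mathrm{RMEDSE} = \sqrt{\operatorname{median}\left[\left(F - R\right)^{2}\right]},$$

where F is the forecast and R is the reference concentration, and the median is taken over the samples considered (time steps or grid points).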
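The diagnostics can be computed as follows. The sketch assumes that RR is the RMEDSE of the forecast divided by the RMEDSE of a baseline that uses a randomly drawn day of the reference dataset as the forecast, consistent with reading RR values close to 1 as little improvement over a random day; this baseline construction and the function names are assumptions. Passing `axis=0` gives temporal diagnostics at each grid point, while `axis=(1, 2)` gives spatial diagnostics at each time step.

```python
import numpy as np


def rmedse(forecast, reference, axis=0):
    """Root-median-square error along `axis`."""
    return np.sqrt(np.median((forecast - reference) ** 2, axis=axis))


def rmedse_ratio(forecast, reference, rng, axis=0):
    """Assumed RR: RMEDSE of the forecast divided by the RMEDSE obtained when
    a randomly drawn day of the reference dataset is used as the forecast."""
    random_days = rng.integers(0, reference.shape[0], size=reference.shape[0])
    baseline = reference[random_days]
    return rmedse(forecast, reference, axis) / rmedse(baseline, reference, axis)


def correlation(a, b, axis=0):
    """Pearson correlation between a and b along `axis`."""
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    return (a * b).sum(axis=axis) / np.sqrt(
        (a ** 2).sum(axis=axis) * (b ** 2).sum(axis=axis))
```

The median makes RMEDSE less sensitive than the usual root-mean-square error to the occasional large errors produced around concentration peaks.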
The temporal correlation and the RR of the marine-litter concentration reconstruction using different cost functions and forecasting models are presented in Figs. 4 and 5. The spatial patterns of the correlation are very consistent among the different combinations. The fields are relatively patchy, with the highest values in the eastern basin, close to the Turkish coasts, in the Gulf of Gabes, in the west of Sardinia and towards the north of the Balearic Islands. Conversely, the minimum correlation values are found in the Alboran Sea, the Algerian basin and the Gulf of Lions. The RR maps are very consistent, showing lower values where and when the correlation is higher and values closer to 1 where and when the correlation is lower.
Concerning the different cost functions used to identify the analogue situations, the performance using only SLP (JM
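An integral cost function can be obtained by averaging the predictors over the preceding days before comparing them, so that JM reflects the recent history of the meteorological situation rather than a single day. The averaging length used in the study is not reproduced here; `n_days` is a free parameter and `integral_predictors` is a hypothetical helper. The sketch assumes gap-free daily fields, since NaNs would propagate through the cumulative sum.

```python
import numpy as np


def integral_predictors(predictors, n_days):
    """Average daily predictor fields over the current and previous
    n_days - 1 days (a trailing window, so only past information is used).

    predictors: array (n_time, ...); the first n_days - 1 entries are NaN.
    """
    csum = np.cumsum(predictors, axis=0, dtype=float)
    out = np.full(predictors.shape, np.nan)
    out[n_days - 1:] = csum[n_days - 1:]
    # Remove the part of the cumulative sum that precedes each window.
    out[n_days:] -= csum[:-n_days]
    out[n_days - 1:] /= n_days
    return out
```

The averaged fields can then be fed to the same cost function as the instantaneous ones; with `n_days=1` the original daily fields are recovered.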
Using the 7 d long persistence to forecast the marine-litter concentration (model 3; see Fig. 4) largely improves the results, with correlations ranging from 0.20 in the Alboran Sea and the Gulf of Lions to 0.82 around Cyprus and an average value of 0.60. The RR reaches values as low as 0.4, with an average value of 0.79. Finally, combining the 7 d long persistence with the analogues-based forecast of the concentration change (model 4) provides the best results, with an average correlation of 0.62 and an average RR of 0.79.
Temporal correlation of the forecasts using different models and cost functions with the reference dataset. Each column corresponds to a different forecasting model: the analogues-based forecast of the concentration (model 1), the analogues-based forecast of the concentration changes over 7 d (model 2), the 7 d long persistence (model 3), and the 7 d long persistence in combination with the forecast of the concentration change over 7 d (model 4). Each row corresponds to the different cost functions used to identify the analogues (see text for details). Note that all panels in the third column are the same, as in model 3 no cost function is used.
Same as Fig. 4 but for the RMEDSE ratio. Values close to 1 (white) indicate that the forecast brings little improvement with respect to using a random day.
For completeness, we also include an example of the concentration time series for the reference and models 1, 3 and 4 at a point where the forecasts perform well (Fig. 6a). Model 1 is well correlated with the reference, showing a good chronology of events despite being unable to capture the concentration peaks; during those periods, the analogues-based forecast largely underestimates the reference values. Models 3 and 4 show almost identical, good results, as persistence is enough to capture most of the variability. The underlying reason is that, at this location, the marine-litter concentration changes relatively slowly, so persistence is a good predictor. For comparison, the time series for a point where the models perform poorly are shown in Fig. 6b. In this case, the analogues-based forecast is unable to capture any variability and basically produces the mean value. The other two models are able to follow the variability, although with lower skill than at the previous point. The reason is that, at this point, the marine-litter concentration varies more rapidly, so persistence is not as good a predictor as it was at the previous location.
Time series of marine-litter concentration (in kg km
A complementary view of the performance of the different forecasting models can be obtained by looking at the marine-litter concentration anomalies (i.e., with respect to the temporal mean) on given dates. In Fig. 7, we show the results for a date when the models show good agreement with the reference (spatial correlation values are 0.70, 0.76 and 0.78 for models 1, 3 and 4, respectively). All three models are able to identify the areas of high and low concentrations. Maximum values north of the Balearic Islands, in the Gulf of Gabes and south of Italy and minimum values in the Adriatic Sea, the Algerian basin and the easternmost part of the Mediterranean are well captured. The analogues-based forecast (model 1) shows smoother patterns with fewer low extremes. This is in good agreement with the time series in Fig. 6 and suggests that this model has difficulties in capturing very high concentration values. The persistence-based models perform very well for this particular date, capturing not only the large-scale patterns but also the local features. Looking at a date when the performance is lower reveals something interesting. Although the spatial correlation of model 1 is not significant (Fig. 8b), the large-scale features seem to be well captured. However, the small-scale features are clearly not captured, which degrades the spatial correlation. This supports the previous finding and reinforces the idea that the analogues-based forecast performs better for the large-scale features. Where or when small-scale features become dominant, the performance of the model drops.
Maps of marine-litter concentration anomalies for a date on which the analogues-based forecast performs well.
The time series of the spatial correlation and spatial RR at each time step are presented in Fig. 9. The results are similar for the three models forecasting the marine-litter concentration (models 1, 3 and 4). The skill of the forecasts shows a high temporal variability, with correlation values ranging from 0.5 to almost 1 and average values of 0.78, 0.81 and 0.84, respectively. For RR, the values range from 0.3 to more than 1, with average values of 0.76, 0.79 and 0.71, respectively. This diagnostic also confirms that the best model is the one that combines persistence with the forecast of the concentration change.
The methodology has also been applied to different domains. That is, the cost function, JM, has been computed in the regions defined in Fig. 2, and the validation has been performed looking only at the marine-litter concentration in those regions. In general, better results are obtained when the analogues-based forecasts are applied to a larger region (see Tables 1 and 2). For instance, the analogues-based forecast (model 1) provides modest results, with correlations of 0.31 and 0.35 and RR of 0.92 and 0.86 for the eastern and western Mediterranean, respectively. At a local scale, the correlation ranges between 0.29 and 0.34, and the RR ranges between 0.80 and 0.94. The analogues-based forecast for the concentration change (model 2) shows lower skills, with correlations below 0.23 and RR above 0.98 in all regions. Both models show better performance when forecasting the low-frequency component than when forecasting the high-frequency one. The correlations of model 1 forecasts in the different regions range between 0.31 and 0.40 for the low-frequency component, while they range between 0.15 and 0.22 for the high-frequency component. Consistent results are found when looking at the RR and model 2 forecasts.
The 7 d long persistence (model 3) is shown to be a good predictor for the full signal and the low-frequency component, but it struggles to capture the high-frequency variability, as expected. Given that the low-frequency part of the signal dominates the marine-litter concentration variability, this model shows good skill for the full signal, with correlations in all regions ranging from 0.55 to 0.64 and RR ranging from 0.75 to 0.82.
The best results for the forecast of the marine-litter concentration are obtained when combining the 7 d long persistence with the analogues-based forecast of the 7 d concentration change (model 4). The averaged temporal correlation is over 0.54 in all regions, reaching a value of 0.65 when applied to the western Mediterranean, while RR is below 0.80 and reaches 0.76 for the whole Mediterranean.
Regionally averaged temporal correlation of the different forecasting models (M1–M4) applied in different regions (see Fig. 2). The models have been applied to forecast the full signal of marine-litter concentration, the high-frequency component (period
Same as Fig. 7 but for a situation where the forecasts perform worse.
Time series of
The spatial diagnostics have also been computed by applying the models to different domains (Table 3). In this case, the analogues-based forecast of concentration (M1) shows average spatial correlations higher than 0.62 when applied to any region, reaching up to 0.94 in the Aegean Sea. Also, the analogues-based forecast of concentration change (M2) shows significant average correlations, ranging between 0.19 and 0.30. The 7 d long persistence (M3) again shows an improvement in the results, although the combination of the 7 d long persistence and the analogues-based forecast of concentration change (M4) is the best model when applied in any region. The average correlation ranges between 0.67 and 0.96, and RR is lower than 0.83 everywhere.
Same as Table 1 but for the RMEDSE ratio.
Temporally averaged regional correlation and RR of the different forecasting models (M1–M4) applied in different regions (see Fig. 2).
It is worth mentioning that we have also tested other options for the cost function, such as using different temporal averages or using correlations as similarity metrics, but no significant differences have been found. We have also tried changing the criterion for defining the analogue days. Instead of identifying as analogues those days with JM lower than the 1st percentile of the whole JM time series, we have used less-restrictive criteria (the 5th or 10th percentile). In both cases, the results worsened.
The analogues-based forecasting technique has been applied to marine-litter concentration for what is, to our knowledge, the first time. It has proven to be very inexpensive and relatively easy to set up, making it an alternative to direct modeling that is worth considering. A key step in the set-up is to select a suitable cost function and the best threshold to identify the analogue meteorological situations. In our case, using integral definitions for the cost function seems to improve the results. In other words, it is better to identify the analogue days based on the history of the meteorological situation. Using a different averaging time for each domain would probably increase the skill of the analogues-based model. However, this fine-tuning is beyond the scope of this paper, since there are no suitable observations to validate it, as discussed below.
The quality of the analogues-based forecasts depends on the region of application. Our results suggest that the larger the region of application, the better, as we get better results for the whole Mediterranean or for the eastern and western basins than for smaller local areas. A hypothesis for explaining this result is that using the atmospheric situation as a predictor may not be suitable for capturing small-scale features (e.g., those related to ocean currents or the interaction with coastlines). Further tests including other predictors, such as ocean currents, could be done to refine the method.
Another important point is that the method struggles to capture the extreme values, as it produces smooth spatiotemporal patterns of marine-litter concentration. Therefore, in locations or regions where short, intense events or small-scale features dominate the variability, the method performs worse. This is also one of the reasons why the temporal skill scores (i.e., temporal correlation and RR) are relatively low (see Sect. 3.1). Conversely, if the target is the spatial structure rather than the temporal variability, the method shows a high level of skill in locating relative maxima and minima (see Sect. 3.2).
We have also shown that persistence is a very good predictor almost everywhere. This is because the marine-litter concentration changes relatively slowly (i.e., the system has a memory of several days), at least at the spatial scales resolved by the reference dataset. This means that, if reliable information were available (e.g., from a monitoring program), it could be used as a first guess of the marine-litter concentration several days later. Complementarily, the analogues-based method has also been applied to forecasting the concentration change. In this case, the results were significantly poorer in terms of capturing both the temporal and spatial variability. However, the analogues-based forecast of the concentration change can improve the persistence-based forecasts, as shown by model 4.
Regarding the reliability of the analogues-based forecasts that could be generated from this reference dataset, their quality would directly depend on the accuracy of the reference dataset. In our case, this dataset comes from the outputs of a realistic model simulation (Soto-Navarro et al., 2020). However, the model may have some shortcomings, such as its spatial resolution, beaching parameterization or the realism of the marine-litter sources. Consequently, the forecasts would be, at best, as good as the model outputs. Therefore, it would have been better to validate the different forecasting models against actual observations. Unfortunately, the lack of observations with suitable spatial and temporal coverage prevents us from doing this. In the future, it would be worth setting up a monitoring program with sufficient spatial and temporal resolution to generate a sufficiently comprehensive reference dataset. This dataset could be used to train the analogues-based forecasting system and to validate other existing systems.
In any case, the validation of the methodology itself can be considered robust. For that purpose, the reference dataset does not need to be an accurate representation of the actual marine-litter concentration. Only the statistics of the spatiotemporal evolution of the marine-litter concentration have to be reproduced, and in that sense, the model integrates the effects of a realistic atmospheric forcing and a realistic ocean current field. The statistics of the simulated marine-litter concentration field are therefore expected to be realistic enough. This should nevertheless be confirmed with a comprehensive observational dataset, at least in certain regions.
In conclusion, the analogues-based model presented here has potential to become a suitable, cost-effective forecasting method for marine-litter concentration. It could be easily implemented in any region of the world where a realistic reference dataset is available. In those regions where the large-scale marine-litter concentration patterns dominate the variability, the method will probably work better than in regions where the variability is dominated by small-scale structures.
The code and data required to implement the model described in the paper and to reproduce the results can be publicly accessed at Jordà and Soto-Navarro (2022). Additionally, the atmospheric fields can be downloaded from the Copernicus portal (
Both authors (GJ and JSN) contributed equally to the design of the study, the coding of the modeling system, the performance of the simulations, the analysis of the results, and the preparation and revision of the paper.
The contact author has declared that neither of the authors has any competing interests.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We acknowledge the project Origen y evolución de la basura marina en la costa andaluza (OBAMARAN; reference no. Proy_Exel_00344), funded by the Junta de Andalucía. The authors also thank Mr. Paul Dupin for his help in the first steps of the model development.
This research has been supported by the European Regional Development Fund, Interreg (grant no. 4MED17_3.2_M123_027). We also received funding from the EU-Interreg MPAs Plastic Busters project: preserving biodiversity from plastics in Mediterranean Marine Protected Areas, co-financed by the European Regional Development Fund (grant agreement no. 4MED17_3.2_M123_027). The publication fee was supported by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).
This paper was edited by Erik van Sebille and reviewed by Andres Cozar and one anonymous referee.