Using machine learning and beach cleanup data to explain litter quantities along the Dutch North Sea coast

Kaandorp, Mikael L. A.; Ypma, Stefanie L.; Boonstra, Marijke; Dijkstra, Henk A.; van Sebille, Erik

doi:https://doi.org/10.5194/os-18-269-2022

Articles | Volume 18, issue 1


Research article | Highlight paper | 03 Mar 2022


Mikael L. A. Kaandorp, Stefanie L. Ypma, Marijke Boonstra, Henk A. Dijkstra, and Erik van Sebille

Abstract

Coastlines potentially harbor a large part of litter entering the oceans, such as plastic waste. The relative importance of the physical processes that influence the beaching of litter is still relatively unknown. Here, we investigate the beaching of litter by analyzing a data set of litter gathered along the Dutch North Sea coast during extensive beach cleanup efforts between the years 2014 and 2019. This data set is unique in the sense that data are gathered consistently over various years by many volunteers (a total of 14 000) on beaches that are quite similar in substrate (sandy). This makes the data set valuable to identify which environmental variables play an important role in the beaching process and to explore the variability of beach litter concentrations. We investigate this by fitting a random forest machine learning regression model to the observed litter concentrations. We find that tides play an especially important role, where an increasing tidal variability and tidal height leads to less litter found on beaches. Relatively straight and exposed coastlines appear to accumulate more litter. The regression model indicates that transport of litter through the marine environment is also important in explaining beach litter variability. By understanding which processes cause the accumulation of litter on the coast, recommendations can be given for more effective removal of litter from the marine environment, such as organizing beach cleanups during low tides at exposed coastlines. We estimate that 16 500–31 200 kg (95 % confidence interval) of litter is located along the 365 km of Dutch North Sea coastline.


Received: 27 Aug 2021 – Discussion started: 14 Sep 2021 – Revised: 23 Dec 2021 – Accepted: 25 Jan 2022 – Published: 03 Mar 2022

1 Introduction

The accelerated release of mismanaged plastic waste into the global ocean gives rise to the need for effective cleanup strategies (Ogunola et al., 2018). In order to minimize the negative impact of plastic pollution on the environment, cleanup strategies need to be optimized to target the most impacted areas while limiting the economic cost (Haarr et al., 2019; Newman et al., 2015). Recent studies indicate that plastics remain trapped in coastal zones (Koelmans et al., 2017; Lebreton et al., 2019; Kaandorp et al., 2021a; Morales-Caselles et al., 2021), with at least 77 % of buoyant marine plastic debris beaching or floating in coastal waters (Onink et al., 2021). Therefore, beach cleanups have the potential to be a highly effective mitigation measure.

In addition, the plastic concentrations found on beaches are generally higher compared to other environmental compartments, such as the surface water or the seafloor (Morales-Caselles et al., 2021), making beaches favorable locations for cleanup activities. Furthermore, by limiting the resuspension of plastic items by removal, the overall plastic concentration on the beach decreases over time and the formation of microplastic is reduced (Andrady, 2011; Haarr et al., 2020; Lebreton et al., 2019). At the same time, as cleanup activities generally involve a large number of volunteers, awareness of the plastic pollution problem increases, leading to a reduction of plastic waste in the local environment (Kordella et al., 2013).

Although the benefits of beach cleanups are well known, the location and timing of these activities are often not optimized. Haarr et al. (2019) identified accumulation zones of beached plastic using the shoreline curvature and gradient in Lofoten, Norway, and showed that high-accumulation areas are often missed by cleanup actions. Other coastal properties like substrate and backshore type have been found to influence debris quantities as well (Hardesty et al., 2017; Brennan et al., 2018), with more litter accumulating in areas with increased backshore vegetation. Additionally, physical processes play an important role in the beaching of plastics and should be considered when selecting effective sites for beach cleanups.

However, the relative importance of the various physical processes involved and how these can be parameterized so far remains unknown (van Sebille et al., 2020; Pawlowicz, 2020). Studies have addressed the importance of the landward wind direction for debris accumulation rates (Eriksson et al., 2013; Critchell et al., 2015; Hengstmann et al., 2017; Moy et al., 2018), the landward ocean circulation direction (Thepwilai et al., 2021), and the role of tides (Eriksson et al., 2013; Pawlowicz, 2020) and waves (Williams and Tudor, 2001). The spatial and temporal variability of the sources, e.g., rivers, population density, and the fishing industry, also play an important role for the accumulation of plastic on beaches (Rech et al., 2014; Critchell and Lambrechts, 2016).

In addition to the study by Haarr et al. (2019), there are several other studies that assess the prediction or monitoring of beached plastic items using machine learning methods. These algorithms can be useful in discovering complex relations between environmental variables and litter concentrations. In Granado et al. (2019), a marine litter forecasting model was made using Bayesian networks, involving various variables like wave height and period, wind velocity and direction, precipitation, and river flow. Neural networks have been used to quantify litter categories in Balas et al. (2004) and Schulz and Matthies (2014), and deep learning methods have been used to automatically identify debris on beaches (Song et al., 2021).

In order to make data-driven methods work, relatively large and consistent data sets are necessary, but most observational data sets are sparse. Beach cleanups and citizen science initiatives can potentially provide valuable information for scientific studies on marine pollution (Zettler et al., 2017), as these data are based on a considerable amount of person hours. Examples of citizen science data used in marine pollution research can be seen in the work of Hidalgo-Ruz and Thiel (2013), where schoolchildren in Chile documented the distribution and abundance of plastic debris on beaches, and Ribic et al. (2010, 2012), where amounts of marine debris were measured by volunteer teams on beaches in the Pacific and Atlantic.

Here, we will build upon past data-driven studies by using an unprecedented data set obtained from beach cleanup efforts organized along the Dutch North Sea coast between 2014 and 2019. The number of participants (about 14 000), person hours (about 84 000 h), the length of beach sampled (about 1400 km), and the fact that all beaches sampled were similar in substrate (sandy) make this data set unique and very appropriate to apply data-driven methods. Furthermore, a large set of explanatory variables will be created based on environmental conditions and modeled transport of marine litter. We will fit a random forest regression model to the observed litter concentrations as a function of these explanatory variables and investigate which ones are important to explain the variability in beach litter. This allows us to investigate which variables are important predictors for the amount of litter present on beaches to get a better understanding of marine pollution and to increase the efficacy of beach cleanups by creating a predictive model that could aid future cleanup efforts.

2 Data description and region of interest

Since 2013 the North Sea Foundation, a Dutch environmental non-governmental organization (NGO) advocating the protection and sustainable use of the North Sea marine ecosystem, has organized the national Boskalis Beach Cleanup Tour. During this tour, every year in August the entire Dutch North Sea coast is cleaned up by volunteers. It is the largest cleanup campaign in the Netherlands. The tour is divided into stages along the North Sea coast. Each stage is between 8 and 10 km long. The midway points of all stages are plotted in Fig. 1 using the black crosses.

During the first three editions (2013–2015), the tour was organized over a period of a month, with one stage per day. From 2016 on, the tour took 15 d, with simultaneous cleaning of two stages per day. One cleanup team started on the Wadden Island Schiermonnikoog (the easternmost cross in Fig. 1), the other team started in the southwestern province Zeeland in Cadzand (the westernmost cross in Fig. 1). On day 15, both teams met halfway in Zandvoort (≈4.5° E). The cleanups started around 10:00 LT (local time) and ended around 16:00 LT, with total cleanup times taking between 4 and 6 h for each stage. The volunteers were guided by cleanup teams of the North Sea Foundation, which consist of professional employees of the North Sea Foundation and trained volunteers.

At each stage, all litter present on the beach was collected in plastic bags and weighed. The weighing of the collected litter was done using analogue and/or digital scales (during the stage or at the end of the stage) and carried out by one of the members of the cleanup team. Most of the litter found was plastic (estimated at 80 %–90 % in terms of numbers). The years over which weights of collected litter are available for each stage are plotted in Fig. 1 using the colored squares. For most stages, weights are available for all years; in some cases, stages were added in later years. The observed amount of litter per location per year is shown in Figs. A1 and A2.

To get an impression of the mean environmental conditions along the Dutch North Sea coast, the mean surface currents are plotted in Fig. 1 using the arrows (Global Monitoring and Forecasting Center, 2021), and the mean wind speed and direction are plotted using the wind rose (Hersbach et al., 2020), all averaged over August between 2014 and 2019. The wind predominantly comes from the southwest. Generally, the currents move from southwest to northeast along the North Sea coast. The effect of freshwater influxes from rivers is visible around the southern province of Zeeland (<52° N). The effect of this freshwater influx can be observed over considerable distances along the Dutch coast, for example in the form of freshwater lenses traveling downstream (De Ruijter et al., 1997; Rijnsburger et al., 2021). Ricker and Stanev (2020) found that locations with high-salinity gradients due to a freshwater influx can act as a barrier for neutrally buoyant particles, possibly causing accumulation of litter along these fronts. Finally, tidal currents move along the coast to the northeast during flood tide and southwest during ebb tide (not plotted in Fig. 1).

Figure 1. Locations of the midway points for each cleanup tour stage (black crosses) and dates showing for which year data are available (the colored squares). For stages with multiple data points per year, different stretches of beach were cleaned (e.g., once on the northern side and once on the southern side). Also plotted are the mean surface currents (arrows) (Global Monitoring and Forecasting Center, 2021) and the wind rose (Hersbach et al., 2020) calculated for August for the years 2014–2019.

3 Methodology

3.1 Data preprocessing

Different sources of marine litter exist, such as mismanagement of waste near the coast, input from rivers, or fishing gear which is lost at sea. The litter is then transported through the environment and can eventually end up on beaches, influenced by various factors such as ocean currents and winds. However, how all of these variables combined influence the beaching of litter is unknown. A regression model is used here to relate various environmental variables to the observed litter concentrations. We will assess whether it is possible to use the regression model to make predictions about the amount of beached litter and, if so, which environmental variables are important predictors to take into account.

For the environmental variables, three classes of data are used. First of all, hydrodynamic data (ocean currents, ocean surface waves, tides) and wind data are used (Sect. 3.1.1). Furthermore, we use Lagrangian simulation data, capturing transport of virtual particles representing floating litter. These simulations are used to estimate fluxes of litter onto beaches (Sect. 3.1.2). Finally, we use data of the coastal geometry and orientation (Sect. 3.1.3). Environmental variables are calculated for various lead times and distances from the measurement locations (expressed as radii around the stage midway points). These variables are then fed into a random forest algorithm to make the regression model.

3.1.1 Hydrodynamic and wind data

Numerical model data are used to specify the state of the sea and wind around the beach cleanup locations, as these factors have been found to likely play a role in the accumulation of beach litter (Eriksson et al., 2013; Thepwilai et al., 2021; Williams and Tudor, 2001). Reanalysis data are used where historical observational data have been assimilated in numerical models.

Information about ocean surface currents (U_curr.), salinity (S), Stokes drift (U_Stokes), and significant wave height (H_s) is derived from EU Copernicus Marine Environmental Monitoring Service Information data. High-frequency tidal forcing has been used to produce the ocean current data, but output is only provided daily. To capture the effects of tides at a high temporal resolution, FES2014 data are used. Tidal currents (U_tides) and heights (h_tide) are calculated, taking the M₂, S₂, K₁, and O₁ constituents into account (Sterl et al., 2020), as well as the M₄ and M₆ components, which have been shown to play an important role in transport of suspended particles in the North Sea (Gräwe et al., 2014). The wind velocity field at 10 m (U_wind) is taken from ERA5 reanalysis data. ERA5 data are used for the atmospheric forcing in the European Northwest Shelf reanalysis product from which the surface current data are obtained, making these data sets consistent. Further details on the temporal and spatial resolution and assimilated data are given in Table 1.
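As an illustration of how a tidal height series can be built from the six constituents named above, the following minimal sketch sums simple harmonics. The constituent periods are standard astronomical values; the amplitudes and phases in `example` are hypothetical placeholders, not FES2014 values, and nodal corrections and astronomical arguments that a full tidal prediction would include are omitted.

```python
import math

# Constituent periods in hours (standard astronomical values).
M2_PERIOD_H = 12.4206012
PERIOD_H = {
    "M2": M2_PERIOD_H,
    "S2": 12.0,
    "K1": 23.9344696,
    "O1": 25.8193417,
    "M4": M2_PERIOD_H / 2.0,  # first overtide of M2
    "M6": M2_PERIOD_H / 3.0,  # second overtide of M2
}


def tidal_height(t_hours, constituents):
    """Sum harmonics: h(t) = sum_i A_i * cos(omega_i * t - phi_i).

    constituents maps a constituent name to (amplitude_m, phase_rad).
    """
    height = 0.0
    for name, (amplitude, phase) in constituents.items():
        omega = 2.0 * math.pi / PERIOD_H[name]  # angular frequency in rad/h
        height += amplitude * math.cos(omega * t_hours - phase)
    return height


# Hypothetical amplitudes (m) and phases (rad) for a single location.
example = {
    "M2": (0.80, 0.3), "S2": (0.20, 1.0), "K1": (0.07, 2.0),
    "O1": (0.10, 0.5), "M4": (0.10, 1.5), "M6": (0.03, 0.2),
}

# Hourly series over 31 days and the resulting tidal range.
series = [tidal_height(t, example) for t in range(24 * 31)]
tidal_range = max(series) - min(series)
```

Quantities such as the maximum height or the variability over a lead time can then be derived from `series`.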

Table 1. An overview of the numerical hydrodynamic and wind data used to derive the variables for the regression analysis. The data set name, temporal and spatial resolution, data assimilated into the numerical models, and corresponding references are presented.

References listed in Table 1: Global Monitoring and Forecasting Center (2021); Global Monitoring and Forecasting Center (2020); Lyard et al. (2021); Hersbach et al. (2020).

* Data are used from July to September for the years 2014 to 2019. ** Data are used for all months from January 2011 up to September 2019, as these are also used for the Lagrangian model simulations.

3.1.2 Lagrangian model setup

While data on the sea state and wind might explain the litter accumulating on beaches to some extent, it misses information on possible sources of litter and how this litter is transported through the marine environment. We therefore include estimates of beached litter fluxes in our analysis based on Lagrangian particle simulations.

Using the OceanParcels Lagrangian ocean analysis framework (Delandmeter and van Sebille, 2019), we model the trajectories of virtual buoyant particles at the sea surface using a Runge–Kutta 4 integration scheme. These virtual particles represent floating litter such as plastics. For the trajectories we consider a domain between 40–65° N and 20° W–13° E; see Fig. 2. We simulate a total of about 380 000 trajectories over the years 2011–2019. When particles move out of the specified domain they are removed, which mainly happens after particles move northward along the Norwegian coast. The ocean surface currents and Stokes drift from the hydrodynamic data are used to move the virtual particles around. We do not add additional tidal forcing to the Lagrangian model (Sterl et al., 2020) since the net effect of tides is already included in the ocean surface current data set (Global Monitoring and Forecasting Center, 2021). It is assumed that particles move just below the surface water and do not experience a direct wind drag (Lebreton et al., 2018; Macias et al., 2019; Kaandorp et al., 2020). Effects of subgrid-scale phenomena are parameterized using a zeroth-order Markov model (van Sebille et al., 2018). The tracer diffusivity is set to a constant value of 10 m² s⁻¹, appropriate for the given mesh size (Neumann et al., 2014).
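A zeroth-order Markov model amounts to a random walk added to the advected position. The sketch below is not the OceanParcels implementation; it only illustrates the usual form of such a step, in which each displacement component is drawn from a normal distribution with variance 2 K Δt. The time step of 1 h is an assumption, since the text does not state it.

```python
import math
import random

K_H = 10.0  # tracer diffusivity from the text, in m^2 s^-1


def random_walk_step(dt_seconds, rng):
    """Return one diffusive displacement (dx, dy) in metres.

    Each component has zero mean and variance 2 * K_H * dt.
    """
    sigma = math.sqrt(2.0 * K_H * dt_seconds)
    return rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dt = 3600.0             # assumed time step of 1 h (not given in the text)

steps = [random_walk_step(dt, rng) for _ in range(20000)]

# The sample variance of dx should approach 2 * K_H * dt.
var_x = sum(dx * dx for dx, _ in steps) / len(steps)
expected_var = 2.0 * K_H * dt
```

With these values the typical displacement per step, sqrt(2 K Δt), is roughly 270 m, small compared with the kilometre-scale grid cells of the hydrodynamic data.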

We use the same approach as in Kaandorp et al. (2020) to define sources of marine plastic litter. Particles are released daily at river mouths, proportional to the estimated monthly riverine outflow of plastic waste based on the model by Lebreton et al. (2017). These sources are plotted using green circles in Fig. 2. Particles are released daily in the sea, proportional to the amount of fishing hours based on Kroodsma et al. (2018), shown in blue in Fig. 2. These data are dependent on fishing vessel transponders, which are not equally present over the years. We therefore release a constant input of virtual particles from this source each day. Finally, there is a constant daily release of particles along coastlines proportional to the amount of estimated land-based mismanaged plastic waste within a radius of 50 km from the coastline (Jambeck et al., 2015; SEDAC et al., 2005). These sources are plotted in red in Fig. 2.

A beaching timescale τ_beach parameterizes how quickly litter moves from the sea onto the beach when residing near the coast (Kaandorp et al., 2020). Here, the probability of beaching P_beach is given by

$$P_{\mathrm{beach}} = 1 - e^{-t_{\mathrm{coast}} / \tau_{\mathrm{beach}}}, \tag{1}$$

where t_coast is the time that particles spend in the model ocean cell adjacent to the coast. Various values for τ_beach are tested here, from τ_beach=25 d estimated for plastic particles and τ_beach=75 d estimated for drifter buoys in Kaandorp et al. (2020), to a more conservative value of τ_beach=150 d. While in reality τ_beach might vary significantly both in space and time, it is unknown how this can be best parameterized (Onink et al., 2021). We use the Lagrangian model simulations to capture the large-scale transport of litter and allow the regression model to pick the most appropriate value for τ_beach later on. Only direct pathways of litter through the surface water are considered here and resuspension of litter from beaches (Onink et al., 2021) is ignored. Particles are tracked until they have lost more than 99 % of their initial mass in the most conservative scenario of τ_beach=150 d. This means that particles are deleted when they have spent more than 691 d near the coast.
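The relation between the beaching timescale and the 691 d cutoff can be checked numerically. The sketch below evaluates Eq. (1) for the three tested values of τ_beach and inverts it to find the coastal residence time after which 99 % of the mass has beached; the function names are illustrative only.

```python
import math


def beaching_probability(t_coast_days, tau_beach_days):
    """Eq. (1): probability of beaching after t_coast days near the coast."""
    return 1.0 - math.exp(-t_coast_days / tau_beach_days)


def days_until_fraction_lost(fraction, tau_beach_days):
    """Invert Eq. (1): coastal residence time at which `fraction` has beached."""
    return -tau_beach_days * math.log(1.0 - fraction)


# Beaching probability after 10 d near the coast for the tested timescales.
for tau in (25.0, 75.0, 150.0):
    print(tau, beaching_probability(10.0, tau))

# Most conservative scenario: 99 % of the mass lost with tau_beach = 150 d.
cutoff_days = days_until_fraction_lost(0.99, 150.0)
```

Here `cutoff_days` equals 150 ln(100), which rounds to the 691 d quoted above.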

Each virtual particle starts with a unit mass. For each time step that a virtual particle spends near the coast, it loses a fraction of its mass to the beaching process; this fraction follows from Eq. (1) as t_coast increases. For each virtual particle, we calculate where and when it loses mass due to the beaching process. These masses lost to beaching are binned in a 1/9° × 1/15° beaching flux histogram for each day. These beaching fluxes are denoted by F_beach and are calculated for each particle source: F_beach,fis., F_beach,riv., and F_beach,pop. for fishing activity, river inputs, and mismanaged plastic waste from coastal population, respectively.
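The bookkeeping described above can be sketched as follows. This is a simplified illustration rather than the authors' code: the track format, the daily time step, and the assignment of 1/9° to longitude and 1/15° to latitude are assumptions made for the example.

```python
import math
from collections import defaultdict

D_LON = 1.0 / 9.0   # assumed cell width in longitude (degrees)
D_LAT = 1.0 / 15.0  # assumed cell height in latitude (degrees)


def accumulate_beaching(track, tau_beach_days, dt_days=1.0):
    """Accumulate the mass a single particle loses to beaching.

    track: list of (day, lon, lat, near_coast) tuples (hypothetical format).
    Returns (flux, remaining_mass), where flux maps
    (day, lon_index, lat_index) to the mass beached in that cell on that day.
    """
    flux = defaultdict(float)
    mass = 1.0  # each particle starts with unit mass
    # Fraction of the current mass lost per coastal time step, from Eq. (1).
    loss_fraction = 1.0 - math.exp(-dt_days / tau_beach_days)
    for day, lon, lat, near_coast in track:
        if not near_coast:
            continue  # no beaching while away from the coast
        lost = mass * loss_fraction
        mass -= lost
        key = (day, math.floor(lon / D_LON), math.floor(lat / D_LAT))
        flux[key] += lost
    return dict(flux), mass


# A hypothetical track: three coastal days and one offshore day.
track = [
    (0, 4.40, 52.30, True),
    (1, 4.41, 52.31, True),
    (2, 4.10, 52.40, False),
    (3, 4.45, 52.35, True),
]
flux, remaining = accumulate_beaching(track, tau_beach_days=25.0)
```

Summing such per-particle dictionaries over all particles of one source would give the daily beaching flux histogram for that source.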

Figure 2. Input scenarios used to seed virtual litter particles in the Lagrangian simulations. Riverine input is indicated by the green circles, the amount of fishing hours is shown in blue, and the coastal mismanaged plastic waste density is shown in red. Note the log scale used for all input scenarios. While all rivers from Lebreton et al. (2017) are included in our analysis, only rivers predicted to transport more than 0.2 t of plastic litter into the ocean are plotted here.

3.1.3 Coastal orientation and geometry

Coastal orientation, geometry, and substrate are likely to influence the amount of litter that actually beaches on coastlines (Brennan et al., 2018; Andrades et al., 2018; Hardesty et al., 2017). Although the substrate of beaches in the Netherlands is relatively similar (sandy), there are local variations in the coastline orientation with respect to the large-scale coastline. We take this into account by including information on how the hydrodynamic and wind data are oriented with respect to the local coastline.

The Natural Earth data set is used here at a 1:10 million resolution (Kelso and Patterson, 2010), which is fine enough to estimate the general orientation of the beaches on which the cleanup stages have taken place. Two locations are not present in the coastal geometry of this data set (two constructed beaches along dams: Brouwersdam and Neeltje Jans); the coastal orientations of these locations were determined manually.

Normal vectors to the coastline (denoted by n) are estimated by fitting a tangent plane through the points defining the coastline segments. Using a singular value decomposition we minimize the orthogonal distance between these points and the plane. All points within a box of 10×10 km centered around the stage midway point are selected (roughly the length scale of the beach cleanup tours). One example is plotted in Fig. 3a, where the dotted box is the selection around the stage midway point, and the coastline segments within this box are indicated in orange. The resulting normal vector to this coastline segment is plotted using the orange arrow.
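The orthogonal fit can be sketched with NumPy for points in a local planar coordinate system (kilometres east and north of the stage midway point). The right singular vector with the smallest singular value of the centered points is the direction of least variance, i.e., the normal. The SVD leaves the sign of the normal undetermined; resolving it with a reference point known to lie on land is an assumption of this sketch, as are all names and coordinates used.

```python
import numpy as np


def coastline_normal(points_xy, land_point_xy):
    """Estimate a unit normal to a coastline segment by orthogonal regression.

    points_xy: (N, 2) coastline points in a local planar frame (km).
    land_point_xy: any point known to lie on land, used only to fix the sign
    so that the normal points onshore (hypothetical convention).
    """
    pts = np.asarray(points_xy, dtype=float)
    centroid = pts.mean(axis=0)
    # Rows of vt are the principal directions of the centered point cloud.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1]  # direction of least variance
    if np.dot(np.asarray(land_point_xy, dtype=float) - centroid, normal) < 0:
        normal = -normal
    return normal / np.linalg.norm(normal)


# Hypothetical coastline running roughly southwest to northeast,
# with land to the southeast.
coast = [(0.0, 0.1), (2.0, 1.9), (4.0, 4.2), (6.0, 5.8), (8.0, 8.1)]
n = coastline_normal(coast, land_point_xy=(8.0, 0.0))
```

Selecting the points inside the 10×10 km box around the midway point would precede this call.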

Dot products are calculated for vector fields (e.g., current velocity) with respect to the coastline normal vectors to quantify how much a vector points onshore (positive dot product) or offshore (negative dot product). An example is presented in Fig. 3b. At a given stage midway point, the numerical data within a certain radius are selected. For each of the cells we can then calculate the dot product of the vector data with respect to the coastline normal vector. In the example of Fig. 3b, the normal vector points towards the northeast. Cells where the velocity vector points in roughly the same direction (onshore) are colored red, the opposite directions (offshore) are colored blue. In Fig. 3b the example is presented for only one time snapshot: the quantities can be calculated for various lead times. We then save derived quantities such as the mean, maximum, or minimum dot product over the lead time in a given radius, which will be further explained in Sect. 3.2.1.
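A minimal sketch of this calculation is given below, using plain Python and a local planar coordinate system in kilometres. The cell format, the radius, and the way the summary statistics are pooled over snapshots are assumptions for illustration.

```python
import math


def onshore_components(cells, midpoint, normal, radius_km):
    """Dot products of cell velocities with the coastline normal.

    cells: list of (x_km, y_km, u, v) tuples for one time snapshot.
    Only cells within radius_km of the stage midway point are used.
    Positive values point onshore, negative values offshore.
    """
    values = []
    for x, y, u, v in cells:
        if math.hypot(x - midpoint[0], y - midpoint[1]) <= radius_km:
            values.append(u * normal[0] + v * normal[1])
    return values


def summarise(snapshots, midpoint, normal, radius_km):
    """Mean, maximum, and minimum dot product pooled over all snapshots."""
    pooled = []
    for cells in snapshots:
        pooled.extend(onshore_components(cells, midpoint, normal, radius_km))
    return {"mean": sum(pooled) / len(pooled),
            "max": max(pooled),
            "min": min(pooled)}


# Normal vector pointing towards the northeast, as in the Fig. 3b example.
normal = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
midpoint = (0.0, 0.0)

# Two hypothetical snapshots (lead times) of current velocities in m/s.
snapshots = [
    [(1.0, 1.0, 0.1, 0.1), (2.0, 0.0, -0.2, -0.1), (50.0, 50.0, 1.0, 1.0)],
    [(1.0, 1.0, 0.3, 0.0), (2.0, 0.0, 0.0, -0.4)],
]
stats = summarise(snapshots, midpoint, normal, radius_km=10.0)
```

The cell at (50, 50) km lies outside the 10 km radius and does not contribute to the statistics.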

The coastal normal vectors are also used to estimate the misalignment between the numerical model coastline and the high resolution coastline. In Fig. 3a, the numerical model grid cell centers at the coast are plotted using the brown dots. A singular value decomposition is used again to estimate the coastline normal vector of the numerical grid (n_grid, indicated by the brown arrow). At each stage midway point, the dot product is taken of n_grid with respect to the high-resolution coastline normal vector n to obtain a measure for the misalignment. In the example plotted in Fig. 3a there would be a large amount of misalignment between n_grid and n, resulting in a negative dot product between the two quantities.
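For unit vectors, this dot product equals the cosine of the angle between the two normals, so it ranges from 1 (perfect alignment) through 0 (perpendicular) to negative values (angles larger than 90°). The short sketch below illustrates this; the vectors are hypothetical.

```python
import math


def unit(vector):
    """Normalise a 2-D vector to unit length."""
    norm = math.hypot(vector[0], vector[1])
    return (vector[0] / norm, vector[1] / norm)


def misalignment(n_grid, n_coast):
    """Dot product of the unit normals: cosine of the angle between them."""
    a, b = unit(n_grid), unit(n_coast)
    return a[0] * b[0] + a[1] * b[1]


aligned = misalignment((1.0, 1.0), (2.0, 2.0))        # same direction
perpendicular = misalignment((1.0, 0.0), (0.0, 1.0))  # 90 degrees apart
opposed = misalignment((1.0, 0.2), (-1.0, 0.5))       # more than 90 degrees
```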

Finally, the coastline length per grid cell is estimated. For each cell of the numerical model, we take the coastline segments within the given cell and calculate their total length. Since coastlines show fractal behavior (Kappraff, 1986), their Euclidean length is not well defined. This means that the lengths calculated here are estimates and that their values would increase when taking a higher model resolution.
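One way to sketch this estimate is to sum great-circle segment lengths per cell. In the example below, each segment is assigned to the cell containing its midpoint, which is a simplification chosen for brevity (a segment crossing a cell boundary is not split), and the polylines are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coastline_length_per_cell(polyline, d_lon, d_lat):
    """Sum segment lengths (km) per grid cell, keyed by (lon_idx, lat_idx).

    Each segment is assigned to the cell containing its midpoint.
    """
    lengths = defaultdict(float)
    for (lon1, lat1), (lon2, lat2) in zip(polyline, polyline[1:]):
        mid_lon = 0.5 * (lon1 + lon2)
        mid_lat = 0.5 * (lat1 + lat2)
        key = (math.floor(mid_lon / d_lon), math.floor(mid_lat / d_lat))
        lengths[key] += haversine_km(lon1, lat1, lon2, lat2)
    return dict(lengths)


# A straight hypothetical coastline and a zigzag between the same endpoints.
straight = [(4.5, 52.0), (4.5, 52.05), (4.5, 52.1)]
zigzag = [(4.5, 52.0), (4.52, 52.05), (4.5, 52.1)]

total_straight = sum(coastline_length_per_cell(straight, 1 / 9, 1 / 15).values())
total_zigzag = sum(coastline_length_per_cell(zigzag, 1 / 9, 1 / 15).values())
```

The zigzag polyline is longer than the straight one even though both connect the same endpoints, which mirrors how resolving finer coastline detail increases the estimated length.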

Figure 3. Illustration of the methodology used to calculate the directional variables. Panel (a) shows the high-resolution coastline points and the derived normal vector (n), shown in orange, located around the stage midway point (the black cross). Also shown are the numerical model coastline points and the derived normal vector (n_grid) in brown. Panel (b) shows how the dot product variables are calculated. In a radius around the stage midway point, the dot product of the vector field is calculated with respect to the high-resolution coastline normal vector (n), where offshore components are indicated in blue and onshore components are shown in red.
