Detecting most effective cleanup locations using network theory to reduce marine plastic debris: A case study in the Galapagos Marine Reserve

. The Galapagos Marine Reserve was established in 1986 to ensure protection of the islands’ unique biodiversity. Unfortunately, the islands are polluted by marine plastic debris and the island authorities face the challenge to effectively remove plastic from its shorelines due to limited resources. To optimise efforts, we have identiﬁed the most effective cleanup locations on the Galapagos Islands using network theory. A network is constructed from a Lagrangian simulation describing the ﬂow of macroplastic between the various islands within the Galapagos Marine Reserve, where the nodes represent locations 5 along the coastline and the edges the likelihood for plastic to travel from one location and beach at another. We have found four network centralities that provide the best coastline ranking to optimise the cleanup effort based on various impact metrics. In particular locations with a high retention rate are favourable for cleanup. The results indicate that using the most effective centrality for ﬁnding cleanup locations is a good strategy for heavily polluted regions if the distribution of marine plastic debris on the coastlines is unknown and limited cleanup resources are available.

. Map of the main islands in the Galapagos Marine Reserve. All port locations are indicated by diamond markers and virtual particle pathways (blue lines) show all existing connections via ocean currents using the MITgcm model simulation from Puerto Ayora (blue diamond) to other coastlines.
Although monitoring efforts to measure macroplastic abundance along the Galapagos coastlines are increasing, the macroplas-55 tic distribution is still largely unknown due to limited resources. Conservation efforts will therefore strongly benefit from identifying key locations for cleanup. Previous studies have shown that a strong inter-island genetic connectivity exists within the Galapagos Marine Reserve, mainly due to seed dispersal by ocean currents (e.g. Fajardo et al., 2019;Arjona et al., 2020).
Thus, it is likely that also macroplastic may have high residence timescales within the marine reserve due to ocean transport processes between the islands. Therefore, the connectivity between the coastlines of the Galapagos Islands provides an ideal 60 case study to investigate the use of centrality theory to find optimal cleanup locations.
In this paper we aim to identify which centralities provide the best ranking to optimise cleanup efficacy at coastlines of the Galapagos Islands. The resulting cleanup efficacy-model will provide a valuable first indication of which locations to target and can easily be applied to other regions across the globe.

65
In order to set up the network of macroplastic connectivity between the Galapagos Islands, virtual particles are released from all island coastlines. The pathways of the particles are constructed using a Lagrangian tracking tool (section 2.1) where the 3 https://doi.org/10.5194/egusphere-2022-426 Preprint. Discussion started: 16 June 2022 c Author(s) 2022. CC BY 4.0 License. arrival of particles at the coastline is parameterised by a beaching parameterisation (section 2.2). From these pathways, a transition matrix is constructed that provides the likelihood that two coastlines are connected (section 2.3). It is then possible to find the optimum cleanup locations using various coastline rankings based on centrality measures (section 2.4). To test the 70 impact of these rankings, three metrics are proposed in section 2.5 that optimise for maximum removal of particles and minimal distribution of particles on land.

Lagrangian particle simulation
The transition matrix used for the network analysis is constructed from a Lagrangian simulation of macroplastic transport between the Galapagos Islands. In this paper we only consider positively buoyant macroplastic (>0.5 cm) as larger pieces of 75 plastic are the targets of the beach cleanup (see section 1). As such, we used the two-dimensional daily-mean surface velocity fields from the regional MITgcm model described in detail by Forryan et al. (2021). The model has a horizontal resolution of 4 km and is forced at the three open boundaries (north, south and west) by the temperature, salinity and velocity fields of the ORCA0083-N01 1/12 • NEMO model 1 . Surface forcing from ERA-Interim at 6-hourly resolution was applied and the simulation years 2008-2012 were used for the particle tracking. The model is able to capture the main surface circulation and 80 variability observed near the Galapagos Islands and is therefore well suited to investigate the MPD connectivity between the islands (Forryan et al., 2021).
Virtual particles were released daily throughout the five years of the model simulation at every coastal grid point of the 10 islands that are resolved by the MITgcm model (see Figure 1). A coastal grid point is defined as the model grid point that shares one or more corners with a land grid point. The particles are advected for 60 days through fourth-order Runga-Kutta 85 integration in the Parcels Lagrangian framework (Delandmeter and van Sebille, 2019) with an hourly output frequency. A two month timescale has proven sufficient to connect any two coastal grid points within the marine reserve. In total, 707.400 particle trajectories (1800 days x 393 coastal grid points) were used to construct the transition matrix.
As the timescale to travel from one coastline to another is at most two months, processes that impact the vertical movement of macroplastic, such as fragmentation processes and biofouling are not included as these typically have a much longer timescale 90 (e.g. Andrady, 2011;Song et al., 2017;Gerritse et al., 2020;Kaandorp et al., 2020;Lobelle et al., 2021). Further, we have chosen not to include any artificial Lagrangian diffusion as the mesoscale variability is largely resolved by the flow fields. We also did not take into account effects of wind drag on floating plastic or the role of Stokes drift. Although these effects likely play a role for macroplastic transport in the marine reserve, it is unclear how they should be parameterised as there are only few observations from floats near the Galapagos Islands available for validation (van Sebille et al., 2019). We have therefore 95 chosen to solely use the Eulerian surface flow for constructing the network, but the method can be easily extended.

Beaching parameterisation
To construct the transition matrix we compute the likelihood that a particle entering the ocean from one location will beach at another location (or at the same location). We are only interested in the particle arrival as this would provide information on whether a cleanup is needed. We therefore consider the beaching processes and not the resuspension process for the construction of the transition matrix in section 2.3. As the horizontal model resolution is only 4 km, the beaching process is parameterised using the same method as described by Onink et al. (2021): if a particle reaches a coastal grid point, a beaching probability p B to determine whether the particle is beached is calculated as:  Onink et al., 2021;Kaandorp et al., 2022). It seems that mainly quantitative analyses such as mass-budgets are sensitive to the choice of the beaching timescale, but that beaching patterns are qualitatively robust against the choice for λ B . In this paper we use a range of plausible beaching timescales, λ B ∈ [1, 2, 5, 10, 26, 35] days, to test the sensitivity of our transition matrix on the choice of λ B . We have decided not to test for λ B > 35 days as our advection timescale is only 60 days.

110
Combining the Lagrangian simulation (section 2.1) and the beaching parameterisation (section 2.2) we construct a transition matrix that provides the likelihood that a particle travels from a source coastal grid point n so (hereinafter a source node) and beaches at a sink coastal grid point n si (hereinafter a sink node, Figure 2a). To ease interpretation, all nodes are grouped per island (from east to west) and ranked in a clockwise direction starting at the northernmost node of each island.
Between Isla Isabela and Isla Fernandina a narrow strait exists (the Bolivar Channel, see Figure 1), which is only one ocean 115 grid point wide in the hydrodynamic model. As the beaching parameterisation can only provide information on beaching within one grid point from land, the channel is chosen as a separate source/sink location.
The transition matrix is then translated to a directed network, where each node represents a coastal grid point and the edge weight p so,si the probability that a particle travels from n so to n si . Note that using this method, only connections exist between two nodes if the travel time is below the maximum particle advection time of 60 days. As mentioned before, increasing the 120 advection time does not change the conclusions drawn in this paper. As such, we are not concerned about the time scale, but purely on whether a connection between two nodes exists.
The directed network is constructed using the range of beaching timescales introduced in section 2.2. A larger beaching timescale reduces the likelihood of pathways between the nodes (Figure 2b), but the connectivity pattern itself seems insensitive to changes in λ B . Therefore, for the remainder of this paper we present results using λ B = 5 days, where 42% of the particles 125 beach. Figure 2. The transition matrix (a) giving the probability that a particle starting at a source coastal node (y-axis) arrives at a sink coastal node (x-axis) for λB= 5 days, and the probability (b) that a particle starting at a source coastal node is not beaching within 60 days and therefore 'lost' to the ocean, including the sensitivity of this loss to changes in the beaching timescale (colored lines) compared to the reference simulation (black line). The different islands are delimited by horizontal and vertical grey lines.

Network centralities to identify effective cleanup locations
To identify which locations should be cleaned first to reach maximum impact, a ranking of all coastline nodes is established based on various network centralities, where nodes with a higher centrality value are cleaned first. Whenever relevant, the NetworkX package version 2.5.1 is used to calculate the centrality from the network constructed in section 2.3 (Hagberg et al.,130 2008). We chose ten centralities for which the description and motivation is provided below.
-The Retention Rate (retention) indicates the percentage of particles for which the sink node is the same as their source node. Nodes with a Retention Rate equal to one have no outgoing edges, but may have incoming edges, and represent regions where particles are very unlikely to leave. -The Loss Rate (loss) is the percentage of particles that are lost to the ocean, i.e. the edge weight of the edge that connects 135 each node to the ocean (Figure 2b). Nodes with a Loss Rate equal to one have no outgoing edges to coastline nodes, but may have incoming edges. These nodes may represent regions that should be cleaned rather fast to avoid loss to the open ocean where removal is more difficult.
-The remote Beaching Rate (beaching) is the percentage of particles that have a coastline sink node that is different from their source node. Prioritising nodes with a high remote beaching rate for cleanup may have a non-local impact by 140 preventing the spread of particles to other locations.
-The Source-Sink Index (SSI sink and SSI source ) is given by where P in and P out give the total edge weight of all incoming and outgoing edges respectively. Positive SSI values (SSI sink ) indicate that a node is a net sink, whereas negative SSI values (SSI source ) indicate that a node is a net source (Pata and Yñiguez, 2021). Comparing the impact of these two centralities will give insight in whether removal at a net source or at a net sink is more important.

145
-We use the Sink Diversity (SiD) and Source Diversity (SoD) as described by Pata and Yñiguez (2021) to identify nodes that have outgoing edges to many different sinks with a high edge weight (high SiD) or to identify nodes that have strong incoming edges from many different sources (high SoD). The index is computed from a modified Shannon's diversity index (Holstein et al., 2014) using where p i and p j are the weights of each outgoing and incoming edge normalised by the sum of all outgoing and incoming edge weights at each source and sink node.
-The Pagerank Centrality (PR) is calculated to quantify the importance of each node proportional to the importance of its neighbours. To calculate this importance, often the eigenvector centrality is used (e.g. Watson et al., 2011;McAdam and van Sebille, 2018), but pagerank centrality is more appropriate to use in case of directed networks (Newman, 2018) 150 as it can give more importance to incoming edges than to outgoing edges. As it is a priori not clear whether an incoming edge or an outgoing edge is more important for finding effective cleanup locations, the pagerank centrality is calculated from both the original directed network (PR in ) as its transpose (PR out ).
-Lastly, we use the Betweenness Centrality (betweenness) that, instead of considering the in-and out-degree of the various nodes, gives more importance to nodes that lie on shortest paths between other nodes. In other words, the betweenness centrality could help find regions where it is likely that a large amounts of particles pass by. This centrality has been used before to find effective cleanup locations along the shorelines of South Korea by Jeon et al. (2015).
However in contrast to their method, we define the shortest path based on the most likely path instead of the shortest distance or travel time as we want to optimise for maximal removal of particles, not for e.g. minimal residence time of particles in the ocean. By simply rewriting the edge weight probability p so,si as

Impact metrics
As a management objective we define the most effective cleanup locations as regions where cleanups minimise the macroplastic distribution on land and maximise the total amount of macroplastic removed. Furthermore, as there are only limited resources available for cleanup, we need to take into account how much of the coastline can be cleaned, and how often, to reach a specific macroplastic distribution.

160
The particle distribution on coastlines v t * at a time t * can be calculated by multiplying the previous distribution with the transition matrix as v t * = v t * −1 A so,si .
Note that the time t * does not represent an actual time as the travel time for specific pathways is variable and we did not include any residence time scales for particles on shorelines. It does however provide an indication for how often specific locations should be cleaned until there are no more particles arriving from other locations. We initialise the system with a homogeneous particle distribution before cleanup starts. Then, using the node rankings from the various centralities, the outgoing edges of the cleanup target nodes are removed. This method bares similarities with the 165 percolation process in network theory used to study clusters (Newman, 2018). However, in this paper we leave the incoming edges intact. After every iteration, particles will accumulate at the target cleanup node until a steady state is reached, which provides a means to quantify the cleanup impact.
To ensure that we perform sufficient iterations to reach steady state, we calculated the change in particle number in the ocean between two iterations. We then define to have reached steady state when this change is smaller than 0.1% of the total number 170 of particles.
Three impact metrics are used to evaluate the efficacy of the different centrality node rankings. The benefit metric indicates the difference (in %) between the total number of particles removed and the number of particles removed if there had been zero connectivity between the different nodes. The Left Behind on Land metric gives the percentage of particles that are still on land when a steady state is reached. Then finally, the iteration metric provides an estimate of how often the calculation for 175 the next particle distribution is performed to reach a steady state.
For a given percentage of coastline that is cleaned, the most effective centrality node ranking would maximise the benefit metric (total number of particles removed) and minimise the Left Behind on Land and iteration metrics (minimal particle distribution on land and only little repeated cleaning needed).
3 Results one, which means that there always exists a pathway connecting any two nodes in the network. Furthermore, most connections between nodes are weak (i.e. pathways with low probability) and only a few are strong (i.e. pathways with a high probability).
Therefore, the distribution of all pathway probabilities (i.e. the edge weights) is relatively linear in a log-log plot (not shown) indicating that the macroplastic connectivity can be classified as a small-world network (e.g. Watts and Strogatz, 1998;Amaral et al., 2000;Kininmonth et al., 2010;Watson et al., 2011). 190 The ocean is the largest sink for particles leaving the islands (Figure 3a) when using a beaching timescale of λ B = 5 days (note that the total percentage lost to the ocean decreases for smaller beaching timescales, Figure 2b). The amount lost to the ocean, and therefore also the connectivity to other island coastlines, is however strongly spatially variable. Small islands show a stronger connectivity to the ocean than larger islands (Figure 3a). The larger islands show interesting spatial connectivity variability. At the islands of San Cristóbal, Santa Cruz and Fernandina, the largest sink to the ocean is found at southern . Indeed, the island-to-island connectivity is mainly in a westward direction ( Figure 2a) and in general, more particles arrive at islands downstream than upstream (Figure 3b). Also, the source distribution ( Figure 3c) reveals that particles arriving at Isabela Island are more likely to come from other islands (>20%) than the particles arriving at San Cristóbal Island (2.0%).
For some of the islands (San Cristóbal, Santiago and Fernandina), more particles beach at the western shorelines than at the 205 eastern shorelines (Figure 3b). This might seem contradictory to previous work that suggests that most macroplastic arrives at the eastern shorelines due to the prevailing ocean current direction (e.g. van Sebille et al., 2019;Jones et al., 2021). However, here, the initial distribution is homogeneous and therefore more representative of potential local sources of macroplastic instead of remote (i.e. sources from outside the marine reserve). Apparently, without an influx of remotely sourced macroplastic, the local dynamics leads to accumulation at the western coastlines at these islands.
San Cristóbal Island and Santa Cruz Island display a very similar sink distribution pattern (Figure 3a). The particles seem to travel from one node to another in a clockwise direction at the eastern flank and a counter-clockwise direction at the western flank of the islands. They follow the shoreline, until they reach a 'take-off' node, i.e. the location where the particles are most likely to leave the island. The negative correlation between the likelihood to beach at another island and the likelihood to stay close to the source is also visible in Figure 3a. This indicates that the connectivity between the nodes that belong to 215 the same island is fundamentally different from the island-to-island connectivity. This is most likely a result of predominantly low velocities close to the shorelines in combination with the beaching parameterisation that is applied only close to shore.
The 'take-off' locations might be interesting targets for cleanup strategies as they seem to be accumulation points for particles originating from the same island, and at the same time display a strong out-flux of particles to the ocean and to other islands.

Impact of cleanup based on the coastline node rankings 220
The visual representation of the source and sink distribution of macroplastic connectivity already provides some insight into potential cleanup locations. However, a more quantitative analysis can be performed by iteratively removing the outgoing edges from specific nodes (i.e. the cleanup target locations) and assessing the impact as described in section 2.5. Which outgoing edge to remove first is determined by the ranking of the coastline nodes given by various centralities (see section 2.4).
The impact metrics reveal that the cleanup efficacy is strongly dependent on the order in which the coastline nodes are 225 cleaned. As expected, but important to note, is that due to the connections between the different potential cleanup locations, there is always a positive benefit from cleaning only a fraction of the coastline (Figure 4a). For example, cleaning 30% of the coastline based on the SSI sink ranking can lead to a reduction of >50% of the total number of particles present in the system (red line in Figure 4a). This means that the cleanup effort can indeed be optimised using limited resources.
Which centrality to use for the most efficient cleanup strategy however depends on how much of the coastline can be cleaned.

230
Overall, the SSI sink seems to provide the highest benefit (red line in Figure 4a). However, for low cleanup effort (<10% of the coastline), the betweenness centrality (grey line), the retention rate (blue line) and the pagerank centrality favouring incoming edges (PR in , yellow line) lead to similar benefit rates. For high cleanup effort (>90% of the coastline) the loss rate seems to outperform the SSI sink regarding the benefit rate. Interestingly, the particles that would not be removed are still on land when using a cleanup strategy based on the loss rate, instead of being lost to the ocean if the SSI sink ranking is used for the 235 cleanup strategy (Figure 4b). This demonstrates that depending on the management objective and the cleanup capacity different centralities might be of importance.
Due to the remoteness of the Galapagos Islands, we would be mainly interested in a cleanup with limited effort and we will therefore focus on efforts that would be able to clean up to 10% of the coastline. Furthermore, as described in section 2.5, we do not only want to optimise our cleaning efforts to reduce the overall amount of macroplastic, but also minimise the amount  how often the matrix multiplication needs to be performed to reach steady state. The impact metrics are calculated for when 5% (diamond marker) and 10% (star marker) of the coastline would be cleaned.
Using these objectives, the retention rate seems to provide the most effective cleanup strategy for small (5%) cleanup effort (blue diamond marker in Figure 5). Targeting regions with a high retention rate would lead to cleanest coastlines, with the least amount of repeated cleaning and with the highest benefit. This would indicate that the dynamics that keeps macroplastic at the same location is more important for effective cleanup efforts than the arrival of macroplastic from other locations.

245
For slightly higher cleanup effort (10%) we again find that the node rankings based on the betweenness (grey marker), the SSI sink (red marker) and the PR in (yellow marker) centralities become important. Apart from the betweenness centrality, these node rankings, together with the retention rate, have in common that they give more importance to incoming edges than to outgoing edges. This is an interesting finding as this implies it might be more effective to clean at a sink than at a source for the overall macroplastic removal.

Sensitivity to initial macroplastic distribution
So far, we have assumed a uniform initial distribution of particles to evaluate the impact of the various nodes rankings. As mentioned in the introduction (section 1), we are interested in finding a cleanup strategy that can be used when the distribution of macroplastic is unknown. Therefore, we also test the impact of the node rankings against a random distribution of particles across the coastlines nodes. With a random distribution we mean that every coastal node is initialised with a random particle 255 weight between 0 and 1.
Logically, the potential total particle 'mass' removed is sensitive to how much of the coastline is initially polluted. However, we note that the relative efficacy of the node rankings based on the various centralities stays the same. For example, if 10% of the coastline could be cleaned, targeting those coastline nodes with a high SSI sink centrality or betweenness centrality would most likely lead to a higher number of particles removed than targeting nodes based on other centralities, regardless of the 260 initial distribution (not shown). Theoretically, this would even mean that by measuring the amount of waste cleaned at these locations, one could infer how much pollution is present in total. Note however, that using this method we do not consider any new sources of macroplastic during the cleanup event, neither from land nor from remote sources. The resuspension time scale for macroplastic to return to the ocean is also not included in this study. Therefore, for long resuspension time scales the total number of particles may change before a steady state can be reached.

265
By using a random initial particle distribution, one can also compare the cleanup impact to cleanup efforts that would target the most polluted coastlines (i.e. when the initial distribution was known). Figure 6 provides an example of this comparison when the cleanup effort is 10% and the SSI sink centrality is used to identify cleanup locations. The impact is calculated by determining the total removed particle 'mass' when steady state is reached divided by the total particle 'mass' at initialisation.
The distribution is initialised randomly, where a fraction of the coastline is clean. Then the total particle mass removed is 270 calculated, both when targeting nodes with the highest particle weight and when targeting nodes with a high SSI sink centrality.
The advantage of knowing the initial distribution is then given by the difference of the two removal estimates ( Figure 6). If the advantage is equal to zero, targeting coastline nodes with highest particle weight has a similar impact as targeting nodes with a high SSI sink centrality.
As expected, targeting nodes with the highest particle weight will almost always have an advantage to using the node 275 ranking based on the centralities. However, there is one important take away from this comparison. The lower the fraction of clean coastline nodes, the less important it is to know the exact particle distribution. Even when targeting locations with a high particle weight, the total removed particle mass is limited by the cleanup effort and the difference with targeting locations with a high centrality ranking is minimal.

Optimal cleanup locations 280
Finally, the 10% highest ranked coastline nodes given by the various centralities are shown in Figure 7. The coastline locations indicated by the first three centralities (the Retention Rate, the Loss Rate and the Beaching Rate, Figure 7a-c) could have already been inferred from the sink distribution shown in Figure 3a. Indeed, the smallest islands lose most particles to the ocean ( Figure 7b) and Isabela Island retains most particles ( Figure 7a) and has the highest likelihood of a particle sink on land ( Figure 7c). The impact of the westward South Equatorial Current is most notable for coastline nodes with a high Source and
Interestingly, the mean westward flow does not lead to net sinks in downstream regions or net sources in upstream regions (Figures 7d and 7e). However, these net sinks and sources do seem to be related to the orientation of the islands with respect to the mean flow. The SSI sink centrality reveals that the southern region of San Cristóbal Island and Santa Cruz Island and the northern and southern region of Isabela Island are strong net sinks (Figure 7d). These locations correspond with the 'take-off' 290 locations discussed in section 3.1. As the SSI sink centrality ranking has a high impact (section 3.2), these 'take-off' locations are indeed effective cleanup targets as hypothesised.
From the impact assessment of section 3.2 we can conclude that locations with a high Retention Rate (Figure 7a

Discussion and Conclusions
In this paper we have identified which centralities are useful to optimise beach cleanup efficacy for macroplastic removal 300 based on macroplastic flow connectivity between the coastlines of the Galapagos Islands. We have identified four centralities, the SSI sink centrality, the Retention Rate, the PR in centrality and the betweenness centrality (see for definition section 2.4) that perform well. We have shown that using the most effective centrality for finding cleanup locations is a good strategy if the distribution of macroplastic on coastlines is unknown, in particular if only limited cleanup resources are available and the coastlines are strongly polluted. This means that the cleanup management can focus on repeated cleaning at a few key locations 305 instead of needing to travel to different locations each time.
Due to the increasing tourism on the Galapagos Islands (Escobar-Camacho et al., 2021) and the waste management challenges the islands face (e.g. Mestanza-Ramón et al., 2020;Wang et al., 2021), it is important to not only unravel the waste streams on the islands and the mainland, but also understand the waste streams in the ocean if the waste is not properly managed. The effective cleanup locations presented in Figures 7a, 7d, 7h and 7i therefore do not only indicate regions where a 310 cleanup can be effective, they also represent locations where good local waste management is very important.
There are however several limitations to the methodology presented in this paper. As we did not include any resuspension time scales, it is not possible to give recommendations on how often or how long specific locations should be cleaned for. Furthermore, we did not include the effect of wind and waves on the movement of macroplastic through the marine reserve and neglected the impact of tides. These limitations could to some extend be removed if observational data sets are improved for both flow patterns of macroplastic through the reserve and estimates of macroplastic retention rate on coastlines of the Galapagos Islands. These would enable us to validate and adjust the connectivity matrix constructed in this paper. Further, 320