Clustering analysis of the Sargassum transport process: application to stranding prediction in the Lesser Antilles
- 1LARGE, University of the French West Indies, 97157 Pointe-à-Pitre, Guadeloupe, France
- 2LAMIA, University of the French West Indies, 97157 Pointe-à-Pitre, Guadeloupe, France
Abstract. The massive Sargassum algae strandings observed over the past decade are a new natural hazard that currently affects the island states of the Caribbean region (human health, environmental damage, and economic losses). This study aims to improve the prediction of the surface current dynamics leading to beachings in the Lesser Antilles, using clustering analysis methods. The input surface currents including windage effect were derived from the Mercator model and the Hybrid Coordinate Ocean Model (HYCOM). Past daily observations of Sargassum stranding on Guadeloupe coasts were also integrated. Four representative current regimes were identified for both Mercator and HYCOM data. The analysis of the backward current sequences leading to strandings showed that the recurrence of two current regimes is related to the beaching peaks observed in March and in August, respectively. A decision tree classifier was built; its accuracy reaches 73.3 % with 0.04°-scale HYCOM data and 50.8 % with 0.08°-scale Mercator data. This significant accuracy difference highlights the need for very small-scale current data (i.e., below the 5 km scale) to assess coastal Sargassum hazard in the Lesser Antilles. The present clustering analysis predictive system would help improve the management of this risk in the islands of the region.
Didier Clément Bernard et al.
Status: final response (author comments only)
-
RC1: 'Comment on os-2021-109', Anonymous Referee #1, 15 Dec 2021
Review of manuscript by Bernard D et al
This study deals with an analysis of the surface circulation in the vicinity of the Lesser Antilles in order to determine the situations favoring Sargassum stranding. It uses clustering of the surface current fields from two reanalysis datasets to identify recurring patterns. Based on these clusters and stranding observations, a decision tree classifier is built to forecast stranding probability. This is an original work addressing the drivers of stranding based on state of the art datasets, and dedicated methods. While the performance of the classifier is rather low, the topic and methodological approach is appealing and worth a publication.
The main remarks I have are
1) The discussion of the appropriate time and space scale must be introduced in the methods to justify the choices made (30 days sequences, areas, monthly probability).
2) Strandings occurring in Guadeloupe are not affected by the dynamics of zones LA3 and LA1, but only by LA2. You should take it into account in the study.
Although interesting, in its present form the overall approach needs to be better explained and the text is quite difficult to read. The language can be significantly improved and small typos removed.
Detailed comments
The abstract
L15 “small scale”. Better use high resolution, as this is the crucial model configuration choice.
L20. Windward vs leeward. The only study citing this point is Marechal et al 2017.
L23 typo: also observed
L41 wording: The volumes to be collected
L55 There is also Jouanno et al 2020 (Env Res Letters) on the role of rivers.
L67 Unclear : The probability of a set of data…
L94 u and v : better zonal and meridional
L91-L100 : You need to better describe the configuration of the reanalysis datasets you are using. In particular you should give the forcing fields (winds etc) and the data assimilated in each model. This is important as Mercator and Hycom models assimilate the same type of data (altimetry in particular), which largely explains their consistency in terms of large scale patterns (ie clusters).
L105. See also Berline et al 2020
L106 Berline et al 2020, not 2017
L112. Check the year for Putman et al
L121 Past tense is expected
L125: 2.5.1: useless subsection
L132-133 Wording. Better : can lead to group different physical situations…
L134. Unclear : “ From L2”
L139. Unclear terminology : Meshes. Better grid points. Need to be homogeneous throughout the text (grid cell at L205, grid points at L209)
The lines 141-150 on clustering should be grouped in a dedicated section.
The clarity and wording of this whole section must be improved. For instance “ The similarity of the most similar fields (...) “
Explanation should be given for one zone to avoid redundancy. A schematic would help.
I do not clearly understand the algorithm for clustering. What is the role of the average divergence ?
L152-162 (Section 2.6) : wording and clarity should be improved. The word ‘backward’ is misleading here as there is no time integration in your analysis. You simply take the 30 days before one particular stranding event.
What justifies the 30 days duration? Transport? Then is it consistent with the areas LA1, 2, 3?
L158-162. Unclear. “optimal matching methods” : which one ?
You compute a distance metric between the sequences of cluster numbers from previous section?
L161 Wald’s or Ward ?
L164 “At a given location” : which one ?
L168 : Why monthly ? Are the stranding observations autocorrelated at this scale ?
L170 L172 L176. Wording : “which “ can be removed
L184 I understand you compute the average of P over an ensemble j pertaining to R. Use this notation then.
L191 Is it including windage ?
L191 “intensities “: better magnitude
L205 Mercator and Hycom outputs are not given on the same grid. Then how do you compare them ? Should be explained in methods.
L205 “At sea” : you mean offshore
L226 parangon
L241-242 should be in methods
L245 “most important” : better highest
L273 Where is the central Atlantic region used to quantify offshore abundance ? Add it on the map.
L276-281 : Avoid redundancy.
L317 (Discussion) I suggest splitting this section. One for surface current and one for Sargassum stranding (L343-352).
L334 North Current: You mean NEC?
L344 “Out of sync”?
L345 remove “and”
L357 “independent variables”: you mean explanatory
L372 “ocean current 3D models”: better ocean current reanalysis.
L404. This discussion of the appropriate time and space scale must be introduced earlier to justify the choices you made (30 days sequences, areas, monthly probability)
Figures
Fig3 current magnitude
Fig5 Why not show the relative difference of magnitude, to see if Hycom is higher than Mercator for instance? What is the grid shown?
Fig7 and 8. Mention Parangon as in the text. How is the stream function computed?
For all figures showing clusters, in the tables and text: for clarity, you should rename the clusters from Mercator and Hycom to make similar patterns match. As HC1 is consistent with MC2, rename MC2 into MC1, etc.
This similarity of patterns is expected given the similarity of data assimilated into the two models.
Fig13 These are clusters of sequences. Mention it. Use same color as in figs 11-12
Fig 15. Time index: What is the corresponding date?
Table 3. Use cluster names as in figures (MC1, HC1, ..)
Table 6. Add recall.
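The referee's reading of Section 2.6 (comment on L152-162), that each "backward" sequence is simply the 30 daily regime labels preceding a stranding day, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the integer labels 1 to 4 (standing for the four current regimes mentioned in the abstract) and the synthetic label series are all assumptions made for the example.

```python
from collections import Counter


def preceding_sequences(daily_labels, stranding_days, length=30):
    """Map each stranding day index d to the labels of days d-length .. d-1.

    Stranding days without a full `length`-day history are skipped.
    """
    sequences = {}
    for d in stranding_days:
        if d - length < 0:
            continue  # not enough history before this event
        sequences[d] = daily_labels[d - length:d]
    return sequences


def regime_frequencies(sequence):
    """Relative frequency of each regime label within one sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return {label: counts[label] / n for label in sorted(counts)}


# Synthetic daily regime labels (hypothetical, not data from the study).
labels = [(i % 4) + 1 for i in range(60)]
sequences = preceding_sequences(labels, stranding_days=[10, 35, 59])

for day, seq in sequences.items():
    print(day, regime_frequencies(seq))
```

Written this way, the sequence is a plain look-back window with no time integration, which is the point the referee raises about the word "backward".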
-
AC1: 'Reply on RC1', Didier Bernard, 11 Mar 2022
Dear referee 1,
We thank you for the attention that you paid to this review and for your helpful comments and suggestions.
Firstly, in the introduction of your report, you mentioned the “rather low performance of the classifier”. Following this remark, we made some major changes to strengthen the evaluation of the decision tree classifier and to improve its recall scores. To strengthen the performance evaluation, the testing period was extended from the first four months of 2021 (i.e., from January 2021 to April 2021) to the full year of 2021 including seasonal variations of the offshore Sargassum abundance. To improve the recall score of the classifier, the module A producing the monthly probability of beaching was replaced by a new module based on satellite observations which produces the weekly probability to reach the maximum observed cumulative floating algae density in an area of 100 km radius offshore Guadeloupe. The performance evaluation of the classifier was also extended by adding three temporal uncertainty ranges around the decision day, respectively: +/-1 days, +/-2 days, +/-3 days. While the classifier may reproduce 61.5% of the observed beachings in 2021 with an accuracy lower than one day (this value reached 41.7% with the old module A and the limited testing period of four months), this recall score reaches 74.4% at +/-3 days accuracy.
Please find in the attached file our answers to your remarks (in bold). The proposed changes in the text are marked in red.
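The recall with a temporal uncertainty range mentioned in this reply can be illustrated with a short sketch. It assumes one plausible definition, which the reply does not spell out: an observed beaching day counts as reproduced if at least one predicted day falls within +/- `tolerance` days of it. The function name and the day indices are hypothetical.

```python
def recall_with_tolerance(observed_days, predicted_days, tolerance):
    """Fraction of observed days matched by a prediction within +/- tolerance days."""
    if not observed_days:
        raise ValueError("at least one observed event is required")
    predicted = set(predicted_days)
    hits = 0
    for day in observed_days:
        # Check every day in the window [day - tolerance, day + tolerance].
        if any((day + offset) in predicted
               for offset in range(-tolerance, tolerance + 1)):
            hits += 1
    return hits / len(observed_days)


# Hypothetical day-of-year indices (not data from the study).
observed = [10, 45, 46, 120, 200]
predicted = [11, 45, 123, 300]

for tol in (0, 1, 2, 3):
    print(f"+/-{tol} days: recall = {recall_with_tolerance(observed, predicted, tol):.2f}")
```

Under this definition recall can only stay constant or increase as the window widens, which is consistent with the rise from 61.5% to 74.4% reported above.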
-
CC1: 'Comment on os-2021-109', Nathan Putman, 20 Dec 2021
Throughout the text: “Sargassum” is the genus name of the pelagic, brown algae discussed. Accordingly, it should be italicized wherever used.
Lines 79-80: Citing Putman et al. 2018 (already cited elsewhere) would be appropriate here as they model the % of Sargassum that follows these routes.
Lines 91-96: I am confused what HYCOM output you are using. What is reported here (GOMu0.04/expt_90.1m000 version) appears to only extend from latitude 18N to 32N and is thus outside of the area of this study. Can you please clarify? Did you run your own HYCOM at 1/25 degree resolution? The Global Analysis of HYCOM uses a grid of 0.04 degree longitude and 0.08 degree latitude, is this what you actually used?
Line 101-103: See also, Putman NF & He R (2013) Tracking the long-distance dispersal of marine organisms: sensitivity to ocean model resolution. Journal of the Royal Society Interface, 10:20120979
Line 104: I am confused, what is the basis for assuming the “optimal factors of Cw = 0.01”? Surely this is not the case based on data from Johns et al. 2020, which showed no evidence that a windage factor of 1% was appropriate for Sargassum. They simply picked the “reasonable” value that has been used in the earlier publication Putman et al. 2018. The value of 1% was chosen by Putman et al. 2018 to test the sensitivity of model predictions to windage and did not claim that it was optimal (or even somewhat correct). Work since that point has been conducted which seems to suggest that the situation is somewhat more complicated, see Putman et al. 2020 (already cited elsewhere) and Johnson, D.R., Franks, J.S., Oxenford, H.A. and Cox, S.A.L., 2020. Pelagic Sargassum Prediction and Marine Connectivity in the Tropical Atlantic. Gulf and Caribbean Research, 31(1), pp.GCFI20-GCFI30. Whether the best windage value is 0, 0.5%, 1%, 3% or something else likely depends on the oceanographic region and the ocean circulation model and wind product used.
Line 112: I think that “Putman et al. (2016)” should be “Putman et al. (2018)”
Line 206: change to “current speed differences are relatively small…”
Line 334: change to “…due to the North Equatorial Current…”
Lines 360-361: Another issue may be that ocean current patterns may be highly important for “non-beaching” events (e.g., the currents are directed so that material doesn’t reach the island), but for Sargassum to beach there needs to be Sargassum present. Thus, currents might be in a state to transport material to the island, but if there is no Sargassum present, there can be no beaching. Am I correct that this predictive model is based only on circulation/wind and not Sargassum abundance/coverage/distribution?
Lines 400-406: You may wish to draw reader’s attention to the fact that there is considerable interest in monitoring and predicting coastal inundation by Sargassum. For instance, you may note how your smaller-scale study’s goals might enhance the region-wide efforts such as the Sargassum Inundation Reports (SIR) discussed here:
Trinanes J, Putman NF, Goni G, Hu C, Wang M (2021) Monitoring pelagic Sargassum inundation potential for coastal communities. Journal of Operational Oceanography 14, in press (published online).
-
AC2: 'Reply on CC1', Didier Bernard, 14 Mar 2022
Dear Dr Nathan Putman, we thank you for your helpful comments and suggestions.
Your last two remarks, dealing with the integration of Sargassum abundance in the decision support system and your suggested reference “Trinanes et al. (2021)”, were very useful to strengthen and improve the predictive model. The predictive model originally based on circulation/wind/past-beachings was modified with a new module based on satellite observations which produces the weekly probability to reach the maximum observed cumulative floating algae density in an area of 100 km radius offshore Guadeloupe. Please find in the attached file our answers to your comments.
-
RC2: 'Comment on os-2021-109', Anonymous Referee #2, 16 Feb 2022
Review of manuscript: Clustering analysis of the Sargassum transport process: application to stranding prediction in the Lesser Antilles by Bernard et al
1. General comments
The authors present a very interesting framework and method to better understand the ocean dynamics behind the strandings of Sargassum in the Lesser Antilles and to estimate their occurrence. The methodology presented is quite complex as well. A better explanation of the methodology is necessary, especially for the oceanographic audience of this journal to adequately follow and understand this interesting study. Section 2 I believe can be improved by making it easier for the reader to follow, especially the non-experts in these clustering methods. The technical details necessary for the reader to follow the study should be clearly described and the other details can be added as a section in supplementary material. A schematic of the method is given in fig. 2 for Section 2.7, but maybe a schematic for sections 2.5 and 2.6 could help too. In the discussion, I found that a comment was missing on the impact (if any) that considering processes other than windage (e.g. presence of nutrients, sinking of Sargassum, waves) could have on the understanding of the Sargassum strandings.
2. Specific comments
L23: “Strandings were also be observed in Africa (Széchy et al., 2012).” Why mention the occurrence of strandings in Africa? Any connection with the Caribbean strandings? Did the Sargassum strandings also cause natural hazards on the African coast?
L61: “MODIS AFAI satellite images”, please define/describe
L66-68: Could be useful to include some references of the methodology here.
L69: A general definition of predictive modelling is missing in the introduction for readers who do not know this method and how it compares with a conventional forecast. It could be included here, for example (Line 69).
L75-76: “To optimize the final partitioning, an additional metric based on the Kullback Leiber divergence (Kulback and Leibler, 1951, Biabiany et al., 2020) will be included” : quite specific on the methodology, for readers not familiarised with this method it could be hard to follow in this point in the introduction. More general details can be given, or this point can be moved to the methods section.
L82-83: “This ocean region corresponds to the CA and TA1 boxes in Johns (2020)”, maybe say approximately corresponds, as not exactly the same. The LA3 region goes further south and LA2 and LA3 go until -55°E, whereas region TA1 goes till -50°E. Most importantly, why choose the study regions to correspond to CA and TA1 boxes from Johns (2020)?
L96-96: From what I understand this dataset was not used before to simulate Sargassum trajectories, but was it used in any other Lagrangian study? Any validation studies done on the velocity outputs of this dataset?
L101: “Comparison between HYCOM and Mercator results” Do you mean the results from the Sargassum trajectories or a comparison of the velocity outputs of these datasets?
Section 2.3: What is the spatial and temporal resolution of the ERA-5 wind dataset?
L128: “Ward's method for HAC” Please explain and add reference.
L129-130: “with its own expertise on the input data” What do you mean by this? Please provide further explanations. Also, the new method name is not specified at all in section 2.5.1, and specifying it would help the reader to better follow the methodology. This section is only 5 lines long, more details on the process of the clustering methods could be given.
L132: “L2 clustering methods…” Please explain L2 in this context.
L133: “gatherings of different physical situations”. What do you mean by this? Maybe give an example of physical situations for this particular study scenario. You refer to this in the next phrase as “biases”. Is there then a tendency towards a specific physical situation?
L134: “spatial variability” : At what scales?
L139-L140: “The analyzed daily fields include a total of 14 279 meshes (4 282 meshes in LA1, 3 407 meshes in LA2 and 4 536 meshes in LA3). The remainder corresponds to land areas.” What do you refer to here with meshes? The land areas then correspond to Sargassum strandings? For clarity, these details could be described in a dataset section better, rather than in the middle of the methods description.
L141-142: “The second step was to group the information carried by the daily current velocity fields conditionally to the three given zones into histograms.” More details on histograms, for example binning, velocity data from HYCOM and Mercator?
L158: “optimal matching methods” Please explain and add some references.
L158: “dividing the population” what do you refer to exactly here by population? Population of strandings or backward sequences?
L160-162: Please give further details (maybe as supplementary material?) and add more references.
L186-L187: “was experimented on the first 120 days…”. Was experimented to…? Recall aim of doing these tests. Also why 120 days and during this period of time? Could results vary a lot if done during the northern hemisphere Summer months instead?
L190: Can maybe start section 3.1 giving some context on why this analysis is done.
L191: “90% of them remain below 0.65 m/s”. For both models exactly same?
L193: Figure 3 distributions how are they calculated? With histograms? Kernel Density Estimator or something else applied to obtain this “smooth” distribution curves?
L194-L195: 5 times greater for both models?
L207-208: what are the implications of these differences?
L272-L273: “The monthly evolution of observed stranding days on the Guadeloupe coasts, the monthly evolution of Sargassum abundance over the Central Atlantic region (SaWS, https://optics.marine.usf.edu/projects/SaWS.html)” I imagine it should be: “Guadeloupe coasts and the monthly evolution…”, to make clear you are talking about two datasets. The observed stranding dataset is mentioned in the dataset section (section 2.4), but not the Sargassum abundance over the Central Atlantic region.
3. Technical corrections
Please write Sargassum in italics, like it is done in other studies like for example Johns et al., (2020), as you are writing its scientific name, and even if it is just the genus in this case.
L10: “including windage effect”: gives the impression the HYCOM and Mercator datasets already include the windage effect, when you actually added it separately. Please improve phrasing.
L20: “LA received…” to “The LA received…”
L23: “…were also be observed…” to “…were also observed…”
L46: Improve sentence, e.g. “… multi-year reanalysis of wind and current, and numerical models, both the role of subsurface nutrient supply and surface current transport were estimated.”
L50: “Sargassum Watch System SaWS” to “Sargassum Watch System (SaWS)”
L83: “in Johns (2020)” et al. missing.
L92: Please define the abbreviations HYCOM and NCODA (HYCOM defined in abstract but not in the main text)
L94: Please define 12Z fields.
L94-95: “u and v components” to “zonal (u) and meridional (v) velocity components”
L101-102: “Comparison…in the focused region” to “A comparison.. in the study region.”
L107: “Sargassum raft transport”, maybe trajectories instead of transport is more appropriate?
L112-113: “The region analyzed in the present work corresponds to the CA - TA1 region defined in Johns et al. (2020)” already mentioned in L82-83, is it necessary to repeat here?
L116-117: “This period includes 730 observational days with 110 days of observed strandings.” , phrasing not clear do you mean that out of the total 730 days of data, only 110 days included observations of Sargassum strandings?
L137: “above Barbados island” to “above the island of Barbados”
L142: “The similarity of the most similar fields is estimated per pair..” Improve phrasing. What do you refer to exactly? Per pair of Sargassum meshes?
L148: “The SaMk index” to “The Silhouette (SaMk) index”
L151: Define all variables of equation 2!
L153-154: Improve phrasing.
L156: “January 2020” to “January 2019”
L165-L166: “ surface currents with windage effects (Mercator, HYCOM and ERA-5)” to “ surface currents (Mercator and HYCOM) with windage effects (ERA-5)”
L186-L187: “The proposed tree in Fig. 2…”. Move to new line, to separate it from the phrase explaining the terms in equation (4)
L191: “do not exceed 2.57 m/s”. Maybe better to say the maximum is 2.57 m/s, if not it sounds like 2.57 m/s is a key velocity value that should not be exceeded for some reason.
L193-L194: add at the end which model each value corresponds to, e.g. “.. for HYCOM and Mercator, respectively.”
L205: “Globally, at sea, the current..” Is it necessary to specify at sea? What do you exactly mean with at sea here, open ocean?
L210: “into three magnitude groups of 45°” to “into three magnitude groups of 45° intervals”?
L215: Improve phrasing, gives the impression you used equation (1) to perform the clustering.
L244: “Table 3 shows results” to “Table 3 shows the results”
L297: “remain with probabilities” add probabilities of… Help the reader follow better your study, recalling details.
L317: Improve wording of Section 4.2 title, for example can simply remove “hazard”
L320: “retroflexion” to “retroflection”
L345 “The first peak of strandings, in March and seems..” to “The first peak of strandings, in March, seems..”
L373: Write as K-Means, and also in L217, write method in the same way.
4. Figures and tables
Figure 2: Describe BASE abbreviation as in L175.
Figures 4, 9 and 10: x-axis tick labels not clear, please improve.
Table 1: Header mean to Mean
Table 5: Caption mention what n and % refer to exactly.
-
AC3: 'Reply on RC2', Didier Bernard, 16 Mar 2022
Dear referee 2,
We thank you sincerely for your comments which helped us to improve the quality of the paper.
Firstly, we would like to draw your attention to some major changes we proposed to strengthen the evaluation of the decision tree classifier and to improve its recall scores. To strengthen the performance evaluation, the testing period was extended from the first four months of 2021 (i.e., from January 2021 to April 2021) to the full year of 2021 including seasonal variations of the offshore Sargassum abundance. To improve the recall score of the classifier, the module A producing the monthly probability of beaching was replaced by a new module based on satellite observations which produces the weekly probability to reach the maximum observed cumulative floating algae density in an area of 100 km radius offshore Guadeloupe. The performance evaluation of the classifier was also extended by adding three temporal uncertainty ranges around the decision day, respectively: +/-1 days, +/-2 days, +/-3 days. While the classifier may reproduce 61.5% of the observed beachings in 2021 with an accuracy lower than one day (this value reached 41.7% with the old module A and the limited testing period of four months), this recall score reaches 74.4% at +/-3 days accuracy.
Please find in the attached file our answers to your remarks.