the Creative Commons Attribution 4.0 License.
A method for quantifying correlation in the shape of oceanographic profile data
Mark Taylor
Stephanie Henson
- Final revised paper (published on 28 Apr 2026)
- Preprint (discussion started on 04 Feb 2026)
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2026-135', Winnie Chu, 04 Mar 2026
-
AC1: 'Reply on RC1', Mark Taylor, 16 Apr 2026
Thank you to the reviewer for their helpful comments. Below we have responded to each comment in order with the responses denoted by bold text.
In general, this manuscript is well-written and comprehensive. It is effectively organized, with straightforward notation and two case studies that illustrate a wide range of use cases. The authors present the application of functional data analysis (FDA) to oceanographic profiles to calculate a scalar correlation coefficient for vertically-varying data. FDA significantly addresses a shortcoming of current oceanographic analyses, in which a specific depth level must be selected or some other dimension must be fixed in order to calculate correlation. Figure 1 was especially helpful for understanding how the method works! This technique has the potential to be invaluable in many different applications, and I look forward to using it myself in the future.
My main suggestion is to better emphasize the utility of FDA, both in the introduction and when drawing conclusions from the case studies. Presenting more concrete findings that cannot be easily drawn through the use of other correlation analyses will help to underscore the importance of this method. Below, I point to specific instances in the text where I think some further explanation or analysis may bolster your argument. In addition, I have a handful of other minor comments. At the end of this review, there are a couple of remaining questions I have, which could be taken into consideration when revising but may fall outside the immediate scope of the paper.
Thank you for the opportunity to review this paper-- it was a pleasure to read.
The primary benefits of using FDA have been mentioned in the introduction (some points have been added) (now Lines 32-38): “This enables analysis of the shape of the resulting functions. Recent work has demonstrated the benefits of treating oceanographic profiles as continuous functional data objects, where depth (or pressure) serves as the indexing variable (Yarger et al., 2022; Korte-Staff et al., 2022; Kande et al., 2024). This allows for the essence of the profile shape to be captured within each datum, alongside the numerical values. This perspective is consistent with mathematical models of the ocean’s vertical structure, which capture depth dependent interactions among key variables (Steele, 1964; Fennel & Boss, 2003; Alhassan et al., 2024). Additionally, representing profiles as functions helps mitigate sampling irregularities between profiles.”
The case study interpretations have been highlighted in the introduction (Lines 68-71): “In the first case study, the relationship between potential density and dissolved oxygen is quantified, with results indicating that variability in density may be driven by changes in salinity profiles. In addition, the strength of the seasonal variability in each variable is quantified. The second case study highlights that spatial variability can dominate temporal variability for mobile platforms”.
Specific comments:
Lines 67-69: It could be meaningful to also mention conclusions drawn about the seasonal cycle, since the “seasonal strength was not quantified” in past studies (Line 61). - The following sentence has been added: “In addition, the strength of the seasonal variability in each variable is quantified.” (now Line 70)
Lines 69-70: The interpretation that “relationships between temperature and chlorophyll profiles, as well as their temporal ACFs, vary spatially” seems quite generic. Would it be possible to say something more concrete about the takeaways from the BGC-Argo profiles, like the conclusions from Lines 265-266, or to discuss more about how the correlations between the profiles bring new insight to “relationships between environmental conditions and vertical chlorophyll structure” (Line 64). - The generic statement has been replaced with a more interesting and specific result as suggested in the comment. This sentence now reads “The second case study highlights that spatial variability can dominate temporal variability for mobile platforms.” (now Lines 70-71).
Lines 113-114 / Figure 2: In Example 4, it is stated that a strong negative correlation is because of the near surface deviations in opposite directions between the two sets. Yet, in Example 2, it seems like the near-surface deviations are even more pronounced, such as the Chl (dark purple) decreasing near the surface and the Temp (dark purple) increasing near the surface. Including the “typical profile shape” in each of the Fig 2 examples can help justify these assertions since readers can then visualize the deviations more easily. - Dashed lines showing the mean profiles have been added to Figure 2. The examples have been changed (there are now six examples) to better emphasise that the method compares dependence between shapes, rather than shape similarity. The corresponding description in the text (Lines 110-116) has been updated accordingly to describe the interpretations of the new examples.
Lines 148-149: It might be useful to explicitly mention that the maximum number of Fourier basis coefficients is the number of measurements within each profile in the Methods section. - This point has been added to the following sentence in Section 2.2 (now Lines 126-128): “Each profile was converted to basis function coefficients using a fast Fourier transform (FFT), where the number of basis functions equals the number of depth levels in the profile (and therefore the maximum number of basis functions is equal to the number of depth levels).”
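The conversion described in this response can be illustrated with a minimal sketch, which is not the code used in the study. It assumes a profile sampled at regularly spaced depth levels and uses NumPy's FFT, which returns one complex coefficient per depth level. The synthetic temperature-like profile, the depth grid, and all variable names are hypothetical, and the manuscript's exact basis normalisation is not reproduced.

```python
import numpy as np

# Hypothetical regularly spaced depth grid (m); values are illustrative only.
depth = np.linspace(40.0, 400.0, 64)

# Synthetic temperature-like profile: warm upper layer decaying with depth.
profile = 8.0 + 10.0 * np.exp(-(depth - 40.0) / 80.0)

# The FFT yields one complex Fourier coefficient per depth level,
# so the number of basis functions cannot exceed the number of levels.
coeffs = np.fft.fft(profile)
n_basis = coeffs.size

# Using all coefficients, the inverse FFT recovers the profile
# up to floating-point error.
reconstructed = np.fft.ifft(coeffs).real
max_error = np.max(np.abs(reconstructed - profile))
```

Because the full set of coefficients reproduces the profile, any loss of shape information would come only from a deliberate truncation of the basis.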
Lines 157: In the Case 2 analysis, the correlation between physical variables and oxygen is compared even though there are substantially less oxygen profiles. Could this lead to biases in the correlation between oxygen and physical variables since the correlations between physical variables to other physical variables have been sampled over different times? The conclusion in Line 157 states that the oxygen distribution is driven by salinity, but the correlations between (salinity and density) and (density and oxygen) are not over the same times, although maybe the difference is small since there is an overall large number of profiles. - The impact of missing data on biased correlation coefficients was tested on simulated scalar-valued datasets. Removing the proportion of observations missing in the oxygen time series did not affect correlation coefficients by much (a difference of <= 0.02). The following sentences have been added in Section 3.1.1 (Lines 164-166): “As a check, correlation coefficients were computed using simulated scalar data with the sample sizes described above. The resulting values differed from the theoretical correlations (in the absence of missing observations) by at most 0.02. This suggests that the correlation estimates in this case study are robust to missing data.”
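A check of this kind can be sketched as follows. This is an illustrative reconstruction, not the authors' simulation: the sample size, the theoretical correlation, and the proportion of missing observations are all assumed values, and observations are removed completely at random.

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs = 2000         # assumed sample size, not the study's value
true_rho = 0.6       # assumed theoretical correlation
missing_frac = 0.3   # assumed proportion of missing observations

# Simulate correlated scalar pairs from a bivariate normal distribution.
cov = np.array([[1.0, true_rho], [true_rho, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n_obs).T

# Drop observations completely at random to mimic gaps in one series.
keep = rng.random(n_obs) >= missing_frac

r_full = np.corrcoef(x, y)[0, 1]
r_subset = np.corrcoef(x[keep], y[keep])[0, 1]

# Deviation of the reduced-sample estimate from the theoretical value.
diff_from_theory = abs(r_subset - true_rho)
```

If the gaps were related to the values themselves (for example, sensor failures under particular conditions), random removal would not be an adequate test and the estimate could be biased.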
Line 175: Why were the BGC-Argo profiles smoothed with a “15 m moving-median window” while the CE09OSPM was not smoothed? Are there specific criteria that should be used to determine whether or not to smooth the data before performing FDA? - The CE09OSPM profiles were the daily average of eight profiles, so they were already smoothed to some degree, as stated in Section 3.1.1 (Lines 156-157).
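For readers unfamiliar with the smoothing step mentioned in the comment, a moving median can be sketched as below. The 1 m depth grid (so that a 15 m window spans 15 samples), the edge padding, the synthetic chlorophyll-like profile, and the function name `moving_median` are assumptions for illustration, not details taken from the manuscript.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_median(values, window):
    """Centred moving median; edges are padded by repeating the end values."""
    half = window // 2
    padded = np.pad(values, half, mode="edge")
    return np.median(sliding_window_view(padded, window), axis=1)


# Assumed 1 m grid between 20 m and 230 m.
depth = np.arange(20.0, 231.0, 1.0)

# Synthetic chlorophyll-like profile with a subsurface maximum near 90 m.
chl = np.exp(-((depth - 90.0) / 25.0) ** 2)

# Add a single spurious spike to mimic sensor noise.
chl_spiky = chl.copy()
chl_spiky[50] += 5.0

# A 15-sample window corresponds to 15 m on this grid.
smoothed = moving_median(chl_spiky, window=15)
```

A median is used here rather than a mean because an isolated spike does not shift the median of the window, so the spike is removed without being spread over neighbouring depths.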
Lines 186-194: Other analyses might, for example, correlate the deep chlorophyll maximum with a specific isotherm (as mentioned in Line 48) when analyzing these profiles. It would be interesting to see whether the same conclusions are drawn when comparing correlations between this kind of conventional method and the vertically-resolved FDA method. The possible differences and additional insights that are drawn using FDA would strengthen the argument for using this technique. - The following paragraph has been added in Section 3.2.2 (now Lines 202-209): “Whilst previous studies have assessed relationships between scalar-valued metrics of chlorophyll profiles and water column features (Jayaram et al., 2021; Xu et al., 2022; Zampollo et al., 2023), the results presented here instead characterise the dependence between entire profiles of chlorophyll and temperature. A key implication is that the resulting correlation coefficients naturally integrate the features of chlorophyll profiles identified by Xu et al. (2022) - including peak concentration, depth, and thickness - within a single metric. This removes the need to explicitly define and detect such features, which can be particularly advantageous when profiles lack a well-defined peak. A similar interpretation applies to temperature, where the metric implicitly incorporates properties such as surface temperature, mixed layer depth, and thermocline gradient. However, the resulting coefficient does not distinguish which, if any, specific characteristics of the profiles drive the observed correlation.”
Lines 233-234: It could be helpful to speculate on how well FDA works when applied to profiles with different vertical resolutions that are interpolated/preprocessed. I assume this would be highly dependent on the interpolation/preprocessing method, but as a potential user of FDA, it would be nice to know whether the authors recommend that this technique be mostly applied to profiles with the same vertical resolution, or if there is enough flexibility that users could perform cross-platform analyses with differing resolutions (which is alluded to in Line 249). - The following sentences have been added to address this point in Section 4.3 (Lines 266-269). “In cases where profiles from different platforms have different vertical resolutions, it is important to ensure that the analysis resolution is adequate to resolve the main differences in shape. This may require either reducing the resolution of higher-resolution profiles or interpolating lower-resolution profiles to a finer grid, depending on the vertical scale of the features being investigated.”
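The two options named in the added sentences can be sketched as follows, under the assumption that both platforms sample the same depth range on regular grids. The 10 m and 2 m spacings, the synthetic profile function, and all names are hypothetical.

```python
import numpy as np

# Hypothetical platforms with different vertical resolutions.
coarse_depth = np.arange(0.0, 201.0, 10.0)   # 10 m spacing
fine_depth = np.arange(0.0, 201.0, 2.0)      # 2 m spacing


def synthetic_temperature(z):
    """Illustrative profile shape only; not observational data."""
    return 5.0 + 15.0 * np.exp(-z / 50.0)


coarse_profile = synthetic_temperature(coarse_depth)
fine_profile = synthetic_temperature(fine_depth)

# Option A: linearly interpolate the coarse profile onto the finer grid.
coarse_on_fine = np.interp(fine_depth, coarse_depth, coarse_profile)

# Option B: reduce the fine profile to the coarse grid (every fifth level).
fine_on_coarse = fine_profile[::5]
```

Option B here simply subsamples; for noisy data, averaging within each coarse bin before subsampling may be preferable. Option A cannot recover features narrower than the coarse spacing, which is why the choice depends on the vertical scale of the features being investigated.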
Lines 251-254: An additional application that could be worth mentioning is for validating GCMs and other models. In particular, FDA could help to assess model performance on depth-dependent characteristics such as the vertical extent of OMZs, which show discrepancies with the climatology (Cabré et al. 2015). Although I imagine some difficulty might arise from differences in vertical resolution. - This potential application has been added in Section 4.4 (Lines 290-293): “Similarly, outputs from a general circulation model (GCM) can be validated by pairing predicted profiles with the observations used to generate the model (Mignot et al., 2021), while similar analyses across multiple GCMs (Cabre et al., 2015) can help assess inter-model variability.”
Line 227: Could you comment more generally on the data quality needed for this technique? Is there a possibility of spurious signals from FDA, e.g., spectral leakage, Gibbs phenomenon in a profile with sharper gradients than another, or issues with noisy data? - Further details have been added to Section 4.3 (Lines 262-266): “However, as with any basis expansion, artefacts can arise when profiles contain sharp gradients or substantial noise. For example, truncated Fourier representations may exhibit oscillatory behaviour near sharp transitions (similar to the Gibbs phenomenon), and noisy observations can introduce small-scale fluctuations in the fitted curves. Consequently, some smoothing may be necessary prior to analysis, although excessive smoothing can artificially inflate correlation estimates.”
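The oscillatory behaviour near sharp transitions can be reproduced with a small sketch. It assumes an idealised step-like profile and keeps only the lowest Fourier frequencies; the number of samples, the cutoff `k_keep`, and the profile itself are illustrative choices rather than settings from the study.

```python
import numpy as np

n = 256
z = np.arange(n)

# Idealised profile with a sharp transition halfway down.
step = np.where(z < n // 2, 1.0, 0.0)

coeffs = np.fft.fft(step)

# Integer frequency index of each coefficient (negative values included).
freqs = np.fft.fftfreq(n, d=1.0 / n)

# Keep only the lowest frequencies, symmetric in sign so the result is real.
k_keep = 10
truncated = np.where(np.abs(freqs) <= k_keep, coeffs, 0.0)
approx = np.fft.ifft(truncated).real

# The truncated reconstruction overshoots above 1 and undershoots below 0
# close to the transition, which is the Gibbs-like artefact.
overshoot = approx.max() - 1.0
undershoot = -approx.min()
```

Comparing `approx` with `step` near the transition shows ripples that are absent from the original profile; with all coefficients retained, the reconstruction is exact at the sampled depths.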
Typos & clarification:
Lines 23-24: “comprehensive profiling datasets, however they are” Semicolon or period might be better than the comma – This has been changed to a semicolon.
Line 31: “continuous, shape based” As someone unfamiliar with FDA, I was unsure of what shape based meant when describing profiles. Maybe another term could be used, or a slight reorganization with Lines 32-33, which clarified the meaning of “shape based” for me. - The first sentences of this paragraph have been adapted to the following (Lines 31-34): “Functional data analysis (FDA) provides a framework for analysing data that take the form of continuous curves, where a variable of interest is expressed as a function of an indexing variable (Ramsay and Silverman, 2005). This enables analysis of the shape of the resulting functions. Recent work has demonstrated the benefits of treating oceanographic profiles as continuous functional data objects, where depth (or pressure) serves as the indexing variable”.
Line 78: “described briefly here. and a” Comma instead of period - This has been replaced with a comma.
Line 91: “linear combination of the mean basis coefficients represent a mean function” I think it would be more accurate to say the combination of coefficients multiplied by their respective basis functions are the mean function. - This line has been edited using the other reviewer’s suggestion to the following (Lines 92-93) “Using the mean basis coefficients with the basis functions represents a mean function for each set, describing a characteristic profile shape”.
Line 105: “functional shape is maintained perfectly” Maintained is unclear to me, maybe “perfectly consistent” or “change in unison” - This sentence has been removed since there was potential for confusion within that paragraph. The meaning should be clearer now.
Line 106: “out of phase” suggests a periodic signal to me, could this be rephrased? Alternatively the phrase could be deleted so the sentence reads “suggests that any deviation in …” - This phrase has been removed to help the clarity of that paragraph. The sentence now reads “Specifically, this implies that any deviation in a particular component in set X corresponds to a deviation in the opposite direction from the mean function in set Y.” (now Lines 107-108)
Line 109: “profiles, and their correlation” No comma here – the examples in Figure 2 have changed and this sentence was removed.
Line 111: Instead of a “clear relationship”, could you describe what that relationship is, as is done in Lines 112-114? - The examples and the accompanying description in the text have been changed to the following (now Lines 110-116): “Figure 2 shows several examples of datasets containing pairs of oceanographic profiles, illustrating a range of correlation structures. In Examples 1 and 2, the profiles exhibit strong positive correlation, meaning that positive deviations from the mean in one dataset are mirrored by similar deviations in the other. Notably, Example 2 highlights that this correlation coefficient reflects dependence between deviations, rather than similarity in the overall shape of the mean profiles. In contrast, Examples 3 and 4 show negligible correlation, with variations in one profile occurring independently of the other. Finally, Examples 5 and 6 demonstrate negative correlation, where positive deviations in one dataset are associated with corresponding negative deviations in the other.”
Line 174: “20m and 230 m” Space between 20 and m – a space has been added here (now Line 190).
Line 182: Include comma before “and” in “(5906204, 5904021 and 6901585)” - a comma has been added (now Line 195).
Line 207: “expected given the variables are” Should this be “given that”? - “that” added (now Line 231).
Line 229: Some clarification on why this method does not “characterise relationships between functional variables” might be worthwhile. What is the distinction between functional data and functional variables? Is a functional variable the same as the indexing variable, like depth? - The terms variable and data are being used in a general statistics context, with “variable” being a generic label for an unknown, measurable quantity, and data being a label for actual measurement. We just added “functional” to each of these since we are working with functional variables and functional data here. The indexing variable is always a scalar value. This sentence has been changed to the following (Lines 253-255): “Second, as in the analysis of scalar-valued data, the correlation coefficient presented here does not quantify the magnitude of the causal effects between functional variables in the way that the slope of a linear regression does.”
Line 262: “coefficient that accounts describes the dependence” Either “accounts for” or “describes” - removed “accounts” (now Line 303).
Line 264: “floats respectively,” Comma after “floats” - comma added (now Line 305).
Figure 3: Could the x-axis of the variables be aligned so that it is easier to compare the times when there are no profiles of oxygen? - The times on the x-axis are now aligned, as suggested.
Figure 6: Not necessary, but since the float trajectories are associated with a specific color in Fig 5, maybe there could be a colored box behind the float numbers in Fig 6 for each float. That way, at a quick glance, readers can see which float corresponds to which trajectory without having to compare the numbers. - Colours have been added to the float labels in Figure 6 and the caption has been updated to note the connection to the previous figure.
Questions:
How appropriate would it be to compare correlations of variables that are truncated differently? For instance, in the CE09OSPM case, if the physical variables had relatively low rates of missing data outside of 40-400 m but the oxygen had high rates of missing data, could one perform FDA with a larger depth range for the physical variables and then truncate later for correlation with the oxygen? Or would that lead to conclusions that are not physically consistent since the depth ranges have changed? - The following lines have been added to the end of Section 2.1 (Lines 117-121): “It is worth noting that this method could, in principle, be extended to compare sets of profiles measured over different depth ranges, although an example is not provided in the present study. This could be achieved by rescaling each range to a common interval (e.g., [0,1]) and representing the profiles using regularly spaced measurements with a consistent number of points and basis functions. The interpretation would then differ slightly, in that a deviation in one set of profiles may correspond to a deviation at a different physical depth in another set.”
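The rescaling idea in the added lines can be sketched as below. This is one possible reading, not an implementation from the study: the helper `to_common_grid`, the two depth ranges, the synthetic profiles, and the choice of 50 points are all assumptions, and linear interpolation is used for the resampling.

```python
import numpy as np


def to_common_grid(depth, values, n_points):
    """Rescale an increasing depth axis to [0, 1] and resample regularly."""
    scaled = (depth - depth[0]) / (depth[-1] - depth[0])
    common = np.linspace(0.0, 1.0, n_points)
    return common, np.interp(common, scaled, values)


# Hypothetical profiles measured over different depth ranges.
depth_a = np.linspace(0.0, 200.0, 81)
depth_b = np.linspace(40.0, 400.0, 121)
profile_a = 5.0 + 15.0 * np.exp(-depth_a / 50.0)
profile_b = 300.0 - 0.4 * depth_b

n_points = 50  # same number of levels, and hence basis functions, for both
grid_a, resampled_a = to_common_grid(depth_a, profile_a, n_points)
grid_b, resampled_b = to_common_grid(depth_b, profile_b, n_points)
```

After rescaling, position 0.5 corresponds to 100 m in the first profile but 220 m in the second, which is the change in interpretation noted in the added text.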
Would it be helpful to use less basis functions on noisier data to avoid overfitting? - It could be interesting to assess the method’s effectiveness when using only the Fourier basis functions with largest magnitude, but for now, we recommend smoothing the profiles to reduce the effects of noise.
Citation: https://doi.org/10.5194/egusphere-2026-135-AC1
-
RC2: 'Comment on egusphere-2026-135', Anonymous Referee #2, 10 Mar 2026
This manuscript was a pleasure to read. It is well written and already polished, with a scope that is clearly defined and placed well within the current literature. The authors have done a great job providing enough background to the FDA method so that the manuscript is self-consistent, I could follow the manuscript without having to constantly look over the reference material. I have only very minor comments/questions.
Comments:
Line 91: “The linear combination of the mean basis coefficients …” Do you mean “Using the mean basis coefficients with the basis functions represents a mean function …”?
Line 102 and 104: “… each basis coefficient is perfectly linear …” what do you mean by linear in this sense? I couldn’t find a supporting definition in Urbano-Leon et al. (2023), it would be good to clarify what you mean here.
Line 116: “stored in a matrix format” to “stored as a matrix”, as a matrix is a mathematical object and not a data type.
Line 117: “Each profile was converted to basis function coefficients using a FFT”. Earlier you mention that you use p basis functions, and later on you mention that there is a basis function for each depth level. I think it would be good to mention here already that you select the number of basis functions based on the number of depth levels.
Line 122: “stored in a matrix” to “stored as a matrix”.
Figure 3: Units for the colour bars would be helpful (even though they are in the figure titles).
Figure 5: Where does the data for the background chlorophyll map come from? A citation would be helpful here. Also, axes labels.
Line 262: “… correlation coefficient that accounts describes …” I think you can remove “accounts”, or change to “accounts for”
Questions:
Q1: How sensitive is the technique to data-gap filling and smoothing choices? While I imagine this is out-of-scope and the technique is likely to be robust to these choices, a comment on this would be useful.
Q2: The computed correlations between variables are over the entire water column. I can imagine that some variables are well correlated over some depths (say, in the mixed layer), and a physical relationship is lost beyond this depth level. Could this technique be useful in identifying where a coupling between two variables may breakdown? E.g. could you only the coefficients of the basis functions up to a certain depth level, and compare the correlations of using less/more coefficients?
Q3: You mention that this technique may be useful for validating output for machine learning algorithms. Could you expand on exactly how this might be done using your technique? It isn’t quite clear to me.
Citation: https://doi.org/10.5194/egusphere-2026-135-RC2
-
AC2: 'Reply on RC2', Mark Taylor, 16 Apr 2026
Thank you to the reviewer for their helpful comments. The responses to each comment are listed below in order, denoted by bold text.
This manuscript was a pleasure to read. It is well written and already polished, with a scope that is clearly defined and placed well within the current literature. The authors have done a great job providing enough background to the FDA method so that the manuscript is self-consistent, I could follow the manuscript without having to constantly look over the reference material. I have only very minor comments/questions.
Comments:
Line 91: “The linear combination of the mean basis coefficients …” Do you mean “Using the mean basis coefficients with the basis functions represents a mean function …”? - This sentence has been changed to the following (now Lines 92-93) “Using the mean basis coefficients with the basis functions represents a mean function for each set, describing a characteristic profile shape.”
Line 102 and 104: “… each basis coefficient is perfectly linear …” what do you mean by linear in this sense? I couldn’t find a supporting definition in Urbano-Leon et al. (2023), it would be good to clarify what you mean here. - I have added the following definition of what I meant by this in the first instance I used the term (now lines 102-105): “Mathematically, a correlation of +1 implies that corresponding basis coefficients are perfectly positively linearly related, such that a positive change in a basis coefficient in one dataset results in a proportional positive change in the same coefficient in the corresponding observation.”
Line 116: “stored in a matrix format” to “stored as a matrix”, as a matrix is a mathematical object and not a data type. - changed to “as a matrix” (now Line 125).
Line 117: “Each profile was converted to basis function coefficients using a FFT”. Earlier you mention that you use p basis functions, and later on you mention that there is a basis function for each depth level. I think it would be good to mention here already that you select the number of basis functions based on the number of depth levels. - That sentence (now Lines 126-128) has been edited to make this clearer and now says the following: “Each profile was converted to basis function coefficients using a fast Fourier transform (FFT), where the number of basis functions equals the number of depth levels in the profile (and therefore the maximum number of basis functions is equal to the number of depth levels).”
Line 122: “stored in a matrix” to “stored as a matrix”. - changed to “as a matrix” (now Line 136).
Figure 3: Units for the colour bars would be helpful (even though they are in the figure titles). - The units have been moved to above the colour bars.
Figure 5: Where does the data for the background chlorophyll map come from? A citation would be helpful here. Also, axes labels. - A citation has been added to the caption and longitudes and latitudes have been added to the figure.
Line 262: “… correlation coefficient that accounts describes …” I think you can remove “accounts”, or change to “accounts for” - removed “accounts” (now Line 303).
Questions:
Q1: How sensitive is the technique to data-gap filling and smoothing choices? While I imagine this is out-of-scope and the technique is likely to be robust to these choices, a comment on this would be useful. - The following sentences have been added to the first paragraph of Section 2.2 (now Lines 132-135): “It is worth noting that this study does not explore the effect of smoothing on the computed correlation coefficients. However, the approach is expected to be robust, provided that the general shape of the profiles is sufficiently clear to allow identification of meaningful water column features. Where profiles exhibit substantial noise relative to the underlying signal, smoothing is recommended prior to computing correlation coefficients.”
Q2: The computed correlations between variables are over the entire water column. I can imagine that some variables are well correlated over some depths (say, in the mixed layer), and a physical relationship is lost beyond this depth level. Could this technique be useful in identifying where a coupling between two variables may breakdown? E.g. could you only the coefficients of the basis functions up to a certain depth level, and compare the correlations of using less/more coefficients? - Yes, profiles with a reduced range of depths could be used, although the basis functions don’t each relate to a certain depth. The following sentence has been added to Section 2.1 (Lines 121-123): “This approach could also help identify depth ranges over which two variables are correlated, such as within the mixed layer, by iteratively repeating the analysis with modified profile segments to detect where the dependence in profile shape breaks down.”
Q3: You mention that this technique may be useful for validating output for machine learning algorithms. Could you expand on exactly how this might be done using your technique? It isn’t quite clear to me. - The following sentences have been added in Section 4.4 to address this point (Lines 287-290): “The method could also, in principle, be used to assess reconstructed profile shapes from machine learning predictions (Sauzede et al., 2016; Chen et al., 2022; Pietropolli et al., 2023; Mignot et al., 2023). Although such models are often evaluated using pointwise metrics (e.g., RMSE), the proposed approach provides a complementary means of comparing predicted and observed profiles, potentially offering further insight into the representation of vertical water column structure.”
Citation: https://doi.org/10.5194/egusphere-2026-135-AC2
-
AC2: 'Reply on RC2', Mark Taylor, 16 Apr 2026
In general, this manuscript is well-written and comprehensive. It is effectively organized, with straightforward notation and two case studies that illustrate a wide range of use cases. The authors present the application of functional data analysis (FDA) to oceanographic profiles to calculate a scalar correlation coefficient for vertically-varying data. FDA significantly addresses a shortcoming of current oceanographic analyses, in which a specific depth level must be selected or some other dimension must be fixed in order to calculate correlation. Figure 1 was especially helpful for understanding how the method works! This technique has the potential to be invaluable in many different applications, and I look forward to using it myself in the future.
My main suggestion is to better emphasize the utility of FDA, both in the introduction and when drawing conclusions from the case studies. Presenting more concrete findings that cannot be easily drawn through the use of other correlation analyses will help to underscore the importance of this method. Below, I point to specific instances in the text where I think some further explanation or analysis may bolster your argument. In addition, I have a handful of other minor comments. At the end of this review, there are a couple of remaining questions I have, which could be taken into consideration when revising but may fall outside the immediate scope of the paper.
Thank you for the opportunity to review this paper-- it was a pleasure to read.
Specific comments:
Lines 67-69: It could be meaningful to also mention conclusions drawn about the seasonal cycle, since the “seasonal strength was not quantified” in past studies (Line 61).
Lines 69-70: The interpretation that “relationships between temperature and chlorophyll profiles, as well as their temporal ACFs, vary spatially” seems quite generic. Would it be possible to say something more concrete about the takeaways from the BGC-Argo profiles, like the conclusions from Lines 265-266, or to discuss more about how the correlations between the profiles bring new insight to “relationships between environmental conditions and vertical chlorophyll structure” (Line 64).
Lines 113-114 / Figure 2: In Example 4, it is stated that a strong negative correlation is because of the near surface deviations in opposite directions between the two sets. Yet, in Example 2, it seems like the near-surface deviations are even more pronounced, such as the Chl (dark purple) decreasing near the surface and the Temp (dark purple) increasing near the surface. Including the “typical profile shape” in each of the Fig 2 examples can help justify these assertions since readers can then visualize the deviations more easily.
Lines 148-149: It might be useful to explicitly mention that the maximum number of Fourier basis coefficients is the number of measurements within each profile in the Methods section.
Lines 157: In the Case 2 analysis, the correlation between physical variables and oxygen is compared even though there are substantially less oxygen profiles. Could this lead to biases in the correlation between oxygen and physical variables since the correlations between physical variables to other physical variables have been sampled over different times? The conclusion in Line 157 states that the oxygen distribution is driven by salinity, but the correlations between (salinity and density) and (density and oxygen) are not over the same times, although maybe the difference is small since there is an overall large number of profiles.
Line 175: Why were the BGC-Argo profiles smoothed with a “15 m moving-median window” while the CE09OSPM was not smoothed? Are there specific criteria that should be used to determine whether or not to smooth the data before performing FDA?
Lines 186-194: Other analyses might, for example, correlate the deep chlorophyll maximum with a specific isotherm (as mentioned in Line 48) when analyzing these profiles. It would be interesting to see whether the same conclusions are drawn when comparing correlations between this kind of conventional method and the vertically-resolved FDA method. The possible differences and additional insights that are drawn using FDA would strengthen the argument for using this technique.
Lines 233-234: It could be helpful to speculate on how well FDA works when applied to profiles with different vertical resolutions that are interpolated/preprocessed. I assume this would be highly dependent on the interpolation/preprocessing method, but as a potential user of FDA, it would be nice to know whether the authors recommend that this technique be mostly applied to profiles with the same vertical resolution, or if there is enough flexibility that users could perform cross-platform analyses with differing resolutions (which is alluded to in Line 249).
Lines 251-254: An additional application that could be worth mentioning is the validation of GCMs and other models. In particular, FDA could help to assess model performance on depth-dependent characteristics such as the vertical extent of OMZs, which show discrepancies with the climatology (Cabré et al. 2015), although I imagine some difficulty might arise from differences in vertical resolution.
Line 227: Could you comment more generally on the data quality needed for this technique? Is there a possibility of spurious signals from FDA, e.g., spectral leakage, Gibbs phenomenon in a profile with sharper gradients than another, or issues with noisy data?
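To make the Gibbs concern concrete, here is a minimal synthetic sketch. It is not the authors' implementation: the 1 m depth grid, the two idealised layer profiles, the choice of 10 harmonics, and the helper names `fourier_design` and `fourier_fit` are all assumptions made for illustration. It fits the same truncated Fourier basis to a profile with abrupt edges and to one with gradual edges, then reports how far each fit overshoots the true maximum.

```python
import numpy as np

def fourier_design(z, n_harmonics, period):
    """Design matrix with a constant column plus sin/cos pairs."""
    cols = [np.ones_like(z)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols.append(np.sin(w * z))
        cols.append(np.cos(w * z))
    return np.column_stack(cols)

def fourier_fit(z, y, n_harmonics):
    """Least-squares fit of y(z) on the truncated Fourier basis."""
    # Assumed period: the sampled depth range plus one grid step.
    period = (z[-1] - z[0]) + (z[1] - z[0])
    X = fourier_design(z, n_harmonics, period)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Synthetic profiles (hypothetical, not from either case study).
z = np.linspace(0.0, 400.0, 401)                         # 1 m depth grid
sharp = np.where((z >= 150.0) & (z < 250.0), 1.0, 0.0)   # layer with abrupt edges
smooth = np.exp(-0.5 * ((z - 200.0) / 40.0) ** 2)        # layer with gradual edges

# Overshoot of the fitted maximum relative to the true maximum.
sharp_overshoot = fourier_fit(z, sharp, 10).max() - sharp.max()
smooth_overshoot = fourier_fit(z, smooth, 10).max() - smooth.max()
print(f"overshoot, sharp profile:  {sharp_overshoot:.3f}")
print(f"overshoot, smooth profile: {smooth_overshoot:.3f}")
```

In this toy setting the sharp-edged profile overshoots visibly while the smooth one does not, so two variables with different gradient sharpness would carry different fitting artefacts into their coefficients. Whether this matters for the correlation coefficient in practice is the question I would like the authors to comment on.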
Typos & clarification:
Lines 23-24: “comprehensive profiling datasets, however they are” Semicolon or period might be better than the comma
Line 31: “continuous, shape based” As someone unfamiliar with FDA, I was unsure of what shape based meant when describing profiles. Maybe another term could be used, or a slight reorganization with Lines 32-33, which clarified the meaning of “shape based” for me.
Line 78: “described briefly here. and a” Comma instead of period
Line 91: “linear combination of the mean basis coefficients represent a mean function” I think it would be more accurate to say that the mean coefficients, each multiplied by its respective basis function and summed, give the mean function.
Line 105: “functional shape is maintained perfectly” Maintained is unclear to me, maybe “perfectly consistent” or “change in unison”
Line 106: “out of phase” suggests a periodic signal to me; could this be rephrased? Alternatively, the phrase could be deleted so the sentence reads “suggests that any deviation in …”
Line 109: “profiles, and their correlation” No comma here
Line 111: Instead of a “clear relationship”, could you describe what that relationship is, as is done in Lines 112-114?
Line 174: “20m and 230 m” Space between 20 and m
Line 182: Include comma before “and” in “(5906204, 5904021 and 6901585)”
Line 207: “expected given the variables are” Should this be “given that”?
Line 229: Some clarification on why this method does not “characterise relationships between functional variables” might be worthwhile. What is the distinction between functional data and functional variables? Is a functional variable the same as the indexing variable, like depth?
Line 262: “coefficient that accounts describes the dependence” Either “accounts for” or “describes”
Line 264: “floats respectively,” Comma after “floats”
Figure 3: Could the x-axis of the variables be aligned so that it is easier to compare the times when there are no profiles of oxygen?
Figure 6: Not necessary, but since the float trajectories are associated with a specific color in Fig 5, maybe there could be a colored box behind the float numbers in Fig 6 for each float. That way, at a quick glance, readers can see which float corresponds to which trajectory without having to compare the numbers.
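Regarding the Line 91 comment above, the relationship I have in mind can be written out as follows. The notation is assumed for illustration and may not match the manuscript: N is the number of profiles, K the number of basis functions, \phi_k the k-th basis function, and c_{ik} the k-th coefficient of profile i.

```latex
\bar{x}(z) = \sum_{k=1}^{K} \bar{c}_k \, \phi_k(z),
\qquad
\bar{c}_k = \frac{1}{N} \sum_{i=1}^{N} c_{ik}
```

That is, the mean function is the basis expansion evaluated with the mean coefficients, rather than a linear combination of the coefficients alone.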
Questions:
How appropriate would it be to compare correlations of variables that are truncated differently? For instance, in the CE09OSPM case, if the physical variables had relatively low rates of missing data outside of 40-400 m but the oxygen had high rates of missing data, could one perform FDA with a larger depth range for the physical variables and then truncate later for correlation with the oxygen? Or would that lead to conclusions that are not physically consistent since the depth ranges have changed?
Would it be helpful to use fewer basis functions on noisier data to avoid overfitting?
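As a rough illustration of why I ask this last question, the sketch below fits a noisy synthetic profile with a small and a large Fourier basis. Everything here is assumed for illustration (the Gaussian "true" profile, the noise level of 0.1, the 5 and 50 harmonics, and the helper names `fourier_fit` and `rmse`); it is not drawn from the manuscript's data or code.

```python
import numpy as np

def fourier_fit(z, y, n_harmonics):
    """Least-squares fit of y(z) on a constant plus n_harmonics sin/cos pairs."""
    period = (z[-1] - z[0]) + (z[1] - z[0])  # assumed period: range plus one step
    cols = [np.ones_like(z)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols += [np.sin(w * z), np.cos(w * z)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

def rmse(a, b):
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)                       # fixed seed for repeatability
z = np.linspace(0.0, 400.0, 401)                     # synthetic 1 m depth grid
truth = np.exp(-0.5 * ((z - 200.0) / 40.0) ** 2)     # hypothetical smooth profile
noisy = truth + rng.normal(0.0, 0.1, size=z.size)    # add measurement-like noise

fit_few = fourier_fit(z, noisy, 5)     # 11 coefficients
fit_many = fourier_fit(z, noisy, 50)   # 101 coefficients

# Residual against the noisy data versus error against the underlying signal.
resid_few, resid_many = rmse(fit_few, noisy), rmse(fit_many, noisy)
err_few, err_many = rmse(fit_few, truth), rmse(fit_many, truth)
print(f"residual vs noisy data: few={resid_few:.3f}, many={resid_many:.3f}")
print(f"error vs true profile:  few={err_few:.3f}, many={err_many:.3f}")
```

The larger basis follows the noisy samples more closely but ends up further from the underlying profile, which is the overfitting behaviour I have in mind. Guidance on choosing the number of basis functions relative to the noise level would be useful to prospective users.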