Using empirical orthogonal functions derived from remote sensing reflectance for the prediction of concentrations of phytoplankton pigments

Introduction
Optical measurements taken from various platforms have been successfully used to determine the total chlorophyll a (TChl a) concentration (e.g., see the summary by McClain, 2009). Such measurements can be taken continuously and thereby provide estimates of TChl a concentration at a much higher temporal and spatial resolution than is possible with chemical measurements in the laboratory, e.g., High Performance Liquid Chromatography (HPLC) analysis of discrete water samples. Chl a is the major pigment in all phytoplankton species and is often used as an indicator of phytoplankton biomass. When pigments are measured by HPLC, TChl a is defined as the sum of monovinyl Chl a (MVChl a), divinyl Chl a (DVChl a) and chlorophyllide a (which is mainly formed as an artefact of the former two during the extraction process and is therefore included in the calculation). DVChl a exists only in the prokaryotic genus Prochlorococcus. MVChl a is the Chl a pigment of all other phytoplankton (other cyanobacteria and eukaryotes). Besides Chl a, phytoplankton contain many other pigments that are either involved in light harvesting (such as chlorophyll b (Chl b), chlorophyll c (Chl c) and several carotenoids, called photosynthetic carotenoids (PSC)) or protect Chl a and other sensitive pigments from photodamage (photoprotective carotenoids, PPC). Some pigments, e.g., zeaxanthin (Zea) in cyanobacteria, occur only in certain phytoplankton groups and are used as marker pigments to identify them (e.g., via the program CHEMTAX developed by Mackey et al., 1996).
When analysing biogeochemical fluxes in the oceans, however, it is inadequate to consider phytoplankton as a single variable (i.e. TChl a) because various groups have different roles in biogeochemical processes (such as carbon fixation and export, nitrogen fixation, and silicon uptake). Their overall biomass and primary production are not well correlated with their TChl a concentration because pigment concentration varies in response to several factors (e.g., light, temperature, and nutrients). Knowledge of the distribution of different phytoplankton pigments gives insight into phytoplankton composition, overall light absorption, and physiological state.
Several researchers have recently investigated the potential to derive pigments other than TChl a from continuous optical data, which can deliver a data set with much better spatial and temporal coverage than is obtained by analysing water samples. Chase et al. (2013) decomposed a large global data set of hyperspectral particulate absorption measurements into Gaussian function components and attributed the magnitude of specific Gaussian functions to the absorption by specific pigments or pigment groups. The method provided robust results for obtaining concentrations of TChl a, TChl b (sum of different types of Chl b), TChl c (sum of different types of Chl c), PSC, PPC and PE (phycoerythrin). Organelli et al. (2013) applied a multivariate approach to fourth-derivative spectra of phytoplankton or particulate absorption (a_ph and a_p, respectively) to retrieve TChl a, the total concentrations of seven diagnostic pigments and three phytoplankton size classes. However, a_p and a_ph are inherent optical properties (IOPs), which cannot be determined directly from satellite ocean colour measurements: only after successful atmospheric correction is the water-leaving reflectance (ρw), an apparent optical property (AOP), derived. ρw is not related to phytoplankton absorption alone. The imprints of different types of pigments, which in addition are correlated with each other over most of the spectrum, are masked not only by scattering and absorption of other water constituents and of water itself but also by changes in the radiance distribution in response to varying environmental conditions, e.g., observation geometry, surface waves and atmospheric conditions.
Pan et al. (2010) developed empirical algorithms based on reflectance ratios to approximate key phytoplankton pigment concentrations. The band ratio algorithms were developed from underwater radiometric measurements collocated with pigment data taken at the US northeast coast and were successful in deriving the concentrations of TChl a, TChl b, TChl c and nine different carotenoids. However, such band ratio algorithms require a very large data base (> 400 collocations with satellite data) from a certain region to derive robust results. Pan et al. (2013) later reported that the algorithm had to be adapted by modifying the pigment-specific coefficients based on a region-specific data set. Craig et al. (2012) developed local models to estimate TChl a and a_ph at different wavelengths from hyperspectral in situ measurements of remote sensing reflectance (RRS(λ)) in an optically complex water body. The models were based on empirical orthogonal function (EOF) analysis of normalized RRS(λ) spectra and subsequent linear fitting of measured TChl a concentration and a_ph(λ), respectively, as response variables to EOF loadings as predictor variables. Taylor et al. (2013) showed that the method could similarly be used to derive PE concentration from underwater upwelling radiance spectra (Lu(λ)), which enabled continuous profile predictions of PE concentrations.
The present study aims to use the spectral information contained in reflectance data to derive the optical signature of different pigments by applying an automatic and generic technique, with an additional focus on evaluating performance as a function of sample size. The EOF analysis is applied to RRS and to ρwN (i.e. normalized ρw just above the surface) data measured in the field and by satellite sensors, respectively, in the Atlantic Ocean in order to predict the concentrations of several phytoplankton pigments and pigment groups. In addition, the application of our statistical method to study the large-scale distribution and photo-physiology of phytoplankton based on the concentrations of various pigments is investigated.

Material and methods
Two sets of optical and pigment data from the Atlantic Ocean were used in the analysis. A first model set-up used a data set that included only optical measurements taken in situ (as depth profiles) and collocated surface pigment data collected during the transatlantic RV Polarstern cruises ANTXXIV/4, ANTXXV/1 and ANTXXVI/4. In the following, we call this data set the "field data set". For a second data set, the "satellite-based data set", we considered water reflectance measurements from the satellite sensor MERIS collocated with pigment data from various researchers in the tropical Atlantic Ocean.

Field data set
Samples for the field data set were collected during three RV Polarstern cruises: the expeditions ANTXXIV/4 in April/May 2008 and ANTXXVI/4 in April/May 2010 followed a South-to-North transect through the Atlantic Ocean from Punta Arenas (Chile) to Bremerhaven (Germany); ANTXXV/1 in November 2008 followed a North-to-South transect through the eastern Atlantic Ocean from Bremerhaven to Cape Town (South Africa) (see Fig. 1). Sampling was generally conducted at noon local time and involved CTD casts with water samplers as well as below-water radiance and irradiance and above-water irradiance measurements. Water samples from surface water (< 10 m) for pigment analysis and for PE analysis were filtered on GF/F filters and on 0.4 µm polycarbonate filters, respectively. Filters were immediately shock-frozen in liquid nitrogen and stored at −80 °C until further analysis at the home laboratories at Alfred-Wegener-Institute Helmholtz Centre of Polar and Marine Research (AWI).

Pigment data
The composition of pigments which are soluble in organic solvents was analysed by HPLC following the method of Barlow et al. (1997), adjusted to our temperature-controlled instruments as detailed in Taylor et al. (2011). We determined the pigments listed in Table 1 of Taylor et al. (2011) and applied the method of Aiken et al. (2009) for quality control of the pigment data. HPLC data for ANTXXV/1 were already published in Taylor et al. (2011) and are available from PANGAEA (doi.pangaea.de/10.1594/PANGAEA.819070). The relative concentration of PE was taken from the data set published for all three cruises in PANGAEA (doi.pangaea.de/10.1594/PANGAEA.819624) and analysed in Taylor et al. (2013). As outlined in Taylor et al. (2013), the PE concentration is expressed as a relative value, while all other pigment concentrations are directly measured values.

Reflectance data field data set
We used RRS(λ) data obtained for all three cruises as AOP input data. RRS data of ANTXXV/1 were already published in Taylor et al. (2011) and are available from PANGAEA (doi.pangaea.de/10.1594/PANGAEA.819506). For the other two cruises we applied the same technique and instrumentation as in Taylor et al. (2011) to derive the RRS spectrum at each station. To test the influence of the spectral resolution of AOPs, the hyperspectral field RRS(λ) data were reduced to the multispectral bands of MERIS (412, 443, 490, 510, 560, 620, 665 and 681 nm) by taking the integral over all wavebands within one band (±10 nm around the centre wavelength, except for 681 nm, where ±7.5 nm was used).
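The band reduction described above can be illustrated with a minimal sketch. It assumes a hyperspectral spectrum on a regular 1 nm grid and uses a simple trapezoidal rule over the grid points that fall inside each band window; the function names, the flat test spectrum and the choice of integration rule are illustrative assumptions, not the processing code used for the study.

```python
# MERIS band centres (nm) and half-widths (nm) as stated in the text
MERIS_BANDS = {412: 10.0, 443: 10.0, 490: 10.0, 510: 10.0,
               560: 10.0, 620: 10.0, 665: 10.0, 681: 7.5}

def integrate_band(wavelengths, rrs, center, half_width):
    """Trapezoidal integral of rrs over the grid points inside the band window."""
    lo, hi = center - half_width, center + half_width
    # keep only the (wavelength, value) pairs that lie inside the window
    inside = [(w, r) for w, r in zip(wavelengths, rrs) if lo <= w <= hi]
    if len(inside) < 2:
        raise ValueError("band window contains fewer than two wavelengths")
    total = 0.0
    for (w0, r0), (w1, r1) in zip(inside, inside[1:]):
        total += 0.5 * (r0 + r1) * (w1 - w0)
    return total

def reduce_to_meris(wavelengths, rrs):
    """Return one integrated value per MERIS band, keyed by centre wavelength."""
    return {c: integrate_band(wavelengths, rrs, c, hw)
            for c, hw in MERIS_BANDS.items()}

# Hypothetical flat spectrum on a 1 nm grid from 350 to 700 nm
wavelengths = [float(w) for w in range(350, 701)]
rrs = [0.002] * len(wavelengths)
band_rrs = reduce_to_meris(wavelengths, rrs)
```

Because the spectra are later standardized per sample, the absolute scale of the integrated values matters less than their relative shape across the eight bands. Note that on a 1 nm grid the ±7.5 nm window around 681 nm only covers the grid points from 674 to 688 nm.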
To allow direct comparison with the MERIS normalized water-leaving reflectance, the water-leaving reflectance, ρw_insitu, was calculated from RRS(λ) as

$$\rho_{w\_insitu}(\lambda) = \pi \, R_{RS}(\lambda) \qquad (1)$$

ρw_insitu(λ) was then normalized to the sun and sensor position at nadir (ρwN(λ)) according to Barker et al. (2008), using the solar zenith angle at observation and corrections for the bidirectional structure of the light field (lookup tables for f/Q and R factors) as provided in Morel and Gentili (1993, 1996) and Morel et al. (1995). The latter were only available for the first seven wavebands but not for 681 nm.

Satellite-based data set
A large data set of phytoplankton pigment data has been compiled (for more details on the data set see Supplement Table S1). The pigment concentrations had been determined from the sea surface (< 10 m) with HPLC by several investigators within the area of 35 [...] (2002-2012). A large part of those data are publicly available from the SEABASS and BODC databases. The other part consists of pigment data from the field data set within this area, including additional data from stations where no radiometric measurements had been taken, and from four other cruises: pigment data from the RV Maria S. Merian cruise MSM-18/3 were analysed by AWI as described above in Sect. 2.1.1; those from two RV Polarstern cruises (ANTXXIII/1 and ANTXXIV/1) were analysed by HZG following Zapata et al. (2000); data of the Bonus Good Hope (BGH) cruise, conducted by the Laboratoire d'Oceanographie de Villefranche (LOV), were acquired as outlined in Speich et al. (2008) and analysed following the method of Ras et al. (2008).
As AOP input data we used the MERIS Polymer level-2 ρwN(λ) product. The Polymer algorithm (for details see Steinmetz et al., 2011) provides a powerful atmospheric correction. It is an iterative spectral matching method over the whole available sensor spectrum and uses two decoupled models. First, the water reflectance is modelled with two parameters, the Chl a concentration and the particle backscattering coefficient. Second, the reflectance of the atmosphere, including aerosols and contamination by sun-glint, is simplified by an analytical expression that can account for multiple interactions between molecular and aerosol scattering (and glitter) without reference to a specific aerosol model. Hence, it allows the retrieval of large numbers of MERIS observations under conditions contaminated by sun-glint, thin clouds or heavy aerosol plumes, which could not be treated correctly by standard atmospheric correction schemes that extrapolate from the near infra-red. MERIS Polymer products thus improve the spatial coverage by almost a factor of two and have proven successful for retrieving MERIS ocean colour products: Polymer was selected as the MERIS processor for atmospheric correction in the frame of the Ocean Colour Climate Change Initiative after an extensive validation and inter-comparison with other atmospheric correction algorithms in which each algorithm's uncertainty was assessed (Müller and Krasemann, 2012). However, uncertainties probably remain because of the different footprint sizes: 1 km by 1 km for the satellite data versus a sampled area of about 20 cm by 20 cm for the water sample.

Matchups between pigment data and MERIS Polymer ρwN(λ) and TChl a products were determined according to the MERMAID data base as 1 × 1 (within the MERIS pixel), 3 × 3 and 5 × 5 pixels around the field observation (see Barker et al., 2008). For the 3 × 3 and 5 × 5 MERIS pixel matchups, the mean ρwN(λ) and TChl a concentrations were calculated. The 1 × 1, mean 3 × 3 and mean 5 × 5 MERIS ρwN(λ) matchup data were then used for deriving predicted (modelled) pigment concentrations, as outlined in Sect. 2.3. The mean Polymer TChl a data were validated as outlined in Sect. 2.4.
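As an illustration of the matchup averaging, the following sketch computes the mean over a 1 × 1, 3 × 3 or 5 × 5 pixel window centred on the pixel that contains the field observation. The grid layout, the use of `None` for masked pixels and the rule of averaging only the valid pixels are assumptions made for this example; the MERMAID matchup protocol may apply additional screening.

```python
def window_mean(grid, row, col, size):
    """Mean of the valid pixels in a size x size window centred on (row, col)."""
    if size % 2 != 1:
        raise ValueError("window size must be odd")
    half = size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            # skip pixels outside the scene and masked (None) pixels
            if inside and grid[r][c] is not None:
                values.append(grid[r][c])
    return sum(values) / len(values) if values else None

# Hypothetical 5 x 5 scene of reflectance at one band; None marks a masked pixel
scene = [
    [0.010, 0.011, 0.012, 0.011, 0.010],
    [0.011, 0.012, None,  0.012, 0.011],
    [0.012, 0.013, 0.014, 0.013, 0.012],
    [0.011, 0.012, 0.013, 0.012, 0.011],
    [0.010, 0.011, 0.012, 0.011, 0.010],
]
m1 = window_mean(scene, 2, 2, 1)  # 1 x 1: the centre pixel itself
m3 = window_mean(scene, 2, 2, 3)  # mean of the 8 valid pixels in the 3 x 3 window
m5 = window_mean(scene, 2, 2, 5)  # mean of the 24 valid pixels in the 5 x 5 window
```

In practice the same function would be applied band by band to obtain a mean spectrum per matchup.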

[...] set because for some pigment samples of the data set collocated with the satellite data, these pigments have not been analysed.

Empirical Orthogonal Function analysis
Following Taylor et al. (2013), the spectral data were subjected to an Empirical Orthogonal Function (EOF) analysis, also known as "Principal Component Analysis" (PCA), in order to reduce the high dimensionality of the data and derive the dominant signals ("modes") that best describe the variance within the data set. In addition to reducing the dimension of the spectral data, the use of EOF modes in statistical model building also avoids problems associated with multicollinearity amongst the original predictor variables. All calculations in the following were done with the statistical computing software "R" (R Development Core Team, 2013). Spectral data were contained in a data matrix X, of dimensions M sample rows by N reflectance band columns. Spectral samples were collocated with the respective pigment data set Y, of dimensions M sample rows by N pigment columns. While hyper_RRS data consisted of 350 to 700 nm (N = 351) or 380 to 700 nm (N = 321) bands, band_RRS and the satellite_ρwN data consisted of the eight MERIS visible wavebands (N = 8). As in Taylor et al. (2013), spectral data sets X were standardized for each sample row by first subtracting the mean spectral value (centering) followed by division by the spectral standard deviation (scaling), which focused the analysis on the spectral shape rather than the magnitude. The standardized matrix X was then subjected to Singular Value Decomposition (SVD) in order to derive EOF modes:

$$\mathbf{X} = \mathbf{U} \, \boldsymbol{\Sigma} \, \mathbf{V}^{T}$$

where V is an N × N matrix containing the EOFs (spectral patterns), U is an M × N matrix containing the principal components (PCs), Σ is an N × N matrix containing the singular values on the diagonal, and k is the EOF mode index (length N). Only EOFs ≤ min(M, N) will carry information. This notation differs slightly from that presented in Taylor et al. (2013), where a covariance matrix of the data set was subjected to Eigen decomposition with subsequent projection of the data onto the EOFs to derive PCs. The results of both approaches are similar, except that the PCs U derived via SVD are unitary and Σ contains standard deviations rather than variances. The SVD method is presented here because of its more straightforward notation; EOFs and PCs are determined in a single step, whereas the alternative Eigen decomposition is a three-step calculation.
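The row-wise standardization that precedes the decomposition can be sketched as follows. The example uses the sample standard deviation (n − 1 denominator), which is an assumption, and two hypothetical eight-band spectra that share the same shape but differ in magnitude by a factor of two; it is meant only to show why standardization focuses the analysis on spectral shape.

```python
from statistics import mean, stdev

def standardize_rows(spectra):
    """Centre and scale each spectrum (row) by its own mean and standard deviation."""
    standardized = []
    for row in spectra:
        m = mean(row)
        s = stdev(row)  # sample standard deviation (n - 1); an assumption
        if s == 0:
            raise ValueError("a flat spectrum cannot be scaled")
        standardized.append([(v - m) / s for v in row])
    return standardized

# Two hypothetical 8-band spectra: same shape, different magnitude
low = [0.010, 0.009, 0.007, 0.005, 0.003, 0.001, 0.0005, 0.0005]
high = [2.0 * v for v in low]
z_low, z_high = standardize_rows([low, high])
```

After standardization the two rows are numerically identical, so any subsequent decomposition sees only differences in shape between samples.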

Log transformed general linear model
A general linear model was used to predict the log-transformed concentration of each pigment, y_p, with a subset of the PCs, U, as covariates. Since only positive, non-zero values are permissible with this transformation, a small value (0.00001 mg m−3) was added to all concentrations to allow for the inclusion of samples where pigment concentrations were essentially zero or below the detection limit. A truncated subset of PCs was used, defined by the magnitude of their standard deviation: PCs with standard deviations ≤ 0.0001 times the standard deviation of the first component were omitted. The resulting model was a multiple regression of log(y_p) on the retained PCs.
[...] linear regression based on the log-scaled predicted (log(y_p)) versus the log-scaled observed (log(y_o)) pigment concentration data, and the root mean square error (RMSE), the mean percent difference (MPD), the percent bias (PB) and the median percent difference (MDPD) for the non-log-transformed data were determined. In the equations for these statistics, ȳ_o is the mean value of the observed specific pigment concentration and i identifies the specific sample pair.
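For orientation, the four error statistics named above can be computed as in the sketch below. The formulas are commonly used definitions and are an assumption here, since the exact equations are not reproduced in this text: RMSE as the root of the mean squared difference, MPD and MDPD as the mean and median of the absolute relative differences in percent, and PB as the mean difference relative to the mean observed value in percent. The sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, median

def fit_statistics(observed, predicted):
    """Error statistics on non-log-transformed concentrations (assumed definitions)."""
    diffs = [p - o for o, p in zip(observed, predicted)]
    # absolute relative difference per sample pair, in percent
    rel = [100.0 * abs(d) / o for d, o in zip(diffs, observed)]
    return {
        "RMSE": sqrt(mean([d * d for d in diffs])),
        "MPD": mean(rel),
        "MDPD": median(rel),
        "PB": 100.0 * mean(diffs) / mean(observed),
    }

# Hypothetical observed and predicted concentrations in mg m-3
stats = fit_statistics([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

The relative statistics require strictly positive observed values, which is one practical reason for the small offset added to zero concentrations.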

Model prediction error
In addition to the statistics calculated for each pigment linear model (Sect. 2.3.2), we performed a cross validation of the linear model fitting in order to better test the robustness of the models' prediction error. The data were split into two groups: the first part of the data was used for model fitting, while the second part was used for prediction validation. Following Craig et al. (2012), we assessed the number of observations required to achieve adequate predictions by the pigment linear models using the variable jackknife procedure of Wu (1986). The data splitting for the cross validation procedure was therefore varied, with n = total number of samples, tp = number of training points and vp = number of points used for validation. Since the number of permutations for data splitting is restricted by computing time, the procedure was run for 500 permutations, similar to what was recommended by Craig et al. (2012). Such a high number of permutations makes it unlikely that the model error is assessed on the basis of a spatially or temporally biased data set. Each cross validation procedure was as follows:
1. For 500 permutations, do steps 2-9.
[...] to derive their PCs U_valid:
8. Record pairs of observed and predicted validation pigment concentrations y_o and y_p^valid in a new object for all permutations for later calculation of prediction error.
For each permutation, the R² based on the log-scaled predicted (log(y_p^valid)) versus the log-scaled measured (log(y_o)) concentrations was derived, and finally the mean value over all permutations, R²cv, was calculated. Prediction error was described in terms of absolute [...]
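The repeated data splitting can be sketched as below. This is only an outline under stated assumptions: the splits are drawn uniformly at random with a fixed seed, `tp` training points are used per permutation and the remaining `n - tp` points serve for validation, and R² is taken as the squared Pearson correlation. The model fitting and prediction steps in between are omitted because they are not fully described in this text; all function names are hypothetical.

```python
import random

def random_splits(n, tp, permutations=500, seed=0):
    """Yield (training, validation) index lists for repeated random data splitting."""
    if not 0 < tp < n:
        raise ValueError("tp must leave at least one validation point")
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(permutations):
        shuffled = rng.sample(indices, n)  # a random permutation of all indices
        yield sorted(shuffled[:tp]), sorted(shuffled[tp:])

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical data set of n = 20 samples with tp = 15 training points
splits = list(random_splits(n=20, tp=15, permutations=500, seed=1))
```

For each split, a model would be fitted on the training indices and applied to the validation indices; averaging `r_squared` of the log-scaled observed and predicted validation values over all permutations then gives a quantity analogous to R²cv.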

Pigment concentration predictions with MERIS reflectance data
In order to predict pigment concentrations from MERIS ρwN(λ) data for which we did not have corresponding pigment measurements, we projected the standardized MERIS ρwN(λ) data onto the EOF loadings (V) to derive their principal components (U), which were subsequently used for the prediction with the fitted linear model (as in Sect. 2.3.3, step 7, Eq. 12).
[...] reflectance in the green, at 560 nm, while the standardized field data set contains four spectra with maxima at 510 nm.
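The projection of new standardized MERIS spectra onto the EOF loadings, as described in this section, follows from the SVD relation X = U Σ Vᵀ: for orthonormal EOFs, U = X V Σ⁻¹. The sketch below applies this to a single spectrum and then back-transforms a log-linear prediction. The two-band EOF matrix, singular values, intercept and coefficients are hypothetical values chosen for illustration, and the natural logarithm is assumed for the transformation.

```python
from math import exp

def project_onto_eofs(x_std, eofs, singular_values):
    """PCs of one standardized spectrum: u_k = (x . v_k) / sigma_k.

    eofs is an N x K matrix (list of rows); column k holds EOF-k.
    """
    n_modes = len(singular_values)
    return [
        sum(x * row[k] for x, row in zip(x_std, eofs)) / singular_values[k]
        for k in range(n_modes)
    ]

def predict_pigment(pcs, intercept, coefficients):
    """Back-transform a log-linear prediction (natural log assumed)."""
    return exp(intercept + sum(a * u for a, u in zip(coefficients, pcs)))

# Hypothetical 2-band example with orthonormal EOF columns
eofs = [[0.6, -0.8],
        [0.8, 0.6]]
singular_values = [2.0, 0.5]
pcs = project_onto_eofs([1.0, 2.0], eofs, singular_values)
concentration = predict_pigment(pcs, intercept=-1.0, coefficients=[0.5, -0.25])
```

Only the modes retained in the fitted model need to be projected; the small offset added before the log transformation could be subtracted again after back-transformation if desired.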

Validation of MERIS
The composition and range of pigments are likewise similar for both data sets (as detailed in Supplement Table S2). However, for all pigments (except Fuco, for which they are equal, and Zea, for which the opposite holds) the collocations of the field data set contain higher maxima and minima than the collocations of the satellite-based data set. The higher concentration of total pigments in the field data set may explain the small differences in the shape of the reflectance spectra of the two (field versus satellite-based) data sets. However, DVChl b, MVChl b, TChl b, Allo, Diato, Lut, Neo, Peri, Viola, Pras, chlorophyllide a and TPheo had values of 0 mg m−3 at more than 20 % of all stations in both data sets. Chl c3 also had a concentration of 0 mg m−3 in one sample collocated with the field data set and in over 30 % of the samples collocated with the satellite-based data set. Several pigments occasionally (< 10 %) had concentrations of 0 mg m−3 in samples collocated with the satellite-based data set: Caro, Chl c1/2, 19BF, 19HF, Zea, DVChl a, Diadino and Fuco, the latter three also in the field data set. All other pigments not listed here had concentrations higher than 0 mg m−3 in all samples.

EOF analysis - shape of modes and relevance for predictions
The decomposition of the standardized spectra by EOF analysis returned nine significant modes (EOF-1 to EOF-9) for the hyper_RRS data set and seven significant modes for the band_RRS and satellite_ρwN data sets (the first four modes are presented in Fig. 3), given our inclusion criterion based on the explained standard deviation relative to EOF-1 (see Sect. 2.3.2). For all three data sets, the first three modes explain over 99.8 % of the variance, and EOF-1 alone explains between 94.5 and 96 % of the variance (Table 1).
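The explained variance per mode and the inclusion criterion can be illustrated with a short sketch. It assumes that the variance explained by mode k is proportional to the square of its singular value and applies the threshold from the methods (modes whose standard deviation is ≤ 0.0001 times that of the first mode are omitted). The singular values are hypothetical and do not correspond to Table 1.

```python
def explained_variance(singular_values):
    """Percentage of total variance per mode, from squared singular values."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [100.0 * v / total for v in squared]

def retained_modes(singular_values, threshold=0.0001):
    """1-based indices of modes whose singular value exceeds threshold x the first."""
    first = singular_values[0]
    return [k + 1 for k, s in enumerate(singular_values) if s > threshold * first]

# Hypothetical singular values for four modes
sigma = [10.0, 2.0, 1.0, 0.0005]
variance_pct = explained_variance(sigma)
modes = retained_modes(sigma)
```

With these values the fourth mode falls below the threshold and is dropped, while the first mode alone accounts for most of the variance, qualitatively similar to the dominance of EOF-1 reported above.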
The shapes of the first three EOF modes are very similar among all three reflectance data sets. They are nearly identical for the band_RRS and the satellite_ρwN data sets, but show smoother shapes and peaks for hyper_RRS for the first two modes. Still, one has to bear in mind that although the RAMSES measurements deliver data resolved at 1 nm, the real spectral resolution of the sensors is 3.3 nm. [...] the hyper_RRS data. Because of the limited number of wavelengths in the two multispectral data sets, starting from EOF-3 their peaks are clearly shifted (peaks at 412 and 443 nm for EOF-3 and EOF-4, respectively) compared with hyper_RRS (peaks at 360 and 410 nm for EOF-3 and EOF-4, respectively), where the narrow spectral resolution allows for more precision in identifying spectral regions of higher variance. For EOF-4, the satellite_ρwN mode is much flatter beyond 500 nm and shows no trough between 600 and 650 nm, as opposed to EOF-4 for the two other data sets. Not much similarity is seen among the EOF-5 modes of the different spectral data sets, while for EOF-6 the two field data sets are similar in overall shape and peak positions, which in contrast are shifted towards longer wavelengths for the satellite data set. EOF-7 and EOF-8 show very similar shapes for hyper_RRS and deviate from EOF-7 of the band data sets, while EOF-9 from hyper_RRS looks much more like the latter.
The EOF analyses deliver modes of oscillation which can be interpreted as imprints of changes in the optical properties of water constituents in the water column. Compared with the shapes of spectra obtained in case-2 waters by Lubac and Loisel (2007) and Craig et al. (2012), only our reflectance spectra taken in high TChl a waters with a mineral fraction (identified as cluster V for the ANTXXV/1 data in Taylor et al., 2011) correspond to part of the spectra presented in those studies (e.g. class 5 in Lubac and Loisel, 2007), while all our other spectra (typical case-1 water) are not represented in the data sets of those studies. This explains the minor differences in the shape and loading of the EOFs between their data sets and ours. In the following we focus the discussion on the results for our hyper_RRS data set, since the Craig et al. (2012) study was also based on hyperspectral RRS data.
Our first three EOF modes more or less correspond to the ones derived for the hyperspectral case-2 reflectance data set of Craig et al. (2012). As pointed out in their study, EOF-1 is likely the signature of bulk oscillations in phytoplankton biomass concentration. However, our EOF-1 explains much more of the variance than that of Craig et al. (2012), where it accounted for only 72.4 % and showed much more structure and a weaker exponential decrease from 400 to 550 nm. This indicates that in our open ocean data set, the change in total attenuation is the main difference among the sampled stations, mainly reflecting the attenuation as affected by the total pigment concentration. Our data set was largely composed of samples from waters with lower TChl a concentration, ranging from 0.005 to 3.553 mg m−3, while in the study of Craig et al. (2012) it ranged from 0.584 to 18.020 mg m−3. EOF-2 superficially resembles the oscillation in the amplitude of RRS, which is also affected by overall changes in the total absorption over broad band structures. It decreases strongly from 350 to 510 nm and increases again above 570 nm, which is connected to total pigment and water absorption, respectively. There is a peak around 683 nm which can be linked to MVChl a and DVChl a fluorescence. While this peak is present in EOF-1 and EOF-2 in the Craig et al. (2012) data set, it is only apparent in EOF-2 of our data set, probably because of the lower TChl a concentrations. EOF-3 of our data set shows a much steeper decrease with wavelength in the blue spectral range than that of Craig et al. (2012). These changes may reflect concomitant changes in absorption by chlorophyll and non-algal particles, which are expected to co-vary and to be of much lower concentration in our case-1 waters, whereas the scattering by particles other than phytoplankton was much higher in the case-2 water of Craig et al. (2012), leading to a less steep slope of this EOF mode. EOF-4 appears different in the relation of the three peaks. As for EOF-2 and EOF-3, these differences are caused by the different composition and overall loading of water constituents at our and their sampled stations. Higher EOFs were not presented in Craig et al. (2012) because they were not used to predict TChl a from RRS data, as was the case for our TChl a (and MVChl a) linear model predictions (Sect. 3.3.3). EOF modes higher than four probably reflect imprints of specific pigment groups or pigments, as indicated by the ∆AIC values and pointed out further at the end of the next section (Sect. 3.3.3).


Field data set linear models
All pigments which were present in all samples of the field data set were well predicted by linear models based on the hyperspectral (hyper_RRS) or the reduced eight-band (band_RRS) resolution. Correlations between predicted and observed pigment concentrations were significant at p < 0.0001, and cross validation statistics reached reasonable quality with R²cv ≥ 0.5, MDPDcv ≤ 45 % and MPDcv ≤ 60 % (Table 2a, upper part). For some pigments (TChl a, MVChl a, Hex, PSC), linear models using hyper_RRS data produced much better results with EOFs based on 380 to 700 nm than with EOFs based on 350 to 700 nm. Plots of observed versus predicted values for the full data set of the pigments TChl a, PSC, PPC, Hex and Zea are shown in Fig. 4. Lower quality in one statistical parameter for both linear models was reached for Zea (R²cv 0.31 and 0.27) and But (MPDcv 81 and 95 %), and in two parameters for PE (MDPDcv 65 and 67 %, MPDcv 139 and 156 %). For all other pigments, predictions were of low quality (results not shown), demonstrating that the linear model approach does not produce robust predictions as soon as a pigment is absent (i.e., 0 mg m−3) from some samples. Replacing concentrations of 0 mg m−3 with 0.00001 mg m−3 for specific pigments did not enable robust linear model construction and produced large errors, especially for the cross validation statistical parameters. We re-ran the predictions for specific pigments where only a few samples (< 10 %) had concentrations of 0 mg m−3, as was the case for DVChl a, Fuco, Diadino and Chl c3 (see Supplement Table S2). In those specific linear model runs we included as input data only the data points where the specific pigment concentrations were > 0 mg m−3. The resulting predictions (Table 2a, lower part; for DVChl a see full-fit results in Fig. 4d) from using the adjusted input data for those pigments show robust and significant cross validation results within the same quality range as for the pigments which were detected in all data. No robust predictions were obtained for any of the other pigments, which had concentrations > 0 mg m−3 in less than 80 % of all samples, even when in the specific linear model runs we included as input data only the data points with specific pigment concentrations > 0 mg m−3 (results not shown).
Cross validation results of well predicted pigments [...] TChl a (MVChl a in line with that) and PSC dominate the overall phytoplankton pigment composition and absorption. TChl a, the main phytoplankton biomass indicator, has been shown to be well retrieved by band-ratio algorithms (e.g. see Brewin et al., 2014). For pigments which are very similar in their spectral range, such as But, Hex and Fuco, the hyperspectral resolution of the linear models provides much more robust pigment predictions (Table 2a). The hyper_RRS linear models also produced better predictions for DVChl a, Zea, Diadino and PPC, for which the specific linear models require more than the first seven EOF modes (see Sect. 3.3.3). These are not available at the multispectral resolution of the RRS data.

Satellite-based data set linear models
Results for the models predicting pigment concentration from the satellite-based data set were very similar when using 1 × 1, 3 × 3 or 5 × 5 collocated MERIS ρwN data. Deviations were within 1 to 3 % for all statistical parameters. For well predicted pigment concentrations, R²cv values were best in all cases for the 1 × 1 collocations, while MPDcv was best for the 3 × 3 collocations. For simplicity, in the following we present and discuss the results for the 1 × 1 collocated reflectance data only.

In line with the field data linear model results, pigment groups and pigments which had concentrations > 0 mg m−3 in every sample (MVChl a, TChl a, PSC and PPC; the full-fit linear model results are shown in Fig. 5a-c) were well predicted, with similar cross validation statistics, using the satellite_ρwN data set (Table 2b, upper part). Good predictions could also be obtained for some pigments (DVChl a, Zea, Diadino, Hex, But, Fuco and Chl c1/2) by re-running the linear model analysis after excluding collocations with respective pigment concentrations of 0 mg m−3 (Table 2b, lower part). For DVChl a, Hex and Zea, example results of the full-fit linear model are shown in Fig. 5d-f, respectively. However, some of these pigments show only medium quality in one cross validation statistical parameter (lower R²cv for DVChl a and Zea, higher MPDcv for Fuco, Chl c1/2 and Diadino). Similar to the field data linear models, no robust predictions were obtained for any of the other pigments, which had concentrations > 0 mg m−3 in less than 80 % of all samples, even when only data points with specific pigment concentrations > 0 mg m−3 were included (results not shown).

EOF modes relevant for pigment predictions
Results of the ∆AIC showing the significance of each EOF mode for the main pigment prediction linear models are presented in Table 3.For the hyper_RRS data set, the prediction linear models used EOF-2 and EOF-3 for all pigments.EOF-2 was the most relevant in the respective models for all pigment prediction, except for Zea and DVChl a were EOF-3 was the most important and closely followed by several other EOF modes.
For all other well-predicted pigments, EOF-3 followed EOF-2 in importance, except for Chl c3 (EOF-4) and for PE (EOF-1). Besides PE, EOF-1 was only used (with medium importance) in the But, DVChl a and Zea linear models. As discussed in Sect. 3.2, EOF-2 reflects the optical imprint of all phytoplankton pigments. The high ∆AIC value of EOF-2 in most pigments' linear models is probably caused by the fact that the concentrations of these specific pigments, and of most phytoplankton groups, increase when TChl a increases. In contrast, cyanobacteria, and especially their subgroup Prochlorococcus, containing the marker pigments Zea and DVChl a, respectively, are the most abundant phytoplankton at low TChl a concentrations. This relationship underlies the abundance-based algorithms that retrieve picoplankton from TChl a data (Uitz et al., 2006; Hirata et al., 2011). It may explain why the predictions of those marker pigments by our linear models show a lower ∆AIC for EOF-2 and require several different EOF modes.
For DVChl a all nine EOF modes (EOF-1 to EOF-9), and for Zea EOF-1 to EOF-4 and EOF-5 to EOF-8, were incorporated in the respective linear models. As in Craig et al. (2012), EOF-2 to EOF-4 were relevant for our hyper_RRS based TChl a and MVChl a predictions. EOF models developed by Taylor et al. (2013) to predict PE concentrations based on Lu data required the first four EOF modes, while our PE prediction based on RRS data required the first three EOFs only. For all other pigments, the higher EOFs were also necessary for robust predictions.
Similarly to the hyper_RRS linear models, the two multispectral linear models also showed EOF-2 to be the most important predictor for specific pigment models, except for DVChl a (both models) and Zea (only band_RRS). However, compared to the hyper_RRS linear models, many more EOF modes from the multispectral data were needed for all specific pigment models.
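The ranking of EOF modes by ∆AIC can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes ordinary least squares with Gaussian errors, for which the AIC can be written (up to an additive constant) as n ln(RSS/n) + 2k, and it omits the preceding stepwise model selection. The function names `gaussian_aic` and `delta_aic` are hypothetical.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC (up to an additive constant) of a least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    n, k = A.shape  # k counts the intercept and all slopes
    return n * np.log(rss / n) + 2 * k

def delta_aic(y, U):
    """Change in AIC when each column of U (one EOF score) is removed from the full model."""
    full = gaussian_aic(y, U)
    return np.array([gaussian_aic(y, np.delete(U, j, axis=1)) - full
                     for j in range(U.shape[1])])
```

Here `y` would hold the log-transformed pigment concentrations and `U` the scores of the EOF modes retained in the model; the mode with the largest ∆AIC is the one whose removal degrades the model most.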

Number of data points to construct robust models
Our linear models for predicting specific pigment or pigment group concentrations are calibrated for an ocean colour data set of a specific region with coincident pigment measurements. Results of the variable jack-knife procedure indicate that the minimum number of training points needed to set up a robust linear model varies among pigments and pigment groups and also among the three statistical parameters: the ratio of R²cv to R² (R²cv/R²), the ratio of MPDcv to MPD (MPDcv/MPD) and RMSEcv, shown as examples for TChl a, PSC, PPC and PE in Fig. 6. For PPC, R²cv/R² (Fig. 6a, d) already drops below 0.8, and then decreases exponentially with each further reduction, when the number of training points falls below 50 for all linear models; for all other pigment predictions this is the case below 30 data points, and even below 15 data points for the hyper_RRS PE linear model. The slope of the increase in RMSEcv (Fig. 6c, f) varies among pigments and linear models. It is especially high for band_RRS TChl a and satellite_ρwN PSC predictions (< 45 and < 70 training points, respectively). For the other predictions, RMSEcv indicates that more than 40 and 50 training points are required for the field and satellite-based data linear models, respectively. MPDcv/MPD values below 1.4, indicating robust fits when only this criterion is considered, are obtained for all pigments above 40 training points for the satellite_ρwN (Fig. 6e) and above 30 for the hyper_RRS data sets (Fig. 6b). Generally, we observe that the band_RRS linear model results deteriorate faster than the hyper_RRS results with a decreasing number of training samples, especially for TChl a and PE. For the set-up of linear models at least 45 to 50 training data points are required, while for some pigments (e.g. TChl a) only 25 training data points are necessary when the hyper_RRS data are used as input. Based on these results we are confident that, for both the field and the satellite-based data sets, the number of data points used for linear model construction and cross validation for the results presented in Sect. 3.3 was adequate for robust predictions. The number of collocated PE samples seems to have been too small for predicting robust PE concentrations, especially at multispectral resolution.

Comparison to other approaches deriving pigment concentration
The number of collocations used for training to obtain robust results for TChl a predictions was also similar for both studies, with more than 25 recommended for our hyper_RRS linear model and more than 15 for the Craig et al. (2012) linear model.
Chase et al. (2013) used Gaussian functions to derive the concentrations of different chlorophylls, PSC and PPC from a large global data set of hyperspectral particulate absorption measurements. Their validation results showed MDPD values between predicted and observed concentrations of 30-36, 40-53, 49 and 51 % for TChl a, Chl c, PSC and PPC, respectively. Our three linear models show similar (TChl a: 27-32 %) or even much better MDPDcv values (Chl c1/2: 33-41 %, PSC: 32-42 %, PPC: 25-27 %), which indicates that our method produces robust results, particularly considering that we use a more indirect measure of pigments, an AOP (reflectance), as opposed to the IOPs used in their study.
Pan et al. (2010) developed pigment-specific band-ratio algorithms with collocated in situ RRS(λ) and pigment measurements from the United States northeast coast. Those algorithms are based on deriving pigment-specific coefficients for third-order polynomial functions of the band ratio of either 490 to 550 nm or 490 to 670 nm (for SeaWiFS; for MODIS adjusted to the MODIS bands 488 and 547 nm). Validation with collocated satellite (SeaWiFS and MODIS) reflectance data and pigment concentrations showed very good quality (MPD, RMSE and R² ranging from 36 to 48 %, 0.23 to 0.29 and 0.65 to 0.90, respectively, for SeaWiFS, with similar results for MODIS) for several pigments, among them TChl a, TChl c, Caro, Fuco, Diadino and Zea. This method was adapted to the Northern South China Sea using globally derived relationships and locally identified links between pigment concentration and sea surface temperature (Pan et al., 2013), with validation results similar to those in Pan et al. (2011). Compared to these results, our linear models predict pigment concentrations with similar quality: while our results for MPDcv and R²cv are slightly worse (42 to 50 % and 0.61 to 0.80, respectively), our results for RMSEcv (0.06 to 0.18 mg m⁻³, except for TChl a with 0.41 mg m⁻³) are much better.

present in the region investigated. The advantage of our linear models, set up with reflectance data either measured directly in the ocean water or obtained from a satellite ocean colour sensor, is that we can also obtain robust results for other pigment groups and some specific pigments. For the Eastern Tropical Atlantic Ocean data set, these additional pigments (other than TChl a) include PPC, PSC, DVChl a and MVChl a.
To some extent we can claim that even more pigments can be predicted when the linear model runs are adjusted to a data set that only incorporates samples from a region where the specific pigment is detected in every sample. Generally, we can also see from the field data linear models that using a coherent in situ data set, in which all pigments have been measured by the same method and instrumentation, provides a wider range of pigment predictions, because the pigment data used for linear model fitting and validation then have a more homogeneous error. An advantage of our linear method over pigment-specific band-ratio algorithms is that we require a much smaller data set of collocated pigment and reflectance data for establishing the prediction (about 50 as opposed to several hundred).

Application of linear model to study large scale pigment distributions
To demonstrate the application of our linear model, we used the satellite_ρwN full-fit models for TChl a, MVChl a, PSC and PPC and ran them on November 2008 MERIS Polymer ρwN level-2 data to retrieve those pigments for an example time period on a larger spatial scale. By subtracting MVChl a from TChl a we also derived concentrations of DVChl a. Our predicted PPC concentrations are in the same range as TChl a in the oligotrophic areas and reach about 50 % in the enhanced TChl a areas and the southern part of the bloom. As for DVChl a, in the northern part of the bloom PPC concentrations are significantly lower and contribute less than 10 % to the total pigment concentration. PSC concentrations in the oligotrophic and enhanced TChl a areas are much lower than PPC or even DVChl a concentrations, but on the large scale more or less reflect the TChl a distribution. Within the northern part of the Mauritanian upwelling, PSC concentrations reach values as high as those of TChl a, while at the bloom further south they contribute less than 10 % to the total pigment concentration. In Taylor et al. (2011), the analysis of pigment and additional microscopic data clearly showed very high concentrations of Fuco, a main PSC pigment, and a strong dominance of diatoms in water samples collected at the northern bloom during the same time period.
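The derivation of DVChl a as the difference between the predicted TChl a and MVChl a fields can be written as a short sketch. The function name `derive_dvchla` and the additional output (the DVChl a share of TChl a in percent) are illustrative assumptions; because the two inputs come from separate linear models, small negative differences are possible and are left untouched here.

```python
import numpy as np

def derive_dvchla(tchla, mvchla):
    """Return DVChl a (TChl a minus MVChl a) and its share of TChl a in percent."""
    tchla = np.asarray(tchla, dtype=float)
    mvchla = np.asarray(mvchla, dtype=float)
    dvchla = tchla - mvchla
    share = np.full_like(dvchla, np.nan)
    valid = tchla > 0  # NaN (e.g. cloud-masked) pixels compare False and stay NaN
    share[valid] = 100.0 * dvchla[valid] / tchla[valid]
    return dvchla, share
```

Applied to two gridded monthly fields of equal shape, the function returns maps of DVChl a and of its relative contribution, which is the kind of quantity discussed for the areas inside and outside the bloom.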
From our results we can conclude that the northern phytoplankton bloom at the Mauritanian upwelling seems to have been freshly growing with very high photosynthetic activity, while in most of the other areas much of the energy built up via photosynthesis was used for photo-protection. We have no information on photodegradation, since no significant prediction linear model could be developed for pheopigments. These pigments were identified in less than 60 % of all samples collocated to

Zea, Caro, PE), but also multi-spectral reflectance data from field or satellite (MERIS Polymer) measurements that are collocated to pigment data can be used to establish predictive linear models based on EOFs. A limitation for all predictions is that only pigments that have been identified in every collocated sample can be predicted, and that adding a small value (0.0001 mg m⁻³) was not an appropriate solution to this problem.
The method is shown for the first time to be applicable to predicting concentrations not only of TChl a and PE, but also of other pigments and pigment groups with weaker specific imprints on the underwater light field. Statistical resampling used for cross-validation indicates that predictions were robust (R²cv ≥ 0.5, MDPDcv ≤ 44 % and MPDcv ≤ 60 %) for all pigments (except for PE, Zea and 19BF, which deviated for one of these measures) and pigment groups. For most pigments and pigment groups, hyperspectral linear models proved to be stable with fewer collocated training samples (n > 30 to 40) than linear models based on multispectral reflectance data (n > 50). The linear models using MERIS Polymer reflectance data as input were applied to one month of satellite data to predict the concentrations of TChl a, PSC, PPC, MVChl a and DVChl a for the whole Eastern Tropical Atlantic. For the first time, a consistent picture of several phytoplankton pigments, indicating group-specific behaviour and photo-physiology on a larger spatial scale, was shown for this area.
Our linear models are generic and can be applied to even a small, consistent collocated reflectance and pigment data set to enable various specific pigment predictions from continuous optical measurements. The optical data can be obtained from radiometric measurements on various platforms (buoys, gliders, floats or satellites). On a global scale, TChl a, PSC and PPC are consistently predicted accurately, while other pigments may be better predicted on smaller spatial scales. Highly temporally resolved time series data, which depending on the platform may even have good spatial coverage, can be used to study the variability and change of overall phytoplankton and its photo-physiological response to environmental variables. While we established the linear models for the prediction of various pigments in typical case-1 waters, the method should be tested in the future for its applicability in case-2 waters as well.

where $\log(y_p)$ is the natural log-transformed concentration of pigment p, $u_{1,2,\ldots,n}$ are the leading n PC scores from U, a is the intercept, and $b_{1,2,\ldots,n}$ are the regression coefficients. A bidirectional stepwise routine was used to search for smaller multiple regression models based on fewer predictor terms. Best linear models were selected through minimization of the Akaike information criterion (AIC). Once the best linear model was determined, the relative importance of included terms was defined by the change in AIC (∆AIC) following each term's removal.

Since the range of concentration varies a lot among the different pigments, we calculated mainly relative error statistics. According to the GlobColour full validation report (ACRI, 2007), the coefficient of determination (R²), the slope (S) and intercept (I) of the

2. Randomly select $n \cdot d$ of the collocated samples to include in the training sets $X^{\mathrm{train}}$ and $Y^{\mathrm{train}}$ for spectra and pigment data, respectively. The remaining $n(1 - d)$ samples are allocated to the validation sets $X^{\mathrm{valid}}$ and $Y^{\mathrm{valid}}$.
3. Standardize $X^{\mathrm{train}}$ and perform EOF analysis following Eq. (2) to obtain $U^{\mathrm{train}}$, $\Sigma^{\mathrm{train}}$ and $V^{\mathrm{train}}$.
4. For each pigment concentration $y_p^{\mathrm{valid}}$ of $Y^{\mathrm{valid}}$ do steps 5-9.
5. Fit a linear model to the log-transformed pigment concentrations using the selected $U^{\mathrm{train}}$ as in Eq. (3):
$\log y_p^{\mathrm{train}} = a + b_1 u_1^{\mathrm{train}} + b_2 u_2^{\mathrm{train}} + \cdots + b_n u_n^{\mathrm{train}}$
6. Perform a bidirectional stepwise search for a smaller linear model.
7. Standardize the validation set and project $X^{\mathrm{valid}}$ onto the EOFs $V^{\mathrm{train}}$ and the inverse of the singular values $(\Sigma^{\mathrm{train}})^{-1}$

... $y_o)/y_o$, respectively. Mean and median relative difference (MPDcv and MDPDcv, respectively) and the root mean square absolute difference (RMSEcv) over all permutations were determined as follows:

$\left( \dfrac{\ldots - y^{\mathrm{valid}}_{i,p}}{y^{\mathrm{valid}}_{i,p}} \cdot 100 \right), \quad i = 1, \ldots, N \ [\%]. \qquad (15)$
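The resampling steps listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: standardization is taken to be column-wise z-scoring with the training statistics, the EOF decomposition of Eq. (2) is taken to be a singular value decomposition, the bidirectional stepwise search (step 6) is omitted so that a fixed number of leading modes is used, the relative difference is taken as an absolute value, and the statistics are averaged over the permutations. All function and parameter names (`eof_predict`, `relative_errors`, `cross_validate`, `n_modes`, `n_perm`) are hypothetical.

```python
import numpy as np

def eof_predict(X_train, y_train, X_valid, n_modes=3):
    """Fit log(y) on the leading EOF scores of X_train and predict y for X_valid."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)       # assumption: column-wise z-scores
    Z_train = (X_train - mean) / std
    Z_valid = (X_valid - mean) / std        # validation set uses training statistics
    # EOF analysis as SVD: Z_train = U diag(s) Vt
    U, s, Vt = np.linalg.svd(Z_train, full_matrices=False)
    U_train = U[:, :n_modes]
    # Project validation spectra onto the EOFs and the inverse singular values
    U_valid = Z_valid @ Vt[:n_modes].T / s[:n_modes]
    A_train = np.column_stack([np.ones(len(U_train)), U_train])
    coef, *_ = np.linalg.lstsq(A_train, np.log(y_train), rcond=None)
    A_valid = np.column_stack([np.ones(len(U_valid)), U_valid])
    return np.exp(A_valid @ coef)

def relative_errors(y_pred, y_obs):
    """Mean and median absolute percent difference and root mean square difference."""
    rel = np.abs(y_pred - y_obs) / y_obs * 100.0
    rmse = np.sqrt(np.mean((y_pred - y_obs) ** 2))
    return rel.mean(), np.median(rel), rmse

def cross_validate(X, y, d=0.8, n_perm=100, n_modes=3, seed=0):
    """Average the error statistics over n_perm random training/validation splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(len(y) * d))
    stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(y))
        train, valid = idx[:n_train], idx[n_train:]
        y_pred = eof_predict(X[train], y[train], X[valid], n_modes)
        stats.append(relative_errors(y_pred, y[valid]))
    return np.mean(stats, axis=0)           # MPDcv, MDPDcv, RMSEcv
```

Reducing `d` step by step while keeping the validation logic unchanged gives a simple way to examine how the statistics degrade with fewer training points.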

Figure 1 presents the distribution of the collocated pigment and reflectance measurements of both data sets, which were used as input for the EOF analysis. The field data set with 53 collocations was obtained in two seasons, spring and fall, in 2008 and 2010, while the satellite-based data set consisted of 155, 150 and 135 collocated samples from 2002 to 2012 for the 5 × 5, 3 × 3 and 1 × 1 pixel collocation, respectively, covering all months except January, March and December. Figure 2 shows the original and standardized spectra of the field and satellite-based data sets. Considering the conversion from RRS(λ) to ρwN(λ) data by a factor of π, the magnitude and shape of the original and standardized spectra are similar for the band-resolved data sets, except that the standardized satellite_ρwN data set contains only one spectrum with maximum

Our hyper_RRS TChl a linear model results (R² = 0.82, RMSE = 0.30, R²cv = 0.77, RMSEcv = 0.41; Fig. 4 and Table 2) are comparable to the results of Craig et al. (2012; R² = 0.84, RMSE = 0.30, R²cv = 0.76, RMSEcv = 0.21). This holds even though Craig et al. (2012) used measurements from only one location, sampled about weekly throughout one year, while our field data set was from a much larger region (covering 95° in latitude and 85° in longitude) and sampled in two seasons in 2008 and 2010 only. In their study the same linear model set-up was used, with collocated in situ reflectance and TChl a data sampled at Compass Buoy Station in the Bedford Basin near Halifax as input data.

Figure 7 shows the monthly averages for those pigment groups and pigments. The MERIS Polymer TChl a concentration for the same time and region is also shown. The distributions of TChl a from the EOF model prediction and from the Polymer algorithm are very similar, ranging from 0.00003 to 7.52 mg TChl a m⁻³. For this particular month, the total phytoplankton biomass shows a strong bloom (> 2 mg m⁻³) at the Mauritanian Upwelling, split into two parts, 19-24°N and 14-7°N, and high values (> 0.5 mg m⁻³) in all coastal areas of the African continent. TChl a concentrations > 0.3 mg m⁻³ also spread into the open ocean, especially at 5-20°N and 30-40°W, along the 0° latitude from Africa to South America, and south of this at 3-10°S from 3°E to about 25°W. MVChl a more or less follows the TChl a distribution; however, only at the northern bloom does it reach the magnitude indicated by the TChl a values. The deviation between TChl a and MVChl a is obvious in the distribution of DVChl a, which indicates that in the northern part of the Mauritanian Upwelling bloom Prochlorococcus (the only phytoplankton genus that contains DVChl a) seems to have contributed only a very minor fraction (a few percent), while elsewhere it presents a substantial background of about 30 % of all phytoplankton.
Table 3. ∆AIC for the robust pigment predictions of the pigment groups TChl a, PSC, PPC and the pigments MVChl a, Zea, and DVChl a by the EOF models based on field RRS in (a) hyper- (hyper_RRS) and (b) multi- (band_RRS) spectral resolution and (c) the satellite_ρwN (from MERIS Polymer) using the 1 × 1 pixel collocation criterion. The pigments listed under "no 0 mg m⁻³" were predicted using a reduced data set where the respective pigment reached concentrations above 0 mg m⁻³. Bold highlights the EOF mode with the highest ∆AIC.

Figure 1. Position of pigment samples used in this study. Red: field data set; black: samples which are only collocated to satellite-based but not to field reflectance data; circles: samples which are only collocated to field but not to satellite-based reflectance data. Stars, diamonds and squares show collocations to MERIS Polymer data based on the 1 × 1, 3 × 3 and 5 × 5 pixel criteria, respectively.
Table 2a) show that, especially regarding the R²cv and RMSEcv values, the hyper_RRS based linear models perform either slightly better (PPC, PSC, Chl c1/2, But, Chl c3) or much better (TChl a, Fuco, MVChl a, PE, Diadino and Hex). Considering the MDPDcv and MPDcv values, the difference is less clear for the MVChl a, Chl c1/2, TChl a and PSC predictions. For the latter, the multispectral resolution seems to be sufficient to obtain similarly robust linear models.

Table 1. Percent of total variance explained (expl. variation) by the significant EOFs derived from field RRS spectra in hyper- (hyper_RRS) and multi- (band_RRS) spectral resolution and from satellite_ρwN (from MERIS Polymer) using the 1 × 1 pixel collocation criterion.