Ocean Color Algorithm for the Retrieval of the Particle Size Distribution and Carbon-Based Phytoplankton Size Classes Using a Two-Component Coated-Spheres Backscattering Model

Abstract. The particle size distribution (PSD) of suspended particles in near-surface seawater is a key property linking biogeochemical and ecosystem characteristics with optical properties that affect ocean color remote sensing. Phytoplankton size affects their physiological characteristics and their ecosystem and biogeochemical roles, e.g. in the biological carbon pump, which has an important role in the global carbon cycle and thus climate. It is thus important to develop capabilities for measurement and predictive understanding of the structure and function of oceanic ecosystems, including the PSD, phytoplankton size classes (PSCs) and phytoplankton functional types (PFTs). Here, we present an ocean color satellite algorithm for the retrieval of the parameters of an assumed power-law PSD. The forward optical model considers two distinct particle populations: phytoplankton and non-algal particles (NAP). Phytoplankton are modeled as coated spheres following the Equivalent Algal Populations (EAP) framework, and NAP are modeled as homogeneous spheres. The forward model uses Mie and Aden-Kerker scattering computations, for homogeneous and coated spheres, respectively, to model the total particulate spectral backscattering coefficient as the sum of phytoplankton and NAP backscattering. The PSD retrieval is achieved via Spectral Angle Mapping (SAM), which uses backscattering end-members created by the forward model. The PSD is used to retrieve size-partitioned absolute and fractional phytoplankton carbon concentrations (i.e. carbon-based PSCs), as well as particulate organic carbon (POC), using allometric coefficients. This model formulation also allows the estimation of chlorophyll-a concentration via the retrieved PSD, as well as percent of backscattering due to NAP vs. phytoplankton. The PSD algorithm is operationally applied

In Eq. 1, D is particle diameter, N [m⁻⁴] is the differential number concentration of particles per unit volume of seawater and per bin width of particle diameter, N_0 = N(2 µm) is the particle number concentration at a reference diameter, here D_0 = 2 µm, and ξ is the power-law slope of the PSD. Equation 1 has to be integrated over a given diameter range to get the total particle number concentration in that range, N_T [m⁻³].

Ocean color is quantified by the spectral shape and magnitude of the remote-sensing reflectance, R_rs(λ) [sr⁻¹], where λ is the wavelength of light in vacuo. The Kostadinov-Siegel-Maritorena 2009 (KSM09; Kostadinov et al., 2009) algorithm retrieves the parameters of an assumed power-law PSD (ξ and N_0 in Eq. 1) from ocean color remote-sensing observations, using the spectral shape (Loisel et al., 2006) and magnitude of the particulate backscattering coefficient, b_bp(λ) [m⁻¹]. b_bp(λ) can be retrieved using existing inherent optical property (IOP) inversion algorithms; KSM09 uses the Loisel and Stramski (2000) IOP inversion. Subsequently, the retrieved PSD parameters allow the quantification of absolute and fractional PSCs (picoplankton, nanoplankton and microplankton) based on bio-volume (Kostadinov et al., 2010) or phytoplankton carbon (Kostadinov et al., 2016a; henceforth TK16) via allometric relationships (Menden-Deuer and Lessard, 2000). Phytoplankton carbon (phyto C) is the key variable of interest for carbon cycle and climate studies and modeling, and TK16 (data set available in Kostadinov et al. (2016b)) represents a relatively unique carbon-based approach among PSC/PFT algorithms (Mouw et al., 2017), as it is based on knowledge of the PSD and allometric relationships to derive size-partitioned phyto C. Roy et al. (2013, 2017) retrieve phytoplankton-specific PSD and size-partitioned phyto C, based on the phytoplankton absorption coefficient.
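The integration of the power-law PSD over a diameter range, mentioned above, can be illustrated with a short sketch. It assumes the conventional power-law form N(D) = N_0 (D/D_0)^(-ξ) implied by the symbol definitions; the function names and the example values of N_0 and ξ are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

D0 = 2e-6  # reference diameter in m (2 µm, as stated in the text)

def psd(D, N0, xi):
    """Differential number concentration N(D) [m^-4], assumed power-law form."""
    return N0 * (D / D0) ** (-xi)

def total_number(N0, xi, D_min, D_max):
    """Analytical integral of the power law from D_min to D_max [m^-3]."""
    if np.isclose(xi, 1.0):
        # Special case: the antiderivative of D^-1 is a logarithm
        return N0 * D0 * np.log(D_max / D_min)
    p = 1.0 - xi
    return N0 * D0**xi * (D_max**p - D_min**p) / p

# Illustrative (hypothetical) parameter values, diameters in metres
N0_example, xi_example = 1e16, 4.0
N_T = total_number(N0_example, xi_example, 0.2e-6, 50e-6)
```

Because the PSD is a pure power law, the integral has a closed form, and for steep slopes the result is dominated by the lower diameter limit.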
The KSM09 PSD algorithm (and the TK16 phyto C/PSC products derived from it) is built on the assumption of a single population of particles (approximated by homogeneous spheres), representing backscattering due to the entire oceanic particle assemblage, i.e. phytoplankton cells and non-algal particles (NAP). However, a particle's internal composition and shape influence its optical properties (e.g. Dall'Olmo et al. (2009)). Thus, a 2-component particle model is used here, separately modeling NAP as homogeneous spheres of wider size range than phytoplankton, so that bulk b_bp of oceanic waters can be modeled (e.g. Stramski et al. (2001); Moutier et al. (2016); Duforêt-Gaurier et al. (2018)). NAP are modeled as having generally organic detrital composition, but with some allowance for higher indices of refraction to account for minerogenic particle contributions. The PSD forward model can thus also produce a first-order estimate of POC, and the percent contribution of phytoplankton and NAP to b_bp. Subsequent sections present details of the 2-component, EAP-based forward IOP model, the inversion methodology developed for operational application of the PSD algorithm, and the use of the satellite-derived PSD to retrieve derived products (following the methods of TK16 with some modifications), namely absolute and fractional size-partitioned phytoplankton carbon (henceforth phyto C) (i.e. carbon-based PSCs), as well as Chl and POC estimates. The novel algorithm is applied operationally to monthly data from the multi-sensor merged OC-CCI v5.0 data set (Sathyendranath et al., 2019, 2021); examples are shown in the manuscript, and the entire data set is publicly available and linked below (see Sec. 4). We then present and discuss an initial effort of validation of the new PSD algorithm and derived products using global compilations of PSD, picophytoplankton carbon and POC in-situ data. A comparison with other existing methods to retrieve phyto C is presented.
We also discuss algorithm uncertainties, assumptions and limitations as well as future work directions.

2 Data and Methods

Particle optical model input specification for Phytoplankton and NAP
The contributions of two separate particle populations to bulk backscattering are modeled using Mie theory (Mie, 1908) for homogeneous spherical particles and the Aden-Kerker (Aden and Kerker, 1951) method for coated spheres. Living phytoplankton cells are represented by the first particle population, and all other suspended particles of any origin (i.e. NAP) are represented by the second population. Living phytoplankton cells are modeled as coated spheres using the Equivalent Algal Populations (EAP) framework (Bernard et al., 2009; Robertson Lain et al., 2014; Robertson Lain and Bernard, 2018) for determining optical model inputs, in particular the complex indices of refraction of the particle core and coat. NAP are modeled as homogeneous spheres meant to represent organic detritus, but their real index of refraction is allowed to vary over a wider range to take into account the contribution of mineral particles.

A characteristic of the PSD algorithm presented here is that it is mechanistic to the extent feasible, i.e. based on first principles and causality, even at the expense of increasing complexity. For example, as in EAP, the imaginary refractive index (RI) of the cell is a function of intracellular chlorophyll concentration, Chl_i. We vary some optical model inputs in a Monte Carlo simulation in order to assess uncertainty and base the PSD inversion on an ensemble of forward runs rather than a single set of inputs. Details of uncertainty estimation and propagation are given in Supplement Sec. S1. Table 1 and Table 2 specify how each input parameter for phytoplankton cells and for NAP is set, as well as the statistical distributions from which the Monte Carlo simulation instances were drawn.
As in the EAP model, the chloroplast is represented by the particle coat. Its relative volume, V_s, is picked from a distribution as shown in Table 1. The chloroplast's imaginary refractive index (RI) (relative to seawater) at 675 nm, n′(675), is then computed as follows (Morel and Bricaud, 1986; Bernard et al., 2009; Robertson Lain and Bernard, 2018):

$$ n'(675) = \frac{Chl^{*} \times Chl_i \times 10^{6} \times 675 \times 10^{-9}}{4\pi \times V_s \times n_{sw}(675)} \qquad (2) $$

where Chl* = 0.027 m² mg⁻¹ is the theoretical maximum specific absorption coefficient of chlorophyll at 675 nm when dissolved in water (Bernard et al., 2009; Robertson Lain and Bernard, 2018), Chl_i is the intracellular chlorophyll concentration in kg Chl m⁻³ of cellular material, and n_sw(675) is seawater's absolute real RI at 675 nm. A hyperspectral basis vector from the EAP model (based on measurements; for details see Bernard et al. (2009); Robertson Lain and Bernard (2018)) is then scaled using the value at 675 nm from Eq. 2, obtaining a hyperspectral relative imaginary RI for the coat as chloroplast. In Eq. 2, Chl_i applies to the whole cell and is therefore scaled using V_s to obtain n′(675) for the coat alone. The nominal chloroplast's real relative RI is then picked from a distribution as shown in Table 1, and modified as a function of its imaginary RI according to the Kramers-Kronig relations (implemented as a Hilbert transform) (Bernard et al., 2009; Robertson Lain and Bernard, 2018).
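As a worked illustration of Eq. 2, the sketch below evaluates n′(675) for one set of inputs. It assumes that V_s is expressed as a fraction of cell volume and that the factor 10⁶ converts Chl_i from kg m⁻³ to mg m⁻³; the function name and the example inputs are hypothetical and are not taken from Table 1.

```python
import math

CHL_STAR = 0.027  # m^2 mg^-1, specific absorption of chlorophyll at 675 nm (from the text)

def n_imag_675(chl_i, v_s, n_sw_675):
    """Relative imaginary RI of the coat at 675 nm, following Eq. 2.

    chl_i    : intracellular chlorophyll, kg Chl m^-3 of cellular material
    v_s      : relative coat (chloroplast) volume, assumed here to be a fraction
    n_sw_675 : absolute real RI of seawater at 675 nm
    """
    wavelength_m = 675e-9
    absorption = CHL_STAR * chl_i * 1e6  # kg -> mg gives absorption in m^-1
    return absorption * wavelength_m / (4.0 * math.pi * v_s * n_sw_675)

# Hypothetical inputs, for illustration only
example = n_imag_675(chl_i=2.5, v_s=0.2, n_sw_675=1.34)
```

Because V_s sits in the denominator, concentrating the same whole-cell chlorophyll into a thinner coat raises the coat's imaginary RI proportionally.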

The cell cytoplasm is represented by the particle core. Its real relative RI is picked from a distribution given in Table 1, and it is modified by the Kramers-Kronig relations using a constant hyperspectral detritus-like imaginary RI, i.e. one having a colored dissolved organic matter (CDOM)-like exponential spectral shape, resulting in a spectrally varying hyperspectral relative real RI. The phytoplankton particle population relative RIs and their Monte Carlo variability are summarized in Supplement Fig. S1.

The NAP population is represented by a homogeneous sphere, the relative RIs of which are picked so that its absorption spectrum is detritus-like (same as the core of phytoplankton), and its real RI is allowed to vary over a wider range of values, meant to represent mostly organic detritus, but with some minerogenic contributions, resulting in a mean nominal real relative RI of ≈ 1.06. The input RIs and other input parameters for NAP are summarized in Table 2.
Specification of the input PSD parameters and the relationship of NAP to phytoplankton PSDs is key to the construction of the forward and inverse models. Necessarily, some key simplifying assumptions are made here in order to construct an algorithm with operational application to modern multi-spectral ocean color sensors. The two key assumptions are: 1) phytoplankton and NAP have a power-law PSD (Eq. 1) with the same slope ξ, and 2) the scaling parameter N_0 for NAP is twice that of N_0 for phytoplankton (the forward model uses default values as in Tables 1 and 2). The latter assumption is chosen so that it results in a phyto C:POC ratio of 1:3 (see Kostadinov et al. (2016a) and Behrenfeld et al. (2005), and Sec. 3.4 here) (as

Coat (Chloroplast)
Relative Imaginary RI: n′_coat(λ) is computed from a hyperspectral basis vector (from Bernard et al. (2009); Robertson Lain and Bernard (2018)) that is scaled to the value of n′(675) obtained from Eq. 2.

Backscattering Calculations
The backscattering efficiencies, Q_bb(λ), for a single phytoplankton cell and NAP particle were computed using the inputs described above in Sec. 2.1 and Tables 1 and 2. The coated spheres code of Zhang et al. (2002) was used for both coated and homogeneous spheres. This code is included with the algorithm development scientific code of the PSD algorithm (see Sec. 4).
Calculations were run for N = 3000 instances of Monte Carlo simulations, each with a unique randomly picked combination of inputs for phytoplankton and NAP. This resulted in 3000 sets of hyperspectral Q_bb values. High sampling resolution in diameter space was picked for the coated spheres (10000 samples between minimum and maximum diameter) in order to minimize the influence of resonance spikes in Q_bb. For NAP, 1000 samples of D were used.
Indices of refraction for both phytoplankton and NAP are specified hyperspectrally (Supplement Fig. S1) and the computations are performed from 400 nm to 700 nm wavelength in vacuo with a step of 1 nm, allowing the resulting hyperspectral Q_bb(λ) values to be adapted for use with any combination of visible optical wavebands pertaining to recent and currently operating ocean color multispectral sensors, or for planned (e.g. PACE (Werdell et al., 2019)) or existing hyperspectral sensors.

Before b_bp calculation, the hyperspectral backscattering efficiencies, Q_bb, for each Monte Carlo run were pre-processed by applying quality control and band-averaging with a moving-average 11-nm-wide top-hat filter. The central wavelengths were the nominal bands of the following ocean color sensors: Sea-viewing Wide Field-of-view Sensor (SeaWiFS), Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua, Medium Resolution Imaging Spectrometer (MERIS), Ocean and Land Colour Instrument (OLCI), and Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi National Polar-orbiting Partnership (S-NPP), plus 440 and 550 nm, resulting in 19 unique bands for band-averaged backscattering efficiencies, denoted here as Q_bb(λ). The band-averaged spectral particulate backscattering coefficient, b_bp(λ), was then calculated from the Q_bb(λ) values and the input PSD using Eq. 3 (e.g. van de Hulst (1981); Kostadinov et al. (2009)). In Eq. 3, m is the complex index of refraction (specified separately for coat and core in the case of phytoplankton). Equation 3 is applied separately to the phytoplankton and NAP modeled Q_bb values, and for each of the 3000 Monte Carlo runs. Band-averaged total b_bp(λ) spectra are then calculated as the linear sum of phytoplankton and NAP backscattering.
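The band-averaging and backscattering integration steps above can be sketched as follows. This is a minimal illustration that assumes the standard form b_bp(λ) = ∫ (π/4) D² Q_bb(D, λ, m) N_0 (D/D_0)^(-ξ) dD for Eq. 3; the function names, array layouts and the trapezoidal integration are assumptions of this sketch, and the Q_bb values would come from a Mie or coated-sphere code that is not reproduced here.

```python
import numpy as np

def band_average(wl, q_hyper, centers, width=11):
    """Top-hat average of 1-nm hyperspectral values around each band center.

    wl      : wavelengths in nm, shape (n_wl,)
    q_hyper : values with wavelength along the last axis
    centers : nominal band centers in nm
    """
    half = width // 2
    bands = []
    for c in centers:
        mask = (wl >= c - half) & (wl <= c + half)
        bands.append(q_hyper[..., mask].mean(axis=-1))
    return np.stack(bands, axis=-1)

def bbp_from_qbb(D, q_bb, N0, xi, D0=2e-6):
    """Integrate the assumed Eq. 3 over diameter with the trapezoidal rule.

    D    : diameters in m, shape (n_D,)
    q_bb : band-averaged efficiencies, shape (n_D, n_bands)
    """
    cross_section = (np.pi / 4.0) * D[:, None] ** 2        # geometric cross-section
    number = N0 * (D[:, None] / D0) ** (-xi)               # power-law PSD
    integrand = cross_section * q_bb * number
    dD = np.diff(D)[:, None]
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dD, axis=0)
```

Under the two-population assumption, the total spectrum would then be `bbp_from_qbb(D_phyto, q_phyto, N0_phyto, xi) + bbp_from_qbb(D_nap, q_nap, 2 * N0_phyto, xi)`, where the argument names are hypothetical placeholders.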

End-member construction
Band-averaged total b_bp(λ) spectra were used to construct the backscattering end-members, E(λ), corresponding to specific input values of the PSD slope ξ. First, individual total b_bp spectra from each Monte Carlo run (N = 3000) were normalized by the value at 555 nm. The median of all normalized spectra at each waveband was used as the end-member for each PSD slope, from ξ = 2.5 to ξ = 6 in steps of 0.05 (see Table 1). This approach allows the isolation of b_bp spectral shape (dependent on ξ), and spectral magnitude (dependent on N_0) (Eq. 3). Using the hyperspectral underlying Q_bb values, end-members can be constructed for any desired set of wavelengths.

2.3.2 PSD parameter retrieval and operational application to OC-CCI ocean color data

The PSD parameters ξ and N_0 are retrieved using the backscattering end-members, E(λ), via the spectral angle mapping (SAM) technique (e.g. Dennison et al. (2004)). Briefly, the end-members and satellite-observed b_bp spectra are treated as n-dimensional vectors, where n is the number of bands. The spectral angle between a given end-member and the observed spectrum is then calculated using the vector dot product (Eq. 4). Thus, spectral angle is an index of spectral shape similarity between two spectra, with more similar spectral shapes resulting in lower spectral angles. Equation 4 was used to calculate the spectral angle Θ between each of the 71 end-members, E(λ), and the input observed b_bp(λ) spectrum. The value of ξ corresponding to the smallest spectral angle is then assigned as the retrieved PSD slope. Three wavebands were used, namely 490, 510 and 550 nm. For operational application to OC-CCI v5.0 estimate the corresponding R_rs(550), which is used in the Loisel and Stramski (2000) IOP inversion.
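The end-member construction and the SAM slope retrieval described above can be sketched as follows. The sketch assumes that Eq. 4 is the usual spectral angle, Θ = arccos(E · b_bp / (|E| |b_bp|)); the function names and array layouts are illustrative assumptions rather than the authors' code.

```python
import numpy as np

XI_GRID = np.linspace(2.5, 6.0, 71)  # 71 slopes in steps of 0.05, as in the text

def build_end_members(bbp_runs, ref_idx):
    """Median of normalized Monte Carlo spectra for each PSD slope.

    bbp_runs : array of shape (n_slopes, n_runs, n_bands)
    ref_idx  : index of the normalization band (555 nm in the text)
    """
    normalized = bbp_runs / bbp_runs[..., ref_idx:ref_idx + 1]
    return np.median(normalized, axis=1)

def spectral_angle(end_members, observed):
    """Angle (radians) between each end-member and the observed spectrum."""
    dot = end_members @ observed
    norms = np.linalg.norm(end_members, axis=-1) * np.linalg.norm(observed)
    # Clip guards against rounding slightly outside [-1, 1]
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))

def retrieve_slope(end_members, observed, xi_grid=XI_GRID):
    """Return the slope whose end-member has the smallest spectral angle."""
    return xi_grid[np.argmin(spectral_angle(end_members, observed))]
```

The spectral angle is insensitive to a constant scaling of the observed spectrum, which is why the slope can be retrieved from spectral shape alone while the magnitude is left to constrain N_0.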
The band-shifting was constructed using the band ratios between the respective original and target bands from a hyperspectral run of the Morel and

2.4 Derived products: Size-partitioned phytoplankton carbon, PSCs, POC and Chlorophyll

Once the PSD parameters are known, they can be used to compute derived products (Kostadinov et al., 2010, 2016a; Roy et al., 2017). Phytoplankton carbon in any size class spanning from cell diameter D_min to cell diameter D_max (in m) can be estimated using Eq. 5, where N_0ϕ = (1/3) N_0, and N_0 (m⁻⁴) for the total PSD is the satellite-retrieved parameter from total particulate backscattering; the other PSD parameters are as in Eq. 1. Equation 5 was used to compute size-partitioned phyto C in three size classes: picophytoplankton (0.2 to 2 µm in diameter), nanophytoplankton (2 to 20 µm in diameter) and microphytoplankton (20 to 50 µm in diameter), as well as total phyto C as the sum of the three classes. Carbon-based PSCs are defined as the fractional contribution of each of the three size classes to total phyto C (Kostadinov et al., 2016a). Given the first-order correspondence between PSCs and PFTs (e.g. Quéré et al. (2005)), these PSCs can also be interpreted as PFTs. The allometric coefficients of Roy et al. (2017) are used here, namely a = 0.54 and b = 0.85; when cell volume V is expressed in µm³, cellular carbon is computed in pg C per cell using these coefficients (Eq. 5, see also Menden-Deuer and Lessard (2000)). Phyto C in Eq. 5 is given in mg m⁻³; the conversion factors in Eq. 5 are used to convert from m³ to µm³, and from pg to mg C (Kostadinov et al., 2016a; Roy et al., 2017). The factor of 1/3 is an assumption of the model (Tables 1 and 2). Thus, an estimate of POC (computed using the same size limits as total phyto C) was calculated as 3 × phyto C.
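The size-partitioned carbon calculation can be sketched numerically as below. The sketch assumes that Eq. 5 integrates the phytoplankton PSD, N_0ϕ (D/D_0)^(-ξ), multiplied by the allometric cellular carbon a V^b, with V = (π/6) D³ converted from m³ to µm³ (factor 10¹⁸) and carbon converted from pg to mg (factor 10⁻⁹). Function names and the numerical integration scheme are assumptions of this sketch.

```python
import numpy as np

A_ALLOM, B_ALLOM = 0.54, 0.85   # allometric coefficients quoted in the text
D0 = 2e-6                        # reference diameter, m

# Size-class limits in metres, as given in the text
CLASSES = {"pico": (0.2e-6, 2e-6), "nano": (2e-6, 20e-6), "micro": (20e-6, 50e-6)}

def phyto_c(N0, xi, D_min, D_max, n=2001):
    """Phytoplankton carbon [mg m^-3] between D_min and D_max (assumed Eq. 5)."""
    D = np.logspace(np.log10(D_min), np.log10(D_max), n)
    n0_phi = N0 / 3.0                              # phytoplankton share of N_0
    volume_um3 = 1e18 * np.pi * D**3 / 6.0         # cell volume, m^3 -> µm^3
    carbon_mg = 1e-9 * A_ALLOM * volume_um3**B_ALLOM   # pg C -> mg C per cell
    y = n0_phi * (D / D0) ** (-xi) * carbon_mg
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(D))

def size_partition(N0, xi):
    """Absolute and fractional carbon per class, total phyto C and POC estimate."""
    absolute = {name: phyto_c(N0, xi, *lim) for name, lim in CLASSES.items()}
    total = sum(absolute.values())
    fractions = {name: value / total for name, value in absolute.items()}
    poc = 3.0 * total                              # model assumption: POC = 3 x phyto C
    return absolute, total, fractions, poc
```

In this formulation N_0 cancels in the ratios, so the fractional classes depend only on ξ, b and the integration limits, while the absolute concentrations scale linearly with N_0.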
Chlorophyll concentration was estimated from the PSD retrievals and the input intracellular chlorophyll concentration, Chl_i (Table 1; Roy et al. (2017)), using Eq. 6. In Eq. 6, Chl_i, D, D_0 and N_0ϕ all have to be expressed in consistent units so that Chl is obtained in mg m⁻³. Here we use the median Chl_i across all Monte Carlo simulations to produce a single Chl estimate.
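A minimal sketch of the chlorophyll estimate follows. It assumes that Eq. 6 integrates the phytoplankton PSD multiplied by cell volume, (π/6) D³, and by Chl_i, with Chl_i given in kg m⁻³ and converted to mg m⁻³ by a factor of 10⁶. The exact form of Eq. 6, the integration limits and the example inputs are assumptions of this sketch.

```python
import numpy as np

D0 = 2e-6  # reference diameter, m

def chl_from_psd(N0, xi, chl_i, D_min=0.2e-6, D_max=50e-6, n=4001):
    """Chlorophyll [mg m^-3] from PSD parameters (assumed form of Eq. 6).

    N0    : total-PSD scaling parameter, m^-4
    chl_i : intracellular chlorophyll, kg Chl m^-3 of cellular material
    """
    D = np.logspace(np.log10(D_min), np.log10(D_max), n)
    n0_phi = N0 / 3.0                          # phytoplankton share of N_0
    cell_volume = np.pi * D**3 / 6.0           # m^3 per cell
    chl_per_cell = chl_i * 1e6 * cell_volume   # kg -> mg Chl per cell
    y = n0_phi * (D / D0) ** (-xi) * chl_per_cell
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(D))

# Hypothetical inputs, for illustration only
chl_example = chl_from_psd(N0=1e16, xi=4.0, chl_i=2.5)
```

Keeping all lengths in metres and applying a single mass conversion is one way to satisfy the requirement that the inputs be expressed in consistent units.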

Validation and Comparison
A data set of near-surface in-situ PSD measurements was compiled for validation of the PSD parameter products, ξ and N_0 (Eq. 1). The data set consists of Coulter counter and Laser In-Situ Scattering and Transmissometry (LISST) measurements, and a small set of PSDs derived from multiple instruments and modeling. Specifically, the compilation consists of the following data sets: 1) a compilation of several data sets of Coulter counter measurements, as used in the KSM09 algorithm validation in Kostadinov et al. (2009); 2) LISST-100X (Sequoia Scientific©) measurements from the Plumes and Blooms Project (e.g. Toole and Siegel (2001); Kostadinov et al. (2007)) in the Santa Barbara Channel, as used in Kostadinov https://oceanexports.org/); and 5) PSDs obtained using a VSF-inversion technique (Zhang et al., 2011, 2012) from the volume scattering functions (VSFs) measured during the NASA EXPORTS campaign (Siegel et al., 2016) in the North Pacific in 2018 (Siegel et al., 2021).
The compiled PSD data set was used to fit for the PSD parameters of Eq. 1 using the 2 to 20 µm diameter range. One data point was removed from the 2018 EXPORTS PSD data due to a poor fit to a power-law PSD. These in-situ estimates were matched to OC-CCI v5.0 (Sathyendranath et al., 2019, 2021) satellite R_rs using the same matching methods described below for POC and pico-phytoplankton carbon data. Matched reflectances were used as input to the novel PSD algorithm presented here. The in-situ and satellite PSD parameters were then compared using a type II linear regression and several additional algorithm performance metrics (e.g. Seegers et al. (2018)), details of which are given in the Fig. 8 caption.
A large compilation of in-situ POC data was collected from various public databases and private contributors and was used here to perform match-ups with satellite OC-CCI v5.0 data. In addition to the POC data (1997-2012) used in Evers-King et al. (2017) for algorithm validation (N = 3891), this study also incorporated recent in-situ POC data (2013-2020) from the SeaWiFS Bio-optical Archive and Storage System archive (https://seabass.gsfc.nasa.gov/). The daily, 4 km, sinusoidal projection OC-CCI v5.0 data (1997-2020) (Sathyendranath et al., 2019, 2021) were used to extract the closest central satellite pixels to the in-situ data points. If the central satellite match-up pixel was valid, the surrounding eight pixels (a 3 × 3 pixel box) were also extracted to estimate the mean, median, and standard deviation of all OC-CCI variables. The match-up data points were then averaged with respect to depth (0 to 10 m), location, and date. Moreover, a number of uncertain match-up data points were removed, as described below. A total of 6041 match-up data points were obtained and used for analysis. Here, the median satellite R_rs(λ) matched-up spectra were used to compute the satellite-retrieved POC data using the PSD-based algorithm.

The in-situ pico-phytoplankton carbon data set compiled and used for algorithm inter-comparison as part of the ESA POCO project (Martínez-Vicente et al., 2017) was used here to generate match-ups with satellite OC-CCI v5.0 R_rs data for further validation. Match-ups were generated in the same way as described above for POC.
All in-situ data described above were excluded from the validation if any of the following conditions were met: 1) average bathymetric depth from an ≈ 9 km buffer around the in-situ sample location was less than or equal to 200 m, or any grid cell elevations in that buffer were 0 m or higher, using a downsampled, 4 km version of the NOAA ETOPO1 data set (https://www.ngdc.noaa.gov/mgg/global/); 2) the in-situ sample depth was 15 m or greater; or 3) there were three or fewer satellite pixels available to use in the match-up, as detailed above.
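The three exclusion criteria listed above can be expressed compactly, as in the sketch below. The record layout and field names are hypothetical; only the thresholds (200 m, 0 m, 15 m and three pixels) come from the text.

```python
from dataclasses import dataclass

@dataclass
class MatchUp:
    """Hypothetical record for one in-situ / satellite match-up."""
    mean_buffer_depth_m: float      # mean bathymetric depth in the ~9 km buffer (positive down)
    max_buffer_elevation_m: float   # highest grid-cell elevation in the buffer (negative below sea level)
    sample_depth_m: float           # depth of the in-situ sample
    n_valid_pixels: int             # valid satellite pixels in the 3 x 3 box

def is_excluded(m: MatchUp) -> bool:
    """Return True if any of the exclusion conditions from the text is met."""
    shallow_water = m.mean_buffer_depth_m <= 200.0
    touches_land = m.max_buffer_elevation_m >= 0.0
    sample_too_deep = m.sample_depth_m >= 15.0
    too_few_pixels = m.n_valid_pixels <= 3
    return shallow_water or touches_land or sample_too_deep or too_few_pixels
```

A compiled list of match-ups could then be filtered with `[m for m in match_ups if not is_excluded(m)]`, where `match_ups` is a placeholder name.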
All duplicate in-situ match-ups (in the sense of multiple in-situ data points that are close in space and time and receive the same satellite match-up) were combined into a single match-up point as follows: for the PSD, the medians of the PSD measurements from the NAAMES and EXPORTS cruises in each LISST size bin for such duplicates were used for the calculation of in-situ PSD parameters (a large number of duplicates since the data are in-line); for the rest of the PSD data, the fit PSD parameters themselves were averaged (a small number of duplicates). For the POC (large number of duplicates) and pico-phytoplankton carbon data (small number of duplicates), the averages of the duplicate in-situ values were used.
In addition to validation against in-situ measurements of the PSD, POC and pico-phytoplankton carbon, satellite chlorophyll-

(Tables 1 and 2). These normalized spectra illustrate the strong spectral shape dependence on the PSD slope ξ. Phytoplankton b_bp spectral shapes are complex, with various peaks and troughs near the absorption peaks of chlorophyll, but are more linear in the 490 of their absorption (Fig. 1B). Fundamentally, it is evident from Figs. 1A and 1B that the higher the PSD slope ξ, the steeper the b_bp spectral shape becomes, with higher values in the blue, since smaller particles dominate the signal. This dependence is at

The 71 end-members (EMs) created for operational application to existing major satellite ocean color missions and corresponding to PSD slope values between 2.5 and 6.0 with a step of 0.05 are displayed in Fig. 2A. They represent the modeled b_bp(λ) spectra against which satellite-measured b_bp spectra are compared using the SAM method (Eq. 4). The spectral shape dependence on ξ demonstrates the theoretical ability to retrieve this parameter from space.
An important question in bio-optical oceanography is determining the sources of backscattering in the ocean and their relative contributions. This is still not a resolved issue, though progress has been made (e.g. Organelli et al.

The uncertainty in PSD slope ξ retrieval as a function of ξ is illustrated in Fig. 2C. These estimates are not symmetric about the ξ value and are derived via Kruskal-Wallis analysis of variance to determine class similarity (Supplement Sec. S1). As in KSM09, the general tendency is for the range of uncertainty in ξ to increase for lower PSD slopes, but it is always less than 0.5. The uncertainty in the b_bp(443)/N_0 ratio used to retrieve the N_0 parameter is shown in Fig. 2D in log10 space. Mean and median values are similar, and the uncertainty about them does not vary much with PSD slope, also similarly to KSM09. The uncertainty in Fig. 2D at each ξ value includes all statistically similar classes of EMs.
3.2 Operational Application of the PSD/Phyto C Algorithm to OC-CCI v5.0 Merged Satellite Data

PSD Parameters
The operational PSD algorithm presented here was applied to the monthly 4-km OC-CCI v5.0 R_rs(λ) data set (Sathyendranath et al., 2019, 2021). Both PSD parameters (ξ and N_0, Eq. 1) and derived products were generated (Sec. 2.3 and Sec. 2.4). These data and their monthly and overall climatologies (and associated uncertainties) are made publicly available (see Sec. 4). Here, we use May 2015 data to illustrate and discuss the new algorithm.
The PSD map (Fig. 3A) reveals a global spatial pattern consistent with expectations and with KSM09, namely the subtropical oligotrophic gyres are characterized by high PSD slopes (relatively high numerical dominance of small particles), whereas more eutrophic areas such as coastal areas, Equatorial upwelling zones, and high latitudes exhibit lower slopes (increasing relative abundance of larger particles). This is consistent with oligotrophic ocean ecosystems being dominated by picophytoplankton, whereas microphytoplankton contribute significantly to the phytoplankton assemblage in eutrophic areas and during blooms (e.g. Kostadinov et al. (2009, 2010) and refs. therein). PSD slope values retrieved by the SAM-based algorithm span the full modeled range of 2.5 ≤ ξ ≤ 6.0. This is in contrast to KSM09, where values below 3.0 were not retrieved. The N_0 PSD parameter (Eq. 1) is, as expected, higher in coastal, high latitude and eutrophic areas (indicating higher particle loads), and lower in the oligotrophic subtropical gyres (Fig. 3B). N_0 varies over a few orders of magnitude. Note that N_0 has units of m⁻⁴ (Eq. 1) and that care should be taken when comparing Eq. 1 and N_0 to other formulations of the PSD, e.g. the k parameter in Roy et al. (2017), as these are related, but not equivalent (see also Vidondo et al. (1997)).

parameter are more uniform spatially, but higher in the gyres (Fig. 3D). Note that those are given in log10 space as a standard deviation, and a relatively small absolute value of the uncertainty translates to relatively large uncertainties in absolute particle concentrations.

(2016a), values range over approximately 3 orders of magnitude, which is a higher range than retrievals based on other methods, namely direct empirical algorithm POC retrieval (Stramski et al., 2008) or the Behrenfeld et al. (2005) method of scaling backscattering, and it is also higher than the range in CMIP5 model ensembles (cf. Fig. 1 in Kostadinov et al. (2016a)). This putative underestimation in the gyres and overestimation in eutrophic areas suggests the need for algorithm tuning, which is discussed in Sec. 3.3 along with implications of validation results. Global validation of phyto C retrievals with analytical phyto C measurements is planned, but is currently challenging as phytoplankton-specific carbon data are relatively novel (Graff et al., 2012, 2015) and still scarce. Here, an initial validation effort is undertaken using several other variables (see Sec. 3.3).

A key feature of the PSD-based algorithm is that phyto C can be partitioned into any number of size classes by choosing appropriate integration limits of Eq. 5. Absolute concentrations of pico-, nano-, and micro-phytoplankton are illustrated for May 2015 in Fig. 4B, C, and D, respectively. Pico-phytoplankton C is mapped on the same color scale as total phyto C (Fig. 4A), but nano- and micro-phytoplankton C maps have differing scales, illustrating that while pico-phytoplankton C varies spatially over ≈ 3 orders of magnitude globally, nano-phytoplankton C varies over ≈ 4-5 orders of magnitude, and micro-phytoplankton C over ≈ 7 orders of magnitude (see also Kostadinov et al. (2010, 2016a)). Note that empirical tuning will affect these ranges of variability (see Sect. 3.3). Fractional contributions of each of the three PSCs used here to total phyto C are illustrated in Fig. 5. Pico-phytoplankton dominate much of the open-ocean, lower latitude oligotrophic areas, contributing nearly 100% of the carbon biomass there (Fig. 5A), nano-phytoplankton contribute up to ≈ 50% of biomass in the higher latitude and more eutrophic areas, and micro-phytoplankton contribute significantly only in the most eutrophic areas, e.g. during the North Atlantic bloom at ≈ 45-50° N latitude (May 2015 is shown). As previously noted (Kostadinov et al., 2010, 2016a), this general pattern is consistent with current understanding of ocean ecosystems. The fractional carbon-based PSCs (Fig. 5) are ratios of two integrals of Eq. 5, thus they are analytical functions of the PSD slope ξ and the b allometric coefficient (as well as the limits of integration used for each class and total phyto C).
These functions are plotted in Fig. 6, together with the satellite-observed ξ histogram for May 2015, illustrating which PSC values are most common in the ocean. Area-wise, the ocean is dominated by oligotrophic areas with high picophytoplankton contributions to C biomass.

As an illustration of uncertainty propagation to derived products, the propagated uncertainties of total phyto C (Fig. 7A) and fractional pico-phytoplankton C biomass (Fig. 7B) are shown. Comparison of Fig. 4A with Fig. 7A indicates that absolute total phyto C uncertainties are of the same order of magnitude as the values themselves. This is a partial uncertainty estimation due to the assumed distributions of the Mie inputs (Tables 1 and 2), and due to the allometric coefficients. The Mie inputs are varied over wide ranges to accommodate various environments in the global ocean, with the goal of having a single first-principles-based operational algorithm applicable to first order globally. This increases the uncertainty estimates. The uncertainty for the fractional PSC products depends only on the uncertainties in ξ and b, thus they exhibit much lower internal algorithm uncertainty compared with absolute values. For pico-phytoplankton, the uncertainties are less than ≈ 2% for the oligotrophic gyres, and do not exceed ≈ 7% globally. This suggests that the fractional PSCs are more reliable products than the absolute values, and they can also be used with other products to partition them, e.g. total phytoplankton carbon estimated using the alternative methods shown in Fig. 9

The formulation of the PSD algorithm allows for both POC and Chl (Eq. 6) to be estimated from the retrieved PSD. Due to the assumptions used, POC is phyto C multiplied by three (Fig. 7C). This is strictly true only if the POC estimate uses the same limits of integration as phyto C, which is an approximation to the usual POC operational definition (e.g. see discussion of POC-PSD closure analysis in Kostadinov et al. (2016a)).
POC is thus estimated to first order, treating the retrieved NAP as being composed of POC only, and applying the same allometric relationships to it as to phyto C, in spite of the fact that the assumed RI distribution of the NAP is broader ( (2018)). This is a planned development of the model in the future; the goal here is to build an operational PSD/phyto C algorithm (based on first principles, as mechanistic as feasible) for use with multi-spectral satellite data of limited degrees of freedom. Hyperspectral sensors such as PACE (Werdell et al., 2019) should allow for some more degrees of freedom and thus for more independent particle components and their PSDs to be modeled separately. However, note that even hyperspectral data have a limited number of degrees of freedom, expected to be much smaller than the number of sensor bands (Lee et al., 2007; Cael et al., 2020). An important benefit of POC is that it is a widely observed variable, available for global validation efforts (Sec. 3.3), as is Chl. Similarly to POC, there are benefits of the PSD-derived estimate of Chl (Fig. 7D): it can be used as additional verification/validation of model retrievals, and/or PSD-retrieved Chl can be used as a parameter to optimize for in algorithm tuning, as discussed shortly (Sec. 3.3). Next, we discuss validation/verification and tuning efforts in which both PSD-derived POC and PSD-derived Chl are used.

Concentrations
In an initial validation effort, the novel PSD/phyto C algorithm is validated/verified using several variables. It is challenging to validate the major products of the algorithm (the PSD and size-partitioned phyto C) directly, globally and thoroughly, due to a paucity of globally-spanning in-situ observations, which are further reduced when performing satellite match-ups. Here, we validate or verify algorithm performance against compilations of the following variables: 1) in-situ PSD observations (Sect. 2.5); 2) in-situ POC observations; 3) in-situ pico-phytoplankton C observations; 4) concurrent satellite observations of Chl.
Maps of the locations of in-situ observations are shown in Supplement Fig. S5. In addition, we compare phyto C retrievals to several existing methods using the example May 2015 OC-CCI v5.0 image. Further, based on these results, we suggest an empirical tuning of the algorithm.

Validation results for the PSD slope ξ (Fig. 8A) indicate a statistically significant but noisy relationship between retrieved and observed slopes, with a positive bias for satellite retrievals and a regression slope substantially greater than unity. Most validation points are scattered in a cloud between 3.0 and 4.5 that does not exhibit much correlation, and there is a somewhat separate cluster centered about a satellite-retrieved slope of 5.25 with smaller corresponding in-situ values of about 4.0. There is generally a clear tendency for points from more oligotrophic areas (as indicated by Chl color coding) to exhibit higher satellite values, and for more eutrophic areas to exhibit much lower satellite values. This tendency is weaker for the in-situ observations, which tend to have a narrower range, mostly between 3.0 and 4.5. To first order, the satellite data are in the same range as the in-situ data and the retrievals capture the in-situ data trend; however, the range of PSD slopes is larger in the satellite data than in the in-situ match-ups, with the algorithm underestimating low values and overestimating high values. Validation for the $N_0$ parameter (Fig. 8B) is statistically significant (with a somewhat higher $R^2$ than the ξ regression) but also quite noisy. Strong clustering of the in-situ observations around $10^{15.5}$ to $10^{16.0}$ m$^{-4}$ is observed, and the majority of these observations are somewhat overestimated in the satellite retrievals, which cluster around $10^{16.25}$. Notably, validation points with much lower satellite values form a separate cluster associated with lower Chl and are underestimated instead; the in-situ values are also lower than in the main cluster of points just discussed, but less so.
Since $N_0$ is the PSD scaling parameter which generally controls the variability of absolute number, volume and carbon concentrations to first order, this has implications for the global pattern of phytoplankton carbon retrievals (Fig. 4); namely, it is consistent with underestimation in the oligotrophic gyres and overestimation in the eutrophic areas. Overall, both satellite and in-situ data exhibit increasing values of $N_0$ with increasing Chl concentrations, as expected, i.e. more oligotrophic waters are associated with smaller overall particle number concentrations. Further discussion of the PSD validation by location of in-situ data (Fig. S5A and S5B) is provided in Supplement Sect. S4 and illustrated in Fig. S6.
This pattern of under- and overestimation in the $N_0$ validation drives the slope of the validation regression to be much greater than unity, and suggests an empirical tuning of absolute phytoplankton carbon estimates via a linear (in log10 space) tuning of $N_0$, as done in TK16 (Kostadinov et al., 2016a), who based the tuning on the validation regression. A similar approach is proposed here, but it is derived differently. Details of the tuning derivation procedure are given in Supplement Sec. S5. The following global tuning equation was obtained:

$$N_{0,\mathrm{tuned}} = 10^{\,0.3859\,\log_{10}(N_0) + 9.5531} \qquad (7)$$

where $N_0$ is the original (un-tuned) PSD parameter. This tuning changes $N_0$ retrievals in a similar fashion to the TK16 tuning and is consistent with a tuning suggested by the $N_0$ in-situ validation presented here (Fig. 8B); namely, low satellite $N_0$ values are increased and high $N_0$ values are decreased, which reduces the overall range of variability of retrieved $N_0$ and thus the range of the derived variables as well. This addresses the low bias in oligotrophic gyres and the high bias in eutrophic areas.
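As a minimal sketch of how Eq. 7 acts on a retrieval, the Python snippet below applies the tuning to a single $N_0$ value, assumed to be in m$^{-4}$ as in the validation discussion above. It also computes the corresponding linear-space multiplicative factor and the crossover value at which the tuning leaves $N_0$ unchanged. The function names are hypothetical; only the two coefficients come from Eq. 7.

```python
import math

# Coefficients of the log10-linear tuning in Eq. 7.
TUNING_SLOPE = 0.3859
TUNING_INTERCEPT = 9.5531


def tune_n0(n0):
    """Apply Eq. 7 to an un-tuned N0 (must be positive)."""
    if n0 <= 0:
        raise ValueError("N0 must be positive")
    return 10.0 ** (TUNING_SLOPE * math.log10(n0) + TUNING_INTERCEPT)


def tuning_factor(n0):
    """Linear-space multiplicative factor N0_tuned / N0."""
    return tune_n0(n0) / n0


def crossover_log10_n0():
    """log10(N0) at which tuned and un-tuned values coincide.

    Solves x = slope * x + intercept for x.
    """
    return TUNING_INTERCEPT / (1.0 - TUNING_SLOPE)
```

The crossover lies near log10($N_0$) ≈ 15.56, so retrievals below that value are increased and retrievals above it are decreased, which is the range compression described above.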
The goal of the tuning is to obtain more realistic absolute retrievals of POC and Chl (hypothesizing that this should also lead to more realistic phyto C retrievals; however, see the discussion of the pico-phytoplankton C validation in Sect. 3.3.2).
The latitude oceans exhibit correction factors mostly less than unity in linear space (mostly between 0.1 and 1), which decreases phyto C and Chl up to an order of magnitude (rare, mostly less). This tuning is not applied to figures previously discussed here.

Comparison of the PSD-based phytoplankton carbon retrieval with existing satellite algorithms
In this section, we compare the PSD-based phyto C retrievals presented here with two existing methods for its retrieval. The May 2015 original total phyto C retrieval is compared with the tuned total phyto C, with the retrievals of the absorption- and PSD-based algorithm of Roy et al. (2017), and with the Graff et al. (2015) algorithm in Supplement Fig. S8. The histograms of these four images are compared in Fig. 9. The tuned retrievals are similar to those of Graff et al. (2015), whereas the original retrievals are similar to those of Roy et al. (2017), and the latter two have exaggerated ranges globally compared to the former two. Of these algorithms, the simplest is that of Graff et al. (2015), as it is a direct scaling of $b_{bp}$ and is based on in-situ chemical analytical measurements of phyto C (Graff et al., 2012, 2015). These dichotomous inter-comparison results suggest that further algorithm inter-comparison and validation with direct in-situ measurements of phyto C are needed to guide future algorithm developments; however, these data are relatively novel and scarce globally. Validation results using in-situ POC and pico-phytoplankton carbon (discussed next) exhibit a similar dichotomy.

PSD-based estimates of POC are validated against in-situ measurements for the original algorithm (Fig. 10A) and the tuned algorithm (Fig. 10C). Both regressions have satisfactory $R^2$ values, and also illustrate that in general higher POC values are associated with higher Chl (colormap). Notably, the original algorithm validation has a slope of ≈ 2 and exhibits substantial underestimates at low POC and overestimates at high POC. As intended, the tuning corrects this range exaggeration and significantly improves the slope, intercept, bias, RMS, and MAE. The regression with the $N_0$ tuning applied should not be considered a truly independent validation, because the algorithm has been empirically tuned to retrieve POC well; however, the tuning was done with global POC imagery (using monthly images for 2004 and 2015) that uses the Stramski et al. (2008) empirical POC algorithm, not with these in-situ POC data directly.
In addition to the validation with in-situ POC, we performed a comparison of the matched satellite Chl and the corresponding PSD-based Chl estimate (Eq. 6), for the original (Fig. 10B) and the tuned algorithm (Fig. 10D). Both comparisons exhibit very high $R^2$ values and, similarly to POC, the original algorithm underestimated Chl at low values and overestimated it at high values. The tuning successfully addresses this, leading to an excellent overall comparison for the tuned algorithm, with a slope near 1.0 and a low intercept. However, for the lowest Chl values (Chl < 0.1 mg m$^{-3}$), performance deteriorates. The tuned comparison is not a fully independent validation, as the algorithm was tuned to compare well with OC-CCI v5.0 satellite retrievals (using global monthly images for 2004 and 2015). Overall, the comparison with Chl is encouraging, indicating that the model is able to reasonably reproduce (with tuning) OC-CCI v5.0 standard satellite Chl values at the match-up points.
Validation against in-situ pico-phytoplankton carbon is presented in Fig. 11A (with no $N_0$ tuning applied) and in Fig. 11C (with the tuning applied). The corresponding Chl comparisons between matched standard OC-CCI v5.0 Chl and Chl derived via the PSD model are shown in Fig. 11B and D. As with the POC match-ups (Fig. 10B and D), comparisons with Chl are better for the tuned version of the algorithm, indicating that the tuning is needed to reproduce more realistic Chl values globally. However, the tuning does not lead to any improvement in the validation results of pico-phytoplankton C (cf. Fig. 11A and C). The validation regression without tuning is statistically significant (p < 0.05), albeit noisy (low $R^2$ = 0.18); satellite retrievals and in-situ data cover approximately the same ranges, and increasing Chl and in-situ pico-phytoplankton C generally correspond to increasing satellite values as well, with some tendency for under- and over-estimation as with the other variables.
However, the tuned satellite retrievals have a very narrow range that does not cover the range of the in-situ data, and validation

Further Discussion, Summary, and Conclusions
The novel PSD/phyto C algorithm described here represents a major overhaul of the KSM09 algorithm (Kostadinov et al., 2009); a comparison between KSM09 and the present algorithm is briefly discussed in Supplement Sect. S6. Unlike KSM09, two distinct particle populations are used: phytoplankton and NAP. Phytoplankton backscattering is modeled using coated-sphere Mie calculations with inputs based on the Equivalent Algal Populations (EAP) approach (Bernard et al., 2009; Robertson Lain and Bernard, 2018). This model formulation allows assessment of the percent contribution of phytoplankton and NAP to total $b_{bp}$, and allows Chl to be estimated from the retrieved PSD. The underlying $b_{bp}$ forward modeling is hyperspectral, facilitating adaptation of the algorithm to upcoming hyperspectral sensors like PACE (Werdell et al., 2019). PSD retrieval is achieved via spectral angle mapping (SAM), and no spectral shape is imposed on $b_{bp}$; operational end-members for current and past multi-spectral sensors and the OC-CCI v5.0 merged ocean color data set are created via band-averaging from the underlying hyperspectral modeled $b_{bp}$.
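The SAM step and the band-averaging of end-members can be sketched in a few lines of Python. This is only an illustration of the generic technique: each end-member is keyed by its PSD slope ξ, the observed $b_{bp}$ spectrum is compared to every end-member by the angle between the two spectra treated as vectors, and the smallest angle wins. The unweighted rectangular band average and all function names are assumptions, not the operational implementation.

```python
import math


def spectral_angle(spectrum_a, spectrum_b):
    """Angle (radians) between two spectra treated as vectors.

    The angle is insensitive to overall magnitude, so it compares
    spectral shape only.
    """
    dot = sum(x * y for x, y in zip(spectrum_a, spectrum_b))
    norm_a = math.sqrt(sum(x * x for x in spectrum_a))
    norm_b = math.sqrt(sum(y * y for y in spectrum_b))
    cosine = dot / (norm_a * norm_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


def band_average(wavelengths, spectrum, band_lo, band_hi):
    """Unweighted mean of a hyperspectral spectrum within [band_lo, band_hi].

    A real sensor would use its spectral response function; the
    rectangular band is a simplifying assumption.
    """
    values = [s for w, s in zip(wavelengths, spectrum) if band_lo <= w <= band_hi]
    if not values:
        raise ValueError("no samples fall inside the band")
    return sum(values) / len(values)


def best_end_member(observed, end_members):
    """Return the key (here, a PSD slope) of the closest end-member."""
    return min(end_members, key=lambda key: spectral_angle(observed, end_members[key]))
```

Because the spectral angle ignores magnitude, this step constrains only the PSD slope; the scaling parameter is then obtained from the magnitude of the observed backscattering.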
The algorithm has been used to create an accompanying data set based on the OC-CCI v5.0 data set (Kostadinov et al. (2022), see Sec. 4). We emphasize that the PSD parameters and derived retrievals presented here and in the accompanying data set are an experimental, research satellite product with relatively large uncertainties. We do not claim that it is akin in validity and accuracy to the more established (and much more empirical!) algorithms for canonical products such as Chl and POC. As emphasized elsewhere in this text, the goal is to build an operational algorithm based on first principles as much as feasible, even at the expense of accuracy, in order to push the boundaries of what is retrievable from space and move the science of bio-optical algorithm development forward. Potential users of these PSD and derived data need to be aware of their limitations, uncertainties and validation status before using them, for example, in building or validating/constraining biogeochemical models.

The choice of IOP algorithm to retrieve $b_{bp}(\lambda)$ is key for the PSD/phyto C algorithm, as the spectral shape of $b_{bp}$ is what the PSD slope retrieval is based upon (Eq. 4). The Loisel and Stramski (2000) IOP algorithm is chosen here, as in KSM09, because it allows spectral $b_{bp}$ retrievals that are not constrained by a specific spectral function or parameterization of $b_{bp}$, as is done, for example, in QAA and GSM (Maritorena et al., 2002). For the wavelengths used in the PSD slope retrieval, modeled and satellite-derived $b_{bp}$ spectral shapes compare well when the Loisel and Stramski (2000) algorithm is used, and global patterns of the retrieved PSD parameters appear reasonable. Preliminary tests with Loisel et al. (2018) indicate that this algorithm is not as suitable for PSD retrieval in this regard. Use of Loisel et al. (2018), Jorge et al. (2021) and other IOP algorithms will be further investigated in future development of the PSD algorithm.
An important assumption of the model is that $N_0$ for NAP is twice that for phytoplankton, so that the phyto C to POC ratio is a constant 1:3. This ratio is expected to vary in the real ocean, and the value used here is a reasonable average choice (e.g.  S8). Further direct analytical observations of phyto C and the reconciliation and better understanding of the spatio-temporal variability of the phyto C to POC ratio should be a high priority in order to improve understanding of carbon pools and their relationships in the ocean (Brewin et al., 2021) and retrieve phyto C reliably from space.
The relatively poor PSD parameter validation results should be interpreted with caution, as there are multiple reasons for discrepancies between the in-situ and satellite data and for the observed poor regression statistics, and the in-situ data have their own limitations. Importantly, the in-situ PSD parameters are fit over a much narrower diameter range than the size range optically contributing the bulk of $b_{bp}$ (e.g. see Supplement Fig. S4), at least according to the modeled spectra. It is recognized that in the real world the particle assemblage is very complex and its sources of backscattering are still not fully resolved (e.g. Stramski et al. (2004)). In particular, the composition and PSD of small sub-micron particles appear to be of importance and are not well known; here we assume the same PSD and NAP composition across all size classes and globally. There is also a mismatch in temporal and spatial scales of sampling between the satellite and in-situ data. For example, the matched in-situ PSD data do not exhibit the same negative correlation between ξ and $N_0$ that the satellite data do (Supplement Fig. S9). We note that this negative correlation in the satellite data has a theoretical underpinning in what we know about global ocean ecosystems, namely that oligotrophic areas exhibit relative dominance of smaller phytoplankton (and smaller overall concentrations of particles/biomass), as opposed to increased importance of larger phytoplankton and increased biomass in more eutrophic areas. We thus expect backscattering in the ocean to become "bluer", i.e. to have a steeper spectral slope, in oligotrophic areas. This is indeed observed in satellite data (e.g. Loisel et al. (2006)) and is the basis for our algorithm.
Therefore, we expect, in the ocean, globally and on average, $N_0$ to decrease with increasing ξ. This is not necessarily going to be captured by in-situ data of limited spatio-temporal coverage and fit over a narrower size range.

We further note that the number of matched-up sample points in the validation regression differs among PSD, POC and pico-phytoplankton C, and their geographic distribution differs as well (Supplement Fig. S5). Namely, there are about an order of magnitude more POC match-ups than pico-phytoplankton carbon ones. Thus the different validation results presented here do not necessarily represent the same oceanographic conditions, e.g. the pico-phytoplankton C in-situ data have less representation of eutrophic areas and span a smaller range of Chl than the POC validation, with very few points exceeding Chl = 1.0 mg m$^{-3}$ (cf. Fig. 10B and 11B). The pico-phytoplankton C data in Martínez-Vicente et al. (2017) are derived from cell counts (abundance) converted to carbon using specific conversion factors for different species/groups. Namely, 60 fg C per cell was used for Prochlorococcus, 154 fg cell$^{-1}$ for Synechococcus, and 1319 fg cell$^{-1}$ for pico-eukaryotes. This differs from the PSD-based phyto C retrieval algorithm, in which the conversion is a function of cell volume and is continuous. with increasing Chl associated with decreasing PSD slope, and increasing $N_0$, phyto C and POC. While the relationship is strong, there is significant spread of the PSD parameters and phyto C data for a given Chl value, suggesting that there is added value in retrieving them separately, and that they should not all be treated as simply correlates of Chl.
We note that there is a need for further investigation to avoid uniqueness of retrieval issues and degrees of freedom/independence issues, as well as for more comprehensive and complete error propagation, since a lot of ecosystem properties are indeed correlated with Chl, and all these retrievals come from the same multispectral data.
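The per-cell conversion quoted above for the in-situ pico-phytoplankton C data can be written out as a short Python sketch. The three conversion factors are the ones stated in the text; the function name, the group keys, the input unit (cells per mL) and the output unit (mg C m$^{-3}$) are assumptions made for illustration.

```python
# Carbon per cell in femtograms, as quoted in the text for the in-situ data.
FG_C_PER_CELL = {
    "prochlorococcus": 60.0,
    "synechococcus": 154.0,
    "pico_eukaryotes": 1319.0,
}


def pico_carbon_mg_per_m3(cells_per_ml):
    """Convert group abundances (cells per mL) to pico-phytoplankton C (mg m^-3).

    Each group uses a fixed carbon quota, unlike the PSD-based retrieval,
    where the conversion is a continuous function of cell volume.
    An unknown group name raises KeyError.
    """
    total_fg_per_ml = sum(
        FG_C_PER_CELL[group] * abundance for group, abundance in cells_per_ml.items()
    )
    # 1 fg = 1e-12 mg and 1 m^3 = 1e6 mL, so fg per mL times 1e-6 gives mg per m^3.
    return total_fg_per_ml * 1e-6
```

For instance, hypothetical abundances of 1e5, 1e4 and 1e3 cells per mL for the three groups give a total of about 8.9 mg C m$^{-3}$, with Prochlorococcus contributing most of it despite having the smallest quota.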
The power law (Eq. 1) is a parameterization of real-world PSDs, and while there are theoretical underpinnings (e.g. West  (2021)). There is less information on living phytoplankton only, and their specific PSDs, because it has been historically difficult to separate living phytoplankton and measure, say, their PSD or carbon (e.g. Graff et al., 2012, 2015). A recent study by Haëntjens et al. (2022) investigates phytoplankton-specific PSDs; their observations support the conclusion that the phytoplankton-specific PSD shape is consistent with a power law to first order. We note that phytoplankton share their size domain with other organisms (bacteria at the low end and zooplankton at the high end) and that a drop-off in the phytoplankton PSD is expected at the limits of the size range of autotrophs (e.g. see Hatton et al. (2021)); hence a phytoplankton-specific power law will have upper and lower range limits of applicability, and it is not expected to apply equally well over the same size range everywhere and always in the global ocean. Hatton et al. (2021) offer an assessment of the PSD of marine life over a huge range of sizes (body mass), demonstrating that a specific power law applies, in the context of the Sheldon et al. (1972) hypothesis that equal biomass tends to occur in each logarithmically-spaced size bin; their work offers support for use of the power law for modeling phytoplankton (over their size range) globally.
The power-law is not a converging PSD model, i.e. it is sensitive to the chosen limits of integration (for a sensitivity analysis to the integration limits, see Kostadinov et al. (2016a)). Gamma functions may be a better choice to represent marine PSDs (Risović, 1993; Risović, 2002). However, we choose to use the power-law because of its theoretical underpinnings and because the goal is to build an operational algorithm (based on first principles as much as possible) for existing multispectral data with limited degrees of freedom. We additionally assume that the PSD slope for both phytoplankton and NAP is the same, limiting the number of parameters to be retrieved. Hyperspectral data and observations of phytoplankton- and NAP-specific PSDs and IOPs will be needed to relax these assumptions in the future. Organelli et al. (2020) observed that the PSD slope steepened for small particles, deviating from a power-law. This steepening could partially explain the putative under-estimates of the original algorithm in oligotrophic gyres. Moreover, the absolute number of particles retrieved is sensitive to uncertainties in the real index of refraction assumed. In this context, we note that the algorithm is able to pick up the concentration of particles, to first order, according to the $N_0$ validation (Fig. 8B). We find this to be impressive and consider it a success, given that the algorithm makes no a-priori prescriptions about particle concentrations; they are solved for from the magnitude and shape of satellite-observed $b_{bp}$. While the goal here is to create a global algorithm which uses one set of end-members, we recognize that future implementations can be improved by assessing the impact of using regionally variable subsets of index of refraction distributions.
The PSD parameterization and choices of Mie inputs, in particular complex indices of refraction, represent important sources of uncertainty and can also affect the need for tuning and the degree of suitability of estimating POC with our generic NAP population. Further analysis of algorithm performance and further improvements need to focus on the index of refraction choices for the particle populations. For further discussion of algorithm uncertainties, see Kostadinov et al. (2009), Kostadinov et al. (2010) and Kostadinov et al. (2016a).

Graff et al. (2015) observe a relationship between phyto C and $b_{bp}$ that is stronger than that for other proxies. This is encouraging for the use of backscattering as a proxy for phytoplankton carbon biomass. However, the link between the PSD and $b_{bp}$ spectral shape is a second-order effect that is not easily observed in in-situ observations (Kostadinov et al., 2009; Slade and Boss, 2015; Boss et al., 2018; Organelli et al., 2020), even though theoretical modeling demonstrates a clear link (Kostadinov et al. (2009); this study). Kostadinov et al. (2012) discuss some reasons why it may be difficult to observe this relationship in current in-situ data, e.g. the fact that the PSD is fit over a narrow range of diameters compared to the size range theoretically affecting $b_{bp}$. Nevertheless, these considerations and the overall performance of the KSM09 homogeneous algorithm as compared to the algorithm presented here lead to the conclusion that there are four primary directions that should be priorities for moving forward. First, investigate the effect of choices of index of refraction distributions, as discussed above.
Second, rather than relying only on $b_{bp}$ for PSD and phyto C retrieval, a blended approach should be developed that also uses absorption, i.e. combines the approach here with that of Roy et al. (2017). Third, investigate the ability of hyperspectral data to provide more degrees of freedom for retrieval of more variables simultaneously, allowing relaxation of some key assumptions and perhaps a third particle population to represent POC and mineral particles separately; this is important in light of the upcoming PACE mission (Werdell et al., 2019). Hyperspectral absorption data in particular have the potential to increase information content and allow group-specific retrievals (e.g. Kramer et al. (2022), but see also Cael et al. (2020)). Fourth and finally, collect more global, comprehensive in-situ data sets of all relevant variables, including and especially of phyto C (Graff et al. (2015)), for further model development and validation. With regard to the latter, agencies and investigators should focus on building quality controlled, one-stop-shop data sets.

Additional Information
Code and data availability. Code and data associated with algorithm development, as well as operational application to OC-CCI v5.0 data, are published on the Zenodo® repository and are available at the following URL: https://doi.org/10.5281/zenodo.6354654.
An OC-CCI v5.0-based satellite PSD/phyto C data set (monthly, 1997-2020, plus monthly and overall climatologies) has been published on the PANGAEA® repository and is freely available in netCDF format, with browse images, at the following URL: https://doi.org/10.1594/PANGAEA.939863

Appendix A: Details on the OC-CCI v5.0 Dataset
Processing and analysis were done using the sinusoidal projection of OC-CCI v5.0. For user convenience, once the final products were generated, they were re-projected to equidistant cylindrical projection (unprojected latitude/longitude) before publication to the data repository linked above (Sec. 4). The empirical tuning (Sec. 3.3) is not applied to the variables in the published data set (Sec. 4). Instead, the spatially-explicit linear-space multiplicative tuning factor (Supplement Fig. S7B) is given. The choice to provide an optional tuning to be applied at the user's discretion is dictated by the validation and comparison results discussed in the manuscript.
Author contributions. TSK designed the study, conducted the modeling and algorithm development and data analyses, and wrote the manuscript. SB and LRL provided the EAP model code and technical support for EAP modeling. SM helped with error propagation estimates and designed the band shifting methodology. CEK, BJ, VMV and SS extracted match ups and/or provided match up in-situ and satellite data set compilations. XZ provided the coated spheres code and technical support for it, as well as validation PSD data. HL and DSFJ provided technical assistance with IOP code testing. EK tested backscattering spectral shape sensitivity. SR provided Roy et al. (2017) algorithm output data. SB, LRL, XZ, SM, EK, SR, CEK, BJ, SS, HL, and DSFJ read the manuscript and provided comments/edits.
Competing interests. The authors declare no competing interests.
Disclaimer. The views and opinions expressed here are those of the authors and do not necessarily express those of NASA.
help with it. We acknowledge all in-situ data contributors to the BICEP/POCO projects compilations of POC and pico-phytoplankton carbon data sets. The OC-CCI reference is Sathyendranath et al. (2019) and the v5.0 specific reference is Sathyendranath et al. (2021). The modeling and processing is done using the sinusoidal projection (one of the projections provided by OC-CCI), whereas maps here are presented in equidistant cylindrical projection (unprojected lat/lon). Erik Fields, ESA, BEAM (Brockmann Consult GmbH) and NASA are acknowledged for the re-projection algorithm. Coastlines in maps shown here are from v2.3.7 of the GSHHS data set -see Wessel and Smith (1996).

The NOAA ETOPO1 data set (https://www.ngdc.noaa.gov/mgg/global/) was used in validation for bathymetry masking. Modeling and data processing was done in MATLAB®. The cividis colormap used in most visualizations is a variant of the viridis colormap optimized for color vision deficiency perception and is from Nuñez et al. (2018); the function to implement it in MATLAB® is due to Ed Hawkins.
We acknowledge Emmanuel Boss and two anonymous reviewers for their very useful comments which helped improve the manuscript.