Technical note: Stability of tris pH buffer in artificial seawater stored in bags

Equimolal tris (2-amino-2-hydroxymethylpropane-1,3-diol) buffer in artificial seawater is a well characterized and commonly used standard for oceanographic pH measurements. We evaluated the stability of tris pH when stored in purportedly gas-impermeable bags across a variety of experimental conditions, including bag type and storage in air vs. seawater over 300 d. Bench-top spectrophotometric pH analysis revealed that the pH of tris stored in bags decreased at a rate of 0.0058 ± 0.0011 yr−1 (mean slope ± 95 % confidence interval of slope). The upper and lower bounds of expected pH change at t = 365 d, calculated using the averages and confidence intervals of slope and intercept of measured pH change vs. time data, were −0.0042 and −0.0076 from initial pH. Analyses of total dissolved inorganic carbon confirmed that a combination of CO2 infiltration and/or microbial respiration led to the observed decrease in pH. Eliminating the change in pH of bagged tris remains a goal, yet the rate of pH change is lower than many processes of interest and demonstrates the potential of bagged tris for sensor calibration and validation of autonomous in situ pH measurements.


Introduction
Ocean pH is a key measurement used for tracking biogeochemical processes such as photosynthesis, respiration, and calcification and represents perhaps the most recognized variable associated with ocean acidification (OA), the decrease in ocean pH due to the uptake of anthropogenic carbon dioxide. OA progresses with a global average pH decline of 0.002 per year in the surface open ocean (Bates et al., 2014), and the accumulated and projected near-term changes have been shown to be deleterious to many calcifying organisms (Cooley and Doney, 2009). Beyond the narrow scope of calcifiers, organismal response is complex and varies across processes such as reproduction, growth rate, and sensory perception. Organismal responses are further complicated by their impact on ecosystem-level dynamics, such as altering competition and predator-prey relationships (Doney et al., 2020). Furthermore, pH effects are often exacerbated by concomitant stressors, such as decreased dissolved oxygen or increased temperature. Ultimately, OA will affect humans through impacts on fisheries, aquaculture, and shoreline protection (Branch et al., 2013; Doney et al., 2020).
The quality of pH measurement required to observe various phenomena is often broken into "climate" and "weather" levels of uncertainty (Newton et al., 2015), corresponding to 0.003 and 0.02, respectively. Discrete sampling has been shown to be capable of meeting the climate level of uncertainty when best practices are followed, yet many labs do not consistently meet this standard (Bockmon and Dickson, 2015). Furthermore, while discrete, bench-top methodologies can be the most accurate, the ocean's vast size limits the oceanographic community's ability to make ship-based discrete pH measurements to decadal reoccupations of a few major sections per ocean basin (Sloyan et al., 2019). The sparsity of ship-board measurements hinders our ability to assess sub-decadal processes, such as seasonal cycles or bloom events, over much of the ocean (Karl, 2010) and highlights the need for autonomous, high-frequency pH measurements. Technological advancements have led to more routine autonomous pH measurements over the past decade, providing opportunities to fill some gaps in time and space in discrete sampling programs (e.g., Byrne, 2014; Martz et al., 2015; Lai et al., 2018; Wang et al., 2019; Tilbrook et al., 2019). Globally, pH sensors now operate on hundreds of autonomous platforms including moorings and profiling floats, delivering unique data sets in the form of Eulerian and depth-resolved Lagrangian time series (Bushinsky et al., 2019; Sutton et al., 2019). While sensors increase data coverage, many sensor-based pH measurements, particularly on moored systems, continue to fall short of both climate and weather levels of uncertainty, as highlighted in the intercomparison tests carried out by the Alliance for Coastal Technologies (ACT, 2012) and by the Wendy Schmidt Ocean Health XPRIZE (Okazaki et al., 2017).
Independent validation is typically required for autonomous sensors to meet both weather and climate levels of uncertainty. For example, autonomous underway pCO2 systems (Pierrot et al., 2009), moorings (Bushinsky et al., 2019), and autonomous surface vehicles (Chavez et al., 2017; Sabine et al., 2020) are able to provide climate quality observations with an uncertainty of ±2 µatm because traceable standard gases are frequently measured in situ. For pH measurements on profiling floats, sensor performance is validated by comparing to a deep reference pH field that is calculated using empirical algorithms (Williams et al., 2016; Bittig et al., 2018; Carter et al., 2018). This approach has demonstrated the ability to obtain high-quality pH measurements from a network of profiling floats but requires measurements in the deep ocean where pH is comparatively stable. It is atypical for other pH sensors, including coastal moored sensors, to have an automated or remote validation. Therefore, on such deployments, validation has largely relied on discrete samples taken alongside the sensor (Bresnahan et al., 2014; McLaughlin et al., 2017; Takeshita et al., 2018), which presents unique challenges, primarily that spatiotemporal discrepancy can lead to errors of > 0.1, especially in highly dynamic systems (Bresnahan et al., 2014).
Similar to the method in use by pCO2 systems, one approach to validate in situ pH sensors is to measure a reference material, or pH standard, repeatedly during a sensor deployment. The most commonly used standard for oceanographic pH measurement is an equimolal tris (2-amino-2-hydroxymethyl-propane-1,3-diol) buffer in artificial seawater (ASW), hereafter referred to as tris or tris-ASW (DelValls and Dickson, 1998; Papadimitriou et al., 2016). The pH of tris has been characterized over a range of temperature, salinity, and pressure (DelValls and Dickson, 1998; Rodriguez et al., 2015; Takeshita et al., 2017), allowing for accurate calculation of tris pH across a wide range of marine conditions. Furthermore, when stored in borosilicate bottles and under ideal conditions, these buffers have been shown to be stable to better than 0.0005 over a year (Dickson, 1993; Nemzer and Dickson, 2005), making tris a good candidate for in situ validation of long-term deployments of autonomous pH sensors. To be utilized for in situ applications, the reference solution must be stored in bags (as in Hales et al., 2005; Seidel et al., 2008; Sayles and Eck, 2009; Spaulding et al., 2014; Wang et al., 2015; Lai et al., 2018). Recently, in situ sensor validation using bagged tris was demonstrated by Lai et al. (2018) during a 150 d deployment of an autonomous pH sensor, where the tris standard was measured in situ every 5 d. However, the stability of tris when stored in bags has not been quantified systematically using the spectrophotometric bench-top pH measurement techniques recommended as best practice (Dickson et al., 2007).
In this work we quantified the stability of tris stored in bags for 300 d. Tris from four separately prepared batches was stored in two bag types either in a lab or submerged in seawater. In addition, one batch was stored in borosilicate bottles in the lab as a control. Spectrophotometric pH measurements were made approximately every 2 months on each bag of tris. Throughout the experiment, certified reference materials (CRMs) for oceanic CO2 measurements (Dickson, 2001) were used to assess the stability of the spectrophotometric pH system.

Methods
Two bag types were tested for storing tris (Fig. 1). Bag type 1 was custom made based on a design used in the "Burke-o-Lator" system (Hales et al., 2005; Bandstra et al., 2006), made from PAKDRY 7500 barrier film (IMPAK P75C0919). The barrier film is made of layers of polyester and nylon with a sealant layer of metallocene polyethylene. Two 23 cm × 48 cm (9 in × 19 in) sheets were heat sealed on three sides, forming a pocket, and a 1.9 cm (3/4 in) diameter hole was cut into one of the pocket walls for the bulkhead fitting and bulkhead nut (McMaster-Carr 8674T55). The bulkhead was sealed into the wall with a silicone gasket (McMaster-Carr 9010K13) and washer (McMaster-Carr 95649A256) and coated with silicone sealant (McMaster-Carr 74955A53). A "push-to-connect" ball valve fitting (McMaster-Carr 4379K41) was attached to the bulkhead. The modified pocket was rinsed, dried, and heat sealed along the final edge to create a ∼ 4 L bag. Bags were left to dry for at least 24 h before filling. Bag type 2 was a commercially available 3 L Cali-5-Bond bag purchased from Calibrated Instruments and used without modification. It is a multi-layer bag made of plastic, aluminum foil (to prevent liquid and gas permeation), and a layer of inert high-density polyethylene (to form a non-reactive inner wall), fitted with a polycarbonate stopcock Luer valve.
Figure 1. A picture of bag types 1 and 2 used to store tris in this study.

In this experiment, four batches of tris were prepared following the procedure in DelValls and Dickson (1998), using off-the-shelf reagents with no additional standardization or purification (e.g., recrystallization of salts). The focus of this paper is stability of bagged tris over time and does not prioritize obtaining highly accurate equimolal tris (as would be necessary for characterization of thermodynamic constants, for example). The calculated pH of tris in this study was 8.2652 at 20 °C, based on quantity of reagents used. This is 0.0135 higher than the pH of equimolal tris, 8.2517 at 20 °C (DelValls and Dickson, 1998). The pH discrepancy was due to a unit error in the measurement of HCl (our preparation used mol L−1 rather than the prescribed mol kg-sol−1). This unit error resulted in a tris : trisH+ ratio of 1 : 0.97 that slightly differs from the 1 : 1 of truly equimolal tris. As this ratio is nearly equimolal, the term "equimolal" will continue to be used throughout this study. The details of the specific reagents used to prepare the tris solution can be found in Table A1.
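As a rough consistency check on the numbers above, the pH offset implied by a given tris : trisH+ ratio can be estimated with a Henderson-Hasselbalch-type relation. This is only a sketch: it assumes that the pK of trisH+ in this medium equals the pH of equimolal tris and that the small composition change does not alter activity coefficients. The function name is illustrative and not part of the original procedure.

```python
import math

# pH of equimolal tris-ASW at 20 °C, as quoted in the text (DelValls and Dickson, 1998).
PH_EQUIMOLAL_20C = 8.2517


def ph_from_ratio(tris_over_trish, pk=PH_EQUIMOLAL_20C):
    """Estimate buffer pH from the [tris]/[trisH+] ratio.

    Assumption: pK is taken equal to the equimolal tris pH.
    """
    return pk + math.log10(tris_over_trish)


# Rounded ratio from the text: tris : trisH+ = 1 : 0.97
offset = ph_from_ratio(1 / 0.97) - PH_EQUIMOLAL_20C
print(f"pH offset for a 1 : 0.97 ratio: {offset:+.4f}")
```

With the rounded ratio this estimate gives an offset of about 0.013, close to the 0.0135 difference computed from the reagent quantities; the small remainder presumably reflects rounding of the ratio.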
Three stability tests were initiated at different times over the course of 18 months. The initiation of a given test is defined as the date of preparation of the tris used in that test. A summary of the differences between these tests is shown in Table 1 and described here. Each bag has a unique identifier in the format of "batch number, bag number, lab or tank." Where an identifier would otherwise be duplicated, the bags are differentiated with letters A to D. Each bag was rinsed before filling: 3 times with deionized water (DI), 5 times with ultrapure water (> 18 MΩ resistivity), and at least 3 times with 200 mL of tris. Tris bags were stored on a lab bench or in a 5000 L test tank filled with ozone-sterilized, filtered seawater. Bag type 2 experienced delamination of exterior layers when stored in seawater during test 2 and was not used in further testing. Tris from batch 4 was also stored in borosilicate bottles following the procedure in Nemzer and Dickson (2005).

In addition to pH measurements, dissolved inorganic carbon (C_T) was measured on both bagged and bottled tris during test 3 to see if changes in pH were due to increased CO2. C_T samples were measured using a custom-built system based on an infrared (IR) analyzer (LI-COR 7000) similar to systems used by O'Sullivan and Millero (1998) and Friederich et al. (2002). This IR measurement system is capable of measuring relatively low C_T without requiring method adjustment and has been used to make near-zero C_T measurements (Paulsen and Dickson, unpublished data). C_T measurements were made on CRMs (Batch 179 and 183). The precision of the C_T measurements was ±1.4 µmol kg−1 (pooled SD, n samples = 15, n measurements = 44).
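The pooled SD quoted above combines the scatter of replicate measurements across samples, weighting each sample by its degrees of freedom. A minimal sketch of that calculation follows; the replicate values are hypothetical and serve only to show the arithmetic, and the function name is not from the original work.

```python
import math
import statistics


def pooled_sd(groups):
    """Pooled standard deviation across groups of replicate measurements.

    Each group contributes (n_i - 1) * s_i**2; groups with fewer than two
    replicates carry no variance information and are skipped.
    """
    weighted_var = 0.0
    dof = 0
    for values in groups:
        if len(values) < 2:
            continue
        weighted_var += (len(values) - 1) * statistics.variance(values)
        dof += len(values) - 1
    if dof == 0:
        raise ValueError("need at least one group with two or more replicates")
    return math.sqrt(weighted_var / dof)


# Hypothetical replicate C_T values (µmol kg-1) for three samples.
replicates = [
    [2010.0, 2012.0, 2011.0],
    [1850.5, 1852.5],
    [2100.0, 2101.0, 2103.0],
]
print(f"pooled SD = {pooled_sd(replicates):.2f} µmol kg-1")
```

The same function applies to the triplicate pH measurements, where a single SD computed from all sets of triplicates is used for outlier screening.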
Tris pH was measured every 55 ± 20 d (mean ± SD of measurement interval) throughout the experiment. The pH of tris was measured in triplicate at each time point by spectrophotometry with m-cresol purple (mCP) as the indicator dye, using the system described in Carter et al. (2013). Absorbance measurements were made in a 10 cm jacketed cell, and the temperature was measured directly adjacent to the cell outflow using a NIST-traceable thermometer (±0.1 °C, QTI DTU6028P-001-SC). Blank and sample were held for 3 min in the jacketed flow cell prior to absorbance measurements.
On average, temperature was stable to within a 0.02 °C range over the course of the day; the mean temperature throughout the experiment was 20.09 ± 0.23 °C (1σ), although temperature was 0.6 °C higher than the average on one measurement day. Spectrophotometric pH measurements are reported at 20 °C by adjusting the measured pH value at the measured cell temperature T_C (pH_spec,T_C) to 20 °C (pH_spec,20°C) using the known temperature dependence of tris (pH_tris) as follows:

$$\mathrm{pH}_{\mathrm{spec},20\,^{\circ}\mathrm{C}} = \mathrm{pH}_{\mathrm{spec},T_C} - \left(\mathrm{pH}_{\mathrm{tris},T_C} - \mathrm{pH}_{\mathrm{tris},20\,^{\circ}\mathrm{C}}\right) \qquad (1)$$

where pH_tris,T_C and pH_tris,20°C are the theoretical pH of tris (at the measured temperature and 20 °C, respectively) and were calculated using Eq. (18) in DelValls and Dickson (1998). This adjustment assumes that any potential difference in ∂pH/∂T between that corresponding to equimolal tris and that corresponding to our 1 : 0.97 tris : trisH+ ratio has a negligible effect over the small temperature range observed.
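The temperature adjustment described above can be sketched in a few lines. The polynomial below is the expression for equimolal tris pH as a function of salinity and temperature as recalled from DelValls and Dickson (1998); the coefficients should be checked against the original paper before use. The default salinity of 35 and the function names are assumptions made for this illustration.

```python
import math


def ph_tris(temp_c, salinity=35.0):
    """Theoretical pH (total scale) of equimolal tris in artificial seawater.

    Coefficients as recalled from DelValls and Dickson (1998); verify against
    the source before relying on them.
    """
    t = temp_c + 273.15  # absolute temperature in K
    s = salinity
    return (
        (11911.08 - 18.2499 * s - 0.039336 * s**2) / t
        - 366.27059
        + 0.53993607 * s
        + 0.00016329 * s**2
        + (64.52243 - 0.084041 * s) * math.log(t)
        - 0.11149858 * t
    )


def adjust_to_20c(ph_spec_tc, temp_c, salinity=35.0):
    """Shift a measured pH from cell temperature temp_c to 20 °C.

    Subtracts the theoretical tris pH difference between temp_c and 20 °C.
    """
    return ph_spec_tc - (ph_tris(temp_c, salinity) - ph_tris(20.0, salinity))


# Hypothetical measurement: pH 8.2600 at a cell temperature of 20.09 °C.
print(f"pH at 20 °C: {adjust_to_20c(8.2600, 20.09):.4f}")
```

Because tris pH falls by roughly 0.03 per °C near 20 °C, a cell that is 0.09 °C warm shifts the reported value upward by a few thousandths of a pH unit, which is why the adjustment matters at the precision targeted here.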
To account for pH-dependent errors from impurities in unpurified mCP, a pH-dependent correction factor was determined based on the protocol outlined in Takeshita et al. (2021). Briefly, the pH of natural seawater with different ratios of added tris : trisH+ was measured sequentially using impure dye (pH_impure; from Aldrich, lot MKBH6858V) and purified dye (pH_pure; from Robert Byrne's laboratory, University of South Florida; Liu et al., 2011) over a pH range of 7.4 to 8.2 at intervals of approximately 0.2. Varying ratios of tris : trisH+ were used to obtain different solution pH and to buffer any changes in pH during the experiment, which negates the need for dye perturbation corrections in this characterization. Triplicate measurements were made at each pH. A second-order pH-dependent error was observed, as previously described, and was fit with a second-order equation (Eq. 2; R2 = 0.975, RMSE = 0.000434). All subsequent pH_spec measurements in this study were conducted with impure dye and are reported with this dye impurity correction (Eq. 2) applied. The correction adjusted the reported pH by 0.0093 ± 0.0002 (mean ± SD, n = 126). No dye perturbation correction (a correction for the change in pH caused by the addition of the dye) was used, as the high buffering capacity of tris, in combination with a dye adjusted to a pH similar to that of tris, results in a negligible change in measured pH.

Measurements of tris batches 1 and 2 made in the first 150 d were removed from the data set due to procedural changes made to the spectrophotometric pH system to correct for problems with temperature equilibration. Outliers were removed from the spectrophotometric pH measurements if the absorbance at 760 nm was above 0.005 or below −0.002 (indicative of a measurement problem, such as a bubble or lamp drift), resulting in the removal of 2 out of 163 measurements.
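The 760 nm absorbance screen described above amounts to a simple range check. The sketch below applies it to a few hypothetical measurements; the record layout and function name are assumptions made for illustration, and only the two thresholds come from the text.

```python
# Thresholds from the text: discard if A760 is above 0.005 or below -0.002.
A760_UPPER = 0.005
A760_LOWER = -0.002


def passes_baseline_screen(a760, lower=A760_LOWER, upper=A760_UPPER):
    """Return True if the 760 nm absorbance lies within the accepted range."""
    return lower <= a760 <= upper


# Hypothetical measurements (pH, absorbance at 760 nm).
measurements = [
    {"ph": 8.2630, "a760": 0.0004},
    {"ph": 8.2590, "a760": 0.0071},   # e.g. a bubble in the light path
    {"ph": 8.2628, "a760": -0.0011},
]

kept = [m for m in measurements if passes_baseline_screen(m["a760"])]
print(f"kept {len(kept)} of {len(measurements)} measurements")
```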
Additionally, outliers were removed from the data set if they were more than 3 SD from the mean of a measurement triplicate, where SD was calculated using all sets of triplicates (1 SD = 0.0004, n = 55), resulting in the removal of 2 of the 161 remaining measurements. The remaining 159 measurements were used for the analysis presented here. An analysis of variance (ANOVA) was used to test the dependence of the results on tris batch, bag/bottle type, and storage location. Analysis was performed using MATLAB R2020a and the standard function "anovan".

Throughout the experiment, CRMs (procured from Andrew Dickson, Scripps Institution of Oceanography) for seawater C_T and total alkalinity were measured regularly to verify instrument performance (Dickson, 2001). A time series of CRM measurements over the duration of the work described here showed no systematic drift (Fig. A1 in Appendix A).

To assess whether the change in pH was driven by the addition of CO2, the final pH and available C_T measurements were compared with a model described here. The theoretical change in tris-artificial seawater (ASW) pH due to an increase in C_T is straightforward to calculate, since both tris and CO2 acid-base equilibria are well characterized in seawater and ASW media. The pH is calculated for tris-ASW + C_T using an equilibrium model following the approach described in Sect. 2 of Dickson et al. (2007) for the case of known alkalinity and C_T. In the case of ASW, the seawater equilibrium constants for CO2 are appropriate because minor ions present in seawater and not ASW do not appreciably affect the CO2 equilibrium constants (particularly when the goal is to compute relative changes in pH), as the ionic background of ASW is closely matched to that of seawater at salinity = 35. In our model, minor acid-base species important to seawater alkalinity but not present in ASW (borate, phosphate, silicate, fluoride) are set to zero.
The definition of total alkalinity is modified to include the tris acid-base system following the definition of acid-base donor/acceptor criteria given by Dickson (1981): tris is assigned as a level-1 proton acceptor and trisH+ is at the zero level. Thus, in our model, tris_tot = 0.08 molal, alkalinity = 0.04 molal, and C_T is a variable. An algorithm (see Annexe 1 in Dickson et al., 2007) is then used to find the root of the alkalinity equation in its residual form by solving for pH.
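The equilibrium model described above can be sketched as a root-finding problem on the alkalinity residual. The version below is an illustration and not the authors' implementation: it uses bisection in place of the algorithm in Dickson et al. (2007), takes the pK of trisH+ equal to the equimolal tris pH at 20 °C, uses rounded approximate values for the CO2 and water dissociation constants at 20 °C and salinity 35 (assumptions that should be replaced by a proper parameterization), and treats all concentrations as mol kg−1 without distinguishing molal from per-kilogram-of-solution units.

```python
PK_TRIS = 8.2517   # assumed equal to equimolal tris pH at 20 °C (from the text)
PK1 = 5.89         # approximate first CO2 dissociation constant (assumed)
PK2 = 9.05         # approximate second CO2 dissociation constant (assumed)
PKW = 13.4         # approximate ion product of water (assumed)

K_TRIS, K1, K2, KW = (10.0 ** -pk for pk in (PK_TRIS, PK1, PK2, PKW))


def alkalinity_residual(ph, ct, tris_tot=0.08, alk=0.04):
    """Computed minus specified alkalinity at a trial pH (mol kg-1).

    Alkalinity = [tris] + [HCO3-] + 2[CO3 2-] + [OH-] - [H+]; borate,
    phosphate, silicate, and fluoride are set to zero as in the text.
    """
    h = 10.0 ** -ph
    tris_base = tris_tot * K_TRIS / (K_TRIS + h)
    denom = h * h + K1 * h + K1 * K2
    hco3 = ct * K1 * h / denom
    co3 = ct * K1 * K2 / denom
    return tris_base + hco3 + 2.0 * co3 + KW / h - h - alk


def solve_ph(ct, lo=6.0, hi=10.0, tol=1e-10):
    """Bisection on pH; the residual increases monotonically with pH."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if alkalinity_residual(mid, ct) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


ph_zero = solve_ph(0.0)
ph_300 = solve_ph(300e-6)  # 300 µmol kg-1 of added C_T
slope_per_100 = (ph_300 - ph_zero) / 3.0
print(f"pH change per 100 µmol kg-1 C_T: {slope_per_100:.4f}")
```

Under these assumptions the computed sensitivity is of the same order as the theoretical value quoted in the results, although the exact figure depends on the constants chosen.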
Results and discussion

Figure 2 depicts pH_spec,20°C of tris stored in either a bag or bottle as a function of time, subdivided into tests 1, 2, and 3. A linear decrease was observed for all bags or bottles. A linear regression was calculated for each experimental condition and, in the cases where measurements at t = 0 were removed due to the protocol changes described above, the line was extrapolated back to t = 0, shown by the dotted line. The measured or extrapolated y intercept is reported as the initial pH in Table 2. In all tests, trend lines are extrapolated to t = 365 d, as shown by the solid line, to illustrate observed and predicted change over the course of a year. For ease of visual comparison, the y axis of each subplot has an identical pH range of 0.017.
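The per-bag trend lines described above are ordinary least-squares fits with confidence intervals on slope and intercept, extrapolated to t = 0 and t = 365 d. The sketch below uses only the Python standard library; the time series is hypothetical, and the critical value `t_crit` is a parameter supplied by the caller. The default of 1.96 is the large-sample normal value, so for a handful of points a Student's t value with n − 2 degrees of freedom (which is larger) would be more appropriate.

```python
import math


def fit_line(x, y, t_crit=1.96):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns slope, intercept, and their confidence half-widths
    (t_crit times the standard error).
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired points")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    se_slope = math.sqrt(resid_var / sxx)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_mean**2 / sxx))
    return {
        "slope": slope,
        "intercept": intercept,
        "slope_ci": t_crit * se_slope,
        "intercept_ci": t_crit * se_intercept,
    }


# Hypothetical pH time series for one bag (days since bagging, pH at 20 °C).
days = [60.0, 115.0, 170.0, 225.0, 280.0]
ph = [8.2620, 8.2612, 8.2603, 8.2595, 8.2586]

fit = fit_line(days, ph)
print(f"slope: {fit['slope'] * 365 * 1000:.1f} mpH yr-1")
print(f"extrapolated pH at t = 0 d:   {fit['intercept']:.4f}")
print(f"extrapolated pH at t = 365 d: {fit['intercept'] + fit['slope'] * 365:.4f}")
```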
Only bags from test 3, using tris batch 4 and bag type 1, have direct initial pH measurements and replicate bags. Initial pH measurements of these 4 bags were 8.2630 ± 0.0007 (mean ± SD, n = 12). Importantly, the very low SD suggests that a single initial pH measurement is representative of all replicate bags filled with a single tris batch, if the preparation procedure used in test 3 is followed. This inter-bag consistency is beneficial because it reduces the number of initial pH measurements required when filling multiple bags. There is also strong agreement in initial pH measurements between bagged and bottled tris in test 3, with the initial pH of bottled tris 0.0007 higher than bagged tris (8.2637 ± 0.0004, n = 6). The differences in filling procedure or impurities between bags and bottles in test 3 appear to have little effect on the initial pH. The mean initial pH of tris batch 4 is 0.002 (n = 5) lower than calculated pH_tris,20°C (Fig. A2). This difference between the mean initial pH of tris batch 4 and calculated pH_tris,20°C is similar in direction and magnitude to those reported in other studies: DeGrandpre et al. (2014) reported −0.0012 ± 0.0025, and values of −0.002 to −0.008 have also been reported (measured pH minus pH_tris,T_C). With standard laboratory equipment and off-the-shelf reagents, an uncertainty of 0.006 is expected in prepared tris (Paulsen and Dickson, 2020). Measurements were also made on Dickson standard tris (batch T35) using the same instrument, and the pH was 0.0019 higher than the calculated pH_tris,20°C (n = 2). In tests 1 and 2, the initial pH was extrapolated from a linear regression. The extrapolated initial pH values are more variable and lower (on average) than those directly measured (Fig. A2). These differences may be a result of the extrapolation or of different experimental variables, such as the increased rinsing of bags or the single bag type and storage location used in test 3.
Figure 2. Individual time series of measured pH in tris buffer solutions. Tris batch is indicated by shape, storage vessel by color, and storage location by fill. This marker system is also followed in Fig. A2. The solid line is a linear regression starting at the first included pH measurement and ending 365 d after the tris was bagged. The dotted line illustrates the extrapolation back to 0 d stored in the bag when measurements at t = 0 do not exist. The range of the y axis scale is fixed at 0.017 pH for all subplots.

Figure 3 depicts a composite of all test results as the change from the initial pH of tris (ΔpH = pH_spec,20°C(t) − pH_spec,20°C(t = 0)) as a function of time elapsed since bagging. A linear regression on all ΔpH measurements of tris stored in bag types 1 or 2, excluding the outlier "batch 2, bag 1, lab", has a slope of −0.0058 ± 0.0011 yr−1 (mean ± 95 % confidence interval (CI)). The upper and lower bounds of ΔpH at t = 365 d, −0.0042 and −0.0076, are important to consider when utilizing this bagged storage method of tris. These bounds provide the broadest expected range in pH change over a year of storage and include both the slope and intercept confidence intervals (slope_CI and intercept_CI, respectively). For example, the upper bound of ΔpH at t = 365 d is calculated as upper bound = (slope + slope_CI) × 365 + intercept + intercept_CI.

The outlier (batch 2, bag 1, lab) was excluded due to noticeable damage to the bag (see Fig. A3 in Appendix A), which is believed to have caused its pH to decrease at more than 2 times the average rate of the other bags. The damage appears to be a break in the metallic bag layer, potentially caused by creasing or pinching of the bag during handling. This observation highlights the importance of maintaining bag integrity, particularly during use in the field.

A successful 2-week field deployment has been conducted using the tris bags described here and a modified SeapHOx in a shallow coral reef flat (Bresnahan et al., 2021). This 2-week deployment was significantly shorter than the year of storage described here, and further field testing in longer deployments in varied environments is required before widespread use of this technology. For the longer time frame depicted in Fig. 3, the only comparable example found in the literature is the work of Lai et al. (2018), who used bagged tris for sensor calibration, with in situ tris measurements made over 150 d. Lai et al. (2018) did not report a change in the pH of bagged tris over the deployment; however, the reported precision of the SAMI-pH in situ instrument (±0.003) would not resolve the expected change shown in our Fig. 3. Therefore, the results of Lai et al. (2018) are not inconsistent with our study.

A significant increase in C_T was observed for all types of bags and bottles in test 3 (Fig. 4). A high correlation between solution pH and C_T was observed, with a slope of −0.0029 ± 0.0006 pH per 100 µmol kg−1 (n = 14, r2 = 0.70), suggesting that the change in tris pH and C_T was primarily driven by an increase in CO2. The observed slope agrees closely with a theoretical model prediction of a linear decrease in pH of −0.0024 per 100 µmol kg−1 of C_T added (over the range of C_T observed). There are two possible sources of the increasing C_T: gas exchange of CO2 with the environment and microbial respiration within the storage vessel. Gas exchange should not be a significant source of CO2 for tris stored in a borosilicate bottle, as this is the standard equipment used to store seawater CO2 and tris buffers and is known to minimize gas exchange (Dickson et al., 2007). Therefore, it is likely that respiration was the primary driver for the increase in C_T for tris stored in bottles. On average, the pH decrease in tris stored in bags was larger than that in the standard bottle (Fig. 2), indicating either an additional source of CO2 from gas exchange or larger amounts of respiration. Distinguishing between these two theorized sources would require measurements of additional parameters such as dissolved organic carbon.
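The one-year bounds on ΔpH quoted earlier in this section combine the confidence intervals of slope and intercept. The sketch below reproduces that arithmetic. Only the upper-bound expression is given in the text; the lower bound is assumed here to be its mirror image. The slope and its CI are the reported values converted to d−1, whereas the intercept and its CI (−0.0001 ± 0.0006) are back-calculated from the reported bounds for illustration and are not values stated in the text.

```python
def delta_ph_bounds(t_days, slope, slope_ci, intercept, intercept_ci):
    """Broadest expected range of pH change after t_days of storage.

    slope and slope_ci are in pH per day; intercept and intercept_ci in pH.
    The lower bound mirrors the upper-bound expression (an assumption).
    """
    upper = (slope + slope_ci) * t_days + intercept + intercept_ci
    lower = (slope - slope_ci) * t_days + intercept - intercept_ci
    return lower, upper


# Reported slope and 95 % CI, converted from yr-1 to d-1.
slope_per_day = -0.0058 / 365
slope_ci_per_day = 0.0011 / 365

# Intercept and CI inferred from the reported bounds (illustrative only).
lower, upper = delta_ph_bounds(365, slope_per_day, slope_ci_per_day, -0.0001, 0.0006)
print(f"lower bound: {lower:.4f}, upper bound: {upper:.4f}")
```

Note that the factor of 365 in the upper-bound expression implies a slope expressed per day, which is why the annual values are divided by 365 before use.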
The pH stability of tris could be improved by reducing either likely source of C_T: gas exchange or microbial respiration. For bags, CO2 may diffuse through the fittings, gasket, or bag walls, particularly if damaged. The relatively small breaks in the aluminum foil layer caused the pH of "batch 2, bag 1, lab" to decrease more than twice as fast as that of the average bag. Storage bag, fitting, and gasket material, as well as careful handling, are therefore important factors in minimizing gas exchange. For example, silicone is permeable to CO2 and thus could have been a path of gas exchange into the tris for this experiment. As noted above, Nemzer and Dickson (2005) found an almost negligible change of 0.5 mpH yr−1 in bottled tris. Our bottled tris changed at −3.0 mpH yr−1 (n = 10 bottles measured over 161 d), approximately half the rate of the tris stored in bags. While −3.0 mpH yr−1 is near the detection limit of our measurements, it suggests that the bottling protocol used in this study was not as well controlled as that of Nemzer and Dickson (2005). For example, the Dickson lab at Scripps Institution of Oceanography regularly uses an annealing oven to combust all trace organic films that may persist on glass bottles, but in our study, bottles were not annealed. Although bags cannot be annealed, future steps that may be worth consideration to reduce microbial respiration in bags include addition of a biocide to the tris solution, acid cleaning the bags, and using ultraviolet light to remove organics from the ultrapure water used to prepare tris. There are some disadvantages to these proposed steps. Addition of a biocide may not be ideal for use in sensitive environments if the tris is discharged after use and would alter the composition of the solution slightly. While rinsing or prolonged soaking of the bags with an acid may help to remove organics, it is unclear if it would have negative effects on the integrity of the bags. Beyond removing organics on the bag surfaces, care should be taken to avoid introducing organic contaminants into the tris during the solution preparation and bag-filling procedures to minimize future respiration.

Both bag types 1 and 2 experienced problems with structural integrity during this experiment. A single type 2 bag experienced delamination of exterior bag layers when stored submerged in seawater, causing the eventual tearing and failure of the bag when handling. Bag type 2 was not used in test 3 due to this failure. It should be noted that in other studies which successfully used bag type 2, the bag was submerged in seawater for less time than in this experiment (Sayles and Eck, 2009; Aßmann et al., 2011; Wang et al., 2015). A single type 1 bag had the subtler problem of small breaks in the aluminum foil bag layer, likely causing an increased pH rate of change. In non-damaged bags, factors such as bag type/bottle, lab/tank storage, or tris batch did not have statistically significant (p value < 0.05) correlations with the pH change of tris (p values 0.12, 0.11, and 0.09, respectively). The results of the ANOVA support that tris can be held in bag type 1 or 2 and stored in a lab or tank, and the pH will change similarly regardless of storage method for up to 300 d. Additional bag types could be tested, such as bags made by Pollution Measurement Corp. used by Lai et al. (2018) or Scholle DuraShield used by Takeshita et al. (2015).

Table 2. Linear regression statistics from trend lines shown in Figs. 2 and 3. The last row shows the regression statistics for tris from all batches, in either bag type, stored in the lab or test tank. Slope and intercept are shown as mean ± 95 % CI. The reported intercept is the regression intercept; when initial pH measurements are available, they differ by less than 0.0003 from the regression intercept.
Columns: Batch and storage method; Slope (mpH yr−1); Intercept (initial pH); RMSE (mpH); r2; n.

Figure 4. pH plotted against C_T shows a linear relationship between the two parameters in a tris buffer with a slope of −0.0029 pH for every 100 µmol kg−1 of C_T added. The measurements shown are from three sampling occurrences between 130 and 300 d of storage in the bags and bottles used in test 3. Only two measurements are shown for "batch 4, bag 1d, lab" because it ran empty before C_T measurements were made.
These results suggest that when bags are carefully handled prior to and after filling, tris pH changes are small over time. Specific recommendations for further work include the following: bags must be handled with care and enclosed in protective containers to prevent damage, bags must be rinsed with tris prior to filling, and additional testing is merited to determine sources of and methods to reduce contamination, such as acid washing.

Conclusions
This article describes our characterization of the stability of tris buffer in artificial seawater when stored in purportedly gas-impermeable bags. Several different tests, initiated over the course of a year and a half and lasting up to 300 d, exhibited an average decrease of 5.8 mpH yr−1. In comparison, tris stored in standard borosilicate bottles was shown to have a decrease of 3.0 mpH yr−1. For yearlong deployments, an expected pH change of −0.0058 is well below the weather quality threshold of 0.02 pH units. This low rate of change demonstrates the value of bagged tris for in situ validation of autonomous pH sensors (regardless of sensor operating principles), particularly in highly dynamic areas where repeatability of calibration based on discrete samples is challenging. Given the thorough characterization of tris over wide ranges of environmental variables, this contribution can aid in the traceability and intercomparability of pH sensor measurements. While valuable at the current stage of development (as demonstrated by, for example, Lai et al., 2018, and Bresnahan et al., 2021), further development would ideally result in a commercially available bag and filling procedure that can yield a rate of pH change less than the climate threshold of 0.003 per year. This will require further tests to identify the source of CO2 (gas exchange or microbial respiration), as well as steps to reduce or eliminate these sources.
Periodic measurement of bagged tris in situ would allow for detection of sensor drift. Most in situ pH sensors are deployed in the euphotic zone in coastal areas, typically resulting in expedited biofouling and sedimentation and leading to sensor drift (Bresnahan et al., 2014) that could be identified and potentially corrected. Such periodic calibration/validation would aid in identifying sensor issues and allow for greater consistency and continuity between a time series and planned or vicarious crossovers where an automated calibration can be used to augment or replace pre- and post-deployment calibrations/validations.

Figure A1. A time series of the residual between measured and calculated CRM pH throughout the experiment. Marker color denotes CRM batch number. There is clear variability between measured and calculated pH, which is typical of CRM batches (Andrew Dickson, personal communication, 2019). There was no observable systematic drift in the pH system during the experiment. The mean standard deviation of pH measurements within a CRM batch is 0.0016, which is comparable to the 0.0019 reported in Bockmon and Dickson (2015). The same 760 nm absorbance outlier removal procedure used for tris measurements was applied to CRM measurements.

Figure A2. The initial pH residual of each tris bag or bottle measured in this experiment. The initial pH is reported as a residual from the calculated pH at 20 °C. The initial pH was measured directly for tris batch 4 and extrapolated for tris batches 1-3. Additionally, two bottles of Dickson standard tris (shown by the black "X") were measured on 12 October 2018. The zero black dashed line is the calculated pH of tris at 20 °C, based upon the measured reagent concentrations (DelValls and Dickson, 1998).

Figure A3. The ovals indicate marks on the exterior of "batch 2, bag 1, lab". These marks appear to be damage to the interior metallic layer, possibly due to creasing of the bag. These marks were not present on any other bag used in this study.
Data availability. pH and C_T data are available via the UC San Diego Library Digital Collections at https://doi.org/10.6075/J0QC022G (Wolfe et al., 2021).
Author contributions. WW performed formal analysis, visualization, and writing (original draft preparation). KS and TW contributed to investigation and writing (review and editing). PB, YT, and TM contributed to funding acquisition, conceptualization, formal analysis, and writing (review and editing).