Comparing historical and modern methods of sea surface temperature measurement – Part 2: Field comparison in the central tropical Pacific

Discrepancies between historical sea surface temperature (SST) datasets have been partly ascribed to use of different adjustments to account for variable measurement methods. Until recently, adjustments had only been applied to bucket temperatures from the late 19th and early 20th centuries, with the aim of correcting their supposed coolness relative to engine cooling water intake temperatures. In the UK Met Office Hadley Centre SST 3 dataset (HadSST3), adjustments have been applied over its full duration to observations from buckets, buoys and engine intakes. Here we investigate uncertainties in the accuracy of such adjustments by direct field comparison of historical and modern methods of shipboard SST measurement. We compare wood, canvas and rubber bucket temperatures to 3 m seawater intake temperature along a central tropical Pacific transect conducted in May and June 2008. We find no average difference between the temperatures obtained with the different bucket types in our short measurement period (∼ 1 min). Previous field, lab and model experiments have found sizeable temperature change of seawater samples in buckets of smaller volume under longer exposure times. We do, however, report the presence of strong near-surface temperature gradients day and night, indicating that intake and bucket measurements cannot be assumed equivalent in this region. We thus suggest bucket and buoy measurements be considered distinct from intake measurements due to differences in sampling depth. As such, we argue for exclusion of intake temperatures from historical SST datasets and suggest this would likely reduce the need for poorly field-tested bucket adjustments. We also call for improvement in the general quality of intake temperatures from Voluntary Observing Ships. Using a physical model we demonstrate that warming of intake seawater by hot engine room air is an unlikely cause of overly warm intake temperatures. 
We suggest that reliable correction for such warm errors is not possible since they are largely of unknown origin and can be offset by real near-surface temperature gradients.


Introduction
Here we address issues surrounding the construction of sea surface temperature (SST) datasets using observations obtained from a mix of different platforms, instruments and depths. Modern platforms include ships, moored and drifting buoys and satellites, with shipboard measurements mostly obtained from buckets, engine cooling water intakes and hull contact sensors. Measurement methods were reviewed in detail in Part 1.
Satellite-based methods measure temperature within the sea surface skin (upper ∼ 1 mm) whereas in situ methods measure the so-called bulk temperature beneath (Donlon et al., 2002). Skin temperatures are generally a few tenths of a °C cooler than the bulk temperatures immediately below.
Here we distinguish between different types of bulk temperature based on sampling depth. After Webster et al. (1996), we take temperatures from the upper few centimetres to be measurements of "actual" sea surface temperature. We call temperatures observed beneath the surface skin and within the upper 1 m "upper sea surface temperatures". These are

Methods
Original data were collected on a 5-week research cruise from Papeete, Tahiti to Honolulu, Hawaii aboard the SSV Robert C. Seamans of the US Sea Education Association from May 9th to June 14th 2008 (Siuda, 2008; Matthews, 2009). The Seamans is a ∼ 41 m-long modern sailing vessel of draft ∼ 4 m, achieving on our cruise an average speed of around 4.7 ± 1.8 kt (∼ 2.4 ± 0.9 m s⁻¹) under-sail and 7.2 ± 1.7 kt (∼ 3.7 ± 0.9 m s⁻¹) under-motor. She would be considered a "slow" ship by the FP95 definition. The vessel is equipped with physical, chemical, biological and geological oceanographic sampling equipment and is a World Meteorological Organization (WMO) voluntary observing ship (VOS), reporting once daily.
Several upper surface and near-surface temperature measurement methods were directly compared along the cruise transect (Fig. 1), which was conducted at the end of the 2007/8 La Niña event. Hourly bucket temperatures were obtained from ∼ 17.5°S to ∼ 3°N using three different bucket types, with various meteorological measurements recorded near-simultaneously. Thermosalinograph temperature at a nominal depth of 3 m was measured each minute between 17.5°S and 19°N and considered analogous to accurate engine intake temperature (EIT) for the same intake depth. Daytime temperature profiles to 20 m were obtained by CTD at the locations marked in Fig. 1, enabling assessment of temperature variation over the typical depth range of VOS intakes.

Bucket temperatures
Near-continuous hourly bucket temperatures were taken for 10 consecutive (local) days from May 11th to 20th 2008 between 17.09°S, 149.77°W and 8.95°S, 140.30°W. Daily average track coverage during this period was 80 ± 21 nautical miles (149 ± 38 km), 0.8 ± 0.2° latitude and 0.9 ± 0.6° longitude. Measurements then temporarily ceased for a port call at Nuku Hiva in the Marquesas Islands. Bucket measurements resumed for the first full local day on May 25th at 8.83°S, 140.35°W and continued until 3.08°N, 143.23°W on the morning of June 1st.
Bucket temperatures were obtained using wood, canvas and a modern rubber meteorological bucket (Zubrycki bucket) in what was apparently the first major field comparison of wood and canvas bucket temperatures. The wood and canvas buckets were of similar size (wood: 22.5-25.5 cm inner diameter by 18 cm deep, volumetric capacity ∼ 8 L; canvas: 24 cm by 25.5 cm, capacity ∼ 11.5 L; Fig. 2), with the canvas bucket being a modern general-purpose ships' bucket. The wood bucket is of similar diameter but reduced height to the 19th century wooden ships' bucket modelled by FP95 (25 cm inner diameter by 25 cm deep, volumetric capacity ∼ 12 L). Whilst constructed of softwood pine rather than the hardwood oak of the FP95 wooden bucket, pine is of similar specific heat capacity to oak (2.5 kJ kg⁻¹ K⁻¹ compared to 1.9 kJ kg⁻¹ K⁻¹). The volumetric capacity of our canvas bucket was around three times that of the canvas bucket described by Brooks (1926) (∼ 4 L, 13 cm diameter by 36 cm high) and that of the UK Met Office Mk II canvas meteorological bucket (∼ 4 L, 16 cm by 25 cm, fillable to 20 cm deep). However, it is of similar capacity to canvas buckets used by Japanese ships around the 1930s (∼ 12.5-28 L, 20-30 cm diameter by 40 cm high, Uwai and Komura, 1992). Unlike the Mk II, our canvas bucket did not have a wooden lid or base and could be placed on deck without collapse. The Zubrycki rubber bucket had the smallest volumetric capacity at ∼ 0.7 L (the sample vessel was ∼ 7.5 cm in inner diameter by 16.5 cm deep), far smaller than the 5 L rubber bucket used by Tabata (1978a). A transparent plastic tube extends from the base to house a thermometer, although one was not fitted. Temperatures from this bucket were used as our reference, with captured seawater samples assumed not to warm or cool prior to measurement.
Fig. 1. The blue line denotes the portion of the transect where both bucket and 3 m thermosalinograph temperatures (T_3m) were observed. The black line denotes the portion of the transect where bucket measurements were not taken. Locations of CTD casts are marked by red dots.
Bucket temperatures were collected underway by 18 undergraduate students (a mixture of science and arts majors) working on a three-watch system. This simulates multiple observers in historical datasets. At each bucket station the three buckets were consecutively cast overboard, filled with seawater, hauled up and placed on the wooden deck. A factory-calibrated Fisher traceable thermistor probe with 0.1 °C resolution was inserted into each bucket sample and a reading recorded once the display stabilised in around 10-20 s. Stations were generally conducted within five minutes prior to the top of a given hour. Deployment, retrieval and measurement were conducted on the port side outside the wet lab, a location that frequently switched from leeward to windward. The buckets were not deliberately placed in a sunshaded or wind-exposed location for measurement but were stored in the wet lab between stations. The walls of the wood and canvas buckets generally remained wet from one deployment to the next. Hauling times were short given that bucket launch and retrieval was from ∼ 2.5 m above the waterline. The total hauling and on-deck measurement period (the "exposure time") was ∼ 1 min.
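The volumetric capacities quoted for the buckets can be checked against their stated dimensions. The following is a minimal sketch that treats each bucket as a right circular cylinder; representing the tapered wood bucket by its mean inner diameter (24 cm) is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def cylinder_volume_litres(diameter_cm, depth_cm):
    """Volume of a right circular cylinder in litres (1 L = 1000 cm^3)."""
    radius_cm = diameter_cm / 2.0
    return math.pi * radius_cm ** 2 * depth_cm / 1000.0


# (inner diameter in cm, water or bucket depth in cm), as quoted in the text.
# The wood bucket tapers from 22.5 to 25.5 cm; using the mean is an assumption.
buckets = {
    "wood (mean diameter 24 cm, 18 cm deep)": (24.0, 18.0),
    "canvas (24 cm by 25.5 cm)": (24.0, 25.5),
    "FP95 wooden (25 cm by 25 cm)": (25.0, 25.0),
    "Mk II canvas filled to 20 cm (16 cm diameter)": (16.0, 20.0),
    "Zubrycki rubber (7.5 cm by 16.5 cm)": (7.5, 16.5),
}

for name, (diameter_cm, depth_cm) in buckets.items():
    volume = cylinder_volume_litres(diameter_cm, depth_cm)
    print(f"{name}: {volume:.1f} L")
```

Under the cylinder approximation these come out close to the quoted capacities of roughly 8, 11.5, 12, 4 and 0.7 L.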
Sampling was easiest with the rubber bucket since this would dip near-vertically into the sea surface and so did not need to be dragged to obtain a sample like the wood and canvas buckets. The canvas bucket tended to close flat when dragged and so not fill while the wood bucket would bounce along the surface when under-motor. Several attempts were sometimes required to capture sufficient samples with the wood and canvas buckets (around two-thirds capacity) whereas the rubber bucket would consistently fill to the brim. Retrieval of the wood and canvas buckets became difficult if too much line was released and they drifted far back towards the stern.

Meteorological observations
Several meteorological variables were recorded at each bucket station. Dry and wet bulb air temperatures were taken from liquid-in-glass thermometers mounted in a Stevenson screen on the poop deck (∼ 5 m above the waterline) and reported to 0.5 or 1 °C. Beaufort wind force and cloud cover in oktas were estimated by eye and atmospheric pressure read from a barometer installed in the deckhouse.
www.ocean-sci.net/9/695/2013/ Ocean Sci., 9, 695-711, 2013
Fig. 2. From left to right, the wood, canvas and rubber buckets used in our field comparison. Note that the wooden bucket was sealed with white caulk along the inner seams and reinforced around the outside by two stainless steel bands. The rubber bucket is of both plastic and rubber construction, with a black rubber protective layer around the base.
Wind speed and direction were measured each minute by anemometer atop the foremast at ∼ 33 m above the waterline. Wind speed at 33 m (U_33) was converted to wind speed at other heights (U_z) using the log-profile formula from the TurboWin software, as given by Thomas et al. (2005). TurboWin is a meteorological logbook program widely used by the European VOS (Kent et al., 2007). Wind speed and direction from ≤ 5 min prior to the top of each hour were averaged for comparison to hourly measurements.
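The height conversion of wind speed can be illustrated with the standard neutral-stability logarithmic wind profile. This is a hedged sketch, not a reproduction of the TurboWin formula: the function name is hypothetical and the roughness length `z0_m = 0.0016` m is an assumed value that may differ from the one used by Thomas et al. (2005).

```python
import math


def wind_at_height(u_ref, z_ref_m, z_target_m, z0_m=0.0016):
    """Scale a wind speed from height z_ref_m to z_target_m.

    Uses a neutral logarithmic profile, U(z) proportional to ln(z / z0).
    z0_m (roughness length) is an assumed value, not taken from the source.
    """
    if min(z_ref_m, z_target_m) <= z0_m:
        raise ValueError("heights must exceed the roughness length")
    return u_ref * math.log(z_target_m / z0_m) / math.log(z_ref_m / z0_m)


# Example with a hypothetical 6 m/s reading at the 33 m anemometer height.
u33 = 6.0
u10 = wind_at_height(u33, 33.0, 10.0)  # standard 10 m reference height
u3 = wind_at_height(u33, 33.0, 3.0)    # approximate bucket deployment height
print(f"U10 = {u10:.2f} m/s, U3 = {u3:.2f} m/s")
```

With this assumed roughness length, the 10 m wind is a little under 90 % of the 33 m value, so the correction is modest but not negligible.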

Subsurface measurements
Scientific seawater intake temperature was recorded at 1 min intervals by thermosalinograph or TSG (Seabird SBE45, calibrated in February 2008, accurate to at least 0.01 °C). The TSG measures seawater in the scientific flow through, sampled by a sea chest at ∼ 3 m depth and piped up to the TSG in the wet lab at the main external deck level. TSG temperature was averaged as per wind speed and direction for comparison to hourly measurements. CTD casts with a Seabird SEACAT Profiler (SBE19plus, temperature accurate to at least 0.01 °C) were taken hove to at 22 locations along the transect (Fig. 1). Mean speed over ground whilst hove to was 1.4 ± 0.8 kt (∼ 0.7 ± 0.4 m s⁻¹), with hove to periods identified from coincident changes in apparent wind direction. At each location, CTD temperature was recorded every 5 m at nominal depths between 5 and 20 m. Besides two mid-afternoon casts observed around 15:30-16:30 LT (local time, UTC-10), CTD-1 and CTD-22, all casts were taken in mid- to late morning between 9 a.m. and noon. Current velocities at ∼ 19 m depth were measured every 20 min using a shipboard acoustic Doppler current profiler or ADCP (RDI Ocean Surveyor 75 kHz).

OSTIA data
Daily foundation temperatures from the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) were obtained for comparison to our shipboard temperatures. OSTIA is a high-resolution (1/20°, ∼ 6 km) gridded dataset derived from buoy, ship and satellite (infrared and microwave) observations by optimal interpolation (Donlon et al., 2012). Temperatures obtained in daytime under low wind speeds (< 6 m s⁻¹) are rejected in an attempt to exclude measurements influenced by formation of a diurnal thermocline.
OSTIA is used as a boundary condition for weather forecast models at the UK Met Office and European Centre for Medium-range Weather Forecasting. Note that the equatorial Pacific can be a problematic region for SST measurement by satellite-mounted infrared sensors due to the thick band of cumulonimbus clouds associated with the Intertropical Convergence Zone.
The OSTIA system uses a rolling 36 h observation window centred on 12:00 UTC with a single field produced for each UTC day. OSTIA grid cells traversed by the Seamans on each local day were identified and the corresponding foundation temperatures extracted and averaged for the equivalent OSTIA UTC day. Difference in phasing of local and UTC days was ignored, given the long observation window.

Bucket temperature comparison
No significant difference was found between the wood, canvas and rubber bucket temperatures across the stations, with mean differences of 0.0 °C (σ_M = 0.0 °C, σ = 0.1 °C) between all bucket types (Fig. 3). This was also the case when observations were separated by day and night, with daytime measurements taken to be those obtained between the local times of sunrise and sunset and vice versa for nighttime measurements. When partitioned into the regions identified in Table 1 and Fig. 4, absolute mean inter-bucket temperature differences were all under 0.1 °C, with standard deviations around ± 0.1 to ± 0.2 °C. This was also true when observations were further separated by day and night, except for daytime measurements from the North Equatorial Countercurrent (NECC) outside the equatorial cold tongue, where sample size was < 10. An unintended experiment occurred after the wooden bucket was damaged ∼ 9°S, leaking heavily thereafter. No evidence was found that this had any effect on measured temperatures (i.e. there was no change in the mean or standard deviation of wood-canvas or rubber-wood bucket temperature differences) despite the seawater samples draining completely in a few minutes. Leaking wood bucket temperatures were thus retained for all analyses.
The rubber bucket temperatures show a slight cool tendency relative to those from the canvas and wood buckets, with rubber-canvas and rubber-wood differences of −0.1 °C found for a relatively large number of stations (26 and 30 %, respectively). This might reflect susceptibility for the rubber bucket samples to cool prior to measurement due to their small volume. Even so, assumption that the rubber bucket samples remained of stable temperature pre-measurement is a reasonable approximation. We conclude our bucket temperatures accurate to 0.1 °C and average over temperatures from each bucket type at each station to create a "composite" bucket temperature variable.
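The inter-bucket statistics and the composite variable described above can be sketched as follows. This is an illustrative outline only: the readings are hypothetical, the function names are invented here, and it assumes that σ_M denotes the standard error of the mean and σ the sample standard deviation.

```python
import math
import statistics


def difference_stats(temps_a, temps_b):
    """Mean, sample standard deviation and standard error of paired differences (a - b)."""
    diffs = [a - b for a, b in zip(temps_a, temps_b)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)        # sample standard deviation (sigma)
    sem = sd / math.sqrt(len(diffs))    # standard error of the mean (assumed sigma_M)
    return mean_diff, sd, sem


def composite_temperature(*bucket_temps):
    """Average of the bucket readings taken at a single station."""
    return statistics.mean(bucket_temps)


# Hypothetical readings (deg C) at four stations, reported to 0.1 deg C.
rubber = [28.1, 28.3, 28.0, 28.4]
canvas = [28.2, 28.3, 28.1, 28.4]

mean_diff, sd, sem = difference_stats(rubber, canvas)
print(f"rubber - canvas: mean {mean_diff:+.2f}, sd {sd:.2f}, sem {sem:.2f} deg C")
print(f"composite at one station: {composite_temperature(28.1, 28.2, 28.3):.1f} deg C")
```

With readings resolved to 0.1 °C, a mean difference that rounds to 0.0 °C is compatible with a sizeable fraction of individual stations showing a −0.1 °C offset, as reported for the rubber bucket.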
No correlations were found between inter-bucket temperature differences and apparent wind speed at 3 m, apparent wind direction, ship speed over ground, local time, atmospheric pressure, air minus composite bucket temperature or relative humidity. To assess correlations between inter-bucket differences and meteorological variables estimated by eye (i.e. Beaufort wind force and cloud cover), temperature differences were split into two groups from coincidence with high or low values of these meteorological variables. High wind forces were considered those ≥ 4 and high cloud cover ≥ 5 oktas. All groupings were found to have means of 0.0 °C.
Our results suggest that accurate bucket temperatures can be obtained using large-volume buckets and fast-response scientific thermometers. We find no evidence for evaporative cooling of seawater samples in our wood and canvas buckets in the ∼ 1 min exposure period.
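The split of inter-bucket differences into high and low groups of an eye-estimated variable can be sketched as below. The data are hypothetical and the function name is invented for illustration; only the thresholds (Beaufort force ≥ 4, cloud cover ≥ 5 oktas) come from the text.

```python
import statistics


def split_mean_difference(differences, variable, threshold):
    """Mean difference for stations at or above, and below, a threshold value."""
    high = [d for d, v in zip(differences, variable) if v >= threshold]
    low = [d for d, v in zip(differences, variable) if v < threshold]
    mean_high = statistics.mean(high) if high else None
    mean_low = statistics.mean(low) if low else None
    return mean_high, mean_low


# Hypothetical rubber-canvas differences (deg C) with coincident Beaufort force.
differences = [0.0, -0.1, 0.1, 0.0]
beaufort = [3, 4, 5, 2]

high_mean, low_mean = split_mean_difference(differences, beaufort, threshold=4)
print(f"Beaufort >= 4: {high_mean:+.1f} deg C, Beaufort < 4: {low_mean:+.1f} deg C")
```

The same function applies to cloud cover by passing okta values and a threshold of 5.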
It is unclear whether the FP95 bucket models would also simulate negligible cooling after one minute if adapted to the buckets we used and environmental conditions experienced. Their bucket adjustments for the tropical Pacific are amongst the largest derived on an annual average, due to the strong and seasonally invariant evaporation rates. Their final adjustments for June in the central tropical Pacific are around +0.1-0.3 °C and +0.4-0.7 °C in 1860 and 1940, respectively. The corresponding adjustments for December are around +0.1-0.2 °C and +0.4-0.6 °C. These values are not directly comparable to our results given the longer exposure times used (4 min for the wooden bucket adjustments) and the different bucket sample volumes. At two-thirds full, our canvas bucket contained nearly three times the simulated filled volume of the Mk II (∼ 8 vs. ∼ 3 L, Mk II water depth: 14 cm), the larger of the two canvas buckets modelled by FP95. Conversely the sample volume in their modelled wooden bucket (water depth 20 cm) was around twice that of ours at two-thirds capacity (∼ 10 L vs. ∼ 5.5 L).
An earlier document describing the FP95 models (Folland, 1991) presents a plot showing simulated cooling for a canvas bucket of similar diameter to the Mk II (filled with ∼ 4 L) as being 0.25 °C in the first minute given an air temperature of 28 °C, sea temperature of 30 °C, relative humidity of 75 %, 10 m wind speed of 5 m s⁻¹ and a ship speed of 7 m s⁻¹. Besides the ship speed, these environmental conditions are comparable to those experienced aboard the Seamans around the nighttime daily maximum sea-air temperature difference, for which composite bucket temperature averaged 2.1 ± 0.5 °C warmer than the air temperature. Our observed relative humidity and 10 m wind speed respectively averaged 79 ± 6 % and 4.9 ± 1.8 m s⁻¹ across bucket deployments. Apparent wind speed at 3 m (approximately the height of the bucket deployments) averaged 4.9 ± 1.5 m s⁻¹. Under the same aforementioned model environmental conditions, but given a ship speed of 4 m s⁻¹, more comparable to that of the Seamans (average speed over ground of 2.4 ± 1.1 m s⁻¹ across bucket stations), simulated cooling of a wooden bucket sample of 10 L was only 0.025 °C in the first minute. Simulated cooling of the canvas bucket sample after 1 min appears to have been ∼ 0.2 °C, based on reported cooling at 4 min of 0.6 °C (note that the cooling slows with time). For comparison, our rubber-canvas and rubber-wood bucket differences averaged 0.0 °C. Assuming the same heat loss from our canvas and wood bucket samples in the first minute as for the respective simulations with a 4 m s⁻¹ ship speed, we would have expected average rubber-canvas and rubber-wood bucket differences of ∼ 0.1 and 0.05 °C, respectively. However, given the different bucket diameter-to-height ratios of the modelled buckets to those we used, it is not clear that the heat loss from our samples would have actually been similar for the same exposure conditions and sample volumes. Further, our canvas bucket did not have a lid, in contrast to that modelled by Folland (1991) and FP95. Thus we cannot directly assess the accuracy of the FP95 bucket models using our experimental results. However, the models do make use of some poorly tested and uncertain assumptions on which it is appropriate to comment. For instance, the canvas bucket samples are assumed to be well mixed and at the same temperature as the bucket walls, implying the sample was actively stirred by the observer. Whether this was actually generally the case is not known. In the absence of stirring we would expect sample heat loss to be strongest near the walls, with a temperature gradient across them. Further, FP95 assume wooden buckets were filled to near the brim and so the water surface fairly exposed to the airflow. However, we think it difficult to fill buckets of this type more than two-thirds full based on our practical experience.
Our results suggest that large-volume samples (≥ 5 L) in wood and canvas buckets do not change temperature appreciably in the first minute after collection. This conclusion is of most direct relevance to wood and canvas bucket temperatures obtained underway aboard sailing vessels in the 19th and early 20th century, for which exposure times could have been short (1.5 min or less; e.g. 30 s hauling period, 1 min on-deck period), although this is uncertain.

Vertical near-surface temperature gradients
Given that our bucket temperatures appear accurate, they can be used together with subsurface temperatures from the TSG and CTD casts to reveal near-surface temperature gradients within the depth range of VOS intakes. Here we restrict discussion to vertical gradients within the coverage of the bucket measurements (∼ 17.5°S to ∼ 3°N). Strong vertical gradients were consistently observed day and night throughout this portion of the transect (Fig. 5, Table 1), with the temperature difference between 3 and 0 m averaging −0.4 ± 0.2 °C. Temperature gradients were weaker at nighttime than in daytime, respectively averaging −0.10 °C m⁻¹ and −0.16 °C m⁻¹ across the upper 3 m, with the corresponding average 3 m-0 m differences being −0.3 ± 0.1 °C and −0.5 ± 0.2 °C. Evidently the near-surface thermocline did not break down overnight, in contrast to the observed behaviour in the western equatorial Pacific (Soloviev and Lukas, 2006). Differences across the upper 3 m were found to be strongest in early to mid-afternoon (around 12:00-15:00 LT) and weakest overnight from 19:00-07:00 LT (Fig. 6). This is a consequence of the diurnal temperature cycles being of larger amplitude at the surface than at 3 m. Diurnal air temperature cycles were larger still due to the lower specific heat capacity of air. Diurnal ranges in composite bucket SST were particularly large in the weak and moderate branches of the South Equatorial Current (SEC) averaging 0.9 ± 0.3 °C and somewhat reduced in its strong branch averaging 0.6 ± 0.1 °C. The corresponding average diurnal ranges in 3 m TSG temperatures were 0.5 ± 0.2 °C and 0.3 ± 0.1 °C.
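The relation between a temperature difference across the upper 3 m and the corresponding vertical gradient can be written out explicitly. This is a minimal sketch with hypothetical temperatures; it assumes the bucket sample represents a depth of 0 m, and the function name is invented here.

```python
def vertical_gradient(temp_deep_c, temp_surface_c, depth_deep_m, depth_surface_m=0.0):
    """Vertical temperature gradient in deg C per metre (negative when cooler at depth)."""
    return (temp_deep_c - temp_surface_c) / (depth_deep_m - depth_surface_m)


# Hypothetical station: composite bucket SST 28.3 deg C, 3 m TSG 28.0 deg C.
difference = 28.0 - 28.3
gradient = vertical_gradient(28.0, 28.3, 3.0)
print(f"3 m - 0 m difference: {difference:+.1f} deg C")
print(f"gradient: {gradient:+.2f} deg C per metre")
```

Averages of station gradients need not equal the rounded average difference divided by 3 m, since the quoted means are rounded after averaging.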
Fig. 7. … CTD (indicated by the crosses) while those at 0.1 and 3 m are from composite bucket SST and thermosalinograph, respectively. The bucket measurements were obtained within 2 h and 15 km of each respective CTD cast. Bucket temperatures are not plotted for casts 11-13 since they are unlikely to be representative of conditions at the time of the respective CTD cast (they were obtained at 12:00 or 13:00 LT whereas the CTD casts were conducted between 10:00 and 11:00 LT). The remaining bucket temperatures will be in error by ∼ 0.1 °C or less due to such timing differences. All casts were taken between 9:00 and 12:00 LT, except CTD-1 which was taken around 15:30-16:00 LT. Cast numbers correspond to those on Fig. 1. The red and black lines characterise the daily extremes of the upper 3 m temperature profile on the local day of the corresponding CTD cast. They are respectively defined from maximum and minimum 3-hourly average 3 m temperatures, and corresponding 3-hourly average composite bucket temperatures. They are not plotted in the panels for the SECC and cold tongue, where diurnal cycles were masked by transit through strong meridional gradients.
Thermoclines were found across the upper 5-15 m in all CTD casts (Fig. 7). Temperature differences and gradients over the upper 5 m respectively averaged −0.8 ± 0.2 °C and −0.15 °C m⁻¹ during morning casts, excluding casts 11-13 (for which the temporally-closest bucket temperatures could not be considered near-contemporaneous with the deeper measurements). Gradients between 5 and 10 m were generally weak, with temperature differences averaging −0.08 ± 0.08 °C over morning casts, although several differences around −0.1 to −0.3 °C were found. Temperature declines between 10 and 15 m ranged from 0.00-0.03 °C across morning casts. The only afternoon cast with a corresponding composite bucket temperature, CTD-1, recorded temperatures 1.3 °C colder at 10 m than at the surface, with the coincident gradient across the upper 5 m being −0.24 °C m⁻¹. The 10 m temperature difference is likely slightly overestimated by ∼ 0.1-0.2 °C due to mismatch in timing of the bucket measurements and CTD cast (the CTD cast was conducted around 15:30 LT while the bucket temperatures were obtained at 14:00 LT). Temperature differences between 5 and 10 m, and 10 and 15 m were −0.11 and −0.07 °C, respectively. The apparent temperature difference over the upper 3 m was −0.9 °C, close to the largest observed, which was around −1 °C. Strong vertical temperature gradients in the upper 10 m are thought ubiquitous under weak winds and strong insolation. Temperature contrasts of up to several °C have been found across the upper few meters in the tropical Pacific and Gulf of California (e.g. Webster et al., 1996; Donlon et al., 2002).
Interestingly the near-surface thermocline persisted when 10 m wind speeds exceeded 6 m s⁻¹ (Fig. 8a), both day and night, in contrast to general thinking (Soloviev and Lukas, 2006; Donlon et al., 2012). Daytime upper 3 m temperature declines exceeding 0.7 °C were, however, generally not encountered under these conditions. Note that where 10 m wind speeds exceeded 6 m s⁻¹, all remained below 10 m s⁻¹ except in one case.
We find slight negative correlation between ship speed and upper 3 m temperature difference (Fig. 8b), suggesting measured near-surface temperature gradients were slightly reduced at higher ship speeds. As a further test we compared average 3 m TSG temperatures for periods when the ship was hove to for scientific sampling with those for the 30-min periods immediately before and after. A mean difference of 0.0 ± 0.1 °C was found, suggesting ship motion did not strongly mix the near-surface.

Comparison to OSTIA
Foundation temperatures from OSTIA are comparable to our CTD temperatures at 15 m (Fig. 5b). The CTD 15 m − OSTIA temperature difference from all CTD casts averaged 0.0 ± 0.2 °C, smaller than the supplied OSTIA errors which ranged from ± 0.3 to ± 0. …
Fig. 8. … = 0.06. The vertical dashed line on (a) denotes a wind speed of 6 m s⁻¹. General thinking holds that the near-surface should be near-isothermal at higher wind speeds.
… for comparison in the North Equatorial Current (NEC). A temperature dip observed in daily average composite bucket SST in the moderate branch of the South Equatorial Current is particularly pronounced in OSTIA, with temperatures dropping ∼ 0.8 °C from the weak SEC regime. OSTIA temperatures were closest to daily average 3 m temperatures in the NECC outside the cold tongue but were still ∼ 0.2 °C cooler. Evidently it would be inappropriate to substitute OSTIA foundation temperatures for daily average bucket SST.

Intake temperature errors and engine room warming
Where EIT have been found to average warmer than bucket temperatures, heating of intake seawater by warm engine room air has often been suggested as a potential cause (e.g. Saur, 1963). To test this idea we developed a physical model for warming of intake seawater by net heat transfer into the intake pipe across the pipe wall. Our model is based on standard calculations from chemical engineering (McCabe et al., 2001). Fixed parameters were set so as to maximise computed warming. Pipe wall thickness was varied in tandem with outside diameter (o.d.) according to Table A1, with the largest common wall thickness used for each standard outside diameter. Note that real engine intake pipes are of lower schedule than those modelled, with flow velocities standardised at 1-1.5 m s⁻¹. We use a lower limit flow velocity of 1 m s⁻¹ and an upper limit engine room air temperature of 50 °C. The model is derived in Appendix A. Calculated warming after a 20 m length of pipe (an upper limit for inlet-thermometer distance) with variable o.d. and inlet temperature is presented in Fig. A3. Warming is enhanced with larger temperature contrast across the pipe wall (i.e. as inlet temperature is lowered). Calculated warming is minimal for all but the smallest o.d. pipes and largest temperature contrasts. Engine intakes on merchant vessels generally have outside diameters exceeding 20 cm (discussed in Appendix A), for which computed warming was below 0.05 °C. Thus heating of intake seawater by engine room air is unlikely to be a major cause of reported negative average bucket-intake temperature offsets of several tenths of a °C.
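The order of magnitude of such warming can be illustrated with a simple steady-state heat balance for water flowing through a pipe surrounded by warmer air. This is a sketch under stated assumptions, not the Appendix A model: the overall heat transfer coefficient (`overall_u_w_m2_k`, referred to the outer pipe surface), the seawater density and the specific heat are all assumed values, and the function name is hypothetical.

```python
import math


def intake_warming(inlet_temp_c, air_temp_c, pipe_length_m,
                   outer_diameter_m, inner_diameter_m, flow_velocity_m_s,
                   overall_u_w_m2_k=5.0,    # assumed overall coefficient, W m^-2 K^-1
                   density_kg_m3=1025.0,    # assumed seawater density
                   cp_j_kg_k=3990.0):       # assumed seawater specific heat
    """Temperature rise (deg C) of water along a pipe held in air of fixed temperature."""
    flow_area = math.pi * inner_diameter_m ** 2 / 4.0
    mass_flow = density_kg_m3 * flow_velocity_m_s * flow_area        # kg s^-1
    outer_surface = math.pi * outer_diameter_m * pipe_length_m       # m^2
    ntu = overall_u_w_m2_k * outer_surface / (mass_flow * cp_j_kg_k)
    # The water temperature relaxes exponentially towards the air temperature.
    return (air_temp_c - inlet_temp_c) * -math.expm1(-ntu)


# 20 m of 21.91 cm o.d. pipe (20.63 cm inside diameter), 1 m/s flow,
# 0 deg C inlet water and 50 deg C engine room air.
warming = intake_warming(0.0, 50.0, 20.0, 0.2191, 0.2063, 1.0)
print(f"warming over 20 m: {warming:.3f} deg C")
```

With these assumed values the warming is of the order of a few hundredths of a °C, because the heat capacity flux of the water is several orders of magnitude larger than the heat conducted through the pipe wall.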
This was previously noted by James and Shank (1964) who found that given an 8-inch (∼ 20 cm) diameter pipe, a 2000 gallon min⁻¹ (∼ 3.8 m s⁻¹) flow rate and a 30 °F (∼ 16.5 °C) temperature contrast across the pipe wall, over 1000 ft (∼ 305 m) of pipe would be required for a 0.1 °F (∼ 0.05 °C) temperature rise. Modelling a standard 21.91 cm o.d. pipe with 20.63 cm inside diameter (schedule 20) and flow velocity of 3 m s⁻¹ (a modern absolute maximum) with this temperature contrast, we find a 0.1 °F temperature rise would require a pipe length ∼ 432 m. Pipe lengths necessary to achieve along-pipe warming of 0.2 °C are plotted in Fig. A4, again for a range of outside diameters and temperature contrasts. The minimum pipe length required is ∼ 92 m for o.d. above 20 cm and the longest ∼ 737 m. These are far greater than the inlet-thermometer distances reported in the literature (Table A2). For instance, James and Fox (1972) found 73 % of intake thermometers to be within 3 m of the inlet.
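The unit conversions behind the James and Shank (1964) figures can be checked directly. The sketch below assumes US liquid gallons and treats the 8-inch figure as the inside diameter of the pipe; both are assumptions, as are the function names.

```python
import math

US_GALLON_M3 = 0.003785411784   # assumed: US liquid gallon
INCH_M = 0.0254
FOOT_M = 0.3048


def flow_velocity_m_s(flow_gal_per_min, diameter_in):
    """Mean flow velocity for a volumetric flow rate through a circular pipe."""
    flow_m3_s = flow_gal_per_min * US_GALLON_M3 / 60.0
    area_m2 = math.pi * (diameter_in * INCH_M) ** 2 / 4.0
    return flow_m3_s / area_m2


def fahrenheit_interval_to_celsius(delta_f):
    """Convert a temperature difference (not an absolute temperature)."""
    return delta_f * 5.0 / 9.0


print(f"velocity: {flow_velocity_m_s(2000.0, 8.0):.2f} m/s")
print(f"30 F contrast: {fahrenheit_interval_to_celsius(30.0):.1f} deg C")
print(f"0.1 F rise: {fahrenheit_interval_to_celsius(0.1):.3f} deg C")
print(f"1000 ft: {1000.0 * FOOT_M:.0f} m")
```

Under these assumptions the velocity and temperature contrast come out marginally higher than the rounded values quoted in the text, which does not affect the argument.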
Other explanations for warm bias in intake temperatures include heating of thermometers by conduction along metal fittings (Saur, 1963) and gradual warming of stagnant intake seawater around pumps (Brooks, 1926) or in faucet pipes (Piip, 1974).Intake temperatures from Ice Class vessels traversing high latitudes may be influenced by mixing of exhaust intake with fresh intake prior to use as a cooling agent, a process designed to prevent engine shock.It is unclear whether this is the case for any intake temperatures in the International Comprehensive Ocean-Atmosphere Data Set, ICOADS (Woodruff et al., 2011), the primary compilation of historical SST measurements.
Engine intake temperatures tend to be noisy, with random errors likely reflecting poor observing and recording practices. Poor quality is unsurprising given that these measurements were traditionally obtained by ships' engineers for engine monitoring purposes, where accuracy of 1-2 °C is sufficient. Sailors are likely to report at most to the smallest graduation on the thermometer used, which appears often to have been 1 °C or °F or more for intake thermometers. A preference for whole-number values was found in our dry bulb air temperatures where the thermometer was marked in 1 °C increments. Further, intake thermometers have sometimes been noted as difficult to read, with unclear graduations and locations close to floor level (Brooks, 1926). They may be particularly prone to drift in the harsh engine room environment.

Conclusions and recommendations
Progress in the field of historical SST reconstruction has been hampered by neglect of near-surface dynamics, lack of comprehensive field comparisons between measurement methods, limited metadata and observations of variable quality. We find no evidence for cold bias in wood or canvas bucket temperatures in the central tropical Pacific when measurement is rapid (∼ 1 min) and the bucket samples of large volume (≥ 5 L). Our results suggest susceptibility of bucket samples to heat loss or gain may be more dependent on their volume than bucket material. Thus we suggest volumetric capacity be the principal consideration in design of meteorological buckets. Additional field experiments should test whether our findings apply in other seasons and ENSO states and to historically used buckets of smaller volume and different type. Experiments should be conducted on vessels of different class and in other ocean regions. In particular, the accuracy of bucket temperatures from large modern merchant vessels should be evaluated, on which hauling times would be longer and apparent wind speeds stronger. Studies could initially target those regions and seasons where bucket cooling is predicted to be largest (e.g. the Gulf Stream in winter). Bucket experiments would benefit from continuous monitoring of the sample temperature during the measurement period. This could be achieved by attachment of a rugged digital thermometer and data logger to the bucket wall. This setup could also be used to measure the hauling time, of which there are few reports in the literature. Combined with estimates of response time for a range of fast and slow-response liquid-in-glass thermometers, the lower bound of possible exposure times could be better constrained. It could be assumed that mariners obtained a temperature reading as soon as the thermometer achieved approximate equilibration.
While the results of our bucket comparison are not directly comparable to the bucket models and adjustments of FP95, we question their derivation and use of long exposure times (4 min for wooden buckets). As described in Part 1, FP95 estimated exposure times for canvas buckets using their finding that seasonal SST cycles in the extratropics were generally of larger amplitude prior to 1942. Although not stated directly, their method effectively assumes that seasonal cycles of spatially co-located bucket and intake temperatures are the same in their 1951-1980 reference period. However, if seasonal cycles in the extratropics are, in fact, generally larger at the surface compared to, say, 5-10 m depth, then a portion of the larger amplitude cycles of pre-1942 years may be attributable to sampling being from a generally shallower depth (more bucket than intake observations). Characterisation of climatological seasonal temperature cycles at various upper surface and near-surface depths (e.g. using drifting and moored buoy data) would enable separation of such depth-related effects from other influences (e.g. bucket cooling). A complete explanation for the anomalous seasonal cycles pre-World War II (WWII) must be able to account for the spatial pattern of the differences (e.g. the particularly enhanced amplitudes about the Gulf Stream and Kuroshio), which the bucket cooling theory can explain.
Field and lab experiments have typically found cooling rates of around 0.05-0.1 °C min$^{-1}$ for small-volume canvas buckets, although rates of 0.15 °C min$^{-1}$ or more are sometimes reported. A critical assumption in converting cooling rates to bucket adjustments is the time taken for a reading to be obtained post-sampling (i.e. the exposure time). As discussed in Part 1, we suggest historical exposure periods for wood and canvas buckets were typically shorter than those derived by FP95 (1-2 min as opposed to 4-5 min) and thus that their corresponding bucket adjustments are too large (the largest in the central tropical Pacific being ∼ 0.7 °C). However, the distribution of actual historical exposure times remains highly uncertain and so this suggestion only serves to widen the range of possible average exposure times. Even so, the long exposure times used by FP95 imply that mariners would have waited several minutes for thermometers to equilibrate before reading, which we think unlikely.
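As a rough illustration of why exposure time matters, the sketch below multiplies the cooling rates quoted above by the two sets of exposure times. It assumes a constant cooling rate over the whole exposure period, which is a simplification, and the helper names (`implied_cooling`, `cooling_range`) are ours. The numbers indicate sensitivity only and do not reproduce the FP95 adjustments.

```python
def implied_cooling(rate_c_per_min, exposure_min):
    """Temperature drop (deg C), assuming a constant cooling rate (simplification)."""
    return rate_c_per_min * exposure_min


RATES = (0.05, 0.10)         # deg C per min, typical range quoted in the text
SHORT_EXPOSURE = (1.0, 2.0)  # min, exposure times suggested in this paper
LONG_EXPOSURE = (4.0, 5.0)   # min, exposure times attributed to FP95


def cooling_range(exposures):
    """Smallest and largest implied cooling for a pair of exposure times."""
    return (
        implied_cooling(RATES[0], exposures[0]),
        implied_cooling(RATES[1], exposures[1]),
    )


short_range = cooling_range(SHORT_EXPOSURE)
long_range = cooling_range(LONG_EXPOSURE)
print(f"1-2 min exposure: {short_range[0]:.2f} to {short_range[1]:.2f} deg C")
print(f"4-5 min exposure: {long_range[0]:.2f} to {long_range[1]:.2f} deg C")
```

Under this linear assumption, the longer exposure times imply cooling several times larger than the shorter ones, which is why the choice of exposure time dominates the size of any derived adjustment.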
While both bucket and intake temperatures can exhibit large systematic and random errors (e.g. Brooks, 1926, 1928; Roll, 1951; Saur, 1963; Tauber, 1969; Tabata, 1978a, b; Kent and Challenor, 2006), we consider EITs a particularly unreliable measure of actual SST (as defined here) given the potential for large vertical near-surface temperature gradients. Intakes sample at variable and often unknown depth, at which the temperature may differ by a few tenths to several °C from that in the upper few centimetres. We found temperature declines of up to 1 °C across the upper 3 m in the central tropical Pacific. Our average upper 3 m temperature difference between 17.5° S and 3° N was −0.4 ± 0.2 °C, with differences of this order found to persist day and night, even when 10 m wind speeds exceeded 6 m s$^{-1}$. EIT generally cannot be corrected for such near-surface gradients, even where these are known, due to limited metadata on intake depth. While intake depths have been reported for some voluntary observing ships since 1995, they remain unknown in many cases and must be assumed invariant even where they are reported (individual vessels are assigned a single intake depth whereas actual sampling depth varies with vessel loading).
The extent to which mechanical stirring by VOS propellers and motion acts to disturb near-surface gradients is unclear, as is its influence on measured bucket and intake temperatures. The latter likely depends on sampling point, with the near-surface probably less disturbed away from the stern. Evidently findings of large negative average bucket-intake differences cannot reflect typical near-surface temperature gradients. Our physical modelling suggests they are also not likely due to warming of intake seawater by engine room air (we estimate this to be hundredths rather than tenths of a °C). Thus we cannot assume that intake thermometers accurately measure the intake temperature and that warm bias is simply due to warming of the incoming seawater. EITs have been found to average systematically too warm on some ships by > 0.5 °C (e.g. Brooks, 1928; Tauber, 1969). We suggest that reliable correction of such errors is not possible since their cause is largely unknown and their general magnitude can only be indirectly estimated from signals in the data.
We propose a new, alternative approach to SST record construction in which the need for poorly field-tested adjustments is reduced through more restrictive data selection. Namely, we suggest exclusion of intake and other subsurface temperatures based on the potential for strong vertical near-surface temperature gradients. Removal of subsurface temperatures would suppress any artificial signals from variable measurement depth since the main remaining in situ methods (bucket and buoy) measure at a more consistent and historically invariant depth. It would also likely reduce the need for reliance on bucket adjustments to improve homogeneity, provided a new reference period climatology was also developed. Additional homogenisation could be achieved by identification and removal of bucket temperatures suspected to be in large error due to sample cooling (e.g. those collected using small-volume canvas buckets under strong winds and large sea-air temperature contrasts). Further field experiments would be required to determine environmental criteria for such exclusion. That bucket adjustments might still yield improvements in homogeneity in spite of the proposed approach has not been ruled out.
Note that we do not question that bucket and other adjustments can improve homogeneity in SST datasets and comparability with records derived from independent datasets, and do so with some skill (e.g. spatially). FP95 found that their bucket adjustments resolved an offset between global and hemispheric-average SST and Night Marine Air Temperature anomalies pre-WWII. Kennedy et al. (2011b) developed separate global and hemispheric SST records for 1945-2006 using bucket and intake measurements and found that adjustments improved consistency between them, particularly over 1945-1970. However, large uncertainty remains surrounding the accuracy of such adjustments. This is apparent from Gouretski et al. (2012) who compared adjusted and unadjusted versions of the HadSST3 global-average SST record against a global-average record of near-surface temperature (0-20 m) derived using independent hydrographic observations. While application of bucket adjustments to HadSST3 reduced the offset between these records pre-WWII, notable discrepancies remained, the precise cause of which is unclear (the hydrographic observations were also adjusted). Post-WWII, global-average SST from HadSST3 shows similar trends to the hydrographic record with and without adjustments, suggesting the trends are robust.
Loss of spatial and temporal coverage due to exclusion of subsurface temperatures will require detailed consideration, but may not be as dramatic as first suspected. Intake temperatures appear to have comprised only a small proportion of the SST measurements obtained pre-WWII. Post-WWII, bucket temperatures are thought to have comprised around 40-60 % of global monthly SST observations until the introduction of moored and drifting buoys in the 1970s (Kennedy et al., 2011b). Note that around 2.5-15 % or more of monthly observations were of unknown method during this period and that undoubtedly some portion of the measurements assumed to be by bucket will have come from intakes. Improved metadata will thus be required to more completely identify subsurface measurements for exclusion. We suggest historical meteorological data recovery initiatives (e.g. Wilkinson et al., 2011) target digitisation of bucket temperatures over intake temperatures from unknown or poorly-known depth.
Subsurface VOS temperatures can contribute to knowledge of diurnal and seasonal near-surface hydrodynamics where accurate and of known sampling depth. Thermometers used for bucket and intake measurements should ideally be calibrated before every cruise and measure to precision of at least 0.01 °C. There is an urgent need to improve the general quality of VOS SST data since they are used for a wide variety of scientific purposes (not just for producing global-mean SST records, for which random errors are less critical). Sea surface salinity (SSS) should be considered of equal climatic importance to SST, yet is only measured on select VOS and not included in ICOADS. The Global Surface Underway Data project (Petit de la Villéon et al., 2010) is working to collate SSS measurements from VOS such as those obtained through the French SSS Observation Service (Delcroix et al., 2010). Reprogramming of Argo floats to measure temperature and salinity every meter in the upper 20 m would improve coverage of near-surface variability, particularly beyond the shipping lanes to which VOS are largely restricted. Synthesis of near-surface hydrodynamics from existing floats measuring at least two temperatures and salinities within the upper 10 m should also be conducted. Further data could be obtained by mounting additional thermometers on moored buoys in the upper 30 m.

Fig. A1. Schematic of our model for warming of intake seawater by engine room air at temperature $T_{air}$. The seawater is flowing at velocity $u$ in a pipe of inside diameter $D_i$. The initial seawater temperature is $T_{in}$ and the temperature after pipe length $L$ is $T_{out}$. We do not explicitly model a sea chest, rather we assume the temperature of the seawater in a sea chest is the same as that of the external seawater beyond the inlet (i.e. $T_{in}$). The model can thus be considered to represent a length of pipe inboard of a sea chest.

Appendix A Engine intake warming model
We developed the following model for heating of seawater flowing through a pipe to test whether engine room warming of intake seawater is physically plausible. Fixed-value model parameters are given in Table A3 together with their symbols, units and prescribed value(s) used to generate Figs. A3 and A4. Computed model variables and their symbols, units and range of values calculated in generation of Fig. A3 are given in Table A4. Illustrative schematics highlighting some of the basic model parameters and variables are provided in Figs. A1 and A2.
Volumetric flow rate through a pipe is given by

$$\dot{V} = \frac{m}{\rho t} = \frac{\dot{m}}{\rho}$$

where $\rho$ is density, $m$ is mass, $t$ is time and $\dot{m}$ the mass flow rate.
Flow velocity is given by

$$u = \frac{\dot{V}}{A_c}$$

where $\dot{V}$ is the volumetric flow rate and $A_c$ is the inside cross-sectional area of the pipe. For a pipe of length $L$, the surface area of the inside wall is given by

$$A_i = \pi D_i L$$

Similarly the surface area of the outside wall, $A_o = \pi D_o L$.
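The geometric and flow relations above can be sketched as follows. The function names and the example wall thickness and seawater density are illustrative assumptions, not values taken from Tables A1 or A3.

```python
import math


def pipe_geometry(d_i, x_w, length):
    """Return (D_o, A_c, A_i, A_o) for a cylindrical pipe, SI units."""
    d_o = d_i + 2.0 * x_w          # outside diameter from wall thickness
    a_c = math.pi * d_i ** 2 / 4   # inside cross-sectional area
    a_i = math.pi * d_i * length   # inside wall surface area
    a_o = math.pi * d_o * length   # outside wall surface area
    return d_o, a_c, a_i, a_o


def mass_flow_rate(rho, u, a_c):
    """Mass flow rate from u = V_dot / A_c and V_dot = m_dot / rho."""
    return rho * u * a_c


# Illustrative inputs: 25 cm inside diameter, 1 cm wall (assumed), 20 m pipe,
# flow velocity 1 m/s, seawater density 1025 kg/m^3 (assumed).
d_o, a_c, a_i, a_o = pipe_geometry(d_i=0.25, x_w=0.01, length=20.0)
m_dot = mass_flow_rate(rho=1025.0, u=1.0, a_c=a_c)
print(f"D_o = {d_o:.2f} m, A_i = {a_i:.2f} m^2, A_o = {a_o:.2f} m^2")
print(f"mass flow rate = {m_dot:.1f} kg/s")
```

For these inputs the mass flow rate is about 50 kg s$^{-1}$, which gives a sense of how much water must be heated for any measurable warming to occur.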
A single heat transfer process is assumed to occur in each medium: free (natural) convection in the engine room air, conduction across the pipe wall and forced convection in the intake seawater. Radiative transfer is neglected.
From Fourier's Law of Conduction, the rate of conductive heat transfer, $\dot{Q}$, in one dimension is given by

$$\dot{Q} = \frac{k A \, \Delta T}{x} \qquad \text{(A5)}$$

where $\Delta T$ is a positive temperature difference across a material of thermal conductivity $k$, surface area $A$ and thickness $x$. From Newton's Law of Cooling, the rate of convective heat transfer is given by

$$\dot{Q} = h A \, \Delta T$$

where $h$ is the convective heat transfer coefficient. Since the surface area of a cylindrical pipe is different for the inside and outside walls, we replace $A$ in Eq. (A5) with a log-mean cross-sectional area,

$$A_{lm} = \frac{A_o - A_i}{\ln (A_o / A_i)}$$

Thin boundary layers or films exist along the inside and outside walls of intake pipes, with flow velocity reduced towards the wall and strong temperature gradients present (Fig. A2). We define convective heat transfer coefficients for the inside and outside films, $h_{if}$ and $h_{of}$, respectively.
Equating convective heat flow across the outside and inside films with conductive heat flow across the pipe wall we have

$$h_{of} A_o (T_1 - T_2) = \frac{k_w A_{lm} (T_2 - T_3)}{x_w} = h_{if} A_i (T_3 - T_4)$$

where $T_{1-4}$ are defined as in Fig. A2, $k_w$ is the thermal conductivity of the wall and $x_w$ the wall thickness. We model an unlagged steel pipe.
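Rearranging for the temperature contrasts driving the convective and conductive heat flow, with $\dot{Q}$ the rate of heat transfer,

$$T_1 - T_2 = \frac{\dot{Q}}{h_{of} A_o} \qquad \text{(A7)}$$

$$T_2 - T_3 = \frac{\dot{Q} \, x_w}{k_w A_{lm}} \qquad \text{(A8)}$$

$$T_3 - T_4 = \frac{\dot{Q}}{h_{if} A_i} \qquad \text{(A9)}$$

Combining Eqs. (A7), (A8) and (A9) we can solve for the outside and inside wall temperatures, $T_2$ and $T_3$ as

$$T_2 = T_1 - \frac{T_1 - T_4}{h_{of} A_o \left( \dfrac{1}{h_{of} A_o} + \dfrac{x_w}{k_w A_{lm}} + \dfrac{1}{h_{if} A_i} \right)}$$

$$T_3 = T_4 + \frac{T_1 - T_4}{h_{if} A_i \left( \dfrac{1}{h_{of} A_o} + \dfrac{x_w}{k_w A_{lm}} + \dfrac{1}{h_{if} A_i} \right)}$$

Given that seawater temperature varies along the pipe, we replace $T_4$ with an average seawater temperature, $T_{ave} = \frac{T_{in} + T_{out}}{2}$, and $T_1 - T_4$ with a log-mean temperature difference, $\Delta T_{lm} = \frac{T_{out} - T_{in}}{\ln \left[ (T_1 - T_{in}) / (T_1 - T_{out}) \right]}$. $T_{in}$ and $T_{out}$ are the seawater temperatures at the inlet and after pipe length $L$, respectively.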
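The series heat-flow balance above can be sketched numerically as below. The function name `wall_temperatures` and all example values (film coefficients, steel conductivity, pipe dimensions) are assumptions chosen for illustration rather than the values of Table A3.

```python
import math


def wall_temperatures(t1, t4, h_of, h_if, k_w, d_i, x_w, length):
    """Return (T2, T3, Q) for air at T1 and seawater at T4, SI units."""
    d_o = d_i + 2.0 * x_w
    a_i = math.pi * d_i * length
    a_o = math.pi * d_o * length
    a_lm = (a_o - a_i) / math.log(a_o / a_i)  # log-mean area for conduction

    # Thermal resistances of the outside film, pipe wall and inside film.
    r_of = 1.0 / (h_of * a_o)
    r_w = x_w / (k_w * a_lm)
    r_if = 1.0 / (h_if * a_i)

    # The same heat flow passes through all three layers in series.
    q = (t1 - t4) / (r_of + r_w + r_if)
    t2 = t1 - q * r_of   # outside wall temperature
    t3 = t4 + q * r_if   # inside wall temperature
    return t2, t3, q


# Illustrative inputs: 50 deg C air, 20 deg C seawater, h_of = 5 and
# h_if = 1500 W m^-2 K^-1 (assumed), steel k_w = 50 W m^-1 K^-1 (assumed).
t2, t3, q = wall_temperatures(50.0, 20.0, 5.0, 1500.0, 50.0, 0.25, 0.01, 20.0)
print(f"T2 = {t2:.3f} deg C, T3 = {t3:.3f} deg C, Q = {q:.0f} W")
```

With inputs of this kind almost all of the temperature drop occurs across the outside air film, so the pipe wall sits close to the seawater temperature.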
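We can now define an overall inside heat transfer coefficient, $U_i$, such that the rate of heat transfer, $\dot{Q}$, is

$$\dot{Q} = U_i A_i \, \Delta T_{lm} \qquad \text{(A12)}$$

Summing Eqs. (A7), (A8) and (A9) and taking $T_1 - T_4 = \Delta T_{lm}$ then

$$\Delta T_{lm} = \dot{Q} \left( \frac{1}{h_{of} A_o} + \frac{x_w}{k_w A_{lm}} + \frac{1}{h_{if} A_i} \right)$$

We can now solve for $U_i$ using Eq. (A12):

$$U_i = \left( \frac{A_i}{h_{of} A_o} + \frac{A_i x_w}{k_w A_{lm}} + \frac{1}{h_{if}} \right)^{-1}$$

The specific heat capacity of the intake seawater, $c_p$, is related to its warming by

$$\dot{Q} = \dot{m} c_p (T_{out} - T_{in}) \qquad \text{(A15)}$$

Equating Eqs. (A12) and (A15) and substituting in Eq. (A3):

$$U_i \pi D_i L \, \Delta T_{lm} = \dot{m} c_p (T_{out} - T_{in})$$

Rearranging for the temperature change after pipe length $L$:

$$T_{out} - T_{in} = \frac{U_i \pi D_i L \, \Delta T_{lm}}{\dot{m} c_p}$$

For the range of inside diameters adopted (Table A1) and our specified flow velocity of 1 m s$^{-1}$, pipe flow is turbulent with Reynolds number, $Re$, exceeding 10 000. Note Reynolds number is calculated as $Re = \frac{4 \dot{m}}{\pi D_i \mu}$ with $\mu$ the dynamic viscosity.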
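The statement that flow is turbulent across the adopted diameters can be checked with a few lines. The density and viscosity below are assumed round values for cold seawater (the least favourable case, since viscosity is highest); they are not taken from Table A3.

```python
import math


def reynolds_number(m_dot, d_i, mu):
    """Re = 4 m_dot / (pi D_i mu) for flow in a circular pipe."""
    return 4.0 * m_dot / (math.pi * d_i * mu)


RHO = 1025.0   # kg m^-3, assumed seawater density
MU = 1.9e-3    # Pa s, assumed dynamic viscosity of seawater near 0 deg C
U = 1.0        # m s^-1, flow velocity specified in the text

reynolds = {}
for d_i in (0.06, 0.37):  # approximate range of inside diameters in the text
    m_dot = RHO * U * math.pi * d_i ** 2 / 4  # mass flow rate, kg/s
    reynolds[d_i] = reynolds_number(m_dot, d_i, MU)
    print(f"D_i = {d_i:.2f} m: Re = {reynolds[d_i]:.0f}")
```

Because $\dot{m} = \rho u \pi D_i^2 / 4$, this expression reduces to the familiar $Re = \rho u D_i / \mu$, so the smallest pipe carrying the coldest water gives the lowest Reynolds number.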
We model convective heat transfer about the inside film (if) as for fully developed turbulent flow, using the empirical correlation of Gnielinski (1976) for a smooth tube:

$$Nu = \frac{(f/8)(Re - 1000) Pr}{1 + 12.7 (f/8)^{1/2} \left( Pr^{2/3} - 1 \right)} \qquad \text{(A18)}$$

where $Nu$ is the Nusselt number, $f$ the friction factor and $Pr$ the Prandtl number given by $Pr = \frac{c_p \mu}{k}$. Equation (A18) is valid for $0.5 < Pr < 2000$ and $3000 < Re < 5 \times 10^6$. We compute the friction factor using the explicit relation of Petukhov (1970), $f = (0.790 \ln Re - 1.64)^{-2}$. The convective heat transfer coefficient for the inside film is calculated using the thermal conductivity of the inside film, $k_{if}$, as

$$h_{if} = \frac{Nu \, k_{if}}{D_i} \qquad \text{(A19)}$$

For convection about the outside film (of) we use the Nusselt number formulation of Tahavvor and Yaghoubi (2008) for natural convection around a cold horizontal cylinder (Eq. A20), in which $Ra_D$ is the Rayleigh number based on $D_o$ as the characteristic length and given by $Ra_D = \frac{g \beta_{of}}{\alpha_{of} \nu_{of}} (T_1 - T_2) D_o^3$ (Homayoni and Yaghoubi, 2008). $\beta_{of}$ is the thermal expansion coefficient, $\alpha_{of}$ thermal diffusivity, $\nu_{of}$ kinematic viscosity and $g$ acceleration due to gravity. We use Eq. (A20) up to $Ra_D = 4.44 \times 10^8$, above the specified $Ra_D$ upper limit of $10^8$. This is acceptable given that only relations for warm cylinders (i.e. those with outside wall temperature warmer than the adjacent air) are otherwise available and use of these yields similar values for $h_{of}$. For instance, use of relation (16b) in Tahavvor and Yaghoubi (2008), valid for warm cylinders and $Ra_D > 10^8$, yields $h_{of}$ values ranging from 3.9-5.4 W m$^{-2}$ K$^{-1}$ for Fig. A3 compared to 4.1-7.1 W m$^{-2}$ K$^{-1}$ using Eq. (A20). Differences between computed $T_{out} - T_{in}$ values were all < 0.01 °C.
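A minimal sketch of the inside-film calculation is given below, using the standard published forms of the Gnielinski (1976) correlation and the Petukhov (1970) friction factor. The function names and the example values of $Re$, $Pr$ and $k_{if}$ are illustrative assumptions. The outside-film correlation is not sketched because its coefficients are not restated here.

```python
import math


def petukhov_friction_factor(re):
    """Explicit smooth-tube friction factor of Petukhov (1970)."""
    return (0.790 * math.log(re) - 1.64) ** -2


def gnielinski_nusselt(re, pr):
    """Gnielinski (1976) Nusselt number for turbulent flow in a smooth tube."""
    if not (3000 < re < 5e6 and 0.5 < pr < 2000):
        raise ValueError("Re or Pr outside the validity range of the correlation")
    f = petukhov_friction_factor(re)
    numerator = (f / 8) * (re - 1000) * pr
    denominator = 1 + 12.7 * math.sqrt(f / 8) * (pr ** (2 / 3) - 1)
    return numerator / denominator


def inside_film_coefficient(re, pr, k_if, d_i):
    """h_if = Nu * k_if / D_i, in W m^-2 K^-1."""
    return gnielinski_nusselt(re, pr) * k_if / d_i


# Illustrative inputs (assumed): Re = 1e5, Pr = 7, k_if = 0.6 W m^-1 K^-1,
# inside diameter 0.25 m.
f_example = petukhov_friction_factor(1e5)
nu_example = gnielinski_nusselt(1e5, 7.0)
h_if_example = inside_film_coefficient(1e5, 7.0, 0.6, 0.25)
print(f"f = {f_example:.4f}, Nu = {nu_example:.0f}, h_if = {h_if_example:.0f}")
```

The resulting inside-film coefficient is of order $10^3$ W m$^{-2}$ K$^{-1}$, two to three orders of magnitude larger than the outside-film values of 4.1-7.1 W m$^{-2}$ K$^{-1}$ quoted above, which is why the air side limits the overall heat transfer.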
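Similar to Eq. (A19):

$$h_{of} = \frac{Nu \, k_{of}}{D_o}$$

with $k_{of}$ the thermal conductivity of the outside film. Dimensionless parameters and other variables computed to find $h_{if}$ and $h_{of}$ are calculated respectively at the inside and outside film temperatures ($T_{if}$ and $T_{of}$), taken to be

$$T_{if} = \frac{T_3 + T_{ave}}{2}, \qquad T_{of} = \frac{T_1 + T_2}{2}$$

The intake warming model is solved iteratively from initial guesses for $T_{out}$, $h_{if}$ and $h_{of}$ with $T_{out}$ updated each iteration as follows:

$$T_{out}^{\,n+1} = T_{in} + \frac{U_i^{\,n} \pi D_i L \, \Delta T_{lm}^{\,n}}{\dot{m} c_p}$$

where $n$ is iteration number.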
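The iteration can be sketched as a simple fixed-point loop. This is a simplified illustration, not the full model: the film coefficients $h_{if}$ and $h_{of}$ are held fixed at assumed values instead of being recomputed from the film temperatures each iteration, and the seawater density, specific heat capacity and steel conductivity are assumed round numbers. The function name `solve_outlet_temperature` is ours.

```python
import math


def solve_outlet_temperature(t_in, t_air, u, d_i, x_w, length, h_if, h_of,
                             k_w=50.0, rho=1025.0, c_p=4000.0,
                             tol=1e-10, max_iter=200):
    """Iterate T_out = T_in + U_i A_i dT_lm / (m_dot c_p) to convergence."""
    if t_air <= t_in:
        raise ValueError("sketch assumes air warmer than inlet seawater")
    d_o = d_i + 2.0 * x_w
    a_c = math.pi * d_i ** 2 / 4
    a_i = math.pi * d_i * length
    a_o = math.pi * d_o * length
    a_lm = (a_o - a_i) / math.log(a_o / a_i)
    m_dot = rho * u * a_c

    # Overall inside heat transfer coefficient (fixed here, see lead-in).
    u_i = 1.0 / (a_i / (h_of * a_o) + a_i * x_w / (k_w * a_lm) + 1.0 / h_if)

    t_out = t_in + 0.01  # initial guess; must differ from t_in
    for _ in range(max_iter):
        dt_lm = (t_out - t_in) / math.log((t_air - t_in) / (t_air - t_out))
        t_new = t_in + u_i * a_i * dt_lm / (m_dot * c_p)
        if abs(t_new - t_out) < tol:
            return t_new, u_i, a_i, m_dot
        t_out = t_new
    raise RuntimeError("iteration did not converge")


# Illustrative case: 20 deg C inlet, 50 deg C air, 1 m/s flow, 25 cm pipe
# with 1 cm wall, 20 m long, h_if = 1500 and h_of = 5 W m^-2 K^-1 (assumed).
t_out, u_i, a_i, m_dot = solve_outlet_temperature(
    20.0, 50.0, 1.0, 0.25, 0.01, 20.0, h_if=1500.0, h_of=5.0)
print(f"warming over 20 m of pipe: {t_out - 20.0:.4f} deg C")
```

With fixed coefficients the converged value coincides with the closed-form result $T_{out} = T_1 - (T_1 - T_{in}) \exp[-U_i A_i / (\dot{m} c_p)]$, which offers a quick check on the loop. For these illustrative inputs the warming is of order 0.01 °C, in line with the hundredths of a degree reported in the main text.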
We adopt an upper limit for engine room air temperature of 50 °C and vary inlet temperature in 1 °C intervals between 0 and 30 °C. Pipe inside diameter is varied from around 6 to 37 cm corresponding to a range of standard outside diameters with wall thicknesses of common upper limit (Table A1). Pipe inside diameters are dependent on engine horsepower and type and determined from volume flux requirements for engine cooling. Kirk and Gordon (1952) report intake pipes of 14 inch (∼ 35 cm) inside diameter on British ocean weather ships while Saur (1963) notes pipe diameters varied between 4 and 20 inches (around 10 to 50 cm) across 12 US military vessels. Piip (1974) describes well thermometers inserted into engine intakes to at least 25 cm depth, so inside diameters were perhaps double this. Tabata (1978a) reports an engine intake pipe of 20 cm diameter on a Canadian research vessel. A typical inside diameter on a modern 100 000 tonne diesel tanker would be ∼ 25 cm. Intakes on steamships could have been larger still given that steam engines are closed cycle and so do not expel some of their waste heat through gaseous exhaust like diesel engines. To derive Fig. A3 we adopted a fixed pipe length of 20 m, above the upper end of inlet-thermometer distances reported in the literature (Table A2). Seawater specific heat capacity, thermal conductivity and dynamic viscosity were calculated using the Massachusetts Institute of Technology Thermophysical Properties of Seawater toolbox (http://web.mit.edu/seawater/), using a salinity of 35 psu.

Fig. 1. Map of the cruise transect across the central tropical Pacific. The blue line denotes the portion of the transect where both bucket and 3 m thermosalinograph temperatures ($T_{3m}$) were observed. The black line denotes the portion of the transect where bucket measurements were not taken. Locations of CTD casts are marked by red dots.

Fig. 3. Histograms of differences between near-simultaneous sea surface temperatures obtained with (a) wood and canvas buckets, (b) rubber and canvas buckets and (c) rubber and wood buckets. A value of 0.7 °C is excluded from (b), hence this subplot has one fewer total number of stations than (a) and (c).

Fig. 4. Eastward 19 m current velocity along the cruise transect between ∼ 17.5° S and 19° N as measured by acoustic Doppler current profiler. Dashed lines and associated text labels indicate current regimes identified in Table 1. "Mod" means moderate.

Fig. 5. Meridional temperature structure of the upper surface and near-surface along the cruise transect: (a) composite bucket SST and 3 m thermosalinograph temperature, (b) daily average composite bucket SST, 3 m thermosalinograph temperature, OSTIA foundation temperature and 15 m CTD temperature. The maximum and minimum values of composite bucket SST and 3 m temperature on each local day are denoted by the upper and lower bars (not plotted in current regimes with strong meridional temperature gradients). Current regimes are demarcated as in Fig. 4.

Fig. 7. Temperature structure of the upper 20 m in various current regimes along the cruise transect: (a) the weak and (b) moderate branches of the South Equatorial Current (SEC), (c) the South Equatorial Countercurrent (SECC), (d) the strong branch of the SEC and (e) the equatorial cold tongue. The blue lines are temperature profiles corresponding to individual CTD casts. Temperatures at 5, 10, 15 and 20 m are from CTD (indicated by the crosses) while those at 0.1 and 3 m are from composite bucket SST and thermosalinograph, respectively. The bucket measurements were obtained within 2 h and 15 km of each respective CTD cast. Bucket temperatures are not plotted for casts 11-13 since they are unlikely to be representative of conditions at the time of the respective CTD cast (they were obtained at 12:00 or 13:00 LT whereas the CTD casts were conducted between 10:00 and 11:00 LT). The remaining bucket temperatures will be in error by ∼ 0.1 °C or less due to such timing differences. All casts were taken between 9:00 and 12:00 LT, except CTD-1 which was taken around 15:30-16:00 LT. Cast numbers correspond to those on Fig. 1. The red and black lines characterise the daily extremes of the upper 3 m temperature profile on the local day of the corresponding CTD cast. They are respectively defined from maximum and minimum 3-hourly average 3 m temperatures, and corresponding 3-hourly average composite bucket temperatures. They are not plotted in the panels for the SECC and cold tongue, where diurnal cycles were masked by transit through strong meridional gradients.

Fig. 8. Scatter plots comparing upper 3 m temperature differences with (a) true wind speed at 10 m and (b) speed over ground of the Seamans. A linear least squares regression for plot (b) yields a sizeable negative gradient of −0.02, although with $r^2 = 0.06$. The vertical dashed line on (a) denotes a wind speed of 6 m s$^{-1}$. General thinking holds that the near-surface should be near-isothermal at higher wind speeds.

For a cylindrical pipe of inside diameter $D_i$, $A_c = \frac{\pi D_i^2}{4}$. Outside diameter $D_o$ is related to inside diameter through wall thickness, $x$, by $D_o = D_i + 2x$.
Fig. A2. Cross-section through the modelled intake pipe. An illustrative temperature profile is shown by the solid black lines connecting temperatures $T_1$, $T_2$, $T_3$ and $T_4$, with engine room air temperature, $T_1$, being the warmest.
Fig. A4. Pipe length required for intake seawater to warm by 0.2 °C given an engine room air temperature of 50 °C and flow velocity of 1 m s$^{-1}$.

Table 1. Average upper 3 m temperature differences and eastward surface velocities in various current regimes encountered along the cruise transect. The regimes exhibit distinct differences in surface current velocity and/or direction. Four currents were recognised along the transect: the South Equatorial Current (SEC), the South Equatorial Countercurrent (SECC), the North Equatorial Countercurrent (NECC) and the North Equatorial Current (NEC). Adjectives in regime names describe relative current strength in sub-branches of these currents.

Table A1. Intake pipe specifications used to generate Figs. A3 and A4.

Table A2. Inlet-thermometer pipe lengths reported in the literature.

Table A3. Fixed parameters of our seawater intake warming model including their value(s) for Figs. A3 and A4.