A new 3-D modelling method to extract subtransect dimensions from underwater videos

Abstract. Underwater video transects have become a common tool for quantitative analysis of the seafloor. However a major difficulty remains in the accurate determination of the area surveyed as underwater navigation can be unreliable and image scaling does not always compensate for distortions due to perspective and topography. Depending on the camera set-up and available instruments, different methods of surface measurement are applied, which make it difficult to compare data obtained by different vehicles. 3-D modelling of the seafloor based on 2-D video data and a reference scale can be used to compute subtransect dimensions. Focussing on the length of the subtransect, the data obtained from 3-D models created with the software PhotoModeler Scanner are compared with those determined from underwater acoustic positioning (ultra short baseline, USBL) and bottom tracking (Doppler velocity log, DVL). 3-D model building and scaling was successfully conducted on all three tested set-ups and the distortion of the reference scales due to substrate roughness was identified as the main source of imprecision. Acoustic positioning was generally inaccurate and bottom tracking unreliable on rough terrain. Subtransect lengths assessed with PhotoModeler were on average 20% longer than those derived from acoustic positioning due to the higher spatial resolution and the inclusion of slope. On a high relief wall bottom tracking and 3-D modelling yielded similar results. At present, 3-D modelling is the most powerful, albeit the most time-consuming, method for accurate determination of video subtransect dimensions.


Introduction
With the advantage of being non-destructive, underwater imagery has become a common scientific tool for quantitative studies of the seafloor (Solan et al., 2003). This is due to improvements in imaging technology (Kocak et al., 2008; Schettini and Corchs, 2010; Bonin et al., 2011) and the development of platforms such as sledges (Shortis et al., 2008; Jones et al., 2009), remotely operated vehicles (ROV) (Sedlazeck et al., 2009; Karpov et al., 2012; Lindsay et al., 2012; Stierhoff et al., 2012), autonomous underwater vehicles (AUV) (Dowdeswell et al., 2008) and manned submersibles (Chevaldonné and Jollivet, 1993; Tissot et al., 2007). Although different methods are available for underwater positioning and image scaling, practical considerations complicate the processing of the data in a quantitative way.
The main instruments on these vehicles are video and still cameras employed for both piloting and analysis. Their orientation plays a major role in data processing. In the past, the camera axis was set perpendicular to the substrate in order to reduce distortions in the images and ease scaling (Pilgrim et al., 2000). This strategy is still applied for estimation of sponge densities (Chu and Leys, 2010), determination of algal cover below ice (Ambrose et al., 2005), mapping of hydrothermal vents (Cuvelier et al., 2009) and mosaicking (Garcia et al., 2001; Jerosch et al., 2007). Vertical set-ups facilitate the calculation of area for quantitative outputs. Oblique cameras offer a more natural view, making identification and piloting easier (Jones et al., 2009) but scaling more challenging due to distortions resulting from perspective (Wakefield and Genin, 1987). The deployment of two cameras, one forward-looking and the other tilted
Published by Copernicus Publications on behalf of the European Geosciences Union.
The most widespread sampling strategy in the deployment of underwater cameras is the execution of line transects (e.g. Post et al., 2010; Karpov et al., 2012; Smith et al., 2012); however, surveying points regularly distributed on a grid may provide an alternative (Chu and Leys, 2010). The general attitude of the vehicle carrying the camera during a transect is a delicate issue as it can greatly complicate the postprocessing and hence increase the time invested in analysis (Jones et al., 2009). Usually the pilot tries to keep the distance to the substrate (Anderson and Yoklavich, 2007), the heading (Ambrose et al., 2005; Cuvelier et al., 2009) and the speed constant (Jones et al., 2006; Karpov et al., 2012).
Once images have been acquired, the area covered by the complete video transect, by subtransects or by single pictures (stills or extracted video frames) has to be determined in order to be able to assess quantitative data such as abundances and densities of organisms (Auster et al., 1989).
Usual methods for the scaling of single frames, appropriate for relatively flat habitats, rely on algorithms based on knowledge of the distance to the substrate and on the camera properties to estimate the size of the field of view (e.g. Jerosch et al., 2007; Guinan et al., 2009; Stierhoff et al., 2012), the use of parallel lasers as references (e.g. Pinkard et al., 2005; Baker et al., 2012b) or the overlay on the pictures of a perspective grid as described in Wakefield and Genin (1987) (e.g. Pilgrim et al., 2000; Pinkard et al., 2005; Smith et al., 2012).
When working with videos, especially with oblique cameras, the area surveyed can be calculated by multiplying the centre width of the frames, obtained by one of the scaling methods previously cited, by the length of the transect or the subtransect (Auster et al., 1989; Pinkard et al., 2005). This length might be derived from underwater navigation data (Auster et al., 1989) using an equal area projection in a geographic information system software to compute the distance travelled by the vehicle (Tissot et al., 2007; Karpov et al., 2006, 2012). The choice of the geographic coordinate system can greatly impact the results as an inadequate projection would lead to high distortions, especially in polar regions (Sievers and Bennat, 1989). Transect length might also be evaluated from the speed recorded by a Doppler velocity log (DVL) (Pinkard et al., 2005; Snyder, 2010; Stierhoff et al., 2012) or read directly from the DVL bottom track data (Kocak et al., 2004).
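The width-times-length calculation described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and averaging the centre widths of several scaled frames is an assumption made here, not a procedure prescribed by the cited studies.

```python
def surveyed_area(centre_widths_m, length_m):
    """Area (m^2) as mean frame centre width times (sub)transect length.

    Averaging the widths of several scaled frames is an assumption of
    this sketch; a single representative width could be used instead.
    """
    if not centre_widths_m:
        raise ValueError("at least one frame width is required")
    mean_width = sum(centre_widths_m) / len(centre_widths_m)
    return mean_width * length_m


def density(count, area_m2):
    """Organisms per square metre for a count made on the surveyed area."""
    return count / area_m2
```

For example, three frames with centre widths of 1.8, 2.0 and 2.2 m along a 10 m subtransect correspond to a surveyed area of about 20 m^2.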
Other means have been suggested for area determination, e.g. measuring distances between features on bathymetric charts (Karpov et al., 2006), flying over a known length of tether from a weight (Auster et al., 1989), using a weighted wheel bound to an odometer (Pollio, 1969), deploying a scale (Patterson et al., 2009), frames (Kocak et al., 2004; Amado-Filho et al., 2012) or using objects of known size as scaling references (Jones et al., 2006, 2009).
To sum up, the complexity of the scaling process depends on the camera system employed and the attitude sensors available on the vehicle: it is easier to scale vertical images with a constant field of view than to calculate the area surveyed by an oblique camera with variable tilt, altitude and speed (Pinkard et al., 2005).
Relief and substrate roughness can also be an issue as they may affect some instruments such as lasers (Karpov et al., 2006) and DVLs (Pinkard et al., 2005) and result in significant differences between the actual distance travelled and the track length computed from the navigation system (Barry and Baxter, 1993). In habitats with a rough small-scale topography, difficulties arise as complex 3-D structures are represented on 2-D images: not all visible surfaces are located at the same distance from the camera or viewed from the same angle, and hence they appear at different scales on the images. None of the previously cited scaling methods is able to account for this.
Nowadays, a plethora of underwater videos and pictures are available, from regions all around the globe (e.g. Arctic: Laudien and Orchard, 2012; Antarctic: Gutt and Starmans, 2001; tropics: Carleton and Done, 1995). They cover all depth ranges (e.g. photic zone: Parry et al., 2002; continental slope: Baker et al., 2012a; deep sea: Chevaldonné and Jollivet, 1993) but represent a very heterogeneous assemblage of video quality, camera orientation and methods used to calculate the area covered by the survey. This becomes problematic when spatial or temporal comparisons have to be made.
The solution proposed here was to create scaled 3-D models of the portion of substrate visible in underwater videos from which the dimensions of several subtransects could be derived. For this purpose PhotoModeler Scanner (EOS Systems) was used: a commercially available 3-D modelling software which triangulates the position of various points on an object or a surface from pictures representing different views of this object. The point cloud obtained can then be scaled by entering one or several known distances, named here scaling references, to allow measurements between any two points within the model (Ewins and Pilgrim, 1997). PhotoModeler was initially developed for land-based work. In an aquatic environment, turbidity and image distortions might impact the accuracy of the 3-D model (Ewins and Pilgrim, 1997). In addition, artificial lighting results in the centre of underwater images being brighter than the edges (Schettini and Corchs, 2010) so that differences in brightness could also disturb the process of 3-D reconstruction as the colours are distorted while the camera moves (Sedlazeck et al., 2009). Ewins and Pilgrim (1997) found the software suitable for underwater work. It has been successfully employed for morphometric analysis on corals (Bythell et al., 2001) and mapping of submarine archaeological sites (Green et al., 2002; Green and Gainsford, 2003). The advantage of this method is that it relies only on overlapping images and a scale and should thus be applicable to the majority of the underwater videos readily available. Furthermore, as 3-D information can be regained from 2-D images, this could be especially useful in habitats with a rough small-scale topography.
Here, we describe a method of subtransect length computation from 3-D models of the seafloor created with PhotoModeler from ROV videos. We also evaluate this technique on videos showing different qualities and orientation, using two scaling references on two types of substrate. Finally, we compare the subtransect lengths obtained via 3-D modelling with distances estimated from underwater navigation data and DVL bottom tracking.

Video material: sites and set-ups
Video material from three dives with different ROVs was used to evaluate the feasibility of 3-D modelling with PhotoModeler for subtransect length measurements. The key parameters of the sites and set-ups are summarized in Table 1.

Dive A
The video data for the first 3-D reconstruction originated from a dive at station PS69/724-1 (64°54.9′ S, 60°39.15′ W) during the expedition ANT-XXIII/8 of R/V Polarstern in January 2007 in the Larsen Ice Shelf area (Antarctic Peninsula). The substrate was relatively flat and composed of mud, sand and pebbles with depths varying from 146 to 190 m. The ROV "Cherokee" (Sub-Atlantic) owned by Marum, University of Bremen, Germany was deployed. It was equipped with a forward looking standard definition (SD: 720 × 576 px, progressive, 25 fps, 25 Mbps) video camera (Tritech Typhoon PAL), a still camera (Nikon Coolpix 995) and an additional overview camera (DSPL MultiSeacam color PAL), illuminated by three 500 W LEDs (ROS QLED III). Two parallel red lasers (ILEE LDA1000) pointing into the centre of the SD video provided a reference scale of 20 cm. Additional navigation sensors were available: a mechanical scanning sonar (Tritech super SeaKing), a pan and tilt unit, an altimeter (Tritech PA500) and a manipulator (Hydrolek, EH5) for sampling. The underwater position of the vehicle was not available. The video signal from the SD camera was recorded on mini-DV (.avi, DV (digital video)). During the entire dive, the pilot tried to keep the heading and distance to the seabed constant, following the ship's track.

Dive B
A second dive was carried out at approximately the same site as dive A in March 2011, at station PS77/253-1 (64°54.82′ S, 60°39.06′ W) during the R/V Polarstern ANT-XXVII/3 expedition. Depth varied from 143 to 167 m. A ROV (Sperre SubFighter 7500 DC) belonging to the Sven Lovén Centre for Marine Sciences, University of Gothenburg, Sweden was deployed with one forward looking high definition (HD: 1920 × 1080 px, interlaced, 50 fps, 50 Mbps) video camera (Sony FCBH11, lens: 5.1-51 mm, F1.8-F2.1), two standard video cameras for navigation and umbilical surveillance and one still camera (Canon Powershot G9). Two parallel red lasers (Deep Sea Systems) placed 5 cm apart were projected in the centre of the HD video for scaling. Lighting was ensured by two 200 W HMI (hydrargyrum medium arc-length iodide) lights (Sperre) and two 250 W halogen lights. The vehicle also carried a scanning sonar (Kongsberg Mesotech), a CTD (Conductivity-Temperature-Depth recorder; Saiv SD204) and a manipulator (Hydrolek EH5). Underwater position was determined via the ultra short baseline (USBL) system Posidonia (Ixsea) linked to the GPS system on-board R/V Polarstern. The USBL data (latitude, longitude and depth) were imported into the ROV data processing software OFOP (Ocean Floor Observation Protocol) (Huetten and Greinert, 2008) for real time display and recording of the vehicle position. All videos were relayed to the surface control room and the HD stream was saved to compact flash cards (.mov, .mpeg2) with a nanoFlash recorder (Convergent Design). The dive alternated between short (10 min) line transects where the pilot kept the heading, speed and altitude constant and periods where the vehicle remained immobile for sampling and small-scale observations.

Dive C
The third data set was recorded in February 2012 during the expedition Errina 2012 on M/V Explorador. The station Errina2012 GD (51°10.14′ S, 74°56.171′ W) was located in the steep-sloped Guadalupe Channel in Chilean Patagonia. The substrate was composed of stony walls alternating with slides of finer sediment resulting in a rough habitat topography marked by small-scale variations in slope angle and orientation down to 150 m. The ROV, a V8 Sii (Ocean Modules) customized for the Alfred Wegener Institute, Germany, carried two HD (1920 × 1080 px, interlaced, 60 fps, 50 Mbps) video cameras (Kongsberg oe14-502, lens: 5.1-51 mm, F1.8-F2.1): one oriented horizontally and the other tilted 30° downward for navigation and data analysis. A wide angle camera (Bowtech L3C-550) observed the rear to control the manipulator (Sub-Atlantic MK 1) and the tether. An echo sounder (Tritech Micron) was mounted onto the tilted camera to measure its distance to the substrate (Karpov et al., 2006). Light was provided by five LEDs (Bowtech LED-2400 aluminium): four in the front and one at the rear. An obstacle avoidance sonar (Tritech Micron) facilitated the navigation and a Doppler velocity log (RDI Explorer PA) orientated in the same direction as the tilted camera was used for bottom tracking and current measurements. Depth was obtained from the inertial measurement unit (IMU) and the CTD (SeaBird SBE19 plus). The USBL positioning system (Tritech MicroNav) was linked to a differential GPS (Geneq SX Blue II) and the position of the vehicle was plotted and recorded in the Seanet software (Tritech). Data from the DVL were displayed and registered in WinRiver II (RDI). The HD video streams were captured to compact flash cards (.mxf, .mpeg2) by a nanoFlash recorder (Convergent Design). The strategy adopted on this site was to exploit the ROV's 360° manoeuvrability and fly several short (estimated 15 m from the DVL bottom track) horizontal transects at given depths by moving sideways, thus keeping the tilted camera axis perpendicular to the channel's wall. The pitch was adapted to the slope and the speed, heading and distance to the substrate were kept as constant as possible. Nevertheless, navigation was difficult due to the rough habitat topography and the presence of obstacles (stones, overhangs) on the trajectory requiring careful adjustment of the vehicle.

Determination of subtransect length
Figure 1 gives an overview of the different steps necessary to obtain subtransect measurements from 3-D models, USBL underwater acoustic navigation and from DVL bottom tracking.

PhotoModeler
To create 3-D models with PhotoModeler Scanner, overlapping pictures along the transects and a scale are needed. Videos were trimmed to consistent sequences (.mpg, .mpeg2) of stable vehicle speed, heading, tilt and distance to the substrate with Freemake video converter. Free studio (DVD-VideoSoft) was then used to extract as .jpeg every tenth frame for dive A and B (Antarctic) and every twentieth frame for dive C (Chile, tilted camera). In order to minimize the disturbances due to artificial lighting, the edges of the pictures were cropped in XnView by up to 10 % vertically and horizontally. With these settings, any feature was seen from at least 8 angles as recommended in previous studies using PhotoModeler or similar software (Bythell et al., 2001; Cocito et al., 2003; Green and Gainsford, 2003; de Bruyn et al., 2009).
The frames obtained were imported into PhotoModeler Scanner and an automated "SmartPoints project" was run. During this processing, the software first automatically detects natural features in each picture and marks them as "SmartPoints" (Fig. 2). Based on its characteristics (position, shape, scale) each feature is then identified on consecutive pictures and its displacements followed up. From these movements, a programme routine reconstructs the relative position of the camera from which each picture was taken (Fig. 3). Finally, the relative 3-D position of each SmartPoint is solved, resulting in a 3-D point cloud (Fig. 4). While several of the 3-D reconstruction algorithms developed in the last decade have been published (Pizarro et al., 2004; Brandou et al., 2007; Sedlazeck et al., 2009; Beall et al., 2010), the algorithm running in PhotoModeler, a commercial software, is not publicly available.
For each video sequence, the processing was first run on an initial group of 50 consecutive frames. If the modelling was successful, more frames were added in groups of 10 and the model reprocessed until the software failed to construct a point cloud. The last successful model was then considered as a subtransect and a new model was started with the next 50 frames. For each subtransect, the times at which the first (t_start) and the last (t_end) frames included in the model were recorded were listed and the plausibility of the camera trajectory was checked in the corresponding video. Impossible camera positions (i.e. lying in the ground or too far from the others) and obviously badly positioned SmartPoints (i.e. deep in the sediment or floating far above the substrate) were removed manually. After this cleaning procedure, the 3-D models were scaled to obtain the absolute distances in meters between any two 3-D SmartPoints. For the Antarctic deployments (dives A and B) the frames (N ≥ 3 per subtransect) selected were those that showed the laser dots most clearly on a flat surface. The known distance between the laser points was used to calibrate the distance between the two nearest 3-D SmartPoints. For dive C in Chile, the distance between the camera and the central point in the image, known from the echo sounder, was entered as a scaling reference every 10 images (Fig. 5).
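The incremental procedure described above (start with 50 frames, add groups of 10 until reconstruction fails, then start a new model) can be sketched as a simple loop. The callback `can_model` is hypothetical and stands in for a manual PhotoModeler run. The text does not say what happens when an initial group of 50 frames cannot be modelled, so skipping that group is an assumption of this sketch.

```python
def build_subtransects(n_frames, can_model, initial=50, step=10):
    """Split frames 0..n_frames-1 into successfully modelled subtransects.

    can_model(start, end) is a hypothetical callback that returns True
    when a point cloud can be built from frames start..end-1.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    subtransects = []
    start = 0
    while start + initial <= n_frames:
        end = start + initial
        if not can_model(start, end):
            # Assumption: an initial group that fails is skipped entirely.
            start = end
            continue
        # Add frames in groups of `step` until the reconstruction fails.
        while end + step <= n_frames and can_model(start, end + step):
            end += step
        # The last successful model is kept as one subtransect.
        subtransects.append((start, end))
        start = end
    return subtransects
```

With a toy criterion such as "at most 70 frames can be modelled together", 200 frames are split into three subtransects.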
When several scaling references are entered for one model, PhotoModeler applies an affine transformation to best fit all values and recalculates the dimensions of the references. A comparison between the dimensions estimated by PhotoModeler after scaling and the known size of the references provides a measure of the scaling error, expressed in percentage of the measured length. It includes both the 3-D SmartPoint positioning error by PhotoModeler and the error made while measuring the scaling references (laser points or echo sounder).
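The idea of a scaling error can be illustrated with a deliberately simplified sketch. It fits a single least-squares scale factor to several references instead of the affine transformation applied by PhotoModeler, whose algorithm is not public, and it expresses the residuals relative to the known reference size. The function names, and the choice of the known size as the denominator, are assumptions of this sketch.

```python
def fit_scale(model_dists, known_dists):
    """Least-squares scale factor s minimising sum((s * d - k) ** 2)."""
    num = sum(d * k for d, k in zip(model_dists, known_dists))
    den = sum(d * d for d in model_dists)
    return num / den


def scaling_error_percent(model_dists, known_dists):
    """Mean absolute misfit of the scaled references, in % of their known size."""
    s = fit_scale(model_dists, known_dists)
    errors = [abs(s * d - k) / k * 100.0
              for d, k in zip(model_dists, known_dists)]
    return sum(errors) / len(errors)
```

Here `model_dists` would hold the unscaled model distances between the SmartPoints nearest to the laser dots and `known_dists` the true laser spacing, e.g. 0.20 m for dive A.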
Finally, the linear subtransect length (L_3Dl) was measured by considering the straight line between the central points in the first and last frames. The projected subtransect length (L_3Dp) was obtained by measuring segments linking the central points of frame n and frame n + 10, moving from the first to the last frame in the 3-D model and thus following the substrate small-scale topography (Fig. 6).
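The two length definitions can be illustrated with a minimal sketch. It assumes that the central point of each modelled frame is available as an (x, y, z) tuple in metres, in frame order. The function names are hypothetical and the measurements in this study were made interactively in PhotoModeler.

```python
import math


def linear_length(points):
    """Straight-line distance between the first and last central points."""
    return math.dist(points[0], points[-1])


def projected_length(points, step=1):
    """Sum of 3-D segments linking every `step`-th central point.

    The last point is always included so that the path ends where the
    linear measurement ends.
    """
    idx = list(range(0, len(points), step))
    if idx[-1] != len(points) - 1:
        idx.append(len(points) - 1)
    return sum(math.dist(points[a], points[b])
               for a, b in zip(idx, idx[1:]))
```

For three central points forming a 4 m high ridge, `[(0, 0, 0), (3, 0, 4), (6, 0, 0)]`, the linear length is 6 m whereas the projected length is 10 m.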

Underwater acoustic positioning
The geographic position of the ROV obtained from the Posidonia USBL system was imported into OFOP for processing.Erroneous locations were identified by eye and removed.
The track was then smoothed using a floating mean algorithm taking the 20 nearest neighbours into account and the spline function was used to rebuild the position for every second. The smoothed trajectory was plotted into the software ArcGIS (ESRI) as a single polyline. Based on t_start and t_end from the 3-D models, the geographic position of the ROV at the beginning and at the end of each PhotoModeler subtransect was identified and the smoothed USBL trajectory was extracted between those two positions. The extracted track was then projected to a metric system to compute the distance travelled during the subtransect (L_USBL). The Lambert azimuthal equal area projection centred on the site was used, an equivalent coordinate system recommended for length measurements in the Antarctic Digital Database manual.
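The smoothing and projection steps can be approximated outside OFOP and ArcGIS with the sketch below. It rests on several assumptions that are not taken from this study: a spherical Earth of radius 6371 km, a centred window of 21 positions to represent "the 20 nearest neighbours", and hypothetical function names. The spline resampling step is omitted.

```python
import math

EARTH_RADIUS_M = 6371000.0  # spherical approximation (assumption)


def moving_average(track, window=21):
    """Centred floating mean of (lat, lon) pairs; the window shrinks at the ends."""
    half = window // 2
    out = []
    for i in range(len(track)):
        chunk = track[max(0, i - half): i + half + 1]
        out.append((sum(p[0] for p in chunk) / len(chunk),
                    sum(p[1] for p in chunk) / len(chunk)))
    return out


def laea_project(lat_deg, lon_deg, lat0_deg, lon0_deg, radius=EARTH_RADIUS_M):
    """Spherical Lambert azimuthal equal area projection centred on (lat0, lon0)."""
    lat, lat0 = math.radians(lat_deg), math.radians(lat0_deg)
    dlon = math.radians(lon_deg - lon0_deg)
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    k = math.sqrt(2.0 / (1.0 + cos_c))
    x = radius * k * math.cos(lat) * math.sin(dlon)
    y = radius * k * (math.cos(lat0) * math.sin(lat)
                      - math.sin(lat0) * math.cos(lat) * math.cos(dlon))
    return x, y


def track_length_m(track, lat0_deg, lon0_deg):
    """Length in metres of a (lat, lon) polyline after projection."""
    xy = [laea_project(lat, lon, lat0_deg, lon0_deg) for lat, lon in track]
    return sum(math.dist(a, b) for a, b in zip(xy, xy[1:]))
```

Centring the projection on the site keeps distortion negligible over subtransects of a few metres, which is the reason given above for choosing this projection.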

Bottom tracking
A Doppler velocity log acoustically tracks the velocity vector of a ROV relative to the substrate and computes the distance travelled by the vehicle. The DVL data were extracted for each subtransect from the WinRiver software (RDI) using the same time windows (t_start to t_end) for which 3-D models were created from the videos. As the time interval between two DVL measurements was 3.5 s, a simple linear interpolation was applied to compute the data for every second and so calculate the distance travelled during each subtransect (L_DVL).
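The interpolation step can be sketched as follows, assuming that the DVL export provides a cumulative bottom-track distance at each ping time and that t_start and t_end are whole seconds. Both the data layout and the function names are assumptions of this sketch. They do not describe the WinRiver export format.

```python
def interpolate(times, values, t):
    """Linear interpolation of `values` at time t (times strictly increasing)."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the recorded time range")
    for i in range(1, len(times)):
        if t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return values[i - 1] + frac * (values[i] - values[i - 1])


def resample_every_second(times, values, t_start, t_end):
    """Values interpolated at every whole second from t_start to t_end."""
    return [interpolate(times, values, t) for t in range(t_start, t_end + 1)]


def dvl_length(times, cumulative_m, t_start, t_end):
    """Distance travelled between t_start and t_end from cumulative distance."""
    series = resample_every_second(times, cumulative_m, t_start, t_end)
    return series[-1] - series[0]
```

With pings every 3.5 s and a steady speed of 0.2 m per second, the window from 1 s to 9 s yields a length of 1.6 m.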

PhotoModeler
For dive A in the Antarctic with a standard definition camera, fifty-two (52) subtransects were successfully reconstructed in 3-D (Table 2) with a mean scaling error of 4.7 %. The mean linear subtransect length (L_3Dl) was 6.55 m from a total of 341 m modelled. The projected subtransect length (L_3Dp) was different from L_3Dl in only nine cases where a slight relief was observed along the subtransect. In those nine subtransects L_3Dp was longer than L_3Dl by a maximum of 3 %.
For dive B, located on the same site but with a high definition camera, seventy-one (71) subtransects were modelled (Table 2) and the scaling error was not significantly different from the one in dive A (Mann-Whitney rank sum test, P = 0.662). The linear subtransect length (L_3Dl) was, on average, 1.7 m longer than in dive A (Mann-Whitney rank sum test, P = 0.003) and the total length modelled almost twice as long. Twenty-three (23) subtransects presented a slight relief for which L_3Dp was measured as being longer than L_3Dl by an average of 3 %.
Out of the sixty (60) 3-D models created in Chile (dive C), only fifty-five (55) could be scaled (Table 2) with an average scaling error of 10 % of the length, more than twice as large as for dive A and B. The scaling error was positively correlated with the standard deviation of the distance to the substrate measured by the echo sounder during the subtransects (Pearson product moment correlation, correlation coefficient = 0.305, P = 0.024). The reconstructed trajectories were in general shorter for those horizontal flights along the wall than for the line transects in the Antarctic as modelling often failed when the vehicle was moving too abruptly or when the slope changed too quickly due to the rough substrate. L_3Dp was longer than L_3Dl by 13 %, on average, for all but two subtransects where they were equal.
For one single subtransect, it took 1.5-6 h to pre-process the videos and go through the various steps necessary to obtain lengths from 3-D models with PhotoModeler. On average, this represented a computing time of 11-26 min to measure one meter of transect (Table 3).
Ocean Sci., 9, 461-476, 2013, www.ocean-sci.net/9/461/2013/

Underwater acoustic positioning
The Posidonia underwater USBL positioning system yielded erratic results (Fig. 7), with consecutive positions sometimes up to 170 m apart. Removal of outliers and spline fitting the data allowed reasonable reconstructions of the vehicle's track. The mean distance between the OFOP smoothed trajectory and the raw Posidonia positions was 3.74 ± 13.91 m. The seventy-one (71) USBL subtransects corresponding to the 3-D models had an average length (L_USBL) of 6.45 ± 2.79 m and a total length of 458 m. For a complete dive, the time needed to compute the length of all subtransects was about 1.5 h, equivalent to a computing time of 12 s for one meter of transect (Table 3).

Bottom tracking
One third of the subtransects modelled in 3-D could not be measured by bottom tracking with the DVL because of missing data resulting from the ROV flying too close to the substrate (< 1.2 m) or at an off-angle relative to the slope. Only measurements with less than 20 % missing pings were included in the comparison, representing thirty-seven (37) subtransects with an average length (L_DVL) of 3.48 ± 1.72 m and a total length of 129 m.
Obtaining the length of one subtransect required a computing time between 11 and 37 min, at an average rate of one meter every 7 min (Table 3).

3-D versus acoustic positioning
For dive B, L_USBL was significantly different from L_3Dl (paired t test, P < 0.001). The linear subtransect length from PhotoModeler resulted in distances on average 20 ± 22 % longer than the acoustic navigation data. The methods agreement assessment strategy of Bland and Altman (1986) was applied by plotting the difference between the lengths obtained from 3-D modelling and acoustic positioning (L_3Dl − L_USBL) against the average between both methods ((L_3Dl + L_USBL) / 2) (Fig. 8). Despite large scatter, the difference tended to increase with increasing subtransect length (Pearson product moment correlation, correlation coefficient = 0.292, P = 0.013). Conducting the same tests with L_3Dp produced similar results.
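The quantities behind such an agreement plot can be computed with the short sketch below. The limits of agreement at ±1.96 standard deviations follow the usual Bland and Altman convention and are an assumption here, since they are not reported in this study. The function names are hypothetical.

```python
import math


def bland_altman(a, b):
    """Means, differences, bias and limits of agreement for paired lengths."""
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2.0 for x, y in zip(a, b)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    # Limits at +/- 1.96 SD: the usual convention, assumed here.
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)
```

Passing the L_3Dl values as `a` and the L_USBL values as `b`, and then correlating the returned differences with the returned means, corresponds to the test of whether the disagreement grows with subtransect length.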

3-D versus bottom tracking
No significant difference was detected between L_DVL and L_3Dl (paired t test, P = 0.982) but the test presented low power (0.05). As seen in Fig. 9, PhotoModeler linear subtransect lengths appeared comparable to DVL measurements with a mean difference around zero, yet showing a high scatter (standard deviation for L_DVL − L_3Dl was ± 22 %).
The comparison of L_3Dp and L_DVL identified a significant difference between the projected subtransect lengths in PhotoModeler and the DVL distances (Wilcoxon signed rank test, P < 0.001). As shown in Fig. 10, L_3Dp was clearly longer than L_DVL (mean difference = 14.85 ± 20.84 % of length) and the difference increased slightly with increasing distance (Pearson product moment correlation, correlation coefficient = 0.435, P = 0.007).

PhotoModeler
The surface of the substrate was successfully modelled for several video sequences of all three ROV dives with PhotoModeler Scanner. It was thus shown that the method of 3-D subtransect reconstruction aiming at distance measurements is applicable for both vertical and oblique camera orientations. As the scaling error for the models was not significantly different between dive A and B in the Antarctic and represented less than 5 % of the length, it seems that neither the video quality (standard or high definition) nor the length of the scaling references biased the accuracy of the 3-D models. Comparable quantitative data from dive A and B could be computed from PhotoModeler data as the surface of the subtransects was assessed using the same method. This was not possible with traditional methods of survey area determination as navigation was missing for dive A. For those two deployments, two parallel lasers placed at respective distances of 20 and 5 cm from each other were projected on a relatively flat bottom so that the scale remained mostly unaltered and constantly visible directly in the frames integrated into the model. The difference in the subtransect lengths between dives A and B can be explained by the higher altitude and speed with which the ROV was flown during dive A. These conditions made it more difficult for PhotoModeler to track the displacement of features.
In Chile (dive C), the use of the echo sounder as a scaling reference in a rough stony habitat yielded a scaling error twice as large as for dive A and B. This error increases when the scale itself is not properly measured. The echo sounder was sometimes disturbed by the presence of superimposed objects positioned at various distances in its field of view. Moreover, its data were recorded independently of the videos, and a delay of only 1 s in time synchronization could mean a great change in the distance to the substrate. The standard deviation of the distance to the substrate reflects the amount of change in the distance from the camera to the substrate, i.e. the roughness of the small-scale topography. It is thus not surprising that the scaling error was positively correlated to the variability of the distance to the substrate as a rougher topography would mean a higher measurement inaccuracy for the scale. Lasers would not have performed better than the echo sounder as they are often not visible in rough habitats and their projection becomes distorted (Karpov et al., 2006). Furthermore, piloting was influenced by the topography, leading to shorter subtransects modelled in Chile due to abrupt camera movements. Overall, the accuracy of the models presented in this study was acceptable and comparable to results obtained from perspective grids (Smith and Hamilton, 1983; Kocak et al., 2004). The distortion of the reference scales due to substrate roughness in Chile was identified as the main source of imprecision.
The performance of PhotoModeler for subtransect length measurements was tested. In a similar manner, the width could be determined from the 3-D models. After export of the point cloud to computer-aided design software even the area of the entire subtransect or of given surfaces could be directly computed (Bythell et al., 2001). In this study, modelling was carried out with frames extracted from videos but the software is also able to work with overlapping still pictures (Bythell et al., 2001; Green et al., 2002; de Bruyn et al., 2009). The processing requires identifiable features on the images and the sandy/rocky habitats in the Antarctic and Chile offered several such features. In contrast, the modelling of muddy substrate could be more difficult due to featureless or smooth surfaces (Green and Gainsford, 2003).
Calibrating the camera could improve the accuracy of the data computed from 3-D models (Ewins and Pilgrim, 1997; Bythell et al., 2001; Cocito et al., 2003; Green and Gainsford, 2003). However, calibration requires direct access to the camera, is usually performed in shallow water and can be affected by depth (Shortis et al., 2008). Reference targets of known size could also facilitate scaling (Green et al., 2002), especially in complex habitat structures where laser projections are distorted and echo sounders disturbed by the topography. Nonetheless, the deployment of objects along a transect is a long and difficult task in deep environments where sledges and ROVs are usually set. Deploying stereo cameras could also be a solution as the distance between both cameras could be used as scaling reference (Shortis et al., 2008; Althaus et al., 2009; Beall et al., 2010).

Underwater acoustic positioning
The Posidonia USBL system used during dive B revealed a positioning accuracy far worse than the 0.3 % of distance expected from the instrument specification (60 cm on a 200 m deep site). Underwater acoustic devices can be affected either by signal disturbances or sound velocity (Gamroth et al., 2011). Therefore, we assume that the system was disturbed by stratification (halocline), the presence of ice crystals in the water and a rather low sound velocity of about 1440 m s−1. Compared to distances between bathymetric features known from charts, USBL navigation proved more accurate in shallow water (< 30 m) (Karpov et al., 2006) but yielded similar instabilities in deeper water (> 600 m) (Althaus et al., 2009). Moreover, in Chile, the MicroNav USBL system almost completely failed to record the ROV position in a steep channel setting with vertical walls which act both as reflective surfaces and obstacles for the acoustic signal. Distance measurements from underwater trajectories can become more accurate by increasing the length of the subtransect (Barry and Baxter, 1993; Karpov et al., 2006). However, video transects cannot always be exploited in their entire length due to bad quality sequences. New technologies couple USBL data with DVL speed measurements for better navigation (Kinsey and Whitcomb, 2004; Kocak et al., 2004; Dolan et al., 2008). Likewise, long baseline can be employed for precise positioning (Parry et al., 2003) although it implies the deployment of an additional system on the site (Pilgrim et al., 2000).

Bottom tracking
A considerable amount of data from the DVL was not usable, probably because the vehicle flew too close to the substrate (Pinkard et al., 2005). Nonetheless, it has been shown that DVL bottom tracking can be as accurate as GPS positioning for a ship (Snyder, 2010) and more precise than USBL for an ROV (Pinkard et al., 2005). The performance of the DVL could be improved by coupling it to the attitude sensor of the ROV (Kinsey et al., 2006; Snyder, 2010). In this study, attitude-related inaccuracies were minimized by keeping the heading as constant as possible and by the ROV's automatic stabilization of depth, tilt and roll.

Comparison
When underwater acoustic navigation data are used to calculate the distance travelled, the trajectory is usually projected onto a flat surface, even with an appropriate coordinate system, unless a bathymetric model is integrated. This explains why the distances derived from USBL positioning were shorter than those computed from 3-D models (Barry and Baxter, 1993). Moreover, positioning inaccuracies in Posidonia and post-processing in OFOP influenced the USBL subtransect lengths. Smoothing, i.e. removing small-scale jitter and loops from the track, is equivalent to shortening the ROV track (ultimately to a straight line). In contrast, the 3-D method integrated the tiniest movements. The difference between both lengths is correlated to the distance measured, as the number of small deviations from the straight line increases on a longer path.
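The two effects described above, flat projection and smoothing, can be sketched in a few lines of Python. The track coordinates, the function names and the moving-average filter are hypothetical choices for illustration only; they do not reproduce the OFOP smoothing or the PhotoModeler computation.

```python
import math

def path_length(points):
    """Sum of straight-line distances between consecutive points (2-D or 3-D)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def moving_average(points, window=3):
    """Smooth a track with a centred moving average; the end points are kept fixed."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        if i < half or i >= len(points) - half:
            smoothed.append(points[i])
        else:
            chunk = points[i - half:i + half + 1]
            smoothed.append(tuple(sum(c) / len(chunk) for c in zip(*chunk)))
    return smoothed

# Hypothetical track in metres: lateral jitter in y, steady depth change in z
track_3d = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.3), (2.0, -0.2, 0.6),
            (3.0, 0.2, 0.9), (4.0, 0.0, 1.2)]
track_2d = [(x, y) for x, y, _ in track_3d]  # projection onto a flat surface

length_3d = path_length(track_3d)                            # slope included
length_2d = path_length(track_2d)                            # slope ignored
length_2d_smoothed = path_length(moving_average(track_2d))   # jitter removed
straight_line = math.dist(track_2d[0], track_2d[-1])

print(length_3d, length_2d, length_2d_smoothed, straight_line)
```

In this toy example the 3-D length is the longest, the flat projection is shorter, and smoothing brings the projected length close to the straight-line distance between the end points.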
Bottom tracking with a DVL uses the structure of the substrate to compute the straight line distance travelled by the vehicle but does not project the path on the topography. Hence, such data were comparable to the linear subtransect length obtained with PhotoModeler in Chile. However, these results must be considered carefully, due to the low power of the test. The high variability of the difference values between both methods suggests an underlying mismatch, probably due to the weaknesses of the DVL data.
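The comparison of two measurement methods by difference against mean (Bland and Altman) can be sketched as follows. The function name and the length values are hypothetical and are not data from this study; the 95 % limits of agreement use the usual approximation of 1.96 standard deviations around the mean difference.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return pair means, pair differences, bias and 95 % limits of agreement."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)                 # systematic offset between the methods
    spread = 1.96 * stdev(diffs)       # half-width of the limits of agreement
    return means, diffs, bias, (bias - spread, bias + spread)

# Hypothetical subtransect lengths in metres
l_dvl = [4.0, 5.5, 3.2, 6.1]   # bottom tracking
l_3dl = [4.3, 5.1, 3.6, 6.0]   # linear length from the 3-D model

means, diffs, bias, limits = bland_altman(l_dvl, l_3dl)
print(f"bias = {bias:.2f} m, limits of agreement = {limits[0]:.2f} to {limits[1]:.2f} m")
```

A bias close to zero combined with wide limits of agreement, as in this made-up sample, corresponds to the situation described above: no clear systematic offset, but a high variability of the differences.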
Substrate relief and roughness cannot be ignored as they have an ecological significance (e.g. Wilson et al., 2007; Gratwicke and Speight, 2005), but they make the analysis more challenging. PhotoModeler allows the user to take the small-scale topography into consideration, for instance by measuring a projected subtransect length. This measure was, as expected, longer than the linear subtransect length and the DVL bottom track every time the bottom presented some relief or 3-D structure. Moreover, the difference between the DVL measurements and the projected subtransect length obtained with PhotoModeler was correlated to the distance measured. As in acoustic positioning, this can also be explained by the amount of deviation from the straight line increasing on a longer path. However, the precision used for the projection must be standardized, as the subtransect length could in principle be extended without limit by increasing the resolution (Mandelbrot, 1967). The main advantage of 3-D modelling is that measurements can be performed on the actual substrate topography (Shortis et al., 2008), taking the slope and small-scale relief into account, and can be related directly to objects and surfaces visible in the videos. Used in combination with an endoscopic camera (Wunsch and Richter, 1998), PhotoModeler could allow the mapping of cryptic habitats such as cracks and crevices in coral reefs (Richter et al., 2001).
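The dependence of the projected length on resolution can be illustrated with a synthetic profile. The sketch below is a minimal Python example under stated assumptions: the seabed profile, its two relief scales and the sampling steps are invented for illustration and do not correspond to any PhotoModeler setting.

```python
import math

def profile_length(xs, zs, step=1):
    """Length of the polyline through every `step`-th point (last point always kept)."""
    idx = list(range(0, len(xs), step))
    if idx[-1] != len(xs) - 1:
        idx.append(len(xs) - 1)
    return sum(math.hypot(xs[j] - xs[i], zs[j] - zs[i])
               for i, j in zip(idx, idx[1:]))

# Hypothetical 10 m profile sampled every 5 cm, with relief at two scales:
# 0.30 m amplitude over 2 m and 0.05 m amplitude over 0.2 m
xs = [i * 0.05 for i in range(201)]
zs = [0.30 * math.sin(2 * math.pi * x / 2.0)
      + 0.05 * math.sin(2 * math.pi * x / 0.2) for x in xs]

linear = math.hypot(xs[-1] - xs[0], zs[-1] - zs[0])  # straight line, end to end
coarse = profile_length(xs, zs, step=40)   # one point every 2 m
medium = profile_length(xs, zs, step=10)   # one point every 0.5 m
fine = profile_length(xs, zs, step=1)      # one point every 5 cm

print(linear, coarse, medium, fine)
```

Each finer sampling passes through all points of the coarser one, so the measured length can only grow as the resolution increases. This is why a projected subtransect length is only comparable between studies when the projection step is reported and standardized.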
For the estimation of substrate area needed for density studies, 3-D modelling seems much more suitable than DVL and USBL data, especially in high-relief habitats. Nevertheless, the processing in PhotoModeler is extremely time consuming (see also Cocito et al., 2003). Obtaining subtransect lengths from 3-D models took us twice as long as from bottom tracking and lasted up to 130 times longer than distance measurements from the acoustic navigation system.
In terms of data representation, trajectories from an underwater positioning system can be directly mapped, which is an advantage. The movements over ground recorded by a DVL are displayed in real time, allowing distance measurements during the deployment. 3-D models give an insight into the structure of the substrate, a significant ecological factor. Besides, PhotoModeler offers an option for geo-referencing and export to other software for further processing.

Conclusion
In summary, 3-D modelling is a solution to compare quantitative data extracted from several underwater video transects. Applicable to a variety of set-ups, it is an alternative for computing subtransect dimensions when more traditional methods cannot be employed. For example, image scaling, underwater acoustic positioning or DVL bottom tracking may fail due to unsuitable camera set-ups, unavailability of instruments, inaccurate measurements and difficult environmental conditions such as high relief. One of the main advantages of 3-D reconstruction is that it relates directly to the surfaces and objects seen in the images. In the case of rough substrates, it is the first step to accurately measure areas considering the actual topography. Nevertheless, scaling the model is a sensitive issue, especially in habitats showing high structural complexity, and the accuracy of the measurements will greatly depend upon it. At present, the main disadvantage of 3-D processing in PhotoModeler is that the procedure is very time-consuming. Whether or not 3-D modelling should be used depends on the other methods applicable for determining the surface surveyed, the topography of the site, and on the goal and scale of the study.

Fig. 1. Workflow for the determination of subtransect length through 3-D modelling, USBL navigation and DVL bottom tracking.

Fig. 2. SmartPoints (green dots) in PhotoModeler: automatic detection of natural features in a sample frame extracted from dive B.

Fig. 3. SmartPoints matching and camera position reconstruction in PhotoModeler. (A1-4) Position of 4 SmartPoints identifying the same features on 4 consecutive frames. (B) Displacements of the 4 SmartPoints along the frames. (C) Reconstructed relative positions of the camera for the 4 previous frames within the subtransect.

Fig. 4. 3-D SmartPoints cloud: the sponges visible in Figs. 2 and 3 are outlined. The same points are marked as in Fig. 3 and the relative orientation of the camera position A1 is shown in red. (A) View from the same angle as from camera position A1. (B) View from the same direction as camera position A1 but at bottom level. (C) View from the left side of camera position A1 at bottom level.

Fig. 5. Scaling of the 3-D models. (A) Dives A and B (Antarctic): the distance between the laser points (5 cm, red line) is used to calculate the distance between the two nearest SmartPoints (4.6 cm, black line). (B) Dive C (Chile): the distance (1.9 m, black line) measured by the echo sounder between the camera and the central point (red dot) in the first frame (not represented here) is employed as scaling reference for the 3-D model (green dots, grey lines for perspective).

Fig. 6. 3-D models and subtransect length. (A) Example of a reconstructed ROV subtransect in the Antarctic seen in lateral view. (B) Example of a reconstructed ROV subtransect in Chile displayed in top-front view and showing the complex topography. (1) 3-D points cloud and camera positions. (2) 3-D points cloud, linear subtransect length (L_3Dl, red line) and projected subtransect length (L_3Dp, blue segments).
Fig. 7. ROV track during dive B (Lambert azimuthal equal-area projection). (A) Overview showing the extent (red rectangle) of panel B. (B) High-resolution detail highlighting the distance between the raw ROV positions from Posidonia and the OFOP smoothed trajectory.

Fig. 8. Bland and Altman plot of difference against mean for the subtransect lengths measured from 3-D models (L_3Dl) and from underwater navigation (L_USBL).

Fig. 9. Bland and Altman plot of difference against mean for the subtransect length measured from bottom tracking (L_DVL) and the linear subtransect length in PhotoModeler (L_3Dl).

Fig. 10. Bland and Altman plot of difference against mean for the projected subtransect length in PhotoModeler (L_3Dp) and the subtransect length measured from bottom tracking (L_DVL).

Table 1. Main parameters of the sites and set-ups for the three ROV dives.

Table 2. Subtransect lengths computed from the 3-D models for the three dives.

Table 3. Computing time necessary to obtain subtransect length with the three different methods.