Molecular biology techniques and applications for ocean sensing

J. P. Zehr, I. Hewson, and P. H. Moisander
Department of Ocean Sciences, University of California Santa Cruz, 1156 High Street E&MS D446, Santa Cruz, CA 95064, USA
Received: 23 September 2008 – Accepted: 2 October 2008 – Published: 27 November 2008
Correspondence to: J. P. Zehr (zehrj@es.ucsc.edu)
Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
The biology of the oceans is recorded in the genetic material of organisms and viruses. The nucleotide sequences of chromosomes include genes that encode biological molecules such as proteins and ribosomes, as well as noncoding regions and genetic elements such as mobile elements, repeated DNA and viral genomes and fragments. The function of organisms is determined by the expression of genes into ribonucleic acids, including messenger RNA (mRNA), which is translated into proteins by the ribosome. With methods developed over the past two decades, it is possible to characterize organisms and microorganisms on the basis of their molecular features, including the sequence of nucleotides composing the chromosome of an individual organism, even from a single cell.
Although there are methods for characterizing deoxyribonucleic and ribonucleic acids (DNA and RNA, respectively), proteins and other biological structures, in general "molecular techniques" are considered to be those that are used to characterize the sequence or molecular structure of nucleic acids of organisms. Two common uses of molecular approaches are to identify organisms and to examine the activity of organisms by assaying gene expression (gene transcription) (Zehr and Hiorns, 1998).
DNA sequences evolve by accumulating mutations induced by natural processes, such as UV radiation (Meador et al., 2009). Some of these mutations are selected through evolution because the resulting changes in protein sequences confer an ecological advantage by increasing ecological fitness. Some gene sequences accumulate fewer mutations than others, since the activity of the proteins or structure of the rRNA has specific structural requirements that are sensitive to substitutions of key amino acids or ribonucleotides. These genes are called conserved genes. Accumulated mutations in conserved genes can be used as taxon-specific markers and to compare the evolution of organisms using phylogenetic analysis.
In the marine environment, molecular biological methods have been used to study virtually all trophic levels (Cooksey, 1998; Zehr and Voytek, 1999). In the larger expanses of the ocean, and in the context of oceanic sensors and observing networks, molecular biological approaches provide methods for detecting and characterizing the key players in the biogeochemistry of the ocean: the planktonic microbiota. Molecular methods are particularly useful for studying microbial assemblages in the environment, since most environmental microorganisms have not been cultivated (Azam, 1998). The composition and complexity of microbial assemblages can be identified and compared based on differences in nucleic acid composition and sequence. Analysis of gene transcripts (i.e. mRNAs) can be used to determine whether microorganisms are active, and how they are responding to the environment (Zehr and Hiorns, 1998). Here we review examples of the breadth of molecular techniques currently employed in characterizing marine microorganisms in the ocean.
There are a variety of molecular biological methods that address different types of questions (Fig. 1). Some methods are useful for characterizing overall community assemblage diversity, while others can be used to determine the abundance of microorganisms, or specific enzymatic activities (Fig. 1). The goal of this overview is to acquaint the nonspecialist with the breadth of molecular biology techniques, in order to provide the scope and vision for how molecular biological techniques may ultimately be ported to in situ ocean-sensing technology.

Polymerase chain reaction techniques
The development of the polymerase chain reaction (PCR) (Mullis et al., 1986) made it possible to amplify specific genes of interest from very small DNA samples, which facilitated the study of cultivated microorganisms and mixed microbial assemblages. The polymerase chain reaction is an enzymatic method, based on DNA synthesis reactions that enable the geometric amplification of DNA targets using repeated steps of synthesizing DNA. Early applications of PCR targeted specific genes including the universally conserved small subunit ribosomal RNAs (rRNAs) (Hofle, 1988; Olsen et al., 1986; Pace et al., 1986) in mixed assemblages (Giovannoni et al., 1990) and conserved proteins such as nitrogenase (Zehr and McReynolds, 1989). Subsequently, modifications of the DNA amplification approach have been developed to detect gene expression (mRNA) by reverse-transcription PCR (RT-PCR), and quantitative methods for both DNA and RNA (quantitative PCR or quantitative RT-PCR).
PCR has provided a useful, sensitive and informative approach for studying natural microbial communities. However, there are limitations in the application of PCR, including bias introduced by the nonlinear amplification (Polz and Cavanaugh, 1998), which is affected by base pair content (particularly guanosine and cytosine vs. adenosine and thymidine content) (Suzuki et al., 1998), the efficiency with which PCR primers bind to their complementary sequence (targets) in the PCR reaction (Suzuki and Giovannoni, 1996), and errors caused by the DNA polymerase enzyme (Barnes, 1992). Additional errors can be introduced due to differential nucleic acid extraction efficiency (Polz et al., 1999) and during DNA sequencing (Acinas et al., 2004).
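The geometric amplification described above can be illustrated with a short calculation. The following is a minimal sketch, not a model of any particular assay: the function name `pcr_copies` and the per-cycle `efficiency` parameter (1.0 meaning perfect doubling) are assumptions introduced here for illustration, and real reactions plateau as reagents are consumed.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a given number of PCR cycles.

    Assumes every cycle multiplies the target by (1 + efficiency), where
    efficiency = 1.0 is perfect doubling. The plateau phase of real
    reactions is deliberately ignored in this sketch.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Hypothetical example: 10 template copies, 30 cycles.
ideal = pcr_copies(10, 30)           # perfect doubling each cycle
reduced = pcr_copies(10, 30, 0.9)    # 90 % efficiency per cycle
```

Under these assumptions, a modest drop in per-cycle efficiency leaves several-fold fewer copies after 30 cycles, which is one way to see why differences in primer binding efficiency between templates can bias the composition of the amplified product.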
Typically, PCR is used to amplify rRNA or specific "functional" genes using gene-specific primers. The amplified material is cloned into a bacterial host (Escherichia coli) using a plasmid vector, and large numbers of host E. coli colonies (called a library) are grown, from which individual plasmids can be extracted. The individual DNA molecules can then be sequenced from individual recombinant plasmids by Sanger sequencing. This technique replaced the direct cloning and sequencing approach (Schmidt et al., 1991). The PCR is dependent on designing PCR primers, which are short single-stranded DNA sequences or oligonucleotides that hybridize to complementary strands of DNA and "prime" the DNA synthesis reaction. PCR primers are designed based upon comparison of available gene sequences from representative organisms, and are usually oligonucleotides of 16-25 bases, which are similar or the same among representative sequences. Since codons for amino acids (three-nucleotide sequences) may vary in the third nucleotide base yet code for the same amino acid (known as "degeneracy" or "redundancy"), PCR primers can be designed using mixtures of oligonucleotides or bases that hydrogen bond to multiple bases, to create "universal" primers for conserved amino acid sequences that will amplify the same genes from many different microorganisms. This approach has been used for developing PCR primers for "functional" genes, or genes that encode proteins or enzymes that perform an ecologically important function.
PCR has been used to characterize the composition of marine microbial communities, often by targeting rRNA genes. Cloned PCR-amplified 16S rRNA genes have been used to detect the presence of uncultivated Bacteria (Giovannoni et al., 1990) and Archaea (Fuhrman et al., 1992; DeLong, 1992). Clone libraries have revealed that prokaryotic assemblages in seawater are dominated by only a handful of different major groups; however, there is a long list of rarer taxa present within assemblages. Amongst dominant taxa, the SAR-11 clade of α-Proteobacteria (Giovannoni, 2004; Giovannoni et al., 1990; Giovannoni and Rappe, 2000) and the SAR-202 clade of Chloroflexi (Morris et al., 2004) comprise the dominant fraction of bacterial rRNA clones in surface and deep waters, respectively, which, along with the dominance of Archaea in deeper waters (Fuhrman et al., 1993), has been confirmed by hybridization techniques (see next section) (Morris et al., 2002, 2004; Karner et al., 2001). Estimating the richness and diversity of assemblages based upon clone library techniques is problematic, since the long tail of the species distribution curve dictates use of nonparametric population sampling techniques, most commonly the Chao1 estimator (Chao, 1987). Because of the historically prohibitive cost of sequencing, creation of sequence libraries large enough to ensure appropriate coverage of microbial assemblages was not generally possible.
The PCR, because of its power in amplifying small amounts of DNA into quantities that can be processed by a variety of techniques, has become a method that underlies many other methods, from RT-PCR to assay mRNA to analyzing thousands of genes on DNA microarrays. It is integrated in many of the techniques described below, particularly those used for diversity analyses.
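The idea of a degenerate primer as a mixture of oligonucleotides can be made concrete with the standard IUPAC nucleotide ambiguity codes. The sketch below counts and enumerates the individual sequences represented by one degenerate primer. The helper names (`degeneracy`, `expand_primer`) and the example primer are hypothetical and chosen only for illustration; the example is not a published primer.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in a degenerate primer mixture."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_primer(primer: str) -> list[str]:
    """List every non-degenerate sequence encoded by the primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(seq) for seq in product(*choices)]


# Hypothetical primer spanning codons for Cys-Asp-Pro (TGY GAY CCN):
# the third codon positions are written with ambiguity codes.
example_primer = "TGYGAYCCN"
n_variants = degeneracy(example_primer)
variants = expand_primer(example_primer)
```

In this illustration the degeneracy grows multiplicatively with each ambiguous position, which is why primer designers generally try to target regions whose amino acids have few synonymous codons.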
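As an illustration of nonparametric richness estimation from a clone library, the following sketch implements the commonly used Chao1 form, which extrapolates from the number of taxa seen exactly once (singletons) and exactly twice (doubletons). The function name and the example counts are hypothetical, and the fallback to the bias-corrected form when no doubletons are present is an assumption of this sketch rather than something specified in the text.

```python
from collections.abc import Iterable


def chao1(clone_counts: Iterable[int]) -> float:
    """Chao1 estimate of total richness from per-taxon clone counts.

    Classic form:        S_obs + F1^2 / (2 * F2)
    Bias-corrected form: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    used here only when there are no doubletons (F2 == 0).
    F1 and F2 are the numbers of taxa observed once and twice.
    """
    counts = [c for c in clone_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


# Hypothetical library: two abundant taxa and a tail of rare ones.
library = [10, 5, 2, 2, 1, 1, 1]
estimated_richness = chao1(library)
```

The more singletons a library contains relative to doubletons, the larger the estimated number of unseen taxa, which reflects the long tail of rare taxa noted above.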

Enumeration of specific microbial targets
Enumeration of individual microbial populations within assemblages has come about through two methods, fluorescence in situ hybridization (FISH) and, more recently, quantitative PCR. FISH was developed in the early 1990s as a technique for microscopically identifying different groups of microorganisms within assemblages (Amann et al., 1990, 1995). This technique involves permeabilizing prokaryotic cells and hybridizing a fluorescently labeled DNA probe to rRNA or mRNA within the bacterial cells (Daims et al., 2005). The cells are then visualized and can be enumerated with an epifluorescence microscope. There are some biases involved in the FISH approach, notably that not all cells are equally permeable to the hybridization reagents, and that counts of less abundant components of microbial assemblages may not be of high statistical rigor since the absolute number of cells counted is low. The FISH technique has been further expanded to include steps allowing observation of the uptake of specific radioisotopically labeled compounds (STAR- or MICRO-FISH) (Ouverney and Fuhrman, 1999, 2000; Lee et al., 1999).
Fluorescent substrates and labeled probes have been used to develop methods where the PCR reaction can be monitored in real time, and the PCR product accumulation measured directly by fluorescence (Smith, 2005). In marine applications, quantitative PCR can be used to estimate the abundance of genes per unit volume of seawater. Quantitative PCR (qPCR), unlike sequence libraries and other methods of community analysis (see fingerprinting methods below), is gene-specific and often taxon-specific, and quantifies the absolute abundance of an organism (specifically its genes). One method used in qPCR reactions stains the double-stranded DNA product with the DNA-binding dye SYBR Green I. The primary drawback of this method is that spurious products and primer dimers can be formed, leading to overestimates of the abundance of the target. The kinetics of "melt curves", where the amplified products at the end of the qPCR reaction are dissociated slowly and fluorescence changes are measured at each degree of dissociation, can be used to distinguish spurious products from the target. More recently, two methods have been published, known as 5'-nuclease (Heid et al., 1996) and hairpin probe hybridization (Tyagi and Kramer, 1996), where the detection of product is based upon hybridization of a probe labeled with fluorophore and quencher to amplification products during each cycle. In the 5'-nuclease method, the nuclease activity of the DNA polymerase degrades the probe and liberates the fluorescent dye from the quencher. The advantage of these chemistries is the very high specificity of the product, since the total matching DNA sequence must be ca. 70-100 bp and match both probes and primers. In the marine setting, qPCR has been used to estimate the abundance of different phylogenetic groups of organisms based upon rRNA amplification, and of different groups of organisms with the ability to conduct particular functions, i.e. amplification of functional genes. One of the first applications of qPCR for detection of phylogenetic groups of marine microorganisms was that of Suzuki et al. (2001) in Monterey Bay, which provided estimates of the major components of the bacterioplankton community in a coastal environment, without DNA sequencing. In addition, qPCR has been used to estimate the abundance of pathogenic viruses (notably enteric viruses) in seawater samples (Fuhrman et al., 2005; Choi and Jiang, 2005), and has provided a method of detection several thousandfold more sensitive than cultivation-based techniques. Quantitative PCR has also been used for functional genes, including different clades of nitrogen-fixing bacteria targeting dinitrogenase reductase (nifH) (Short et al., 2004), ribulose bisphosphate carboxylase-oxidase (RuBisCO) in phytoplankton (John et al., 2007) and the photosynthetic reaction center of photoheterotrophs, pufM (Schwalbach et al., 2005). These studies have provided important estimates of functional group abundance in natural samples, since enumeration of functional groups of organisms was previously only possible by using microscopy or cultivation-based techniques, both of which have biases.
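One common way to turn real-time fluorescence data into gene abundances is a standard curve, in which the threshold cycle (Ct) of known standards is regressed against the logarithm of their copy number. The text above does not specify a quantification procedure, so the following is a sketch under that assumption; the function names and the standard values are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in one reaction."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series of a plasmid standard.
standard_copies = [1e2, 1e3, 1e4, 1e5]
standard_ct = [31.4, 28.1, 24.7, 21.4]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
efficiency = amplification_efficiency(slope)
sample_copies = copies_from_ct(26.0, slope, intercept)  # unknown sample
```

Converting the per-reaction estimate into gene copies per unit volume of seawater additionally requires the volume filtered, the extraction elution volume and the template volume per reaction, none of which are modeled here.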

Reverse-transcriptase-polymerase chain reaction (RT-PCR)
While DNA-based methods elucidate the genomic potential to carry out particular functions within microbial assemblages, they do not necessarily indicate the active process. The presence of messenger RNA (mRNA) is evidence that genes of interest have been recently transcribed, an essential step towards protein synthesis and an active cellular process. Reverse transcriptase (RT-) PCR, which involves the conversion of template mRNA into complementary DNA (cDNA) that is then used as template material for the PCR, has allowed examination of active processes in environmental samples (Nogales, 2005). RT-PCR and quantitative RT-PCR (qRT-PCR) have been used to study the presence and expression levels (i.e. abundances of genes and transcripts) of active microorganisms and those conducting specific functions. While qRT-PCR allows enumeration of transcript copies in environmental samples, it does not allow enumeration of cells, since the number of RNA transcripts per gene and per cell is variable.
RT-PCR and qRT-PCR have been used to study the expression of functional genes, including RuBisCO (Wawrik et al., 2002, 2007) and nitrogenase. The term "functional genes" is often used to describe genes whose enzymes catalyze a biogeochemical reaction such as carbon fixation, ammonia oxidation or nitrogen fixation. The expression of these genes (i.e. the reading of the gene sequence into a messenger RNA molecule that is used to translate proteins) indicates that the organisms are potentially active in the process. Microorganisms often have genes in their genomes that are not used, and thus using RT-PCR to determine that the genes are expressed into mRNA can provide information on which organisms are actually involved in observed biogeochemical transformations. The study of nitrogenase (nifH) gene expression provides a good example of the kind of information that RT-PCR can provide. The first studies of nifH gene expression in marine bacterioplankton used clone libraries of reverse-transcribed and PCR-amplified nifH genes to identify active N2-fixing microorganisms (Zehr et al., 2001; Falcón et al., 2002; Church et al., 2005; Hewson et al., 2007b). In Chesapeake Bay bacterioplankton, it was found that very few of the nifH gene types were expressed (Jenkins et al., 2004; Short and Zehr, 2007). Study of the variability in gene expression of different clades of nitrogen-fixing microorganisms over diel cycles has revealed gene expression patterns that are consistent with measurements of nitrogen fixation in the cyanobacterium Trichodesmium, and light- or dark-driven patterns of gene expression for other cultivated and uncultivated groups of N2-fixing microorganisms (Church et al., 2005; Zehr et al., 2007).
This type of approach can be applied in a number of ways, for example in detecting nitrogen limitation by ntcA expression (Lindell et al., 1998) or phosphorus stress by phosphorus transporter (pstS) gene expression (Dyhrman and Haley, 2006). Such targets may be particularly useful in developing probes for interrogating natural populations on remote instrumentation (see below).

"Fingerprinting" methods for diversity studies
Whole community fingerprinting techniques have been used to examine the richness, diversity, and relative composition of entire assemblages in the marine environment. Similar to sequence library-based approaches, fingerprinting tools provide information on the presence and relative composition of assemblages, where rare components of assemblages (in some cases representing <1% of total template material; Hewson and Fuhrman, 2004b) are detected. While sequence library-based approaches may require thousands of clones to be sequenced to obtain similar resolution, molecular fingerprinting techniques allow rapid and cost-effective estimation of assemblage composition based upon the presence of universal or functional genes. The primary fingerprinting techniques include terminal restriction fragment length polymorphism (TRFLP) (Avaniss-Aghajani et al., 1994), denaturing gradient gel electrophoresis (DGGE) (Muyzer et al., 1993), automated rRNA intergenic spacer analysis (ARISA) (Fisher and Triplett, 1999), 16S rRNA length heterogeneity PCR (LH-PCR) (Suzuki et al., 1998) and single stranded DNA conformational polymorphism (SSCP) (Lee et al., 1996). Direct sequencing of DGGE fragments can allow identification of different fragments, while fragments in TRFLP, ARISA, and LH-PCR, and perhaps SSCP, may be used to identify taxa within fingerprints (Fig. 2).
The TRFLP approach utilizes sequence heterogeneity to separate different taxa within assemblage fingerprints. A conserved gene (which may be rRNA or mRNA) is amplified using PCR with one or two primers carrying a fluorescent tag. The amplicons are digested with one or more restriction enzymes (which recognize a particular nucleotide sequence, hence the position of the restriction site varies with the sequence of the amplicon) and then separated electrophoretically. This technique has been used in the study of entire marine bacterial assemblages using 16S rRNA as a target (Moeseneder et al., 2001; Morris et al., 2005), as well as functional groups such as denitrifying bacteria (Braker et al., 2001), methanogens (Leuders and Friedrich, 2003) and nitrogen-fixers (Hewson and Fuhrman, 2006a). The advantages of this technique include the ability to predict the restriction site, and hence restriction fragment length, from public databases and in silico restriction digests. The disadvantages include a very restricted number of restriction digest locations in assemblages and hence limited phylogenetic resolution of assemblages.
DGGE relies upon the electrophoretic separation of PCR amplicons based upon their denaturing characteristics, which are determined by their DNA sequence (and more specifically G+C content). In DGGE, PCR is used to amplify targets within mixed assemblages of rRNA or mRNA, where one PCR primer contains a ∼50 bp region of G+C, which does not denature under the most denaturing of conditions. The PCR products are then separated on a denaturant gradient gel (typically acrylamide). The amplicons migrate to the position at which the end of the amplicon is completely denatured (but the G+C region on the primer remains double-stranded) and thus form a band on the gel. The position at which the amplicons form bands is determined by the sequence of the amplicon (and hence the denaturing behavior of the sequence). The major advantage of this technique is that the bands on the gel may be excised directly and sequenced using the original PCR primer (i.e. the DNA in the band does not need to be cloned). The disadvantage is that it is very difficult to standardize denaturing conditions, and hence fingerprints in each lane may not be comparable between gels.
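The in silico restriction digest mentioned above can be sketched in a few lines: given an amplicon read from its fluorescently labeled end, find the first recognition site and report the length of the labeled terminal fragment. The function name, the toy sequences and the choice of recognition site and cut offset are assumptions made for illustration; real analyses would draw enzyme data from a curated source and handle ambiguous bases.

```python
def terminal_fragment_length(amplicon: str, site: str, cut_offset: int) -> int:
    """Predicted length of the labeled terminal restriction fragment.

    amplicon   : sequence written 5'->3' starting at the labeled primer
    site       : recognition sequence of the enzyme (e.g. "CCGG")
    cut_offset : number of bases of the site left on the 5' fragment
    Returns the full amplicon length if the enzyme does not cut.
    """
    position = amplicon.upper().find(site.upper())
    if position == -1:
        return len(amplicon)
    return position + cut_offset


# Toy amplicons from two hypothetical taxa, digested with an enzyme
# that recognizes CCGG and cuts after the first C (offset 1).
taxon_a = "AAAACCGGTTTT"
taxon_b = "AAAATTTTGGCCGGAA"
length_a = terminal_fragment_length(taxon_a, "CCGG", 1)
length_b = terminal_fragment_length(taxon_b, "CCGG", 1)
```

Taxa whose first restriction site falls at the same position yield the same terminal fragment length and cannot be told apart with that enzyme, which is the limited resolution referred to above.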
ARISA relies upon the length difference in the intergenic region between rRNAs (16S rRNA to 23S rRNA) to separate different components within assemblage fingerprints. As in TRFLP, PCR is performed using primers flanking the ITS region, where one primer is fluorescently labeled. The amplified ITS regions are then separated electrophoretically and each ITS length corresponds to a taxonomic unit (analogous but not equal to a taxon). One disadvantage of the technique is that very few genomes or ITS regions have been sequenced, hence it is difficult to assign a particular ITS length to a taxon in fingerprints. However, creation of ITS clone libraries from the same samples, where identification is based on the 16S rRNA sequence flanking the ITS, has allowed putative identification of ARISA peaks (Brown et al., 2005). A further disadvantage of ARISA is that it can only be applied to rRNAs at present (not functional genes). ARISA has been used to study bacterial assemblage biogeography in sediments (Hewson et al., 2003, 2007a; Luna et al., 2002) and the water column (Hewson and Fuhrman, 2004a, 2006c; Danovaro and Pusceddu, 2007), where comparisons between assemblage fingerprints have yielded information on the spatial heterogeneity of open ocean bacterioplankton assemblages in surface (Hewson et al., 2006a) and meso- to bathypelagic (Hewson et al., 2006b) waters. ARISA has also been used to gauge the impacts of viral pressure upon bacterioplankton assemblages (Hewson et al., 2006c; Hewson and Fuhrman, 2006c; Schwalbach et al., 2004).
LH-PCR examines the length heterogeneity in 16S rRNA PCR amplicons. Because the 16S rRNA is slightly variable in length over hypervariable regions (ca. 50 bp), the presence of different taxa can be examined in fingerprints of rRNA length. This approach was applied in surface waters and revealed a distribution of phylogenetic groups similar to that found in 16S rRNA sequence libraries (Suzuki et al., 1998).
SSCP relies upon the sequence-driven conformational properties of single stranded 16S rRNA PCR amplicons. These properties result in different migration patterns during gel electrophoresis, and hence organisms with similar sequences comigrate in gels. This approach has been used most recently in the study of the sea surface microlayer (Agogue et al., 2005) and Mediterranean plankton (Ghiglione et al., 2005).
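Length-based fingerprints such as those from ARISA or LH-PCR are often compared between samples with a similarity index. The text does not state which index the cited studies used, so the sketch below assumes a simple presence/absence Jaccard index on binned fragment lengths; the function name and the example fingerprints are hypothetical.

```python
def jaccard_similarity(fingerprint_a, fingerprint_b) -> float:
    """Jaccard index between two fingerprints given as fragment lengths.

    Each fingerprint is treated as a set of binned fragment lengths (bp);
    peak heights (relative abundances) are ignored in this sketch.
    """
    a, b = set(fingerprint_a), set(fingerprint_b)
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical binned fragment lengths (bp) from two stations.
station_1 = [400, 520, 610]
station_2 = [400, 610, 700]
similarity = jaccard_similarity(station_1, station_2)
```

An abundance-weighted index would make use of peak heights as well, at the cost of being more sensitive to amplification bias.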

Microarrays
During the past decade microarray approaches have become increasingly common in microbial ecology. In general, microarrays are arrays of spotted oligonucleotide probes, often on glass slides, that can be hybridized to DNA or RNA to identify or characterize genes. Microarrays can utilize a variety of platforms and assay chemistries, including arrays spotted on glass slides or membranes, electronic or fiberoptic arrays, and in situ synthesized arrays (such as ones produced by Affymetrix and NimbleGen). All microarrays consist of a collection of probes that are designed to match specific microbial target genes. When a sample from a target organism or environment is hybridized with the array, matching nucleotide fragments bind and can be detected by a fluorescent signal. A large variety of probe chemistries and target preparation methods are used, affecting the method specificity, sensitivity, and potential for accurate target quantification (Zhou, 2003). Microarrays are divided into different categories based on their design and purpose of use, including (1) whole genome arrays, (2) community genome arrays, (3) phylogenetic arrays and (4) functional gene arrays.
Whole genome information from marine microbes is currently accumulating as a result of ongoing large sequencing projects. These data will allow the design of new microarrays that can be used in full genome expression studies with both laboratory isolates and environmental samples. Whole genome microarrays or microarrays constructed with shotgun clones (see below) from microbial isolates or environmental samples can be hybridized with RNA from microbial isolates or environmental transcriptomes (Parro et al., 2007). Linear amplification of RNA is often necessary to reach a sufficient detection limit for the mRNA targets. Whole genome microarrays have thus far been applied to the marine cyanobacteria Synechococcus spp. (Palenik et al., 2003, 2006; Su et al., 2006) and recently to a DMSP-degrading Proteobacterium, Silicibacter pomeroyi, from the Roseobacter clade (Burgmann et al., 2007). Genome data and analyses of several marine microbes have been published, including the uncultivated marine Archaeon Cenarchaeum symbiosum (Hallam et al., 2006), the aerobic anoxygenic phototrophic Proteobacterium Roseobacter denitrificans (Swingley et al., 2007), the cyanobacterial diazotroph Crocosphaera watsonii (Zehr et al., 2007), and a marine planctomycete Pirellula sp. (Glockner et al., 2003). Many other genome projects are nearing completion, such as those for two marine Planctomycetes (Woebken et al., 2007). Additional full genome information has been completed for marine viruses and marine microbial plasmids (Paul et al., 2005, 2007). Genome analyses are highly valuable for microarray development in terms of identifying genome regions of interest and developing optimal microarray probes. Information on genome structure may allow development of partial genome microarrays (Studholme and Dixon, 2004) with reduced complexity and cost compared to whole genome microarrays.
In addition to the whole genome sequences, partial genome data and analyses are accumulating at an increasing pace from metagenomic and transcriptomic fragments recovered from uncultivated marine microorganisms (e.g. Frigaard et al., 2006; Suzuki and Béjà, 2007; Vergin et al., 1998). These data provide an enormous resource for development of targeted microarray approaches for detection of expression and distributions of uncultivated microorganisms in different geographic regions under a range of environmental conditions. Thus far array approaches have been applied to only a few oceanic metagenomic datasets. North Pacific Ocean fosmid libraries were spotted on membranes and hybridized with proteorhodopsin targets to screen for the presence of proteorhodopsin genes (Frigaard et al., 2006). A similar approach could be useful in other environments if specific gene fragments are sought and sequencing of the entire metagenome or transcriptome is not feasible.
Microarrays have also been applied to microbial diversity studies. The small subunit ribosomal RNA (SSU rRNA) gene sequence has been used to develop phylogenetic microarrays with the goal of detecting microbes representing different phylogenetic groups in the environment. Phylogenetic microarrays have been developed for terrestrial environments (Small et al., 2001; DeSantis et al., 2007), all known sulphate reducers (Loy et al., 2002), and aquatic cyanobacteria (Castiglioni et al., 2004). An electronic microarray (NanoChip) using the SSU rRNA gene sequence for detection of bacteria associated with coastal harmful algal blooms has also been developed (Barlaan et al., 2007).
Functional gene microarrays may include one or more genes from a variety of microorganisms, with the goal of targeting diversity or expression of a microbial functional group. These microarrays may include genes encoding key metabolic processes or genes involved in important biogeochemical processes (Zhou, 2003). Functional genes can also be used to address the taxonomic composition of microbial communities. Several functional gene microarrays have been designed, but relatively few thus far have been applied to aquatic environments, and even fewer have been used for samples from marine pelagic environments. Examples of functional gene microarrays include one developed to target genes involved in biodegradation of environmental contaminants (Rhee et al., 2004) and another targeting all known methanotrophs (Bodrossy et al., 2003). Microarrays have been developed to target genes involved in nitrogen cycling (Wu et al., 2001; Taroncher-Oldenburg et al., 2003), primarily in microorganisms that carry out nitrification or denitrification (ammonia oxidation, nitrate reduction). A microarray was also developed for characterization of diazotroph communities in marine environments, based on nifH gene diversity, and the array was applied in studies of communities of nitrogen-fixing microorganisms in marine microbial mat, estuarine (Moisander et al., 2006, 2007), open ocean, and coral reef environments (Hewson et al., 2007a, b). Another functional gene microarray targeting marine sediment ammonia-oxidizers has also been developed (Ward et al., 2007a, b). Recently, a functional gene microarray consisting of 24 243 probes covering 10 000 genes involved in cycling of nitrogen, carbon, sulfur and phosphorus, among others (He et al., 2007), has been applied in a number of environments, but primarily in soils.
Very recently microarrays have been used to examine the similarity of large fragments of bacterial phylotypes between different locations in the ocean (Rich et al., 2008). The targets on the microarrays were developed around fragments of DNA sequenced as part of a metagenomic survey (see next section) that were closely related to the marine cyanobacterium Prochlorococcus. Hybridization with pure culture DNA of several strains of Prochlorococcus, and with DNA from coastal seawater that was amended with Prochlorococcus culture, correctly identified the presence of Prochlorococcus genes. While field results examining the presence of Prochlorococcus in environmental samples have not yet been published, this metagenome array approach holds significant promise for examining gene presence (and expression) of uncultivated microorganisms in marine systems.
The application of microarrays in microbial ecology thus far has been driven by technology and method development. The coming decade may well witness a surge in marine microbial ecological applications of this technology, advancing knowledge of the ecology and physiology of marine microorganisms.

Shotgun cloning, large insert libraries metagenomics and metatranscriptomics
Concern over PCR biases in studies of marine microbial diversity was a strong impetus for development of PCRindependent techniques to study marine microbial diversity.Without amplification of a specific gene of interest, genomic DNA from mixed microbial assemblages needed to be sheared into small fragments, ligated into a vector and then screened for genes of interest.The earliest attempt at this technique by Schmidt and colleagues (Schmidt et al., 1991) used lambda libraries which were screened for 16S rRNA by hybridization prior to sequencing.Random sequencing of viral DNA was necessary since there are no conserved genes across all viruses, a consequence of the high mutation rate in viral genomes relative to their hosts.As a consequence, a method for randomly sequencing viral DNA was sought.Since large amounts of viral DNA are required for direct sequencing, it was first necessary to amplify viral genome material.This initially took the form of linker-amplified shotgun libraries (LASLs) where PCR priming sites were ligated onto the ends of randomly sheared viral DNA, which was then amplified using PCR and sequenced using Sanger Sequencing.The sequences were then assembled into contiguous fragments in silico.This approach has been used to understand viral composition and diversity in pelagic and benthic compartments of coastal marine waters (Breitbart et al., 2002(Breitbart et al., , 2004)).Most recently, direct sequencing of viral DNA has been possible after circularizing and amplifying viral genome fragments using phi29 polymerase, an isothermal and linear amplification employed in genome studies of prokaryotes (Angly et al., 2006).
Because the likelihood of cloning intact genes of interest on small, randomly sheared fragments of microbial DNA is low, and because of a desire to obtain further genetic information from the same cloned fragments that carry universally conserved genes, techniques for cloning much larger fragments of DNA were sought. The application of bacterial artificial chromosomes (BACs) to microbial DNA from the marine environment was first attempted by Stein et al. (1996), where ca. 40 kbp of genetic information was cloned into host E. coli using phage transfection. BACs, and later Fosmids (which differ in the method of sheared DNA separation during transfection), have been used to study the genomes of uncultivated microorganisms in marine environments (Béjà et al., 2000; DeLong et al., 2006; Stein et al., 1996). Because the ends of the BACs or Fosmids can be sequenced, this technique may also be used as a method of random sequencing of microbial community DNA, where the sequences obtained are aligned and overlapping fragments are assembled into contigs or scaffolds. The application of this type of approach to assemblages of microorganisms is referred to as metagenomics. It has been used most recently by DeLong and colleagues (DeLong et al., 2006) to study the phylogenetic composition and similarity of assemblages at station ALOHA, north of Oahu, Hawaii, over a yearly cycle.
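The assembly of overlapping sequences into contigs described above can be illustrated with a minimal sketch. It assumes error-free reads from a single strand and uses a simple greedy merge of the longest suffix-prefix overlaps; the function names and the min_overlap parameter are hypothetical, and the assemblers used in the cited studies are considerably more sophisticated (they handle sequencing errors, both strands, repeats and paired-end information).

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Illustrative greedy assembly: repeatedly merge the pair of
    sequences with the longest overlap until no overlaps remain.
    Assumes error-free, same-strand reads (a strong simplification)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(a, b, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Three made-up reads that tile a 13 bp region.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

In this toy case the three reads collapse into one contig, whereas reads with no sufficient overlap are returned unchanged as separate contigs. Real community DNA rarely assembles this cleanly, because coverage of any one genome is low and closely related strains co-occur.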
The reduction in cost of Sanger sequencing and the operation of numerous sequencers in series, initially applied to the Human Genome Sequencing project, provided for the first time the ability to sequence extremely high numbers (>10^6 individual sequences) of random, sheared, short fragments of microbial DNA from environmental samples. This was first applied simultaneously to microbial DNA from an acid mine drainage site (Tyson et al., 2004) and to bacterioplankton DNA collected at several stations in the Sargasso Sea (Venter et al., 2004). These studies, and later metagenomic studies during the Global Ocean Sampling (GOS) project (Rusch et al., 2007), have revealed high variability in bacterioplankton assemblage structure, in line with observations made using fingerprinting approaches (Hewson and Fuhrman, 2006b), as well as a plethora of new genes and protein families (Yooseph et al., 2007).
Because metagenomics (see Rademaker et al., 2005) elucidates only potential functions, and not necessarily the active utilization of genes in microbial assemblages, there is a need to examine community transcripts, which has taken the form of random sequencing of mRNAs from assemblages. The first marine metatranscriptomic study used random amplification via RT-PCR of community RNA extracts that had been treated to remove rRNAs (which comprise >90% of all RNA in an assemblage) to examine diel and tidal influences on estuarine bacterioplankton expression patterns (Poretsky et al., 2005). The major disadvantage of metatranscriptomic approaches is that sequencing efforts are frequently dominated by rRNAs, since removal via capture hybridization (physical separation) or terminator exonuclease treatment (selective digestion of rRNAs, which have different chemical properties than mRNAs) typically eliminates only a small fraction of rRNAs.
A new type of sequencing, pyrosequencing, which is based upon the detection of light emitted during sequential nucleotide incorporation into amplified DNA, was initially developed as an alternative to Sanger sequencing (Ronaghi et al., 1996), and was later applied to DNA separated onto solid (bead) surfaces in picotiter plates (Margulies et al., 2005). Instead of incorporating terminator dideoxynucleoside triphosphates as in Sanger sequencing, each deoxynucleoside triphosphate (dNTP) is added in turn to reactions containing a nucleotide-degrading enzyme (apyrase), which removes unincorporated dNTPs. The incorporation of a dNTP complementary to the sequenced strand releases pyrophosphate, which is converted to adenosine triphosphate (ATP) that fuels a luciferase reaction producing a flash of light. This flash is detected, and sequences are thus determined by relating each detected incorporation to the dNTP that was added to the reactions. Unlike Sanger sequencing, which produces sequences ca. 600-1000 bp in length, the length of pyrosequencing reads is limited to <500 bp. However, because sequencing occurs on a massively parallel scale in picotiter plates, a typical pyrosequencing run generates 100 000-300 000 individual sequences. Several other technologies have also been developed for high throughput sequencing.
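The flow-based readout described above can be illustrated with a minimal sketch. It assumes that nucleotides are flowed in a fixed repeating order (TACG is used here as an assumption) and that the light signal from each flow is roughly proportional to the number of identical bases incorporated, so that rounding the signal gives the homopolymer length. The function names and the example signal values are hypothetical, and real base callers also correct for signal decay and phasing errors.

```python
from typing import List, Sequence


def simulate_flows(read: str, flow_order: str = "TACG") -> List[float]:
    """Return the ideal (noise-free) light signal per nucleotide flow
    for a synthesized read: each flow yields a signal equal to the
    number of consecutive bases matching the flowed nucleotide."""
    if any(base not in flow_order for base in read):
        raise ValueError("read contains a base absent from flow_order")
    signals = []
    position = 0
    flow = 0
    while position < len(read):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while position < len(read) and read[position] == nucleotide:
            run += 1
            position += 1
        signals.append(float(run))  # 0.0 when nothing is incorporated
        flow += 1
    return signals


def call_bases(signals: Sequence[float], flow_order: str = "TACG") -> str:
    """Convert per-flow signals into a sequence by rounding each
    signal to the nearest whole number of incorporated bases."""
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        count = max(0, int(signal + 0.5))  # round half up, never negative
        bases.append(nucleotide * count)
    return "".join(bases)


if __name__ == "__main__":
    noisy = [1.02, 0.05, 2.10, 0.00, 0.90, 1.10, 0.03, 2.95]
    print(call_bases(noisy))
```

Because homopolymer length is inferred from signal intensity, long runs of a single base are the positions where rounding is most likely to go wrong, which is one reason read accuracy and length are limited in this kind of sequencing.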
To date, pyrosequencing has been applied in marine environments to estimate the diversity of microbial assemblages and to identify their members. An early application (Sogin et al., 2006) sequenced a highly variable region within the ribosomal RNA of hydrothermal vent communities and found that the diversity of assemblages had been underestimated in previous studies by 2-3 orders of magnitude. The first shotgun pyrosequencing approach examined virus assemblages in several oceanic regions (Angly et al., 2006) and found that a large proportion of viral assemblage genetic composition was not identifiable by comparison to existing sequence data, but that virioplankton assemblages were composed of very similar species despite large geographic separation. Finally, pyrosequencing technology has recently been applied to coral surface microbial communities (Wegley et al., 2007), where predominantly heterotrophs, and sequence data ranging from fungi to viruses, were recovered. Pyrosequencing technology holds promise for constructing a much larger marine microbial sequence database, allowing microbial ecologists to address biogeographical and ecological questions that could not be addressed with limited numbers of sequences.

Summary: applications to sensor technology
It is now possible to develop in situ molecular biological sensors for microorganisms in the environment (Paul et al., 2007). Ultimately, many molecular biological approaches may be applied in some type of remote sensing application. For example, microarray approaches have been proposed for detection and monitoring of pathogens such as Vibrio cholerae in coastal environments (Stine et al., 2003). The status of biological sensors is discussed in another chapter (Scholin et al., 2009). Probably the first technologies to be applied will be based on hybridization and the polymerase chain reaction. These techniques facilitate detection and quantification of species, with the potential for detecting organisms with key biochemical functions by their genes or gene expression.
In situ biological sensors require extensive technology development, as well as the development of molecular applications that can be implemented in situ. Only a relatively small number of probes can be implemented on current technologies, since size and power consumption limit the capacity of remote instrumentation. Arrays carrying a suite of probes for hybridization can be used to detect specific microorganisms or to monitor microbial diversity. Gene expression assays using linear amplification (NASBA), hybridization technologies, or quantitative PCR are likely to be successfully implemented in the near future.
A primary question is how such instruments can best provide information useful for sensing ocean ecosystems. Probes for harmful algal bloom species are an example of an application with clear applied relevance to ocean ecology (Casper et al., 2004). Given the limited number of probe targets that can be implemented with current technology, discussions of useful targets for monitoring ocean ecosystem function are needed.
The microbial components of oceanic systems include the dominant species and ecotypes and the species involved in biogeochemical cycles. These key components may be useful as targets for monitoring changes in ocean communities, including regime shifts and dynamics in relation to physical-chemical forcing. Species that may be useful targets are ecotypes of the major oceanic primary producers Prochlorococcus and Synechococcus, or major lineages of eukaryotic phytoplankton, including the picoeukaryotes. Key ecosystem biogeochemical components are carbon fixation and decomposition, nitrogen fixation and nitrogen assimilation, ammonia oxidation and denitrification (Fig. 3). Targets for these processes could be designed to detect presence (genes) or activity (gene expression as a proxy), by developing probes for specific species or generic group-level probes. Information on the time-series or spatial distribution of these genes could inform decisions on the selection of ecosystem probes for implementation on remote instrumentation. Much work needs to be done to select targets that provide useful information on ecosystem structure or function, to determine how the spatial and temporal distribution of genes and gene expression relate to ecosystem dynamics, and to design appropriate methods for detection and quantification (Zehr, 2009). Once targets have been identified, and their abundance and dynamics are known, appropriate molecular assays can be selected and developed for ocean sensing technology.
Edited by: G. Griffiths

Fig. 1. Ocean scientific questions addressed using molecular biological techniques. Dark grey bars indicate the primary use of the molecular technique, while light grey bars indicate potential applications. The methods are discussed in text (see corresponding numbered sections).

Fig. 2. Comparison of assemblage fingerprints and identification of phylotypes based upon sequence data.