Accessing diverse data comprehensively – CODM, the COSYNA data portal

The coastal observation system COSYNA aims to describe the physical and biogeochemical state of a regional coastal system. The COSYNA data management is the link between observations, model results and data usage. The challenge for the COSYNA data management CODM is the integration of diverse data sources in terms of parameters, dimensionality and observation methods to gain a comprehensive view of the observations. This is achieved by describing the data using metadata in a generic way and by making all gathered data available for different analyses and visualisations in an interrelated way, independent of data dimensionality. Different parameter names for the same observed property are mapped to the corresponding CF standard name (Eaton et al., 2010), leading to standardised and comparable metadata. These metadata together with standardised web services are the base for the data portal. The URLs of these web services are also stored within the metadata as direct data access URLs, e.g. a GetMap request for a map.


Introduction
In recent years, various portals for integrated ocean observing systems have been created, such as the Australian Ocean Data Network Portal with IMOS (Trull et al., 2010), the G. Breitbach et. al.: Accessing diverse data comprehensively and may then access the data related to that platform. They are optimal for a limited region and a distinct parameter set.
Most portals of the big national observation systems such as IOOS are linked to other portals of regional systems or portals of integrated systems for a single type of observation (e.g. high-frequency radar). CODM offers an integrated portal for all COSYNA observations and additionally for operational model results. Thus, the portal allows integrated access to highly diverse data with a focus on the observed property.
In addition to using CODM, the COSYNA data can be accessed using specialised data portals for stationary time series, data from surveys, data from FerryBoxes on ships travelling on fixed routes and remote sensing data. These specialised portals have advantages when accessing data from a single platform type.
The following section of this paper describes the goal and the objectives of CODM, i.e. what should be done, whereas the subsequent Sects. 3-7 describe how this is done. Section 3 presents the general outline of data management in COSYNA to meet the objectives, with a focus on the essential elements, including various web services. The implementation of these web services together with metadata is a new feature of CODM, allowing a flexible, user-adapted visualisation and retrieval of all searched data. In Sect. 3.7, the integration of web services and the concept of CODM are described. Short sections on data quality from the viewpoint of data management in Sect. 4 and data policy in Sect. 5 follow. In Sect. 6, some illustrative examples of the different ways data visualisations are implemented in the CODM portal are presented. Finally, in Sect. 7, additional visualisation tools are described.

Goal of CODM
The objective of CODM is to gather COSYNA data, which are often heterogeneous in origin, in an integrative way. To achieve this objective, all related data must be homogenised with regard to data structure and have to be combined in plots and maps for visualisation. One solution would be the application of an ontology such as the one proposed by the Semantic Sensor Network Group (Lefort et al., 2011). For CODM, the mapping to standardised observed property names, the CF standard names (Eaton et al., 2010), is a faster and less complicated solution. Within the metadata, these standard names are mapped to the internal parameter names used by the scientists who set up the sensors and the data acquisition.
Based on metadata alone, the user of the COSYNA data portal should be able to select an observed property and the spatiotemporal extent of interest. Data should be accessed only when really necessary, e.g. when a visualisation or download is requested. Data and metadata access is performed solely via standardised web services.
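The idea of a search that touches metadata only can be illustrated with a minimal sketch. The record layout, the platform names and the example.org URLs below are assumptions made for illustration; they do not reflect the actual CODM metadata schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataMetadata:
    """Hypothetical, simplified metadata record (not the CODM schema)."""
    platform: str
    standard_name: str   # CF standard name of the observed property
    start: datetime
    end: datetime
    bbox: tuple          # (lon_min, lat_min, lon_max, lat_max)
    data_url: str        # web service URL, only followed on request


def overlaps(a_min, a_max, b_min, b_max):
    """True if the closed intervals [a_min, a_max] and [b_min, b_max] intersect."""
    return a_min <= b_max and b_min <= a_max


def search(records, standard_name, start, end, bbox):
    """Select records by observed property, time range and bounding box.

    Only metadata are inspected; no data service is contacted here.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    return [
        r for r in records
        if r.standard_name == standard_name
        and overlaps(r.start, r.end, start, end)
        and overlaps(r.bbox[0], r.bbox[2], lon_min, lon_max)
        and overlaps(r.bbox[1], r.bbox[3], lat_min, lat_max)
    ]


# Invented example records.
records = [
    DataMetadata("ferrybox_a", "sea_water_temperature",
                 datetime(2009, 4, 20), datetime(2009, 4, 21),
                 (6.0, 53.5, 9.0, 55.0), "https://example.org/data/1"),
    DataMetadata("buoy_b", "sea_water_temperature",
                 datetime(2012, 1, 1), datetime(2012, 12, 31),
                 (7.9, 54.2, 7.9, 54.2), "https://example.org/data/2"),
    DataMetadata("ferrybox_a", "sea_water_salinity",
                 datetime(2009, 4, 20), datetime(2009, 4, 21),
                 (6.0, 53.5, 9.0, 55.0), "https://example.org/data/3"),
]

hits = search(records, "sea_water_temperature",
              datetime(2009, 4, 1), datetime(2009, 5, 1),
              (5.0, 53.0, 10.0, 56.0))
print([r.data_url for r in hits])
```

In this sketch the data URL is carried along in the record but never opened during the search, which mirrors the requirement that data are accessed only when a visualisation or download is requested.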
The COSYNA data policy is free and open, according to the idea that all data should be open to everybody without any restrictions or collection of personal data. Understanding user requirements and optimising the system accordingly requires monitoring of user access to COSYNA, an objective that conflicts with the open data approach.

Observations
Observations in COSYNA result from different types of measurement devices, leading to different types of data. Fixed positions include buoys with CTDs (devices measuring conductivity, temperature, pressure and more) and ADCPs (acoustic Doppler current profilers) at different fixed depths, Waverider buoys, stationary FerryBoxes and underwater nodes with CTDs and ADCPs.

Remote sensing platforms include satellites (MODIS on Aqua, MERIS on ENVISAT) and land-based, high-frequency (HF) radar.
All observations will be described in other articles contained in the Ocean Science and Biogeosciences special issue entitled "COSYNA - Coastal Observing System for Northern and Arctic Seas" (Baschek et al., 2016). A schematic overview of these observations is shown in Fig. 1.

Databases
As indicated in Fig. 2, all in situ observations are stored in relational databases (Oracle). For every type of in situ observation, a database with a data model suited to the observation type is used. Data from FerryBoxes operating on steady routes are stored in the database which can be accessed directly via http://ferrydata.hzg.de. Data from stations at fixed locations are accessible at http://tsdata.hzg.de. Survey data from ships not using steady routes can be accessed at http://surveydata.hzg.de.

Models
One goal of COSYNA is to integrate observations and numerical models to get a synoptic view of the state of the coastal areas. This integration is done by assimilating real-time observations into a model reanalysis. Another objective of CODM is to enable an online validation of these models using observation data which are not used for assimilation.

Circulation model
Based on GETM (General Estuarine Transport Model; Stips et al., 2004), data from HF radar observations are assimilated into a reanalysis of the currents in the North Sea (Stanev et al., 2011). In addition, temperature data from OSTIA (Donlon et al., 2012) and FerryBoxes are assimilated into GETM.

Wave model
Driven by data from DWD (Deutscher Wetterdienst, the German Weather Service), a prognostic wave model is run, which provides wave parameters for every hour up to a 36 h forecast (Behrens, 2009). In principle, the model output could be used for assimilation of observations, too. In practice, it is difficult to determine high-quality wave parameters from HF radar observations. On the other hand, the wave model data are consistent with observations to a degree that renders it unnecessary to improve the model via data assimilation (compare Fig. 9).

Collecting Observations
All COSYNA observations are collected in near-real time. The principle of the data flow is shown in Fig. 2. Data are stored either as netCDF files (Rew and Davis, 1990) in the COSYNA filesystem (remote sensing platforms) or as time series in relational databases (all other sources). Metadata are also stored in a relational database. The netCDF output of model calculations is treated just like the netCDF files from remote sensing observations.

Metadata
The underlying concept for CODM was to build a data portal with a spatiotemporal search processed solely within the metadata. Real data are not accessed before visualisation or download occurs. Hence, the creation of metadata is crucial for the underlying concept. The automation of data handling, which ranges from data search to data display and data retrieval, relies on the stored metadata. It is necessary to use a harmonised vocabulary for the names of the observed properties, and this is realised by using CF standard names (Eaton et al., 2010), which are mapped to the originally used notations. This is done internally within the metadata structure and ensures a common usage of parameters supplied by the primary data-providing devices. The structure and content of the metadata are critical as they allow joint searches and retrievals of diverse data types.
The COSYNA data portal, which functions as a system of services, is described in Sect. 3.7. The COSYNA data portal application retrieves data and communicates internally by using metadata, not only for the measured data but also for the sensors used for collecting data. This necessitates two types of metadata within CODM:
1. platform metadata, which describe devices such as observation platforms with sensors generating environmental data, and also numerical models;
2. data metadata, which describe observations in the coastal system as well as model runs.

Platform metadata
Platform metadata contain the name of the platform, including the data provider. Examples of platforms are buoys, FerryBoxes, Wadden Sea poles or HF radar. A platform may consist of several sensors measuring one or many different observed properties. Within platform metadata, the following properties are described: sensors with sensor methods; observed properties measured by the sensors; location or, in the case of moving platforms, bounding box for the platform; start time and end time of the platform; existing international platform codes and links to external extended metadata.
For numerical models, two groups of parameters are defined and described within platform metadata: input parameters used to force the model as well as output parameters produced by the model.
The names of the internal parameters used in COSYNA are not harmonised but remain as they were chosen and used originally. A mapping of the original names to a standardised vocabulary is needed to ensure a common presentation and analysis of data from different platforms. The concept of parameter mapping is realised by introducing an additional, virtual sensor called selectedparameters as part of the platform metadata.
This additional virtual sensor does not exist as a real sensor but carries the CF standard names representing the measured parameters. The observed property names of the parameters belonging to this sensor are always CF standard names. The internal sensor name and the name of the internal parameter are specified in the parameter description.
In the present context, this additional sensor just uses the metadata structure for a sensor to describe the mapping. With the help of this sensor, a user can track the real sensor behind the corresponding CF standard name. This structure thereby allows for an interrelated search for comparable parameters in the portal, as shown in Fig. 5. A search thus combines various sources of measurements and model output to create an integrated view of data originating from different sources.
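The mapping carried by the virtual sensor can be pictured with a short sketch. Only the sensor name selectedparameters comes from the description above; the dictionary layout and all platform, sensor and internal parameter names are invented for illustration.

```python
# Hypothetical platform metadata: the virtual sensor "selectedparameters"
# lists CF standard names and points back to the real sensor and the
# internal parameter name. All sensor and parameter names are invented.
PLATFORMS = {
    "buoy_a": {
        "selectedparameters": {
            "sea_water_temperature": {"sensor": "ctd_1", "parameter": "Temp_C"},
            "sea_water_salinity": {"sensor": "ctd_1", "parameter": "Sal"},
        }
    },
    "ferrybox_b": {
        "selectedparameters": {
            "sea_water_temperature": {"sensor": "thermo", "parameter": "WaterT"},
        }
    },
}


def resolve(standard_name, platforms=PLATFORMS):
    """Return (platform, sensor, internal parameter) for one CF standard name."""
    found = []
    for platform, meta in sorted(platforms.items()):
        entry = meta["selectedparameters"].get(standard_name)
        if entry is not None:
            found.append((platform, entry["sensor"], entry["parameter"]))
    return found


print(resolve("sea_water_temperature"))
```

A single standard name resolves here to differently named internal parameters on two platforms, which is the property that makes an interrelated search across platforms possible.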
Data metadata include the start and end time of a measurement, the start and end location, graphic previews for the observed properties, if available, the person responsible for the data or the metadata and the URLs of web services for visualising and downloading the corresponding data. In the case of platforms at fixed positions, the data are described as time series. This means that only one metadata record is needed for the whole time range covered by the platform. If the platform supplies data at multiple positions, a metadata record is created for every data set. Data sets may originate from transect measurements, as in the case of data from ships and gliders, or a data set may be represented by a single netCDF file. These multiple metadata records are created automatically following the procedure outlined in Fig. 3.
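As a rough illustration of how such a record could be derived automatically from one data set, consider the following sketch. The field names and sample values are assumptions; the actual procedure is the one outlined in Fig. 3 and is not reproduced here.

```python
from datetime import datetime


def describe_dataset(samples):
    """Build a minimal data metadata record from one data set.

    `samples` is a list of (time, lon, lat) tuples. The field names of the
    returned record are illustrative and not the NOKIS structure.
    """
    if not samples:
        raise ValueError("a data set needs at least one sample")
    ordered = sorted(samples, key=lambda s: s[0])
    first, last = ordered[0], ordered[-1]
    return {
        "start_time": first[0],
        "end_time": last[0],
        "start_location": (first[1], first[2]),
        "end_location": (last[1], last[2]),
    }


# A fixed station: start and end location coincide, one record suffices.
station = [(datetime(2013, 12, 1), 7.9, 54.2),
           (datetime(2013, 12, 8), 7.9, 54.2)]
# A transect of a moving platform: one record per data set.
transect = [(datetime(2013, 12, 2, 12), 7.9, 54.2),
            (datetime(2013, 12, 2, 6), 8.7, 53.9)]

records = [describe_dataset(station), describe_dataset(transect)]
```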
The COSYNA metadata are stored in the NOKIS (Lehfeld and Reimers, 2009) metadata system, which is INSPIRE (INSPIRE, 2007) and ISO 19115 (ISO19115, 2003) compliant. The structure for platform metadata has been developed within NOKIS. A migration to SensorML metadata (Botts, 2014) is being considered. Such a migration would be reasonable if a SensorML profile is developed which is compatible with the NOKIS platform profile. The initiative to develop such a profile arises from several EU projects, e.g. BRIDGES, FixO3, Jerico/Jerico-Next, NeXOS and ODIP/ODIP II.

Web services
Web services are used both to visualise and to download data. The details of their usage are kept within the metadata for each measurement, allowing the CODM portal not only to link to a web service but to execute the user request and deliver the data or plot, as described in more detail below. Data stored as netCDF files can be downloaded via OPeNDAP (Cornillon et al., 2009). If the netCDF files contain area data, they can be visualised as OGC WMS maps (WMS, 2004) with the help of ncWMS (Blower et al., 2013). In addition, the versatile tool ncWMS is able to create time series plots at selected positions within the area represented by the netCDF file.
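For readers unfamiliar with WMS, a GetMap request is simply a URL with a set of standard query parameters. The sketch below assembles such a URL for WMS version 1.3.0; the endpoint, the layer name and the chosen image size are placeholders and are not taken from the COSYNA services.

```python
from urllib.parse import urlencode


def getmap_url(endpoint, layer, bbox, time, width=512, height=512):
    """Assemble a WMS 1.3.0 GetMap URL.

    `bbox` is (lon_min, lat_min, lon_max, lat_max); with CRS:84 the axis
    order is longitude, latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TIME": time,
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
url = getmap_url("https://example.org/ncWMS/wms", "model/wave_height",
                 (6.0, 53.0, 9.5, 55.5), "2013-12-05T12:00:00Z")
print(url)
```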
The presentation of data from moving platforms, such as FerryBoxes, gliders or ships, needs additional effort. A WMS servlet was coded in Java and added as a web service which produces and presents colour-coded transect maps of the measurements made by moving devices. Parameter plots of time series at fixed platforms can be visualised by web services using an application with a direct connection to the COSYNA time series database TSdata. A similar application is used to build parameter plots for transects. Downloads for all data stored in the Oracle database are provided through the software PySOS, an implementation of a sensor observation service (OGC SOS) (Na and Pries, 2007) which has been adapted for Oracle. The standard OGC SOS is part of the sensor web enablement framework (OGC SWE) (Botts and Reed, 2006), which improves the interoperability between sensors. CODM currently does not use OGC SOS at the sensor tier but rather one abstraction level higher, at the database tier.
The web processing service (OGC WPS) (Schut, 2007) implementation PyWPS (http://pywps.wald.intevation.org/documentation/) is used to create additional services. For example, a service which transforms SOS XML output into a human-readable ASCII table is provided, as well as a service to plot wave energy against time and frequency for Waverider buoys. Excluding plots and WPS services, all services are provided by Tomcat web servers (Brittain and Darwin, 2008).
A web feature service (OGC WFS) (Vretanos, 2002) is provided using the open source software Geoserver (Geoserver Project, 2001) to provide access to metadata of platforms and real data as web features. This WFS is used as a data discovery web service. The advantage of this approach is that the service can be configured for efficient data discovery.

Integration of web services
The COSYNA metadata system is based on NOKIS (Lehfeld and Reimers, 2009). The metadata can be accessed using the catalog service for the web (OGC CSW) (Nebert, 2007). In addition, a catalog service based on a web feature service (OGC WFS) was created, which is optimised to be used as a data discovery service by a data portal. As this metadata service supplies the URLs to access the corresponding data as downloads, maps or other visualisations, the COSYNA data portal CODM is capable of offering all types of available web services to the users. The diagram in Fig. 4 gives an overview of the portal, its substructure, data access and possible user interactions.
A unique feature of the integration of web services in CODM is the storage of web service URLs in the metadata, which allows their direct use. Some other approaches such as NOOS also store web service URLs, but only as general GetCapabilities URLs. These general URLs provide information about the web service but cannot be used to access the data directly. CODM stores URLs which lead to immediate data access as a map, plot or numerical download. These web service URLs have a static part and a dynamic part. The dynamic URL parameters are defined using an XML description. An example for a web map service is shown in Listing 1.
In general, any data portal may use these mechanisms for building web service URLs, as depicted in Fig. 4. The user interface could be CODM or another portal. Any portal can access COSYNA data because all the web services are freely accessible. EMODnet physics (Novellino et al., 2014) uses the web services of HF radar stored in CODM to create the web service for visualisation of the currents in the German Bight.
CODM provides various search options to its users. The entry point for each refined search is the selection of the desired observed property, a time and depth region and a geographic area of interest (Fig. 5). After clicking "Select all datasets", one of the buttons "Create map", "Create plots" or "Downloads" can be used. This means the requested data are provided with just three clicks. As an example, Fig. 6 shows the output of a click on the "Create map" button for chlorophyll measurements derived from MERIS and FerryBox data for the end of April 2009.
Listing 1. Example XML listing for a WMS request showing the mechanism of web service integration. The element "baseurl" contains a GetMap request with some fixed request parameters and some variable request parameters, which are empty in "baseurl". These variable parameters can be filled in automatically using the name and the syntax in the element "dynamicParameters".
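The split into a static and a dynamic URL part can be sketched as follows. Only the element names "baseurl" and "dynamicParameters" are taken from the caption of Listing 1; the child element "parameter", its attributes "name" and "syntax", the placeholder notation and the example.org endpoint are assumptions, so the snippet should be read as one possible realisation rather than the CODM format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical metadata snippet; only the element names "baseurl" and
# "dynamicParameters" come from the caption of Listing 1.
SNIPPET = """
<webservice>
  <baseurl>https://example.org/wms?SERVICE=WMS&amp;REQUEST=GetMap&amp;LAYERS=chl&amp;BBOX=&amp;TIME=</baseurl>
  <dynamicParameters>
    <parameter name="BBOX" syntax="{west},{south},{east},{north}"/>
    <parameter name="TIME" syntax="{start}/{end}"/>
  </dynamicParameters>
</webservice>
"""


def build_url(xml_text, values):
    """Fill the empty query parameters of baseurl from the user's selection."""
    root = ET.fromstring(xml_text)
    base = urlsplit(root.findtext("baseurl").strip())
    syntax = {p.get("name"): p.get("syntax")
              for p in root.find("dynamicParameters")}
    query = []
    for name, value in parse_qsl(base.query, keep_blank_values=True):
        # Fixed parameters keep their value; empty ones are filled in.
        if value == "" and name in syntax:
            value = syntax[name].format(**values)
        query.append((name, value))
    return urlunsplit(base._replace(query=urlencode(query, safe=",/:")))


selection = {"west": 6.0, "south": 53.0, "east": 9.5, "north": 55.5,
             "start": "2009-04-25", "end": "2009-04-30"}
url = build_url(SNIPPET, selection)
print(url)
```

Because the template and the fill-in rules both live in the metadata, a portal needs no service-specific code to turn a user selection into a working request.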

Data level and quality control
COSYNA data level definitions are based on the data level definition used for remote sensing data (Parkinson and King, 2006) but are expanded to include in situ data. They range from level 0 for raw data to level 4 for externally published data. The definitions for the different levels are indicated in Table 1. Data of level 3 and higher are available via CODM.
Table 1 (rows for levels 3 and 4):
Level 3: ... values located in standard formats (e.g. common described grids) with error/confidence level, for use in general scientific community; in general, portals accessible, correctable by version; preferable in netCDF for map-like, RDBMS for time-series-like data; metadata in standard system; map projected on regular oriented grid; time in (fractional) days since 1970 (preferred); if applicable, on time interpolated grid; the intention is to attach errors for every measured value; the data quality flag is filled; link to descriptive metadata in standard system.
Level 4 (published): data published, kept in archives, preferable with DOIs, fixed data set, cross linked to eventual publication, correction of data results in new data set, use by full scientific community; as needed by receiving/publishing data centre (WDC); as level 3, plus mandatory metadata on data quality; extended analysis performed to understand accuracy and limitations of the data set in extended space and time dimensions (e.g. check for long-term trends and errors); to the largest extent possible; data quality controlled, all checks passed; reviewed data set, sent to a WDC data centre with a DOI.
Level 3 denotes that the data have a defined unit and are georeferenced. A quality control flag is applied to the data of level 3. A subset of the SeaDataNet (SeaDataNet, 2010) quality flags is used as a flagging scheme, as shown in Table 2. Externally published data have a final delayed mode quality control and, for example, are published in PANGAEA (Diepenbroek et al., 2002).
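A data user would typically apply the quality flags as a filter before analysis. The following sketch assumes the general SeaDataNet flag meanings; the subset actually used in COSYNA is the one given in Table 2, and the choice of accepted flags below is an assumption for illustration.

```python
# Flag meanings follow the SeaDataNet scheme as commonly documented; which
# subset COSYNA uses is defined in Table 2 and is NOT reproduced here.
FLAG_MEANING = {
    0: "no quality control",
    1: "good value",
    2: "probably good value",
    3: "probably bad value",
    4: "bad value",
    9: "missing value",
}

ACCEPTED = {1, 2}   # assumption: keep good and probably good values only


def filter_by_flag(samples, accepted=ACCEPTED):
    """Keep (time, value, flag) samples whose flag is in `accepted`."""
    return [s for s in samples if s[2] in accepted]


# Invented samples: (time step, value, quality flag).
samples = [(0, 8.4, 1), (1, 8.5, 2), (2, 42.0, 4), (3, None, 9), (4, 8.6, 0)]
clean = filter_by_flag(samples)
print(clean)
```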

Data policy and access to CODM
COSYNA supports an open data policy. It is possible to download COSYNA data via CODM. Some guidelines for fair data use are given in the COSYNA data disclaimer (http://www.coastlab.org/Disclaimer.html), which pops up before a download is started.
In addition, COSYNA has started to publish data via the peer-reviewed data journal Earth System Science Data (Carlson and Pfeiffenberger, 2009). This external data publication applies to COSYNA data level 4 with final quality control.
COSYNA data policy stipulates that data access should be unhindered. On the other hand, there has been increasing interest from COSYNA funding sources in gathering information on who is accessing and using COSYNA data. This conflict is resolved by an open user registration process which defines user accounts completely based on user input without requesting personal information. Only a self-defined username and password, country, city and user category are mandatory inputs. The user category is selected from a predefined list. The self-defined username and password combination is needed to access CODM.
The user registration process creates a connection between the user information and the IP address of this user. With every new login, this connection is renewed. Based on this connection, all log files of web service requests and responses can be analysed to gather information about the usage of CODM. This analysis started in November 2014. The data downloaded per category for the accumulated year 2015 and the first 4 months of 2016 are shown in Fig. 7.
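The log analysis described above amounts to a join between login records and web service log entries on the IP address. A minimal sketch, with invented record layouts, user names and addresses, could look like this:

```python
from collections import Counter

# Hypothetical registration data: last known IP address per user.
LOGINS = [
    {"user": "alice", "category": "Science", "ip": "192.0.2.10"},
    {"user": "bob", "category": "Administration", "ip": "192.0.2.20"},
]

# Hypothetical web service log entries: (ip, request type).
REQUESTS = [
    ("192.0.2.10", "download"),
    ("192.0.2.10", "map"),
    ("192.0.2.20", "download"),
    ("192.0.2.10", "download"),
    ("198.51.100.5", "download"),   # no matching login
]


def downloads_per_category(logins, requests):
    """Count download requests per user category via the IP address."""
    category_of = {entry["ip"]: entry["category"] for entry in logins}
    counts = Counter()
    for ip, kind in requests:
        if kind == "download":
            counts[category_of.get(ip, "unknown")] += 1
    return dict(counts)


stats = downloads_per_category(LOGINS, REQUESTS)
print(stats)
```

Since the connection is renewed with every login, a real analysis would also have to take the time of each login into account; this sketch ignores that aspect.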
The Science category has most data access requests followed by the Administration category.Only minor access requests stem from the general public.This result is not surprising because the available data and visualisations are targeting mostly science and administration.To address the general public, a portal is needed with less data variety presented but more explanations and user guidance.

Examples
To compare output from the forecast wave model WAM (Behrens, 2009) with observations, data for wave height during a winter storm are shown. The selected time range starts on 1 December 2013 and ends on 8 December 2013. A map of the wave heights from model results during the storm is presented in Fig. 8. In this map, the positions of Waverider buoys are marked as red dots. The comparison is done with a click on one of the buoys marked on the map. One result for the Waverider buoy between Heligoland and the mouth of the Elbe is shown in Fig. 9.
Similarly good matches could be reached at the other measurement stations.Because the tools for creating the graphs of the forecast model (ncWMS) and the time series differ, the plot layouts differ slightly.

Additional visualisation tools
Although all the objectives mentioned in Sect. 2 could be met by CODM, additional visualisation tools are required to get a more comprehensive view for some data. For example, comparing measured HF radar data (Seemann et al., 2011) for surface water currents with those of model output for the same parameter using the portal is not a trivial exercise because two or more data sets covering the same area cannot be displayed simultaneously on a single map. To allow such a comparison, a separate web application for data of equal extent was developed. This application uses synchronised maps to visualise data sets for a selected time step. Figure 10 shows the measured HF radar current vector (right map), the GETM model run (left map) and the reanalysed model with assimilated data (centre map). Another feature of this tool is the creation of time series plots at a clicked map location. The lower part of Fig. 10 shows the current direction components for the corresponding data set on the selected date.
One basic presentation form of the COSYNA data portal is two-dimensional maps. However, a data search can also filter for depth ranges, which are then applied to the data request using the metadata. For a few data types, such as glider data, it is useful to have a tool to visualise the data in 3-D. A Java applet was built to display the location and the depth dependency of up to three observed properties from a glider campaign (Fig. 11). The URL for the selected observed property is stored within the metadata, similar to that of Listing 1. With this mechanism the tool is accessible as an additional icon from CODM. The further applet processing is independent of CODM. In addition to glider data, the applet can be applied to ScanFish data as well.

Conclusions and Outlook
Modern marine observing systems are composed of many different observation devices or platforms. This paper describes an approach for the challenging task of integrating these various observations into one common portal while providing the ability to visualise the data in a concerted way. The CODM data portal demonstrates that it is possible to integrate heterogeneous observations and model output comprehensively. Furthermore, an online comparison of a data-independent model, observed data and a data-assimilated model is provided. In addition, a solution for the challenging task of visualising data tracks in 3-D has been developed. This approach has already been realised in the COSYNA data portal CODM.
CODM is based on web services and metadata. Unique features of this approach are the storing of web service URLs as metadata as well as the mapping of observed or modelled parameters to standardised names. This enables the portal to present an integrated view and to compare different data sets and methods without any additional effort.
In the future, new platforms will become available whose sensors are accessed through a system using Sensor Web Enablement (Botts and Reed, 2006). Metadata for these new sensors will be described based on SensorML.
To gain information about usage and user interactions, a registration procedure has been implemented in the portal. Only registered users are able to browse, view and download data. To comply with COSYNA's open data policy, registration is unrestricted, free of charge and without verification of the information provided by users.
The COSYNA data portal as well as many COSYNA web services are registered in GEOSS (Global Earth Observation System of Systems) (Lulla et al., 2014). GEOSS promotes common technical standards enabling data from thousands of different instruments to be combined into coherent data sets. The approach described here makes it easier for any system, such as GEOSS, to integrate many data sources, thus ultimately creating a real earth observation system.
It would be useful to have a review of the different approaches to data management in observing systems. The main focus of such a review should be the interoperability of the data access and the integration into global portals.
A deficit common to all existing approaches is the dependency on observation platforms and different data types. It should be possible to integrate all observational data into a common data cube with a time dimension coordinate, three spatial dimension coordinates and one more dimension coordinate for the observed property. Such a data cube, with the ability to homogeneously store various observational data, should provide arbitrary cuts in all dimensions efficiently. As a first step towards this vision, it is planned to integrate all COSYNA observations into a data cube in the near future. If the results are promising for COSYNA data, there should be no barrier to successively integrating more diverse data.
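To make the data cube idea concrete, the following toy sketch stores values under five coordinates (observed property, time and three spatial coordinates) and returns arbitrary cuts. It is a conceptual illustration only: the class, its method names and the sample values are invented, and the linear scan used here would not meet the performance requirement stated above.

```python
class DataCube:
    """Toy sparse cube keyed by (property, time, x, y, z)."""

    AXES = ("property", "time", "x", "y", "z")

    def __init__(self):
        self._cells = {}

    def put(self, value, **coords):
        """Store one value; all five coordinates must be given by name."""
        key = tuple(coords[a] for a in self.AXES)
        self._cells[key] = value

    def cut(self, **ranges):
        """Return cells whose coordinates match every given constraint.

        A constraint is either a single value or an inclusive (low, high) pair.
        """
        def matches(key):
            for axis, wanted in ranges.items():
                got = key[self.AXES.index(axis)]
                if isinstance(wanted, tuple):
                    low, high = wanted
                    if not low <= got <= high:
                        return False
                elif got != wanted:
                    return False
            return True

        return {k: v for k, v in self._cells.items() if matches(k)}


cube = DataCube()
cube.put(8.5, property="sea_water_temperature", time=0, x=7.9, y=54.2, z=1.0)
cube.put(8.1, property="sea_water_temperature", time=1, x=7.9, y=54.2, z=10.0)
cube.put(33.0, property="sea_water_salinity", time=0, x=7.9, y=54.2, z=1.0)

near_surface = cube.cut(property="sea_water_temperature", z=(0.0, 5.0))
first_step = cube.cut(time=0)
```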

Figure 1. Main locations for COSYNA observations in the North Sea and at Spitsbergen (top left). The logos of COSYNA partner institutions are shown.

Figure 4. Concept for the interaction between users of the portal, data and metadata. All interaction is done using various web services. The data are stored as netCDF files or as rows in the Relational Database Management System (RDBMS) of Oracle.

Figure 5. View of the CODM portal with the selected parameter chlorophyll a and a selected time range from April 2009 to May 2009. After clicking "Select all datasets", the count of data sets for all platforms is shown and automatically selected. OpenStreetMap (OSM) (Coast, 2004) is used as the background map. To keep the portal simple for the users, well-known parameter names are used instead of CF standard names.

Figure 6. Comparison of a chlorophyll map derived from MERIS with FerryBox measurements for the end of April 2009. The MERIS data were deduced following Doerffer and Schiller (2007). The concentrations are computed on a logarithmic scale. FerryBox data are taken from W. Petersen, personal communication, 2012.

Figure 7. COSYNA data downloads since 2015 per category. The category names are self-explanatory, with the exception of "Private", which stands for private businesses such as fishermen. On the left side, 2015 (blue) stands for the whole year 2015. The other bars are monthly values.

Figure 8. Map of the significant wave heights during the winter storm from a forecast wave model (data from A. Behrens, personal communication, 2014).

Figure 9. Comparison of the forecast model results (top, A. Behrens, personal communication, 2014) and the measurement of the significant wave height at the Heligoland Waverider buoy during December 2013 (bottom, K. Herklotz, personal communication, 2014).

Figure 10. Application comparing results for current vectors from the GETM model run without assimilation (left, J. Staneva, personal communication, 2015), HF radar data (right, J. Hostmann, personal communication, 2015) and HF radar data assimilated into GETM model results (middle, J. Schulz-Stellenfleth, personal communication, 2015). Time series plots of the selected day are shown below each map. The cross marks the position of the time series.

Figure 11. Java applet to visualise glider data in 3-D. Three parameters are selected. It is possible to zoom in on location or time. With "Applet Configuration" other parameters can be selected.

Table 1. Data levels used in COSYNA, based on Parkinson and King (2006) with COSYNA-specific expansions. Some aspects, like accuracies, are not presently realised.

Table 2. Quality flag values used in COSYNA. These are based on SeaDataNet (2010).