Preface

Notes to an Unfinished Document:

  • The title was changed for version 0.7 to better reflect what the author has come to believe should be the contents. It makes little sense to write a tutorial guide to THREDDS for novices without also including tutorial material about NetCDF at a similar level, especially since someone completely unfamiliar with THREDDS is probably also unfamiliar with NetCDF. Novices will be significantly more motivated to use THREDDS to serve datasets of their own if they become familiar with the language and data structures of the NetCDF format that underlies much of the functionality of THREDDS.

  • It ain’t finished for a couple of reasons. The first is because the author hasn’t finished including what he thinks is obviously necessary for a thorough tutorial. The second is that the author hasn’t quite figured out what else might just be "obviously necessary" or at least "a corking good idea" to include. Suggestions are welcome and frankly mandatory if one wishes this to get to an unembarrassing and possibly useful version 1.0.

  • Most material was borrowed from various HTML, PDF and RTF documents scattered over a few cyberparsecs. The author has variously added explanatory material to sections in an attempt to help motivate and more clearly explain the concepts to users - not to mention the author.

  • Direct credit has been given in a few places in the sense that a specific document has been plundered for a specific graphic or chunk of text. A plan to do this thoroughly was replaced with an initial "relevant documentation" portion for each (sub)section that identifies all the documents that were used to create that (sub)section.

  • The author has also tried to impose some structure - mostly in the way of progressing from the simple to the progressively less simple - on the material. Or at least he’s been subtly guided by the material’s natural structural tendencies.

  • The author has attempted to remain as up-to-date as possible when conflicting materials exist due to updates to the schemas.

  • A plan is to create working versions of all examples that will be made available - along with the data files without which they’re not terribly useful - in a format that will allow them to be simply dropped into a startup installation and used for learning purposes and, even more importantly, as templates for the user’s actual data.

  • There is also a plan to create working Python scripts to perform various tasks - e.g. create files for each of the feature types that also follow the relevant CF conventions for each type - as described in CDM Implementation of CF Discrete Sampling Features. These scripts will use Python, Numpy and netcdf4-python to accomplish this. Given the increasingly widespread use of Python and Numpy in the geosciences - and the accompanying explosion of related and useful Python packages - this seems like a good and obvious choice.

  • The ur-format of this document is presently an ASCII text file with AsciiDoc markup, which is directly converted to HTML. The author is presently working/struggling to implement the parts needed to additionally translate this document into LaTeX, PS, PDF, etc. On an editorial note, the author very much likes the AsciiDoc system and would highly recommend it to anyone attempting to create a technical document that needs to be translated into several other formats. The final appendix will include material about how the document was produced and how to replicate the process.

1. Introduction to the THREDDS Data Server (TDS)

1.1. Motivation

Installing, configuring and maintaining a THREDDS server may at first seem like an overwhelming task for someone approaching it for the first time. The real tasks involved are downloading and installing a few programs in the appropriate places, and then creating and modifying configuration files so your THREDDS server will serve what you want it to serve in the way you want it served. The installation is a one-time thing that won’t be very difficult, especially given the system administrator help you’ll need to install it correctly. If you already have sysadmin experience, this will be trivial. The more difficult task will be creating the configuration files - written in a dialect of XML - and getting the syntax correct such that the THREDDS magic you want to happen does indeed happen. The basic configuration is relatively easy, but if you want to use the full power of NcML for rewriting, modifying and combining datasets things get a bit harder. While the documentation is scattered and less than wholly coherent, there are plenty of examples available via web searching on which you can build. This document is hopefully a step towards making the documentation marginally less scattered and more coherent, and thus a useful guide for beginners.

1.2. General Overview

The THREDDS (Thematic Real-time Environmental Distributed Data Services) Data Server (TDS) is a package designed to make it as easy as possible for those who create scientific datasets to make them available to those who use them. The goal is to make datasets in many different formats and located in many different geographic locations available to users in a way that hides the data format and location information and presents only the data essential to the datasets themselves. It is a web server providing metadata and data access for scientific datasets. It employs several widely-used and useful data access protocols including OPeNDAP, WMS, WCS and HTTP.

The TDS is primarily used to manually or automatically create data catalogs that provide virtual directories of data and their associated metadata, and then make them available to users via various data interrogation and transfer protocols. For example, if you have a series of individual NetCDF or HDF files containing gridded velocities, temperatures, etc. at one day intervals over a year, TDS can be used to create a virtual, combined file that a user sees and can access as a single file containing the data for the entire year. All or chosen parts of this virtual file can be downloaded, viewed or processed by a user without the tedious need to deal with 365 individual files. The virtual file can then be accessed via most widely used data access protocols such as OPeNDAP, WMS, WCS and HTTP.

The TDS uses an internal library to read datasets in most of the formats used in the geosciences - NetCDF, OPeNDAP, HDF, GRIB, NEXRAD, etc. - and transform them into a common internal data format called the Common Data Model (CDM). This allows the use of datasets in disparate formats, with the CDM library doing all the hard work of converting them all into a common internal format for further processing.

The TDS uses the NetCDF Markup Language (NcML) to modify CDM datasets and to create virtual aggregations of them. That is, NcML operates on the internal CDM versions of the original files - whatever their native format - modifying and aggregating them to produce virtual datasets.

NcML can be used to add, delete or change the metadata in the original files. The metadata is data about the data included as the initial or header portion of, for example, a NetCDF file. This metadata includes dimensions, variables and attributes. A short example of a typical NetCDF header is:

dimensions:
        eta_rho = 128 ;
        xi_rho = 256 ;
        wind_time = UNLIMITED ; // (224 currently)
variables:
        float lon_rho(eta_rho, xi_rho) ;
                lon_rho:long_name = "longitude of RHO-points" ;
                lon_rho:units = "degree_east" ;
        float lat_rho(eta_rho, xi_rho) ;
                lat_rho:long_name = "latitude of RHO-points" ;
                lat_rho:units = "degree_north" ;
        double wind_time(wind_time) ;
                wind_time:units = "days since 1970-01-01 0:00:00 0:00" ;
                wind_time:long_name = "time since initialization" ;
                wind_time:standard_name = "wind_time" ;
        float Uwind(wind_time, eta_rho, xi_rho) ;
                Uwind:units = "meter second-1" ;
                Uwind:long_name = "surface u-wind component" ;
                Uwind:time = "wind_time" ;
                Uwind:standard_units = "m s-1" ;
                Uwind:standard_name = "eastward_wind" ;
        float Vwind(wind_time, eta_rho, xi_rho) ;
                Vwind:units = "meter second-1" ;
                Vwind:long_name = "surface v-wind component" ;
                Vwind:time = "wind_time" ;
                Vwind:standard_units = "m s-1" ;
                Vwind:standard_name = "northward_wind" ;

wherein a dimension is eta_rho, a variable is Uwind, and an attribute is standard_name. If a key or desired piece of any of these types of metadata is missing from the original data file, NcML can be used to add it to the metadata presented as part of the full virtual data set. The original dataset will not be modified, but the virtual representation of it will include the modifications.
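
As a concrete illustration, the following is a minimal NcML sketch of this sort of metadata editing. It assumes the header above came from a local file named wind.nc (a made-up name), and it adds a global title attribute as well as a CF-style coordinates attribute that the original header lacks; the file on disk is untouched:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="wind.nc">
  <!-- add a global attribute describing the dataset -->
  <attribute name="title" value="Surface wind forcing fields" />
  <!-- add an attribute to an existing variable -->
  <variable name="Uwind">
    <attribute name="coordinates" value="lat_rho lon_rho" />
  </variable>
</netcdf>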

NcML is used to rename, add, delete and restructure variables from the original files. For instance, if you have a NetCDF file that contains a gridded temperature field on a known regular grid but doesn’t contain the data and metadata about that grid then you can add that information via NcML. As an example, if lon_rho and lat_rho were missing from the file represented by the header file in the example above, they could easily be added.
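
A hedged sketch of such a restructuring follows; the new names and values are purely illustrative. It renames an existing variable and adds a small new dimension and coordinate variable with explicit values:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="wind.nc">
  <!-- present Uwind to clients under a new name -->
  <variable name="u_wind" orgName="Uwind" />
  <!-- add a new dimension and a small coordinate variable with explicit values -->
  <dimension name="depth" length="4" />
  <variable name="depth" shape="depth" type="double">
    <attribute name="units" value="m" />
    <values>0 10 20 30</values>
  </variable>
</netcdf>

Large two-dimensional coordinate variables such as lon_rho and lat_rho would more typically be brought in from another file - for example via a union aggregation - than typed out as explicit values.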

NcML is used to combine or aggregate data from multiple CDM files.
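
For example - assuming a directory /data/winds/ (a made-up location) full of daily files that all share the record dimension wind_time from the header above - a joinExisting aggregation that presents them as a single virtual dataset looks roughly like this:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="wind_time" type="joinExisting">
    <!-- scan the directory and concatenate every .nc file along wind_time -->
    <scan location="/data/winds/" suffix=".nc" />
  </aggregation>
</netcdf>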

After the original datasets are converted into a common format and then modified to suit whatever needs are desired, they are made available over the internet via the most useful data transfer protocols for the geosciences.

  • An integrated HTTP server allows for a highlighted file name to be clicked on to download an actual version of the chosen virtual data set.

  • An integrated OPeNDAP server allows for the chosen virtual data set to be accessed via OPeNDAP protocols, which were developed to make access to data sets typically found in the geosciences - for instance, gridded fields - quick and easy.

  • An integrated WCS server allows the chosen data to be accessed as actual data files via the OGC Web Coverage Service protocol for gridded datasets.

  • An integrated WMS server allows the chosen data to be accessed as images via the OGC Web Map Service protocol for gridded datasets.

To summarize, the TDS reads in datasets in several different formats, converts them into a standard internal format, transforms and aggregates them into virtual datasets, and makes them available to users via several different data transfer methods.

1.3. Current Status

The current technical status of the TDS can always be found on the Technical Status Page, which contains news and announcements about the latest release version as well as the required prerequisite versions.

1.4. Roadmap

In this document you will learn how to:

2. Installation

Related pages with additional installation information:

2.1. Installing the Prerequisite Software

The two key prerequisites for installing THREDDS are Java and Apache Tomcat. Many operating systems are shipped with a recent version of Java, while Tomcat usually has to be installed separately. Be sure to note which versions of Java and Tomcat your version of THREDDS requires. This information is supplied on the Getting Started page along with even more detailed instructions on how to install both packages.

2.1.1. UNIX/Linux

A UNIX platform - preferably Linux - is a highly recommended choice on which to install and run your THREDDS server. In addition to being a stable platform on which to run the server, the THREDDS software was and still is being developed on a Linux platform. Also, the vast majority of the documentation - including this document - has been written by those employing Linux and contains many examples specific to that platform.

While the THREDDS server can be installed and run in a production mode on a Windows or OS X platform, it is not recommended and certainly not supported herein. Those platforms are better employed in a client rather than server capacity.

2.1.2. Java

Java is a programming language designed to have minimal implementation dependencies, which enables developers to write a program once that will run on any device that includes a Java installation. To accomplish this goal, Java programs are compiled to run on so-called virtual machines, which are basically a software version of a computer’s hardware CPU. Once a virtual machine is created for a specific hardware architecture, any Java program should run on that architecture. Java is especially useful for client-server web applications such as THREDDS.

Although it is unusual for a computing platform to not come with Java already installed, if the need arises the latest version of Java can be obtained at the download site at:

which has virtual machine packages for Windows, Linux and Solaris machines for both 32- and 64-bit architectures. Apple supplies their own version of Java for the OS X operating system.

The most recent officially released version of THREDDS is 4.2, which requires Java 1.6 or above and recommends 1.6u24 or greater.

Once you have figured out where the Java installation is located, you need to specify it via a global environment variable. An example, for a Java installation located at /opt/java, would be:

export JAVA_HOME=/opt/java

The location is highly variable by platform, so you may have to consult your sysadmin about this.

2.2. Installing Apache Tomcat

2.2.2. Introduction

Apache Tomcat is an open source implementation of the Java Servlet and JavaServer Pages technologies. A Java servlet is a Java programming language class used to extend the capabilities of servers that host web applications accessed by a request-response model. Servlets provide component-based, platform-independent methods for building web-based applications such as THREDDS. JavaServer Pages (JSP) is a technology for creating dynamically generated web pages based on HTML and XML. It is similar to the PHP web programming language but built on top of Java. Basically, Tomcat is an HTTP server that leverages the platform-independence of Java all the way up to the server level, enabling and allowing the portability of such web-based applications as THREDDS.

It should be noted that implementations of Java Servlet and JavaServer Pages other than Tomcat are available and can also be used with THREDDS. We have chosen to document Tomcat for the simple reason that it works well with THREDDS and that most of the available documentation is for that combination.

Tomcat is usually not included in standard operating system distributions and must be obtained from the Tomcat home site at:

Tomcat can be obtained in source code format and compiled for your specific platform, but it is recommended that you skip that chore and simply download a binary distribution appropriate to your platform.

The most recent officially released version of THREDDS is 4.2, which requires Tomcat 5.5 or above and recommends the latest version of Tomcat 6.x.

2.2.3. Quick Installation Procedure

WARNING

This section describes how to quickly install Tomcat. It is recommended only as a way to quickly check out the capabilities of Tomcat and THREDDS, and only then if you’re behind a firewall or on a machine that’s not connected to the internet. If you use this section to set up an operational THREDDS server connected to the Internet, it will be a matter of when and not if you will be successfully hacked.

Downloading

As of this writing (July 2014) the latest official stable release of THREDDS is 4.3, and it requires Tomcat 6+ and Java 6+, although it is recommended that you use Tomcat 7+ and Java 7+ for mostly reasons of security. The current development release is 4.5, and requires Tomcat 7+ and Java 7+.

The main Tomcat site is at:

and the version 7 download page is at:

For Linux platforms, you will download the tar.gz version, which is currently apache-tomcat-7.0.54.tar.gz. For Windows platforms, just give up now before it’s too late.

Installing

Once the apache-tomcat-7.0.54.tar.gz file has been obtained, we need to install it somewhere. A good choice on Linux systems is in the /opt directory, and if we choose this option the installation will look like this:

su
[enter password to become super user]
cp apache-tomcat-7.0.54.tar.gz /opt
cd /opt
tar xzvf apache-tomcat-7.0.54.tar.gz
ln -s apache-tomcat-7.0.54 tomcat

The symbolic link to tomcat will save some aggravation, and allow us to use /opt/tomcat in the following rather than a less aesthetically pleasing variable such as ${TOMCAT_ROOT}.

This uncompresses and unarchives the files contained in the distribution and creates a hierarchy of subdirectories in the tomcat directory that, upon entering the commands:

cd tomcat
ls -l

should look something like this:

drwxr-xr-x. 2 root root  4096 Apr 24 14:24 bin
drwxr-xr-x. 3 root root  4096 Apr 24 14:24 conf
drwxr-xr-x. 2 root root  4096 Apr 24 14:08 lib
-rw-r--r--. 1 root root 37951 Nov 28 04:22 LICENSE
drwxr-xr-x. 2 root root  4096 Apr 24 14:24 logs
-rw-r--r--. 1 root root   558 Nov 28 04:22 NOTICE
-rw-r--r--. 1 root root  8680 Nov 28 04:20 RELEASE-NOTES
-rw-r--r--. 1 root root  6670 Nov 28 04:22 RUNNING.txt
drwxr-xr-x. 2 root root  4096 Apr 24 14:08 temp
drwxr-xr-x. 7 root root  4096 Nov 28 04:20 webapps
drwxr-xr-x. 3 root root  4096 Apr 24 14:24 work
Setting the Environment

To save time, frustration and grief in the long run, it is strongly recommended that a setenv.sh file be created in the directory /opt/tomcat/bin that is created during installation. To do this, execute the following command:

cd /opt/tomcat/bin

and use a text editor to create a file containing the following commands. For example, if we use the vi text editor we would issue the command:

vi setenv.sh

and, upon entering editing mode, enter the following lines:

#!/bin/sh
#
# ENVARS for Tomcat and TDS environment
#
JAVA_HOME="/opt/java"
export JAVA_HOME

CATALINA_HOME="/opt/tomcat"
export CATALINA_HOME

JAVA_OPTS="-Xmx4096m -Xms512m -server -Djava.awt.headless=true \
-Djava.util.prefs.systemRoot=$CATALINA_HOME/content/thredds/javaUtilPrefs"
export JAVA_OPTS

On 32-bit platforms where RAM size may be smaller than 4 GB, we can replace -Xmx4096m with -Xmx1500m in the above.

See the Security Measures section below for additional steps recommended for production installations of THREDDS.

Starting the Server

There are a couple of options for starting Tomcat, with extensive details available at:

Basically, Tomcat can be started manually or automatically. A manual start - given the /opt location into which we have installed the package - would be performed via:

/opt/tomcat/bin/startup.sh

with a shutdown performed similarly via:

/opt/tomcat/bin/shutdown.sh

If these commands do not work, check the commands you used to set up the environment via creating the setenv.sh file.

Tomcat can also be run automatically as a UNIX daemon using a program called jsvc that is included in Tomcat binary distributions. The following commands will compile and install this program:

cd /opt/tomcat/bin
tar xzvf commons-daemon-native.tar.gz
cd commons-daemon-1.0.x-native-src/unix
./configure
make
cp jsvc ../..

This procedure puts the jsvc binary in the /opt/tomcat/bin directory and allows you to run Tomcat as a daemon - from within the /opt/tomcat directory, since the paths below are relative to it - via:

/opt/tomcat/bin/jsvc -cp ./bin/bootstrap.jar -outfile ./logs/catalina.out
            -errfile ./logs/catalina.err org.apache.catalina.startup.Bootstrap

Additional information about this procedure including several additional options can be found at:

Checking for a Running Server

Once you have started the Tomcat server via one of the procedures above, you can verify that it is running either via the command line with:

ps -ef | grep tomcat

where, if the server is running, you’ll see something like the following confusing mess:

baum     10781 23963  0 13:40 pts/26   00:00:00 grep tomcat
root     18619     1  0 Apr24 pts/32   00:04:13 /usr/bin/java
   -Djava.util.logging.config.file=/opt/tomcat/conf/logging.properties -Xmx4096m
   -Xms512m -XX:MaxPermSize=180m -server -Djava.awt.headless=true
   -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
   -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
   /opt/tomcat/bin/bootstrap.jar -Dcatalina.base=/opt/tomcat
   -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat/temp
   org.apache.catalina.startup.Bootstrap start

or, if it’s not running, you’ll see just the first line containing grep tomcat.

You can also check for a running server by opening a browser window or tab and going to:

where you should see the welcome page for Apache Tomcat looking something like this:

thredds/tomcat.png
Figure 1. Tomcat Welcome Page
Troubleshooting

Tomcat troubleshooting starts with checking the logs in the directory:

/opt/tomcat/logs

with the most useful messages usually ending up in the main log file, that is:

/opt/tomcat/logs/catalina.out

although you may not be able to make out hide nor hair of what’s happening in there due to the recondite nature of Java and its error messages. Try Google or your local sysadmin if you’re hopelessly confused on the matter.

2.2.4. Secure Installation Procedure

A secure installation procedure for production environments follows.

Create a tomcat User

Create a dedicated user and group - which does not have root privileges - for running Tomcat. And why not call this user tomcat?

su
[root password]
/usr/sbin/adduser tomcat
Download and Install

As the tomcat user, download and install the recommended version of Tomcat as detailed in the quick installation section. You can become the tomcat user via the following:

su
[root password]
su - tomcat

After you have performed the tar xzvf apache-tomcat-7.0.54.tar.gz and ln -s apache-tomcat-7.0.54 tomcat7 steps, check to ensure that the tomcat user does indeed own the directory created. Issue the command:

ls -l /opt

to get a result like:

...
drwxr-xr-x.  5 root root     4096 Aug  1  2013 ncar
drwxr-xr-x. 10 tomcat tomcat 4096 May  1  2012 tomcat
drwxr-xr-x.  3 root root     4096 Aug 30  2012 vtk
...

If the tomcat directory doesn’t show tomcat as user and group, then modify them appropriately as root using:

chown -R tomcat:tomcat tomcat7

All the additional steps should also be performed as the tomcat user, unless otherwise indicated.

Create a setenv.sh Script

In the /opt/tomcat7/bin directory, create a setenv.sh script within which you will set the values of JAVA_HOME, JAVA_OPTS and CATALINA_BASE. An example is:

#!/bin/sh
JAVA_HOME="/opt/jdk7"
export JAVA_HOME

CATALINA_BASE="/opt/tomcat7"
export CATALINA_BASE

JAVA_OPTS="-Xmx4g -Xms512m -server -Djava.awt.headless=true -Djava.util.prefs.systemRoot=$CATALINA_BASE/content/thredds/javaUtilPrefs"
export JAVA_OPTS

where:

  • JAVA_HOME - The root directory of the required Java distribution.

  • CATALINA_BASE - The root directory of the required Tomcat distribution.

  • JAVA_OPTS - Various options used for starting the server, e.g. -Xmx4g for specifying the maximum amount of memory to use.

Obtain a Certificate from a Certificate Authority

You can create your own self-signed certificate instead, but it will result in annoying and/or confusing browser warnings for your potential users.

Modify server.xml

Make the following changes to the file /opt/tomcat7/conf/server.xml.

  • Enable digest passwords by commenting out UserDatabaseRealm and enabling MemoryRealm.

  • Enable SSL by uncommenting the SSL connector listening on port 8443, and adding the required keystoreFile and keystorePass attributes.

  • Enable compression by adding compression and compressableMimeType attributes to the 8080 connector.

  • Enable access logging by uncommenting AccessLogValve and changing the prefix, suffix and pattern attributes. (A sketch of the compression and access-logging changes follows this list.)
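
A rough sketch of the compression and access-logging changes within server.xml follows; the attribute values are examples rather than requirements:

<!-- the standard 8080 connector with compression enabled -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000" redirectPort="8443"
           compression="on"
           compressableMimeType="text/html,text/xml,text/plain,application/xml" />

<!-- the access log valve, uncommented and given prefix/suffix/pattern values -->
<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="access." suffix=".log"
       pattern="combined" />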

Create Password Digests

For each user, create a SHA1 password digest using the /opt/tomcat7/bin/digest.sh script or an online service such as:

The file /opt/tomcat7/conf/tomcat-users.xml stores user names and passwords. By default, the passwords are stored as clear text, e.g.

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="tdsConfig"/>
  <role rolename="manager"/>
  <role rolename="admin"/>
  <user username="sysadmin" password="yrPassword" roles="manager,admin"/>
  <user username="cataloger" password="myPassword" roles="tdsConfig"/>
</tomcat-users>

To create encrypted passwords in digest form, use digest.sh, e.g.

cd /opt/tomcat7/bin
./digest.sh -a SHA yrPassword
yrPassword:aa01ea2afaae56c2b7da5e25ec18c505e58f12d7
./digest.sh -a SHA myPassword
myPassword:5413ee24723bba2c5a6ba2d0196c78b3ee4628d1

You can then cut and paste these into tomcat-users.xml, e.g.

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="tdsConfig"/>
  <role rolename="manager"/>
  <role rolename="admin"/>
  <user username="sysadmin" password="aa01ea2afaae56c2b7da5e25ec18c505e58f12d7" roles="manager,admin"/>
  <user username="cataloger" password="5413ee24723bba2c5a6ba2d0196c78b3ee4628d1" roles="tdsConfig"/>
</tomcat-users>

Now edit the server.xml file to tell it to use digested passwords via adding the Realm element to the Host element with name localhost, e.g.

<Host name="localhost" debug="0" appBase="/opt/tomcat/webapps" unpackWARs="true" autoDeploy="true"
     xmlValidation="false" xmlNamespaceAware="false">
  <Realm className="org.apache.catalina.realm.MemoryRealm" digest="SHA" />
  ...
</Host>
Modify tomcat-users.xml

Make the following modifications to /opt/tomcat7/conf/tomcat-users.xml.

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
 <role rolename="manager-gui"/>
 <role rolename="tdsConfig" description="can change THREDDS configuration files"/>
 <role rolename="tdsMonitor" description="can download THREDDS log files"/>
 <role rolename="tdsTrigger" description="can trigger featureCollection reloads"/>

 <user username="generalissimo" password="digest1" roles="manager-gui"/>
 <user username="capitan" password="digest2" roles="tdsTrigger,tdsConfig,tdsMonitor"/>
 <user username="tdm" password="digest3" roles="tdsTrigger"/>
</tomcat-users>

wherein:

  • The roles of manager-gui, tdsConfig, tdsMonitor and tdsTrigger have been defined.

  • You’ve added yourself as a user with the roles manager-gui and tdsConfig using the digest password created in the previous section.

Modify web.xml

Make the following modifications to /opt/tomcat7/webapps/manager/WEB-INF/web.xml:

  • Ensure that the manager is only available via SSL by adding a user-data-constraint with a transport-guarantee value of CONFIDENTIAL inside the security-constraint element, as shown in the sketch after this list.
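
A sketch of the relevant fragment follows. The security-constraint, web-resource-collection and auth-constraint elements already exist in the manager’s web.xml - their exact contents vary a little between Tomcat versions - so only the user-data-constraint element needs to be added:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>HTML Manager interface (for humans)</web-resource-name>
    <url-pattern>/html/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>manager-gui</role-name>
  </auth-constraint>
  <!-- added: force the manager application to be accessed over SSL -->
  <user-data-constraint>
    <transport-guarantee>CONFIDENTIAL</transport-guarantee>
  </user-data-constraint>
</security-constraint>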

Remove Unnecessary Applications

Remove any unnecessary/unused applications from /opt/tomcat7/webapps. Here are some comments about the various applications, not all of which you will necessarily find in your Tomcat distribution. You might also find some not listed here, and they should probably be deleted unless you have a really good reason for keeping them.

  • The servlets-examples, jsp-examples and/or examples applications should be removed from a production server to minimize security exposure. This can be done via the manager application or by deleting them via the command-line from the /opt/tomcat7/webapps directory.

  • The tomcat-docs/docs, balancer and webdav applications are probably harmless, at least until some bored Romanian hacker finally gets around to them. Delete them.

  • The ROOT application contains the server’s main page. Any file added under /opt/tomcat7/webapps/ROOT will be served.

  • The admin and manager applications are used for remote management. To use these, users with roles admin and manager, respectively, must be added. The applications then become accessible from the main Tomcat page, and can be used to add further users, start and stop webapps, etc. The IP addresses allowed to run these applications should be restricted by editing the admin.xml and manager.xml files in the /opt/tomcat7/conf/Catalina/localhost/ directory.

Build NetCDF Library

Build the NetCDF-C library on your server in /opt/netcdf.

The Tomcat and TDS Security Tutorial at:

is an excellent, detailed look at how to secure your Tomcat and TDS installation.

The THREDDS site has a checklist for production installation available at:

that details several additional installation steps recommended for securing a production Tomcat installation. Basically, if you do this now you’ll be dealing much less with the predations of Romanian hackers later.

2.2.5. Performance Tuning

If you are having performance issues with Tomcat - for instance, it’s running very slowly - then you have some options to tune it for optimum performance. A good overview of these options can be found at:

although they fall outside of the purview of our basic installation procedure.

2.3. Installing THREDDS

2.3.2. Obtaining THREDDS

The latest THREDDS release can always be found at:

where you’ll also find information about the versions of Java and Tomcat needed for the latest release.

The TDS is available as either a jar or a war file. A jar file is a Java archive file, which bundles Java binary classes into a single file for easy accessibility. A war file is a web application archive file, which bundles JSP/Servlet classes into a single file for the same reason. Since we’ve already installed the Tomcat web application server for just this purpose, we will skip the jar and download the war version of THREDDS, which is thredds.war.

2.3.3. Installing the Web Archive File

The TDS Java web archive file is installed in the webapps subdirectory of the main Tomcat installation which, in our example, is:

/opt/tomcat/webapps

The installation is performed simply by copying or moving the thredds.war into that directory, after which a running Tomcat server will automatically unpack and start the programs therein. You can tell that the process has at least started if you can see that a thredds subdirectory has been created under /opt/tomcat/webapps, i.e.

/opt/tomcat/webapps/thredds

If nothing happens, first check to see if you have successfully started your Tomcat server, and then check the logs.

2.3.4. Checking for a Running THREDDS

If the installation has proceeded correctly, you can point your browser at:

and, if you have been successful, the dynamically generated default THREDDS home page will appear, looking something like this:

thredds/thredds.png
Figure 2. THREDDS Default Welcome Page

2.3.5. Troubleshooting

THREDDS Catalog Changes Aren’t Implemented Via Restarting Tomcat

Sometimes tomcat doesn’t shut down properly and additional steps have to be taken beyond shutdown and startup. The procedure to solve this problem is:

  1. Look in catalina.out for a message indicating that tomcat didn’t shut down, e.g. java.net.BindException: Address already in use:8080

  2. If so, then make sure it’s dead via:

    1. Enter ps -ef | grep java to find the process ID.

    2. Enter kill <pid> or kill -9 <pid> to really kill it.

    3. Verify it’s dead via ps -ef | grep java. If not, repeat this cycle.

  3. Restart and check tomcat via:

    1. Run startup.sh.

    2. Check catalina.out to look for a message that tomcat has correctly started.

    3. Enter ps -ef | grep java to verify that a new tomcat process is running.

When All Else Has Failed, Ask the Experts

The TDS development crew will provide help if you’re utterly lost. They have a strict procedure you must follow, though, the steps of which are:

  1. First, you have to be running the latest official stable release of TDS, as can be found here.

  2. Second, get a clean set of logs that capture your specific problem via:

    1. Stopping the tomcat server.

    2. Installing the latest release if it’s not already installed, which will hopefully make the problem go away.

    3. Remove and/or copy to elsewhere all files from the Tomcat logs directory and the TDS logs directory - in our example, /opt/tomcat/logs and /opt/tomcat/content/thredds/logs.

    4. Restart the tomcat server.

    5. See if the problem happens again, while hoping that installing the latest release made everything copacetic.

    6. Gather up everything in /opt/tomcat/logs and /opt/tomcat/content/thredds/logs.

    7. Send the logs and a detailed description of exactly what was done to cause the problem, and what the problem looks like. If the problem didn’t occur immediately, note what time it did occur so that time can be correlated with the logs.

3. Configuring the THREDDS Server

3.1. Overview

You will be creating or editing configuration files for both your server and your datasets. First, you will be configuring the server itself. There is an XML file that contains information about the server itself, e.g. its name, who owns and runs it, what organization is involved, etc. It also allows you to configure how your HTML pages will appear to users, how large your caches will be, whether or not various services will be enabled, etc. Basically, you establish the way your server will look and perform via configuring this file. This is something you do when you first install the TDS, and then perhaps tweak occasionally as your situation changes and you might need to offer more services, establish larger caches, etc.

The second type of configuration is done within data catalogs. That is where you configure how your datasets are served to your users. This is significantly more complicated and involved than configuring the server itself, and will be more or less an ongoing process as you create or acquire more datasets and gradually learn the full capabilities of the TDS and apply them to your datasets.

Both the server and the data catalogs are configured using human- and machine-readable files written in a dialect of XML. Since XML is a fairly abstract concept, we’ll now take a brief look at what it is.

3.1.1. XML

The eXtensible Markup Language or XML is a markup language that defines a set of rules for encoding documents in a format that is both human- and machine-readable. An XML document is immediately recognizable by the number of angle brackets contained therein, and sort of resembles HTML markup although the two markup languages have different purposes. A good way of looking at this is that HTML is for form and XML is for content.

XML documents consist mainly of elements and attributes. An element is a component that begins with a start-tag and ends with a matching end-tag, with an example being:

<serviceName>odap</serviceName>

where serviceName is the element name. An attribute is a name/value pair within a start-tag, an example being:

<dataset name="TDS Tutorial"/>

where within the element dataset the attribute name is name and the value is TDS Tutorial.

An XML schema or grammar is a description of a specific type of XML document, usually expressed in terms of constraints on the structure and content of documents of that type. Typically a schema constrains the set of elements that may be used in a document, which attributes can be applied to them, the order in which they may appear, and the allowable parent/child relationships. Basically, a schema allows you to specify what you really need in a document while excluding extraneous material. The XML schema for the THREDDS markup language can be found at:

although this is best avoided until you’ve spun up a bit further on the basics of THREDDS configuration.

An XML namespace is used to provide uniquely named elements and attributes in an XML document, that is, a way to avoid element name conflicts. The namespace for the Dataset Inventory Catalog Specification Version 1.0 for THREDDS can be found at:

although this is also best avoided until later. The THREDDS schema and namespace documents, though, are the final arbiters of what constitutes correct syntax and grammar in THREDDS configuration catalogs, and as such override any secondary documents such as this one.

3.2. THREDDS Configuration Directory

All the configuration files are located in a single directory in the standard, default TDS distribution. In our installation example, this directory is located at:

/opt/tomcat/content/thredds

They are kept in a directory separate from the webapps directory into which the software is installed so that the software can be upgraded without disturbing the configuration files. This subdirectory should look something like this:

drwxr-xr-x. 4 root root 4096 May  1 14:05 cache
-rw-r--r--. 1 root root 1457 May  1 14:05 catalog.xml
-rw-r--r--. 1 root root 3173 May  1 14:05 enhancedCatalog.xml
drwxr-xr-x. 2 root root 4096 May  1 14:05 logs
drwxr-xr-x. 4 root root 4096 May  1 14:05 public
drwxr-xr-x. 2 root root 4096 May  1 14:05 root
-rw-r--r--. 1 root root 6951 May  1 14:05 threddsConfig.xml
-rw-r--r--. 1 root root 3765 May  1 14:05 wmsConfig.xml

The basic configuration is performed within the threddsConfig.xml and catalog.xml files. The content configuration is mostly done in the latter file, with the former containing most of the configuration options for tweaking the server itself. We’ll start with configuring the server.

3.3. Configuring the Server

The TDS configuration file threddsConfig.xml, located in our example at:

/opt/tomcat/content/thredds/threddsConfig.xml

allows the TDS administrator to set parameters that control the general behavior of the TDS. Most are set to reasonable defaults, although some should be changed as soon as possible. These include those that describe the server, provide contact information, and change the theme of the HTML pages generated by the server.

Most of the threddsConfig.xml file sections are commented out in the default distribution. The sections that are not commented out and which should probably be modified upon initial server installation will be covered in the following sections. The sections that are commented out will be covered in further sections that are pertinent to their possible use.

3.3.1. Modifying Server Information in the serverInformation Element

The default serverInformation element looks like:

  <serverInformation>
    <name>Initial TDS Installation</name>
    <logoUrl>threddsIcon.gif</logoUrl>
    <logoAltText>Initial TDS Installation</logoAltText>

    <abstract>Scientific Data</abstract>
    <keywords>meteorology, atmosphere, climate, ocean, earth science</keywords>

    <contact>
      <name>Support</name>
      <organization>My Group</organization>
      <email>support@my.group</email>
      <!--phone></phone-->
    </contact>
    <hostInstitution>
      <name>My Group</name>
      <webSite>http://www.my.site/</webSite>
      <logoUrl>myGroup.gif</logoUrl>
      <logoAltText>My Group</logoAltText>
    </hostInstitution>
  </serverInformation>

Part or all of this information is displayed on all pages generated by TDS, for example the server information document located at:

http://localhost:8080/thredds/serverInfo.html

which for the default configuration looks like:

thredds/tds_info.png
Figure 3. Default THREDDS Server Information Page

An XML document containing the same information is:

<serverInformation>
  <name>Initial TDS Installation</name>
  <!--logoUrl>threddsIcon.gif</logoUrl-->
  <!--logoAltText>Initial TDS Installation</logoAltText-->
  <webapp>
    <name>THREDDS Data Server</name>
    <version>4.2.10</version>
    <versionBuildDate>20120417.2151</versionBuildDate>
  </webapp>
  <abstract>Scientific Data</abstract>
  <keywords>
    meteorology, atmosphere, climate, ocean, earth science
  </keywords>
  <contact>
    <name>Support</name>
    <organization>My Group</organization>
    <email>support@my.group</email>
  </contact>
  <hostInstitution>
    <name>My Group</name>
    <webSite>http://www.my.site/</webSite>
    <!--logoUrl>myGroup.gif</logoUrl-->
    <!--logoAltText>My Group</logoAltText-->
  </hostInstitution>
</serverInformation>

If you want something to look different on the web page or in the XML file, then change the corresponding information in the serverInformation element.

3.3.2. Modifying HTML Appearance in the htmlSetup Element

The default htmlSetup element is:

<htmlSetup>
    <standardCssUrl>tds.css</standardCssUrl>
    <catalogCssUrl>tdsCat.css</catalogCssUrl>

    <folderIconUrl>folder.gif</folderIconUrl>
    <folderIconAlt>Folder</folderIconAlt>
    <datasetIconUrl>dataset.gif</datasetIconUrl> <!-- Not currently used. -->
    <datasetIconAlt>Dataset</datasetIconAlt>     <!-- Not currently used. -->
</htmlSetup>

The standardCssUrl and catalogCssUrl elements set the names of the Cascading Style Sheets (CSS) files used by the server. The default location for these files is:

/opt/tomcat/webapps/thredds/

The CSS file in the catalogCssUrl element is used for all pages that are HTML catalog views, while the one in the standardCssUrl element is used in all other HTML pages generated.

CSS is a style sheet language used for describing the presentation semantics - the look and formatting - of a document written in a markup language such as HTML. If you want to change how your dynamically generated web pages will look, either edit the CSS files or supply your own new ones. The details of CSS and how to do so are way beyond the scope of this document.

3.3.3. Additional Configuration Options

The elements that are commented out upon initial installation are:

  • catalogRoot - For specifying private catalogs not visible from the public root catalog.

  • crawlableDatasetPlugins - For enabling the CrawlableDataset framework.

  • nj22Config - For CDM configuration via the NetCDF-Java library.

  • DiskCache - For specifying the parameters for Disk Caching for storing temporary files.

  • NetcdfFileCache - For specifying a NetCDF File Cache.

  • HTTPFileCache - For limiting the number of allowable open datasets.

  • GribIndexing - For writing GRIB indexes.

  • AggregationCache - For specifying the persistence period of joinNew aggregations.

  • Aggregation - For choosing a template dataset for an aggregation.

  • NetcdfSubsetService - For turning on the NetCDF Subset Service.

  • Opendap - For turning on the OPeNDAP Service.

  • WCS - For turning on the WCS Service.

  • WMS - For turning on the WMS Service.

  • NCISO - For turning on the ncISO Service.

  • CatalogGen - For turning on catalog generation.

  • DLwriter - For turning on the DLwriter Service.

  • DqcService - For turning on the DQC Service.

  • Viewer - For enabling Viewer Links

  • DatasetSource - For adding a Dataset Source.

  • Logging - For modifying TDS Logging parameters.

Each of these will be covered in the appropriate section below.

3.4. Enabling Remote Management

The default option for configuring the TDS is by editing the configuration files on the server machine, and then restarting the server after the modifications have been made. There is also an option for remotely configuring and debugging TDS, although it must be deliberately enabled due to the potential extra security problems it can cause.

If you are comfortable using command-line tools to make changes to the various configuration files, then you might want to skip enabling the remote management capabilities. If you are unfamiliar with using the command line to invoke a text editor - not a word processor - to edit configuration files, then it would be a good idea to enable the remote configuration option, which will let you make changes using a web GUI, although only after a risk assessment for your particular situation. A good compromise would be to enable remote management, turn it on while actively configuring the server, and then turn it off when you go into production mode. You will still, however, have to use the command-line tools to enable remote management in the first place.

3.4.1. Configure Tomcat Users

This…

3.4.2. Enable SSL

Enabling the Secure Sockets Layer (SSL) ensures that sensitive information going to and from the server cannot be intercepted and read - or at least makes it marginally more difficult to do so - by encrypting that information.

Create Keystore Filename

The process starts with the choice of a keystore filename ${keystore_filename}, which has a default value of ${user_home}/.keystore.

Create a Certificate

Next, a certificate is created by executing the following command:

${java_home}/bin/keytool -genkey -alias tomcat -keyalg RSA -validity 365 -keystore ${keystore_filename}

Note: The validity option sets the number of days for which the generated certificate will remain valid.

The keytool command will create the following series of prompts. Please read the notes below the list of prompts before actually going through them.

Enter keystore password: mypassword

What is your first and last name? [Unknown]: www.mydomain.edu

What is the name of your organizational unit? [Unknown]:

What is the name of your organization? [Unknown]:

What is the name of your City or Locality? [Unknown]:

What is the name of your State or Province? [Unknown]:

What is the two-letter country code for this unit? [Unknown]:

Is CN=*.ucar.edu, OU=UCAR Web Engineering Group, O=University Corporation for Atmospheric Research, L=Boulder, ST=Colorado, C=US correct?

[no]: yes

Enter key password for <tomcat>

(RETURN if same as keystore password):

Note: The answer to the What is your first and last name? question must be the name of the Tomcat server host machine, which is www.mydomain.edu in this example.

Note: The same password must be specified for both Enter keystore password and Enter key password for <tomcat>. The default value is changeit.

Note: Enter the other values as is appropriate.

Once this procedure has finished, it will create a self-signed certificate that is placed into the ${keystore_filename} keystore.

Finally, the file ${tomcat_home}/conf/server.xml must be edited. The section that configures the SSL port - which is different than the regular 8080 port used by the server - should be uncommented and then modified to look like the following example:

    <!-- Define a SSL Coyote HTTP/1.1 Connector on port 8443 -->
    <Connector port="8443"  protocol="org.apache.coyote.http11.http11NioProtocol"
               SSLEnabled="true"
               maxThreads="150" minSpareThreads="25"
               enableLookups="false" disableUploadTimeout="true"
               acceptCount="100" scheme="https" secure="true"
               clientAuth="false" sslProtocol="TLS" keystoreFile="${keystore_filename}"
      keystorePass="mypassword"/>

This enables Tomcat to use port 8443 for the HTTPS protocol, which uses SSL. All sensitive accesses will be redirected to port 8443 from port 8080.

3.4.3. Installing a Certificate from an Authority

This…

3.5. TDS Remote Debugging

Once SSL is enabled, you can remotely debug and configure the TDS, although you will need to log in as a user who has the tdsConfig role.

4. THREDDS Client Catalogs, or, the Client View

The THREDDS Data Server (TDS) communicates to clients by sending them a THREDDS Catalog that describes what datasets the server has, and how they can be accessed. This section will describe the client view of the catalog. It will tell you how to decode and understand the XML elements and attributes that comprise the client view of the catalog.

A catalog is typically accessed via a rendered HTML version of the XML THREDDS catalog. An HTML page for a THREDDS catalog is:

and the corresponding XML version is:

where the html in the URL has simply been replaced by xml. Depending on your browser and how you have set it up, you may either see the XML version rendered or be asked if you want to download the XML file.

If you want to configure a TDS, you additionally need to visit the Configuration Catalogs section to study the server-side specializations described therein. The combined information in this Client section and the Configuration Catalog section will enable you to configure TDS to serve your datasets to interested users.

4.1. Roadmap

In this section, you will:

  • be introduced to the concept of client catalogs and the services and datasets used to construct them;

  • learn about the base catalog elements from which client catalogs are constructed including:

    • the catalog element, the top-level element that creates a client catalog and contains all the other elements;

    • the service element representing a data access service type that is a method for exploring and obtaining the available TDS data;

    • the dataset element that represents a named, logical set of data for presentation to a user;

    • the access element that defines how a dataset can be accessed through a service element;

    • the catalogRef element that enables the nesting of catalogs; and

    • the Xlink element that allows the addition of URLs and human-readable descriptions to the client catalog.

  • learn about the digital library metadata elements used by digital libraries and discovery centers as well as for the annotation and documentation of datasets.

4.2. Introductory Examples

We now move on to the significantly trickier task of configuring data catalogs. We’ll start with looking at the two data catalogs that are included in the default distribution.

4.2.1. Default Catalogs

The default TDS distribution contains a couple of data catalog files. The first and most basic is catalog.xml and it looks like:

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="THREDDS Server Default Catalog : You must change this to fit
your server!"
        xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
        xmlns:xlink="http://www.w3.org/1999/xlink">

  <service name="all" base="" serviceType="compound">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/"
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/"
/>
    <!--service name="wcs" serviceType="WCS" base="/thredds/wcs/" /-->
    <!--service name="wms" serviceType="WMS" base="/thredds/wms/" /-->
    <!--service name="ncss" serviceType="NetcdfSubset"
base="/thredds/ncss/grid/" /-->
  </service>

  <datasetRoot path="test" location="content/testdata/" />

  <dataset name="Test Single Dataset" ID="testDataset" serviceName="odap"
           urlPath="test/testData.nc" dataType="Grid"/>

  <dataset name="Test Single Dataset 2" ID="testDataset2" serviceName="odap"
urlPath="test/testData2.grib2" dataType="Grid"/>

  <datasetScan name="Test all files in a directory" ID="testDatasetScan"
path="testAll" location="content/testdata">

    <metadata inherited="true">
      <serviceName>all</serviceName>
      <dataType>Grid</dataType>
    </metadata>

    <filter>
      <include wildcard="*eta_211.nc"/>
    </filter>

  </datasetScan>

  <catalogRef xlink:title="Test Enhanced Catalog"
xlink:href="enhancedCatalog.xml" name=""/>

</catalog>

The second data catalog shipped by default is enhancedCatalog.xml and is both an example of how to nest catalogs, and of how to employ many of the more advanced configuration options available. It looks like:

<?xml version="1.0" encoding="UTF-8"?>
<catalog
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
   name="Unidata THREDDS-IDD NetCDF-OpenDAP Server" version="1.0.1">

  <service name="latest" serviceType="Resolver" base="" />
  <service name="both" serviceType="Compound" base="">
    <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/" />
    <service name="HTTPServer" serviceType="HTTPServer"
base="/thredds/fileServer/" />
  </service>

  <dataset name="NCEP Model Data">
    <metadata inherited="true">
      <serviceName>both</serviceName>
      <authority>edu.ucar.unidata</authority>
      <dataType>Grid</dataType>
      <dataFormat>NetCDF</dataFormat>
      <documentation type="rights">Freely available</documentation>
      <documentation
xlink:href="http://www.emc.ncep.noaa.gov/modelinfo/index.html"
xlink:title="NCEP Model documentation"></documentation>
      <creator>
        <name vocabulary="DIF">DOC/NOAA/NWS/NCEP</name>
        <contact url="http://www.ncep.noaa.gov/"
email="http://www.ncep.noaa.gov/mail_liaison.shtml" />
      </creator>
      <publisher>
        <name vocabulary="DIF">UCAR/UNIDATA</name>
        <contact url="http://www.unidata.ucar.edu/"
email="support@unidata.ucar.edu" />
      </publisher>
      <timeCoverage>
        <end>present</end>
        <duration>14 days</duration>
      </timeCoverage>
    </metadata>

    <datasetScan name="ETA Data" ID="testEnhanced"
                 path="testEnhanced" location="content/testdata/"
                 harvest="true">
      <metadata inherited="true">
        <documentation type="summary">NCEP North American Model : AWIPS 211
                    (Q) Regional - CONUS (Lambert Conformal). Model runs are
made at 12Z and 00Z,
                    with analysis and forecasts every 6 hours out to 60 hours.
Horizontal = 93 by
                    65 points, resolution 81.27 km, LambertConformal
projection. Vertical = 1000
                    to 100 hPa pressure levels.</documentation>
        <geospatialCoverage>
          <northsouth>
            <start>26.92475</start>
            <size>15.9778</size>
            <units>degrees_north</units>
          </northsouth>
          <eastwest>
            <start>-135.33123</start>
            <size>103.78772</size>
            <units>degrees_east</units>
          </eastwest>
          <updown>
            <start>0.0</start>
            <size>0.0</size>
            <units>km</units>
          </updown>
        </geospatialCoverage>
        <variables vocabulary="GRIB-1" />
        <variables vocabulary="">
          <variable name="Z_sfc" vocabulary_name="" units="gp m">Geopotential
height, gpm</variable>
        </variables>
      </metadata>

      <filter>
        <include wildcard="*eta_211.nc" />
      </filter>
      <addID/>
      <sort>
        <lexigraphicByName increasing="false"/>
      </sort>
      <addLatest/>
      <addDatasetSize/>
      <addTimeCoverage
datasetNameMatchPattern="([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})_eta_211.nc$"
                       startTimeSubstitutionPattern="$1-$2-$3T$4:00:00"
                       duration="60 hours" />
    </datasetScan>
  </dataset>
</catalog>

The next several sections will be spent going over in complete and tedious detail how each and every part of these example catalogs works, so you needn’t be running away screaming just yet. We’ll start with very simple catalogs doing very simple things, and move on to more complex examples until even the enhancedCatalog.xml catalog looks like a familiar old friend. As each journey begins with but a single step, our first catalog example will begin with but a single dataset.

4.2.2. Services and Datasets

The simple and complete THREDDS catalog that follows defines a single service (OPeNDAP) that serves a single dataset (110312006.nc).

Example 1 - Basic THREDDS Catalog

(1)<?xml version="1.0" ?>
(2)<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
(3)  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
(4)  <dataset name="SAGE III Ozone Loss for Oct 31 2006" serviceName="dodsServer" urlPath="sage/110312006.nc"/>
(5)</catalog>

where:

  • The first line is boilerplate indicating that this is indeed an XML document.

  • The second line is a catalog element that declares the THREDDS catalog namespace with the given xmlns attribute. This is the THREDDS namespace document we previously discussed in reference to XML namespaces.

  • The third line declares a service element with attributes name, serviceType and base with respective values dodsServer, OpenDAP and /thredds/dodsC.

  • The fourth line declares a dataset element with attributes name, serviceName and urlPath with the given values.

  • The fifth line closes the catalog element.

The indentation of the sub-elements service and dataset is not required, but can be very useful for avoiding confusion when many layers of nesting are needed as we will see in future examples.

Basically, lines 1, 2 and 5 are boilerplate that can be ignored beyond acknowledging their mandatory presence for a complete THREDDS catalog, while lines 3 and 4 perform the actual work of specifying a dataset and a method for serving it over the internet.

Example 2 - Nuking the Deprecated Elements

The use of serviceName as an attribute for dataset has been deprecated in favor of its use as an element.

(1)<?xml version="1.0" ?>
(2)<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
(3)  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
(4)  <dataset name="SAGE III Ozone Loss for Oct 31 2006" urlPath="sage/110312006.nc">
(5)    <serviceName>dodsServer</serviceName>
(6)  </dataset>
(7)</catalog>

4.2.3. Base Catalog Elements

The base catalog elements from which configuration catalogs are created are the catalog, service, property, dataset, access and catalogRef elements, each of which is described in the sections that follow.

4.3. The catalog Element

4.3.1. Schema

The schema for the catalog element is:

<xsd:element name="catalog">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="service" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="property" minOccurs="0" maxOccurs="unbounded" />
      <xsd:element ref="dataset" minOccurs="1" maxOccurs="unbounded" />
    </xsd:sequence>

    <xsd:attribute name="name" type="xsd:string" />
    <xsd:attribute name="expires" type="dateType"/>
    <xsd:attribute name="version" type="xsd:token" default="1.0.1" />
  </xsd:complexType>
</xsd:element>

4.3.2. Attributes and Elements

The catalog element is the top-level element. It may contain:

  • zero or more service elements

  • zero or more property elements

  • one or more dataset elements (any element in the dataset substitution group: dataset or catalogRef)

  • an optional name attribute that is displayed to the user

  • an optional expires attribute that tells clients how long the catalog contents can be considered accurate, so that they can safely cache the information

  • an optional version attribute that indicates the version of the InvCatalog specification to which the catalog conforms
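
Putting these together, a minimal sketch of a catalog element using the optional name, version and expires attributes might look like the following (the dataset, service and expiration values are purely illustrative):

<?xml version="1.0" ?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example Catalog" version="1.0.1" expires="2030-01-01T00:00:00">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="An Example Dataset" urlPath="example/example.nc">
    <serviceName>dodsServer</serviceName>
  </dataset>
</catalog>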

4.4. The Service Element

The service element represents a data access service and allows basic data access information to be factored out of dataset and access elements.

4.4.1. Schema

The schema of the service element is:

<xsd:element name="service">
 <xsd:complexType>
  <xsd:sequence>
    <xsd:element ref="property" minOccurs="0" maxOccurs="unbounded" />
    <xsd:element ref="service" minOccurs="0" maxOccurs="unbounded" />
    <xsd:element ref="datasetRoot" minOccurs="0" maxOccurs="unbounded"/> <!-- see Server-side
  InvCat doc -->
  </xsd:sequence>

  <xsd:attribute name="name" type="xsd:string" use="required" />
  <xsd:attribute name="base" type="xsd:string" use="required" />
  <xsd:attribute name="serviceType" type="serviceTypes" use="required" />
  <xsd:attribute name="desc" type="xsd:string"/>
  <xsd:attribute name="suffix" type="xsd:string" />
 </xsd:complexType>
</xsd:element>

4.4.2. The property Element

  • property - Zero or more of these elements can be included to allow for the encoding of additional information. One possible use is to encode additional information needed for clients to be able to access datasets through this service. While this is available, the serviceType (and possibly the dataFormat attribute) should suffice to allow clients to access datasets.

4.4.3. The service Element

The service element must have a serviceType attribute whose value is one of the service type values.

4.4.4. The datasetRoot Element

Nada. (The datasetRoot element is a server-side element, and is covered in the configuration catalog section later in this document.)

4.4.5. The name Attribute

The name attribute is required and its value must be unique for all service elements within the catalog. These unique names are used in the definition of a dataset access method to refer to a specific service element.

4.4.6. The base Attribute

The mandatory base attribute and - if available - the optional suffix attribute are used in the construction of the dataset URL. The base may be an absolute URL or relative to the catalog’s base URL.

4.4.7. The serviceType Attribute

This attribute is mandatory and its value must be one of the available service type values.

4.4.8. The desc Attribute

The optional desc attribute allows the addition of a human-readable description of the service.

4.4.9. The suffix Attribute

An optional attribute used in the construction of the dataset URL.
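
As a rough illustration of how suffix participates in URL construction - the service name and paths here are invented for the example - consider:

<service name="ncServer" serviceType="OpenDAP" base="/thredds/dodsC/" suffix=".nc" />
<dataset name="SAGE III Ozone Loss for Oct 31 2006" urlPath="sage/110312006">
  <serviceName>ncServer</serviceName>
</dataset>

The access URL is formed as the service base plus the dataset urlPath plus the service suffix, i.e. /thredds/dodsC/sage/110312006.nc relative to the server address.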

4.4.10. How to Specify Services

Single Services

A canonical example of how to specify a service is:

<?xml version="1.0" ?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="SAGE III Ozone Loss for Oct 31 2006" serviceName="dodsServer" urlPath="sage/110312006.nc"/>
</catalog>

wherein the service is defined by the service element, and then applied to a specific dataset via the serviceName attribute of the dataset element. This method has been deprecated in favor of using serviceName as an element rather than an attribute for dataset. This is implemented by replacing the dataset line in the previous example with:

<?xml version="1.0" ?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="SAGE III Ozone Loss for Oct 31 2006" urlPath="sage/110312006.nc">
    <serviceName>dodsServer</serviceName>
  </dataset>
</catalog>

This could alternatively be done by adding an access element to the dataset element, which additionally pulls the urlPath attribute out of dataset and puts it also into the access element.

<?xml version="1.0" ?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="SAGE III Ozone Loss for Oct 31 2006">
    <access serviceName="dodsServer" urlPath="sage/110312006.nc"/>
  </dataset>
</catalog>
Compound Services

If you want to offer the data with another service type besides OpenDAP, there is a method for specifying compound services. A service element with a serviceType attribute value of Compound is used to surround or nest the OpenDAP and WCS service elements. The name attribute value all given to the compound service is then placed within the dataset's serviceName element to indicate that all the nested services are to be used for the dataset.

<?xml version="1.0" ?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="all" serviceType="Compound" base="" >
    <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
  </service>
  <dataset name="SAGE III Ozone Loss for Oct 31 2006">
    <serviceName>all</serviceName>
  </dataset>
</catalog>
Nested Datasets

The metadata element can be used to factor out one or more service types from a nested group of datasets as in the following example:

<dataset name="SAGE III Ozone Loss Experiment" >
  <metadata inherited="true">
    <serviceName>dodsServer</serviceName>
  </metadata>
  <dataset name="January Averages" urlPath="sage/avg/jan.nc" ID="sage-23487382"/>
  <dataset name="February Averages" urlPath="sage/avg/feb.nc" ID="sage-63656446"/>
  <dataset name="Global Averages" urlPath="sage/global.nc" ID="sage-7869700g" dataType="Grid"/>
</dataset>

This could be combined with a compound service to apply multiple services to multiple datasets, as in the following example:

  <service name="all" serviceType="Compound" base="" >
    <service name="dodsServer" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
  </service>
  <dataset name="SAGE III Ozone Loss Experiment" >
    <metadata inherited="true">
        <serviceName>all</serviceName>
    </metadata>
    <dataset name="January Averages" urlPath="sage/avg/jan.nc" ID="sage-23487382"/>
    <dataset name="February Averages" urlPath="sage/avg/feb.nc" ID="sage-63656446"/>
    <dataset name="Global Averages" urlPath="sage/global.nc" ID="sage-7869700g"/>
  </dataset>

4.4.11. Service Types, or, the serviceTypes Type

A TDS service is a method for exploring and obtaining the data available on a TDS. See the services section for further details.

The schema for the serviceTypes type is:

<xsd:simpleType name="serviceTypes">
    <xsd:union memberTypes="xsd:token">
      <xsd:simpleType>
        <xsd:restriction base="xsd:token">
          <!-- client/server -->
          <xsd:enumeration value="ADDE"/>
          <xsd:enumeration value="DODS"/>  <!-- same as OpenDAP -->
          <xsd:enumeration value="OpenDAP"/>
          <xsd:enumeration value="OpenDAP-G"/>

          <!-- bulk transport -->
          <xsd:enumeration value="HTTPServer"/>
          <xsd:enumeration value="FTP"/>
          <xsd:enumeration value="GridFTP"/>
          <xsd:enumeration value="File"/>
          <xsd:enumeration value="NetcdfServer"/>

          <!-- web services -->
          <xsd:enumeration value="LAS"/>
          <xsd:enumeration value="WMS"/>
          <xsd:enumeration value="WFS"/>
          <xsd:enumeration value="WCS"/>
          <xsd:enumeration value="WSDL"/>

          <!--offline -->
          <xsd:enumeration value="WebForm"/>

          <!-- THREDDS -->
          <xsd:enumeration value="Catalog"/>
          <xsd:enumeration value="QueryCapability"/>
          <xsd:enumeration value="Resolver"/>
          <xsd:enumeration value="Compound"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>

The client/server service types defined within the schema are ADDE, DODS (the same as OpenDAP), OpenDAP and OpenDAP-G; each is explained in the services section.

The bulk transport service types are HTTPServer, FTP, GridFTP, File and NetcdfServer.

The web services are LAS, WMS, WFS, WCS and WSDL.

The offline services are WebForm.

The THREDDS services are Catalog, QueryCapability, Resolver and Compound.

Table 1. Service Type Specifications

Service           Mandatory Specification
OPeNDAP           <service name="odap" serviceType="OPeNDAP" base="/thredds/dodsC/" />
NetCDF Subset     <service name="ncss" serviceType="NetcdfSubset" base="/thredds/ncss/grid/" />
WCS               <service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
WMS               <service name="wms" serviceType="WMS" base="/thredds/wms/" />
HTTP              <service name="fileServer" serviceType="HTTPServer" base="/thredds/fileServer/" />

Each of these services is described in greater detail in upcoming sections.

4.4.12. The Property Element

Element Schema

The schema for the property element is:

<xsd:element name="property">
  <xsd:complexType>
    <xsd:attribute name="name" type="xsd:string"/>
    <xsd:attribute name="value" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
Overview

The property element contains the attributes name and value to associate with a catalog, dataset or service element. Properties on datasets are added as global attributes to the THREDDS data model objects.

An example is:

<property name="Conventions" value="WRF" />

4.5. The dataset Element

4.5.1. Types of Datasets

A dataset element represents a named, logical set of data at a level of granularity appropriate for presentation to a user. There are two basic types of datasets:

  • Application datasets are datasets that an application might want to act on, e.g. visualize.

  • Dynamic datasets are typically generated dynamically by making a call to a server. They change constantly and are too large to completely list.

Application datasets are further divided into two categories:

  • direct, if it contains at least one dataset access method

  • collection, if it is a container for nested datasets

A direct dataset has an access URL and a service type that allows a THREDDS-enabled application to directly access its data using the protocol of the specified service. It is represented by a dataset element.

A collection dataset is represented by a dataset element with further nested dataset elements. There are two types:

  • A heterogeneous collection dataset that may have arbitrarily deep nested datasets with no constraints on how the datasets are related.

  • A coherent collection dataset that contains nested datasets that are directly and coherently related. It should have a collectionType attribute that describes the relationship of its nested datasets, e.g. TimeSeries, Stations, etc.

Dynamic datasets are divided into three categories:

  • A query dataset is a dynamic dataset with service type Catalog. Dereferencing the URL returns another catalog, whose contents are both the contents of the query dataset and the result of the query.

  • A resolver dataset is a type of query dataset with service type Resolver. It returns a catalog which must contain either a direct dataset or a coherent collection dataset. It is typically used to implement a virtual dataset such as a latest model run or latest measurement dataset on a real-time dataset where the actual URL must be generated when a request is made (a brief sketch follows this list).

  • A DQC dataset is a collection of query datasets. It has service type QueryCapability, and its URL points to an XML document called a Dataset Query Capability (DQC) document. That document compactly describes the set of possible queries to a server or, equivalently, the set of query datasets contained in the DQC dataset.
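
As a brief sketch of the resolver dataset mentioned above: in a TDS configuration catalog a Resolver service is declared like any other service, and is commonly used to back the "latest" virtual dataset that the addLatest element of a datasetScan (seen in the enhancedCatalog.xml example and revisited in the configuration catalog sections) offers to users:

<service name="latest" serviceType="Resolver" base="" />

Dereferencing a dataset served by this service returns a small catalog that points at whichever file is currently the most recent.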

4.5.2. Schema

The schema for the dataset element is:

<xsd:element name="dataset" type="DatasetType" />
<xsd:complexType name="DatasetType">
  <xsd:sequence>
    <xsd:group ref="threddsMetadataGroup" minOccurs="0" maxOccurs="unbounded"/>

    <xsd:element ref="access" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element ref="ncml:netcdf" minOccurs="0"/>
    <xsd:element ref="dataset" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>

  <xsd:attribute name="name" type="xsd:string" use="required"/>
  <xsd:attribute name="alias" type="xsd:token"/>
  <xsd:attribute name="authority" type="xsd:string"/> <!-- deprecated : use element -->
  <xsd:attribute name="collectionType" type="collectionTypes"/>
  <xsd:attribute name="dataType" type="dataTypes"/> <!-- deprecated : use element -->
  <xsd:attribute name="harvest" type="xsd:boolean"/>
  <xsd:attribute name="ID" type="xsd:token"/>
  <xsd:attribute name="resourceControl" type="xsd:string"/>

  <xsd:attribute name="serviceName" type="xsd:string" /> <!-- deprecated : use element -->
  <xsd:attribute name="urlPath" type="xsd:token" />
</xsd:complexType>

4.5.3. Elements and Attributes

The elements and attributes are divided into two groups:

  • a group that encodes functional information about the dataset(s); and

  • a group that encodes metadata information about the dataset(s).

Functional Elements

The sub-elements of the dataset element are:

  • any number of elements from the threddsMetadataGroup model group, described in a later section;

  • zero or more access elements;

  • an optional ncml:netcdf element; and

  • zero or more nested dataset elements.

The non-deprecated attributes of the dataset element are:

  • name - It is important to make the dataset name both descriptive and succinct. This required attribute is what the user will see and use to make choices from the web page that displays the THREDDS catalog.

  • alias - Used to enable a dataset to appear in multiple places in the same catalog. It is defined in one place - with all appropriate metadata - and placed elsewhere by creating a dataset with an alias to the original, whose value is the ID of the original defined dataset. A short sketch follows this list.

  • collectionType - Used to indicate that the dataset is a coherent collection and the type of coherence. The available values are:

    • TimeSeries

    • Stations

    • ForecastModelRuns

  • harvest - An attribute that makes the dataset available to be placed in digital libraries or other discovery services if it has the value true. It is typically placed on collection datasets.

  • ID - An optional but highly recommended attribute whose value must be unique within the catalog.

  • resourceControl

  • serviceName - A deprecated attribute that has been superseded in favor of the serviceName element that can be contained in a dataset or metadata element.

  • urlPath - This is used in combination with the applicable serviceName to specify data access methods.
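
Here is the short sketch of the alias attribute promised above. The dataset is fully defined once with an ID, and a second dataset elsewhere in the catalog simply points back at it (the ID, paths and service name are reused from earlier examples purely for illustration):

<dataset name="January Averages" ID="sage-23487382" urlPath="sage/avg/jan.nc">
  <serviceName>dodsServer</serviceName>
</dataset>
...
<dataset name="January Averages (repeated elsewhere)" alias="sage-23487382"/>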

Metadata Elements

Further groups of elements subsumed under the dataset element - and which are used for digital libraries, discovery centers, and for annotation and documentation of datasets - are described in the section on digital library metadata elements (the threddsMetadataGroup) below.

4.5.4. Examples

An example of a dataset element is:

<dataset name="DC8 flight 1999-11-19" urlPath="SOLVE_DC8_19991119.nc">
  <serviceName>agg</serviceName>
</dataset>

4.6. The access Element

An access element defines how a dataset can be accessed through a data service.

4.6.1. Schema

The schema for the access element is:

<xsd:element name="access">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="dataSize" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="urlPath" type="xsd:token" use="required"/>
    <xsd:attribute name="serviceName" type="xsd:string"/>
    <xsd:attribute name="dataFormat" type="dataFormatTypes"/>
  </xsd:complexType>
</xsd:element >

4.6.2. Elements and Attributes

  • dataSize - An optional element used to specify how large a dataset would be if it were copied to the client.

  • urlPath - This is appended to the service's base attribute value to create the dataset URL.

  • serviceName - This refers to the unique name of a service element.

  • dataFormat - This specifies the format of the transferred file, and is used mainly for when the serviceType is a bulk transport type like FTP or HTTP. It is obtained from a list of data format types.
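
A hedged example pulling these pieces together - the service name fileServer and the size value are assumptions made for illustration - is an access element pointing at an HTTP bulk-transfer service, with dataFormat and dataSize filled in so a client knows what it will receive:

<dataset name="SAGE III Ozone Loss for Oct 31 2006">
  <access serviceName="fileServer" urlPath="sage/110312006.nc" dataFormat="NetCDF">
    <dataSize units="Mbytes">42</dataSize>
  </access>
</dataset>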

4.7. The catalogRef Element

A catalogRef element refers to another THREDDS catalog that logically is a nested dataset inside the parent catalog. This is used to separately maintain catalogs as well as to break up large catalogs. THREDDS clients should not read referenced catalogs until the user explicitly requests them, so that very large dataset collections can be represented with catalogRef elements without large delays in presenting them to the user. The referenced catalog is not textually substituted into the containing catalog, but remains a self-contained object. The referenced catalog must be a valid THREDDS catalog, but it does not have to match versions with the containing catalog.

4.7.1. Schema

The schema for the catalogRef element is:

<xsd:element name="catalogRef" substitutionGroup="dataset">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="DatasetType">
        <xsd:attributeGroup ref="XLink"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>

4.7.2. Examples

It is useful to break up large catalogs into pieces and separately maintain each piece. An example of this is:

  <?xml version="1.0" encoding="UTF-8"?>
  <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Top Catalog"
(1)         xmlns:xlink="http://www.w3.org/1999/xlink">

(2) <dataset name="Realtime data from IDD">
(3)   <catalogRef xlink:href="idd/models.xml" xlink:title="NCEP Model Data" name="" />
      <catalogRef xlink:href="idd/radars.xml" xlink:title="NEXRAD Radar" name="" />
      <catalogRef xlink:href="idd/obsData.xml" xlink:title="Station Data" name="" />
      <catalogRef xlink:href="idd/satellite.xml" xlink:title="Satellite Data" name="" />
    </dataset>
  </catalog>

where:

(1) An xlink namespace is declared in the catalog element.

(2) The container dataset logically contains the catalogRef elements, which are nested datasets whose contents are the contents of the external catalog.

(3) The catalogRef elements each contain a link to an external catalog via the xlink:href attributes, which are relative URLs resolved against the catalog URL. The xlink:title attribute is used to name the datasets. The name attribute is required for validation reasons, but can be empty. If the catalog URL is:

 http://thredds.ucar.edu/thredds/data/catalog.xml

then the resolved URL of the first catalogRef in this example will be:

http://thredds.ucar.edu/thredds/data/idd/models.xml

This allows the addition of XLink attributes, which are a generalization of HTTP hrefs. The value of xlink:href is the URL of the referenced catalog, and may be absolute or relative to the parent catalog URL. The value of xlink:title is displayed as the name of the dataset that the user can click on to follow the XLink.

The XLink attribute schema is:

<xsd:attributeGroup name="XLink">
    <xsd:attribute ref="xlink:href" />
    <xsd:attribute ref="xlink:title" />
    <xsd:attribute ref="xlink:show"/>
    <xsd:attribute ref="xlink:type" />
  </xsd:attributeGroup>

The attributes are:

  • xlink:href - Used for the URL of the resource itself.

  • xlink:title - A human-readable description of the linked resource.

  • xlink:show - Not currently used in the THREDDS software.

  • xlink:type - Not currently used in the THREDDS software.

An example is:

<documentation xlink:href="http://cloud1.arc.nasa.gov/solve/" xlink:title="SOLVE home page"/>

4.8.1. Constructing the Access URL

The information found in the dataset and service elements is combined with the address of your THREDDS service to construct an access URL. In this case, it is constructed from the server base URL http://motherlode.ucar.edu:8080, the base attribute value /thredds/dodsC/, and the dataset attribute urlPath value sage/110312006.nc. The absolute URL value is:

http://motherlode.ucar.edu:8080/thredds/dodsC/sage/110312006.nc

4.8.2. Nesting Datasets

In nearly all cases where we want to serve datasets via THREDDS, we’ll have more than one - and more often than not many more than one. The simplest way to handle this is to declare a collection dataset, i.e. a dataset element used as a container that nests a collection of direct datasets pointing directly to data (the dataset element in our first catalog example was a direct dataset). The following example illustrates a fairly common situation with geoscience data wherein we have a series of monthly average files we wish to make available.

Example 2 - Basic Catalog With Nesting

<?xml version="1.0" ?>
 <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" >
   <service name="dodsServer" serviceType="OpenDAP"  base="/thredds/dodsC/" />

   <dataset name="SAGE III Ozone Loss Experiment" &gt;
     <dataset name="January Averages" serviceName="dodsServer" urlPath="sage/avg/jan.nc"/>
     <dataset name="February Averages" serviceName="dodsServer" urlPath="sage/avg/feb.nc"/>
     <dataset name="March Averages" serviceName="dodsServer" urlPath="sage/avg/mar.nc"/>
   </dataset>

 </catalog>

This is identical to our first example except for the use of both collection and direct datasets. The collection dataset - which is simply a container for the direct datasets - has a short name attribute that’s descriptive of and applicable to all the direct datasets within. The direct datasets each also have name attributes that serve the purpose of distinguishing them from each other.

The serviceName attribute for each of the direct datasets is identical to that in the previous example, but the urlPath differs for each one since it specifies the final part of the URL that will be used to access each of the different files. This particular catalog will result in a web page menu for the user that looks something like this:

SAGE III Ozone Loss Experiment
      January Averages
      February Averages
      March Averages

Note that the collection dataset container must be closed with a matching </dataset> tag, and that the opening tag of the collection dataset ends with a > rather than the /> used for the self-closing direct dataset elements.

4.8.3. Multiple Nesting Levels

We are not restricted to a single nested dataset nor to only two nesting levels. We can specify as many as we need to appropriately describe our datasets as seen in the following example wherein both monthly and daily average files from the same experiment are made available within the same collection dataset. We also require three nesting levels for the daily averages rather than the two we used for the monthly averages.

Example 3 - Basic Catalog With Multiple Nesting Levels

<?xml version="1.0" ?>
 <catalog
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" >
   <service name="dodsServer" serviceType="OpenDAP"  base="/thredds/dodsC/" />

   <dataset name="SAGE III Ozone Loss Experiment" >

     <dataset name="Monthly Averages" >
       <dataset name="January Averages" serviceName="dodsServer" urlPath="sage/avg/jan.nc"/>
       <dataset name="February Averages" serviceName="dodsServer" urlPath="sage/avg/feb.nc"/>
       <dataset name="March Averages" serviceName="dodsServer" urlPath="sage/avg/mar.nc"/>
     </dataset>

     <dataset name="Daily Averages" >
       <dataset name="January" >
         <dataset name="Jan. 1" serviceName="dodsServer" urlPath="sage/daily/jan/20010101.nc"/>
         <dataset name="Jan. 2" serviceName="dodsServer" urlPath="sage/daily/jan/20010102.nc"/>
         <dataset name="Jan. 3" serviceName="dodsServer" urlPath="sage/daily/jan/20010103.nc"/>
       </dataset>
     </dataset>

   </dataset>

 </catalog>

Note the usefulness of indenting the various levels when it comes to keeping track of what’s going on in the example catalog. The listing for this catalog will look something like this on our THREDDS page:

SAGE III Ozone Loss Experiment
    Monthly Averages
        January Averages
        February Averages
        March Averages
    Daily Averages
        January
            Jan. 1
            Jan. 2
            Jan. 3

4.8.4. Additional Dataset Attributes

The dataset attributes we’ve seen thus far suffice to uniquely identify and locate each dataset. However, much more information can be added to assist human and machine searchers in identifying a dataset in contexts larger than the dataset itself. The dataset element documentation shows several more attributes we can employ to make our datasets easier to find and understand. The following example shows the use of the collectionType, authority, ID and dataType attributes, with a brief explanation of each following the example. Even more additional attributes are available for making THREDDS datasets more easily discoverable by digital libraries, and will be discussed in later sections.

Example 4 - Catalog With Additional Dataset Attributes

<dataset name="SAGE III Ozone Loss Experiment" collectionType="TimeSeries">
  <dataset name="January Averages" serviceName="aggServer" urlPath="sage/avg/jan.nc"
                authority="unidata.ucar.edu" ID="sage-20938483">
         <dataType>Trajectory</dataType>
  </dataset>
</dataset>

The collectionType attribute is used to indicate a coherent collection dataset which has only one level of nested datasets. At this time, the coherent collection values available for this attribute are TimeSeries, Stations and ForecastModelRuns, although more are promised for future versions.

The authority and ID attributes are used in combination to create a globally unique identifier for the dataset. In the example, we see that Unidata is the authority and their identification number for this specific dataset is sage-20938483. The broader context is that Unidata itself has thousands of different datasets, and that this particular one is one of many from the SAGE experiment - itself one of many experiments whose datasets they store - with the given ID number. There are many other geoscience institutes that also have their own collections of datasets from various experiments and investigations. The use of both a location and a number to identify a dataset better ensures that the identification will be unique. For example, while it’s possible that Unidata and Scripps could each give one of their datasets the same ID number, the additional use of the authority attribute will guarantee that a dataset with the ID number 42 will be globally differentiated by the fact that one has the authority unidata.ucar.edu and the other scripps.ucsd.edu.

The dataType attribute is useful for helping the user decide how to present the dataset being obtained. This is best and most simply explained via the list of the available values given in the threddsMetadataGroup section below: Grid, Image, Point, Radial, Station, Swath and Trajectory. For processing or presentation purposes, a grid of discrete and spatially separated values is handled much differently than an image or a single trajectory.

4.8.5. The Metadata Element

The catalog shown in Example 3 repeats an identical serviceName value many times. A method for avoiding this sort of repetition is available via the metadata element. Here is an example of its use:

Example 5 - Catalog Fragment Employing Metadata Element

<dataset name="SAGE III Ozone Loss Experiment" >

   <metadata inherited="true">
     <serviceName>dodsServer</serviceName>
     <dataType>Trajectory</dataType>
     <dataFormat>NetCDF</dataFormat>
     <authority>unidata.ucar.edu</authority>
   </metadata>

   <dataset name="January Averages" urlPath="sage/avg/jan.nc" ID="sage-23487382"/>
   <dataset name="February Averages" urlPath="sage/avg/feb.nc" ID="sage-63656446"/>
   <dataset name="Global Averages" urlPath="sage/global.nc" ID="sage-7869700g" dataType="Grid"/>

</dataset>

The metadata element here is used within the collection dataset container to apply various attributes to the direct datasets therein. The inherited attribute value true indicates that all the attribute information inside the metadata element applies to the current dataset and all those nested within it. This inheritance can be overridden, though, by simply specifying a different attribute value within an individual direct dataset. The example shows this in the dataset element with the name Global Averages, wherein the metadata-specified dataType value Trajectory is overridden with the value Grid.

4.8.6. Compound Service Elements

In all of the examples thus far, the datasets have been made available via a single access method, even though there are five available methods. Datasets can be made available via more than one access method by defining and referencing a compound service element, an example of which follows.

Example 6 - Compound Service Elements

<?xml version="1.0" ?>
<catalog
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" >
   <service name="all" serviceType="Compound" base="" >
      <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
      <service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
   </service>
   <dataset name="SAGE III Ozone Loss for Oct 31 2006"
                      serviceName="dodsServer" urlPath="sage/110312006.nc"/>
      <serviceName>all</serviceName>
   </dataset>
</catalog>

4.9. Digital Library Metadata Elements, or, the threddsMetadataGroup Model Group

These are catalog elements that are used in Digital Libraries entries, discovery centers, and for annotation and documentation of datasets.

4.9.1. Schema

The schema for the threddsMetadataGroup is:

<xsd:group name="threddsMetadataGroup">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="documentation" type="documentationType"/>
    <xsd:element ref="metadata"  />
    <xsd:element ref="property"  />

    <xsd:element ref="contributor"/>
    <xsd:element name="creator" type="sourceType"/>
    <xsd:element name="date" type="dateTypeFormatted"/>
    <xsd:element name="keyword" type="controlledVocabulary" />
    <xsd:element name="project" type="controlledVocabulary" />
    <xsd:element name="publisher" type="sourceType"/>

    <xsd:element ref="geospatialCoverage"/>
    <xsd:element name="timeCoverage" type="timeCoverageType"/>
    <xsd:element ref="variables"/>

    <xsd:element name="dataType" type="dataTypes"/>
    <xsd:element name="dataFormat" type="dataFormatTypes"/>
    <xsd:element name="serviceName" type="xsd:string" />
    <xsd:element name="authority" type="xsd:string" />
    <xsd:element ref="dataSize"/>
  </xsd:choice>
</xsd:group>

4.9.2. Overview

The elements in this threddsMetadataGroup may be used as nested elements of both dataset and metadata elements. There may be any number in any order, although more than one geospatialCoverage, timeCoverage, dataType, dataFormat, serviceName or authority element will be ignored. The available elements are:

  • documentation - Contains or points to human-readable content, which may be displayed to users by THREDDS clients as appropriate for a given situation.

  • metadata - A container for machine-readable information structured in XML.

  • property - An arbitrary name/value pair.

  • contributor - An element that typically contains a person’s name - with an optional role attribute - that documents some person’s contribution to the dataset. This uses the sourceType definition.

  • creator - An element indicating who created the dataset.

  • date - An element used to document dates associated with the dataset using one of the dateEnumTypes, or, date enumeration types, which are:

    • created

    • modified

    • valid

    • issued

    • available

    • metadataCreated

  • keyword - An element used to provide keywords for library searches, which has type controlledVocabulary.

  • project - An element specifying to which scientific project the dataset belongs. This has type controlledVocabulary.

  • publisher - An element indicating who is responsible for serving the dataset. This uses the sourceType definition.

  • geospatialCoverage - Specifies a lat/lon bounding box for the data.

  • timeCoverage - Specifies the range of dates covered by the dataset.

  • variables - Specifies the names of variables contained in the datasets, and ways to map the names to standard vocabularies.

  • dataType - Indicates the high-level semantic type of the dataset. It can be one of the following:

    • Grid

    • Image

    • Point

    • Radial

    • Station

    • Swath

    • Trajectory

  • dataFormat - Indicates the format of the data and is mainly used so clients can determine how to read data accessed using a bulk access method. The available dataFormat types are:

    • BUFR

    • ESML

    • GEMPAK

    • GINI

    • GRIB-1

    • GRIB-2

    • HDF4

    • HDF5

    • NcML

    • NetCDF

    • NEXRAD2

    • NIDS

    • image/gif

    • image/jpeg

    • image/tiff

    • text/plain

    • text/tab-separated-values

    • text/xml

    • video/mpeg

    • video/quicktime

    • video/realtime

  • serviceName - A reference to a service element, the content of which must match the name of a service element in the catalog.

  • authority - Used to further refine dataset IDs with the goal of producing globally unique IDs.

  • dataSize - Can be used to specify how large the dataset would be if copied to a client.

4.9.3. The documentation Element

The documentation element may contain arbitrary plain text content, or XHTML. This is called human-readable information.

Schema

The schema is:

<xsd:complexType name="documentationType" mixed="true">
  <xsd:sequence>
    <xsd:any namespace="http://www.w3.org/1999/xhtml" minOccurs="0" maxOccurs="unbounded"
                  processContents="strict"/>
  </xsd:sequence>
  <xsd:attribute name="type" type="documentationEnumTypes"/>
  <xsd:attributeGroup ref="XLink" />
</xsd:complexType>
Examples

Examples are:

<documentation xlink:href="http://espoarchive.nasa.gov/archive/index.html"
    xlink:title="Earth Science Project Office Archives"/>

<documentation>Used in doubled CO2 scenario</documentation>
The type Attribute

The documentation element type attribute defines a basic set of types. The set of defined values is not exclusive, so other values are allowed. Alternate values must be strings that do not contain end-of-line characters or tabs, i.e. of xsd:token data type.

The schema for the type attribute for the documentation element is:

<xsd:simpleType name="documentationEnumTypes">
 <xsd:union memberTypes="xsd:token">
  <xsd:simpleType>
   <xsd:restriction base="xsd:token">
     <xsd:enumeration value="funding"/>
     <xsd:enumeration value="history"/>
     <xsd:enumeration value="processing_level"/>
     <xsd:enumeration value="rights"/>
     <xsd:enumeration value="summary"/>
   </xsd:restriction>
  </xsd:simpleType>
 </xsd:union>
</xsd:simpleType>
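
For instance - the wording of the content is invented for illustration - the type attribute might be used like this:

<documentation type="summary">
  Monthly averaged ozone loss fields from the SAGE III experiment.
</documentation>
<documentation type="rights">Freely available for research and education.</documentation>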

4.9.4. The metadata Element

A metadata element contains or refers to structured information - that is, information in XML format - about datasets. This is used by client programs to display, describe, or search for the dataset. This is called machine-readable information.

Schema

The schema for the metadata element is:

<xsd:element name="metadata">
  <xsd:complexType>
    <xsd:choice>
      <xsd:group ref="threddsMetadataGroup" minOccurs="0" maxOccurs="unbounded" />
      <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded" processContents="strict"/>
    </xsd:choice>

    <xsd:attribute name="inherited" type="xsd:boolean" default="false" />
    <xsd:attribute name="metadataType" type="metadataTypeEnum"  />
    <xsd:attributeGroup ref="XLink" />
  </xsd:complexType>
</xsd:element>

A metadata element can contain any number of digital library metadata elements - i.e. elements in the threddsMetadataGroup model group - in any order. Or it may contain any other well-formed XML elements, as long as they are in a namespace other than the THREDDS namespace. It may also contain an XLink to another XML document, whose top-level element should be a valid metadata element. Linking to an HTML page should be done with the documentation element instead of this.

Attributes
  • inherited - This indicates whether the metadata is inherited by nested datasets. If it is true, the metadata element becomes logically part of each nested dataset. The metadata always applies to the containing dataset whether inherited is true or not.

  • metadataType - This attribute may have any value, although the commonly used values are listed in the metadataType enumeration attribute. To use metadata elements from the threddsMetadataGroup, do not include the metadataType attribute (or set it to "THREDDS"). To use your own elements, give it a metadataType, and add a namespace declaration.

The metadataType Enumeration Attribute, or, metadataTypeEnum

The schema for this, which lists the commonly used metadata types, is:

  <xsd:simpleType name="metadataTypeEnum">
    <xsd:union memberTypes="xsd:token">
      <xsd:simpleType>
        <xsd:restriction base="xsd:token">
          <xsd:enumeration value="THREDDS"/>
          <xsd:enumeration value="ADN"/>
          <xsd:enumeration value="Aggregation"/>
          <xsd:enumeration value="CatalogGenConfig"/>
          <xsd:enumeration value="DublinCore"/>
          <xsd:enumeration value="DIF"/>
          <xsd:enumeration value="FGDC"/>
          <xsd:enumeration value="LAS"/>
          <xsd:enumeration value="ESG"/>
        <xsd:enumeration value="Other"/>
      </xsd:restriction>
     </xsd:simpleType>
   </xsd:union>
  </xsd:simpleType>
Examples

An example containing THREDDS metadata is:

<metadata inherited="true">
  <contributor role="data manager">John Smith</contributor>
  <keyword>Atmospheric Science</keyword>
  <keyword>Aircraft Measurements</keyword>
  <keyword>Upper Tropospheric Chemistry</keyword>
</metadata>

An example containing a link to an external file containing THREDDS metadata is:

<metadata xlink:href="http://dataportal.ucar.edu/metadata/solveMetadata.xml"
   xlink:title="Solve metadata" />

If an XLink is used, it should point to a document whose top element is a metadata element that declares the THREDDS namespace. An example of this is:

<?xml version="1.0" encoding="UTF-8"?>
<metadata  xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <contributor role="Investigator">Mashor Mashnor</contributor>

  <abstract>
   This project aims to determine the physiological adaptations of algae to the
   extreme conditions of Antarctica.
  </abstract>

  <publisher>
     <name vocabulary="DIF">AU/AADC</name>
     <long_name vocabulary="DIF">Australian Antarctic Data Centre, Australia</long_name>
     <contact url="http://www.aad.gov.au/default.asp?casid=3786" email="metadata@aad.gov.au"/>
  </publisher>

</metadata>

where the top element declaring the namespace is:

<metadata  xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">

If using elements from another namespace, all the sub-elements should be in the same namespace, which should be declared in the metadata element. An example is:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Goto considered harmful</dc:title >
  <dc:description>The unbridled use of the go to statement has an immediate consequence
      that it becomes terribly
            hard to find a meaningful set of coordinates in which to describe the process progress.
  </dc:description>
  <dc:author>Edsger W. Dijkstra</dc:author>
</metadata>

If you use an XLink to point to elements from another namespace, a metadataType attribute should be added. An example is:

<metadata xlink:href="http://www.unidata.ucar.edu/metadata/ncep/dif.xml"
  xlink:title="NCEP DIF metadata"
        metadataType="DublinCore"/>

This should point to a document whose top element is a metadata element, which declares a different namespace (note you also still need to declare the THREDDS namespace), i.e.

<?xml version="1.0" encoding="UTF-8"?>
<metadata  xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Goto considered harmful</dc:title >
  <dc:description>The unbridled use of the go to statement has an immediate consequence
      that it becomes terribly
            hard to find a meaningful set of coordinates in which to describe the process progress.
  </dc:description>
  <dc:author>Edsger W. Dijkstra</dc:author>
</metadata>

The following equivalent declaration makes the other namespace the default:

<?xml version="1.0" encoding="UTF-8"?>
<cat:metadata  xmlns:cat="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
               xmlns="http://purl.org/dc/elements/1.1/">
  <title>Goto considered harmful</title >
  <description>The unbridled use of the go to statement has an immediate consequence
      that it becomes terribly
            hard to find a meaningful set of coordinates in which to describe the process progress.
  </description>
  <author>Edsger W. Dijkstra</author>
</cat:metadata>

4.9.5. The property Element

The property elements are arbitrary name/value pairs to associate with a catalog, dataset or service element. Properties on datasets are added as global attributes to the THREDDS data model objects, so they will appear as such in, for example, NetCDF files created from those objects.

Schema
<xsd:element name="property">
  <xsd:complexType>
    <xsd:attribute name="name" type="xsd:string"/>
    <xsd:attribute name="value" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
Example
<property name="Conventions" value="WRF" />

4.9.6. The source Type

This is used by the creator and publisher elements to specify who is responsible for the dataset. The name and contact elements are mandatory.

Schema
<xsd:complexType name="sourceType">
  <xsd:sequence>
    <xsd:element name="name" type="controlledVocabulary"/>
    <xsd:element name="contact">
      <xsd:complexType>
        <xsd:attribute name="email" type="xsd:string" use="required"/>
        <xsd:attribute name="url" type="xsd:anyURI"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>
Elements
  • name - This mandatory element is of type controlledVocabulary, and so has an optional vocabulary attribute.

  • contact - This mandatory element has attributes to specify a web url and/or an email address.

Example

An example is:

<publisher>
  <name vocabulary="DIF">UCAR/NCAR/CDP > Community Data Portal, National Center for Atmospheric
           Research, University Corporation for Atmospheric Research</name>
  <contact url="http://dataportal.ucar.edu" email="cdp@ucar.edu"/>
</publisher>
Controlled Vocabulary Type, or, controlledVocabulary

This adds an optional vocabulary attribute to a string-valued element, indicating that the value comes from a restricted list. The schema is:

<xsd:complexType name="controlledVocabulary">
 <xsd:simpleContent>
  <xsd:extension base="xsd:string">
   <xsd:attribute name="vocabulary" type="xsd:string" />
  </xsd:extension>
 </xsd:simpleContent>
</xsd:complexType>

and an example is:

 <name vocabulary="DIF">UCAR/NCAR/CDP</name>

4.9.7. The contributor Element

A contributor element is a person’s name with an optional role attribute that specifies the role that person plays in regard to the dataset.

Schema
<xsd:element name="contributor">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="role" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
Examples

Examples without and with the role attribute are:

<contributor>Jane Doe</contributor>

<contributor role="PI">Jane Doe</contributor>

4.9.8. The geospatialCoverage Element

This element specifies a lat/lon bounding box and the altitude range covered by the data.

Schema
<xsd:element name="geospatialCoverage">
   <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="northsouth" type="spatialRange" minOccurs="0" />
      <xsd:element name="eastwest" type="spatialRange" minOccurs="0" />
      <xsd:element name="updown" type="spatialRange" minOccurs="0" />
      <xsd:element name="name" type="controlledVocabulary" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>

    <xsd:attribute name="zpositive" type="upOrDown" default="up"/>
   </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="spatialRange">
   <xsd:sequence>
     <xsd:element name="start" type="xsd:double"  />
     <xsd:element name="size" type="xsd:double" />
     <xsd:element name="resolution" type="xsd:double" minOccurs="0" />
     <xsd:element name="units" type="xsd:string" minOccurs="0" />
   </xsd:sequence>
  </xsd:complexType>

  <xsd:simpleType name="upOrDown">
   <xsd:restriction base="xsd:token">
     <xsd:enumeration value="up"/>
     <xsd:enumeration value="down"/>
   </xsd:restriction>
  </xsd:simpleType>
Elements and Attributes
  • northsouth, eastwest - These elements are set to specify a lat/lon bounding box, with default units degrees_north and degrees_east, respectively.

  • updown - Specifies the altitude range with default units of meters.

  • name - Used to optionally add any number of names to describe the covered region.

  • zpositive - The value up means that z increases upwards, e.g. units of height, while down means that z increases downward, e.g. units of pressure or depth.

  • start, size - Supplies the range from start to start + size.

  • resolution - Used to supply the data resolution.

Example
  <geospatialCoverage zpositive="down">
   <northsouth>
     <start>10</start>
     <size>80</size>
     <resolution>2</resolution>
     <units>degrees_north</units>
   </northsouth>
   <eastwest>
     <start>-130</start>
     <size>260</size>
     <resolution>2</resolution>
     <units>degrees_east</units>
   </eastwest>
   <updown>
     <start>0</start>
     <size>22</size>
     <resolution>0.5</resolution>
     <units>km</units>
   </updown>
  </geospatialCoverage>

  <geospatialCoverage>
    <name vocabulary="Thredds">global</name>
  </geospatialCoverage>

4.9.9. The timeCoverage Type

This is used to specify a date range.

Schema
<xsd:complexType name="timeCoverageType">
  <xsd:sequence>
    <xsd:choice minOccurs="2" maxOccurs="3" >
      <xsd:element name="start" type="dateTypeFormatted"/>
      <xsd:element name="end" type="dateTypeFormatted"/>
      <xsd:element name="duration" type="duration"/>
    </xsd:choice>
    <xsd:element name="resolution" type="duration" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>

This specifies a date range, which can be specified by:

  • giving both a start and an end date type element;

  • specifying a start and a duration element; or

  • specifying an end and a duration element.

The optional resolution element is used to indicate the data resolution for time series data.

Examples
<timeCoverage>
  <start>1999-11-16T12:00:00</start>
  <end>present</end>
</timeCoverage>

<timeCoverage>
  <start>1999-11-16T12:00:00</start>
  <duration>P3M</duration>  // 3 months
</timeCoverage>

<timeCoverage>   // 10 days before the present up to the present
  <end>present</end>
  <duration>10 days</duration>
  <resolution>15 minutes</resolution>
</timeCoverage>

4.9.10. The date Element

The date element is used to specify the date and time.

Schema
<xsd:simpleType name="dateType">
  <xsd:union memberTypes="xsd:date xsd:dateTime udunitDate">
    <xsd:simpleType>
      <xsd:restriction base="xsd:token">
        <xsd:enumeration value="present"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>

<xsd:simpleType name="udunitDate">
  <xsd:restriction base="xsd:string">
    <xsd:annotation>
      <xsd:documentation>Must conform to complete udunits date string, eg
                        "20 days since 1991-01-01"</xsd:documentation>
    </xsd:annotation>
  </xsd:restriction>
</xsd:simpleType>

The dateType follows the W3C profile of ISO 8601 for date/time formats. It is a simple type that can be used as the type of an attribute. It can be:

  • an xsd:date, with form CCYY-MM-DD;

  • an xsd:datetime, with forms CCYY-MM-DDThh:mm:ss, CCYY-MM-DDThh:mm:ssZ or CCYY-MM-DDThh:mm:ss-hh:ss;

  • a valid udunits date string; or

  • the string present.

Examples
<start>1999-11-16</start>
<start>1999-11-16T12:00:00</start> // implied UTC
<start>1999-11-16T12:00:00Z</start> // explicit UTC
<start>1999-11-16T12:00:00-05:00</start> // EST time zone specified
<start>20 days since 1991-01-01</start>
<start>present</start>
The dateTypeFormatted Type

This extends dateType by adding an optional, user-defined format attribute and an optional type attribute. The schema is:

<xsd:complexType name="dateTypeFormatted">
  <xsd:simpleContent>
    <xsd:extension base="dateType">
      <xsd:attribute name="format" type="xsd:string" /> // from java.text.SimpleDateFormat
      <xsd:attribute name="type" type="dateEnumTypes" />
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

An example is:

<start format="yyyy DDD" type="created">1999 189</start> <!-- year, day of year -->

Example Format String           Example Text
"yyyy.MM.dd G 'at' HH:mm:ss z"  2001.07.04 AD at 12:08:56 PDT
"EEE, MMM d, ''yy"               Wed, Jul 4, '01
"K:mm a, z"                     0:08 PM, PDT
"yyyyy.MMMMM.dd GGG hh:mm aaa"  02001.July.04 AD 12:08 PM
"EEE, d MMM yyyy HH:mm:ss Z"    Wed, 4 Jul 2001 12:08:56 -0700
"yyMMddHHmmssZ"                 010704120856-0700
The duration Type

The schema is:

<xsd:simpleType name="duration">
  <xsd:union memberTypes="xsd:duration udunitDuration" />
</xsd:simpleType>

<xsd:simpleType name="udunitDuration">
  <xsd:restriction base="xsd:string">
    <xsd:annotation>
      <xsd:documentation>Must conform to udunits time duration, eg "20.1 hours"
      </xsd:documentation>
    </xsd:annotation>
  </xsd:restriction>
</xsd:simpleType>

A duration type can be one of the following:

  • an xsd:duration type specified in the form PnYnMnDTnHnMnS where

    • P is the period;

    • nY is the number of years;

    • nM is the number of months;

    • nD is the number of days;

    • T is the start of the time section;

    • nH is the number of hours;

    • nM is the number of minutes;

    • nS is the number of seconds; or

  • a valid udunits time duration string.

Note: THREDDS at present (7/14) has not implemented parsing for the xsd:duration type, so it would probably be a good idea to restrict duration values to udunits time duration strings.

Examples are:

<duration>P5Y2M10DT15H</duration>
<duration>5 days</duration>

4.9.11. The dataSize Element

The schema is:

<xsd:element name="dataSize">
  <xsd:complexType>
    <xsd:simpleContent>
    <xsd:extension base="xsd:string">
      <xsd:attribute name="units" type="xsd:string" use="required"/>
    </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>

This is a number with a required units attribute, whose value should be bytes, Kbytes, Mbytes, Gbytes or Tbytes.
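
A brief example - the size value is illustrative - is:

<dataSize units="Kbytes">123</dataSize>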

4.9.12. The variables Element

This contains a list of variables or a variableMap element that refers to another document that contains a list of variables. The list specifies the variables available within the dataset, and associates them with a standard vocabulary of names. The purpose of this element is to describe a dataset for a search device or digital library.

Schema
<xsd:element name="variables">
  <xsd:complexType>
    <xsd:choice>
      <xsd:element ref="variable" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="variableMap" minOccurs="0"/>
    </xsd:choice>
    <xsd:attribute name="vocabulary" type="variableNameVocabulary" use="optional"/>
    <xsd:attributeGroup ref="XLink"/>
  </xsd:complexType>
</xsd:element>

<xsd:element name="variable">
  <xsd:complexType mixed="true">
    <xsd:attribute name="name" type="xsd:string" use="required"/>
    <xsd:attribute name="vocabulary_name" type="xsd:string" use="optional"/>
    <xsd:attribute name="units" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>

<xsd:element name="variableMap">
  <xsd:complexType>
    <xsd:attributeGroup ref="XLink"/>
  </xsd:complexType>
</xsd:element>
Elements and Attributes
  • variable - Each of these must have a name attribute containing the name of the variable. The content of this element can contain text describing the variable.

  • vocabulary_name - Optional attribute containing the variable name from a standard vocabulary (optionally specified as an attribute to the variables element).

  • units - Optional attribute containing the units of the variable.

  • variableMap - Contains an XLink to variable elements, so these can be factored out and referred to in multiple places.

Examples

<variables vocabulary="CF-1.0">
  <variable name="wv" vocabulary_name="Wind Speed" units="m/s">Wind Speed @ surface</variable>
  <variable name="wdir" vocabulary_name="Wind Direction" units= "degrees">Wind Direction @ surface</variable>
  <variable name="o3c" vocabulary_name="Ozone Concentration" units="g/g">Ozone Concentration @ surface</variable>
</variables>

<variables vocabulary="GRIB-NCEP" xlink:href="http://www.unidata.ucar.edu//GRIB-NCEPtable2.xml">
  <variableMap xlink:href="../standardQ/Eta.xml" />
</variables>

A variableMap should point to an XML document with a top-level variables element with the THREDDS namespace declared. For example:

<?xml version="1.0" encoding="UTF-8"?>
<variables xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" >
  <variable name="wv" vocabulary_name="Wind Speed" units="m/s"/>
  <variable name="wdir" vocabulary_name="Wind Direction" units= "degrees"/>
  <variable name="o3c" vocabulary_name="Ozone Concentration" units="g/g"/>
  ...
</variables>

5. Configuration Catalogs, or, Server-Side Catalogs

The THREDDS Data Server (TDS) uses specialized catalogs as configuration documents. Several elements have been added to the InvCatalog schema to allow for this server-side usage.

5.1. Overview

A Configuration Catalog is a basic THREDDS catalog with extensions that allow you to specify the location of your datasets as well as enable the automatic scanning of these locations to create data catalogs. The extensions are the datasetRoot and datasetScan elements. All the catalog examples given thus far contain no information about the directory in which the datasets reside. The examples show how a URL is constructed to access a dataset via one of the available services, but there is nothing thus far that relates a given URL to a specific directory on your computer.

5.2. Roadmap

In this section you will learn how to:

  • use the datasetRoot element to specify the location of the datasets in your filesystem such that the TDS will know where to find them;

  • use the datasetScan element to configure the TDS to automatically scan file directories and create catalogs, with its various sub-elements including:

    • the filter element that enables you to select which of the datasets being scanned are to be included in the generated catalogs;

    • the namer element to create more human readable dataset names;

    • the sort element for specifying the lexigraphic order in which the dataset lists are displayed; and

    • the addProxies element for adding an additional link to the most recently obtained dataset in the group.

5.3. The Elements

The server-side elements are:

5.4. The datasetRoot Element

5.4.1. Schema

The schema for the datasetRoot element is:

  <xsd:element name="datasetRoot">
    <xsd:complexType>
      <xsd:attribute name="path" type="xsd:string" use="required"/>
      <xsd:attribute name="location" type="xsd:string" use="required"/>
    </xsd:complexType>
  </xsd:element>

5.4.2. Examples

The datasetRoot element maps a URL base path to a directory, allowing the constructed URLs to find the directory in which a dataset resides. Suppose you have several datasets in the directory with absolute path /data/ocean and wish to make them available via THREDDS. The files in the directory are:

/data/ocean/salinity.nc
            temp.nc
            hdf/salinity.hdf
            hdf/temp.hdf

The following example shows how that location can be mapped onto the URLs used for accessing the datasets.

Example 7 - The datasetRoot Element

...
  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />

  <datasetRoot path="ocean" location="/data/ocean/" />

  <dataset name="A Test Dataset" ID="testDataset" urlPath="ocean/salinity.nc" >
    <serviceName>odap</serviceName>
  </dataset>
  <dataset name="A Test Dataset 2" ID="testDataset2" urlPath="ocean/hdf/salinity.hdf" >
    <serviceName>odap</serviceName>
  </dataset>
...

The directory path /data/ocean is aliased to ocean, which is inserted into the URL string right after the part indicating the service type. The URLs for accessing these files via the OPeNDAP server will be:

http://hostname:8080/thredds/dodsC/ocean/salinity.nc
http://hostname:8080/thredds/dodsC/ocean/hdf/salinity.hdf

where http://hostname:8080 is the server name, thredds is the web application name, dodsC is the service name, ocean is the data root alias, and salinity.nc and hdf/salinity.hdf are the filenames relative to the directory being aliased which, in this example, is /data/ocean.

Multiple datasetRoot elements can be defined in a catalog, as in the following example:

Example 8 - Multiple datasetRoot Elements

...
  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />

  <datasetRoot path="ocean" location="/data/ocean/" />
  <datasetRoot path="atmos" location="/data/atmos/" />

  <dataset name="Ocean Data Test" ID="testDataset" urlPath="ocean/salinity.nc">
    <serviceName>odap</serviceName>
  </dataset>
  <dataset name="Atmosphere Data Test" ID="testDataset2" urlPath="atmos/sfc_wind.nc" >
    <serviceName>odap</serviceName>
  </dataset>
...

The URLs for these datasets will be:

http://hostname:8080/thredds/dodsC/ocean/salinity.nc
http://hostname:8080/thredds/dodsC/atmos/sfc_wind.nc

5.5. The datasetScan Element

5.5.2. Overview

The datasetScan element renders most of what we’ve done thus far unnecessary, since it does most of the tedious work for us. It specifies filesystem locations to be scanned for datasets when generating a catalog. Up until now our examples have entailed creating catalogs from individually specified datasets. While this is a fine way to create a catalog when you only have a few datasets, it can get very tedious very quickly if you have hundreds of datasets, for example, a series of daily files of gridded temperatures over several years. Like the datasetRoot element, this element defines a mapping between a URL base path and a directory. Unlike that element, the datasetScan element will automatically serve some or all of the datasets found in the scanned directory instead of working with individual dataset elements to define the datasets.

5.5.3. XML Schema

The XML schema of the datasetScan element defines all the elements and attributes that can be used. It is:

<xsd:element name="datasetScan" substitutionGroup="dataset">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="DatasetType">
        <xsd:sequence>
          <xsd:element ref="filter" minOccurs="0" />
          <xsd:element ref="addID" minOccurs="0" />
          <xsd:element ref="namer" minOccurs="0" />
          <xsd:element ref="sort" minOccurs="0" />
          <xsd:element ref="addLatest" minOccurs="0" />
          <xsd:element ref="addProxies" minOccurs="0" />
          <xsd:element name="addDatasetSize" minOccurs="0" />
          <xsd:element ref="addTimeCoverage" minOccurs="0" />
        </xsd:sequence>

        <xsd:attribute name="path" type="xsd:string" use="required"/>
        <xsd:attribute name="location" type="xsd:string"/>
        <xsd:attribute name="dirLocation" type="xsd:string"/> <!-- deprecated : use location attribute -->
        <xsd:attribute name="filter" type="xsd:string"/> <!-- deprecated : use filter element -->
        <xsd:attribute name="addDatasetSize" type="xsd:boolean"/> <!-- deprecated : use enhance/addDatasetSize element -->
        <xsd:attribute name="addLatest" type="xsd:boolean"/> <!-- deprecated : use addLatest element -->
        <xsd:attribute name="addId" type="xsd:boolean"/> <!-- deprecated : use addID element -->
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>

The available elements are:

The available non-deprecated attributes are:

A simple example of a catalog employing the datasetScan element follows:

Example 9 - A Catalog with a datasetScan Element

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Ocean Data" version="1.0.1"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <datasetScan name="Ocean Data" path="ocean" location="/data/ocean/">
    <serviceName>odap</serviceName>
  </datasetScan >
</catalog>

The path attribute of the datasetScan element is the part of the URL that identifies this particular datasetScan and is used to map dataset URLs to a location. The location attribute gives the location of the dataset collection on the local filesystem.

When a client requests a specific datasetScan, the datasetScan element is replaced and shown as a catalog reference, that is, replaced by a catalogRef element. For example, the catalog in Example 9 would be transformed for a client request into:

Example 10 - A Catalog with a datasetScan Element Replaced by a catalogRef Element

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Ocean Data" version="1.0.1"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <catalogRef xlink:href="/thredds/catalog/ocean/catalog.xml"
              xlink:title="Ocean Data" name="" />
</catalog>

The datasetScan element in Example 9 is replaced by the catalogRef element of Example 10.

The catalog.xml of Example 10 will be generated dynamically for display by the server when requested by the client. It will do this by scanning the /data/ocean directory specified in the datasetScan element.

If this catalog were to scan a directory structure that looks like:

/data/ocean/atlantic/salinity/s_20050101.nc
                              s_20050102.nc
                                ...
                              s_20051231.nc
                     temperature/t_20050101.nc
                                  ...
                                 t_20051231.nc
            pacific/
             ...
            indian/
             ...

the result of a client request for the top-level catalog created by a datasetScan request to

http://server:8080/thredds/catalog/ocean/catalog.xml

would look something like:

Example 11 - First Level Catalog Created by datasetScan

<catalog ...>
  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="Ocean Data">
    <metadata inherited="true">
      <serviceName>odap</serviceName>
    </metadata>
    <catalogRef xlink:title="atlantic" xlink:href="atlantic/catalog.xml" name="" />
    <catalogRef xlink:title="pacific" xlink:href="pacific/catalog.xml" name="" />
    <catalogRef xlink:title="indian" xlink:href="indian/catalog.xml" name="" />
  </dataset>
</catalog>

A request for the second-level catalog atlantic in Example 11 would use a URL of the form:

http://server:8080/thredds/catalog/ocean/atlantic/catalog.xml

and generate a catalog like this:

Example 12 - Second Level Catalog Created by datasetScan

<catalog ...>
  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="ocean/atlantic">
    <metadata inherited="true">
      <serviceName>odap</serviceName>
    </metadata>
    <catalogRef xlink:title="salinity"
xlink:href="atlantic/salinity/catalog.xml" name="" />
    <catalogRef xlink:title="temperature"
xlink:href="atlantic/temperature/catalog.xml" name="" />
  </dataset>
</catalog>

A client request for the third-level catalog atlantic/salinity, i.e. the first subdirectory in Example 12, would use the URL:

http://server:8080/thredds/catalog/ocean/atlantic/salinity/catalog.xml

and generate a catalog like:

Example 13 - Third Level Catalog Created by datasetScan

<catalog ...>
  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="ocean/atlantic/salinity>
    <metadata inherited="true">
      <serviceName>odap</serviceName>
    </metadata>
    <dataset name="s_20050101.nc"
             urlPath="ocean/atlantic/salinity/s_20050101.nc" />
    <dataset name="s_20050101.nc"
             urlPath="ocean/atlantic/salinity/s_20050102.nc" />
      ...
    <dataset name="s_20050101.nc"
             urlPath="ocean/atlantic/salinity/s_20051231.nc" />
  </dataset>
</catalog>

To summarize, the datasetScan element performs the following basic tasks:

  • All the files found are turned into dataset elements.

  • All subdirectories are turned into catalogRef elements, and nested if there is more than one level.

  • All the catalog URLs are relative. For example, if the base catalog URL is http://server:8080/thredds/catalog.xml, the xlink:href attribute of the catalogRef element - in this example /thredds/catalog/ocean/catalog.xml - resolves to:

http://server:8080/thredds/catalog/ocean/catalog.xml
  • The URLs for accessing the datasets created by a datasetScan are built from the service base attribute and the created dataset urlPath, respectively /thredds/dodsC/ and ocean/atlantic/salinity/s_20050101.nc for the first dataset in Example 13. This would make the access URL:

http://server:8080/thredds/dodsC/ocean/atlantic/salinity/s_20050101.nc

In short, the datasetScan element has the tremendous utility of automatically creating both the catalogs and the URL paths used to access the datasets within those catalogs. Additional useful features of datasetScan beyond these basics will now be discussed.

5.5.4. Filtering, or, the filter Element

Schema

The schema of the filter sub-element of the datasetScan element is:

<xsd:element name="filter">
  <xsd:complexType>
    <xsd:choice>
      <xsd:sequence minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="include" type="FilterSelectorType" minOccurs="0"/>
        <xsd:element name="exclude" type="FilterSelectorType" minOccurs="0"/>
      </xsd:sequence>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>

<xsd:complexType name="FilterSelectorType">
  <xsd:attribute name="regExp" type="xsd:string"/>
  <xsd:attribute name="wildcard" type="xsd:string"/>
  <xsd:attribute name="atomic" type="xsd:boolean"/>
  <xsd:attribute name="collection" type="xsd:boolean"/>
</xsd:complexType>

The available elements are:

The available attributes are:

Overview

The filter element allows users to specify which datasets are to be included in the generated catalogs. A filter element can contain any number of include and exclude elements. Each include or exclude element may contain either a wildcard or a regExp attribute. If the given wildcard pattern or regular expression matches a dataset name, that dataset is included or excluded as specified. By default, includes and excludes apply only to atomic datasets (regular files). You can specify that they apply to atomic and/or collection datasets (directories) by using the atomic and collection attributes.

Examples

A basic example showing the use of include, exclude and wildcard is:

<filter>
  <include wildcard="*.nc"/>
  <exclude wildcard="*.hdf"/>
</filter>

Here the include element is used to tell datasetScan - via the wildcard attribute - to include all NetCDF files - i.e. files of the form *.nc - in the catalogs to be created, and the exclude element is similarly used to exclude all HDF files - i.e. files of the form *.hdf - from those catalogs. If we had a directory structure like the following:

/data/ocean/salinity.nc
            temp.nc
            salinity.hdf
            temp.hdf

then the following datasetScan example:

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Ocean Data" version="1.0.1"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <datasetScan name="Ocean Data" path="ocean" location="/data/ocean/">
    <serviceName>odap</serviceName>
    <filter>
      <include wildcard="*.nc"/>
      <exclude wildcard="*.hdf"/>
    </filter>
  </datasetScan >
</catalog>

would create the following catalog upon user request:

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Ocean Data" version="1.0.1"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="ocean">
    <metadata inherited="true">
      <serviceName>odap</serviceName>
    </metadata>
    <dataset name="salinity.nc" urlPath="ocean/salinity.nc" />
    <dataset name="temp.nc" urlPath="ocean/temp.nc" />
  </dataset>
</catalog>

In addition to the wildcard attribute, a regExp attribute can be used with the filter element to specify which files are to be selected.
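
For instance - a sketch in which the particular pattern is purely illustrative - an include element selecting only files whose names end in .nc could be written with a regular expression as:

<filter>
  <include regExp=".*\.nc$"/>
</filter>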

By default, the includes and excludes only apply to regular files, which are also known as atomic datasets.

If you want to filter on the directory level, you can use the atomic and collection attributes. Suppose that in our previous example there was an additional obsolete directory in our structure to hold obsolete versions of our files, i.e.:

/data/ocean/salinity.nc
            temp.nc
            salinity.hdf
            temp.hdf
            obsolete/salinity_old.nc
                     temp_old.nc

To exclude all the files in this directory we would use an exclude element of the form:

<exclude wildcard="obsolete" atomic="false" collection="true" />

to get the same catalog result as the previous example even with the presence of the additional obsolete directory.

5.5.5. Adding Identification Information, or, the addID Element

The addID element is used to specify that a datasetScan should add an ID attribute to each dataset element included in the resulting catalog. The TDS automatically adds ID attributes by default, even with no addID element present. These IDs are constructed by appending the path of the dataset to the datasetScan path value or, if one exists, the ID of the datasetScan element.
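
Although the TDS adds these IDs by default, the element can also be written out explicitly. A minimal sketch - reusing the Ocean Data datasetScan from the earlier filter example - is:

<datasetScan name="Ocean Data" path="ocean" location="/data/ocean/">
  <serviceName>odap</serviceName>
  <addID/>
</datasetScan>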

The example results above would then more accurately look like this:

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Ocean Data" version="1.0.1"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <dataset name="ocean" ID="ocean">
    <metadata inherited="true">
      <serviceName>odap</serviceName>
    </metadata>
    <dataset name="salinity.nc" ID="ocean/salinity.nc" urlPath="salinity.nc" />
    <dataset name="temp.nc" ID="ocean/temp.nc" urlPath="temp.nc" />
  </dataset>
</catalog>

5.5.6. Naming Datasets, or, the namer Element

By default, all datasets are named with the name of the underlying file. A namer element can be used with datasetScan to create more human readable dataset names. A simple example of this is:

<namer>
  <regExpOnName regExp="salinity" replaceString="roms_surface_salinity" />
</namer>

where any occurrence of the regExp attribute value salinity in a dataset name is replaced with the replaceString attribute value roms_surface_salinity. This would cause the first dataset name in Example 16 to be changed from salinity.nc to roms_surface_salinity.nc.

Much more complex renaming is possible via regular expression matching on the dataset names. What are known as capturing groups can be used in the replacement string when a match succeeds. A capturing group is part of a regular expression enclosed in parentheses. When a regular expression with a capturing group is applied to a string, the substring that matches the capturing group is saved, and can then be substituted into another string in place of capturing group references. Suppose we have a file named temp-20080105.nc that contains ROMS model surface temperature values for January 5, 2008 for the Gulf of Mexico (as encoded in a YYYYMMDD format in the filename). A regular expression used as the value for the regExp attribute of the namer element could be:

regExp="temp-([0-9]{4})([0-9]{2})([0-9]{2}).nc"

and would capture the four digits of the year (2008) as the first capture group or reference $1, the two digits of the month (01) as the second capturing group or reference $2, and the two digits of the day (05) as the third capturing group or reference $3. A replaceString attribute value to supply more useful information than is contained within the filename could be:

replaceString="ROMS Gulf of Mexico Surface Temperature: $1-$2-$3"

and the complete namer element example would be:

<namer>
    <regExpOnName regExp="temp-([0-9]{4})([0-9]{2})([0-9]{2}).nc"
                  replaceString="ROMS Gulf of Mexico Surface Temperature: $1-$2-$3" />
</namer>

This would serve to change the dataset element name attribute from temp-20080105.nc to ROMS Gulf of Mexico Surface Temperature: 2008-01-05, with the latter much more informative than the former for those searching for specific datasets.

All the gory details about regular expressions are beyond the scope of this tutorial but can be found at:

Basically, you are limited only by your imagination and your capacity for understanding the recondite syntax of regular expressions.

5.5.7. Sorting Datasets, or, the sort Element

The schema for the sort element is:

<xsd:element name="sort">
  <xsd:complexType>
    <xsd:choice>
      <xsd:element name="lexigraphicByName">
        <xsd:complexType>
          <xsd:attribute name="increasing" type="xsd:boolean"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>

The default behavior of datasetScan is that datasets are presented in decreasing lexicographic order. A sort element can be added to the datasetScan element to specify the opposite, i.e. an increasing lexicographic order. An example of this is:

<sort>
    <lexigraphicByName increasing="true" />
</sort>

5.5.8. Adding the Latest Dataset, or, the addLatest Element

This element has been deprecated in favor of the addProxies element.

5.5.9. Adding Time Coverage Information, or, the addTimeCoverage Element

If your filenames contain sufficient encoded time information, you can use the addTimeCoverage element with datasetScan to add start and duration elements to the datasets created by datasetScan. At present, this is limited to specifying the start time and duration of a dataset. Using the filename of our previous example, an appropriate addTimeCoverage element to add to datasetScan would be:

<addTimeCoverage datasetNameMatchPattern="temp-([0-9]{4})([0-9]{2})([0-9]{2}).nc"
                 startTimeSubstitutePattern="$1-$2-$3"
                 duration="24 hours" />

wherein the datasetNameMatchPattern attribute is the same as the regExp attribute of the previous namer example and is used to extract the year, month and day from the filename, the startTimeSubstitutePattern attribute sets the desired format for presenting the starting time, and the duration attribute allows us to specify a duration for the file. For our example filename temp-20080105.nc, the result would be the addition of the following timeCoverage element to the dataset element in the catalog created via datasetScan:

<timeCoverage>
    <start>2008-01-05</start>
    <duration>24 hours</duration>
</timeCoverage>

5.5.10. Adding Dataset Size Information, or, the addDatasetSize Element

The addDatasetSize element allows you to add file size metadata to all of the atomic datasets. This element is simply added within the datasetScan container like this:

<datasetScan name="Ocean Data" ...
  <addDatasetSize />
</datasetScan>

and will produce a dataset element looking something like this:

<dataset name="Ocean Data" ID="testdata">
  <dataset name="salinity.nc" urlPath="salinity.nc">
    <dataSize units="Kbytes">6.08</dataSize>
  </dataset>
  <dataset name="temperature.nc" urlPath="temperature.nc">
    <dataSize units="Mbytes">4.961</dataSize>
  </dataset>
</dataset>

5.5.11. Adding Most Recent Dataset Information, or, the addProxies Element

The schema for the addProxies element is:

<xsd:element name="addProxies">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="simpleLatest" minOccurs="0">
        <xsd:complexType>
          <xsd:attribute name="name" type="xsd:string"/>
          <xsd:attribute name="top" type="xsd:boolean"/>
          <xsd:attribute name="serviceName" type="xsd:string"/>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="latestComplete" minOccurs="0">
        <xsd:complexType>
          <xsd:attribute name="name" type="xsd:string"/>
          <xsd:attribute name="top" type="xsd:boolean"/>
          <xsd:attribute name="serviceName" type="xsd:string"/>
          <xsd:attribute name="lastModifiedLimit" type="xsd:float"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

In a real-time archive such as one might have for an operational forecasting system, it would be a good idea to indicate to the user which is the most recent dataset. The addProxies element allows us to define a proxy dataset that always points to the most recent dataset in a collection. At present, the addProxies element has two subelements for specifying proxy datasets.

  • The simpleLatest element adds a proxy dataset which points to the existing dataset whose name is lexicographically greatest. This finds the most recent dataset if we assume that a timestamp exists as part of the filename and is the only part of the filename that changes.

  • The latestComplete element is similar to simpleLatest, but won’t include any dataset that has been modified more recently than a specified time limit. For example, you could specify the most recent dataset that hasn’t been modified for 60 minutes.

An example of an addProxies element to be added to a datasetScan element is:

<addProxies>
    <latestComplete name="latextComplete.xml" top="true" serviceName="latest"
         lastModifiedLimit="60" />
</addProxies>

which would result in the following dataset being listed at the top of the catalog created via datasetScan:

<dataset name="latestComplete.xml" serviceName="latest" urlPath="latestComplete.xml" />

wherein the attribute name provides a name for the proxy dataset, the attribute top indicates that the proxy dataset will be listed at the top of the catalog, the attribute serviceName references the service used by the dataset, and the attribute lastModifiedLimit excludes datasets that have been modified more recently than - in this case - 60 minutes. Default values for name, top and serviceName are, respectively, latest.xml, true and latest.
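
A bare simpleLatest element relies entirely on those defaults. A minimal sketch is:

<addProxies>
    <simpleLatest />
</addProxies>

which would add a proxy dataset named latest.xml at the top of the catalog, pointing to the lexicographically greatest dataset name and served via the latest service.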

5.6. The datasetFmrc Element

This element has been deprecated. It has been replaced with the featureCollection element.

The schema for the datasetFmrc element is:

<!-- Define a Forecast Model Run Collection dataset -->
<xsd:element name="datasetFmrc" substitutionGroup="dataset">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="DatasetType">
         <xsd:sequence>
          <xsd:element ref="fmrcInventory" minOccurs="0"/>
          <xsd:element ref="addTimeCoverage" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="path" type="xsd:string" use="required"/>
        <xsd:attribute name="runsOnly" type="xsd:boolean" />
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>

<xsd:element name="fmrcInventory">
  <xsd:complexType>
    <xsd:attribute name="location" type="xsd:string" use="required"/>
    <xsd:attribute name="suffix" type="xsd:string"/>
    <xsd:attribute name="fmrcDefinition" type="xsd:string" use="required"/>
    <xsd:attribute name="olderThan" type="xsd:string" />
    <xsd:attribute name="subdirs" type="xsd:string" />
  </xsd:complexType>
</xsd:element>

A datasetFmrc is a kind of dataset, and allows all of the attributes and child elements of the dataset element.

An example is:

  <datasetFmrc name="SABGOM Forecast Model Run Collection" path="fmrc/sabgom">

    <metadata inherited="true">
      <serviceName>dapService</serviceName>
    </metadata>

    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation dimName="runtime" type="forecastModelRunCollection" recheckEvery="10min">
        <scan location="D:/test/signell/big/" suffix=".nc" dateFormatMark="his_#yyyyMMdd" olderThan="5 min"/>
      </aggregation>
    </netcdf>

  </datasetFmrc>

6. Restricting Access to Datasets

Data suppliers may occasionally wish to restrict access to some datasets. There are two ways to do this.

6.1. Restriction by URL Using Tomcat

A built-in Tomcat mechanism can be used to restrict a pattern of URLs by adding <security-constraint> elements to the web.xml file. In the following example, all URL accesses matching the urlPattern will be restricted to authenticated users having the role roleName. The <transport-guarantee> element forces a switch to using an SSL socket.

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>restrict by URL</web-resource-name>
      <url-pattern>urlPattern</url-pattern>
      <http-method>GET</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>roleName</role-name>
    </auth-constraint>
    <user-data-constraint>
      <transport-guarantee>CONFIDENTIAL</transport-guarantee>
    </user-data-constraint>
  </security-constraint>

A more realistic version of this might look like:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>restrict by URL</web-resource-name>
      <url-pattern>/dodsC/dataRoot/*</url-pattern>
      <http-method>GET</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>tiggeRole</role-name>
    </auth-constraint>
    <user-data-constraint>
      <transport-guarantee>CONFIDENTIAL</transport-guarantee>
    </user-data-constraint>
  </security-constraint>

If you are using multiple data services, you must include each service’s URL pattern, e.g.

    <web-resource-collection>
      <web-resource-name>restrict by URL</web-resource-name>
      <url-pattern>/dodsC/testEnhanced/*</url-pattern>
      <url-pattern>/fileServer/testEnhanced/*</url-pattern>
      <http-method>GET</http-method>
    </web-resource-collection>

This restriction method works well if you want to restrict your entire site to a single set of users. When you want to give access to different datasets to different users, you need to understand in more detail what the URLs look like.

6.2. Restriction by Dataset Using the TDS Catalog

A fine-grained approach to dataset restriction is to modify the dataset elements in the configuration catalog. This is done by adding the attribute restrictAccess="roleName" to a dataset or datasetScan element, which restricts access to that dataset to users with the named role.

In this method, when a client attempts to access a restricted dataset, it is redirected to a URL that triggers a security challenge. If the challenge is successfully answered, the client is redirected back to the original dataset URL within an authenticated session, represented by a session cookie passed to the client. As long as the session remains valid, no further authentication will be required for subsequent requests.

The default TDS configuration uses digest authentication. The web.xml file can be modified to perform authentication in other ways, e.g. SSL authentication. There is also the option to add a custom authentication method.

A client such as a browser, OPeNDAP-enabled application, or WCS client must have the following capabilities to access a restricted dataset:

  • following redirects, including circular redirects;

  • switching to SSL and back;

  • performing basic and digest authentication;

  • answering security challenges with the appropriate username and password;

  • returning session cookies.

6.2.1. Configuring Restricted Datasets

To configure a restricted dataset, you must first decide on distinct sets of datasets that need to be restricted, as well as which users will be allowed access. For each set a name called a security role is chosen. This name should not contain special characters such as /"><' and spaces.

Once the users and roles have been chosen, three steps need to be taken.

Adding Roles

Suppose you have three sets of restricted data to which you have given the security roles ccsmData, fieldProject and tiggeData. These roles are declared, along with the restrictedDatasetUser role, in the tomcat-users.xml file, e.g.

  <role rolename="restrictedDatasetUser"/>
  <role rolename="ccsmData"/>
  <role rolename="fieldProject"/>
  <role rolename="tiggeData"/>

If there is only one set of datasets you wish to restrict, you can use just restrictedDatasetUser. If there are multiple sets of datasets, though, you must specify restrictedDatasetUser along with all the other roles.

Adding Users

Each user who should have authorization should be added to the tomcat-users.xml file. A user may have multiple roles, and must always have the restrictedDatasetUser role. An example is:

<user username="john" password="dorkology" roles="ccsmData,restrictedDatasetUser"/>
  <user username="tiggeUser" password="flabulate" roles="tiggeData,restrictedDatasetUser"/>
  <user username="luci" password="designated" roles="fieldProject,tiggeData,restrictedDatasetUser"/>

Make sure that no user with the restrictedDatasetUser role has any of the secure roles such as tdsConfig, manager or admin. This is required since the secure roles can only be accessed via HTTPS, while restrictedDatasetUsers can also use non-HTTPS URLs and are thus vulnerable to session hijacking. Note that while this example stores passwords in cleartext, it is advised to store them in digest form. Users and roles can also be managed using the Tomcat administration application if it has been installed.

Adding Attributes to Configuration Catalogs

You now need to add restrictAccess={security role} attributes to the dataset or datasetScan elements of the datasets for which you wish to have access restricted. This will also restrict access to the children of those datasets. An example is:

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="TDS Catalog" xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">

  <service name="thisDODS" serviceType="OpenDAP" base="/thredds/dodsC/" />
  <datasetRoot path="test" location="/data/testdata/"/>

  <dataset name="Test Single Dataset" ID="testDataset" serviceName="thisDODS"
      urlPath="test/testData.nc" restrictAccess="tiggeData">

    <dataset name="Nested" ID="nested" serviceName="thisDODS" urlPath="test/nested/testData.nc" />
  </dataset>

  <datasetScan name="Test all files in a directory" ID="testDatasetScan"
      path="testAll" location="/data/testdata" restrictAccess="ccsmData" >

    <metadata inherited="true">
      <serviceName>thisDODS</serviceName>
    </metadata>

  </datasetScan>
</catalog>

In this example the dataset with ID testDataset will be restricted along with its child dataset with ID nested. All the datasets generated by the datasetScan element will also be restricted. These datasets will still be listed in the generated catalogs, but a user will be challenged when attempting to access them.

The next section shows what needs to be done to add SSL authentication to the process.

6.3. SSL Authentication

If SSL authentication is to be added, the following must be implemented in addition to all of the default configuration procedures explained in the previous section.

6.3.1. Enable Tomcat Security

Secure Sockets must be enabled in Tomcat. See the SSL authentication section for the details.

6.3.2. Modify the TDS web.xml File

Edit the ${tomcat_home}/webapps/thredds/WEB-INF/web.xml file and find the following section:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>restrictedAccess</web-resource-name>
      <url-pattern>/restrictedAccess/*</url-pattern>
      <http-method>GET</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>restrictedDatasetUser</role-name>
    </auth-constraint>
  </security-constraint>

Add the following user-data-constraint element to it:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>restrictedAccess</web-resource-name>
    <url-pattern>/restrictedAccess/*</url-pattern>
    <http-method>GET</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>restrictedDatasetUser</role-name>
  </auth-constraint>
  <user-data-constraint>
    <transport-guarantee>CONFIDENTIAL</transport-guarantee>
  </user-data-constraint>
</security-constraint>

Next, find the following section:

  <!-- Restricted Access (using Tomcat) -->
  <servlet>
    <servlet-name>RestrictedDataset</servlet-name>
    <servlet-class>thredds.servlet.restrict.RestrictedDatasetServlet</servlet-class>

    <init-param>
      <param-name>Authorizer</param-name>
      <param-value>thredds.servlet.restrict.TomcatAuthorizer</param-value>
    </init-param>

    <init-param>
      <param-name>useSSL</param-name>
      <param-value>false</param-value>
    </init-param>

    <init-param>
      <param-name>portSSL</param-name>
      <param-value>8443</param-value>
    </init-param>

    <load-on-startup>2</load-on-startup>
  </servlet>

and:

  • change the useSSL parameter value from false to true; and

  • change the portSSL parameter value if the correct SSL port is not 8443, as shown in the sketch below.
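
A sketch of the two modified init-param entries - assuming the SSL connector really is on port 8443 - is:

    <init-param>
      <param-name>useSSL</param-name>
      <param-value>true</param-value>
    </init-param>

    <init-param>
      <param-name>portSSL</param-name>
      <param-value>8443</param-value>
    </init-param>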

6.4. Adding a Custom Authenticator for Restricting Access

This..

7. NcML

7.2. Overview

A brief introduction to the concepts of the NetCDF Markup Language, the Common Data Model, NcML documents, and virtual and aggregated datasets - and their interrelationships - is now presented. Basically, these concepts combine to let us take a set of files we have created or collected - possibly a huge number of them, possibly scattered over local and remote locations - and combine them in various ways so that their contents are presented to interested users in a uniform format that is easily understandable by those who aren't walking acronym dictionaries, and easily accessible via a variety of remote methods.

The NetCDF Markup Language (NcML) is an XML dialect that enables the creation of CDM datasets. The Common Data Model (CDM) is an abstract data model for scientific datasets that merges features of the data models used by NetCDF, OPeNDAP and HDF5 to create a common interface for many types of scientific data.

NcML is an XML representation of NetCDF metadata, and can be usefully considered an XML version of the NetCDF Common Data form Language (CDL). The basic uses of NcML are:

  • to describe the metadata and structural content of a NetCDF file;

  • to modify existing NetCDF files; and

  • to create virtual NetCDF datasets that are combinations of many actual NetCDF datasets.

An NcML Document is an XML document whose contents are described and constrained by the latest official NcML schema. This document represents a generic NetCDF dataset, that is, a container for data conforming to the NetCDF CDM. The document can represent many things, including:

  • a generic NetCDF dataset;

  • a NetCDF file not yet written;

  • an HDF or GRIB file read through the NetCDF-Java library;

  • a subset of a NetCDF file;

  • an aggregation of NetCDF files; or

  • a self-contained dataset with all the data contained within the NcML document itself with no separate NetCDF file holding the data and referred to from within the NcML document.

In most instances the NcML document will be a fairly small file that contains metadata about a larger file along with a pointer or pointers to the actual data described by that metadata.

Most important for the purposes of this tutorial, NcML can be embedded directly into TDS catalogs and thus allow its full functionality to be employed from within THREDDS. An NcML element can be placed inside dataset or datasetScan elements, where it can be used to modify a regular dataset. The result is called a virtual dataset, which can be accessed via subsetting services like OPeNDAP, WCS, WMS and NetcdfSubset, although not by HTTP since it is a virtual rather than actual dataset.

7.3. Roadmap

In this section you will learn:

  • about the XML schema that defines the capabilities of NcML;

  • about a software library and utility that implement the capabilities of NcML for you to use;

  • about the difference between actual datasets and virtual datasets;

  • how to add, delete and change metadata for virtual datasets;

  • how to rename, add, delete and restructure variables for virtual datasets;

  • how to use NcML to aggregate many individual datasets into a single virtual dataset.

7.4. Annotated Schema for NcML

The final authority on all matters touched upon in this NcML section is the Annotated Schema for NcML at:

which contains the schema for the version of NcML found in the latest release of the TDS. If an example or explanation within this document isn’t working, it’s most likely because of either an error within this document, or this document lagging behind the latest NcML schema. The schema is heavily annotated and not that difficult to read and understand once you’ve gotten a bit used to such things.

7.4.1. The netcdf Element

The netcdf element is the root tag of an NcML instance document, and defines a NetCDF dataset. It is:

<!-- XML encoding of Netcdf container object -->
<xsd:element name="netcdf">
  <xsd:complexType>
    <xsd:sequence>

 (1)  <xsd:choice minOccurs="0">
        <xsd:element name="readMetadata"/>
        <xsd:element name="explicit"/>
      </xsd:choice>

 (2)  <xsd:element name="iospParam" minOccurs="0" />

 (3)  <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element ref="group"/>
        <xsd:element ref="dimension"/>
        <xsd:element ref="variable"/>
        <xsd:element ref="attribute"/>
        <xsd:element ref="remove"/>
      </xsd:choice>

 (4)  <xsd:element ref="aggregation" minOccurs="0"/>
    </xsd:sequence>
 (5)<xsd:attribute name="location" type="xsd:anyURI"/>
 (6)<xsd:attribute name="id" type="xsd:string"/>
 (7)<xsd:attribute name="title" type="xsd:string"/>
 (8)<xsd:attribute name="enhance" type="xsd:string"/>
 (9)<xsd:attribute name="addRecords" type="xsd:boolean"/>

(10)<xsd:attribute name="iosp" type="xsd:string"/>
    <xsd:attribute name="iospParam" type="xsd:string"/>
    <xsd:attribute name="bufferSize" type="xsd:int"/>

  <!-- for netcdf elements nested inside of aggregation elements -->
(11)<xsd:attribute name="ncoords" type="xsd:string"/>
(12)<xsd:attribute name="coordValue" type="xsd:string"/>
(13)<xsd:attribute name="section" type="xsd:string"/>

  </xsd:complexType>
</xsd:element>

The attributes are:

The ncoords Attribute

This optional attribute is used for joinExisting aggregation datasets to indicate the number of coordinates that come from the dataset. This is used to avoid having to open each dataset when starting.

The coordValue Attribute

This attribute is used for joinExisting or joinNew aggregations to assign one or more coordinate values to a dataset. A joinNew aggregation always has exactly one coordinate value. A joinExisting aggregation may have multiple values, in which case blanks and/or commas are used to delineate them. Thus these characters cannot be used in your coordinate values.
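
As a sketch - the filenames and coordinate values here are purely illustrative, and aggregations are treated in detail later - a joinNew aggregation assigning one coordinate value to each dataset might look like this, where the variableAgg element names the variable to which the new dimension is added:

<aggregation dimName="time" type="joinNew">
  <variableAgg name="T" />
  <netcdf location="temp-20080105.nc" coordValue="5" />
  <netcdf location="temp-20080106.nc" coordValue="6" />
</aggregation>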

The section Attribute

This attribute is only used for tiled aggregations, and describes which section of the entire dataset that this subset dataset represents.

7.5. NetCDF Java Library and ToolsUI

The CDM used by NcML to create datasets is an abstract data model. It must be implemented as software to create actual NcML documents defining CDM datasets from NetCDF, HDF, etc. files. This has been done with the NetCDF-Java Library, fully documented at:

which implements the CDM data model. The NetCDF-Java library is a Java framework for reading NetCDF and other formats into the CDM, as well as writing to the NetCDF 3 format. The library also implements NcML, allowing you to add metadata to CDM datasets as well as create virtual datasets. It is available as a Java application in jar format, but since it is bundled as part of the THREDDS package it is not usually necessary to use it by itself.

A program that is useful to run by itself is ToolsUI, a GUI interface to much of the functionality of the NetCDF-Java/CDM library. ToolsUI is not part of THREDDS and cannot be used as such, but it can be extremely valuable as a tool for creating and modifying NcML documents to quickly test the effects of adding and modifying various elements and attributes within the documents.

7.6. THREDDS Dataset vs. THREDDS Virtual Dataset

An example of serving a file as a THREDDS dataset and also serving exactly the same file as a virtual dataset through NcML follows. In this example, there is a single dataset located at:

/data/ocean/example.nc

that will be served both directly and - via NcML commands embedded in the dataset element using the netcdf subelement - virtually.

Example 17 - Dataset vs. Virtual Dataset

<catalog
xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/>

  <datasetRoot path="test/ncml" location="/data/ocean/" />
  <dataset name="Example Dataset" ID="Example" urlPath="test/ncml/example.nc">
    <serviceName>ncdods</serviceName>
  </dataset>

  <dataset name="Example NcML Modified" ID="Modified" urlPath="ncml/modified.nc">
    <serviceName>ncdods</serviceName>
    <netcdf xmlns="http.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/data/ocean/example.nc">
      <variable name="Temperature" orgName="T"/>
    </netcdf>
  </dataset>

</catalog>

In the first part of this example, a datasetRoot is defined that associates the URL path test/ncml with disk location /data/ocean/, such that the file located at /data/ocean/example.nc is served via the URL:

http://localhost:8080/thredds/dodsC/test/ncml/example.nc

In the second part, the same file example.nc is used to define a virtual dataset defined by the embedded NcML. This virtual dataset is given the urlPath value ncml/modified.nc. The netcdf element that contains the NcML commands then associates this urlPath with the location value of /data/ocean/example.nc. This virtual file will be served via the URL:

http://localhost:8080/thredds/dodsC/ncml/modified.nc

7.7. Using NcML to Modify Metadata

7.7.1. Overview

In this section we will learn how to use NcML to:

7.7.2. The dimension Element, or Defining or Renaming Dimensions

Schema Overview

The dimension element represents a NetCDF dimension, that is, a named index of specified length. The NcML schema for this is:

  <!-- XML encoding of Dimension object -->
  <xsd:element name="dimension">
    <xsd:complexType>
      <xsd:attribute name="name" type="xsd:token" use="required"/>
      <xsd:attribute name="length" type="xsd:nonNegativeInteger" use="required"/>
      <xsd:attribute name="isUnlimited" type="xsd:boolean" default="false"/>
      <xsd:attribute name="isVariableLength" type="xsd:boolean" default="false"/>
      <xsd:attribute name="isShared" type="xsd:boolean" default="true"/>
      <xsd:attribute name="orgName" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>

where:

  • name - A mandatory attribute that must be unique within its containing netcdf or group element.

  • length - A mandatory attribute expressing the number of points associated with the dimension, which can be any non-negative integer including zero. A variable length dimension is specified by length="*".

  • isUnlimited - An attribute whose value is true if this is a NetCDF record dimension - that is, one that can grow - and the default value of false if the dimension is fixed.

  • isVariableLength - An attribute that has the value true for variable length data types, and the default false for all else.

  • isShared - An attribute with the default value true for shared dimensions, and false when the dimension is private to the variable.

  • orgName - Used when renaming a dimension.

Examples

An example wherein the original dimension name lat is changed to latitude is:

<dimension orgName="lat" name="latitude" />

7.7.3. The group Element

The group element of the NcML schema defines the proper method for adding, removing or modifying groups. It is:

<xsd:element name="group">
  <xsd:complexType>
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
     <xsd:element ref="enumTypedef"/>
     <xsd:element ref="dimension"/>
     <xsd:element ref="variable"/>
     <xsd:element ref="attribute"/>
     <xsd:element ref="group"/>
     <xsd:element ref="remove"/>
   </xsd:choice>

   <xsd:attribute name="name" type="xsd:string" use="required"/>
   <xsd:attribute name="orgName" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>

where:

  • The group element may contain any number of group, variable, dimension or attribute sub-elements, which can appear in any order. Any number of remove elements can be mixed in to remove elements coming from the referenced dataset.

  • The mandatory name attribute must be unique among groups within its containing group or netcdf element.

  • The optional attribute orgName is used when renaming a group.
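
As a sketch - the group and attribute names here are hypothetical - a group element that renames a group and adds an attribute inside it might look like:

<group name="model_output" orgName="output">
  <attribute name="title" type="String" value="ROMS model output fields" />
</group>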

7.7.4. The variable Element, or Defining or Renaming Variables

Schema Overview

The variable element of the NcML schema defines the proper method for defining or modifying variables, and is:

  <xsd:element name="variable">
    <xsd:complexType>
      <xsd:sequence>
(1)     <xsd:element ref="attribute" minOccurs="0" maxOccurs="unbounded"/>
(2)     <xsd:element ref="values" minOccurs="0"/>
(3)     <xsd:element ref="variable" minOccurs="0" maxOccurs="unbounded"/>
(4)     <xsd:element ref="logicalSection" minOccurs="0"/>
(5)     <xsd:element ref="logicalSlice" minOccurs="0"/>
(6)     <xsd:element ref="remove" minOccurs="0" maxOccurs="unbounded" />
      </xsd:sequence>

(7)   <xsd:attribute name="name" type="xsd:token" use="required" />
(8)   <xsd:attribute name="type" type="DataType" use="required" />
(9)   <xsd:attribute name="typedef" type="xsd:string"/>
(10)  <xsd:attribute name="shape" type="xsd:token" />
(11)  <xsd:attribute name="orgName" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>

The elements under variable are:

  • attribute - A variable element may contain one or more attribute elements.

  • values - The optional values element is used to specify the data values of a variable. The values must be listed compatibly with the size and shape of the variable (slowest varying dimension first). If not specified, the data values are taken from the variable of the same name in the referenced dataset. The values given are the "raw values"; any scale/offset or missing-value attributes, if present, will be applied to them.

  • variable - A variable of data type structure may have nested variable elements within.

  • logicalSection - An element for creating a logical section of a variable.

  • logicalSlice - An element for creating a logical slice of a variable, where one of the dimensions is set to a constant.

  • remove - An element for removing attributes from the underlying variable.

The attributes under variable are:

  • name - A mandatory attribute that must be unique among variables within its containing group, variable and/or netcdf element.

  • type - One of the enumerated DataTypes.

  • typedef - The name of an enumerated Typedef that is only used for variable types enum1, enum2 or enum4.

  • shape - Lists the names of the dimensions on which the variable depends. For a scalar variable, the list is empty. The dimension names must be ordered with the slowest varying dimension first. Anonymous dimensions are specified with just the integer length.

  • orgName - An optional attribute used when renaming a variable.

An Example

Why would we want to add a variable to an NcML document? Suppose we have the following NetCDF file:

netcdf example1 {
dimensions:
        time = UNLIMITED ; // (2 currently)
        lat = 3 ;
        lon = 4 ;
variables:
        int rh(time, lat, lon) ;
                rh:long_name = "relative humidity" ;
                rh:units = "percent" ;
        double T(time, lat, lon) ;
                T:long_name = "surface temperature" ;
                T:units = "degC" ;
        int time(time) ;
                time:units = "hours" ;

// global attributes:
                :title = "Example Data" ;
                :lat_range = "41.0 40.0 39.0" ;
                :lon_range = "-109.0 -107.0 -105.0 -103.0" ;
data:

 rh =
  1, 2, 3, 4,
  5, 6, 7, 8,
  9, 10, 11, 12,
  21, 22, 23, 24,
  25, 26, 27, 28,
  29, 30, 31, 32 ;

 T =
  1, 2, 3, 4,
  2, 4, 6, 8,
  3, 6, 9, 12,
  2.5, 5, 7.5, 10,
  5, 10, 15, 20,
  7.5, 15, 22.5, 30 ;

 time = 6, 18 ;
}

wherein we find the variables rh, T and time, with the first two obviously located on some sort of spatial grid based on latitudes and longitudes. The file contains dimensions for lat and lon, but does not specify them as variables. It is not at all an unusual situation for a NetCDF file to contain variable fields on some sort of grid without the grid field being explicitly specified within the file.

When we run a numerical model to simulate the time evolution of various variable fields - for instance, the horizontal velocities, salinity and temperature in the ocean - we need to store the files containing the fields and usually just dump the fields themselves into the file due to time constraints. We're the only ones using those files and we know what grid we're using, and we can always go back and add the metadata later.

Well, now somebody else wants to use our files and we'd really prefer to not have to go through all the hassle of writing a program to read all of our old data files, and then write them right back out with additional information such as details about the grid we're using. Fortunately for us, we can keep the old files in their present form and simply create an NcML document containing elements and attributes that will define virtual versions of our old files containing all the information anyone would ever need to find, understand, download and use them.

We can obtain the details about the underlying grid either from external documentation or - as in this example - from the information included in the list of global attributes. We then create the following NcML document to define the variables lat and lon and to supply them with the appropriate attributes.

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="example1.nc">

  <variable name="lat" shape="lat" type="float">
    <attribute name="units" type="String" value="degrees_north" />
    <values>41.0 40.0 39.0</values>
  </variable>

  <variable name="lon" shape="lon" type="float">
    <attribute name="units" type="String" value="degrees_east" />
    <values>-109.0 -107.0 -105.0 -103.0</values>
  </variable>

</netcdf>

We create a variable element for each variable we wish to define. As we see in the schema for the variable element, the name and type attributes are required. The name attribute values lat and lon supply names for our newly defined variables, and the type attribute value float defines them both as floating point variables.

The shape attribute is recommended and defines the shape of our arrays by listing the dimensions defined in the original NetCDF file. In this case we have a regular grid, so the shape of both the lat and lon variables will be a single dimension, respectively the lat and lon dimensions defined in example1.nc above. For a scalar variable, this list will be empty. The dimension names must be in order with the slowest varying dimension first in the CDL description, with anonymous dimensions specified via just the integer dimension length.

The orgName attribute is used when we want to rename a variable. If we wanted to rename the mildly cryptic T in this example to the more informative temperature we would use the following code:

<variable orgName="T" name="temperature" />

7.7.5. The attribute Element, or Defining or Renaming Variable Attributes

Schema Overview

The attribute element of the NcML schema tells us how we can add or modify attributes, and is:

 <xsd:element name="attribute">
    <xsd:complexType mixed="true">
      <xsd:attribute name="name" type="xsd:token" use="required"/>
      <xsd:attribute name="type" type="DataType" default="String"/>
      <xsd:attribute name="value" type="xsd:string" />
      <xsd:attribute name="separator" type="xsd:string" />
      <xsd:attribute name="orgName" type="xsd:string"/>
      <xsd:attribute name="isUnsigned" type="xsd:boolean"/>
    </xsd:complexType>
  </xsd:element>

where the attributes are:

  • name - A mandatory attribute that must be unique among attributes within its containing group, variable or netcdf element.

  • type - An optional attribute that may be String, byte, short, int, long, float or double. The default value is String.

  • value - Contains the actual data of the attribute element. Most commonly a single number or string will be listed, e.g. value="3.0". In the case of multi-valued attributes, all the numbers will be listed and separated by a blank or optionally some other character, e.g. value="3.0 4.0 5.0".

  • separator - Used to specify a different token separator for multi-valued instances.

  • orgName - Used when renaming an existing attribute.

  • isUnsigned - An attribute’s values may be unsigned (if byte, short, int or long). By default they are signed.

Examples

In this case, only the name attribute is required since it'd not be very useful to have an unnamed attribute. In our example NcML document, we define an attribute element named units for each of lat and lon, respectively, as:

<attribute name="units" type="String" value="degrees_north" />
<attribute name="units" type="String" value="degrees_east" />

We’ve also defined the non-required attributes type and value for each. The value String for the former defines it as a string type, and the values degrees_north and degrees_east are standard units for such quantities.

The orgName attribute is used when an attribute is being renamed rather than created. For example, if we wanted to rename the attribute long_name in our example to standard_name, we would do it thusly:

<attribute orgName="long_name" name="standard_name" />

The separator attribute is used to define a token separator other than whitespace if the attribute does not have a String type value.
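
For instance - a sketch with purely illustrative values - a multi-valued numeric attribute whose values are separated by commas rather than whitespace could be written as:

<attribute name="pressure_levels" type="float" value="1000,850,500,250" separator="," />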

7.7.6. The values Element, or Defining Values

Schema Overview

The NcML schema for the values element used to specify the data values of a variable is:

<xsd:element name="values">
    <xsd:complexType mixed="true">
      <xsd:attribute name="start" type="xsd:float"/>
      <xsd:attribute name="increment" type="xsd:float"/>
      <xsd:attribute name="npts" type="xsd:int"/>
      <xsd:attribute name="separator" type="xsd:string" />
      <xsd:attribute name="fromAttribute" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>

where:

  • start, increment - Used to specify numeric and evenly spaced values. These can be integers or floating point numbers, and they will be converted to the data type of the variable. The number of points will be taken from the shape of the variable.

  • npts - A deprecated attribute allowed for backwards compatibility; it is ignored.

  • separator - Allows the specification of a separator other than whitespace.

  • fromAttribute - The values can be taken from a global or variable attribute. A global attribute is specified via @gattname, and a variable attribute via varName@attName. The data type and the shape of the variable must agree with the attribute.

Examples

In our example, we see the values elements:

<values>41.0 40.0 39.0</values>
<values>-109.0 -107.0 -105.0 -103.0</values>

which illustrate the specification of the data values by explicitly listing all of them. While this is a good and convenient enough way to do this with just a few data values, it can become tedious and error-prone if there are dozens or hundreds of values to specify. Thus, we also have the start and increment attributes which - if applied to this example - would look like this:

<values> start="41.0" increment="-1.0" />
<values> start="-109.0" increment="2.0" />

This can be done with all evenly spaced sets of data values. The numbers can be entered as integers or floating point numbers, and will automatically be converted to the data type of the variable. If you are listing values for a multi-dimensional variable, the values must be listed compatibly with the size and shape of the variable, with the slowest varying dimension first. The npts attribute is included solely for reasons of backwards compatibility, and is ignored by the NcML parser.

The default is for the list of values to be separated by whitespace, but this can be changed with the separator attribute. This could be useful if you were entering String values, e.g.

<values separator="*">My dog*has*fleas</values>

or if you were cutting and pasting a long string of comma-separated numbers from another document into your values element, e.g.

<values separator=",">10.5,11.0,11.5,12.0,12.5,13.0,13.5,14.0,14.5</values>

The fromAttribute attribute allows you to specify values by extracting them from a global or variable attribute. In our example, the global attributes did indeed include the data values for both the lat and lon variables. They are:

// global attributes:
                :title = "Example Data" ;
                :lat_range = "41.0 40.0 39.0" ;
                :lon_range = "-109.0 -107.0 -105.0 -103.0" ;

To extract the lat values from this, the following would be used:

  <variable name="lat" shape="lat" type="float">
    <attribute name="units" type="String" value="degrees_north" />
    <values fromAttribute="@lat_range" />
  </variable>

  <variable name="lon" shape="lon" type="float">
    <attribute name="units" type="String" value="degrees_east" />
    <values fromAttribute="@lon_range" />
  </variable>

If the data values had instead been specified within the variable rh, for example,

        int rh(time, lat, lon) ;
                rh:long_name = "relative humidity" ;
                rh:units = "percent" ;
                rh:lat_range = "41.0 40.0 39.0" ;
                rh:lon_range = "-109.0 -107.0 -105.0 -103.0" ;

then the proper format would be varName@attName or:

  <variable name="lat" shape="lat" type="float">
    <attribute name="units" type="String" value="degrees_north" />
    <values fromAttribute="rh@lat_range" />
  </variable>

  <variable name="lon" shape="lon" type="float">
    <attribute name="units" type="String" value="degrees_east" />
    <values fromAttribute="rh@lon_range" />
  </variable>

7.7.7. Removing Objects

Schema Overview

The remove element is used to remove attribute, dimension, variable or group objects in the referenced dataset. The XML schema for this is:

 <xsd:element name="remove">
    <xsd:complexType>
      <xsd:attribute name="name" type="xsd:string" use="required"/>
      <xsd:attribute name="type" type="ObjectType" use="required"/>
    </xsd:complexType>
  </xsd:element>

 <xsd:simpleType name="ObjectType">
   <xsd:restriction base="xsd:string">
     <xsd:enumeration value="attribute"/>
     <xsd:enumeration value="dimension"/>
     <xsd:enumeration value="variable"/>
     <xsd:enumeration value="group"/>
   </xsd:restriction>
 </xsd:simpleType>
Examples

The remove element is placed in the container of the object to be removed, and both the name and type attributes are required. Examples are:

<remove name="T" type="variable" />

<variable name="T" type="double">
  <remove name="long_name" type=attribute" />
</variable>

<remove name="time" type="dimension" />

7.7.8. Logical View Elements

Schema Overview
 <!-- logical view: use only a section of original  -->
 <xsd:element name="logicalSection">
   <xsd:complexType>
     <xsd:attribute name="section" type="xsd:token" use="required"/>  <!-- creates anonymous dimensions -->
   </xsd:complexType>
 </xsd:element>

 <xsd:element name="logicalSlice">
   <xsd:complexType>
     <xsd:attribute name="dimName" type="xsd:token" use="required"/>
     <xsd:attribute name="index" type="xsd:int" use="required"/>
   </xsd:complexType>
 </xsd:element>

 <xsd:element name="logicalReduce">
   <xsd:complexType>
     <xsd:attribute name="dimNames" type="xsd:string" use="required"/>
   </xsd:complexType>
 </xsd:element>
Examples

Suppose the original variable has extraneous dimensions "latitude" and "longitude":

<dimension name="time" length="143" />
<dimension name="pressure" length="63" />
<dimension name="latitude" length="1" />
<dimension name="longitude" length="1" />

<variable name="temperature" shape="time pressure latitude longitude" type="float">
  <attribute name="long_name" value="Sea Temperature" />
  <attribute name="units" value="Celsius" />
</variable>

They can be removed with:

<variable name="temperature">
  <logicalReduce dimNames="latitude longitude" />
</variable>
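
The logicalSection and logicalSlice elements are used in the same way. The following sketches are not part of the original example; they assume the section attribute takes a comma-separated list of index ranges (start:end, or a single index) in dimension order, and that logicalSlice fixes the named dimension at the given index:

<variable name="temperature">
  <logicalSection section="0:142,0:31,0,0" />
</variable>

<variable name="temperature">
  <logicalSlice dimName="pressure" index="0" />
</variable>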

7.7.9. Data Types

The allowed data types are listed in the NcML schema as:

<xsd:simpleType name="DataType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="byte"/>
      <xsd:enumeration value="char"/>
      <xsd:enumeration value="short"/>
      <xsd:enumeration value="int"/>
      <xsd:enumeration value="long"/>
      <xsd:enumeration value="float"/>
      <xsd:enumeration value="double"/>
      <xsd:enumeration value="String"/>
      <xsd:enumeration value="string"/>
      <xsd:enumeration value="Structure"/>
      <xsd:enumeration value="Sequence"/>
      <xsd:enumeration value="opaque"/>
      <xsd:enumeration value="enum1"/>
      <xsd:enumeration value="enum2"/>
      <xsd:enumeration value="enum4"/>
    </xsd:restriction>
  </xsd:simpleType>

where:

  • Unsigned integer types (byte, short, int) are indicated with an _Unsigned="true" attribute on the variable, as sketched after this list.

  • A variable with type enum1, enum2 or enum4 will refer to an enumTypedef object.
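
As a minimal sketch of the first point - the variable and dimension names here are made up for illustration - an unsigned byte variable would be declared as:

<variable name="counts" shape="time" type="byte">
  <attribute name="_Unsigned" value="true" />
</variable>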

7.8. Using NcML to Create Aggregations, or, the aggregation Element

7.8.1. Schema Overview

The schema is:

<xsd:element name="aggregation">
  <xsd:complexType>
    <xsd:sequence>
(1)  <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="group"/>
      <xsd:element ref="dimension"/>
      <xsd:element ref="variable"/>
      <xsd:element ref="attribute"/>
      <xsd:element ref="remove"/>
     </xsd:choice>

(2)  <xsd:element name="variableAgg" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
       <xsd:attribute name="name" type="xsd:string" use="required"/>
      </xsd:complexType>
     </xsd:element>
(3) <xsd:element ref="promoteGlobalAttribute" minOccurs="0" maxOccurs="unbounded"/>
(4)  <xsd:element ref="cacheVariable" minOccurs="0" maxOccurs="unbounded"/>
(5)  <xsd:element ref="netcdf" minOccurs="0" maxOccurs="unbounded"/>
(6)  <xsd:element name="scan" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
(7)    <xsd:attribute name="location" type="xsd:string" use="required"/>
(8)    <xsd:attribute name="regExp" type="xsd:string" />
(9)    <xsd:attribute name="suffix" type="xsd:string" />
(10)   <xsd:attribute name="subdirs" type="xsd:boolean" default="true"/>
(11)   <xsd:attribute name="olderThan" type="xsd:string" />
(12)   <xsd:attribute name="dateFormatMark" type="xsd:string" />
(13)   <xsd:attribute name="enhance" type="xsd:string"/>
      </xsd:complexType>
     </xsd:element>

(14) <xsd:element name="scanFmrc" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
(7)    <xsd:attribute name="location" type="xsd:string"
(8)    <xsd:attribute name="regExp" type="xsd:string" />use="required"/>
(9)    <xsd:attribute name="suffix" type="xsd:string" />
(10)   <xsd:attribute name="subdirs" type="xsd:boolean" default="true"/>
(11)   <xsd:attribute name="olderThan" type="xsd:string" />
(12)   <xsd:attribute name="dateFormatMark" type="xsd:string" />
(13)
(14)

(15)   <xsd:attribute name="runDateMatcher" type="xsd:string" />
    <xsd:attribute name="forecastDateMatcher" type="xsd:string" />
    <xsd:attribute name="forecastOffsetMatcher" type="xsd:string" />
      </xsd:complexType>
     </xsd:element>
    </xsd:sequence>

(16) <xsd:attribute name="type" type="AggregationType" use="required"/>
(17) <xsd:attribute name="dimName" type="xsd:token" />
(18) <xsd:attribute name="recheckEvery" type="xsd:string" />
(19) <xsd:attribute name="timeUnitsChange" type="xsd:boolean"/>


      <!-- fmrc only  -->
(20) <xsd:attribute name="fmrcDefinition" type="xsd:string" />

</xsd:complexType>
</xsd:element>

where:

  • group, dimension, variable, attribute, remove - When inside the aggregation, these elements get applied to each dataset. When outside, they are applied to the aggregation.

  • variableAgg - Each variable to be aggregated via joinNew must be explicitly listed in one of these elements.

  • promoteGlobalAttribute - This element can be used - albeit only in outer aggregations - to specify global attributes to promote to a variable (a brief sketch follows this list).

  • cacheVariable - In an outer aggregation, this element can be used to specify which variables should be cached.

  • netcdf - This element can be used to explicitly list nested NetCDF datasets.

  • scan - This element can be used to implicitly specify nested NetCDF datasets.

  • location - This attribute is used to specify the location of the directory to be scanned.

  • regExp - This attribute can be used to employ a regular expression to limit the number of files scanned.

  • suffix - This attribute can be used to limit the number of files scanned by their suffix, e.g. .nc or .grib.

  • subdirs - This attribute can be used to specify whether a scan should descend into subdirectories, and has a default value of true.

  • olderThan - This attribute can be used to scan only those files whose last modified date is older than a specified amount of time. This provides a mechanism for excluding files that are still being written, and must be a udunits time such as 5 min or 1 hour.

  • dateFormatMark - An attribute used with joinNew aggregations to create data coordinate values out of filenames.

  • runDateMatcher, forecastDateMatcher, forecastOffsetMatcher - For scanFmrc aggregations, a run date and forecast date are extracted from the file pathname using runDateMatcher and either forecastDateMatcher or forecastOffsetMatcher.

  • type - The aggregation type must be specified via this attribute; the allowed values are given by the AggregationType enumeration described in the next subsection.

  • dimName - This attribute must be specified for all aggregation types except joinUnion.

  • recheckEvery - This attribute applies only when using a scan element.

  • timeUnitsChange - This attribute applies only to joinExisting and forecastModelRunCollection types. If set to true, the units of the joined coordinate variable may change.
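
As a brief sketch of promoteGlobalAttribute - the global attribute name time_coverage_start, the new variable name start_times, and the other names below are chosen purely for illustration - a global attribute from each scanned file can be promoted to a variable on the aggregation dimension like this:

<aggregation dimName="time" type="joinNew">
  <variableAgg name="T"/>
  <promoteGlobalAttribute name="start_times" orgName="time_coverage_start" />
  <scan location="/data/ocean/" suffix=".nc" />
</aggregation>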

7.8.2. Aggregation Types, or the aggregationType Attribute

The schema for the aggregationType attribute is:

 <!-- type of aggregation -->
 <xsd:simpleType name="AggregationType">
  <xsd:restriction base="xsd:string">
   <xsd:enumeration value="forecastModelRunCollection"/>
   <xsd:enumeration value="forecastModelRunSingleCollection"/>
   <xsd:enumeration value="joinExisting"/>
   <xsd:enumeration value="joinNew"/>
   <xsd:enumeration value="tiled"/>
   <xsd:enumeration value="union"/>
  </xsd:restriction>
 </xsd:simpleType>

The available aggregation types are thus forecastModelRunCollection, forecastModelRunSingleCollection, joinExisting, joinNew, tiled and union. They will each be described with examples in the ensuing sections.

Aggregation Using forecastModelRunCollection

If the forecast model output is spread out over multiple files, the forecastModelRunCollection aggregation type can be used to create an FMRC dataset.

Forecast Model Runs in Single Files

In the case where all the data for each forecast model run is contained within a single file, a new, outer dimension is created, and each file becomes one slice of the new dataset. An example is:

   <?xml version="1.0" encoding="UTF-8"?>
   <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="true" >
     <aggregation dimName="runtime" type="forecastModelRunCollection">
       <netcdf location="file:/data/ldm/NAM_CONUS_80km/Run_20060910_0000.grib1" coordValue="2006-09-10T00:00:00Z" enhance="true" />
       <netcdf location="file:/data/ldm/NAM_CONUS_80km/Run_20060910_0600.grib1" coordValue="2006-09-10T06:00:00Z" enhance="true" />
       <netcdf location="file:/data/ldm/NAM_CONUS_80km/Run_20060910_1200.grib1" coordValue="2006-09-10T12:00:00Z" enhance="true" />
     </aggregation>
   </netcdf>

where:

  • The netcdf element includes the attribute enhance with value true to add the coordinate systems needed to identify the forecast time coordinate.

  • A forecastModelRunCollection aggregation is declared, and an outer dimension called runtime will be added via the dimName attribute.

  • All the files in the collection are explicitly named via location, and their runtime coordinate values specified via coordValue. The coordinate values must be ISO 8601 formatted dates, and the files must contain all the output times for a single model run.

While this works fine for a few files, it can get tedious if there are dozens of files containing full single model runs. In that case, a scan element can be used as in the following example:

 <?xml version="1.0" encoding="UTF-8"?>
   <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="true" >
     <aggregation dimName="run" type="forecastModelRunCollection">
       <scan location="/data/ldm/NAM_CONUS_80km/" suffix=".grib1" dateFormatMark="Run_#yyyyMMdd_HHmm" enhance="true" />
     </aggregation>
   </netcdf>

where:

  • All the files in the directory /data/ldm/NAM_CONUS_80km/ that end in .grib1 will be aggregated.

  • The run time coordinate values will be extracted from the filename using the dateFormatMark attribute value Run_#yyyyMMdd_HHmm.

Both of these examples will create a runtime dimension and coordinate, and the time coordinate is expanded into a 2-D coordinate as required for an FMRC dataset. The NetCDF metadata will look like:

double time(run=3, time=11);
     :units = "hours since 2006-09-10T00:00:00Z";
     :long_name = "Coordinate variable for time dimension";
     :standard_name = "time";
     :_CoordinateAxisType = "Time";

If the time coordinates in each of the files do not have the same units, then the time values must be read in and adjusted to have a common unit. This would be required if, for example, some of the files had the time units in the example - hours since 2006-09-10T00:00:00Z - while others had the time units seconds since 1970-01-01T00:00:00Z. This is done by adding the timeUnitsChange attribute to the aggregation element, e.g.

    <aggregation dimName="run" type="forecastModelRunCollection" timeUnitsChange="true">

Forecast Model Runs in Multiple Files

If the data for each forecast model run is in multiple files, then nested aggregations must be used. An inner aggregation is used to join together the files that make one run, and an outer aggregation to make the runs into an FMRC dataset. The following example shows a single forecastModelRunCollection outer aggregation along with three variations on how to perform the inner aggregation, one each for successive run time coordinate values of:

2006-09-10T00:00:00Z
2006-09-10T06:00:00Z
2006-09-10T12:00:00Z
<?xml version="1.0" encoding="UTF-8"?>

 <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="true">
(1)<aggregation dimName="run" type="forecastModelRunCollection">

(2) <netcdf coordValue="2006-09-10T00:00:00Z">
      <aggregation dimName="Time" type="joinExisting">
        <netcdf location="file:/data/ldm/NAM_CONUS_80km/Run_20060910_0000/Hour_F00.grib1" coordValue="0"/>
        <netcdf location="file:/data/ldm/NAM_CONUS_80km/Run_20060910_0000/Hour_F03.grib1" coordValue="3"/>
        <netcdf location="file:/data/ldm/NAM_CONUS_80km/Run_20060910_0000/Hour_F06.grib1" coordValue="6"/>
      </aggregation>
(2) </netcdf>

(3) <netcdf coordValue="2006-09-10T06:00:00Z">
      <aggregation dimName="Time" type="joinExisting">
        <scan location="/data/ldm/NAM_CONUS_80km/Run_20060910_0600/" suffix=".grib1" />
      </aggregation>
(3) </netcdf>

(4) <netcdf coordValue="2006-09-10T12:00:00Z">
      <variable name="Time" shape="Time" type="int">
            <attribute name="long_name" value="Forecast Time"/>
        <attribute name="units" value="hours since 2006-09-10T12:00:00Z"/>
        <attribute name="_CoordinateAxisType" value="Time"/>
        <values start="0" increment="1"/>
      </variable>
      <aggregation dimName="Time" type="joinExisting">
        <scan location="/data/ldm/NAM_CONUS_80km/Run_20060910_1200/" suffix=".grib1" />
      </aggregation>
(4) </netcdf>

(1)</aggregation>
 </netcdf>

where:

  • (1) is the outer forecastModelRunCollection aggregation that will consist of nested datasets.

  • (2) is the first variation of the inner aggregation with a run time coordinate value of 2006-09-10T00:00:00Z, a joinExisting aggregation is used on the existing Time dimension, and each file is explicitly listed along with its coordinate value for the forecast time. Since only one value is listed, the files must have only one forecast time coordinate.

  • (3) is the second variation of the inner aggregation with a run time coordinate of 2006-09-10T06:00:00Z, and all files in the directory /data/ldm/NAM_CONUS_80km/Run_20060910_0600/ that end in .grib1 will be aggregated via a joinExisting aggregation.

  • (4) is the third variation of the inner aggregation with a run time coordinate of 2006-09-10T12:00:00Z. The coordinate variable Time for the aggregation dimension is defined and given the attributes long_name, units and _CoordinateAxisType with respective values Forecast Time, hours since 2006-09-10T12:00:00Z and Time. Its values start at 0 and are one hour apart (start="0" increment="1"), and the datasets found by the scan will be sorted alphanumerically. The joinExisting aggregation type is again used to scan for files.

Aggregation Using forecastModelRunSingleCollection

In the special FMRC case where the data has a single time step in each file, and the runtime and forecast time can be ascertained from each file’s pathname, the forecastModelRunSingleCollection aggregation type can be used along with a special form of the scan element called scanFmrc. An example is:

 <?xml version="1.0" encoding="UTF-8"?>
 <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="run" type="forecastModelRunSingleCollection" timeUnitsChange="true" >
      <scanFmrc location="/data/grib/rtmodels/" regExp=".*_nmm\.GrbF[0-9]{5}$" runDateMatcher="yyMMddHH#_nmm.GrbF#"
           forecastOffsetMatcher="#_nmm.GrbF#HHH"/>
   </aggregation>
 </netcdf>

where:

  • The aggregation is declared as type forecastModelRunSingleCollection with variable forecast time units, i.e. a timeUnitsChange value of true.

  • The files in the directory /data/grib/rtmodels/ whose full pathnames match the regular expression .*_nmm\.GrbF[0-9]{5}$ - specified via the regExp attribute - will be included in the aggregation.

  • The files are grouped by run date by extracting that date from the filename using the runDateMatcher attribute and the pattern matching value yyMMddHH#_nmm.GrbF#.

  • Within each run, the forecast offset will be extracted using the forecastOffsetMatcher attribute with specified pattern #_nmm.GrbF#HHH.

The files in this example have names like:

06091212_nmm.GrbF03000

and the regExp ensures that only files that have a literal _nmm.GrbF in the name followed by exactly five numerical digits are included, with the digits matched via the pattern matching string

[0-9]{5}$

where the $ indicates the end of the filename. The runDateMatcher value matches the literal _nmm.GrbF in the file’s full pathname, and then applies the Simple Date Format string yyMMddHH to the eight characters that come before the match to derive the run date time coordinate. This would be the string 06091212 in the example filename. Similarly, the forecastOffsetMatcher value matches the same literal _nmm.GrbF in the pathname, and then turns the characters immediately following the match - three of them, via HHH - into a double giving the hour offset from the run date, e.g. 030 in the example filename.

Another example of the scanFmrc section is:

<scanFmrc location="C:/data/rap/" suffix=".nc" subdirs="true"
              runDateMatcher="yyyyMMddHH#/wrfout_d01_#"
              forecastDateMatcher="#/wrfout_d01_#yyyy-MM-dd_HHmm"/>

where the files in the directory C:/data/rap/ and its subdirectories - because of subdirs="true" - that end in .nc are scanned. An example filename for this example is:

C:/data/rap/2006070611/wrfout_d01_2006-07-06_080000.DPG_F.nc

The runDateMatcher matches the literal /wrfout_d01_ in the file’s pathname, and then applies the Simple Date Format string yyyyMMddHH to the 10 characters before the match - in this case 2006070611 - to derive the run date coordinate. The forecastDateMatcher matches the literal /wrfout_d01_ and then applies the Simple Date Format string yyyy-MM-dd_HHmm to the 15 characters after the match to derive the forecast time coordinate. In this example the - (dash) and _ (underscore) characters are literals, while y, M, d, H and m are special characters that match, respectively, year, month, day, hour and minute numbers.

A variable that is also a coordinate will not be promoted to use the runtime dimension unless you explicitly enable it to do so via variableAgg. An example is:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="runtime" type="forecastModelRunCollection" recheckEvery="10min" timeUnitsChange="true">
    <variableAgg name="my_vertical_coord"/>
   <scan location="test" dateFormatMark="ncom_glb_reg7_#yyyyMMdd" subdirs="false"/>
 </aggregation>
</netcdf>

where my_vertical_coord is the variable that is also a coordinate.

Aggregation Using joinExisting

This example will use two NetCDF files jan.nc and feb.nc, with the header metadata for the former being:

netcdf jan {
dimensions:
        lat = 3 ;
        lon = 4 ;
        time = 31 ;
variables:
        double P(time, lat, lon) ;
        double T(time, lat, lon) ;
        float lat(lat) ;
        float lon(lon) ;
        int time(time) ;
  ...

and for the latter:

netcdf feb {
dimensions:
        lat = 3 ;
        lon = 4 ;
        time = 28 ;
variables:
        double P(time, lat, lon) ;
        double T(time, lat, lon) ;
        float lat(lat) ;
        float lon(lon) ;
        int time(time) ;
  ...

These are NetCDF files containing daily values of the variables P and T for January and February, and although the time values differ in each file, the spatial variables, dimensions and attributes are identical in each file.

A netcdf element containing NcML commands to perform the aggregation on an existing dimension - in this case time - is:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="jan.nc" />
    <netcdf location="feb.nc" />
  </aggregation>
</netcdf>

The header metadata resulting from the aggregation will look like:

netcdf aggExisting.xml {
   dimensions:
     lat = 3;
     lon = 4;
     time = 59;
   variables:
     double P(time=59, lat=3, lon=4);
     double T(time=59, lat=3, lon=4);
     float lat(lat=3);
     float lon(lon=4);
     int time(time=59);
 }

The variables P, T and time are the aggregation variables since the aggregation dimension time is their first or outermost dimension. The contents of jan.nc and feb.nc are logically concatenated along this dimension in the order the datasets are listed within the netcdf element. The first 31 time indices are taken from jan.nc and the next 28 from feb.nc to produce the time dimension of 59 in the virtual aggregated dataset.

Specifying the Number of Coordinates Along the Outer Dimension

When THREDDS processes the NcML commands to create an aggregate, virtual file, it has to open and read through all the datasets to find the length of the outer - in this case time - dimension needed for the aggregate file. While this isn’t a problem with just a couple of files, if there are dozens or hundreds of files they would all have to be opened and processed, which could push the memory limitations of any machine and significantly slow down processing time. A netcdf attribute ncoords has been created to alleviate this possible problem. Its use in the immediately previous example would look like this:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="jan.nc" ncoords="31"/>
    <netcdf location="feb.nc" ncoords="28"/>
  </aggregation>
</netcdf>

In this case, only the first file needs to be opened immediately, and the others as needed for a data read request.

Scanning with JoinExisting

In our previous examples for creating THREDDS catalogs without the use of NcML, we progressed from specifying individual datasets ourselves to letting the software scan directories to create the datasets automatically. Indeed, if we are attempting to serve hundreds of separate actual files then letting the software do most or all of the tedious work is really the only sensible way to proceed.

Fortunately, NcML has the capability to create an aggregation by scanning a directory in a similar manner to how datasetScan can scan a directory to create a configuration catalog. The difference is that the former will create a single, aggregate, virtual file that will make available the data from all of the individual files in the directory, while the latter will create a catalog listing of all those individual files.

A simple example of NcML magic that scans all of the files in the directory /data/ocean - as well as all of the subdirectories therein - that end in .nc is:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <scan location="/data/ocean/" suffix=".nc" />
  </aggregation>
</netcdf>

We see that the scanning is directed by information contained within the scan element. The directory to scan is indicated by the location attribute, and the files to select and use by the suffix attribute.

This example uses the joinExisting method to concatenate a series of files in the indicated directory, which are probably a set of daily, weekly or monthly files over a year that we desire to make available as a single aggregate dataset for the whole year.

Defining Coordinates on a JoinExisting Aggregation

Puzzling.
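
One relevant piece can at least be sketched here: coordinate values for a joinExisting aggregation may be supplied directly on each nested netcdf element via the coordValue attribute, using a comma-separated list when a file contains more than one coordinate along the aggregation dimension. The following is a minimal sketch - the filenames and day-number values are made up purely for illustration:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="days1-3.nc" coordValue="1,2,3" />
    <netcdf location="days4-6.nc" coordValue="4,5,6" />
  </aggregation>
</netcdf>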

Aggregation Using joinNew

The NetCDF files time0.nc and time1.nc will be used for this example. The header data for the former is:

netcdf time0.nc {
 dimensions:
   lat = 3;
   lon = 4;
 variables:
   double T(lat=3, lon=4);
   float lat(lat=3);
   float lon(lon=4);
}

and for the latter is:

netcdf time1.nc {
 dimensions:
   lat = 3;
   lon = 4;
 variables:
   double T(lat=3, lon=4);
   float lat(lat=3);
   float lon(lon=4);
}

A netcdf element containing NcML commands to join these files along a new dimension time is:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
   <aggregation dimName="time" type="joinNew">
     <variableAgg name="T"/>
     <netcdf location="time0.nc" coordValue="0"/>
     <netcdf location="time1.nc" coordValue="10"/>
   </aggregation>
</netcdf>

The dimName attribute defines the new dimension and coordinate variable time, and the variableAgg element specifies that the variable T will be made into an aggregation variable. Were there also a variable P in the NetCDF files, this and any number of other variables could also be made into aggregation variables in a similar manner. The NetCDF files are listed in the desired order, and each one has a coordinate value assigned to it via the coordValue attribute, whose type must be compatible with the coordinate variable being created.

The following virtual dataset header is created:

netcdf aggNew.ncml {
 dimensions:
   lat = 3;
   lon = 4;
   time = 2;
 variables:
   float lat(lat=3);
   float lon(lon=4);
   int time(time=2);
   double T(time=2, lat=3, lon=4);
 data:
    time = {0, 10}
 }

The following changes have been made as compared to the files comprising this aggregate file:

  • A time dimension has been created with a value of 2.

  • A time variable has been created.

  • The T variable has a new leading dimension time.

  • The time values specified via the coordValue attributes are listed in a data section of the aggregate file.

Defining Coordinates on a JoinNew Aggregation

The time variable created in our example has no units or other attributes. All we know is that the time variable has the values 0 and 10 with unknown units. For the joinNew aggregation procedure to be at all useful, there must be some procedure for defining time units in a manner compatible with the NetCDF files being used. There is such a procedure, and the following netcdf element shows an example of its use.

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">

   <variable name="time" type="int" >
     <attribute name="units" value="months since 2000-6-16 6:00"/>
     <attribute name="_CoordinateAxisType" value="Time" />
     <values>0 1</values>
   </variable>

   <aggregation dimName="time" type="joinNew">
   <variableAgg name="T"/>
    <netcdf location="time0.nc" />
    <netcdf location="time1.nc" />
   </aggregation>

</netcdf>

This defines a coordinate variable time with type attribute value int. There are also a couple of attribute elements that contain name attributes units and _CoordinateAxisType with respective values months since 2000-6-16 6:00 and Time. We have also specified the values 0 and 1 for the values of the two time coordinates, with the absolute time values then being - given the units specification above - 2000-6-16 6:00 for the first file and 2000-7-16 6:00 for the second file.

The coordinate values could also be assigned via coordValue attributes as in the previous example, for example:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">

   <variable name="time" type="int" >
     <attribute name="units" value="months since 2000-6-16 6:00"/>
     <attribute name="_CoordinateAxisType" value="Time" />
   </variable>

   <aggregation dimName="time" type="joinNew">
   <variableAgg name="T"/>
    <netcdf location="time0.nc" coordValue="0"/>
    <netcdf location="time1.nc" coordValue="1"/>
   </aggregation>

</netcdf>

The coordinate variable must be explicitly defined as shown above in order to assign attributes to it, and since some attributes are mandatory under the CF Conventions we must do the former if we expect to follow the latter.

The details of specifying a time coordinate in a manner appropriate for use in a NetCDF file is covered in the NetCDF Climate and Forecast (CF) Metadata Conventions manual at:

The details specific to this example can be found in the Time Coordinate Section at:

It is worth the time to become familiar with the time coordinate specification recommendations found therein, since similar conventions are used throughout the geosciences.
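
For reference, a time coordinate that follows those recommendations typically looks something like the following CDL fragment - a sketch consistent with the units used in this example rather than something taken from the example files:

        int time(time) ;
                time:standard_name = "time" ;
                time:long_name = "time" ;
                time:units = "months since 2000-6-16 6:00" ;
                time:calendar = "gregorian" ;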

Scanning with JoinNew

In our previous examples for creating THREDDS catalogs without the use of NcML, we progressed from specifying individual datasets ourselves to letting the software scan directories to create the datasets automatically. Indeed, if we are attempting to serve hundreds of separate actual files then letting the software do most or all of the tedious work is really the only sensible way to proceed.

Fortunately, NcML has the capability to create an aggregation by scanning a directory in a similar manner to how datasetScan can scan a directory to create a configuration catalog. The difference is that the former will create a single, aggregate, virtual file that will make available the data from all of the individual files in the directory, while the latter will create a catalog listing of all those individual files.

In this example of scanning with joinNew, the directory /data/ocean holds NetCDF files containing, say, hourly ocean surface temperature fields for a longer period such as a month or even a year. But when we originally created these files from a model simulation we unwisely didn’t include a time coordinate within the NetCDF file, thinking that our system of naming the files, for example, 2002010101.nc in a YYYYMMDDHH format would suffice for the long term. NcML can save us from our shortsightedness and create a virtual aggregation file showing the user the time information in the format we should have included in the first place.

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">

  <variable name="time" type="int" shape="time" >
    <attribute name="units" value="hours since 2000-01-01 00:00"/>
    <attribute name="_CoordinateAxisType" value="Time" />
    <values start="0" increment="1" />
  </variable>

  <aggregation dimName="time" type="joinNew">
    <variableAgg name="T"/>
    <scan location="/data/ocean/" suffix=".nc" />
  </aggregation>

</netcdf>

Here the values are specified via start and increment attributes, i.e.

<values start="0" increment="1" />

which will give the first field the absolute time 2000-01-01 00:00, the second 2000-01-01 01:00, etc. We could also explicitly list all of the values via, e.g.

<values>0 1 2 3 4 5 6 7 8 9 10</values>

but if we have more than a few dozen files and their time coordinate is evenly spaced it is much more elegant and economical to use the start/increment method to do this.

Aggregation Using tiled

The only documentation for this aggregation type is found in the Java docs and in the mailing list. The Java docs are at:

and a mailing list thread can be found at:

A tiled aggregation example from the mailing list is presented below. See the entire threaded discussion for further enlightenment.

<dataset name="tiled" ID="tiled" urlPath="tiled/Agg.nc">
  <metadata inherited="true">
    <serviceName>all</serviceName>
    <dataType>Grid</dataType>
  </metadata>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2";>
    <aggregation dimName="latitude longitude" type="tiled">
      <netcdf location="file:P:/foo_26_04_2011.nc.000.000" section="0:23,0:22"/>
      <netcdf location="file:P:/foo_26_04_2011.nc.001.000" section="0:23,22:44"/>
      <netcdf location="file:P:/foo_26_04_2011.nc.001.001" section="23:46,22:44"/>
      <netcdf location="file:P:/foo_26_04_2011.nc.001.002" section="48:72,22:44"/>
    </aggregation>
  </netcdf>
</dataset>

The originator says that while the tiling works as expected, they were required to generate the section part when it should already be clear from inspecting the lat/lon variables. They were also required to name every individual tile specifically, and couldn’t use scans or regular expressions. The thread died out without a resolution to this issue.

Aggregation Using union

An example that employs two NetCDF files called tmp.nc and sal.nc. Running the ncdump command on each of these files shows the following header metadata for tmp.nc:

netcdf tmp.nc {
  dimensions:
   time = UNLIMITED;   // (456 currently)
   lat = 21;
   lon = 36;
 variables:
   float lat(lat=21);
   float lon(lon=36);
   double time(time=456);
   short cldc(time=456, lat=21, lon=36);
}

and the following header metadata for sal.nc:

netcdf sal.nc {
  dimensions:
   time = UNLIMITED;   // (456 currently)
   lat = 21;
   lon = 36;
 variables:
   float lat(lat=21);
   float lon(lon=36);
   double time(time=456);
   short lflx(time=456, lat=21, lon=36);
}

The following netcdf element contains the NcML commands for performing a union type aggregation:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <attribute name="title" type="string" value="Union tmp and sal"/>
  <aggregation type="union">
    <netcdf location="tmp.nc"/>
    <netcdf location="sal.nc"/>
  </aggregation>
</netcdf>

and will produce a virtual dataset with the metadata header:

netcdf aggUnionSimple.xml {
  dimensions:
     time = UNLIMITED;   // (456 currently)
     lat = 21;
     lon = 36;
  variables:
     float lat(lat=21);
     float lon(lon=36);
     double time(time=456);
     short cldc(time=456, lat=21, lon=36);
     short lflx(time=456, lat=21, lon=36);
}

Although the tmp.nc and sal.nc NetCDF files contain different data variables - cldc and lflx, respectively - their dimensions, coordinate variables (time and space) and attributes are identical, so they can be joined into a single virtual file containing both cldc and lflx.

7.8.3. Extracting Time Coordinates from the Filename

We usually create the filenames of a series of files containing model results over a long period of time with encoded date and time information. For instance, the file:

ocean-2011071206.nc

could be a file containing model results or data beginning on July 12, 2011 at 0600 hours GMT. The dateFormatMark attribute of the NcML scan element can be used to extract this date and time information from such filenames and turn it into time coordinate values. An example of how this is done is contained in the following netcdf element wherein the ocean-2011071206.nc file is accessed.

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting" recheckEvery="4 sec">
     <scan location="/ocean/data dateFormatMark="ocean-#yyyyMMDDHH" suffix=".nc" />
  </aggregation>
</netcdf>

In this example the scan element specifies that all files in the /ocean/data subdirectory are to be scanned and included in the aggregation. The files would be sorted alphabetically on the filename if the dateFormatMark attribute were not specified, but instead will be sorted by the date derived from the filename.

The dateFormatMark attribute value ocean-#yyyyMMddHH indicates that the characters before the # - here ocean- - are to be skipped, after which a Java SimpleDateFormat string is read to create a date in a standard format. In this example, the Java string yyyyMMddHH will be read and converted into a coordinate variable with the date format 2011-07-12T06:00:00Z within the virtual, aggregated file being created.

A brief summary of the most commonly used SimpleDateFormat pattern letters follows; the Java SimpleDateFormat documentation describes the full set.
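
  • yyyy - four-digit year

  • MM - two-digit month of year

  • dd - two-digit day of month

  • HH - two-digit hour of day (00-23)

  • mm - two-digit minute

  • ss - two-digit second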

7.8.4. Aggregating Dynamically Changing Sets of Files

Special attention must be given to using the scan element in the case where the set of files being scanned changes as new files are added and/or old ones deleted. When you create an aggregate dataset by requesting it via its URL, the aggregation will not change as long as the NetcdfDataset object exists. That is, if you create the aggregate dataset and a file is then deleted from the underlying set of files, you will get an error message if you try to access the portion of the aggregate dataset containing the contents of the deleted file. You can remedy this by recreating the aggregation via reloading the page, or by specifying an additional attribute on the aggregation element. It’s always more elegant and usually less confusing to let the machine automatically handle such things.

This process can be automated via the use of the recheckEvery attribute as in the following example:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinNew" recheckEvery="15 min" >
    <variableAgg name="T"/>
    <scan location="/data/ocean/" suffix=".nc" />
  </aggregation>
</netcdf>

In this case, the files located at /data/ocean are rescanned and the aggregate dataset recreated every 15 minutes. The time unit used in the recheckEvery attribute value must be a udunits unit such as sec, min, hour, day, etc.

7.8.5. Aggregation Caching

If you have a very large collection of individual files comprising your aggregation dataset, you’d probably like to avoid having your machine open every single file underlying the dataset every time it is accessed. It would be much more elegant and efficient to have to open only the subset of underlying files needed to fulfill a data request from the aggregated dataset. This can be done via the time-honored and heavily-used internet trick of caching our data, which for us is done by enabling Aggregation Caching. Specifically, this is done by calling the setPersistenceCache method of the Java ucar.nc2.ncml.Aggregation class contained within the distribution, with the call being something of the form:

Aggregation.setPersistenceCache( new DiskCache2("/.unidata/aggCache", true, 60 * 24 * 30, 60));

When this is added and enabled, all joinExisting aggregations will save information to special XML files within the specified directory. These files will allow the server to avoid opening every file to obtain coordinate values each time the dataset is opened. The first time it is opened the values are read and stored or cached for subsequent openings.
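
If you are running the TDS rather than writing your own NetCDF-Java application, roughly the same effect is obtained by configuring the aggregation cache in the server’s threddsConfig.xml file instead of calling the Java method yourself. The following is a sketch, assuming a TDS version that supports the AggregationCache element, with the scour and maxAge values chosen purely for illustration:

<AggregationCache>
  <scour>24 hours</scour>
  <maxAge>90 days</maxAge>
</AggregationCache>

Here scour controls how often the cache directory is cleaned up and maxAge how long cached coordinate information is retained.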

7.8.6. Nested Aggregation

The netcdf elements can be nested in aggregation elements, with an example being:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
 <aggregation dimName="time" type="joinExisting">
   <netcdf>
   <aggregation type="union">
    <netcdf location="/ocean/data/temperature_20080101.nc" />
    <netcdf location="/ocean/data/salinity_20080101.nc" />
   </aggregation>
  </netcdf>
    <netcdf>
   <aggregation type="union">
    <netcdf location="/ocean/data/temperature_20080102.nc" />
    <netcdf location="/ocean/data/salinity_20080102.nc" />
   </aggregation>
  </netcdf>
 </aggregation>
</netcdf>

In this example, the inner aggregations first join the files containing snapshot temperature and salinity fields with identical spatial and temporal variables and dimensions to create single aggregate files containing both at the single, identical time. The outer aggregation then joins all of these individual time files into a single, aggregated virtual file containing however many time fields there are in the files at the location being scanned, in this case two of them.

7.9. Debugging NcML

It’s just not easy to figure out why your NcML doesn’t work when it doesn’t work. You can look at the TDS server logs for hours and come away with nothing more than a headache from attempting to decipher Java error messages. The key to debugging NcML - as is detailed in the NetCDF-Java FAQ at

is to use the ToolsUI GUI interface to the NetCDF-Java library. The general procedure follows.

  • Extract the NcML that’s not working from the TDS catalog XML file that it’s in via the following steps:

    • Find out which dataset is causing the problem. This will correspond to a dataset element somewhere in the configuration file. Inside that will be a netcdf element that contains the NcML code. Extract the entire netcdf container element - for example, by cutting and pasting - into a separate file we’ll call test.ncml (an .ncml or .xml suffix is mandatory).

    • Add the XML header <?xml version="1.0" encoding="UTF-8"?> to the top of the test.ncml file.

    • If a recheckEvery attribute is present, remove it.

    • Ensure that the referenced datasets are readily available. If you’re dealing with an aggregation of many files, just copy at least two of them into the same directory as the test.ncml file. Then use a scan element or explicitly list them in netcdf elements with the location being the relative path name.

  • Open test.ncml in the viewer tab of the ToolsUI program to check for NcML errors. This will show you directly what the modified dataset looks like. Now modify the test.ncml file and reopen it to see if you’ve fixed your problem. Repeat as necessary. (While it is possible to edit the file within ToolsUI, it is a primitive editor and almost certainly not as good as the external editor - vi, emacs, etc. - that you can already use blindfolded.)

  • If the dataset in question is a grid dataset, open it in the FeatureTypes/Grid tab to ensure that you can actually see a grid as a check for a complete coordinate system specification. If you don’t see what you expect - e.g. an obviously incorrect or missing grid - try using the CoordSys tab. If you’re still at a loss, try reading through the Coordinate Systems section of the CF Metadata Conventions Guide to familiarize yourself with what’s required and recommended. Then compare this to what you have in the dataset file you’re unsuccessfully attempting to read.

  • In the case of an aggregation, you can use the NcML/Aggregation tab to see if you’re aggregating what you think you’re aggregating.

  • In the case of an FMRC aggregation, the Fmrc/FmrcImpl tab will show you what’s being aggregated.

  • After you’ve found your problem and things are working right inside the ToolsUI GUI, put the netcdf element back into the XML catalog file from which you originally extracted it, making sure to put back the recheckEvery attribute if you previously removed it. Then restart the server.

  • Open the TDS catalog you just modified in the THREDDS tab of the ToolsUI GUI. Navigate to the dataset you just modified, and either open as file or open as dataset to see if you get the same results as in the previous remediation steps. If so, you’re good to go, although you might want to access this catalog via a web browser as a triple check to ensure that everything’s copacetic.

7.10. NcML Examples

7.10.2. Nested Aggregations and Attribute Modifications

A dataset consists of 196 files with names of the form ocean_his_**.nc. These files are the output of a single, continuous simulation from 2003-02-01 to 2012-01-01 for which the output has been broken into 196 separate files to keep each individual file to a reasonable size. We wish to use the aggregation capabilities of NcML to create a single, virtual dataset from these 196 files. A typical 4-D variable we wish to make available is u, for which the NetCDF metadata is:

        float u(ocean_time, s_rho, eta_u, xi_u) ;
                u:long_name = "u-momentum component" ;
                u:units = "meter second-1" ;
                u:time = "ocean_time" ;
                u:coordinates = "x_u y_u s_rho ocean_time" ;
                u:field = "u-velocity, scalar, series" ;
                u:_FillValue = 1.e+37f ;

The coordinates attribute is our first problem. The CF standard requires longitude and latitude rather than x and y values, so we must modify the coordinates attribute values to meet the requirements. This leads to our second problem. While the variables x_u and y_u are present in the simulation result files with the following metadata:

        double x_u(eta_u, xi_u) ;
                x_u:long_name = "x-locations of U-points" ;
                x_u:units = "meter" ;
                x_u:field = "x_u, scalar" ;
        double y_u(eta_u, xi_u) ;
                y_u:long_name = "y-locations of U-points" ;
                y_u:units = "meter" ;
                y_u:field = "y_u, scalar" ;

there are no longitude and latitude variables in the files. They exist, but they are contained in a separate file called grd.nc. The lon/lat variables corresponding to x_u and y_u are lon_u and lat_u, and their metadata in the separate grd.nc file is:

        double lon_u(eta_u, xi_u) ;
                lon_u:units = "meters" ;
        double lat_u(eta_u, xi_u) ;
                lat_u:units = "meters" ;

Note that the grd.nc file has its own problems such as an incorrect value for the units attribute and some missing attributes required by the CF standard.

We could simply append grd.nc to each of the 196 simulation result files, but that would take quite a while even with a program as fast and efficient as the ncrcat part of the NCO package. We will use the more elegant mechanism of virtually appending grd.nc to a virtual aggregation of all of the simulation result files. This will be done through the nested use of both the joinExisting and union aggregation modes.

The outer aggregation of the nesting pair will be the union aggregation, and will have the basic form of:

        <aggregation type="union">
          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
             <!-- aggregated simulation result files -->
          </netcdf>
          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
             <!-- grid file -->
          </netcdf>
        </aggregation>

The first part of the union aggregation will be a virtual dataset constructed via the following nested joinExisting aggregation:

          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
            <aggregation dimName="ocean_time" type="joinExisting">
              <scan location="/raid/data/txla_nesting6/"
                    regExp=".*ocean_his_[0-9]{4}\.nc$"/>
            </aggregation>
          </netcdf>

wherein all files in the filesystem location /raid/data/txla_nesting6/ of the form shown by the value of the regExp attribute will be aggregated by joining them along the ocean_time dimension.

The second part of the union aggregation will be the grd.nc file, which will be included via:

          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
                location="/raid/data/txla_nesting6/txla_grd_v4_new_lonlat.nc"/>

When these pieces are added to our skeletal union aggregation structure, we obtain the following nested aggregation structure wherein the simulation result files are first aggregated into a single virtual file, and then combined with the file containing the grid information.

      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation type="union">
          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
            <aggregation dimName="ocean_time" type="joinExisting">
              <scan location="/raid/data/txla_nesting6/"
                    regExp=".*ocean_his_[0-9]{4}\.nc$"/>
            </aggregation>
          </netcdf>
          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
                location="/raid/data/txla_nesting6/txla_grd_v4_new_lonlat.nc"/>
        </aggregation>
          <!--- further modifications -->
      </netcdf>

Now we must change the coordinates attributes of variables like u that have x_u and y_u instead of the lon_u and lat_u required by the CF conventions. This will be done by inserting the following NcML code:

        <variable name="u">
         <attribute name="coordinates" value="ocean_time s_rho lat_u lon_u"/>
        </variable>

into the further modifications section of our previous NcML code example. This will replace the improper coordinates attribute found in each of the actual simulation result files with the proper coordinates attribute in the single, virtual file.

This must be done for every variable in the ocean_his_**.nc files whose coordinates attribute values are x_* and y_* where the asterisk represents u, v, rho and psi. All the x_* and y_* values must be changed to lon_* and lat_*, respectively. Or, more accurately, this must be done for those variables you wish to aggregate and make available via NetCDF subsetting or DODS. If you only want to make u and v available, then you only need to make the appropriate changes for them. The other variables you don’t change simply won’t show up on the appropriate web interfaces for NetCDF subsetting and DODS.

We also need to add some attributes to the minimalist metadata in the grd.nc file. An example for lon_u would be:

        <variable name="lon_u">
          <attribute name="units" value="degrees_east"/>
          <attribute name="long_name" value="longitude of U-points"/>
          <attribute name="standard_name" value="longitude"/>
          <attribute name="field" value="lon_u, scalar"/>
        </variable>

If we desire to only make the u variable available via aggregation, then the following NcML will suffice. If we want to make other variables available, then additional NcML statements will have to be added in place of the ellipses shown in the example.

      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation type="union">
          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
            <aggregation dimName="ocean_time" type="joinExisting">
              <scan location="/raid/data/txla_nesting6/"
                    regExp=".*ocean_his_[0-9]{4}\.nc$"/>
            </aggregation>
          </netcdf>
          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
                location="/raid/data/txla_nesting6/txla_grd_v4_new_lonlat.nc"/>
        </aggregation>
         ...
        <variable name="u">
         <attribute name="coordinates" value="ocean_time s_rho lat_u lon_u"/>
        </variable>
         ....
        <variable name="lon_u">
          <attribute name="units" value="degrees_east"/>
          <attribute name="long_name" value="longitude of U-points"/>
          <attribute name="standard_name" value="longitude"/>
          <attribute name="field" value="lon_u, scalar"/>
        </variable>
        <variable name="lon_v">
          <attribute name="units" value="degrees_north"/>
          <attribute name="long_name" value="latitude of U-points"/>
          <attribute name="standard_name" value="latitude"/>
          <attribute name="field" value="lon_v, scalar"/>
        </variable>
         ...
      </netcdf>

In this example the following tasks were accomplished in the following order via NcML:

  1. The 196 actual simulation result files were aggregated into a single virtual file with the joinExisting aggregation type.

  2. The single virtual simulation result file was virtually combined with a file containing longitude and latitude information using the union aggregation type.

  3. Attributes for some variables were added or modified to meet CF standard requirements.

8. Feature Collections

8.2. Overview

A feature collection is a collection of CDM feature datasets. It is used to aggregate gridded and point datasets whose time and spatial coordinates are recognized by the CDM software stack. This allows the TDS to automatically create logical or virtual datasets composed of collections of files, and to allow subsets in the coordinate space to be selected via the WMS, WCS and NetCDF Subset services.

The featureCollection element allows you to serve collections of feature datasets. A FeatureDataset is a container for FeatureType objects. Feature types can be divided into two major categories - grid and point - with the grid feature types being grid, radial, swath and image, and the point feature types being point, time series (station), profile, station profile, trajectory and section (also called trajectory profile).

8.3. XML Schema for featureCollection

The XML schema for the featureCollection element is:

  <xsd:element name="featureCollection" substitutionGroup="dataset">
    <xsd:complexType>
      <xsd:complexContent>
        <xsd:extension base="DatasetType">
          <xsd:sequence>
            <xsd:element type="collectionType" name="collection"/>
            <xsd:element type="updateType" name="update" minOccurs="0"/>
            <xsd:element type="updateType" name="tdm" minOccurs="0"/>
            <xsd:element type="manageType" name="manage" minOccurs="0"/>
            <xsd:element type="protoDatasetType" name="protoDataset" minOccurs="0"/>
            <xsd:element type="fmrcConfigType" name="fmrcConfig" minOccurs="0"/>
            <xsd:element type="pointConfigType" name="pointConfig" minOccurs="0"/>
            <xsd:element type="gribConfigType" name="gribConfig" minOccurs="0"/>
            <xsd:element ref="ncml:netcdf" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="featureType" type="featureTypeChoice" use="required"/>
          <xsd:attribute name="path" type="xsd:string" use="required"/>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>
  </xsd:element>

and shows us that we have the following elements and attributes to employ:

  • collection - for defining the collection of datasets in the manner of an NcML aggregation element;

  • update - for updating the dataset when the underlying collection of files changes often;

  • manage - for deleting older files;

  • protoDataset - for choosing a prototype dataset for the collection;

  • fmrcConfig - for defining grid feature collection datasets;

  • pointConfig - for defining point feature collection datasets;

  • gribConfig - for working with GRIB collection datasets;

  • featureType - an attribute for defining which feature type is being used; and

  • path - an attribute for specifying the location of the files underlying the collection dataset.

Now we will present an example and then proceed to fully explain the capabilities and limitations of each available element and attribute for the featureCollection element.

8.4. A featureCollection Element Example

An example employing the featureCollection element is:

<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
     xmlns:xlink="http://www.w3.org/1999/xlink" name="Unidata THREDDS Data Server" version="1.0.3">
  <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/>

  <featureCollection featureType="FMRC" name="Atlantic Surface Velocities" harvest="true" path="fmrc/NCEP/GFS/Puerto_Rico_191km">
   <metadata inherited="true">
     <serviceName>ncdods</serviceName>
   </metadata>

   <collection spec="/data/testdata/2010TdsTW/fmrc/GFS_Puerto_Rico_191km.*grib1$"/>

  </featureCollection>

</catalog>

We will now review all of the subelements that exist within the featureCollection container element as shown in the XML schema above, as well as relate them to their use in our example when relevant.

8.5. The collection Element

The collection element defines the collection of datasets, and takes the place of an NcML aggregation element. The XML schema for this is:

  <xsd:complexType name="collectionType">
    <xsd:attribute name="spec" type="xsd:string" use="required"/>
    <xsd:attribute name="name" type="xsd:token"/>
    <xsd:attribute name="olderThan" type="xsd:string" />
    <xsd:attribute name="dateFormatMark" type="xsd:string"/>
    <xsd:attribute name="timePartition" type="xsd:string"/>
  </xsd:complexType>

and an example using the element is:

<collection spec="/data/ocean/uv/2003/07/atlantic_sfc_uv_#yyyyMMdd_HHmm#.nc$"
            name="Atlantic Surface Velocities" olderThan="1 min" olderThan="15 min" />

wherein we’re dealing with a directory /data/ocean/uv/2003/07/ with a set of files of the form:

...
atlantic_sfc_uv_20030712_1200.nc
atlantic_sfc_uv_20030712_1800.nc
atlantic_sfc_uv_20030713_0000.nc
....

that contain Atlantic Ocean surface velocity components starting on July 12, 2003 at 1200 hours UTC and continuing at 6 hour intervals thereafter.

8.5.1. The spec Attribute, or Specification String

The spec attribute holds the required collection specification string, which can be specified in many ways. This string creates a collection of files by scanning file directories and looking for matches. It can also optionally extract a date from a filename. Let’s start with a simple example, e.g.

<collection spec="/data/ocean/uv/2003/07/.*nc$"/>

where we’re matching all files in the directory /data/ocean/uv/2003/07/ that end with nc. The .*nc$ string is a regular expression that tries to match the path name after the directory path /data/ocean/uv/2003/07/, with the .* part matching any number of characters between that directory path and the nc$ part, which specifies that the file end with nc. While it is not necessary to use it to match this particular set of files, it is good practice to use the $ character to indicate that nc is the end of the matched string. This will ensure that you exclude, for example, files with names like atlantic_sfc_uv_20030712_1200.nc.old which is a fairly common practice for keeping at least one older version of a file around.

If you want to ensure that the file ends in .nc instead of just nc, you need to specify:

<collection spec="/data/ocean/uv/2003/07/.*\.nc$"/>

This is required since the period "." is a special character in regular expressions that matches any single character. If you want to match a literal period, you have to escape it by preceding it with a backslash, i.e. "\.".
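
You can check the effect of the anchoring $ and the escaped period with ordinary regular expression tools. The TDS itself performs this matching in Java, but the following short Python sketch - using a few hypothetical file names - shows which names each pattern accepts:

#!/usr/bin/python

import re

# Hypothetical file names: the one we want, a backup copy, and a file that
# happens to end in "nc" but not in ".nc".
names = ["atlantic_sfc_uv_20030712_1200.nc",
         "atlantic_sfc_uv_20030712_1200.nc.old",
         "atlantic_sfc_uv_20030713_0000_inc"]

for pattern in [r".*nc", r".*nc$", r".*\.nc$"]:
        matched = [n for n in names if re.match(pattern, n)]
        print pattern, "matches", matched

Running this shows that .*nc matches all three names, .*nc$ excludes the .nc.old backup, and .*\.nc$ additionally excludes the name that merely ends in nc.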

Suppose that in this example you also had subdirectories for other months and years, making for a directory structure that looks like this:

/data/ocean/uv/2001/01
/data/ocean/uv/2001/02
...
/data/ocean/uv/2001/12
/data/ocean/uv/2002/01
...
/data/ocean/uv/2010/12

If you wanted to extract all the files for 2003 you could use the following for the specification:

<collection spec="/data/ocean/uv/2003/**/.*\.nc$"/>

where you’ll match all files ending with .nc in the directory /data/ocean/uv/2003 and all of its subdirectories. This will serve to match the files in all of the twelve monthly subdirectories under 2003.

You could additionally extract a run date for each file via the specification:

<collection spec="/data/ocean/uv/2003/**/atlantic_sfc_uv_#yyyyMMdd_HHmm#\.nc$"/>

where, once again, all of the files for 2003 ending in .nc are matched, and a run date is found by applying the template yyyyMMdd_HHmm to the part of the filename following atlantic_sfc_uv_. For the file atlantic_sfc_uv_20030712_1800.nc this would extract July 12, 2003 at 1800 hours UTC.

Constructing a Specification String the Easy Way

The simplest way to construct a specification string is to start with an example full file path for one of the files and change it step by step. Here we’ll start with

/data/ocean/uv/2003/07/atlantic_sfc_uv_20030712_1800.nc

First, we modify it to include the subdirectories we want by replacing the 07 with ** to obtain:

/data/ocean/uv/2003/**/atlantic_sfc_uv_20030712_1800.nc

Next, use the # character to demarcate the part of the string where the run date is encoded to obtain:

/data/ocean/uv/2003/**/atlantic_sfc_uv_#20030712_1800#.nc

Next, substitute a SimpleDateFormat matching string for the actual date string to obtain:

/data/ocean/uv/2003/**/atlantic_sfc_uv_#yyyyMMdd_HHmm#.nc

Finally, escape the period and - following the good practice noted above - anchor the end of the name with $ to ensure that the filename ends in .nc, obtaining:

/data/ocean/uv/2003/**/atlantic_sfc_uv_#yyyyMMdd_HHmm#\.nc$
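
The yyyyMMdd_HHmm template inside the # marks is a Java SimpleDateFormat pattern, which the TDS applies to the characters of the filename between the # positions. If you want to check what run date a particular filename will yield, a rough Python equivalent - using strptime, whose %Y%m%d_%H%M directives correspond to yyyyMMdd_HHmm - is:

#!/usr/bin/python

from datetime import datetime

fname = "atlantic_sfc_uv_20030712_1800.nc"

# The characters of the filename that fall between the # marks in the
# specification string, i.e. those immediately following "atlantic_sfc_uv_".
prefix = "atlantic_sfc_uv_"
datestr = fname[len(prefix):len(prefix) + len("yyyyMMdd_HHmm")]

# strptime equivalent of the SimpleDateFormat template yyyyMMdd_HHmm.
rundate = datetime.strptime(datestr, "%Y%m%d_%H%M")
print rundate            # prints 2003-07-12 18:00:00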

Creating Collections from Catalogs or NcML

Warning: Very beta. No information found beyond the end of this page:

In addition to the collection specification string, you can also create collections using a catalog or NcML file. To create a collection from a catalog file, you use the catalog:catalogURL as the value for the spec attribute, e.g.

<collection spec="catalog:catalogURL"/>

This is done using the DatasetCollectionFromCatalog class documented at:

To create a collection from an NcML file, you pass the name of an NcML file, e.g.

<collection spec="atlantic_ocean_uv.ncml"/>

This is done using the DatasetCollectionFromNcml class, for which I can find no documentation.

8.5.2. The name Attribute

The name attribute is the collection name and - if specified - must be unique across all of your TDS catalogs. It should be a human-readable identifier for indexing, logging and debugging. If it is not specified, the spec string is used instead, which makes for unwieldy names, so the name should be specified even though it is not required.

8.5.3. The olderThan Attribute

The olderThan attribute is optional and indicates that only files whose last-modified date is older than this value are included. This serves to exclude large files that are still being written and may take a goodly amount of time to be fully written.

8.5.4. The recheckAfter Attribute

The recheckAfter attribute is optional and indicates that a new scan should take place whenever a request is received and this much time has passed since the last scan.

8.5.5. The dateFormatMark Attribute

The dateFormatMark attribute is used if dates are to be extracted from the full filesystem path of the files instead of just the file name.
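
Judging from the TDM examples later in this chapter (e.g. dateFormatMark="#cfsrr/#yyyyMM"), the portion between the # characters is a substring to locate in the full path, and the SimpleDateFormat template that follows is applied to the characters immediately after it, so a path containing cfsrr/201008/ would yield a date of August 2010.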

8.6. The update Element

The update element is used to specify if and how the dataset will be updated. An example is:

<update startup="true" rescan="0 5 3 * * ? *" trigger="allow"/>

The attributes for this are:

  • startup - to specify whether (true) the collection is updated upon TDS startup;

  • rescan - a cron expression for periodically updating the collection (the example above, "0 5 3 * * ? *", rescans at 3:05 am every day); and

  • trigger - when set to allow, permits the collection to be updated by an external trigger, e.g. a message from the THREDDS Data Manager described later in this chapter.

A useful cron tutorial can be found at:

8.7. The manage Element

The manage element is for managing your collection by deleting files older than a certain date and time. An example is:

<manage deleteAfter="30 days" check="cron expr" />

The attributes are:

  • deleteAfter - files older than this amount of time will be deleted; and

  • check - a cron expression for specifying when the collection should be checked for old files.

8.8. The protoDataset Element

Note: Needs examples.

The protoDataset element provides control over the choice of the prototype dataset used for the collection. The prototype dataset is used to populate the metadata for the feature collection.

The XML schema for this is:

 <xsd:complexType name="protoDatasetType">
   <xsd:sequence>
1)   <xsd:element ref="ncml:netcdf" minOccurs="0"/>
   </xsd:sequence>
2) <xsd:attribute name="choice" type="protoChoices"/>
3) <xsd:attribute name="change" type="xsd:string"/>
4) <xsd:attribute name="param" type="xsd:string"/>
 </xsd:complexType>

with the numbered elements and attributes explained below.

  1. The ncml:netcdf element is optional and can contain NcML elements used to modify the prototype dataset.

  2. The choice attribute provides the choices First, Random, Penultimate and Latest, which are fairly self-explanatory.

  3. The change attribute is optionally used when you have rolling datasets wherein you have to change the prototype periodically since the old one will eventually get deleted. It specifies when the prototype dataset should be reselected using a cron expression.

  4. The param attribute is presently not implemented.
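
Since the note above asks for examples, here is a minimal sketch (the values are illustrative rather than taken from a working catalog) that chooses the penultimate dataset as the prototype, reselects it each night via a cron expression, and uses an embedded NcML element to add a global attribute to the prototype dataset:

<protoDataset choice="Penultimate" change="0 2 3 * * ? *">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="title" value="Atlantic Surface Velocities"/>
  </netcdf>
</protoDataset>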

8.9. The fmrcConfig Element

8.9.2. Overview of Forecast Model Run Collections

A Forecast Model Run Collection (FMRC) is a collection of forecast model output. It uses a special kind of aggregation called FMRC Aggregation to create a dataset that has two time coordinates, a run time and a forecast time. An FMRC creates a 2-D time collection dataset, which can then be sliced up in various ways to create 1D time views of the model output.

Example Dataset with Two Time Coordinates

An example showing both the runtime and time coordinates in a grid dataset follows. First, we show the metadata and data for an example runtime:

String runtime(run=8);
     :long_name = "Run time for model";
     :standard_name = "forecast_reference_time";
     :_CoordinateAxisType = "RunTime";

  data:
  "2006-09-05T12:00:00Z", "2006-09-06T12:00:00Z", "2006-09-07T12:00:00Z", "2006-09-08T12:00:00Z",
  "2006-09-09T12:00:00Z", "2006-09-10T12:00:00Z", "2006-09-11T12:00:00Z", "2006-09-12T12:00:00Z"

The runtime variable is an array consisting of eight date strings, which may be either ISO 8601 or udunit date strings.

The time coordinate is the forecast or valid time. For the above example an appropriate example might be:

double time(run=8, time=16);
     :units = "hours since 2006-09-05T12:00:00Z";
     :long_name = "forecast (valid) time";
     :standard_name = "time";
     :_CoordinateAxisType = "Time";

   data:
   { 90.0,  96.0, 102.0, 108.0, 114.0, 120.0, 126.0, 132.0, 138.0, 144.0, 150.0, 156.0, 162.0, 168.0, 174.0, 180.0},
   {114.0, 120.0, 126.0, 132.0, 138.0, 144.0, 150.0, 156.0, 162.0, 168.0, 174.0, 180.0, 186.0, 192.0, 198.0, 204.0},
   {138.0, 144.0, 150.0, 156.0, 162.0, 168.0, 174.0, 180.0, 186.0, 192.0, 198.0, 204.0, 210.0, 216.0, 222.0, 228.0},
   {162.0, 168.0, 174.0, 180.0, 186.0, 192.0, 198.0, 204.0, 210.0, 216.0, 222.0, 228.0, 234.0, 240.0, 246.0, 252.0},
   {186.0, 192.0, 198.0, 204.0, 210.0, 216.0, 222.0, 228.0, 234.0, 240.0, 246.0, 252.0, 258.0, 264.0, 270.0, 276.0},
   {210.0, 216.0, 222.0, 228.0, 234.0, 240.0, 246.0, 252.0, 258.0, 264.0, 270.0, 276.0, 282.0, 288.0, 294.0, 300.0},
   {234.0, 240.0, 246.0, 252.0, 258.0, 264.0, 270.0, 276.0, 282.0, 288.0, 294.0, 300.0, 306.0, 312.0, 318.0, 324.0},
   {258.0, 264.0, 270.0, 276.0, 282.0, 288.0, 294.0, 300.0, 306.0, 312.0, 318.0, 324.0, 330.0, 336.0, 342.0, 348.0}

The time variable is two-dimensional, with each row containing all the individual forecast times for one of the eight runtime values. For example, for the first runtime value 2006-09-05T12:00:00Z the first row of the 8x16 time array contains the 16 forecast times for that run time:

90.0, 96.0, 102.0, 108.0, 114.0, 120.0, 126.0, 132.0, 138.0, 144.0, 150.0, 156.0, 162.0, 168.0, 174.0, 180.0

That is, for that particular run in the series, the forecast output starts 90 hours after 2006-09-05T12:00:00Z and continues until 180 hours after it, making for a forecast period of 90 hours.

There is significant overlap in the time array because the forecast periods of the individual runs overlap. While each forecast period is 90 hours long, the runtime values are only 24 hours apart. This is a common situation in operational forecasting, where prediction simulations are typically started one or more times a day in real time, with each individual simulation typically proceeding for several days of model time. We can see this by comparing the rows for the first two runtime values; after shifting the second one over a bit, the overlap is readily apparent:

 90.0,  96.0, 102.0, 108.0, 114.0, 120.0, 126.0, 132.0, 138.0, 144.0, 150.0, 156.0, 162.0, 168.0, 174.0, 180.0
                            114.0, 120.0, 126.0, 132.0, 138.0, 144.0, 150.0, 156.0, 162.0, 168.0, 174.0, 180.0, 186.0, 192.0, 198.0, 204.0

The difference between the forecast time and the run time is known as the forecast offset or forecast hour.
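
For instance, the run times above are 24 hours apart, so the run starting at 2006-09-07T12:00:00Z begins 48 hours after the reference time, and its first forecast time of 138.0 hours corresponds to a forecast offset of 138 - 48 = 90 hours. A small Numpy sketch that reconstructs the 2-D time array shown above and recovers the offsets for every run is:

#!/usr/bin/python

import numpy as np

# Run times in hours since 2006-09-05T12:00:00Z for the eight daily runs.
run_hours = 24.0 * np.arange(8)

# The 2-D forecast (valid) time array shown above: each row starts 90 hours
# after its run time and continues at 6-hour intervals for 16 values.
time2d = run_hours[:, np.newaxis] + 90.0 + 6.0 * np.arange(16)

# Forecast offset (forecast hour) = forecast time minus run time.
# Every row is 90, 96, ..., 180 regardless of the run.
offset = time2d - run_hours[:, np.newaxis]
print offset[0]
print offset[7]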

The data variables will also generally have both the runtime and time dimensions, e.g.

 float Dew_point_temperature(run=8, time=16, height_above_ground1=1, y=689, x=1073);
     :units = "K";
     :long_name = "Dew point temperature @ height_above_ground";

Any dataset with a runtime dimension and a 2-D time coordinate - as in the example we have just analyzed - is called an FMRC dataset.

An excellent graphical representation of the FMRC concept can be found here:

and is reproduced below in microscopic JPG format, although if you click on the image you’ll get a much bigger and more readable version.

8.9.3. Aggregating Forecast Model Runs with fmrcConfig

Note - This entire subsection needs examples.

The fmrcConfig subelement of the featureCollection container element is used to configure feature collections with a featureType value of FMRC. An example of its use is:

  <featureCollection name="NCEP-GFS-CONUS_80km" featureType="FMRC" harvest="true" path="fmrc/NCEP/GFS/CONUS_80km">
     <metadata inherited="true">
       <documentation type="summary">good stuff</documentation>
     </metadata>

     <collection spec="/data/ldm/pub/native/grid/NCEP/GFS/CONUS_80km/GFS_CONUS_80km_#yyyyMMdd_HHmm#.grib1"
               recheckAfter="15 min"
               olderThan="5 min"/>
     <update startup="true" rescan="0 5 3 * * ? *" />
     <protoDataset choice="Penultimate" change="0 2 3 * * ? *" />
     <fmrcConfig regularize="true" datasetTypes="TwoD Best Files Runs ConstantForecasts ConstantOffsets" />
   </featureCollection>

The attributes available for fmrcConfig are explained in the following sections.

The regularize Attribute

If the regularize attribute is set to true, then the runs for a given cycle hour - counted from 0Z - are assumed to have the same forecast time coordinates. For example, if you have four model runs per day - starting at 0, 6, 12 and 18Z - and many days of model runs, all the 6Z runs for all the days will be assigned the same forecast time coordinates. This can prove useful if there are missing forecast times, since it ensures that new time coordinates aren’t created.

The datasetTypes Attribute

This attribute lists the dataset types that are exposed in the TDS catalog. The available attribute values are:

  • TwoD - a dataset with two time dimensions (run time and forecast time) that contains all the data in the collection;

  • Best - a dataset using the latest model data available for each possible forecast hour;

  • Files - each component file of the collection is made available separately, as it would be with a datasetScan, with a latest file added if the catalog contains a latest Resolver service;

  • Runs - a model run dataset contains all the data for one run time;

  • ConstantForecasts - a constant forecast dataset is created from all the data that have the same forecast time, which contains successively shorter forecasts of the same endpoint;

  • ConstantOffsets - A constant offset dataset is created from all the data with the same offset from the beginning of the run.

The dataset Element

The dataset element allows you to define your own best dataset. This uses the same algorithm as the Best dataset type above, but additionally allows you to exclude data based on its forecast offset hour, as sketched after the list below. The available attributes for this are:

  • name - a dataset name that must be unique within the fmrcConfig element; and

  • offsetsGreaterEqual - a forecast offset hour, i.e. forecast time minus run time; data with offsets less than this value are excluded.
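
A minimal sketch of an fmrcConfig element that adds such a user-defined best dataset (the dataset name and cutoff value are illustrative) is:

<fmrcConfig regularize="true" datasetTypes="TwoD Best Files Runs">
  <dataset name="Best_Exclude_Spinup" offsetsGreaterEqual="12"/>
</fmrcConfig>

which would expose an additional best dataset named Best_Exclude_Spinup that ignores any data with a forecast offset of less than 12 hours.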

8.10. The pointConfig Element

This is covered (slightly) at:

The pointConfig element defines options on feature collections with a featureType attribute value of POINT or STATION. An example of its use is:

<pointConfig datasetTypes="CdmrFeature Files" />

where the datasetTypes attribute has two possible values:

  • CdmrFeature - which creates a CdmrFeature dataset and service, with all of the files in the collection treated as part of the same dataset; and

  • Files - where each component file in the collection is separately available as with a datasetScan.

As you can see in the example, both attribute values can be specified at the same time.

8.10.1. Point Data Example

An example of a point feature collection for point data is:

<featureCollection name="Surface Buoy Point Data" harvest="true" featureType="Point" path="nws/buoy/ncdecoded">
  <collection spec="/data/ldm/pub/decoded/netcdf/surface/buoy/Surface_Buoy_#yyyyMMdd_HHmm#.nc$" />
  <update startup="true" rescan="0 0/15 * * * ? *" trigger="allow"/>
  <protoDataset choice="Penultimate" />
  <pointConfig datasetTypes="cdmrFeature Files"/>
</featureCollection>

8.10.2. Profile Data Example

An example of a point feature collection for profile data is:

<featureCollection name="6 minute" featureType="STATION_PROFILE" harvest="true" path="station/profiler/wind/06min">
  <documentation type="summary">Six minute average data.</documentation>
  <collection spec="/data/ldm/pub/native/profiler/wind/06min/**/PROFILER_wind_06min_#yyyyMMdd#.nc"
     recheckEvery="15 min" olderThan="5 min" />
</featureCollection>

8.10.3. Station Data Example

An example of a point feature collection for station data is:

<featureCollection name="Metar Station Data" harvest="true" featureType="Station" path="nws/metar/ncdecoded">
 <metadata inherited="true">
   <documentation type="summary">Metars: hourly surface weather observations</documentation>
   <documentation xlink:href="http://metar.noaa.gov/" xlink:title="NWS/NOAA information"/>
   <documentation>In the U.S., METAR reports are taken once an hour between 50 minutes past the hour and the top of
     the (next) hour. All the observations taken within this time are considered to be for the same cycle.
   </documentation>
   <keyword>metar</keyword>
   <keyword>surface observations</keyword>
 </metadata>

 <property name="raw" value="report"/>
 <property name="resolution" value="20 min"/>

 <collection spec="/data/ldm/pub/decoded/netcdf/surface/metar/Surface_METAR_#yyyyMMdd_HHmm#.nc$" />
 <update startup="true" rescan="0 0/15 * * * ? *" trigger="allow"/>
 <protoDataset choice="Penultimate" />
 <pointConfig datasetTypes="cdmrFeature Files"/>
</featureCollection>

8.11. The gribConfig Element

See the GRIB Feature Collections section.

8.12. The path Attribute

The…

8.13. The featureType Attribute

8.13.1. Relevant Documentation

8.14. THREDDS Data Manager (TDM)

8.14.1. Overview

The THREDDS Data Manager (TDM) creates indexes for feature collections in a process separate from the TDS. It is configured via the same configuration catalogs as the TDS and is used in two situations:

  • For static datasets, TDM is first used to create the indexes and then the TDS is started.

  • For dynamic datasets, the TDM runs continuously and sends messages to the TDS when a dataset changes.

8.14.2. Installation

Although the TDM can be run from anywhere, it is conventionally run from its own directory, e.g. <content directory>/tdm. A shell script is created to run the TDM. The general form is:

<JAVA> <JVM options> -Dtds.content.root.path=<content directory> -jar <TDM jar> [-tds <tdsServers>] [-cred <user:passwd>] [-showOnly] [-log level]

and a specific example is:

/opt/jdk/bin/java -Xmx4g -Dtds.content.root.path=/opt/tds/content -jar tdm-4.5.jar -tds "http://thredds.unidata.ucar.edu/,http://thredds2.unidata.ucar.edu:8081/"

The pieces are:

  • <JAVA> - The path of the Java binary. A 64-bit JVM is recommended since large collections require a lot of memory.

  • <JVM options>

    • -Xmx4g - Allocates 4 GB of heap memory; even more is better for large collections.

    • -Dtds.content.root.path=<content directory> - This passes the content directory as a system property, e.g. the configuration catalogs and threddsConfig.xml are found in <content directory>/thredds.

  • -jar tdm-4.5.jar - Executes the TDM from the jar file.

  • -tds <tdsServers> - An optional list of TDS to notify via a comma-separated list with no blanks, and only the scheme, host and optional port with a trailing slash, e.g. http://localhost:8081/.

  • -cred <user:passwd> - An optional username and password if you are sending notifications. If this is not included, you will be prompted for the password on startup, with the default username tdm.

  • -showOnly - An optional flag that lists the feature collections that would be indexed and then exits.

  • -log level - An optional way to set the log4j logging level (e.g. DEBUG, INFO (default), WARN, ERROR).

Troubleshooting Hints:

  • Ensure that the <JVM Options> including -Dtds.content.root.path come before the -jar <TDM jar>.

  • Ensure that the <content directory> does not include the thredds subdirectory, e.g. /opt/tds/content and not /opt/tds/content/thredds.

8.14.3. Running

If -tds is used without -cred upon startup, you will be prompted for the password for the tdm user. This enables starting the TDM without having to put a password in the startup script. Note that the user tdm should be given only the role tdsTrigger, which limits its functionality to triggering collection reloading.

A log file is created in the TDM working directory for each feature collection. The name is of the form fc.<collectionName>.log. These logs should be inspected if there are problems with the indexing.

8.14.4. Triggering the TDS

The TDM scans the files in the feature collection and rewrites the index files if it detects that the collection has changed. It can then send a trigger message to the TDS telling it to reread the index for that feature collection. This is enabled by configuring the TDS with a tdsTrigger role and adding a user tdm with that role, which is done by editing ${tomcat}/conf/tomcat-users.xml. For example,

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role ... />
  <role rolename="tdsTrigger"/>
  <user ... />
  <user username="tdm" password="secret" roles="tdsTrigger"/>
</tomcat-users>

Then the attribute trigger="allow" is placed in the update and tdm elements of the configuration catalog, e.g.

<update startup="nocheck" trigger="allow" />
<tdm rewrite="test" rescan="0 0/15 * * * ? *" trigger="allow"/>

8.14.5. Examples

Here is an example of using the TDS with a static dataset:

<featureCollection name="NOMADS CFSRR" featureType="GRIB2" harvest="true"
path="grib/NOMADS/cfsrr/timeseries">
  <metadata inherited="true">
    <dataType>GRID</dataType>
    <dataFormat>GRIB-2</dataFormat>
  </metadata>

  <collection name="NOMADS-cfsrr-timeseries"
spec="/san4/work/jcaron/cfsrr/**/.*grb2$"
                   dateFormatMark="#cfsrr/#yyyyMM" timePartition="directory"/>

  <tdm startup="always"/>
</featureCollection>

where:

  • startup="always" tells the TDM to index this dataset when TDM starts

  • A log file will be written to fc.NOMADS-cfsrr-timeseries.log in the TDM working directory.

  • The TDS will use the existing indexes, and will not monitor any changes in the dataset.

An example for a dynamic dataset is:

<featureCollection name="DGEX-Alaska_12km" featureType="GRIB2" harvest="true"
path="grib/NCEP/DGEX/Alaska_12km">
  <metadata inherited="true">
     <dataType>GRID</dataType>
     <dataFormat>GRIB-2</dataFormat>
  </metadata>

  <collection name="DGEX-Alaska_12km"
   spec="/data/ldm/pub/native/grid/NCEP/DGEX/Alaska_12km/.*grib2$"
   dateFormatMark="#DGEX_Alaska_12km_#yyyyMMdd_HHmm"
   timePartition="file"
   olderThan="5 min"/>

  <tdm rewrite="true" rescan="0 0/15 * * * ? *" trigger="allow"/>
  <update startup="never" trigger="allow" />
</featureCollection>

where:

  • The tdm element contains:

    • rewrite="test" - Tells the TDM to test for dataset changes.

    • rescan="0 0/15 * * * ? *" - Causes the directories to be rescanned every 15 minutes.

    • trigger="allow" - Enables the TDM to send a message to the TDS when the dataset index changes, which causes the TDS to reread the index for that dataset.

  • The update element contains:

    • startup="never" - Tells the TDS to read in the feature collection when starting up without checking or scanning the files.

    • trigger="allow" - Enables the TDS to receive messages from the TDM when the dataset changes.

9. CDM Feature Types

The CDM (or Scientific) Feature Type layer adds another layer of functionality on top of NetcdfDataset, by specializing in various kinds of data that are common in earth science. All Scientific Feature Types have georeferencing coordinate systems, from which a location in real physical space and time can be found, usually with reference to the Earth. Each adds special data subsetting methods which cannot be done efficiently or at all in the general case of NetcdfDataset objects.

9.1. Grid Feature Types

9.1.2. Grid Feature

A multidimensional grid with separable coordinates. A Grid Coordinate System at a minimum has a Lat and Lon coordinate axis, or a GeoX and GeoY coordinate axis plus a Projection that maps x, y to lat, lon. It usually has a time coordinate axis. It may optionally have a vertical coordinate axis, classified as Height, Pressure, or GeoZ. If it is a GeoZ axis, it may have a Vertical Transform that maps GeoZ to height or pressure. A Grid may also optionally have a Runtime and/or Ensemble coordinate axis.

A GridDatatype (aka GeoGrid or just Grid) has a Grid Coordinate System, whose dimensions are all connected, meaning that neighbors in index space are connected neighbors in coordinate space. This means that data values that are close to each other in the real world (coordinate space) are close to each other in the data array, and are usually stored close to each other on disk, making coordinate subsetting easy and efficient.

A GridCoordSystem wraps a georeferencing coordinate system. It always has 1D or 2D XHoriz and YHoriz axes, and optionally 1D vertical and 1D or 2D time axes. The XHoriz/YHoriz axes will be lat/lon if isLatLon() is true, otherwise they will be GeoX,GeoY with an appropriate Projection.

thredds/grid_data.jpg

9.1.3. Radial Feature

A connected set of radials using polar coordinates collected into sweeps. The Radial feature type uses polar coordinates (elevation, azimuth, distance) to describe the location of its points in space, and this is referred to as a Radial Coordinate System. A Radial Coordinate System has Elevation, Azimuth and Distance coordinate axes, and may also have a Time coordinate axis. Generally, in level II and level III radar products, there is a time variable at the radial (elevation, azimuth) or sweep (elevation) level, so time is treated as a variable rather than as a coordinate axis.

A RadialDatasetSweep has a collection of Radial Variables. The data in each variable is organized into sweeps, where a sweep is a connected set of radials. All the radials in a sweep are assumed to have the same number of gates, the same beam width and the same Nyquist frequency. A radial is a set of data sampled along a straight line at constant intervals called the gate size, and its geometry is described by elevation and azimuth angles relative to some origin.

thredds/radial_data.jpg

9.1.4. Swath Feature

A two-dimensional grid with track and cross-track coordinates.

thredds/swath_data.jpg

9.1.5. Image Feature

A collection of pixels.

A grid collection is a collection of grids with the same coordinate system, and a grid dataset is a set of related grid collections, e.g. all of the output from a model run.

9.2. Point Feature Types

9.2.1. Background

Overview

The point feature datasets contain PointFeatureCollections or NestedPointFeatureCollections. All of the point feature types are arrangements of collections of PointFeatures - a set of measurements at the same point in space and time - distinguished by the geometry and topology of the collections. The available point feature types are:

  • Point Data - Data located at different, unconnected locations.

    • Examples: earthquake and lightning data

  • Time Series or Station Data - Data located at named locations called stations. There can be more than one station, and the usual case is a series of observations with different time coordinates at each station.

    • Examples: weather station data, fixed ocean buoys

  • Profile Data - A series of connected observations along a vertical line. Each profile has a single latitude/longitude coordinate (possibly nominal), so the points along the profile differ only in their z coordinates and possibly time coordinates. There can be multiple profiles in the same file, with each profile having a unique identifier. If there is more than one profile with the same latitude/longitude location, you should use the Time Series Profile type.

    • Examples: atmospheric profiles from satellites, moving profilers

  • Time Series Profile Data - Profile data at fixed locations, with the data changing over time. This combines the Time Series Data and Profile Data types since we have a time series of profiles at fixed locations.

    • Examples: profilers, balloon soundings

  • Trajectory Data - A series of connected observations along a 1-D curve in time and space. There can be multiple trajectories in the same file, as long as each has a unique identifier.

    • Examples: aircraft data, drifting buoys

  • Trajectory of Profiles - A collection of profile features that originate along a trajectory, sometimes also called a section. That is, these are trajectories that have profile data at each latitude/longitude location.

    • Examples: ship soundings

Each of these feature datasets will now be further illustrated with a graphic that shows the variable and spatial-temporal structure of each. Each feature type will also be accompanied by a sample NetCDF header (i.e. a CDL representation of the file metadata) as well as a sample, fairly generic Python script illustrating how to create a NetCDF file for the feature type.

CF Conventions

The Discrete Sampling Geometries section of the CF Conventions Standard gives standard names to these point feature types. They are the names used to label datasets via the featureType attribute. The CF conventions also supply annotated examples and recommendations for each type of discrete geometry/feature type.

NOAA NetCDF Templates

The NOAA NODC has developed NetCDF templates based on CF feature types, with the templates conforming to both the ACDD and CF conventions.

The templates corresponding to each discrete sampling geometry are:

9.2.2. Point Feature

A point feature is one or more parameters measured at one point in time and space, e.g. earthquake or lightning data, that has the following variable and spatial-temporal structure:

thredds/point_data.jpg

and for which a sample CDL file would look like:

dimensions:
   obs = 1234 ;
variables:
  double time(obs) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
  float lon(obs) ;
    lon:standard_name = "longitude";
    lon:long_name = "longitude of the observation";
    lon:units = "degrees_east";
  float lat(obs) ;
    lat:standard_name = "latitude";
    lat:long_name = "latitude of the observation" ;
    lat:units = "degrees_north" ;
  float alt(obs) ;
    alt:long_name = "vertical distance above the surface" ;
    alt:standard_name = "height" ;
    alt:units = "m";
    alt:positive = "up";
    alt:axis = "Z";
  float humidity(obs) ;
    humidity:standard_name = "specific_humidity" ;
    humidity:coordinates = "time lat lon alt" ;
attributes:
   :featureType = "point";

Note that there is no separate time dimension in this example. That is, it is not a time series of data but rather a set of 1234 observations of humidity, each of which has its own tuple of time, lon, lat and alt.

Suppose we have a text file point.txt containing the following values for date, alt, lon, lat and humidity in columnar format:

11941.25   236  -92.42   28.11   0.112
11920.75   402  -95.56   27.77   0.056
11960.50   345  -94.33   28.67   0.122
...
11980.00   556  -96.41   27.13   0.111

A Python/Numpy/netcdf4-python script to read this data file and create a NetCDF file with the CDL of the sample is:

#!/usr/bin/python

# Import the required Python packages.
import numpy,netCDF4

#  Open the point.txt file.
infile = open('point.txt')
inlines = infile.readlines()

#  Find the number of lines in the file.
nlines = len(inlines)

#  Create an output file point.nc and add a global attribute.
nc = netCDF4.Dataset('point.nc', 'w', format='NETCDF3_CLASSIC')
nc.featureType = "point"

#  Create the dimension obs.
nc.createDimension('obs',nlines)

#  Create the variables time, lon, lat, alt and humidity and add attributes.
time = nc.createVariable('time','f8',('obs',))
time.units = "days since 1970-01-01 00:00:00"
time.standard_name = "time"
time.long_name = "time of measurement"
lon = nc.createVariable('lon','f4',('obs',))
lon.standard_name = "longitude"
lon.long_name = "longitude of the observation"
lon.units = "degrees_east"
lat = nc.createVariable('lat','f4',('obs',))
lat.standard_name = "latitude"
lat.long_name = "latitude of the observation"
lat.units = "degrees_north"
alt = nc.createVariable('alt','f4',('obs',))
alt.long_name = "vertical distance above the surface"
alt.standard_name = "height"
alt.units = "m"
alt.positive = "up"
alt.axis = "Z"
humidity = nc.createVariable('humidity','f4',('obs',))
humidity.standard_name = "specific_humidity"
humidity.coordinates = "time lat lon alt"

# Loop over all the lines in the text file.
count = 0
for line in inlines:
# Split each line into an array containing the variable values.
        vars = line.split()
# Write each variable value to the corresponding NetCDF file variable.
        time[count] = float(vars[0])
        alt[count] = float(vars[1])
        lon[count] = float(vars[2])
        lat[count] = float(vars[3])
        humidity[count] = float(vars[4])
        count = count + 1

# Close the NetCDF file.
nc.close()

If the columns in the data file were separated by commas (CSV) instead of whitespace, you would make a simple modification to the split command by changing it to:

vars = line.split(',')
A method for handling an ASCII data file with a header preceding the data is illustrated in the following time series example.

9.2.3. Time Series Station Feature

A time series station feature is a time series of data points all at the same location, with varying time, e.g. weather station data or fixed buoys. It has the following variable and spatial-temporal structure:

thredds/time_series_station_data.jpg

and for which a sample CDL file would look like:

dimensions:
  station = 10 ;  // measurement locations
  name_strlen = 23 ;  // maximum length of the station name strings
  time = UNLIMITED ;
variables:
  float humidity(station,time) ;
    humidity:standard_name = "specific_humidity" ;
    humidity:coordinates = "lat lon alt" ;
  double time(time) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
  float lon(station) ;
    lon:standard_name = "longitude";
    lon:long_name = "station longitude";
    lon:units = "degrees_east";
  float lat(station) ;
    lat:standard_name = "latitude";
    lat:long_name = "station latitude" ;
    lat:units = "degrees_north" ;
  float alt(station) ;
    alt:long_name = "vertical distance above the surface" ;
    alt:standard_name = "height" ;
    alt:units = "m";
    alt:positive = "up";
    alt:axis = "Z";
  char station_name(station, name_strlen) ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id";
attributes:
    :featureType = "timeSeries";

Note that while this is similar to the Point type CDL example, a time dimension has been added and the obs dimension has been replaced by a station dimension. We now have a set of measurements of humidity at a series of times at a set of stations, each of which is located at a different lon, lat, alt spatial point.

Suppose we have an ASCII file for each of a series of stations, with each containing columnar data in the same format. Example file station_a.txt contains:

ST-A  -92.42  28.11  5
11941.00   0.0052
11941.25   0.0057
11941.50   0.0061
11941.75   0.0059
11942.00   0.0058
...

with the first or header line containing the station name, longitude, latitude and altitude, and the remaining lines containing a series of time and specific humidity values. A Python script to read this data file and create a NetCDF file with the CDL of the sample - with the dimension numbers significantly reduced for simplicity - is:

#!/usr/bin/python2.7

import numpy,netCDF4

#  Set the number of lines in the data file header.
nhlines = 1

#  Create a Python list containing the names of the station files.
infiles = ['station_a.txt','station_b.txt','station_c.txt']
#  Calculate the number of stations.
nstations = len(infiles)

#  Create an output NetCDF file timeseries.nc and add a global attribute.
#  Note:  The format must be NETCDF4 to make the unlimited dimension
#         'time' the second dimension.
nc = netCDF4.Dataset('timeseries.nc', 'w', format='NETCDF4')
nc.featureType = "timeSeries"

#  Create the dimensions 'station' and 'time'.
nc.createDimension('station',nstations)
#  The value 'None' is used to create a record or unlimited dimension.
nc.createDimension('time',None)

#  Create the variables time, lon, lat, alt and humidity and add attributes.
time = nc.createVariable('time','f8',('time',))
time.units = "days since 1970-01-01 00:00:00"
time.standard_name = "time"
time.long_name = "time of measurement"
lon = nc.createVariable('lon','f4',('station',))
lon.standard_name = "longitude"
lon.long_name = "station longitude"
lon.units = "degrees_east"
lat = nc.createVariable('lat','f4',('station',))
lat.standard_name = "latitude"
lat.long_name = "station latitude"
lat.units = "degrees_north"
alt = nc.createVariable('alt','f4',('station',))
alt.long_name = "vertical distance above the surface"
alt.standard_name = "height"
alt.units = "m"
alt.positive = "up"
alt.axis = "Z"
sta = nc.createVariable('station_name',str,('station',))
sta.long_name = "station name"
sta.cf_role = "timeseries_id"
humidity = nc.createVariable('humidity','f4',('station','time',))
humidity.standard_name = "specific_humidity"
humidity.coordinates = "time lat lon alt"

ns = 0
#  Loop over number of stations.
for f in infiles:
        infile = open(f,'r')
        inlines = infile.readlines()
#  Find the number of lines in the file.
        nlines = len(inlines)

# Loop over all the lines in the file header.
        for n in range(0,nhlines):
# Split each line into an array containing the header variable values.
                print " n = ",n
                vars = inlines[n].split()
                print " vars = ",vars
# Write the station name, lon, lat and altitude to the NetCDF file.
                sta[ns] = vars[0]
                lon[ns] = float(vars[1])
                lat[ns] = float(vars[2])
                alt[ns] = float(vars[3])
                print " sta = ",sta[ns]

# Loop over all the data lines after the file header.
        count = 0
        for n in range(nhlines,nlines):
                print " n = ",n
                vars = inlines[n].split()
# Write times and the humidity values to the NetCDF file.
                time[count] = float(vars[0])
                humidity[ns,count] = float(vars[1])
                count = count + 1
# Close the station data file.
        infile.close()
        ns = ns + 1

# Close the NetCDF file.
nc.close()

Program Notes:

  • The output file timeseries.nc is opened using the NETCDF4 format rather than the NETCDF3_CLASSIC format used in the point dataset example. In this case, the extended capabilities of NetCDF-4 are required to place the unlimited (record) dimension somewhere other than the outermost position. Given that many users still use tools that can only deal with the NetCDF-3 format, it is recommended as good practice - for a while longer at least - to use the NETCDF3_CLASSIC format whenever the additional capabilities of NetCDF-4 aren’t absolutely necessary.

9.2.4. Profile Feature

A profile feature is a set of data points along a vertical line, e.g. satellite profiles. It has the following variable and spatial-temporal structure:

thredds/profile_data.jpg

and for which a sample CDL file would look like:

dimensions:                                                     1
   z = 42 ;
   profile = 142 ;
variables:                                                      2
  int profile(profile) ;
    profile:cf_role = "profile_id";
  double time(profile);
    time:standard_name = "time";
    time:long_name = "time" ;
    time:units = "days since 1970-01-01 00:00:00" ;
  float lon(profile);
    lon:standard_name = "longitude";
    lon:long_name = "longitude" ;
    lon:units = "degrees_east" ;
  float lat(profile);
    lat:standard_name = "latitude";
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;
  float z(z) ;
    z:standard_name = "altitude";
    z:long_name = "height above mean sea level" ;
    z:units = "km" ;
    z:positive = "up" ;
    z:axis = "Z" ;
  float pressure(profile, z) ;
    pressure:standard_name = "air_pressure" ;
    pressure:long_name = "pressure level" ;
    pressure:units = "hPa" ;
    pressure:coordinates = "time lon lat z" ;
  float temperature(profile, z) ;
    temperature:standard_name = "surface_temperature" ;
    temperature:long_name = "skin temperature" ;
    temperature:units = "Celsius" ;
    temperature:coordinates = "time lon lat z" ;
  float humidity(profile, z) ;
    humidity:standard_name = "relative_humidity" ;
    humidity:long_name = "relative humidity" ;
    humidity:units = "%" ;
    humidity:coordinates = "time lon lat z" ;
attributes:                                                     3
   :featureType = "profile";
1 The dimension section.
2 The variables section.
3 The global attributes section.

In this example, there are 142 vertical profiles, each of which has a single time, lat and lon associated with it. At each of these profiles 42 measurements of pressure, temperature and humidity were taken at the same set of 42 altitudes given by the variable z.

Suppose we had a set of ASCII text files containing the profile data, with a different file for each profile. A typical file profile_001.txt has the format:

001  11850.25  -92.42  28.11
0.01   1013   19.1   80
0.02   1007   19.0   78
0.03   1000   18.8   75
0.04    991   18.7   73
0.05    982   18.6   70

with the first line containing the profile number, date (in days since Jan. 1, 1970), longitude and latitude, and the remaining lines each containing an altitude and the corresponding pressure, temperature and relative humidity at that altitude.

A Python script to read this data file and create a NetCDF file with the CDL of the sample - with the numbers of profiles and depths greatly reduced to simplify the example - would be:

#!/usr/bin/python2.7

import numpy,netCDF4

#  Set the number of lines in the data file header.
nhlines = 1
#  Set the number of data lines following the header.
ndatlines = 5

#  Create a Python list containing the names of the profile files.
infiles = ['profile_001.txt','profile_002.txt','profile_003.txt']
#  Calculate the number of profiles.
nprofiles = len(infiles)

#  Create an output NetCDF file profile.nc and add a global attribute.
nc = netCDF4.Dataset('profile.nc', 'w', format='NETCDF3_CLASSIC')
nc.featureType = "profile"

#  Create the dimensions 'z' and 'profile'.
nc.createDimension('z',ndatlines)
nc.createDimension('profile',nprofiles)

#  Create the variables profile, time, lon, lat, z, pressure, temperature and
#  humidity.
prof = nc.createVariable('profile','i4',('profile',))
prof.cf_role = "profile_id"
time = nc.createVariable('time','f8',('profile',))
time.units = "days since 1970-01-01 00:00:00"
time.standard_name = "time"
time.long_name = "time"
lon = nc.createVariable('lon','f4',('profile',))
lon.standard_name = "longitude"
lon.long_name = "longitude"
lon.units = "degrees_east"
lat = nc.createVariable('lat','f4',('profile',))
lat.standard_name = "latitude"
lat.long_name = "latitude"
lat.units = "degrees_north"
z = nc.createVariable('z','f4',('z',))
z.long_name = "height above mean sea level"
z.standard_name = "altitude"
z.units = "km"
z.positive = "up"
z.axis = "Z"
pres = nc.createVariable('pressure','f4',('profile','z',))
pres.standard_name = "air_pressure"
pres.long_name = "pressure level"
pres.units = "hPa"
pres.coordinates = "time lon lat z"
temp = nc.createVariable('temperature','f4',('profile','z',))
temp.standard_name = "surface_temperature"
temp.long_name = "skin temperature"
temp.units = "Celsius"
temp.coordinates = "time lon lat z"
humid = nc.createVariable('humidity','f4',('profile','z',))
humid.standard_name = "relative_humidity"
humid.long_name = "relative humidity"
humid.units = "%"
humid.coordinates = "time lon lat z"

np = 0
#  Loop over number of profiles.
for f in infiles:
        print " f = ",f
        infile = open(f,'r')
        inlines = infile.readlines()
#  Find the number of lines in the file.
        nlines = len(inlines)
# Loop over all the lines in the file header.
        for n in range(0,nhlines):
# Split each line into an array containing the header variable values.
                print " n = ",n
                vars = inlines[n].split()
                print " vars = ",vars
# Write the profile number, time, lon and lat to the NetCDF file.
                prof[np] = int(vars[0])
                time[np] = float(vars[1])
                lon[np] = float(vars[2])
                lat[np] = float(vars[3])
        count = 0
# Loop over all the data lines after the file header.
        for n in range(nhlines,nlines):
                print " n = ",n
                vars = inlines[n].split()
# Write z, pressure, temperature and humidity values to the NetCDF file.
                z[count] = float(vars[0])
                pres[np,count] = float(vars[1])
                temp[np,count] = float(vars[2])
                humid[np,count] = float(vars[3])
                count = count + 1
# Close the station data file.
        infile.close()
        np = np + 1
# Close the NetCDF file.
nc.close()

Program Notes:

  • Rewriting the 1-D z array for every profile isn’t the most elegant solution in this case, but it’s not computationally expensive. In this example writing half again as much code to avoid such redundancy would serve no purpose beyond elegance for the sake of elegance.

  • Creating the list of infiles entries by hand would be tedious in the extreme if we had to deal with all 142 of the profiles in the CDL example. If the names of the ASCII files containing the profile data are numbered sequentially, e.g. profile_001.txt, profile_002.txt up to profile_142.txt, we can programmatically create each filename in the outer loop by concatenating profile_, a zero-padded string created from an integer counting variable, and .txt. If the filenames aren’t sequentially numbered but are all contained within a single directory and have a standard filename format, we can read the filenames into a Python list from within the program and iterate over that list as our outer loop, as sketched below.
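
A minimal sketch of both approaches - assuming the profile files live in the directory the script is run from - is:

#!/usr/bin/python2.7

import glob

# Build the names programmatically when they are numbered sequentially.
infiles = ['profile_%03d.txt' % (n + 1) for n in range(142)]

# Or read whatever matching files are actually present in the directory.
infiles = sorted(glob.glob('profile_*.txt'))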

9.2.5. Station Profile Feature

A station profile feature is a time series of Profile Features at a named location, e.g. profilers or balloon soundings. It has the following variable and spatial-temporal structure:

thredds/time_series_profile_station_data.jpg

and for which a sample CDL file would look like:

dimensions:
  station = 22 ;
  profile = 3002 ;
  z = 42 ;
  name_strlen = 23 ;
variables:
  float lon(station) ;
    lon:standard_name = "longitude";
    lon:long_name = "station longitude";
    lon:units = "degrees_east";
  float lat(station) ;
    lat:standard_name = "latitude";
    lat:long_name = "station latitude" ;
    lat:units = "degrees_north" ;
  char station_name(station, name_strlen) ;
    station_name:cf_role = "timeseries_id" ;
    station_name:long_name = "station name" ;
  int station_info(station) ;
    station_info:long_name = "some kind of station info" ;
  float alt(station, profile , z) ;
    alt:standard_name = "altitude";
    alt:long_name = "height above mean sea level" ;
    alt:units = "km" ;
    alt:positive = "up" ;
    alt:axis = "Z" ;
  double time(station, profile ) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
    time:missing_value = -999.9;
  float pressure(station, profile , z) ;
    pressure:standard_name = "air_pressure" ;
    pressure:long_name = "pressure level" ;
    pressure:units = "hPa" ;
    pressure:coordinates = "time lon lat alt" ;
  float temperature(station, profile , z) ;
    temperature:standard_name = "surface_temperature" ;
    temperature:long_name = "skin temperature" ;
    temperature:units = "Celsius" ;
    temperature:coordinates = "time lon lat alt" ;
  float humidity(station, profile , z) ;
    humidity:standard_name = "relative_humidity" ;
    humidity:long_name = "relative humidity" ;
    humidity:units = "%" ;
    humidity:coordinates = "time lon lat alt" ;

attributes:
:featureType = "timeSeriesProfile";

Suppose you have a set of ASCII data files, with each file containing a time series of profiles at a single station in the following format:

AN  023 -92.42  28.11
11850.00
0.01   1013   19.1   80
0.02   1007   19.0   78
0.03   1000   18.8   75
0.04    991   18.7   73
0.05    982   18.6   70
11851.00
0.01   1011   19.2   81
0.02   1005   19.1   79
0.03    998   18.9   76
0.04    989   18.8   74
0.05    980   18.7   71
11852.00
0.01   1009   19.3   82
0.02   1005   19.2   80
0.03    997   19.0   77
0.04    992   18.9   75
0.05    988   18.8   72

where the first line contains the station name, station number, longitude and latitude. The next line contains the time (based on the units attribute of the time variable), followed by a series of lines containing the altitude, pressure, temperature and humidity for each of the set of altitudes. This pattern after the initial station name line is repeated until the time series of profiles is exhausted.

A Python script to read a set of these data files and create a NetCDF file with the CDL of the sample follows. Note that this script requires that all of the station files have the same number of times in all of the time series. However, the time variable is created with two dimensions so the specific times comprising the equal length time series may differ from station file to station file. The specific altitude values can also vary, although there must be the same number of them within each file.

#!/usr/bin/python2.7

import numpy,netCDF4

#  Set the number of times in the time series.
ntimes = 3
#  Set the number of lines in the major file header.
maj_hlines = 1
#  Set the number of lines in the minor file header.
min_hlines = 1
#  Set the number of data lines following the header.
ndatlines = 5

#  Create a Python list containing the names of the station files.
infiles = ['an_profile.txt','bn_profile.txt']
#  Calculate the number of stations.
nstations = len(infiles)

#  Create an output NetCDF file timeseries_profile.nc and add a global attribute.
nc = netCDF4.Dataset('timeseries_profile.nc', 'w', format='NETCDF4')
nc.featureType = "timeSeriesProfile"

#  Create the dimensions 'z', 'profile' and 'station'.
nc.createDimension('z',ndatlines)
nc.createDimension('profile',ntimes)
nc.createDimension('station',nstations)

#  Create the variables station_name, station_info, time, lon, lat, alt,
#  pressure, temperature and humidity and add attributes.
sta_name = nc.createVariable('station_name',str,('station',))
sta_name.cf_role = "timeseries_id"
sta_name.long_name = "station name"
sta_nfo = nc.createVariable('station_info','i4',('station',))
sta_nfo.long_name = "some kind of station info"
time = nc.createVariable('time','f8',('station','profile',))
time.units = "days since 1970-01-01 00:00:00"
time.standard_name = "time"
time.long_name = "time of measurement"
time.missing_value = -999.9
lon = nc.createVariable('lon','f4',('station',))
lon.standard_name = "longitude"
lon.long_name = "station longitude"
lon.units = "degrees_east"
lat = nc.createVariable('lat','f4',('station',))
lat.standard_name = "latitude"
lat.long_name = "station latitude"
lat.units = "degrees_north"
alt = nc.createVariable('alt','f4',('station','profile','z',))
alt.long_name = "height above mean sea level"
alt.standard_name = "altitude"
alt.units = "km"
alt.positive = "up"
alt.axis = "Z"
pres = nc.createVariable('pressure','f4',('station','profile','z',))
pres.standard_name = "air_pressure"
pres.long_name = "pressure level"
pres.units = "hPa"
pres.coordinates = "time lon lat alt"
temp = nc.createVariable('temperature','f4',('station','profile','z',))
temp.standard_name = "surface_temperature"
temp.long_name = "skin temperature"
temp.units = "Celsius"
temp.coordinates = "time lon lat alt"
humid = nc.createVariable('humidity','f4',('station','profile','z',))
humid.standard_name = "relative_humidity"
humid.long_name = "relative humidity"
humid.units = "%"
humid.coordinates = "time lon lat alt"

ns = 0
#  Loop over number of stations.
for f in infiles:
        lineno = 0
        infile = open(f,'r')
        inlines = infile.readlines()
#  Find the number of lines in the file.
        nlines = len(inlines)
#  Split the major header and read the station name, station info, lon and lat.
        vars = inlines[lineno].split()
        sta_name[ns] = vars[0]
        sta_nfo[ns] = int(vars[1])
        lon[ns] = float(vars[2])
        lat[ns] = float(vars[3])

        lineno = lineno + maj_hlines

#  Loop over the number of profile times in the time series at the station.
        for nt in range(0,ntimes):

#  Read the time from the subheader and convert it to a floating point number.
                time[ns,nt] = float(inlines[lineno])
                line_start = lineno + min_hlines
                line_end = line_start + ndatlines
                count = 0
# Loop over all the data lines after the file header.
                for n in range(line_start,line_end):
                        vars = inlines[n].split()
# Write altitude, pressure, temperature and humidity values to the NetCDF file.
                        alt[ns,nt,count] = float(vars[0])
                        pres[ns,nt,count] = float(vars[1])
                        temp[ns,nt,count] = float(vars[2])
                        humid[ns,nt,count] = float(vars[3])
                        count = count + 1
                lineno = line_end
# Close the station data file.
        infile.close()
        ns = ns + 1
# Close the NetCDF file.
nc.close()

Program Notes:

  • This program is only as robust as the regularity of the input data files. As written it cannot handle a set of station files containing time series of different lengths, and will crash if it encounters such a set. The more regularity you can build into your ASCII data sets, the easier it will be to write a program to convert them into NetCDF files. The less regularity you build into your ASCII data sets, the more exception statements you’ll need to include in the program until it’s much bigger than the one you started with. Like all such things, it’s a trade-off.

9.2.6. Trajectory Feature

A trajectory feature is a set of data points along a 1-D curve in time and space, e.g. aircraft data, ship data, or drifting buoys. It has the following variable and spatial-temporal structure:

thredds/trajectory_data.jpg

and for which a sample CDL file would look like:

dimensions:
  obs = 1000 ;
  trajectory = 77 ;
  name_strlen = 23 ;
variables:
  char trajectory(trajectory, name_strlen) ;
    trajectory:cf_role = "trajectory_id";
    trajectory:long_name = "trajectory name" ;
  int trajectory_info(trajectory) ;
    trajectory_info:long_name = "some kind of trajectory info" ;
  double time(trajectory, obs) ;
    time:standard_name = "time";
    time:long_name = "time" ;
    time:units = "days since 1970-01-01 00:00:00" ;
  float lon(trajectory, obs) ;
    lon:standard_name = "longitude";
    lon:long_name = "longitude" ;
    lon:units = "degrees_east" ;
  float lat(trajectory, obs) ;
    lat:standard_name = "latitude";
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;
  float z(trajectory, obs) ;
    z:standard_name = "altitude";
    z:long_name = "height above mean sea level" ;
    z:units = "km" ;
    z:positive = "up" ;
    z:axis = "Z" ;
  float O3(trajectory, obs) ;
    O3:standard_name = "mass_fraction_of_ozone_in_air";
    O3:long_name = "ozone concentration" ;
    O3:units = "1e-9" ;
    O3:coordinates = "time lon lat z" ;
  float NO3(trajectory, obs) ;
    NO3:standard_name = "mass_fraction_of_nitrate_radical_in_air";
    NO3:long_name = "NO3 concentration" ;
    NO3:units = "1e-9" ;
    NO3:coordinates = "time lon lat z" ;
attributes:
   :featureType = "trajectory";

In this example there are 77 separate trajectories, along each of which measurements of O3 and NO3 have been made at 1000 points. Each of those 1000 points has its own time, lat, lon and z values, and those coordinates differ from trajectory to trajectory.

A Python/Numpy/netcdf4-python script to read this data file and create a NetCDF file with the CDL of the sample would be:

#!/usr/bin/python

import sys,os
import numpy as np
import netCDF4
...
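
Until a full working example is available, the following is a minimal sketch of how the rest of such a script might proceed. It only lays out the file structure of the sample CDL with netcdf4-python; the output file name is a placeholder, and the arrays that would actually fill the variables (trajectory names, positions, times and the O3/NO3 measurements) are assumed to come from whatever source holds your data.

#!/usr/bin/python

import netCDF4

ntraj = 77          # number of trajectories
nobs = 1000         # number of points along each trajectory
name_strlen = 23    # length of the trajectory name strings

nc = netCDF4.Dataset('trajectory.nc','w',format='NETCDF3_CLASSIC')
nc.featureType = "trajectory"

nc.createDimension('obs',nobs)
nc.createDimension('trajectory',ntraj)
nc.createDimension('name_strlen',name_strlen)

trajectory = nc.createVariable('trajectory','S1',('trajectory','name_strlen',))
trajectory.cf_role = "trajectory_id"
trajectory.long_name = "trajectory name"
traj_info = nc.createVariable('trajectory_info','i4',('trajectory',))
traj_info.long_name = "some kind of trajectory info"
time = nc.createVariable('time','f8',('trajectory','obs',))
time.standard_name = "time"
time.long_name = "time"
time.units = "days since 1970-01-01 00:00:00"
lon = nc.createVariable('lon','f4',('trajectory','obs',))
lon.standard_name = "longitude"
lon.long_name = "longitude"
lon.units = "degrees_east"
lat = nc.createVariable('lat','f4',('trajectory','obs',))
lat.standard_name = "latitude"
lat.long_name = "latitude"
lat.units = "degrees_north"
z = nc.createVariable('z','f4',('trajectory','obs',))
z.standard_name = "altitude"
z.long_name = "height above mean sea level"
z.units = "km"
z.positive = "up"
z.axis = "Z"
O3 = nc.createVariable('O3','f4',('trajectory','obs',))
O3.standard_name = "mass_fraction_of_ozone_in_air"
O3.long_name = "ozone concentration"
O3.units = "1e-9"
O3.coordinates = "time lon lat z"
#  NO3 is created in exactly the same way as O3.

#  Fill the variables here from the source data, e.g. time[:,:] = times and
#  O3[:,:] = o3 for suitably shaped NumPy arrays, then close the NetCDF file.
nc.close()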

9.2.7. Section or Trajectory Profile Feature

A section feature is a collection of Profile Features which originate along a Trajectory, e.g. ship soundings. It has the following variable and spatial-temporal structure:

thredds/section_data.jpg

and for which a sample CDL file would look like:

dimensions:
  trajectory = 22 ;
  profile = 33 ;
  z = 42 ;
variables:
  int trajectory(trajectory) ;
    trajectory:cf_role = "trajectory_id" ;
  float lon(trajectory, profile) ;
    lon:standard_name = "longitude";
    lon:units = "degrees_east";
  float lat(trajectory, profile) ;
    lat:standard_name = "latitude";
    lat:long_name = "station latitude" ;
    lat:units = "degrees_north" ;
  float alt(trajectory, profile, z) ;
    alt:standard_name = "altitude";
    alt:long_name = "height above mean sea level" ;
    alt:units = "km" ;
    alt:positive = "up" ;
    alt:axis = "Z" ;
  double time(trajectory, profile) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
    time:missing_value = -999.9;
  float pressure(trajectory, profile, z) ;
    pressure:standard_name = "air_pressure" ;
    pressure:long_name = "pressure level" ;
    pressure:units = "hPa" ;
    pressure:coordinates = "time lon lat alt" ;
  float temperature(trajectory, profile, z) ;
    temperature:standard_name = "surface_temperature" ;
    temperature:long_name = "skin temperature" ;
    temperature:units = "Celsius" ;
    temperature:coordinates = "time lon lat alt" ;
  float humidity(trajectory, profile, z) ;
    humidity:standard_name = "relative_humidity" ;
    humidity:long_name = "relative humidity" ;
    humidity:units = "%" ;
    humidity:coordinates = "time lon lat alt" ;
attributes:
   :featureType = "trajectoryProfile";

A Python/Numpy/netcdf4-python script to read this data file and create a NetCDF file with the CDL of the sample would be:

#!/usr/bin/python

import sys,os
import numpy as np
import netCDF4
...
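
Again, pending a complete working example, a minimal sketch of laying out the trajectoryProfile structure of the sample CDL with netcdf4-python is given below. The output file name is a placeholder, and filling the variables from your actual data is left as before.

#!/usr/bin/python

import netCDF4

nc = netCDF4.Dataset('section.nc','w',format='NETCDF3_CLASSIC')
nc.featureType = "trajectoryProfile"

nc.createDimension('trajectory',22)
nc.createDimension('profile',33)
nc.createDimension('z',42)

trajectory = nc.createVariable('trajectory','i4',('trajectory',))
trajectory.cf_role = "trajectory_id"

#  Coordinates of each profile along each trajectory.
time = nc.createVariable('time','f8',('trajectory','profile',))
time.standard_name = "time"
time.long_name = "time of measurement"
time.units = "days since 1970-01-01 00:00:00"
lon = nc.createVariable('lon','f4',('trajectory','profile',))
lon.standard_name = "longitude"
lon.units = "degrees_east"
lat = nc.createVariable('lat','f4',('trajectory','profile',))
lat.standard_name = "latitude"
lat.units = "degrees_north"
alt = nc.createVariable('alt','f4',('trajectory','profile','z',))
alt.standard_name = "altitude"
alt.units = "km"
alt.positive = "up"
alt.axis = "Z"

#  The data variables - pressure, temperature and humidity - all share the
#  (trajectory, profile, z) shape and the coordinates attribute; only
#  pressure is shown here.
pres = nc.createVariable('pressure','f4',('trajectory','profile','z',))
pres.standard_name = "air_pressure"
pres.long_name = "pressure level"
pres.units = "hPa"
pres.coordinates = "time lon lat alt"

nc.close()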

10. GRIB Feature Collections

10.1. Background

10.1.2. Introduction

GRIB Feature Collection Datasets are collections of GRIB records, which contain gridded data that typically comes from numerical model output. Because of the complexity of how GRIB data is written and stored, the TDS provides specialized handling of GRIB datasets called GRIB Feature Collections. The features include:

  • the automatic creation of a virtual dataset from the specification of a collection of GRIB-1 or GRIB-2 files

  • an indexing scheme that enables fast access and scalability to very large datasets

  • support for multiple horizontal domains, each of which is placed in a separate group

  • full support for interval time coordinates

  • collections that keep track of both reference time and valid time, with the collection partitioned by the former

  • two datasets for every partition collection: TwoD and Best

  • two time coordinates for the TwoD datasets that correspond to those used in FMRC TwoD datasets

  • a single forecast time coordinate for the Best dataset that is the same as that for GRIB collections and FMRC Best datasets

Mapping a GRIB Collection to Multidimensional Variables in the CDM

A GRIB file is an unordered collection of GRIB records. A GRIB record consists of a single 2D (x, y) slice of data. The CDM library reads a GRIB file and creates a 2, 3, 4, or 5 dimensional Variable (time, ensemble, z, y, x) by finding the records with the same parameter but different time / level / ensemble coordinates. Given various limitations and shortcomings of the GRIB-1 and GRIB-2 standards, this is not a trivial task and is basically one that involves guessing the dataset schema and the intent of the data provider. It involves significantly more arbitrariness than one would like. The details of the mappings are available in the GRIB mapping appendix.

10.2. Indexing

Index files are created during GRIB collection processing. Files with suffixes .ncx[2] and .gbx9 will be created for each GRIB file in the collection. Both types of index files are binary, private formats for the CDM, whose format may change. Both files are placed in the same directory that contains the GRIB file from which they are derived.

10.2.1. The ncx Index Files

For each GRIB file, an index file with suffix .ncx (or .ncx2 for GRIB-2 files) is written. This contains all the metadata and coordinate information for the collection. It is generally under a few megabytes in size and - once created - enables much faster access to the GRIB collection. It will usually be updated if the file changes, but an update can be forced by manually deleting it.

10.2.2. The gbx9 Index Files

For each GRIB file, an index file with suffix .gbx9 is created. This contains everything in the original GRIB file except the data. It is generally 2 to 3 orders of magnitude smaller than the original GRIB file. It is written once and never rewritten unless the GRIB file changes, at which point the CDM should detect the change and rewrite the index file. If there is any uncertainty in the matter, the .gbx9 file(s) can be manually deleted and they will be automatically recreated.

10.3. The gribConfig Element

GRIB collection processing can be modified by the user via the gribConfig element, either in the TDS configuration catalog or in NcML.

10.3.1. Element Schema

<xsd:complexType name="gribConfigType">
  <xsd:sequence>
   <xsd:element name="gdsHash" minOccurs="0">
     <xsd:complexType>
       <xsd:attribute name="from" type="xsd:int" use="required"/>
       <xsd:attribute name="to" type="xsd:int" use="required"/>
     </xsd:complexType>
   </xsd:element>

   <xsd:element name="gdsName" minOccurs="0" maxOccurs="unbounded">
     <xsd:complexType>
       <xsd:attribute name="hash" type="xsd:int"/>
       <xsd:attribute name="groupName" type="xsd:string"/>
     </xsd:complexType>
   </xsd:element>

   <xsd:element name="useGenType" minOccurs="0" maxOccurs="1"/>

   <xsd:element name="intvFilter" minOccurs="0" maxOccurs="unbounded">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="variable" minOccurs="0" maxOccurs="unbounded">
           <xsd:complexType>
             <xsd:attribute name="id" type="xsd:string" use="required"/>
             <xsd:attribute name="prob" type="xsd:string" use="optional"/>
           </xsd:complexType>
         </xsd:element>
       </xsd:sequence>
       <xsd:attribute name="excludeZero" type="xsd:boolean" use="optional"/>
       <xsd:attribute name="intvLength" type="xsd:int" use="optional"/>
     </xsd:complexType>
   </xsd:element>

 <xsd:element name="timeUnitConvert" minOccurs="0">
     <xsd:complexType>
       <xsd:attribute name="from" type="xsd:int" use="required"/>
       <xsd:attribute name="to" type="xsd:int" use="required"/>
     </xsd:complexType>
   </xsd:element>
 </xsd:sequence>

 <xsd:attribute name="datasetTypes" type="gribDatasetTypes"/>

 <xsd:element name="latestNamer">
   <xsd:complexType>
     <xsd:attribute name="name" type="xsd:string" use="required"/>
   </xsd:complexType>
 </xsd:element>

 <xsd:element name="bestNamer">
   <xsd:complexType>
    <xsd:attribute name="name" type="xsd:string" use="required"/>
   </xsd:complexType>
 </xsd:element>

 <xsd:element name="filesSort">
  <xsd:complexType>
    <xsd:choice>
      <xsd:element name="lexigraphicByName">
        <xsd:complexType>
          <xsd:attribute name="increasing" type="xsd:boolean"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:choice>
  </xsd:complexType>
 </xsd:element>

</xsd:complexType>

<xsd:simpleType name="gribDatasetTypes">
 <xsd:union memberTypes="xsd:token">
   <xsd:simpleType>
    <xsd:restriction base="xsd:token">
     <xsd:enumeration value="Best"/>
     <xsd:enumeration value="Files"/>
     <xsd:enumeration value="LatestFile"/>
    </xsd:restriction>
   </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>

The key elements in the schema are described in the subsections that follow.

Note: If changes are made to gribConfig parameters other than datasetTypes, latestNamer, bestNamer and filesSort, the corresponding CDM *.ncx files must be deleted to force them to be recreated.

10.3.2. The gdsHash Element

The Grid Description Section (GDS) of a GRIB file contains a description of the horizontal coordinate system used for the data contained within the GRIB file. The details of the GDS section can be found at:

When a GribCollection is being created, groups are created for each different GDS used in the collection. This is not a problem if there are indeed distinct GDS definitions within the collection. If, however, all the GDS definitions are meant to be the same but one or more differ in a minor way - such as a difference in the fifth decimal place of a starting X or Y coordinate - the differences can be fixed via gdsHash.

The separate groups created are each identified via a hash code. The hash code(s) within the GRIB file can be identified via the use of the ToolsUI interface. The steps for this are:

  • enter the GRIB filename in the IOSP/GRIB1(2)/GribCollection tab

  • select the GDS at the bottom, right-click to obtain the context menu, and choose compare GDS

If the GDS parameters are practically identical but cause more than one hashcode to be generated due to trivial differences in the specification, you can merge the two groups via gdsHash.

If you find, say, the two hash codes 1450218978 and 1450192070 have been created for essentially the same GDS along with two different groups, you can fix the problem by merging the two groups via:

<gribConfig>
   <gdsHash from="1450218978" to="1450192070"/>
</gribConfig>

This will change the hashcode 1450218978 to 1450192070 and essentially merge the two groups that had been created.

You can also remove the spurious records altogether by setting the gdsHash value to zero, e.g.

<gribConfig>
   <gdsHash from="1450218978" to="0"/>
</gribConfig>

which will cause any records with hashcode 1450218978 to be ignored.

10.3.3. The gdsName Element

The gdsName element is used to set group names when a dataset contains multiple groups. The groups are named automatically using the projection and the horizontal dimension sizes, e.g. LatLon-360x720. These names can also be set manually. First, the group hashcode needs to be found via the ToolsUI procedure described in the gdsHash section. Then, the group names are set via the pattern of the following example:

<gribConfig>
  <gdsName hash='-1960629519' groupName='KTUA Arkansas-Red River RFC'/>
  <gdsName hash='-1819879011' groupName='KFWR West Gulf RFC'/>
  <gdsName hash='-1571856555' groupName='KORN Lower Mississippi RFC'/>
   ...
</gribConfig>

Note: The groupName will be used in URLs, so avoid using special characters such as ":".

Creating a Template

The template for modification can be obtained using ToolsUI via:

  • opening the collection in the IOSP/GRIB1(2)/GribCollection tab

  • clicking on the Generate GDS XML button on the top right

This will create a template that looks like:

<gribConfig>
  <gdsName hash='1201131096' groupName='Lambert conformal-129X185'/>
</gribConfig>

wherein you can simply replace the automatically generated groupName values with your own.

10.3.4. The pdsHash Element

The Product Definition Section (PDS) of a GRIB file contains a description of the variables contained therein. The information in the PDS is used to group GRIB records into CDM variables containing multidimensional arrays. This is done by creating a CDM hashcode of each record, and then combining all records with the same hashcode into a unique variable. The schema for pdsHash is:

  <xsd:element name="pdsHash" minOccurs="0" maxOccurs="unbounded">
   <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="useGenType" minOccurs="0" maxOccurs="1"/>
      <xsd:element name="useTableVersion" minOccurs="0" maxOccurs="1"/>
      <xsd:element name="mergeIntv" minOccurs="0" maxOccurs="1"/>
      <xsd:element name="useCenter" minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>

The subelements are described below.

The useGenType Element

This is used only for a featureType value of GRIB2. It controls whether the generating type (octet 12) is used in the CDM hashcode, with a default value of false. To set to true, use:

 <pdsHash>
   <useGenType>true</useGenType>
 </pdsHash>

The useTableVersion Element

This is used only for a featureType value of GRIB1. It controls whether the table version is used in the CDM hashcode, with a default value of true. To set to false use:

 <pdsHash>
   <useTableVersion>false</useTableVersion>
 </pdsHash>

The mergeIntv Element

This can be used for both GRIB1 and GRIB2 values of featureType. It controls whether time intervals are merged. If false, separate values are created for each time interval length. The default value is true, and to set this to false use:

 <pdsHash>
   <intvMerge>false</intvMerge>
 </pdsHash>

The useCenter Element

This is used only for a featureType value of GRIB1. It controls whether the center/subcenter is used in the CDM hashcode when the parameter number is greater than 127. The default is true, and to set it to false use:

 <pdsHash>
   <useCenter>false</useCenter>
 </pdsHash>

10.3.5. The intvFilter Element

While GRIB uses time intervals as coordinates, CF time interval coordinates use an auxiliary coordinate to describe the time intervals. For example, a coordinate named time1(30) will have an auxiliary coordinate time1_bounds(30,2) containing the upper and lower bounds of the time interval for each coordinate. The CDM places all intervals in the same variable rather than creating separate variables for each interval size. When all intervals have the same size, the interval size is added to the variable name; when they don't, the phrase mixed_intervals is added to the variable name.

In CDM the coordinate value is generally placed at the end of the interval, e.g. the time interval (0,6) will have a coordinate value of 6. The CDM looks for unique intervals when constructing the variable, which implies that the coordinate values are not always unique, although the coordinate bounds pairs are unique.

There are two special situations the user can fix in NcML or the TDS:

  • choosing to ignore (0,0) intervals; and

  • choosing that certain parameters use only selected intervals, e.g. if the parameter has redundant mixed intervals that can be derived from a set of intervals of fixed size; for instance, if the 3-hour intervals (0,3), (3,6), (6,9) and (9,12) are present, other intervals such as (0,6), (0,9) and (0,12) can be ignored.

The same process as used in the gdsHash section can be used to configure this. An example using NcML is:

  <gribConfig>
   <intvFilter excludeZero="true"/>
   <intvFilter intvLength="3">
     <variable id="0-1-8"/>
     <variable id="0-1-10"/>
   </intvFilter>
 </gribConfig>

wherein:

  • excludeZero is used to exclude intervals that have (0,0) bounds; and

  • intvLength is used to include only the 3 hour intervals for parameters 0-1-8 and 0-1-10, where the parameter is defined using discipline-category-number in GRIB-2 or subcenter-version-param in GRIB-1.

10.3.6. The timeUnitConvert Element

Not available as of June 2014.

10.3.7. The datasetTypes Element

This defines which datasets will be made available in the TDS catalog. The available values are:

  • Best - the "best timeseries" of the collection dataset

  • Files - each separate file is exposed as a separate dataset

  • LatestFile - add the latest resolver dataset to the files catalog (Files must also be selected for this to work)

10.3.8. The latestNamer Element

This changes the name of the latest file dataset in the collection as listed under the Files entry. The default name is Latest File. Both datasetTypes options LatestFile and Files must be enabled. The dataset urlPath is not affected and is always latest.xml. An example is:

<gribConfig datasetTypes="Best LatestFile Files">
  <latestNamer name="My Latest Name"/>
</gribConfig>

10.3.9. The bestNamer Element

This changes the name of the Best dataset in the collection, with the default being Best Timeseries. The datasetTypes option Best must be enabled for this to work. An example is:

<gribConfig datasetTypes="Best LatestFile Files">
  <bestNamer name="My Best Name" />
</gribConfig>

10.3.10. The filesSort Element

This sorts the files lexigraphically, in either decreasing or increasing (default) order. The datasetTypes option Files must be enabled for this to work. An example of changing the sort order to decreasing by setting increasing to false is:

<gribConfig datasetTypes="Best LatestFile Files">
  <filesSort>
    <lexigraphicByName increasing="false" />
  </filesSort>
</gribConfig>

11. Services

11.1. Overview

A TDS service is a method for exploring and obtaining the data available on a TDS via the web. Here we will explore how to configure and use the available TDS services.

Each of the available services will be fully investigated within its own subsection below.

11.2. Catalog Services

11.2.2. Overview

The most basic of services are those for THREDDS catalogs. The catalog service provides catalogs containing the information you’ll need to find and use the more complex services such as NetCDF subsetting, OPeNDAP, WCS and WMS. Local catalogs can be subsetted and viewed as HTML, while remote catalogs can be additionally validated.

11.2.3. Local Catalogs

All catalogs served by a TDS can be operated on with the various catalog services, whether the catalog is the served version of a TDS catalog configuration file, or a catalog produced from a datasetScan element. A specific catalog is identified by the path of the request URL, an example being:

http://motherlode.ucar.edu:8080/thredds/catalog.html

In this example, an HTML version of the catalog is returned. If you instead request,

http://motherlode.ucar.edu:8080/thredds/catalog.xml

you will get an XML version of the catalog. You can also request a subset of the catalog via a dataset parameter whose value is the ID number of a particular dataset within the catalog. If a dataset has ID number 8675309, then the URL will be:

http://motherlode.ucar.edu:8080/thredds/catalog.xml?dataset=8675309

11.2.4. Remote Catalogs

Configuration

Catalog services on remote catalogs are only allowed if the following XML code is added to the threddsConfig.xml configuration file:

<CatalogServices>
    <allowRemote>true</allowRemote>
</CatalogServices>
Remote Catalog Requests

The base URL from which remote catalog requests must be made is:

http://server:port/thredds/remoteCatalogService

with a full request string looking something like:

http://server:port/thredds/remoteCatalogService?catalog=http://motherlode.ucar.edu:8080/thredds/catalog.xml

As you can see in the full request example, parameters are added to the base URL via:

?parameter_name=parameter_value

and if more than one parameter is used - and this is allowable - the format is:

?par1=parval1&par2=parval2[&par3=parval3]

The ampersand & is used instead of the question mark ? for all parameters following the first one.

The available and allowable parameters are:

  • catalog - A required parameter giving the URI of the target catalog.

  • command - An optional parameter taking one of three values: SHOW, SUBSET or VALIDATE.

  • dataset - An optional parameter used only in command=SUBSET requests which gives the ID of a specific dataset in the target catalog.

  • htmlView - An optional parameter used only in command=SUBSET requests where a false value causes an XML view to be returned instead of HTML.

  • verbose - An optional parameter used only in command=VALIDATE requests where a value of true increases the detail of the validation messages returned.

The action performed for each of the command parameters is:

  • SHOW - An HTML view of the entire catalog is returned.

  • SUBSET - An ID supplied with the dataset parameter is used to find the dataset in the target catalog. An HTML view is returned unless htmlView=false is additionally specified.

  • VALIDATE - An HTML page containing THREDDS catalog validation messages is returned.
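
As a concrete illustration of combining several parameters, the short sketch below builds a SUBSET request that returns an XML view of a single dataset in a remote catalog. Python 3 is assumed; the server name and port are placeholders, and the dataset ID is the example ID used earlier.

#!/usr/bin/python

from urllib.parse import urlencode

base = 'http://server:port/thredds/remoteCatalogService'
params = {'catalog': 'http://motherlode.ucar.edu:8080/thredds/catalog.xml',
          'command': 'SUBSET',
          'dataset': '8675309',
          'htmlView': 'false'}

#  Prints the full request URL, which can be pasted into a browser or fetched
#  with any HTTP client. urlencode supplies the & separators and
#  percent-encodes the embedded catalog URL.
print(base + '?' + urlencode(params))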

11.3. Query Capability Service

A THREDDS-defined service type that returns an XML document over HTTP.

11.4. Resolver Service

A THREDDS-defined service type that returns an XML document over HTTP.

11.5. Compound Service

A service type that indicates that the service is composed of other services.

11.6. HTTPServer Service

The HTTPServer service provides bulk transfer of whole files over the HyperText Transfer Protocol (HTTP).

11.7. FTP Service

The File Transfer Protocol (FTP) service is a bulk transfer service.

11.8. GridFTP Service

The GridFTP service is a bulk transfer service. GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. The home site for GridFTP is:

11.9. ADDE Service

The Abstract Data Distribution Environment (ADDE) is a part of the McIDAS-X software package that allows your workstation to act as a client to efficiently access data from multiple McIDAS-X servers. McIDAS-X is a suite of applications for analyzing and displaying meteorological data for research and education. The home site of McIDAS is at:

11.10. File Service

The File service is used for local files, local catalogs, or in situations like a DODS Aggregation Server configuration.

11.11. NetCDF Server

This…

11.12. DODS/OPeNDAP Service

11.12.2. Overview

The Open-source Project for a Network Data Access Protocol (OPeNDAP) is a data transport architecture and protocol used by earth scientists. It is based on HTTP and includes standards for encapsulating structured data, annotating the data with attributes, and adding semantics that describe the data.

An OPeNDAP server can serve an arbitrarily large collection of data. The data is usually in NetCDF or HDF format, but can be in any format including a user-defined format. It transfers files much as FTP does, although it can additionally retrieve subsets of files and aggregate data from several files in one transfer operation. An OPeNDAP client - basically any program that links with the OPeNDAP library - contacts the server and allows the user to interactively choose subsets of, or aggregate, the data offered by the server.

The OPeNDAP server in THREDDS provides access to any CDM dataset.

11.12.3. Configuring the OPeNDAP Service

The OPeNDAP service is enabled by default in the standard THREDDS distribution. The Opendap element in the threddsConfig.xml file is used to configure the service. The defaults are:

<Opendap>
    <ascLimit>50</ascLimit>
    <binLimit>500</binLimit>
    <serverVersion>opendap/3.7</serverVersion>
</Opendap>

where:

  • ascLimit is the maximum allowable size of an ASCII data request in megabytes, with a default size of 50;

  • binLimit is the maximum allowable size of a binary data request in megabytes, with a default size of 500;

  • serverVersion is the string returned by the OPeNDAP getVersion request.

11.12.4. The OPeNDAP Dataset Access Form

The OPeNDAP service presents itself to the user in the guise of an OPeNDAP Dataset Access Form GUI that displays information about the dataset in which the user is interested. The form consists of several parts that allow you to select and obtain all or part of a dataset. The URL of the access form is created by appending .html to the TDS access URL of the dataset. For example, in the default configuration of the TDS distribution we find a clickable Test Single Dataset on the catalog page at:

http://localhost:8080/thredds/catalog.html

If we click on this, we get to the test dataset page at:

http://localhost:8080/thredds/catalog.html?dataset=testDataset

and if we click on the OPENDAP service choice we find there, i.e.:

/thredds/dodsC/test/testData.nc

we get to the OPeNDAP Dataset Access Form at:

http://localhost:8080/thredds/dodsC/test/testData.nc.html

and see that there are several sections therein, each of which we’ll now investigate.

11.12.5. The Action Section

The Action part comes first and consists of a set of buttons you can click to download your selected data in various formats. In the example we have the choices Get ASCII and Get Binary, which allow you to obtain, respectively, an ASCII version of your data and a DODS binary object containing your data. More recent versions of OPeNDAP also include a button for downloading your selected data in NetCDF format, although we’ve not yet used any version of the TDS that includes that option.

11.12.6. Selecting the ASCII File

If we immediately click on the Get ASCII button, a web page for the URL:

http://localhost:8080/thredds/dodsC/test/testData.nc.ascii?

will open to show us the entire file in ASCII format. For this example, it will look something like this:

Dataset {
    Float64 reftime[record = 1];
    Float64 valtime[record = 1];
    String datetime[record = 1];
    Float32 valtime_offset[record = 1];
    Int32 model_id[nmodels = 1];
    String nav_model[nav = 1];
    Int32 grid_type_code[nav = 1];
    String grid_type[nav = 1];
    String grid_name[nav = 1];
    Int32 grid_center[nav = 1];
    Int32 grid_number[nav = 1][ngrids = 1];
    String x_dim[nav = 1];
    String y_dim[nav = 1];
    Int32 Nx[nav = 1];
    Int32 Ny[nav = 1];
    Float32 La1[nav = 1];
    Float32 Lo1[nav = 1];
    Float32 Lov[nav = 1];
    Float32 Dx[nav = 1];
    Float32 Dy[nav = 1];
    Byte ProjFlag[nav = 1];
    Byte ResCompFlag[nav = 1];
    Float32 Z_sfc[record = 1][y = 95][x = 135];
} test%2ftestData%2enc;

reftime[1]
102840.0
...

This is not at all recommended, though, for larger datasets. Warning: If you click on Get ASCII for very large underlying datasets, you will at least be waiting a while for the web page to render, and might even get a bonus of having your web browser freeze or crash. If you must try this with larger datasets, then continue to the following sections to learn how to select much smaller subsets to download.

11.12.7. Selecting the Binary DataDDS File

The file you get by clicking on the Get Binary button is an OPeNDAP binary file - sometimes called the DataDDS file - with the filename created by appending .dods to the name of the original file. For example, if the original filename was ocean.nc, the binary filename will be ocean.nc.dods. The data returned in the DataDDS file is a MIME document that consists of two parts:

  • the DDS, or Dataset Descriptor Structure;

  • the actual data encoded in External Data Representation (XDR) binary format;

with the two parts separated by the string:

Data:<CR><NL>

The XDR format is a standard data serialization format for the description and encoding of data that allows data to be transferred between different kinds of computer systems, and is implemented as a library of software functions. XDR uses a language to describe data formats, although it is a language used solely for description and not for programming. The language resembles C, and allows the description of intricate data formats in a concise manner.

The XDR format is defined in Network Working Group RFC 4506 which can be found at:

Lengthier explanations with examples can be found in a Cisco Technical Note at:

as well as in the XDR section of the IBM AIX documentation at:

Useful software implementations of XDR include:

  • xdrlib - A Python module for encoding and decoding XDR streams

  • FXDR - A Fortran library containing XDR routines

  • PortableXDR - A portable C library containing XDR routines

11.12.8. How to Work With DataDDS Files

The only available documentation found thus far that explains anything about what to use to access these DataDDS files is an email discussion at:

wherein we learn:

You can point anything that reads netCDF and is using the newest version of that library (4.1) at those files and read them. Otherwise, you can use the getdap tool that’s bundled with libdap to read them, albeit as ASCII.

The DAP++ SDK or libdap can be obtained at:

and does contain a binary program getdap in addition to the library. Requesting the help information (from the binary obtained via compiling libdap-3.11.1) via getdap -h yields:

Usage: /home/baum/Downloads/libdap-3.11.2/.libs/lt-getdap
 [idDaxAVvks] [-B <db>][-c <expr>][-m <num>] <url> [<url> ...]
 [VvksM] <file> [<file> ...]

In the first form of the command, dereference the URL and
perform the requested operations. This includes routing
the returned information through the DAP processing
library (parsing the returned objects, et c.). If none
of a, d, or D are used with a URL, then the DAP library
routines are NOT used and the URLs contents are dumped
to standard output.

In the second form of the command, assume the files are
DataDDS objects (stored in files or read from pipes)
and process them as if -D were given. In this case the
information *must* contain valid MIME header in order
to be processed.

Options:
        i: For each URL, get the server version.
        d: For each URL, get the the DDS.
        a: For each URL, get the the DAS.
        D: For each URL, get the the DataDDS.
        x: For each URL, get the DDX object. Does not get data.
        X: Request a DataDDX from the server (the DAP4 data response
        B: Build a DDX in getdap using the DDS and DAS.
        v: Verbose output.
        V: Version of this client; see 'i' for server version.
        c: <expr> is a constraint expression. Used with -D/X.
           NB: You can use a `?' for the CE also.
        k: Keep temporary files created by libdap.
        m: Request the same URL <num> times.
        z: Ask the server to compress data.
        s: Print Sequences using numbered rows.
        M: Assume data read from a file has no MIME headers
           (the default is to assume the headers are present).
        p: Set DAP protocol to x.y

If you wish to use the getdap option, this help file is the only documentation found for it thus far. Good luck with that, as we’ve yet to try it.

The other and probably better option is to use an OPeNDAP-enabled NetCDF client library. This will work since NetCDF uses an extended form of XDR for representing information in the header and data parts of files, as explained in further detail at:

The OPeNDAP developers used to maintain an OPeNDAP-enabled NetCDF library separate from the main NetCDF library developed at UNIDATA, but the projects have been merged. Now all you have to do to make your NetCDF library able to access OPeNDAP servers is to compile the officially released NetCDF library, which is available from:

As of this writing, the official NetCDF library is at release 4.2, and the INSTALL file included in that distribution tells us that DAP is automatically enabled if libcurl is installed. NOTE: It is also important to have a sufficiently recent version of libcurl installed, as we once found out the hard way. The NetCDF 4.2 documentation tells us that it needs at least version 7.18.0 of libcurl, with the latest official release (Mar. 22, 2012) of libcurl being 7.25.0 and available at:

11.12.9. The Data URL Section

The Data URL part shows the URL path from which the original dataset can be obtained. An example URL from the test dataset included in the TDS distribution is:

http://localhost:8080/thredds/dodsC/test/testData.nc

For a quick look at what happens when you select a variable - a topic that will be fully covered a couple of sections down the page - try clicking on the box next to the first variable reftime. You’ll notice that the URL in the Data URL box has instantly changed to:

http://localhost:8080/thredds/dodsC/test/testData.nc?reftime[0:1:0]

This should give you a good idea of how the whole selection and subsetting thing works. Basically, the subsetting information is encoded in various parameters that are appended to the base dataset URL before the request is sent.
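
The same mechanism can be used programmatically rather than through the access form. The sketch below is a minimal example, assuming a local TDS serving the bundled test dataset and a netcdf4-python build whose underlying NetCDF library is OPeNDAP-enabled; it opens the dataset over OPeNDAP and requests subsets equivalent to the constraint expressions discussed here.

#!/usr/bin/python

import netCDF4

url = 'http://localhost:8080/thredds/dodsC/test/testData.nc'
nc = netCDF4.Dataset(url)

#  Equivalent to appending ?reftime[0:1:0] to the dataset URL.
print(nc.variables['reftime'][:])

#  Subset Z_sfc: first record, every 10th point in y and x, which is
#  equivalent to the constraint Z_sfc[0:1:0][0:10:94][0:10:134].
z_sfc = nc.variables['Z_sfc'][0,::10,::10]
print(z_sfc.shape)

nc.close()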

11.12.10. The Global Attributes Section

The Global Attributes part lists the global attributes contained in the dataset. Just about all scientific data standards - e.g. NetCDF, HDF, GRIB - have some method for specifying and storing global attributes within the binary file format. This section pulls them out of the dataset and displays them in human readable format. In our example we find the global attributes:

record: reftime, valtime
history: 2003-09-25 16:09:26 - created by gribtocdl 1.4 - 12.12.2002
title: CMC_reg_HGT_SFC_0_ps60km_2003092500_P000.grib
Conventions: NUWG
version: 0.0
Unlimited_Dimension: record

This section is more or less useful depending on what sort of information you’re looking for and how thorough the original data provider was in providing what you might need in the form of global attributes.

11.12.11. The Variables Section

The meat of the OPeNDAP Dataset Access Form is found in the variables section. In this section you specify which variables you wish to download and - if you don’t want every available value of any or all of the variables - which subset of them you desire.

Skip down to the Z_sfc variable in our example and notice the three main sections:

  • a checkbox followed by a data description;

  • a set of text input boxes; and

  • a list of variable attributes.

The data description for this variable is:

Z_sfc: Array of 32 bit Reals [record = 0..0][y = 0..94][x = 0..134]

which tells us the variable name, its type (real, integer, string, etc.), the names of the variable’s dimensions, and the index range of each dimension. Note that the dimensions are given in C-like syntax where arrays start at zero rather than Fortran-like syntax where arrays start at one.

If you now click on the checkbox such that a check mark appears within the box, you’ll see that the three boxes marked record, y and x have changed from being empty to containing the strings 0:1:0, 0:1:94 and 0:1:134, respectively. The general format for these is:

begin:[stride:]end

where begin is the array index at which you wish to start your subset, stride is an optional value for selecting every nth value, and end is the array index at which you wish to end your subset. The stride is shown in these examples with its default value of one, meaning no values are skipped.

You can edit any of these values by clicking your cursor within the appropriate box and using the backspace and numerical keys to, respectively, delete and add numbers. Unlike when you click on the checkbox, the Data URL section does not immediately reflect your editing changes in the input boxes. These changes, however, show up when you click on one of the Get buttons in the Action section.

11.12.12. The DDS Section

The Data Descriptor Structure (DDS) section provides a description of the shape of the data. That is, the sizes of the dimensions of the array containing the data. In our example dataset, the DDS section looks like this:

Dataset {
    Float64 reftime[record = 1];
    Float64 valtime[record = 1];
    String datetime[record = 1];
    Float32 valtime_offset[record = 1];
    Int32 model_id[nmodels = 1];
    String nav_model[nav = 1];
    Int32 grid_type_code[nav = 1];
    String grid_type[nav = 1];
    String grid_name[nav = 1];
    Int32 grid_center[nav = 1];
    Int32 grid_number[nav = 1][ngrids = 1];
    String x_dim[nav = 1];
    String y_dim[nav = 1];
    Int32 Nx[nav = 1];
    Int32 Ny[nav = 1];
    Float32 La1[nav = 1];
    Float32 Lo1[nav = 1];
    Float32 Lov[nav = 1];
    Float32 Dx[nav = 1];
    Float32 Dy[nav = 1];
    Byte ProjFlag[nav = 1];
    Byte ResCompFlag[nav = 1];
    Float32 Z_sfc[record = 1][y = 95][x = 135];
} test/testData.nc;

This differs slightly from the information found by clicking on the checkboxes in that we are shown the total number of records for each variable rather than C-like syntax array information.

11.13. NetCDF Subset Service

11.13.2. Overview

The NetCDF Subset Service is a web service for subsetting CDM scientific datasets. The subsetting is specified using Earth coordinates, i.e. latitude and longitude, and date ranges. The spatial and temporal array index ranges inside of the specified latitude, longitude and date ranges will be selected and made available to the user. The selected data arrays are subsetted but not resampled or reprojected, and preserve the resolution and accuracy of the original dataset.

This service can return requested subset files in NetCDF binary, XML or ASCII format, depending on the request and dataset types. Both point and gridded data can be used with this service.

11.13.3. Subsetting Parameters

The instructions for selecting subsets are delivered in the form of subset parameter strings appended to the requesting URL. The subsetting parameters for this service include those for specifying variables, spatial extent, time, return format, vertical coordinate, and adding lat/lon arrays to the returned file. The use of each will now be described.

11.13.4. Variables

Variables are specified via:

  • var=[list of comma-separated variable names]

with examples using the variables T, S and P being:

  • var=T

  • var=T,S,P

The following format is also acceptable, although the first is preferred.

  • var=T&var=S&var=P

11.13.5. Spatial Extent

Spatial extent can be specified in several ways. The first is to specify a latitude/longitude bounding box via:

  • north=[latitude north in decimal degrees]

  • south=[latitude south in decimal degrees]

  • east=[longitude east in decimal degrees]

  • west=[longitude west in decimal degrees]

The bounding box has west as its west edge and includes all points going east until the east edge is reached. The units must be degrees east, may be positive or negative, and will be taken modulo 360. For example, when crossing the dateline the west edge may be greater than the east edge. An example is:

  • north=17.3&south=12.088&west=140.2&east=160.0

Alternatively, a single latitude/longitude point can be specified via:

  • latitude=[latitude of point in decimal degrees north]

  • longitude=[longitude of point in decimal degrees east]

The requested point must of course reside within the spatial range of the dataset. For station data, the station closest to the requested point will be used, and for grids, the grid cell containing the requested point will be used.

A projection bounding box can be defined using the four parameters minx, miny, maxx, maxy. Each of the minimum values must be less than its corresponding maximum value. These parameters are coordinates on the projection of the data, and also define the rectangle that will be returned.

11.13.6. Station Datasets

For station datasets, a list of stations can be specified via:

  • stn=[list of comma-separated station names]

The dataset description supplies a list of valid stations. Station names with spaces or other illegal characters must be escaped. Examples for stations called AB, AC and BD are:

  • stn=AB

  • stn=AB,AC,BD

The following format is also acceptable although the previous format is preferred:

  • stn=AB&stn=AC&stn=BD

For grid data, a horizontal stride can be specified via an integer N instructing the server to supply only every Nth point in both the x and y dimensions. An example is:

  • horizStride=3

where every 3rd point in the x and y directions will be extracted.

Time

The time can be specified as either a range or a point. The range is specified by using 2 of the following 3 parameters:

  • time_start=[starting time as a W3C date string or present]

  • time_end=[ending time as a W3C date string or present]

  • time_duration=[length of time as a W3C time duration]

The intersection of the requested time range with the dataset range will be returned. Examples are:

  • time_start=2007-03-29T12:00:00Z&time_end=2007-03-29T13:00:00Z (between 12 and 1 pm Greenwich time)

  • time_start=present&time_duration=P3D (get 3 day forecast starting from the present)

  • time_end=present&time_duration=PT3H (get last 3 hours)

A W3C date string can be a dateTime or a date, with the dateTime having the form:

'-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?

where:

  • '-'? yyyy is a four (or more) digit and optionally negative integer representing the year, with leading zeros, the string 0000, and a leading plus sign all prohibited;

  • the other instances of '-' are separators between parts of the date portion;

  • the first mm is a two digit numeral representing the month;

  • the dd is a two digit numeral representing the day;

  • 'T' is a separator indicating the time-of-day follows;

  • hh is a two digit numeral representing the hour, with 24 permitted if the minutes and seconds are both zero, and the dateTime value so represented is the first instant of the following day;

  • ':' is a separator between parts of the time-of-day portion;

  • the second mm is a two digit numeral representing the minute;

  • ss is a two digit integer representing whole seconds;

  • '.' s+ (optionally) represents the fractional seconds;

  • zzzzzz (optionally) represents the time zone, for which the default is UTC.

An example of a time zone specification for noon on Oct. 10, 2002 - for either Central Daylight Savings Time or Eastern Standard Time - is:

2002-10-10T12:00:00-05:00

which is equivalent to a UTC time of:

2002-10-10T17:00:00Z

A date is a dateTime without the time-of-day portion. Additional information about these formats can be found in the appropriate XML document portions:

A W3C time duration is represented in the format:

PnYnMnDTnHnMnS

where:

  • nY represents the number of years;

  • nM the number of months;

  • nD the number of days;

  • T is the date/time separator;

  • nH the number of hours;

  • nM the number of minutes;

  • nS the number of seconds, which can include decimal digits to arbitrary precision.

All components except seconds allow arbitrary unsigned integers, while seconds allow an arbitrary unsigned decimal. At least one digit must follow a decimal point if it appears. An optional preceding minus sign - is allowed to indicate negative duration, with no sign indicating a default positive duration.

An example that indicates a duration of 1 year, 2 months, 3 days, 10 hours, and 30 minutes is:

P1Y2M3DT10H30M

An example indicating a duration of minus 120 days is:

-P120D

Truncated and reduced precision versions of the duration are allowed but must follow a few rules:

  • If the number of any of the quantities equals zero, it and its corresponding designator may be omitted, but at least one number and designator must be present.

  • The seconds part may have a decimal fraction.

  • The designator T can be absent if and only if all the time items are absent.

  • The designator P must always be present.

Valid examples of applying these rules are:

  • P1347Y

  • P1347M

  • P1Y2MT2H

  • P0Y1347M

  • P0Y1347M0D

  • -P1347M

while a couple of invalid examples are:

  • P-1347M

  • P1Y2MT

The full XML schema documentation for duration can be found at:

11.13.7. Return Format

The return format(s) you desire are specified using the accept parameter in the form:

accept=mime_type[,mime-type][,mime-type]

The list of possible return formats depends on the dataset and can be found in the Dataset Description Document. If you request multiple formats, the server will choose one to return to you. Queries can use either the mime-types or various aliases for them as shown in the following table which lists all currently supported return formats.

Mime Type              Aliases
text/plain             raw, ascii
application/xml        xml
text/csv               csv
text/html              html
application/x-netcdf   netcdf

Examples of valid queries would then be:

  • accept=application/x-netcdf - requests a NetCDF file using a mime-type

  • accept=netcdf - requests a NetCDF file using an alias

  • accept=ascii,csv - requests either an ASCII or CSV document
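
Putting several of these parameters together, the sketch below shows one way to assemble and fetch a complete request. Python 3 is assumed; the server name, dataset path and variable name are placeholders, and the NetCDF Subset Service endpoint for a dataset should be taken from its TDS catalog page.

#!/usr/bin/python

from urllib.parse import urlencode
from urllib.request import urlretrieve

#  Hypothetical NetCDF Subset Service endpoint for a gridded dataset.
base = 'http://localhost:8080/thredds/ncss/grid/mymodel/myrun.nc'
params = {'var': 'T',
          'north': '17.3', 'south': '12.088',
          'west': '140.2', 'east': '160.0',
          'time_start': '2007-03-29T12:00:00Z',
          'time_duration': 'PT3H',
          'accept': 'netcdf'}

#  Build the request URL and save the returned NetCDF subset locally.
url = base + '?' + urlencode(params)
urlretrieve(url, 'subset.nc')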

11.13.8. Vertical Coordinate

A vertical coordinate can be specified via the vertCoord parameter, with an example being:

  • vertCoord=850

where the value given must be in the same units as those in the dataset. The server will return the level closest to the chosen value if one does not coincide with the value.

11.13.9. Adding Lon/Lat Arrays

If the underlying grid is a lat/lon grid, the lat and lon coordinates will be automatically included as 1D coordinate variables in the returned file. If the grid is on a projection, the lat/lon information will not be included unless the parameter addLatLon is included. If so, then the lat and lon coordinates will be calculated and included in the file as 2D variables. This is done by converting the four corners of the lat/lon bounding box also specified into projection coordinates, and then using the smallest rectangle that includes those four points.

11.13.10. Summary of Subsetting Parameters

  • var (required) - Comma-separated list of variables. The variables must be in the dataset description, and only requests on variables with the same vertical levels are supported.

  • latitude, longitude (optional) - In grid-as-point requests, the latitude and longitude of the point in decimal degrees. Both must be given together and must be within the data range.

  • north, south, east, west (optional) - Define a lat/lon bounding box; all four must be given together, with north > south and east > west. The entire grid is returned if no box is specified.

  • minx, maxx, miny, maxy (optional) - Define a projection bounding box; all four must be given together, with minx < maxx and miny < maxy. The entire grid is returned if no box is specified.

  • time (optional) - A single time specified as a W3C date or present, which must be within the dataset time range. The time slice closest to the requested time is returned, and the time closest to the current time is returned if none is specified.

  • time_start, time_end, time_duration (optional) - Specify a time range: the start and end as W3C dates or present, and the duration as a W3C time duration. Two of the three must be present, and the range must intersect the dataset time range. The time closest to the current time is returned if no range is specified.

  • temporal (optional) - Must have the value all to have an effect; requests the entire available time range.

  • timeStride (optional) - Grid requests only; take every nth time in the available series. The default is 1.

  • horizStride (optional) - Grid requests only; take every nth point in both the x and y dimensions. The default is 1.

  • vertCoord (optional) - The vertical coordinate value at which to subset, in the units of the dataset’s vertical levels; the requested variables must have the same vertical levels. All levels are returned if none is specified.

  • accept (optional) - Specifies the return format. Accepted values are netcdf for grid requests, and csv, xml and netcdf for grid-as-point requests, with defaults of netcdf and csv, respectively.

11.13.11. THREDDS GUI Interface

All that complicated stuff having been carefully explained, the standard THREDDS distribution contains a GUI interface that hides all the details and allows you to interactively choose the spatial and temporal subsets you desire. You fill in the blanks and check the appropriate boxes and it creates and transmits the appropriate URL string. An example of this interface is shown in the following figure.

thredds/netcdf_subset_service.png
Figure 4. THREDDS NetCDF Subset Service GUI Interface

11.14. LAS Service

The LAS service is for connection to Live Access Servers (LAS). The LAS is a highly configurable web server designed to provide flexible access to geo-referenced scientific data. It can present distributed data sets as a unified virtual data base through the use of DODS networking. The home site is at:

11.15. WSDL Service

This…

11.16. OGC Services - WMS and WCS

11.16.1. Synopsis

In this section about the WMS and WCS services we will investigate:

  • the context of the OGC WMS and WCS specifications within the TDS and in the larger context of the OGC project;

  • the basic components of each specification;

  • the basic configuration for enabling the WCS and WMS servers in the TDS;

  • the features and limitations of each server in the context of the official specifications;

  • advanced configuration options for each server; and

  • how to add WCS and WMS services to THREDDS catalogs.

11.16.2. Overview

The Open Geospatial Consortium (OGC) is an international consortium of hundreds of entities that have combined their efforts to develop publicly available interface standards to support interoperable methods for the specification and transport of geospatial data over the web.

One high priority has been to create ways for the literal and figurative mountains and oceans of data created by both the Geographic Information System (GIS) and geoscience communities to be more easily located, accessed and obtained by each other. For example, most of the datasets created by the geoscience community do not have the geolocation information used by the GIS community to precisely locate their data on the surface of the earth. Another example is that while most geoscience data exists in a form wherein various properties of the continuous surface of land or water are represented by a grid of discrete and spatially-separated points, most GIS data is in the form of images wherein those properties are represented by a much more spatially dense atomic entity called a pixel. Also, while the former usually contains explicit quantifications of the various properties, the latter usually represents the properties implicitly via color encoding.

OGC Web Services (OWS) is a project to enable web interoperability of geospatial content and services by defining specifications for various geospatial concepts built on top of - and therefore also becoming - open and non-proprietary Internet standards. A web service is thus defined as a self-contained, self-describing modular application that can be published, located and invoked across the web. These services can range from simple requests for single numbers to complicated, multi-step processes.

Two of the OWS specifications that are of great importance to the interoperability of geospatial data are the Web Mapping Service (WMS) and the Web Coverage Service (WCS). As such, both are implemented in server form within the THREDDS Data Server. We shall now find out what these are and how they are configured and used within THREDDS.

11.16.3. Web Mapping Service (WMS)

The OGC Web Mapping Service (WMS) is a specification that defines the behavior of a service that dynamically produces spatially referenced maps from files containing geographic information. It specifies operations for retrieving information about what kinds of maps are available from a server, and further operations for retrieving the maps themselves. In the context of WMS, a map is defined as a portrayal of geographic information as a digital image file suitable for display on a computer screen. That is, a map is an image such as a PNG, GIF or JPEG image. As such, WMS is for finding and retrieving images rather than the data used to create those images.

The WMS specification supports three operations:

  • GetCapabilities - A command to retrieve service-level metadata, that is, a description of the service’s information content and acceptable request parameters.

  • GetMap - A command to retrieve a map image whose geospatial and dimensional properties are well-defined by parameters within the command.

  • GetFeatureInfo - A command to obtain information about specific features shown on a map.
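
As an illustration of how these operations are invoked, the sketch below assembles a GetMap request against a TDS WMS endpoint. Python 3 is assumed; the endpoint path, the layer name Z_sfc, the CRS and the empty styles value are placeholders that assume the bundled test dataset is WMS-enabled and that layer names match variable names.

#!/usr/bin/python

from urllib.parse import urlencode
from urllib.request import urlretrieve

base = 'http://localhost:8080/thredds/wms/test/testData.nc'
params = {'service': 'WMS', 'version': '1.3.0', 'request': 'GetMap',
          'layers': 'Z_sfc',
          'styles': '',                 # server default style
          'crs': 'CRS:84',              # lon/lat axis order
          'bbox': '-180,-90,180,90',
          'width': '512', 'height': '256',
          'format': 'image/png'}

#  Fetch the rendered map image and save it locally.
urlretrieve(base + '?' + urlencode(params), 'Z_sfc.png')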

11.16.4. Web Coverage Service (WCS)

The OGC Web Coverage Service (WCS) is a specification defining a standard interface and operations for access to geospatial coverages, which are defined as digital, geospatial information representing space and time varying phenomena, or basically the numbers from which images are made. The WCS provides the actual data along with detailed metadata describing it.

The WCS specification supports three operations:

  • GetCapabilities - A command to request information about the server’s capabilities and the coverages provided.

  • DescribeCoverage - A command for a client to request detailed metadata about a specific coverage offered by the server.

  • GetCoverage - A command allowing a client to request a coverage composed of selected properties at a selected set of spatio-temporal coordinates, and to retrieve it in a useful format.
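
By analogy with the WMS example above, the sketch below builds the two metadata requests against a TDS WCS endpoint. Python 3 is assumed; the endpoint path and coverage name are placeholders, and WCS version 1.0.0 is assumed. A GetCoverage request adds the spatial, temporal and format parameters described in the WCS capabilities subsection below.

#!/usr/bin/python

from urllib.parse import urlencode

base = 'http://localhost:8080/thredds/wcs/test/testData.nc'

#  Ask the server what it can do ...
print(base + '?' + urlencode({'service': 'WCS', 'version': '1.0.0',
                              'request': 'GetCapabilities'}))

#  ... and then for detailed metadata about one particular coverage.
print(base + '?' + urlencode({'service': 'WCS', 'version': '1.0.0',
                              'request': 'DescribeCoverage',
                              'coverage': 'Z_sfc'}))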

11.16.5. Web Feature Service (WFS)

This…

11.16.6. Enabling WCS and WMS on TDS

The WCS server is disabled in the standard THREDDS distribution. It can be enabled by adding the following lines to the threddsConfig.xml file.

<WCS>
  <allow>true</allow>
</WCS>

Similarly, the WMS server can be enabled by adding:

<WMS>
  <allow>true</allow>
</WMS>

11.16.7. Additional WCS Configuration Options

Beyond basic enabling, there are other configuration options for the TDS WCS server. All available options are shown in the following example along with their defaults:

<WCS>
  <allow>false</allow>
  <dir>(see note below)</dir>
  <scour>15 min</scour>
  <maxAge>30 min</maxAge>
</WCS>

It is recommended that you only include in the threddsConfig.xml file those options you wish to change, omitting all others and letting them assume their default values. The additional options are:

  • dir - Sets the working directory where generated files are cached before being sent to the client. The default location is /opt/tomcat/content/thredds/cache/wcs/ and it is recommended that this not be changed.

  • scour - The period at which to scour and purge the cache directory of files not successfully downloaded.

  • maxAge - How long to leave successfully downloaded files in the cache directory before they are deleted.

11.16.8. WCS Server Capabilities and Limitations

The type of files that can currently be served via the TDS WCS server are:

  • Data files with gridded data.

  • Data files wherein the NetCDF-Java CDM can identify the coordinate system.

  • Data files with regularly spaced X and Y axes.

The TDS server cannot serve files that do not satisfy all of these.

The TDS WCS server does not implement the complete OGC WCS specification. The limitations are listed below, and can really only be fully appreciated after becoming fairly familiar with the full OGC WCS specification.

  • No interpolation is available, i.e. only interpolationMethod="none" is accepted.

  • All coordinate reference systems or spatial reference systems (CRS/SRS) are listed as WGS84(DD) even if the data have a different CRS/SRS.

  • The CRS can be horizontal and X/Y only.

  • The response coverage is in the native CRS of the data.

  • The NetCDF-Java library can only understand a subset of the CF convention grid mappings, most of which assume a spherical Earth.

  • Only one value can be specified as a temporal coverage, i.e. no listings or min/max/res specifications.

  • Each coverage has only a single range field.

  • The range axis is vertical only if the coordinate has a vertical component.

  • Only one value can be specified for the range axis, i.e. no listings or min/max/res specifications.

  • Supported GetCoverage response formats are NetCDF3, GeoTIFF (a grayscale 8-bit GeoTIFF file) and GeoTIFF_Float (a floating point "data sample" GeoTIFF file).

11.16.9. Additional WMS Configuration Options

Beyond basic enabling, there are other configuration options for the TDS WMS server. All available options are shown in the following example along with their defaults:

<WMS>
  <allow>false</allow>
  <allowRemote>false</allowRemote>
  <paletteLocationDir>/WEB-INF/palettes</paletteLocationDir>
  <maxImageWidth>2048</maxImageWidth>
  <maxImageHeight>2048</maxImageHeight>
</WMS>

It is recommended that only those options you wish to change be included in the configuration file. The additional options are:

  • allowRemote - Enables the WMS service to serve datasets available from remote servers.

  • paletteLocationDir - Although TDS has a set of default palettes for creating map images, they can be overridden by specifying this as the location of a directory containing your own palette files.

  • maxImageWidth - The maximum image width in pixels that will be returned.

  • maxImageHeight - The maximum image height in pixels that will be returned.

11.16.10. Serving Remote Datasets via WMS

If allowRemote is set to true, then the TDS can serve remote datasets via the WMS protocol. A typical request to your own WMS server looks like this (in the case of a GetCapabilities request):

http://servername:8080/thredds/wms?service=WMS&version=1.3.0&request=GetCapabilities

The allowRemote configuration parameter allows you to serve a remote file via the form:

http://servername:8080/thredds/wms?dataset=dataURL

So, for example, if the THREDDS OPeNDAP URL request for a remote server looks like:

http://las.pfeg.noaa.gov/cgi-bin/nph-dods/data/oceanwatch/nrt/gac/AG14day.nc

then you can serve this remotely as a WMS dataset with the following URL:

http://servername:8080/thredds/wms?dataset=http://las.pfeg.noaa.gov/cgi-bin/nph-dods/data/oceanwatch/nrt/gac/AG14day.nc
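
Note that the usual WMS query parameters are simply appended to this form; for instance, a GetCapabilities request for the remote dataset above would presumably look something like the following (this exact form has not been verified against a live server):

http://servername:8080/thredds/wms?dataset=http://las.pfeg.noaa.gov/cgi-bin/nph-dods/data/oceanwatch/nrt/gac/AG14day.nc&service=WMS&version=1.3.0&request=GetCapabilities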

11.16.11. WMS Server Capabilities and Limitations

The types of files that can currently be served via the TDS WMS server are:

  • Data files with gridded data.

  • Data files wherein the NetCDF-Java CDM can identify the coordinate system.

11.16.12. Adding WCS and WMS Services to THREDDS Catalogs

WCS and WMS are services just like OPeNDAP and can be added to catalogs in exactly the same manner. A compound service example that includes all three is:

<service name="grid" serviceType="Compound" base="" >
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="wcs" serviceType="WCS" base="/thredds/wcs/" />
    <service name="wms" serviceType="WMS" base="/thredds/wms/" />
</service>

These services could be accessed by a typical dataset via:

<dataset ID="sample" name="Sample Data" urlPath="sample.nc">
  <serviceName>grid</serviceName>
</dataset>

or, if you wanted to restrict a dataset to just WCS:

<dataset ID="sample" name="Sample Data" urlPath="sample.nc">
  <serviceName>wcs</serviceName>
</dataset>

11.16.13. Obtaining Data With WCS and WMS

Unlike with the NetCDF Subset Service, the standard THREDDS distribution does not include a GUI client to automatically construct the byzantine URL suffix needed by the server program to extract the appropriate data. If you click on the WCS access link on the page for a real or aggregated dataset, you will get something that looks like the following:

<WCS_Capabilities version="1.0.0">

  <Service>
    <fees>NONE</fees>
    <accessConstraints>NONE</accessConstraints>
  </Service>

  <Capability>
    <Request>
      <GetCapabilities>
        <DCPType>
          <HTTP>
            <Get>
              <OnlineResource xlink:href="http://megara.tamu.edu:8080/thredds/wcs/gcoos/seadas_sst"/>
            </Get>
          </HTTP>
        </DCPType>
      </GetCapabilities>
      <DescribeCoverage>
        <DCPType>
          <HTTP>
            <Get>
              <OnlineResource xlink:href="http://megara.tamu.edu:8080/thredds/wcs/gcoos/seadas_sst"/>
            </Get>
          </HTTP>
        </DCPType>
      </DescribeCoverage>
      <GetCoverage>
        <DCPType>
          <HTTP>
            <Get>
              <OnlineResource xlink:href="http://megara.tamu.edu:8080/thredds/wcs/gcoos/seadas_sst"/>
            </Get>
          </HTTP>
        </DCPType>
      </GetCoverage>
    </Request>
    <Exception>
      <Format>application/vnd.ogc.se_xml</Format>
    </Exception>
  </Capability>

  <ContentMetadata>
    <CoverageOfferingBrief>
      <description>
        l2_flags     1   false Level-2   Processing Flags
      </description>
      <name>l2_flags</name>
      <label>Level-2 Processing Flags</label>
      <lonLatEnvelope srsName="urn:ogc:def:crs:OGC:1.3:CRS84">
        <gml:pos>-98.0 18.01039</gml:pos>
        <gml:pos>-79.01098999999999 31.0</gml:pos>
        <gml:timePosition>2009-10-12T02:59:47Z</gml:timePosition>
        <gml:timePosition>2012-01-29T20:02:58Z</gml:timePosition>
      </lonLatEnvelope>
    </CoverageOfferingBrief>
    <CoverageOfferingBrief>
      <description>
        cloud_mask    1     false
      </description>
      <name>cloud_mask</name>
      <label/>
      <lonLatEnvelope srsName="urn:ogc:def:crs:OGC:1.3:CRS84">
        <gml:pos>-98.0 18.01039</gml:pos>
        <gml:pos>-79.01098999999999 31.0</gml:pos>
        <gml:timePosition>2009-10-12T02:59:47Z</gml:timePosition>
        <gml:timePosition>2012-01-29T20:02:58Z</gml:timePosition>
      </lonLatEnvelope>
    </CoverageOfferingBrief>
    <CoverageOfferingBrief>
      <description>
        seadas_sst   Celsius  false Sea Surface  Temperature
      </description>
      <name>seadas_sst</name>
      <label>Sea Surface Temperature</label>
      <lonLatEnvelope srsName="urn:ogc:def:crs:OGC:1.3:CRS84">
        <gml:pos>-98.0 18.01039</gml:pos>
        <gml:pos>-79.01098999999999 31.0</gml:pos>
        <gml:timePosition>2009-10-12T02:59:47Z</gml:timePosition>
        <gml:timePosition>2012-01-29T20:02:58Z</gml:timePosition>
      </lonLatEnvelope>
    </CoverageOfferingBrief>
  </ContentMetadata>
</WCS_Capabilities>
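
To give a sense of what constructing one of these URLs by hand involves, a GetCoverage request for the seadas_sst coverage advertised in the capabilities document above might take roughly the following form (the parameter values are illustrative, and a real request may also need to specify the CRS and an output grid size or resolution, depending on the server):

http://megara.tamu.edu:8080/thredds/wcs/gcoos/seadas_sst?service=WCS&version=1.0.0&request=GetCoverage&coverage=seadas_sst&bbox=-98.0,18.01,-79.01,31.0&time=2012-01-29T20:02:58Z&format=NetCDF3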

If you don’t want to spend a lot of time trying to construct WCS or WMS URL strings by hand, you need to find or build a client.

11.16.14. WCS and WMS Libraries with OWSLib

Probably the best open source solution for creating a WCS or WMS client is OWSLib, a Python package for client programming with OGC web service standards. It presently supports WMS version 1.1.1 and WCS version 1.1.0. It can be found at:

OWSLib requires a couple of other Python packages as prerequisites, but it and those packages are easy enough to install if you’ve got any familiarity with Python. The documentation is a bit sketchy but good enough for building a basic client. An example of such a client can be found at:
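
As a purely illustrative sketch (the server URL, layer name, bounding box and output filename below are hypothetical placeholders rather than tested values), a minimal OWSLib-based WMS client that lists the available layers and retrieves a single map image might look something like this:

# A minimal WMS client sketch using OWSLib (hypothetical server and layer).
from owslib.wms import WebMapService

# Connect to a TDS WMS endpoint and read its capabilities document.
wms = WebMapService('http://servername:8080/thredds/wms/mydata/sst.nc',
                    version='1.1.1')

# List the layers the server advertises.
for name, layer in wms.contents.items():
    print(name, '-', layer.title)

# Request a map image for one layer and save it to a local file.
img = wms.getmap(layers=['sst'],
                 styles=[''],
                 srs='EPSG:4326',
                 bbox=(-98.0, 18.0, -79.0, 31.0),
                 size=(512, 512),
                 format='image/png',
                 transparent=True)
with open('sst_map.png', 'wb') as f:
    f.write(img.read())

A WCS client is built in an analogous fashion using the WebCoverageService class from the owslib.wcs module.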

11.17. WebForm Service

This…

11.18. ncISO Service

11.18.2. Overview

A set of services called ncISO is available for serving metadata. The ncISO tool underlying the services traverses a THREDDS catalog, reads the dataset documentation, and translates that documentation into different views - e.g. the NcML, ISO 19115, and UDDC formats described below - using Extensible Stylesheet Language Transformations (XSLT). The three available services are:

  • NCML - Creates an NcML representation of the dataset’s structure and metadata;

  • ISO - Creates an ISO 19115 metadata representation of the dataset; and

  • UDDC - Performs an evaluation of how well the metadata contained within the dataset conforms to the NetCDF Attribute Convention for Data Discovery (NACDD).

The ISO 19115 standard defines the schema required for describing geographic information and services. It provides information about the identification, extent, quality, spatial and temporal schema, spatial reference, and distribution of digital geographic data.

The NACDD describes the NetCDF attributes recommended for describing a NetCDF dataset to discovery systems such as digital libraries. A detailed description of these attributes can be found at:

11.18.3. Configuring the Services

In TDS version 4.2.8 and later, the ncISO services are disabled by default but can be enabled for locally served datasets by including the following XML code in the threddsConfig.xml file.

<NCISO>
  <ncmlAllow>true</ncmlAllow>
  <uddcAllow>true</uddcAllow>
  <isoAllow>true</isoAllow>
</NCISO>

If you are using a pre-4.2.8 version of TDS, please update it to version 4.2.8 or greater. If you have a reason other than sheer perversity to not do this, then the enabling procedure is a bit more difficult and explained at:

11.18.4. Using the Services

Once the ncISO services have been enabled, datasets can be configured to use them in the same way as the other services. The required service elements and base attribute values are:

<service name="ncml" serviceType="NCML" base="/thredds/ncml/"/>
<service name="uddc" serviceType="UDDC" base="/thredds/uddc/"/>
<service name="iso" serviceType="ISO" base="/thredds/iso/"/>

11.18.5. Sample NCML Service Response

For the NetCDF dataset mch_wind_frc.nc located at:

and which has the NetCDF header information (as extracted with ncdump -h):

netcdf mch_wind_frc {
dimensions:
        x_rho = 128 ;
        eta_rho = 64 ;
        s_rho = 30 ;
        wind_time = 1089443 ;
variables:
        double wind_time(wind_time) ;
                wind_time:units = "days since 1970-01-01" ;
        double Uwind(wind_time) ;
                Uwind:units = "N-S wind speed [m/s]" ;
        double Vwind(wind_time) ;
                Vwind:units = "E-W wind speed [m/s]" ;

// global attributes:
                :about = "wind from noaa burl1, 42040 and 42007" ;
                :author = "Rob Hetland, Martinho MA, TAMU" ;
}

the response to the NCML service request at:

will yield the following NcML representation of the file:

<netcdf location="http://megara.tamu.edu:8080/thredds/dodsC/mch_inputs/mch_wind_frc.nc">
  <attribute name="about" value="wind from noaa burl1, 42040 and 42007"/>
  <attribute name="author" value="Rob Hetland, Martinho MA, TAMU"/>
  <attribute name="time_coverage_resolution" value="0.010043129072991406 days"/>
  <attribute name="time_coverage_units" value="days since 1970-01-01"/>
  <attribute name="time_coverage_end" value="2010-12-31T23:00:00Z"/>
  <attribute name="time_coverage_start" value="1981-01-16T12:59:59Z"/>
  <attribute name="thredds_opendap_service" value="http://megara.tamu.edu:8080/thredds/dodsC/mch_inputs/mch_wind_frc.nc"/>
  <dimension name="x_rho" length="128"/>
  <dimension name="eta_rho" length="64"/>
  <dimension name="s_rho" length="30"/>
  <dimension name="wind_time" length="1089443"/>
  <variable name="Uwind" shape="wind_time" type="double">
    <attribute name="units" value="N-S wind speed [m/s]"/>
  </variable>
  <variable name="Vwind" shape="wind_time" type="double">
    <attribute name="units" value="E-W wind speed [m/s]"/>
  </variable>
  <variable name="wind_time" shape="wind_time" type="double">
    <attribute name="units" value="days since 1970-01-01"/>
    <attribute name="_CoordinateAxisType" value="Time"/>
  </variable>
</netcdf>
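
Responses like this are straightforward to work with programmatically. The following sketch (the URL is a placeholder for whichever NCML service URL applies to your own dataset) fetches an NcML document and prints its dimensions and variables using only the Python standard library:

# Fetch an NcML service response and summarize it (placeholder URL).
import urllib.request
import xml.etree.ElementTree as ET

ncml_url = 'http://servername:8080/thredds/ncml/path/to/dataset.nc'

with urllib.request.urlopen(ncml_url) as response:
    root = ET.parse(response).getroot()

def local(tag):
    # Strip any XML namespace so element names match those shown above.
    return tag.split('}')[-1]

for elem in root:
    if local(elem.tag) == 'dimension':
        print('dimension:', elem.get('name'), 'length', elem.get('length'))
    elif local(elem.tag) == 'variable':
        print('variable: ', elem.get('name'),
              'shape', elem.get('shape'), 'type', elem.get('type'))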

11.18.6. Sample ISO Service Response

The ISO service response for the same file used in the NCML service example above can be found at:

and, as seen below, is very much longer than the NCML service response. This kind of file is created by a machine to be read by other machines. It is not really supposed to be read by humans attempting to find information about this dataset.

<gmi:MI_Metadata xsi:schemaLocation="http://www.isotc211.org/2005/gmi http://www.ngdc.noaa.gov/metadata/published/xsd/schema.xsd">
  <gmd:fileIdentifier gco:nilReason="missing"/>
  <gmd:language>
    <gmd:LanguageCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:LanguageCode"
             codeListValue="eng">eng</gmd:LanguageCode>
  </gmd:language>
  <gmd:characterSet>
    <gmd:MD_CharacterSetCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_CharacterSetCode"
             codeListValue="UTF8">UTF8</gmd:MD_CharacterSetCode>
  </gmd:characterSet>
  <gmd:hierarchyLevel>
  <gmd:MD_ScopeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_ScopeCode"
             codeListValue="dataset">dataset</gmd:MD_ScopeCode>
  </gmd:hierarchyLevel>
  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_ScopeCode"
             codeListValue="service">service</gmd:MD_ScopeCode>
  </gmd:hierarchyLevel>
  <gmd:contact gco:nilReason="unknown"/>
  <gmd:dateStamp gco:nilReason="unknown"/>
  <gmd:metadataStandardName>
    <gco:CharacterString>ISO 19115-2 Geographic Information - Metadata Part 2 Extensions for imagery
             and gridded data
    </gco:CharacterString>
  </gmd:metadataStandardName>
  <gmd:metadataStandardVersion>
    <gco:CharacterString>ISO 19115-2:2009(E)</gco:CharacterString>
  </gmd:metadataStandardVersion>
  <gmd:spatialRepresentationInfo>
    <gmd:MD_GridSpatialRepresentation>
      <gmd:numberOfDimensions>
        <gco:Integer>4</gco:Integer>
      </gmd:numberOfDimensions>
      <gmd:axisDimensionProperties>
        <gmd:MD_Dimension>
          <gmd:dimensionName>
            <gmd:MD_DimensionNameTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_DimensionNameTypeCode"
                   codeListValue="temporal">temporal</gmd:MD_DimensionNameTypeCode>
          </gmd:dimensionName>
          <gmd:dimensionSize gco:nilReason="unknown"/>
          <gmd:resolution>
            <gco:Measure uom="unknown">0.010043129072991406 days</gco:Measure>
          </gmd:resolution>
        </gmd:MD_Dimension>
      </gmd:axisDimensionProperties>
      <gmd:axisDimensionProperties>
        <gmd:MD_Dimension id="x_rho">
          <gmd:dimensionName>
            <gmd:MD_DimensionNameTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_DimensionNameTypeCode"
                   codeListValue="unknown">unknown</gmd:MD_DimensionNameTypeCode>
          </gmd:dimensionName>
          <gmd:dimensionSize>
            <gco:Integer>128</gco:Integer>
          </gmd:dimensionSize>
          <gmd:resolution gco:nilReason="missing"/>
        </gmd:MD_Dimension>
      </gmd:axisDimensionProperties>
      <gmd:axisDimensionProperties>
        <gmd:MD_Dimension id="eta_rho">
          <gmd:dimensionName>
            <gmd:MD_DimensionNameTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_DimensionNameTypeCode"
                  codeListValue="unknown">unknown</gmd:MD_DimensionNameTypeCode>
          </gmd:dimensionName>
          <gmd:dimensionSize>
            <gco:Integer>64</gco:Integer>
          </gmd:dimensionSize>
          <gmd:resolution gco:nilReason="missing"/>
        </gmd:MD_Dimension>
      </gmd:axisDimensionProperties>
      <gmd:axisDimensionProperties>
        <gmd:MD_Dimension id="s_rho">
          <gmd:dimensionName>
            <gmd:MD_DimensionNameTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_DimensionNameTypeCode"
                  codeListValue="unknown">unknown</gmd:MD_DimensionNameTypeCode>
          </gmd:dimensionName>
          <gmd:dimensionSize>
            <gco:Integer>30</gco:Integer>
          </gmd:dimensionSize>
          <gmd:resolution gco:nilReason="missing"/>
        </gmd:MD_Dimension>
      </gmd:axisDimensionProperties>
      <gmd:axisDimensionProperties>
        <gmd:MD_Dimension id="wind_time">
          <gmd:dimensionName>
            <gmd:MD_DimensionNameTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_DimensionNameTypeCode"
                  codeListValue="unknown">unknown</gmd:MD_DimensionNameTypeCode>
          </gmd:dimensionName>
          <gmd:dimensionSize>
            <gco:Integer>1089443</gco:Integer>
          </gmd:dimensionSize>
          <gmd:resolution gco:nilReason="missing"/>
        </gmd:MD_Dimension>
      </gmd:axisDimensionProperties>
      <gmd:cellGeometry>
        <gmd:MD_CellGeometryCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#gmd:MD_CellGeometryCode"
                  codeListValue="area">area</gmd:MD_CellGeometryCode>
      </gmd:cellGeometry>
      <gmd:transformationParameterAvailability