Creating, Modifying and Sharing NetCDF Files
The NetCDF Data Format
In the geosciences - especially in oceanography - the NetCDF format is the most popular and portable format. It is used as a standard model output format (ROMS, MITgcm, POM, etc.), as an input format for data processing and graphing applications (ncview, NCO, IDV, GMT, etc.), and as a data transport format (OPeNDAP, THREDDS, etc.). Additionally, there are NetCDF interfaces for most commonly used programming languages (Fortran, C, C++, Python (Scipy/Numpy), Perl (PDL), etc.).
The NetCDF package can be found at the Unidata NetCDF site:
http://www.unidata.ucar.edu/software/netcdf/
Source code and binary versions can be obtained at:
http://www.unidata.ucar.edu/downloads/netcdf/netcdf-4_0/index.jsp
and installation instructions are at:
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-install/
What is NetCDF?
NetCDF - the Network Common Data Form - "is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data."
NetCDF data is self-describing, portable, scalable, appendable, sharable and archivable.
What is a NetCDF File?
A typical NetCDF file has three sections: dimensions, variables and data.
The Dimensions Section
The dimensions section contains information about the size of the variables contained in the file. An example taken from one of the GNOME-ready files produced by the TABS/TGLO project is:
dimensions:
lat = 128 ;
lon = 256 ;
time = UNLIMITED ; // (24 currently)
The Variables Section
The variables section from the same file is:
variables:
double lon(lat, lon) ;
lon:long_name = "Longitude" ;
lon:units = "degrees_east" ;
lon:standard_name = "longitude" ;
double lat(lat, lon) ;
lat:long_name = "Latitude" ;
lat:units = "degrees_north" ;
lat:standard_name = "latitude" ;
double depth(lat, lon) ;
depth:long_name = "Bathymetry" ;
depth:units = "meters" ;
depth:positive = "down" ;
depth:standard_name = "depth" ;
double mask(lat, lon) ;
mask:long_name = "Land Mask" ;
mask:units = "nondimensional" ;
mask:standard_name = "land_binary_mask" ;
double u(time, lat, lon) ;
u:long_name = "Eastward Water Velocity" ;
u:units = "m/s" ;
u:missing_value = -99999. ;
u:_FillValue = -99999. ;
u:scale_factor = 1. ;
u:add_offset = 0. ;
u:standard_name = "surface_eastward_sea_water_velocity" ;
double v(time, lat, lon) ;
v:long_name = "Northward Water Velocity" ;
v:units = "m/s" ;
v:missing_value = -99999. ;
v:_FillValue = -99999. ;
v:scale_factor = 1. ;
v:add_offset = 0. ;
v:standard_name = "surface_northward_sea_water_velocity" ;
double time(time) ;
time:long_name = "Time" ;
time:units = "seconds since 2003-10-28 0:00:00 0:00" ;
time:standard_name = "time" ;
// global attributes:
:file_type = "Full_Grid" ;
:Conventions = "COARDS" ;
:grid_type = "curvilinear" ;
:z-type = "s-coordinate" ;
:model = "ROMS" ;
:title = "Forecast: wind" ;
:history = "Wed Mar 4 20:52:14 2009
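As a rough illustration of how a reader of this file might use the attributes above, the following sketch decodes a time value from the "units" string and applies "scale_factor", "add_offset" and "_FillValue" to a stored number. The function names decode_time and unpack are hypothetical helpers, not part of any NetCDF library, and the parsing assumes the simple "seconds since <date>" form used in this file.

```python
from datetime import datetime, timedelta

# Attribute values copied from the "u" and "time" variables above.
U_ATTRS = {"_FillValue": -99999.0, "scale_factor": 1.0, "add_offset": 0.0}
TIME_UNITS = "seconds since 2003-10-28 0:00:00 0:00"


def decode_time(value, units):
    """Convert an offset in seconds to a datetime (hypothetical helper).

    Assumes units of the form "seconds since YYYY-MM-DD ..." and ignores
    the time-of-day and time-zone fields, which are zero in this file.
    """
    unit, since, date = units.split()[:3]
    if unit != "seconds" or since != "since":
        raise ValueError("only 'seconds since <date>' is handled here")
    origin = datetime.strptime(date, "%Y-%m-%d")
    return origin + timedelta(seconds=value)


def unpack(stored, attrs):
    """Return the physical value, or None where the fill value is stored."""
    if stored == attrs["_FillValue"]:
        return None
    return stored * attrs["scale_factor"] + attrs["add_offset"]


print(decode_time(3600, TIME_UNITS))  # one hour after the reference date
print(unpack(0.25, U_ATTRS))          # ordinary value, scaled and offset
print(unpack(-99999.0, U_ATTRS))      # fill value, reported as missing
```

Libraries such as netcdf4-python normally perform this kind of unpacking for you; the sketch only shows what the attributes mean.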
The Data Section
The first two sections contained metadata. The data section contains all the numbers and is much, much bigger than the metadata sections. The first few lines from this section are:
data:
lon =
-97.4269240972859, -97.3692695804302, -97.3163689369103, -97.2648562779617,
-97.2144000472512, -97.1645639770556, -97.1151653483693,
-97.0660444411106, -97.0170886909746, -96.9682001432619,
-96.9192937914673, -96.8702906542265, -96.8211154501405,
-96.7716943071207, -96.7219533780609, -96.6718177637292,
...
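To get a feel for how much bigger the data section is than the metadata, the sketch below estimates the number of bytes of array data from the dimensions and variable shapes listed earlier. It assumes 8 bytes per "double" value and the 24 time records currently in the file, and it ignores the file header and any padding, so the result is only an approximation of the real file size.

```python
from math import prod

# Dimension lengths and variable shapes copied from the example header.
DIMS = {"lat": 128, "lon": 256, "time": 24}
VARS = {
    "lon": ("lat", "lon"),
    "lat": ("lat", "lon"),
    "depth": ("lat", "lon"),
    "mask": ("lat", "lon"),
    "u": ("time", "lat", "lon"),
    "v": ("time", "lat", "lon"),
    "time": ("time",),
}
BYTES_PER_DOUBLE = 8  # every variable in this file is declared "double"


def data_bytes(dims, variables, itemsize=BYTES_PER_DOUBLE):
    """Return the number of bytes of array data for each variable."""
    return {
        name: prod(dims[d] for d in shape) * itemsize
        for name, shape in variables.items()
    }


sizes = data_bytes(DIMS, VARS)
total = sum(sizes.values())
print(sizes["u"], total)
```

Under these assumptions the "u" field alone occupies 6291456 bytes and the whole data section about 13 MB, compared with a few kilobytes of metadata.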
Creating NetCDF Files
There are several methods for creating NetCDF files. In descending order of difficulty, they are:
1. Write a program in Fortran, C, C++ or Java containing calls to the native NetCDF libraries available for each language. The program is then compiled and linked with the appropriate library, and then run to create a NetCDF output file. A skeletal structure within a Fortran 77 program, from the NetCDF Fortran 77 Interface Guide at:
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-f77/
would look like:
CALL NF_CREATE ! create netCDF dataset: enter define mode
...
CALL NF_DEF_DIM ! define dimensions: from name and length
...
CALL NF_DEF_VAR ! define variables: from name, type, dims
...
CALL NF_PUT_ATT ! assign attribute values
...
CALL NF_ENDDEF ! end definitions: leave define mode
...
CALL NF_PUT_VAR ! provide values for variable
...
CALL NF_CLOSE ! close: save new netCDF dataset
2. Employ a scripting language (Python, Perl, etc.) for which a NetCDF wrapper exists. Run the commands interactively, or create a script file containing a set of commands. An example using the netcdf4-python module is:
import time
import netCDF4

# Placeholder file names; replace these with your own paths.
filename = "input.nc"
filenew = "output.nc"
# Open the input file "filename" read-only.
ncifile = netCDF4.Dataset(filename, 'r')
# Read the longitude and latitude grids into numpy arrays.
lons = ncifile.variables['gridlon_221'][:]
lats = ncifile.variables['gridlat_221'][:]
# Extract a geographical subset via numpy commands.
lonmin = lons > -98
lonmax = lons < -80
latmin = lats > 18
latmax = lats < 31
lonlatrange = lonmin & lonmax & latmin & latmax
lonsub = lons[lonlatrange]
latsub = lats[lonlatrange]
# Extract the u velocity component field from the input file.
# This assumes the field has the same 2-D shape as the lon/lat grids.
u = ncifile.variables['U_GRD_221_HTGL'][:]
# Extract the geographic subset from the u velocity component field.
usub = u[lonlatrange]
# Open an output file called "filenew" in NetCDF 3 classic format.
ncofile = netCDF4.Dataset(filenew, 'w', format='NETCDF3_CLASSIC')
# Create the output file dimensions "time" (unlimited) and "points".
nsize = lonsub.size
ncofile.createDimension('time', None)
ncofile.createDimension('points', nsize)
# Create the output file variables "longitude" and "latitude".
lons = ncofile.createVariable('longitude', 'f4', ('points',))
lats = ncofile.createVariable('latitude', 'f4', ('points',))
# Create the output file variable "Uwind".
Uwind = ncofile.createVariable('Uwind', 'f4', ('time', 'points',))
# Create attributes for the variables "lons" and "lats".
lons.long_name = "longitude"
lons.standard_name = "longitude"
lons.units = "degrees_east"
lats.long_name = "latitude"
lats.standard_name = "latitude"
lats.units = "degrees_north"
# Write the "lons", "lats" and "u" fields to the output file.
lons[:] = lonsub
lats[:] = latsub
Uwind[0, :] = usub
# Write some global attributes to the output file.
history = "Created " + time.ctime()
ncofile.title = "NARR Project Data Subset for Gulf of Mexico ROMS Simulations"
ncofile.institution = "US National Weather Service - NCEP (WMC)"
ncofile.source = "North American Regional Reanalysis (NARR) Project"
ncofile.history = history
ncofile.note1 = "U and V components of vector quantities are resolved relative to earth NOT grid"
# Close both files so that the output is flushed to disk.
ncofile.close()
ncifile.close()
3. Use an application that already has the capability to create NetCDF files, e.g. ROMS.
Metadata Conventions
A NetCDF file can be created whose metadata consists of nothing more than cryptically named dimensions and variables, and no attributes. While you might remember what your cryptic names mean in 5 years (doubtful), anyone else who might want to use your file is going to be extremely confused. A remote user could also be confused even if you chose personally meaningful names for the dimensions and variables, and also added a sprinkling of attributes. These issues have led to the establishment of what are known as metadata conventions, which are designed to promote the processing and sharing of NetCDF files. The best-known and most commonly used of these is the CF (Climate and Forecast) Metadata Convention.
The CF metadata conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities. The CF conventions exist chiefly as two documents. The CF Conventions manual:
http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4
describes a set of "best practices" for creating your metadata, and the CF Standard Name Table:
http://cf-pcmdi.llnl.gov/documents/cf-standard-names/
provides a standardized list of names for quantities used in the geosciences.
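The kind of check that such conventions make possible can be sketched in a few lines. The example below reports which variables lack a small set of descriptive attributes. The rule set in EXPECTED is an assumption chosen for illustration and is much simpler than the actual CF requirements; missing_attributes is a hypothetical helper, not part of any CF checking tool.

```python
# Attributes a variable is expected to carry in this sketch. This is an
# assumed, simplified rule set, not the full CF specification.
EXPECTED = ("long_name", "units", "standard_name")


def missing_attributes(variables, expected=EXPECTED):
    """Map each variable name to the expected attributes it lacks."""
    report = {}
    for name, attrs in variables.items():
        missing = [key for key in expected if key not in attrs]
        if missing:
            report[name] = missing
    return report


variables = {
    # Taken from the example file shown earlier.
    "depth": {"long_name": "Bathymetry", "units": "meters",
              "positive": "down", "standard_name": "depth"},
    # A hypothetical, cryptically named variable with no attributes.
    "t2": {},
}
print(missing_attributes(variables))
```

Here "depth" passes, while the undocumented "t2" is flagged as missing all three attributes.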
NcML and ncML-Gml - The Future of Metadata
The present version of the NetCDF library represents the metadata in text form using a notation called CDL, the network Common Data form Language. Future versions will probably switch to some combination of the experimental NcML and ncML-Gml formats. The NcML project
http://www.unidata.ucar.edu/software/netcdf/ncml/
has created an XML representation of NetCDF metadata. When XML was introduced, it was touted as a panacea for representing any and all forms of data, and everybody rushed out to create their own XML formats. As practical considerations and limitations were inevitably encountered, it was discovered that XML was good for some things and not so good for others. In the case of NetCDF, XML is a good idea for representing the metadata. With a properly defined and published XML grammar, the NetCDF metadata can be perused and processed with the substantial amount of software that has been created to handle XML data.
The data - on the other hand - is not nearly as amenable to being represented via XML. Huge N-dimensional fields are probably best stored and handled in a binary format, which is indeed the choice that has been made by most projects that have created software to deal with this issue, e.g. NetCDF, HDF, etc. The eventual solution will almost certainly consist of an XML-based metadata section that describes what is in a binary data section.
The current implementation of the NcML format is NetCDF-Java at:
https://www.unidata.ucar.edu/software/netcdf-java/
which is a Java application implementing NcML - and many other useful features - that allows you to add NcML metadata to existing NetCDF files.
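To give an idea of what an XML representation of NetCDF metadata looks like, the sketch below builds a small NcML document with Python's standard XML library. The element and attribute names (netcdf, dimension, variable, attribute, shape, type) follow the NcML schema as far as this example needs them; to_ncml is a hypothetical helper and covers only a small part of what NcML can express.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def to_ncml(dims, variables):
    """Build an NcML <netcdf> element from plain dictionaries."""
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    for name, length in dims.items():
        ET.SubElement(root, "dimension", {"name": name, "length": str(length)})
    for name, (dtype, shape, attrs) in variables.items():
        var = ET.SubElement(
            root, "variable",
            {"name": name, "type": dtype, "shape": " ".join(shape)},
        )
        for key, value in attrs.items():
            ET.SubElement(var, "attribute", {"name": key, "value": str(value)})
    return root


# Metadata for one variable of the example file shown earlier.
dims = {"lat": 128, "lon": 256}
variables = {
    "depth": ("double", ("lat", "lon"),
              {"long_name": "Bathymetry", "units": "meters"}),
}
ncml = to_ncml(dims, variables)
print(ET.tostring(ncml, encoding="unicode"))
```

Only the metadata is written as XML here; the array values themselves stay in the binary file, in line with the division of labor described above.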
Another issue presently being addressed via the ncML-Gml project is how to make NetCDF files more user friendly for the rapidly growing GIS community. The ncML-Gml project
http://zeus.pin.unifi.it/joomla/index.php?option=com_content&task=view&id=50&Itemid=78
is an extension of NcML based on the grammar of GML, the Geography Markup Language
http://en.wikipedia.org/wiki/Geography_Markup_Language
an XML grammar defined to express geographical features. It includes primitives for such concepts as features, geometry and coordinate reference systems. Neither NetCDF nor NcML presently has the capability to express the geometrical and coordinate system information needed by the GIS community for automatically viewing and processing data contained within NetCDF files. The current implementation of the ncML-Gml extension is available at:
http://ulisse.pin.unifi.it:8080/archiva/browse/it.cnr.imaa.essi/galeon2-ncml-gml/0.9
and is a Java API for converting NetCDF-CF files into ncML-Gml files.
The goal is to eventually have a metadata format that allows all standard tools in both the earth sciences and GIS communities to automatically read and understand the contents of a NetCDF file.
Modifying and Viewing NetCDF Files
ncdump
A part of the basic NetCDF library installation, ncdump writes the contents of a binary NetCDF file out in ASCII format. It is a quick and easy way to check whether the contents of your file are what you want them to be.
ncdump data.nc | more
ncdump data.nc > data.txt
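The text output of ncdump is also easy to process in a script. The sketch below is one possible approach: read_header runs "ncdump -h" (the header-only option) on a file, and parse_dimensions extracts the dimension names and lengths from the resulting text. Both function names are hypothetical, and the parser assumes the layout of the dimensions section shown earlier in this document.

```python
import re
import subprocess


def read_header(path):
    """Return the CDL header text printed by `ncdump -h` for a file."""
    result = subprocess.run(["ncdump", "-h", path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_dimensions(cdl):
    """Extract {name: length} from the dimensions section of CDL text.

    For an UNLIMITED dimension the current length is taken from the
    trailing "// (N currently)" comment.
    """
    dims = {}
    in_dims = False
    for line in cdl.splitlines():
        line = line.strip()
        if line == "dimensions:":
            in_dims = True
        elif line.endswith(":") and in_dims:
            break  # reached "variables:" or another section
        elif in_dims:
            match = re.match(
                r"(\w+) = (\w+) ;(?: // \((\d+) currently\))?", line)
            if match:
                name, length, current = match.groups()
                dims[name] = int(current if length == "UNLIMITED" else length)
    return dims


# Sample text copied from the dimensions section shown earlier.
sample = """\
dimensions:
lat = 128 ;
lon = 256 ;
time = UNLIMITED ; // (24 currently)
variables:
"""
print(parse_dimensions(sample))
```

With a real file, parse_dimensions(read_header("data.nc")) would do the same for the header that ncdump prints, provided ncdump is on your path.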
NCO (NetCDF Operators)
A set of command-line programs that perform various manipulations on NetCDF files. These can be found at:
The programs include:
ncview
A visual browser for NetCDF files, ncview can be found at:
http://meteora.ucsd.edu/~pierce/ncview_home_page.html
This plots the 2-D fields contained within NetCDF files, and will also create an animation if there is more than one time slice. It can also create time series plots from specific points on the 2-D plots.
IDV (Integrated Data Viewer)
A Java-based software framework for analyzing and visualizing geoscience data, IDV can be found at:
http://www.unidata.ucar.edu/software/idv/
and an online user's guide at:
http://www.unidata.ucar.edu/software/idv/docs/userguide/toc.html
netcdf4-python
Many tools for analyzing and visualizing geoscience data have been developed in the Python scripting language. One of these is netcdf4-python, a Python module that provides a wrapper around the NetCDF4 library. It leverages the Numpy/Scipy modules that themselves provide elegant tools for dealing with large data arrays. It can be found at:
http://code.google.com/p/netcdf4-python/
matplotlib
Another very useful Python module - also built on top of Numpy/Scipy - is the 2-D plotting library matplotlib found at:
http://matplotlib.sourceforge.net/
Especially useful is the matplotlib toolkit basemap documented at:
http://matplotlib.sourceforge.net/basemap/doc/html/
This toolkit provides the means to transform coordinates to many different map projections and then plot contours, images, vectors, lines or points in the transformed coordinates. This is somewhat similar to GMT or GrADS in functionality.
Sharing NetCDF Files
Other people might be interested in using the NetCDF files you create. How do you make them easily and quickly available?
OPeNDAP (Open-source Project for a Network Data Access Protocol)
The OPeNDAP package allows you to make local data in a variety of data formats - including NetCDF - available to remote locations via the web. The OPeNDAP package can be found at:
A local server is installed that serves data to clients at remote locations. For example, the OPeNDAP-enabled NetCDF library allows NetCDF files halfway around the world to be accessed as simply as local files. There is also pydap, a Python implementation of the protocols available at:
http://code.google.com/p/pydap/
An example of a web interface to a local OPeNDAP server is:
http://csanady.tamu.edu/GNOME/gnome.html
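Part of what makes remote access practical is that a client can ask the server for only a piece of a variable by appending an index range for each dimension to the dataset URL. The sketch below builds such a request URL in the DAP2 style. The base URL is hypothetical, subset_url is an illustrative helper rather than part of any OPeNDAP client library, and a stride of 1 is assumed.

```python
def subset_url(base_url, variable, ranges, response="ascii"):
    """Build a DAP2 request URL for an index subset of one variable.

    ranges holds one (start, stop) pair of inclusive, zero-based indices
    per dimension; a stride of 1 is assumed throughout.
    """
    hyperslab = "".join(f"[{start}:1:{stop}]" for start, stop in ranges)
    return f"{base_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset URL; substitute the address of a real server.
BASE = "http://example.org/opendap/forecast.nc"

# First time step of "u" over the full 128 x 256 grid of the example file.
print(subset_url(BASE, "u", [(0, 0), (0, 127), (0, 255)]))
```

In everyday use an OPeNDAP-enabled NetCDF library or pydap constructs these requests for you, so you rarely need to write them by hand.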
THREDDS (Thematic Realtime Environmental Distributed Data Services)
Software for publishing, contributing, finding and interacting with data relating to the Earth system in a convenient, effective and integrated fashion. The THREDDS software can be found at:
http://www.unidata.ucar.edu/projects/THREDDS/
The THREDDS Data Server or TDS is a web server providing metadata and data access for scientific datasets using OPeNDAP, OGC WMS and WCS, HTTP and other data access protocols. A local example of a THREDDS server for the TGLO/TABS GNOME-ready files is:
http://csanady.tamu.edu:8080/thredds/catalog.html
While the TDS is a complicated Java program, installing and configuring it is fairly simple. You make your NetCDF files available to others via a configuration file, an example of which can be found at:
http://www.unidata.ucar.edu/projects/THREDDS/tech/tutorial/BasicConfig.html