Mostly Data Formats and Storage
These are software packages for storing and retrieving data in
standard formats. The software usually takes the form of a library
of subroutines that is called either by another package (e.g.
Matlab or Ferret) or by a separate program, usually in Fortran or C,
written by the user to retrieve or store some type of data. I've
listed the ones most commonly used in the geosciences (at least as
far as I know) below, along with a final link to even more such
packages. It may seem a bit strange to have so many choices for a
"standard" data format, but as you'll find out when delving into them
a bit more deeply, there are the beginnings of some sort of
consolidation (at least amongst the netCDF, HDF, and CDF packages
listed), so eventually we might have one (or at least very few)
standard formats with which to work.
Last updated and checked on Mar. 23, 2004, just a few short years since the last update on Mar. 15, 1996.

CDF
The National Space Science Data Center's (NSSDC) Common Data Format (CDF) is a self-describing data abstraction for the storage and manipulation of multidimensional data in a discipline-independent fashion. When one first hears the term "Common Data Format" one intuitively thinks of data formats in the traditional (i.e. messy/convoluted storage of data on disk or tape) sense of the word. Although CDF has its own internal self-describing format, it consists of more than just a data format. CDF is a scientific data management package (known as the "CDF Library") which allows programmers and application developers to manage and manipulate scalar, vector, and multi-dimensional data arrays. The irony of the term "FORMAT" is that the actual data format which CDF utilizes is completely transparent to the user and accessible through a consistent set of interface (known as the "CDF Interface") routines. Therefore, programmers are not burdened with performing low level I/O's to physically format and unformat the data file. This is all done for them. The development of CDF arose out of the recognition by the NSSDC for a class of data models that is matched to the structure of scientific data and the applications (i.e. statistical and numerical methods, visualization, and management) they serve.
[http://nssdc.gsfc.nasa.gov/cdf/cdf_home.html]

CDMS/CDML
The Climate Data Management System is an object-oriented data management system, specialized for organizing multidimensional, gridded data used in climate analysis and simulation. Data can be obtained from files in any of the self-describing formats netCDF, HDF, GrADS/GRIB or PCMDI DRS.
The Climate Data Markup Language (CDML) is the markup language used to represent data in CDMS. It is based on the XML standard. CDML is an XML dialect geared toward the representation of gridded climate datasets.

CGM
The Computer Graphics Metafile is a 2D data interchange standard which allows graphical data to be stored and exchanged among graphics devices, applications and computer systems in a device-independent manner. It is a revisable, structured format that can represent vector graphics, raster graphics and text. The given URL is the NIST CGM site which contains the full CGM standard.
[http://www.cgmopen.org/technical/cgm_standard.html]

CGNS
The CFD General Notation System (CGNS) consists of a collection of conventions, and software implementing those conventions, for the storage and retrieval of CFD (computational fluid dynamics) data. The system consists of two parts: (1) a standard format for recording the data, and (2) software that reads, writes, and modifies data in that format. The format is a conceptual entity established by the documentation; the software is a physical product supplied to enable developers to access and produce data recorded in that format.
The CGNS system is designed to facilitate the exchange of data between sites and applications, and to help stabilize the archiving of aerodynamic data. The data are stored in a compact, binary format and are accessible through a complete and extensible library of functions. The API (Application Program Interface) is platform independent and can be easily implemented in C, C++, Fortran and Fortran90 applications.

DRS
The Data Retrieval and Storage (DRS) library and utilities, part of the PCMDI Project, support the scientific data format used at PCMDI. This was developed to support high-volume multi-dimensional array output from general circulation models. The available source code is currently set to compile on Cray, Sun, SGI and HP platforms, although a quick perusal of it shows that it probably wouldn't be difficult to port to other platforms.
[http://www-pcmdi.llnl.gov/drach/DRS.html]

ESML
An interchange technology is an enabling technology that utilizes external metadata to allow applications to plug and play seamlessly with datasets in heterogeneous formats. An interchange technology can be utilized to solve the data/application interoperability problem.
The Earth Science Markup Language (ESML) is one such interchange technology. Based on XML, it consists of the ESML Schema, ESML Files and the ESML Library. ESML Files contain descriptions of the content, structure, and semantics of a particular set of data files.
The ESML Schema defines rules for creating the ESML file. Because ESML Files are external files (i.e. not contained within the data files), both data producers and consumers can create and use these descriptions at any time. A key point is that the ESML Files do not modify the application or the data file itself. The ESML Library is utilized by applications to parse the ESML file and to decode the data format. Application developers can now build data format independent applications utilizing the ESML Library. Furthermore, the applications will not require modification in order to access new formats as they become available.
[http://esml.itsc.uah.edu/index2.html]

GeoTIFF
GeoTIFF represents an effort by over 160 different remote sensing, GIS, cartographic, and surveying related companies and organizations to establish a TIFF-based interchange format for georeferenced raster imagery.
[http://www.remotesensing.org/geotiff/geotiff.html]

GeoVRML
GeoVRML is an official Working Group of the Web3D Consortium. It was formed on 27 Feb 1998 with the goal of developing tools and recommended practice for the representation of geographical data using the Virtual Reality Modeling Language (VRML). The desire is to enable geo-referenced data, such as maps and 3-D terrain models, to be viewed over the web by a user with a standard VRML plugin for their web browser.

GRIB
The World Meteorological Organization (WMO) Commission for Basic Systems (CBS) Extraordinary Meeting Number VIII (1985) approved a general purpose, bit-oriented data exchange format, designated FM 92-VIII Ext. GRIB (GRIdded Binary). It is an efficient vehicle for transmitting large volumes of gridded data to automated centers over high-speed telecommunication lines using modern protocols. By packing information into the GRIB code, messages (or records - the terms are synonymous in this context) can be made more compact than character oriented bulletins, which will produce faster computer-to-computer transmissions.
GRIB can equally well serve as a data storage format, generating the same efficiencies relative to information storage and retrieval devices.
[http://www.wmo.ch/web/www/WDM/Guides/Guide-binary-2.html]
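To give a feel for what "bit-oriented" packing buys you, here is a minimal Python sketch of the simple packing relation GRIB uses for grid-point values, Y x 10^D = R + X x 2^E, where Y is the original value, R the reference (minimum) value, D the decimal scale factor, E the binary scale factor, and X the unsigned integer that actually gets stored. The function names and the 12-bit default are my own choices for illustration, and the sketch leaves out the section headers and the bit-stream layout of a real GRIB message.

```python
import math


def pack_simple(values, nbits=12, decimal_scale=0):
    """Quantize floats to unsigned integers that fit in nbits bits.

    Follows Y * 10**D = R + X * 2**E, with Y the value, R the reference,
    D the decimal scale, E the binary scale and X the stored integer.
    """
    scaled = [v * 10 ** decimal_scale for v in values]
    ref = min(scaled)
    span = max(scaled) - ref
    max_int = 2 ** nbits - 1
    # Smallest power-of-two step that lets the whole range fit in nbits.
    binary_scale = math.ceil(math.log2(span / max_int)) if span > 0 else 0
    step = 2.0 ** binary_scale
    ints = [round((s - ref) / step) for s in scaled]
    return ref, binary_scale, ints


def unpack_simple(ref, binary_scale, ints, decimal_scale=0):
    """Invert pack_simple, up to the quantization error of half a step."""
    step = 2.0 ** binary_scale
    return [(ref + x * step) / 10 ** decimal_scale for x in ints]
```

Four temperatures packed at 12 bits each need 48 bits, i.e. 6 bytes of data, whereas writing them as text such as "271.3 " takes about 24 characters. The price is a rounding error of at most half the step 2^E.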

HDF
The Hierarchical Data Format is a multi-object file format that facilitates the transfer of various types of data between machines and operating systems. It allows self-definitions of data content and is easily extensible for future enhancements or compatibility with other standard formats. The latest version of HDF supports the complete netCDF interface.
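Since the same netCDF calls may be served by either an HDF file or a netCDF file, it can be handy to check which kind of file you actually have before handing it to a library. Below is a small Python sketch that guesses the container from the first few bytes. The signature values are the ones given in the respective format documents as far as I know them; the function names are made up for this example, and only offset 0 is checked (an HDF5 file with a user block in front would not be recognized).

```python
# Leading bytes of each container, as I understand the format documents.
SIGNATURES = [
    (b"\x0e\x03\x13\x01", "HDF4"),
    (b"\x89HDF\r\n\x1a\n", "HDF5"),
    (b"CDF\x01", "netCDF classic"),
]


def sniff_format(header):
    """Guess the container format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path):
    """Read just enough of a file to call sniff_format on it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```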

HDF-EOS
NASA developed the HDF-EOS format with additional conventions and data types for HDF files. HDF-EOS supports three geospatial data types (grid, point, and swath), providing uniform access to diverse data types in a geospatial context. The HDF-EOS software library allows a user to query or subset the contents of a file by earth coordinates and time (if there is a spatial dimension in the data). Tools that process standard HDF files will also read HDF-EOS files; however, standard HDF library calls cannot access geolocation data, time data, and product metadata as easily as with HDF-EOS library calls.
[http://hdfeos.gsfc.nasa.gov/hdfeos/index.cfm]

MarineXML
MarineXML's structure allows for complete encapsulation of all possible marine environmental parameters, including metadata, quality assessments and their results, and a complete history of edits made to the data throughout its entire life, thus making it the ideal archiving format.
Most AODC data management systems and procedures now revolve around data in MarineXML. MarineQC semi-automatically validates incoming marine environmental data in MarineXML and writes any results within the appropriate elements. MEDI is then used to automatically extract the metadata from the MarineXML file and create a metadata record that complies with NASA's GCMD format and most fields of the national ANZLIC metadata standards.
[http://www.aodc.gov.au/products/prod/marinexml.html]

MathML
MathML is a low-level specification for describing mathematics as a basis for machine-to-machine communication.

NetCDF
The network Common Data Form is an interface for scientific data access and a library that provides an implementation of the interface. It also defines a machine-independent format for representing data. Data stored in the netCDF format is self-describing, network transparent, direct-access, appendable, and sharable. There is a netCDF interface to HDF available.
[http://www.unidata.ucar.edu/packages/netcdf/]
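"Self-describing" here means that the names and sizes of things live in the file itself, in a header written in a fixed big-endian layout. As a rough illustration, the Python sketch below builds and then reads back a header containing only a dimension list, following the netCDF classic layout as I read it (magic bytes, record count, then tagged lists with names padded to 4-byte boundaries). It is a toy with made-up function names, it handles no attributes or variables, and for real files you should of course use the netCDF library.

```python
import struct

NC_DIMENSION = 0x0A  # tag that introduces the dimension list


def _name(s):
    """Length-prefixed name, NUL-padded to a 4-byte boundary."""
    raw = s.encode("ascii")
    pad = (-len(raw)) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * pad


def write_header(dims):
    """Build a header holding only dimensions (no attributes, no variables)."""
    out = b"CDF\x01"                   # magic plus version byte
    out += struct.pack(">I", 0)        # number of records
    if dims:
        out += struct.pack(">II", NC_DIMENSION, len(dims))
        for name, length in dims:
            out += _name(name) + struct.pack(">I", length)
    else:
        out += b"\x00" * 8             # absent dimension list
    out += b"\x00" * 8                 # absent global attribute list
    out += b"\x00" * 8                 # absent variable list
    return out


def read_dims(buf):
    """Recover (name, length) pairs from the header alone."""
    if buf[:4] != b"CDF\x01":
        raise ValueError("not a netCDF classic header")
    tag, count = struct.unpack_from(">II", buf, 8)
    if tag != NC_DIMENSION:
        return []
    pos, dims = 16, []
    for _ in range(count):
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        name = buf[pos:pos + n].decode("ascii")
        pos += n + (-n) % 4            # skip the name and its padding
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        dims.append((name, length))
    return dims
```

A reader that has never seen the file before can call read_dims and learn that there is, say, a "lat" dimension of length 73, without any outside documentation.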

PNG
A format for portable graphics, as you might surmise from the name. PNG unofficially stands for "PNG's Not GIF", which originates from the decision of Unisys and CompuServe to require royalties from programs using the GIF format since Unisys has a patent on the LZW compression format used therein. Besides the fact that it's not GIF, its features include an unambiguous pronunciation, multiple CRCs so that file integrity can be checked without viewing, a magic signature that can detect the most common types of file corruption, better compression than GIF, a 2-D interlacing scheme, and a non-patented free and completely referenced implementation with full source code.
[http://www.libpng.org/pub/png/]
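The signature and per-chunk CRCs mentioned above are easy to see in action. The Python sketch below, using only the standard library, builds a tiny 1x1 grayscale PNG by hand and then walks it chunk by chunk, checking the 8-byte signature and each chunk's CRC-32 (computed over the chunk type and data). The helper names are mine, and the checker does nothing beyond those two tests; it does not decode the image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype, data):
    """Length, type, data, then a CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def tiny_png():
    """A 1x1 8-bit grayscale image, built by hand for the demo."""
    # width, height, bit depth, color type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x80")  # filter byte 0 plus one gray pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


def check_png(buf):
    """Return the chunk type names; raise ValueError on bad signature or CRC."""
    if buf[:8] != PNG_SIGNATURE:
        raise ValueError("bad PNG signature")
    pos, seen = 8, []
    while pos < len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        ctype = buf[pos + 4:pos + 8]
        data = buf[pos + 8:pos + 8 + length]
        (stored,) = struct.unpack_from(">I", buf, pos + 8 + length)
        if zlib.crc32(ctype + data) & 0xFFFFFFFF != stored:
            raise ValueError("CRC mismatch in chunk %r" % ctype)
        seen.append(ctype.decode("ascii"))
        pos += 12 + length
    return seen
```

The signature deliberately contains a CR-LF pair, so a file mangled by a text-mode transfer that rewrites line endings fails the very first check.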

PyTables
PyTables is a hierarchical database package designed to efficiently manage very large amounts of data. It is built on top of the HDF5 library and the numarray package. It features an object-oriented interface that, combined with natural naming and C code generated from Pyrex sources, makes it a fast yet extremely easy to use tool for interactively saving and retrieving very large amounts of data. Besides, it provides flexible indexed access on disk to anywhere in the data you want to go.
PyTables was born because one of its authors (Francesc Alted) needed to save lots of data in a way that was both hierarchical and efficient, for later post-processing. After trying several approaches (ZODB, the NetCDF interface of Scientific Python, and HL-HDF5), he found that each presented distinct inconveniences. For example, working with file sizes larger than, say, 100 MB was rather painful with ZODB (it took a lot of memory). The NetCDF interface provided by Scientific Python was great, but it does not allow the data to be given a hierarchical structure; besides, NetCDF only supports homogeneous datasets, not heterogeneous datasets (i.e. tables). Finally, HL-HDF5, which is a high-level interface to the HDF5 library, and especially its module PyHL, was closer to what he needed, but working with tables proved to be cumbersome (you need to build a Python C module containing the table definition).
[http://pytables.sourceforge.net/html/WelcomePage.html]

SEDRIS
SEDRIS is fundamentally about two key aspects: (1) representation of environmental data, and (2) the interchange of environmental data sets.
To achieve the first one, SEDRIS offers a data representation model, augmented with its environmental data coding specification and spatial reference model, so that one can articulate one's environmental data clearly, while also using the same representation model to understand others' data unambiguously.
Therefore, the data representation aspect of SEDRIS is about capturing and communicating meaning and semantics. For the second part, we know from practice that it is not enough to be able to clearly represent or describe the data; we must also be able to share such data with others in an efficient manner. So the second aspect of SEDRIS is about interchange of data that can be described using the data representation model. For the interchange part, the SEDRIS API, its format, and all the associated tools and utilities play the primary role, while being semantically coupled to the data representation model.

SILO
Silo is a library which implements an application programming interface (API) designed for reading and writing scientific data. It is a high-level, portable interface that was developed at Lawrence Livermore National Laboratory to address difficult database issues, such as different, incompatible file formats and libraries.
Silo takes advantage of features in netCDF and PDB, a binary database file format developed at LLNL, to build a powerful data access mechanism and to provide a higher level view of the data. It assigns meaning to different types of objects and supports a hierarchical directory structure. Entities managed by the Silo library include not just arrays, but also meshes, mesh variables, material data, and curves. The Silo interface allows the development of generic tools.
[http://www.llnl.gov/bdiv/meshtv/]

SVG
SVG is a language for describing two-dimensional graphics and graphical applications in XML.
[http://www.w3.org/Graphics/SVG/]
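Because SVG is plain XML, a data plot can be produced with nothing more than an XML library. As a small illustration, here is a Python sketch that writes a series of points as an SVG polyline using the standard library's ElementTree. The function name and the default canvas size are arbitrary choices of mine; the namespace URI and the polyline element are from the SVG specification.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_svg(points, width=200, height=100):
    """Return an SVG document (as a string) drawing points as a polyline."""
    # Serialize SVG elements without a prefix (default namespace).
    ET.register_namespace("", SVG_NS)
    root = ET.Element("{%s}svg" % SVG_NS,
                      {"width": str(width), "height": str(height)})
    ET.SubElement(root, "{%s}polyline" % SVG_NS, {
        "points": " ".join("%g,%g" % p for p in points),
        "fill": "none",
        "stroke": "black",
    })
    return ET.tostring(root, encoding="unicode")
```

Saving the returned string to a file with an .svg extension should give something any SVG-capable viewer can display. Keep in mind that the y axis in SVG points downward.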

WOMF
The Weather Observation Markup Format is an application of XML to describe a particular kind of document: weather observation reports.
[http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/OMF.html]

XDF
XDF is a common scientific data format based on XML and general mathematical principles that can be used throughout the scientific disciplines. It includes these key features: hierarchical data structures, any dimensional arrays merged with coordinate information, high dimensional tables merged with field information, variable resolution, easy wrapping of existing data, user specified coordinate systems, searchable ASCII meta-data, and extensibility to new features/data formats.
[http://xml.gsfc.nasa.gov/XDF/XDF_home.html]

XDMF
The eXtensible Data Model and Format (XDMF) is an active, common data hub used to pass values and metadata in a standard fashion between application modules. XDMF views data as consisting of two basic types: Light data and Heavy data. Light data is both metadata and small amounts of values. Heavy data typically consists of large arrays of values.
[http://www.arl.hpc.mil/ice/]

XMML
XMML, the eXploration and Mining Markup Language, is an XML-based encoding for geoscience and exploration information. It is intended to support exchange of exploration information in a wide variety of contexts. This includes exchange between software packages on the desktop and between users and organisations, and in particular it is meant to be compatible with HTTP.
[https://www.seegrid.csiro.au/twiki/bin/view/Xmml/WebHome]

XSIL
The Extensible Scientific Interchange Language (XSIL) is a flexible, hierarchical, extensible, transport language for scientific data objects.
The entire object may be represented in the file, or there may be metadata in the XSIL file, with a powerful, fault-tolerant linking mechanism to external data.
The language is based on XML, and is designed not only for parsing and processing by machines, but also for presentation to humans through web browsers and web-database technology. It comes with a Java object model that is designed to be extensible, so that scientific data and metadata represented in XML are available to Java code.
[http://www.cacr.caltech.edu/SDA/xsil/]

Scientific Data Management
Below I list common problems and opportunities in scientific data access. Then I collect what are considered the parts of a Data Management solution. A list of references and examples of data access and scientific data collections follows.
The paper ends with more implementation-oriented issues: a survey of some scientific data formats, planning for a possible implementation, and a survey of the supporting technologies available.
[http://www.cscs.ch/~mvalle/sdm/scientific-data-management.html]

Scientific Data Format FAQ
A list of Frequently Asked Questions about various standard data formats. It includes links to sites with the software when and where it's available. This document was last updated in Oct. 1995.
[http://fits.cv.nrao.edu/traffic/scidataformats/faq.html]
S. Baum
Dept. of Oceanography
Texas A&M University
baum@stommel.tamu.edu