The NASA Earth Observing System (EOS) has more than a petabyte of critical earth science data stored in the HDF4 format. It will be important to have access to these data long into the future, since they comprise a core component of the long-term climate record.
The normal way to access HDF-formatted data is through the HDF software libraries, either by using the HDF Application Programming Interface (API) directly or by using HDF tools that depend on the HDF libraries.
However, there is a risk in depending solely on the HDF libraries to access HDF-formatted data over the long term. It is possible, especially in the distant future, that the software may not be as readily available as it is today. To address this risk, it is desirable to have a way to retrieve the data independently.
A desire to read HDF4 files without relying on the HDF4 libraries prompted the work described here: the construction of text-based "maps" of the actual data in NASA's HDF4 files, which allow simple readers to be written to access the data in those files. The maps are XML documents.
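To make the idea of a map concrete, here is a minimal sketch of what a "simple reader" might look like. It assumes a made-up map layout: the element and attribute names (`dataset`, `block`, `offset`, `nBytes`, `dtype`, `byteOrder`, `count`) are hypothetical and are not taken from the project's actual schema. The only point it illustrates is that a reader needs nothing beyond an XML parser and raw file access once the map records where the bytes live and how to interpret them.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical map: these element and attribute names are invented for
# illustration and do NOT reflect the project's real XML schema.
EXAMPLE_MAP = """\
<map>
  <dataset name="temperature" dtype="int32" byteOrder="big" count="4">
    <block offset="16" nBytes="16"/>
  </dataset>
</map>
"""

# struct format characters for the few types this sketch supports
_TYPE_CODES = {"int16": "h", "int32": "i", "float32": "f", "float64": "d"}


def read_dataset(map_xml, data_path, name):
    """Return the values of dataset `name` using only the map and raw bytes."""
    root = ET.fromstring(map_xml)
    for ds in root.iter("dataset"):
        if ds.get("name") != name:
            continue
        raw = bytearray()
        with open(data_path, "rb") as f:
            # A dataset may be stored in several byte ranges; read each in order.
            for block in ds.iter("block"):
                f.seek(int(block.get("offset")))
                raw += f.read(int(block.get("nBytes")))
        order = ">" if ds.get("byteOrder") == "big" else "<"
        fmt = f"{order}{ds.get('count')}{_TYPE_CODES[ds.get('dtype')]}"
        return list(struct.unpack(fmt, bytes(raw)))
    raise KeyError(name)


if __name__ == "__main__":
    import os
    import tempfile

    # Build a stand-in binary file: 16 bytes of "header" followed by four
    # big-endian 32-bit integers. This is NOT a real HDF4 file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")
        with open(path, "wb") as f:
            f.write(b"\x00" * 16 + struct.pack(">4i", 10, 20, 30, 40))
        print(read_dataset(EXAMPLE_MAP, path, "temperature"))
```

A real map would also have to describe features such as multidimensional shapes, compression, and chunked storage, which this sketch leaves out.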
The project has five major components:
- An assessment of NASA data products in the HDF4 format, to identify all data objects for which maps should be created.
- Creation of an XML-based schema for HDF4 mapping files. The schema describes all structural and application metadata that might occur in NASA's HDF4 data products, as well as the locations in the HDF4 file of the data itself.
- Creation of a map-writing tool that inspects an HDF4 file and creates a map of the file.
- Creation of a sample reader, which can serve as a model for others developing their own readers.
- Testing and validation of the map writer at selected NASA data centers.
In addition to these major tasks, the project has carried out two smaller ones. It has:
- Investigated ways that HDF4 map information may be integrated with existing preservation standards, such as METS and PREMIS.
- Studied the contents of EOS files to determine whether additional information needs to be included in maps, beyond the basic information about HDF objects found in the files.
Although the focus of this effort was NASA EOSDIS data stored in HDF4 files, the general methodology is also relevant to other cases where the long-term accessibility of data stored in binary files is of concern. In addition, this work demonstrates how binary HDF files can be used to efficiently store large volumes of scientific data that is referenced by text-based XML documents (the mapping files).
Last modified: 13 February 2014