Conversion from HDF4 to HDF5:
"Hybrid" HDF-EOS Files

Robert E. McGrath
Muqun Yang
National Center for Supercomputing Applications
University of Illinois, Urbana Champaign

February 14, 2002


Revised March 4, 2002

Contents

1. Introduction

2. Method

3. Results

4. Discussion and Conclusions

Acknowledgments

References

Appendix 1

Appendix 2

Appendix 3

Appendix 4


1. Introduction

Version 2.6 and earlier of HDF-EOS were built on top of the HDF4 library (this is now termed HDFEOS4) [2]. Version 5 of HDF-EOS is built on top of HDF5 [3]. Files created with HDFEOS4 cannot be read with HDFEOS5 and vice versa. In some cases, programs will use data in one or both formats, with multiple reader or writer modules. In other cases, it may be desirable to convert older files from HDFEOS4 to an equivalent HDFEOS5 file.

Since the HDF-EOS objects are equivalent, files can be translated by reading the HDFEOS4 file and writing an equivalent HDFEOS5 file. For example, the heconvert program [4] converts an HDFEOS4 file to an equivalent HDF-EOS 5 file. All the EOS objects--the Grid, Swath, and Point objects, and associated metadata--are read from the HDFEOS4 (HDF4) file and written to an equivalent HDFEOS5 (HDF5) file.

This is not the end of the story, however. Most HDF-EOS data products contain standard HDF objects as well as the HDF-EOS objects. In addition to the Grid, Swath, and Point objects managed by the HDF-EOS library, these "hybrid" files may also contain standard HDF4 objects such as Vdata tables, SDS datasets, and annotations (see Section 2.1).

These HDF objects are created through calls to the HDF4 library; they are not managed by the HDF-EOS library.

The heconvert utility converts only the objects and metadata managed by the HDF-EOS library, and consequently cannot convert other HDF objects that may be present. The result is that the file created by heconvert may omit some of the objects and metadata from the original. To fully convert "hybrid" files, it is necessary to read the additional HDF4 objects and write equivalent HDF5 objects.

The NCSA HDF4 to HDF5 Conversion Library (h3toH5 Library) [5] is a library of routines that reads individual HDF objects or groups of objects from an HDF4 file and writes equivalent HDF5 objects to an HDF5 file, using a default translation [6]. This library can be used by C applications to create a custom conversion for specific data products.

In this experiment, the heconvert utility was augmented using h3toH5 Library. In addition to the standard conversion of HDF-EOS objects, the experimental program identifies and converts non-EOS objects in an EOS dataset, creating a more complete HDFEOS5 file.


2. Method

2.1. Sample Data

A sample of HDF-EOS files was selected from the EOS-SAMPLER CD [7] (Table 1). The heosls utility [2] was used to summarize the objects in each file (Table 1, left column). In each case, the file has one or more HDF-EOS objects (Grid, Swath, Point) and the corresponding StructMetadata.0. Each file also contains standard HDF4 objects (Vdata tables, SDS datasets, annotations). These latter objects are not managed by the HDF-EOS library. They are product-specific, created by the data processing software and written directly using the HDF4 library. Thus, these files are different examples of "hybrid" files, containing both HDF-EOS and regular HDF objects.

Table 1. Sample Data Used (from [7])

Case (link to heosls)  Original file from HDF-EOS SAMPLER                    HDF4 size (bytes)
ceres                  CER_ES8_Terra-FM2_Test_SCF_016011.20000830.                  76,196,301
                       subset_70_20_-140_-40.20001012_204110Z.hdf
aster                  ASTL1B_0008301851.hdf                                       124,464,518
mod02hk                MODIS/MOD02HKM.A2000242.0140.002.2000247230108.hdf          275,064,875
mod04-242              MODIS/MOD04_L2.A2000242.0140.002.2000264223516.hdf           10,455,467
mod04-243              MODIS/MOD04_L2.A2000243.1850.002.2000252164712.hdf           10,455,471
mod05                  MODIS/MOD05_L2.A2000243.1850.002.2000252164414.hdf           20,149,501
mod06                  MODIS/MOD06_L2.A2000243.1850.002.2000252173103.hdf           69,590,548
mod35                  MODIS/MOD35_L2.A2000243.1850.002.2000244222700.hdf           47,404,592

For example, the heosls utility [2] shows the contents of ceres.hdf  (slightly edited for space):

FILE NAME: ./ceres.hdf                
NCSA HDF Version 4.1 Release 2, March 1998                                    
HDF-EOS Version: HDFEOS_V2.6                                                  
"CERES_ES8_subset"  SWATH                                                     
"StructMetadata.0"  Global Attribute                                          
"coremetadata"  Global Attribute                                              
"archivemetadata"  Global Attribute                                           
"SubsetMetadata.0"  Global Attribute                                          
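VDATA   "CERES_metadata"  (CERES)

This file contains one Swath and the corresponding StructMetadata.0. The HDF file also contains four important HDF4 objects: the global attributes "coremetadata", "archivemetadata", and "SubsetMetadata.0", and the Vdata "CERES_metadata". The Vdata and annotations are not managed by the HDF-EOS library.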

The other test files had different objects, but all had both EOS and non-EOS objects.

2.2. Procedure

The experiment used the heconvert utility [4] to convert the sample data files from HDFEOS4 to HDFEOS5. The heconvert utility was augmented with calls using the prerelease of the NCSA h3toH5 Library [5], in order to additionally convert some or all of the non-EOS objects.

The software configuration is summarized in Table 2.


 
Table 2. Summary of Configuration

heconvert       ?
HDF-EOS (4)     V. 2.6
HDF4            4.1.R5
HDF-EOS 5       5.1
HDF5            5.1.4.2-patch1
libh3toh5       1.0 beta (pre release)
Sun SPARC 5     Solaris 2.7
NFS partition   10Mbit/s network

In the control condition, the heconvert utility was run on the test input file, with a command similar to:

heconvert -i ceres.hdf -o ceres-cont.he5

For the experimental condition, the source code of heconvert was modified to add a single subroutine that attempts to convert the regular HDF4 objects in the file into the HDFEOS5 file. The subroutine consists of a series of calls to the h3toH5 Library [5]. These calls locate and read the objects of interest in the HDF4 file, and write equivalent objects to the HDF5 file. The experimental code handles file annotations, global SDS attributes, images, Vdata tables, and SDS datasets that are not part of an HDF-EOS object.

A sketch of this code is given in Appendix 3. The experimental code is called when the '-hybrid' option is selected, e.g.:

heconvert -hybrid -i ceres.hdf -o ceres-exper.he5

Each conversion was repeated at least five times to estimate a best case time. The output files were examined with the standard h5dump utility and other tools.


3. Results

Table 3 shows the conversion times and the size of the converted HDF5 files. As would be expected, the converted files are almost the same size as the input file. The files from the experimental condition were slightly larger than the control, reflecting the additional objects that were converted.
 
Table 3. Results: Size of output file (bytes) and run time of conversion (mm:ss).
(Links from col. 3 and 5 to listings of the output files.)

Data file  HDF4 size    Control: HDF5 size  Control: time (mm:ss)  Exper: HDF5 size  Exper: time (mm:ss)
mod04-242  10,455,467   11,255,895          0:31                   11,314,359        0:32
mod04-243  10,455,471   11,255,895          0:32                   11,314,367        0:32
mod05      20,149,501   20,906,324          0:50                   20,964,036        0:50
mod35      47,404,592   47,623,160          2:00                   47,682,720        2:03
mod06      69,590,548   69,155,580          3:00                   69,216,068        2:59
ceres      76,196,301   77,127,264          2:51                   77,183,172        2:13
aster      124,464,518  123,872,432         5:40                   123,982,208       5:39
mod02hk    275,064,875  275,876,364         11:11                  276,082,296       11:10

The conversion times varied because of system and network load.  The times reported in Table 3 are the best observed time from at least 5 trials.  As would be expected, these times are highly correlated with the size of the data, and show a conversion rate of about 700-800 KB/s across the different files.  The network disk has a maximum theoretical speed of about 1100 KB/s, so the conversion appears to be substantially I/O bound, as might be expected.

The content of the output files was examined using the h5dump utility. As expected, in the control condition, the product specific objects were not copied by heconvert, and consequently they are missing from the output file.

For example, in the case of the ceres.hdf file, the control conversion created an output file with the Swath and the StructMetadata.0. The other objects are not present in the output. (Appendix 1.)

The other datasets were similar: the HDF-EOS objects are converted, but the product specific objects are not.

In the experimental condition, the output files have all the objects converted by the control condition, plus some or all of the other objects. The listings are linked from the file size in Table 3.

For example, for the ceres.hdf file, the HDF5 file contains the same HDF5 objects for the Swath and the HDF-EOS metadata as the control. In addition, the product specific annotations are copied to the output file (as attributes of "/") and the Vdata is created (as a Compound Dataset under "/"). Appendix 2 shows a summary of the HDF-5 file. The non-EOS HDF4 objects are highlighted.

In some cases, the demonstration program does not convert all of the HDF4 objects. For example, the aster.hdf dataset has several Vgroups with product-specific Vdatas and SDSs (e.g., the Vgroup "Ancillary_Data"). These objects were not converted, so the output HDF5 file is still incomplete.

The h3toH5 Library can convert Vgroups and their members. However, in order for a general purpose program to identify which Vgroups are from HDF-EOS and which are not, it is necessary to check every Vgroup individually. This was not attempted for this experiment. We would expect that users would create product-specific conversion utilities, in which case the objects that need to be converted from the HDF4 file should be well understood for each product.


4. Discussion and Conclusions

This experiment demonstrates the use of the h3toH5 Library to construct a customized conversion utility for "hybrid" HDF-EOS files. The standard heconvert utility was extended with approximately fifty lines of code to handle several classes of standard HDF4 objects. The conversion accurately detects many of the additional objects and performs a complete conversion of them into the HDF5 file. The running time is essentially identical to that of the original utility.

It should be noted that the h3toH5 Library performs a default conversion of the HDF4 objects, which may not be the desired result in all cases.  It seems likely that when a data product is developed for HDF5, it will be designed to use HDF5 most effectively, which need not and should not be expected to conform to the default mapping in [6].

For example, the way HDFEOS5 stores the Grid, Swath, and Point objects in HDF5 is not the default conversion of the constituent HDF4 objects. Note, for example, that in HDFEOS4 the 'StructMetadata.0' is stored as a global annotation. In HDFEOS5, this is stored as a Dataset (which is definitely the best choice), rather than an attribute. This is an example of a case where the design for HDF5 should not follow the default mapping.

The non-HDF-EOS objects in datasets may well deserve non-default conversions as well. For example, in the ceres.hdf dataset, the HDF4 objects are created in a default location (under "/"), and the attributes have default names, such as "coremetadata_GLOSDS". For a realistic conversion, it is likely that these would be put in more appropriate locations in the output file, under the Group "/HDFEOS/ADDITIONAL", or some other place in the file. Thus, a product-specific converter should design the desired HDF5 file, and then create a custom conversion to implement it.

The h3toH5 Library API has optional parameters which can customize the conversion. For example, the group and name of the HDF5 object to be created can be specified. In this demonstration, the conversion used the default locations and names for the objects it created. For a product-specific conversion, parameters could be set to implement the desired layout of the HDF5 file. The h3toH5 Library cannot do all possible conversions; there will likely be cases where the conversion must be specifically designed for a dataset or project. For example, converting the annotation 'StructMetadata.0' to an HDF5 Dataset cannot be done by the h3toH5 Library.

It is important to point out that the h3toH5 Library can be mixed with other calls to HDF4 and HDF5.  It would be possible to insert one or more objects from an HDF4 file (or from several different HDF4 files) into an HDF5 file along with other objects written through HDF5.  Also, the converted objects can be modified after they are converted. For example, we have used the NCSA H5View [9] program to delete and rename attributes created by the conversion library that are not needed in HDF5.

In conclusion, this experiment shows that the h3toH5 Library provides a toolkit to more easily construct conversion utilities for NASA HDFEOS files. Specifically, we showed that the h3toH5 Library could be used to extend the heconvert utility to handle at least some hybrid files.

This toolkit might be used to create a standard converter for standardized data products that are defined in both HDF4 and HDF5. It might also be used for more ad hoc conversions, e.g., for a small science team that needs to convert HDF output from a program to be read using HDF5, or a future data service that needs to construct a value-added data product based on data from both HDFEOS4 and HDFEOS5.

Overall, it is clear that it is feasible to convert HDFEOS4 files to HDFEOS5 when needed. For some purposes and users, it may be sufficient to continue to use current HDFEOS4 data and add future HDFEOS5 data when needed. In other cases, it may be necessary to migrate software to HDFEOS5, and to convert HDFEOS4 data into HDFEOS5. In this experiment, we have shown that both of these options are technically viable.

Acknowledgments

This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA grants NAG 5-2040 and NCC5-599. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Other support was provided by NCSA and other sponsors and agencies [10].


References

1. HDF

2. HDF-EOS 4 (HE4)

3. HDF-EOS 5 (HE5)

4. heconvert

5. HDF (4.x) and HDF5

6. Mapping HDF4 Objects to HDF5 Objects

7. "EOSDIS Terra Data Sampler  #1", 2000

8. "HDFEOS ASCII dumper"

9. Java-HDF5

10. Acknowledgements


Appendix 1.  Summary of HDF-5 file, control condition


Output of h5dump -H, edited for space.  See also the file.

HDF5 "ceres-control.he5" {
GROUP "/" {
   GROUP "HDFEOS" {
      GROUP "ADDITIONAL" {
         GROUP "FILE_ATTRIBUTES" {
         } 
      } 
      GROUP "SWATHS" {
         GROUP "CERES_ES8_subset" {
            GROUP "Data Fields" {
               DATASET "CERES LW flux at TOA" {} 
               DATASET "CERES LW unfiltered radiance" {} 
               DATASET "CERES SW filtered radiance" {} 
               DATASET "CERES SW flux at TOA" {} 
               DATASET "CERES SW unfiltered radiance" {} 
               DATASET "CERES TOT filtered radiance" {} 
               DATASET "CERES WN filtered radiance" {} 
               DATASET "CERES WN unfiltered radiance" {} 
               DATASET "CERES relative azimuth at TOA" {} 
               DATASET "CERES solar zenith at TOA" {} 
               DATASET "CERES viewing zenith at TOA" {} 
               DATASET "Colatitude of Sun at observation" {} 
               DATASET "Colatitude of satellite nadir at record end" {} 
               DATASET "Colatitude of satellite nadir at record start" {} 
               DATASET "ERBE scene identification at observation" {} 
               DATASET "Earth-Sun distance at record start" {} 
               DATASET "Longitude of Sun at observation" {} 
               DATASET "Longitude of satellite nadir at record end" {} 
               DATASET "Longitude of satellite nadir at record start" {} 
               DATASET "Rapid retrace flag words" {} 
               DATASET "SW channel flag words" {} 
               DATASET "Scanner FOV flag words" {} 
               DATASET "TOT channel flag words" {} 
               DATASET "WN channel flag words" {} 
               DATASET "X component of satellite position at record end" {} 
               DATASET "X component of satellite position at record start" {} 
               DATASET "X component of satellite velocity at record end" {} 
               DATASET "X component of satellite velocity at record start" {} 
               DATASET "Y component of satellite position at record end" {} 
               DATASET "Y component of satellite position at record start" {} 
               DATASET "Y component of satellite velocity at record end" {} 
               DATASET "Y component of satellite velocity at record start" {} 
               DATASET "Z component of satellite position at record end" {} 
               DATASET "Z component of satellite position at record start" {} 
               DATASET "Z component of satellite velocity at record end" {} 
               DATASET "Z component of satellite velocity at record start" {} 
            } 
            GROUP "Geolocation Fields" {
               DATASET "Colatitude of CERES FOV at TOA" {} 
               DATASET "Longitude of CERES FOV at TOA" {} 
               DATASET "Time of observation" {} 
            } 
            GROUP "Profile Fields" {
            } 
         } 
      } 
   } 
   GROUP "HDFEOS INFORMATION" {
      ATTRIBUTE "HDFEOSVersion" { 
      } 
      DATASET "StructMetadata.0" {
      } 
   } 
} 
}

Appendix 2.  Description of HDF5 file for experimental condition.

Output of h5dump -H, edited for space.  See also the file. Non-EOS objects highlighted.
HDF5 "ceres-exper.he5" {
GROUP "/" {
   ATTRIBUTE "coremetadata_GLOSDS" {
   } 
   ATTRIBUTE "archivemetadata_GLOSDS" { 
   } 
   ATTRIBUTE "SubsetMetadata.0_GLOSDS" {
   } 
   DATASET "CERES_metadata" {
      DATATYPE  H5T_COMPOUND {
         "SHORTNAME";
         "RANGEBEGINNINGDATE";
         "RANGEBEGINNINGTIME";
         "RANGEENDINGDATE";
         "RANGEENDINGTIME";
         "AUTOMATICQUALITYFLAG";
         "AUTOMATICQUALITYFLAGEXPLANATION";
         "ASSOCIATEDPLATFORMSHORTNAME";
         "ASSOCIATEDINSTRUMENTSHORTNAME";
         "LOCALGRANULEID";
         "LOCALVERSIONID";
         "CERPRODUCTIONDATETIME";
         "NUMBEROFRECORDS";
         "PRODUCTGENERATIONLOC";
      }        
      ATTRIBUTE "HDF4_OBJECT_TYPE" {
      } 
      ATTRIBUTE "HDF4_OBJECT_NAME" {
      } 
      ATTRIBUTE "HDF4_REF_NUM" {} 
   } 
   GROUP "HDFEOS" {
      GROUP "ADDITIONAL" {
         GROUP "FILE_ATTRIBUTES" {
         } 
      } 
      GROUP "SWATHS" {
         GROUP "CERES_ES8_subset" {
            GROUP "Data Fields" {
               DATASET "CERES LW flux at TOA" {} 
               DATASET "CERES LW unfiltered radiance" {} 
               DATASET "CERES SW filtered radiance" {} 
               DATASET "CERES SW flux at TOA" {} 
               DATASET "CERES SW unfiltered radiance" {} 
               DATASET "CERES TOT filtered radiance" {} 
               DATASET "CERES WN filtered radiance" {} 
               DATASET "CERES WN unfiltered radiance" {} 
               DATASET "CERES relative azimuth at TOA" {} 
               DATASET "CERES solar zenith at TOA" {} 
               DATASET "CERES viewing zenith at TOA" {} 
               DATASET "Colatitude of Sun at observation" {} 
               DATASET "Colatitude of satellite nadir at record end" {} 
               DATASET "Colatitude of satellite nadir at record start" {} 
               DATASET "ERBE scene identification at observation" {} 
               DATASET "Earth-Sun distance at record start" {} 
               DATASET "Longitude of Sun at observation" {} 
               DATASET "Longitude of satellite nadir at record end" {} 
               DATASET "Longitude of satellite nadir at record start" {} 
               DATASET "Rapid retrace flag words" {} 
               DATASET "SW channel flag words" {} 
               DATASET "Scanner FOV flag words" {} 
               DATASET "TOT channel flag words" {} 
               DATASET "WN channel flag words" {} 
               DATASET "X component of satellite position at record end" {} 
               DATASET "X component of satellite position at record start" {} 
               DATASET "X component of satellite velocity at record end" {} 
               DATASET "X component of satellite velocity at record start" {} 
               DATASET "Y component of satellite position at record end" {} 
               DATASET "Y component of satellite position at record start" {} 
               DATASET "Y component of satellite velocity at record end" {} 
               DATASET "Y component of satellite velocity at record start" {} 
               DATASET "Z component of satellite position at record end" {} 
               DATASET "Z component of satellite position at record start" {} 
               DATASET "Z component of satellite velocity at record end" {} 
               DATASET "Z component of satellite velocity at record start" {} 
            } 
            GROUP "Geolocation Fields" {
               DATASET "Colatitude of CERES FOV at TOA" {} 
               DATASET "Longitude of CERES FOV at TOA" {} 
               DATASET "Time of observation" {} 
            } 
            GROUP "Profile Fields" {
            } 
         } 
      } 
   } 
   GROUP "HDFEOS INFORMATION" {
      ATTRIBUTE "HDFEOSVersion" {
      } 
      DATASET "StructMetadata.0" {} 
   } 
} 
}

Appendix 3.  Sketch of the demonstration code.

void get_the_rest(char * h3file, char *h5file);

int main (int argc, char *argv[])
{
 /* ... */
        status = DoSwathConversion(hdf4Info);
 /* ... */
        status = DoGridConversion(hdf4Info);
 /* ... */
        status = DoPointConversion(hdf4Info);

/***************************************************************************
  All EOS objects have been recreated in the output file.
  Now pick up at least some of the regular HDF objects, i.e.,
  not managed by HDF-EOS.
***************************************************************************/
    if (convertHybrid == CONVERT_TRUE) {
        get_the_rest(inNameGlobal, outNameGlobal);
        /* check for errors ... */
    }

    return 0;
}

/*
 *  Demonstration of the use of libh3toh5
 *
 *  This routine finds some of the HDF objects that are not managed
 *  by the HDF-EOS library and does a default conversion.
 *
 *  This is a generic routine; you would write a customized
 *  version for specific data products.
 */
void get_the_rest(char * h3file, char *h5file)
{
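    hid_t h5fid;    /* HDF5 file, reopened to remove duplicate attributes */
    hid_t h5gid;    /* HDF5 root group "/" */
    hid_t h35id;    /* h3toH5 Library conversion handle */
    herr_t res;     /* status of the most recent call */
    int32 nvd;      /* result of the Vdata conversion call */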

    h35id = h3toh5open(h3file, h5file, h325_OPEN);

    /*
     *  HDF-EOS doesn't use file annotations, so any we find
     *  should be converted.
     */
    h3toh5annofil_alldescs(h35id);
    h3toh5annofil_alllabels(h35id);

    /*
     *  HDF-EOS uses SDS Global attributes for 'StructMetadata.0',
     *  and HDFEOSVersion.
     *
     *  Other SDS global attributes have important product 
     *  metadata but are not managed by HDF-EOS.
     *
     *  This section moves them all to default HDF-5 attributes,
     *  and then gets rid of the ones already moved by heconvert.
     */
    h3toh5_glosdsattr(h35id); /* this moves everything */

    /* Find duplicate HDFEOSVersion, StructMetadata.0 and delete
     *  from output file.
     */
    h5fid = H5Fopen(h5file,H5F_ACC_RDWR, H5P_DEFAULT);
    if (h5fid >=0 ) {
        h5gid = H5Gopen(h5fid,"/");
        res = H5Adelete(h5gid, "HDFEOSVersion_GLOSDS" ) ;
        res = H5Adelete(h5gid, "StructMetadata.0_GLOSDS" ) ;
        H5Gclose(h5gid);
        H5Fclose(h5fid);
    }

   /*
    *  The file may have images, e.g. a browse image.
    *
    *  Find them and move to HDF5.
    */
    res = h3toh5allloneimage(h35id, "/", NULL, h325_ALLATTRS, h325_PAL);

   /*
    *  The file may have Vdata tables.
    *
    *  Find them and move to HDF5.
    */

    nvd = h3toh5alllonevdata(h35id, "/", h325_ALLATTRS);
   /*
    *  The file may have SDS datasets.
    *
    *  Find them and move to HDF5.
    */

    res = h3toh5alllonesds(h35id, "/", NULL, h325_DIMSCALE, h325_ALLATTRS);
   /*
    *  ... there may be other objects as well, which need to be located
    *    and converted.
    */

    h3toh5close(h35id);
}

Appendix 4. Configuration and build

When compiling, it is necessary to compile/link with HDF4, HDF-EOS2.x, HDF5, HDF-EOS5.x, and libh3toh5.  The following gives a sketch of a Unix makefile.
#
#  Need paths to everything!
#

# HDF4 and HDFEOS2.x
HDF=/usr/local/hdf
HE2=/usr/local/hdfeos

# HDF5 and HDFEOS5.x
HE5=/usr/local/hdfeos5
HDF5=/usr/local/hdf5

# The libh3toh5 installed
h35=/usr/local/h3toh5

CFLAGS = -I$(HDF5)/include -I$(HE2)/include -I$(HE5)/include -I$(HDF)/include -I$(h35)/include

LIBS = -L$(h35)/lib -L$(HDF5)/lib -L$(HDF)/lib -L$(HE2)/lib/sun5 -L$(HE5)/lib/sun5 \
       -lh3toh5 -lhdfeos -lhe5_hdfeos -lhdf5 -lmfhdf -ldf -lz -ljpeg -lGctp \
       -lnsl -lm -lsocket -lnsl

heconvert: convert.o
	$(CC) -o heconvert convert.o $(LIBS)

convert.o: convert.c

For other build environments, such as Windows or Macintosh, equivalent include and library paths are needed.