ABOUT HDF4.0 Release 1
February 7, 1996

INTRODUCTION

This document describes the differences between HDF4.0r1 and HDF3.3r4. It is written for people who are familiar with previous releases of HDF and wish to migrate to HDF4.0r1. The documentation and release notes provide more in-depth information on the topics discussed here. The HDF 4.0 documentation can be found on the NCSA ftp server in the directory:

    /HDF/Documentation/HDF4.0/Users_Guide

For more history behind the implementation of the items listed here, refer to the ABOUT_4.0.alpha, ABOUT_4.0b1, and ABOUT_4.0b2 files.

First-time HDF users are encouraged to read the FAQ in this release for more information about HDF. Users can also look at the HDF home page at:

    http://hdf.ncsa.uiuc.edu/

If you have any questions or comments, please send them to:

    hdfhelp@ncsa.uiuc.edu

CONTENTS
- Important Changes (that will affect you)
- New Features and Changes
- Changes in Utilities
- Known Problems

Important Changes:
-----------------

1. Several changes have been made to the libraries in HDF4.0 which affect the way users compile and link their programs:

   * The mfhdf library has been renamed libmfhdf.a; in previous releases it was libnetcdf.a.

   * The HDF 4.0 libraries now use v5 of the Independent JPEG Group (IJG) JPEG file access library.

   * The gzip library libz.a has been added in HDF4.0r1 to support "deflate" style compression of any object in HDF files.

   Due to these changes, users are required to specify four libraries when compiling and linking a program: libmfhdf.a, libdf.a, libjpeg.a, and libz.a, even if the program does not use JPEG or gzip compression. For example:

   For C:

        cc -o myprog myprog.c -I<path to HDF include directory> \
           <path to libmfhdf.a> <path to libdf.a> <path to libjpeg.a> <path to libz.a>

   or

        cc -o myprog myprog.c -I<path to HDF include directory> \
           -L<path to HDF library directory> -lmfhdf -ldf -ljpeg -lz

   For FORTRAN:

        f77 -o myprog myprog.f \
           <path to libmfhdf.a> <path to libdf.a> <path to libjpeg.a> <path to libz.a>

   or

        f77 -o myprog myprog.f -L<path to HDF library directory> \
           -lmfhdf -ldf -ljpeg -lz

   NOTE: The order of the libraries is important: libmfhdf.a first, then libdf.a, followed by libjpeg.a and libz.a.
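   The same link order can be recorded once in a makefile. This is a minimal sketch, not part of the distribution; the installation paths below are placeholders you must adjust for your site:

```makefile
# Placeholder paths -- point these at your HDF4.0r1 installation.
HDFINC = /usr/local/hdf/include
HDFLIB = /usr/local/hdf/lib

# Library order matters: libmfhdf.a, then libdf.a, libjpeg.a, libz.a.
LIBS   = -L$(HDFLIB) -lmfhdf -ldf -ljpeg -lz

myprog: myprog.c
	cc -o myprog myprog.c -I$(HDFINC) $(LIBS)
```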
   This is also discussed in Items 1, 2, and 3 of the New Features and Changes section of this document.

2. The HDF 4.0 library will ONLY compile with ANSI C compilers. See Item 4 in the New Features and Changes section of this document for more information.

3. The HDF library and netCDF library on Unix systems can now be automatically configured and built with one command. See Item 5 in the New Features and Changes section of this document for more information.

4. In HDF 4.0, the FORTRAN include files dffunct.i and constant.i have been renamed dffunct.inc and hdf.inc. See Item 16 in the New Features and Changes section of this document for more information.

5. Platforms tested: IRIX (5.3 and 6.1, 32-bit and 64-bit), SunOS 4.1.4, Solaris (2.4, 2.5), Solaris x86, HP-UX, Digital Unix, AIX, Linux (a.out), CM5, YMP, FreeBSD, C90, Exemplar, OpenVMS, and SP2 (single node only). HDF4.0r1 is not yet available on the Macintosh.

6. HDF 4.0 binaries for each tested platform are available. Unix binaries are located in the bin/ directory; binaries for Windows NT are located in the zip/ directory.

New Features and Changes:
------------------------

1. Changes to the mfhdf library

   The mfhdf library has been renamed libmfhdf.a; in previous releases it was libnetcdf.a. To link a program with the HDF4.0r1 libraries, four libraries are required: libmfhdf.a, libdf.a, libjpeg.a, and libz.a. See Item 1 of 'Important Changes' for examples of how to compile and link your programs.

2. JPEG Group v5 library

   The HDF 4.0 libraries now use v5 of the Independent JPEG Group (IJG) JPEG file access library. The JPEG library must be linked with user applications whether or not they use JPEG compression. See Item 1 of 'Important Changes' for examples of how to compile and link your programs.

3. Gzip library added

   New with this release is support for gzip "deflate" style compression of any object in an HDF file.
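   As a sketch of what requesting deflate compression looks like from the SD interface (the file name, dataset name, sizes, and compression level here are made up for illustration, and error checking is omitted):

```c
#include "mfhdf.h"

int main(void)
{
    int32     sd_id, sds_id;
    int32     dims[1] = {1000};
    comp_info c_info;

    sd_id  = SDstart("example.hdf", DFACC_CREATE);
    sds_id = SDcreate(sd_id, "pressure", DFNT_FLOAT32, 1, dims);

    /* Request gzip "deflate" compression; the level runs from 1 to 9. */
    c_info.deflate.level = 6;
    SDsetcompress(sds_id, COMP_CODE_DEFLATE, &c_info);

    /* ... write the data with SDwritedata() ... */

    SDendaccess(sds_id);
    SDend(sd_id);
    return 0;
}
```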
   This is supported through the standard compression interface function calls (HCcreate, SDsetcompress, GRsetcompress). The ABOUT_4.0b2 file contains additional information on this. See Item 1 of 'Important Changes' for examples of how to compile and link your programs.

4. ANSI C only

   As previously noted in the HDF newsletters, this release of the HDF library will compile only with ANSI C compilers. The shift to ANSI C compliance has been accompanied by a large clean-up of the source code. An attempt has been made to remove all warnings and informational messages that the compilers on supported platforms occasionally emit, but the build may not be completely clean at all user sites.

5. Auto configuration

   Both the HDF library and the netCDF library on Unix systems now use the same configure script and can be configured uniformly with one command. See the README and INSTALL files at the top level of HDF4.0r1 for detailed instructions on configuration and installation.

   A consequence of the auto configuration is that on Unix systems without FORTRAN installed, the machine-specific config/mh-* file at the top level must have the 'FC' macro defined as "NONE" for correct configuration.

6. New version of dimension record

   In HDF4.0b1 and previous releases of the SDS interface, a vgroup was used to represent a dimension. The vgroup had a single-field vdata with a class of "DimVal0.0". The vdata had one record per dimension value, each record holding a fake value: 0, 1, 2, ..., (dimension size - 1). The fake values were not really required and took up a large amount of space. For applications that created large one-dimensional array datasets, the disk space taken by these fake values almost doubled the size of the HDF file.

   In order to omit the fake values, a new version of the dimension vdata was implemented. The new version uses the same structure as the old version.
   The only differences are that the vdata has only one record, whose value is the dimension size, and that the vdata's class is "DimVal0.1", to distinguish it from the old version. No change was made for unlimited dimensions.

   Functions added to support this are:

   - SDsetdimval_comp -- sets the backward compatibility mode of a dimension. The default mode is compatible in HDF4.0r1, and will be incompatible in HDF4.1. See the SDsetdimval_comp(3) man page for details.

   - SDisdimval_bwcomp(dimid) -- gets the backward compatibility mode of a dimension. See the SDisdimval_bwcomp(3) man page for details.

7. Reading CDF files

   With HDF 4.0, limited support for reading CDF files was added to the library. This support is still somewhat in the development stage and is therefore limited. To begin with, unlike the netCDF merger, the CDF API is not supported. Rather, the SD and netCDF APIs can be used to access information pulled out of CDF files. The files supported are limited to CDF 2.X files; the header information differs between CDF 1.X and 2.X files. In addition, all of the files must be stored as single-file CDFs in network encoding. If there is user demand, and support, the types of CDF files that are readable may be increased in the future.

8. Parallel I/O interface on CM5

   An extension using the parallel I/O on the CM5 has been added to the SDS interface. Initial tests have yielded about 25 MBytes/second I/O throughput using the SDA (Scalable Disk Array) file system. The library provides interfaces for both the C* and CMF programming languages. The ABOUT_4.0.alpha file has more information on this. Examples can be found in the directory mfhdf/CM5/Examples.

   The parallel I/O interface stores scientific datasets in external files. New options have been added to hdfls and hdfpack to handle them. A new utility, hdfunpac, was also created for external file handling.

9. Support for SGI Power Challenge running IRIX 6.1

   The Power Challenge is now supported, in both the native 64-bit and the 32-bit object modes. Note that the Power Challenge native 64-bit objects use 64-bit long integers. Users should be careful when using the netCDF interface: declare variables as "nclong", not "long".

10. Multi-file Annotation Interface (ANxxx)

   The multi-file annotation interface is for accessing file labels and descriptions, and object labels and descriptions. It allows users to keep more than one file open at a time, and to access more than one annotation at a time. It also allows multiple labels and multiple descriptions to be applied to an HDF object or HDF file.

11. Multi-file Raster Image (GRxxx) interface

   The new Generic Raster (GR) interface provides a set of functions for manipulating raster images of all kinds. This interface allows users to keep more than one file open at a time, and to "attach" more than one raster image at a time. It supports a general framework for attributes within the RIS data model, allowing 'name = value' style metadata, and it allows access to subsamples and subsets of images. The GRreqlutil and GRreqimageil functions allow for different methods of interlacing images in memory. The images are interlaced in memory only; they are actually written to disk in "pixel" interlacing.

12. Compression for HDF SDS

   Two new compression functions have been added to the SD interface for HDF 4.0: SDsetcompress and SDsetnbitdataset.

   SDsetcompress allows users to compress a scientific dataset using any of several compression methods. Initially three schemes are available: RLE encoding, an adaptive Huffman compression algorithm, and gzip "deflate" compression.

   SDsetnbitdataset allows a scientific dataset to be stored using integers whose size is any number of bits between 1 and 32 (instead of being restricted to 8-, 16-, or 32-bit sizes).
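   A sketch of creating an n-bit dataset follows; the file name, dataset name, and the choice of a 12-bit field are illustrative only, and the exact meaning of the bit-position arguments should be checked against the SDsetnbitdataset man page:

```c
#include "mfhdf.h"

int main(void)
{
    int32 sd_id, sds_id;
    int32 dims[1] = {512};

    sd_id  = SDstart("nbit.hdf", DFACC_CREATE);
    sds_id = SDcreate(sd_id, "counts", DFNT_INT32, 1, dims);

    /* Store only 12 significant bits of each 32-bit value:
       leftmost bit of the field is bit 11, field length is 12,
       no sign extension, no filling of the unused bits. */
    SDsetnbitdataset(sds_id, 11, 12, 0, 0);

    /* ... write the data with SDwritedata() ... */

    SDendaccess(sds_id);
    SDend(sd_id);
    return 0;
}
```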
   Access to the data stored in an n-bit data item is transparent to the calling program. The ABOUT_4.0.alpha file has an in-depth description of this ("n-bit SDS", listed under Item 2).

13. External path handling

   New functions have been added to allow applications to specify directories in which to create or search for external files:

   - HXsetcreatedir (hxscdir in FORTRAN)
   - HXsetdir (hxsdir in FORTRAN)

14. I/O performance improvement

   HDF 4.0 unofficially supports file page buffering; with HDF 4.1 it will be officially supported. File page buffering allows the file to be mapped to user memory on a per-page basis, i.e., as a memory pool of the file. Page sizes can be set to the file system page size or to a multiple of it, which allows fewer pages to be managed and accommodates the user's file usage pattern. See the top-level INSTALL file and the release_notes/page_buf.txt file for building the library with this support and using it.

15. Improvement in memory usage and general optimizations

   Considerable effort was put into this release (since the b2 release) to reduce the amount of memory used per file and by the library in general. We believe the library's memory footprint during execution is now roughly half what it was, and that the library is more frugal about allocating large chunks of memory. Much time was also spent optimizing the low-level HDF routines to be faster than in the past. Applications that use files with many (1000+) datasets should notice significant improvements in execution speed.

16. FORTRAN include files renamed

   In hdf/ there are two files that FORTRAN programs include for the values and functions defined in HDF. They were originally named constant.i and dffunct.i. The .i extension caused problems on some machines, since *.i is used by cc for the "intermediate" files produced by the cpp preprocessor. In HDF 4.0, dffunct.i has been renamed dffunct.inc, and constant.i has been renamed hdf.inc. Existing FORTRAN application programs that include the .i files must be changed accordingly in order to compile with HDF4.0.

17. Limits file

   A new file, limits.txt, has been added to the ftp server. It is aimed at HDF application programmers and defines the upper bounds of HDF 4.0. This information is also found in the hdf/src/hlimits.h file. Refer to ABOUT_4.0.alpha for historical information on this.

18. Pablo available

   HDF4.0 supports creating an instrumented version of the HDF library (libdf-inst.a). This library, along with the Pablo performance data capture libraries, can be used to gather data about I/O behavior and procedure execution times. See the top-level INSTALL file and the hdf/pablo/README.Pablo file for further information.

19. Support for the IBM SP-2

   The HDF library has been ported to run on a single SP-2 node. It does not yet support parallel or distributed computing across multiple SP-2 nodes.

20. Miscellaneous fixes

   - To avoid conflicts with C++, internal structure fields named 'new' have been renamed.

   - The maximum number of fields in a vdata is now determined by VSFIELDMAX.

   - The platform number subclass problem when an external data file was in little-endian format has been fixed.

   - Unlimited dimensions were not handled correctly by the HDF3.3r4 FORTRAN interface. This problem has been fixed in HDF4.0r1.

Changes to utilities:
--------------------

o hdf/util/ristosds

  Ristosds now converts several raster images into a 3D uint8 SDS, instead of float32.

o hdf/util/hdfls

  New options have been added to support the parallel I/O interface on the CM5.

o hdf/util/hdfpack

  New options have been added to support the parallel I/O interface on the CM5.

o hdf/util/hdfunpac

  This is a new utility for external file handling for the parallel I/O interface on the CM5.
o mfhdf/dumper/hdp

  Hdp is a new command-line utility designed for quick display of the contents and data objects of HDF files. It can list the contents of HDF files at various levels of detail, and it can also dump the data of one or more specific objects in a file. See hdp.txt in the release notes for more information.

Known Problems:
--------------

o On the IRIX4 platform, fp2hdf creates float32 and float64 values incorrectly.

o On the SP-2, the hdp command gives a false "Failure to allocate space" message when the HDF file has no annotations to list.

o On the C90, hdfed fails inconsistently when opening HDF files more than once in the same hdfed session.

o There is currently a problem with re-writing data in the middle of compressed objects.

o VMS gives an error on the test for little-endian float64.

o If the external element test in hdf/test/testhdf fails and there is no subdirectory "testdir" in hdf/test/, create one via "mkdir" and run the test again. ("testdir" should have been created by "make", but "make" on some old systems does not support the creation commands.)
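  The workaround in the last item amounts to the following (run from the top of the HDF4.0r1 source tree; the -p flag also creates any missing parent directories):

```shell
mkdir -p hdf/test/testdir
```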