NCSA HDF Specification and Developer's Guide
HDF Software Overview
National Center for Supercomputing Applications
November 8, 1993

Chapter 2 Software Overview

Chapter Overview

This chapter describes the HDF software organization and provides guidelines for writing HDF software.

HDF is an amalgam of code and functionality from many sources. For example, the netCDF code came from the Unidata Program Center, and data compression and conversion software has been acquired from a variety of third parties. NCSA staff wrote the code for the basic HDF functionality and performed all of the integration work.

This document contains specifications for the NCSA-developed code and functionality. It does not include specifications for code or functionality from non-NCSA sources, though it does sometimes refer to specifications provided by other sources. Only the HDF interface to such code is specified in this document.

HDF Software Layers

There are three basic levels of HDF software:

• The HDF low level interface
• The HDF application interfaces
• HDF applications and utilities

The lowest layer, the low level interface, includes general purpose routines that form the basis of all higher-level HDF development. The low level routines directly execute functions such as file I/O, error handling, memory management, and physical storage.

The application interfaces support higher level views of data and provide the interfaces for building user-level applications. Routines to handle raster images, palettes, annotations, scientific data sets, Vdatas, and netCDF appear at this level.

The applications and utilities are implemented at the highest level. NCSA utilities, NCSA applications, and third party applications are all implemented at this level.
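The division of labor between the three layers can be illustrated with a toy example. The following is a minimal sketch, not HDF code: every name in it (ll_file, ll_write, ai_put_raster8, util_store_gradient) is hypothetical, and an in-memory buffer stands in for a file. The point is only that each layer calls the one directly below it: the low level moves bytes, the application interface turns "store a raster image" into byte writes, and the utility works purely in terms of images.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none is part of the HDF library. */

/* Low level: a fixed-size in-memory buffer stands in for an HDF file. */
typedef struct {
    unsigned char data[1024];
    size_t used;
} ll_file;

/* Low level: append raw bytes; return their offset, or -1 if full. */
long ll_write(ll_file *f, const void *buf, size_t len) {
    assert(f != NULL && buf != NULL);
    if (len > sizeof f->data - f->used)
        return -1;
    memcpy(f->data + f->used, buf, len);
    f->used += len;
    return (long)(f->used - len);
}

/* Application interface: store an 8-bit raster image using only the
 * low-level call. A real interface would also record tags, refs, and
 * descriptors; this sketch writes a 4-byte big-endian dimension header
 * followed by the pixels. */
long ai_put_raster8(ll_file *f, const unsigned char *pixels,
                    unsigned width, unsigned height) {
    unsigned char dims[4];
    size_t npix;
    long off;

    assert(f != NULL && pixels != NULL);
    npix = (size_t)width * height;
    if (width > 0xFFFF || height > 0xFFFF
        || sizeof dims + npix > sizeof f->data - f->used)
        return -1;               /* check space first: no partial writes */
    dims[0] = (unsigned char)(width >> 8);
    dims[1] = (unsigned char)(width & 0xFF);
    dims[2] = (unsigned char)(height >> 8);
    dims[3] = (unsigned char)(height & 0xFF);
    off = ll_write(f, dims, sizeof dims);
    ll_write(f, pixels, npix);
    return off;
}

/* Utility level: knows about images, not about bytes or offsets. */
int util_store_gradient(ll_file *f, unsigned width, unsigned height) {
    unsigned char img[256];
    size_t i, n = (size_t)width * height;

    if (n > sizeof img)
        return -1;
    for (i = 0; i < n; i++)
        img[i] = (unsigned char)i;   /* simple ramp of pixel values */
    return ai_put_raster8(f, img, width, height) >= 0 ? 0 : -1;
}
```

Because only the low level touches the buffer, the storage could be replaced (for example, by real file I/O) without changing the utility, which mirrors the layering described above.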
The utilities perform general functions, such as listing the contents of an HDF file, and more specialized functions, such as converting data from one HDF data type to another (e.g., raster images to scientific data sets). In general, the utilities have simple command line interfaces and perform data management tasks. The applications usually perform data analysis tasks and have polished interactive user interfaces. They include the NCSA Visualization Tool Suite, commercial software packages that use HDF, and other packages created at NCSA and by various third party projects.

HDF-based software comes in four basic forms:

• The HDF interface library
• HDF command line utilities
• HDF-based software tools
• User programs that store and retrieve data in HDF files

The HDF interface library includes general purpose routines that form the basis of all higher-level HDF development, and application interfaces that support higher level views of data.

The HDF command line utilities are distributed with the HDF library. They range from general purpose utilities, such as listing the contents of an HDF file, to special purpose utilities, such as converting data between HDF data types (e.g., raster images to scientific data sets). In general, the utilities have simple command line interfaces and perform data management tasks.

In contrast, HDF-based software tools usually perform data analysis tasks and have polished interactive user interfaces. They include the NCSA Visualization Tool Suite and commercial software packages that use HDF.

User programs access HDF files via calls to the HDF library. User programs are attached to the HDF library when they are compiled and linked.

Figure 2.1 illustrates the layered HDF software implementation. The general purpose modules, which perform basic I/O, are at the lowest level. Interfaces for commonly used objects, such as 8-bit raster images (RIS8) and multidimensional arrays (SDS), appear at the next level.
User programs, utilities, and software tools such as the NCSA visualization software are at the highest level.

[Figure 2.1. HDF Software Layers (see footnote 1)]

The general purpose interfaces are described in detail in this document. The application interfaces and command line utilities are described in the document NCSA HDF Calling Interfaces and Utilities for Versions 3.2 and earlier, and in the NCSA HDF User's Guide and NCSA HDF Reference Manual for Version 3.3. Other HDF-based software tools should have their own manuals.

Since the NCSA user community writes programs primarily in C and FORTRAN, all of the HDF application interfaces developed at NCSA are callable from both C and FORTRAN programs. Since the general purpose interface is primarily for program development, not for applications, it provides C-callable routines only.

Software Organization

Versions and Release Numbers

Since HDF is under continual development, new releases are periodically made available. Each new release of the HDF library is identified by a version number. The version number consists of three elements:

majorv    Major version number
minorv    Minor version number
rn        Release number

The version number is presented in the format majorv.minorvrrn, that is, the major version number, a period, the minor version number, the letter r, and the release number (e.g., Version 3.2r1).

These elements are interpreted as follows:

Major version number
A new major version number is assigned when there is some fundamental difference between a new version of the library and the previous version. When a new major version is released, HDF users and developers are strongly encouraged to obtain the new source code and documentation. Successive major versions will probably add functionality, and some obsolete code may be deleted. Some user code may have to be modified to use the new library.

Minor version number
A new minor version number indicates an intermediate release between one major version and the next. Changes will probably be significant.
When a new minor version is released, users and developers are strongly encouraged to obtain the new source code and documentation.

Release number
A new release number is assigned when bug fixes or other small modifications have been made. Using a new release of the same version of the library will not usually require modifying existing user code.

ANSI C and Portability

To ensure that HDF can be easily ported to new platforms, all versions of the HDF source code from Version 3.2 on will be written in ANSI standard C, with special provisions for non-ANSI compilers. For more information about porting HDF and writing portable HDF-based code, refer to Chapter 7, "Making HDF Portable."

Modules and Interfaces

The HDF distribution contains many source files, or modules, that can be grouped into families. For example, dfp.c, dfpf.c, and dfpff.f all share the root name dfp and, therefore, all belong to the dfp family. In general, each family of source modules represents one HDF application interface; the dfp family represents the HDF Palette Interface. Exceptions to this rule are discussed later in this section.

Each interface necessarily has one file containing the C code that provides its basic functionality. Some interfaces also have one or two additional modules that provide FORTRAN callability, so a family may have one, two, or three files:

1 file
Modules of this sort are generally not calling interfaces themselves; they provide useful support functions for actual calling interfaces. Since they are not meant to be called by any routine outside the HDF library, they do not need to be FORTRAN-callable. Example: hblocks.c is called only by internal HDF routines and has only the C-callable interface.

2 files
Although there are currently no two-file families, it is conceivable (and desirable) that some future interface will need only one extra source module to provide FORTRAN compatibility.
If this were to happen, there would be only two source modules for the interface. Example: dfnew.c and dfnewf.c would make up the New Interface.

3 files
Most current implementations of FORTRAN-callable HDF interfaces require that character string arguments be passed to some of their functions. Because C and FORTRAN represent strings differently, passing strings requires a small amount of special purpose FORTRAN code for each function that takes a string argument. Therefore, most FORTRAN-callable HDF interfaces consist of three source modules:

• The primary C module
• A FORTRAN-callable C module
• A FORTRAN module

Example: dfsd.c, dfsdf.c, and dfsdff.f make up the Scientific Data Set Interface. dfsd.c contains the basic functionality of the interface, dfsdf.c provides the major part of FORTRAN callability, and dfsdff.f contains the special purpose FORTRAN code that enables passing character string arguments.

Header Files

In addition to the source code modules discussed above, some interfaces also have associated C header files that C applications programmers include with the #include preprocessor directive. They contain useful constants and data structures for interacting with the interface from C programs. A header file has the same root name as the rest of its family, with the .h extension. For example, dfsd.h is the header file for the Scientific Data Set Interface.

Of particular importance among the C header files are hdf.h and hdfi.h:

hdf.h
Contains all the symbolic constants and public data structures required by HDF. hdf.h should be included by any program that uses any of these constants or data structures.

hdfi.h
Contains specific portability information about each platform on which HDF is supported. hdfi.h is automatically included in programs when hdf.h is included, so programmers need not include it explicitly.
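The reason string arguments need extra glue in the three-file families is that the two languages store text differently: a C string ends with a '\0' byte, while a FORTRAN CHARACTER variable has a fixed length, is padded with blanks, and has no terminator; how compilers pass its hidden length also differs between compilers. The sketch below shows the C half of one common bridging approach, assuming the FORTRAN side passes the declared length explicitly (for example, with LEN()). It is an illustration only, not a description of what dfsdf.c or dfsdff.f actually do, and fstring_to_cstring is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an HDF routine). Converts a FORTRAN
 * CHARACTER argument, received as a pointer plus an explicit length,
 * into a newly allocated NUL-terminated C string. Returns NULL if
 * allocation fails; the caller must free() the result. */
char *fstring_to_cstring(const char *fstr, int flen) {
    char *cstr;

    assert(flen >= 0 && (fstr != NULL || flen == 0));
    /* Trailing blanks are FORTRAN padding, not part of the value. */
    while (flen > 0 && fstr[flen - 1] == ' ')
        flen--;
    cstr = malloc((size_t)flen + 1);
    if (cstr == NULL)
        return NULL;
    if (flen > 0)
        memcpy(cstr, fstr, (size_t)flen);
    cstr[flen] = '\0';   /* add the terminator C expects */
    return cstr;
}
```

Trimming trailing blanks is a design choice: it makes a file name declared as CHARACTER*80 usable from C, at the cost of losing any blanks the caller meant to keep at the end.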
Refer to Chapter 7, "Making HDF Portable," for more information on hdfi.h and other portability issues.

By way of illustration, Table 2.1 lists selected families of source code modules and header files from HDF Version 3.3.

Table 2.1 Sample HDF Version 3.3 Source Code Modules

Family                  Files
General headers         hdf.h, hdfi.h, hproto.h, dfivms.h
General purpose         hfile.c, hfilef.c, hfileff.f, hkit.c, hblocks.c, hextelt.c, herr.c, herrf.c, hfile.h, herr.h
Grouping (non-Vset)     dfgroup.c, dfgroup.h
Utilities               dfutil.c, dfutilf.c, dfutilff.f, dfutil.h
Vsets                   vg.c, vgf.c, vgff.f, vfp.c, vgi.h, vio.c, vconv.c, vparse.c, vrw.c, vsfld.c, vg.h, vproto.h
Old general purpose     dfstubs.c, dff.c, dfff.f, df.h, dfi.h, dfstubs.h
8- and 24-bit rasters   dfr8.c, dfr8f.c, dfr8ff.f, df24.c, df24f.c, df24ff.f
General rasters         dfgr.c, dfgr.h, dfcomp.c, dfimcomp.c, dfrig.h
Palettes                dfp.c, dfpf.c, dfpff.f
Scientific data sets    dfsd.c, dfsdf.c, dfsdff.f, dfsd.h
Annotations             dfan.c, dfanf.c, dfanff.f, dfan.h
Special FORTRAN         constants.f, functions.f

The HDF Test Suite

In addition to the source code for the HDF library, versions 3.2 and higher include a test suite. There are two test modules: one for C and one for FORTRAN. Each module tests all of the routines in all of the application interfaces and in the general purpose interface. The exact form of these test modules may vary from one release to the next; consult the release code and online test documentation for details.
Every effort has been made to ensure that the test programs provide a thorough and accurate assessment of the health of the HDF library. Although the test suite will greatly improve the reliability of HDF code, it is almost inevitable that some parts of the code will remain untested. Therefore, no guarantees can be made on the basis of test suite performance.

Sample HDF Programs

Each HDF release includes several sample programs to help users write HDF programs. They illustrate some of the common techniques employed by HDF programmers.

Some HDF Conventions

The HDF specification described in the previous chapter is not, by itself, sufficient to guarantee HDF's success. It is also important that HDF programmers and users adhere to certain conventions. Some guidelines are implicit in the discussions in other sections of this document. Others are presented in the document NCSA HDF Calling Interfaces and Utilities (for Versions 3.2 and earlier) or in the NCSA HDF User's Guide and NCSA HDF Reference Manual (for Version 3.3). Guidelines not covered elsewhere are introduced in this section.

Naming and Assigning Tags

Tags that are to be made available to a general population of HDF users should be assigned and controlled by NCSA. Tags of this type are given numbers in the range 1 to 32,767. If you have an application that fits this criterion, contact NCSA at the address listed in the front matter of this manual and specify the tags you would like. For each tag, your specifications should include a suggested name, information about the type and structure of the data that the tag will refer to, and information about how the tag will be used. Your specifications should be similar to those contained in Chapter 6, "Tag Specifications." NCSA will assign a set of tags for your application and will include your tag descriptions in the HDF documentation.

Tags in the range 32,768 to 64,999 are user-definable. That is, you can assign them for any private application.
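The two tag ranges just described can be encoded directly, which makes it easy to catch a private tag that accidentally falls into the NCSA-controlled range. This is a minimal sketch: the enum, classify_tag, MY_PRIVATE_TAG, and checked_private_tag are hypothetical names, and tag values outside both ranges (0, and 65,000 and above) are simply reported as TAG_OTHER because this section does not describe them.

```c
#include <assert.h>

/* Tag number ranges as described in this section. */
typedef enum {
    TAG_NCSA_ASSIGNED,   /* 1 to 32,767: assigned and controlled by NCSA */
    TAG_USER_DEFINED,    /* 32,768 to 64,999: free for private use */
    TAG_OTHER            /* not covered by the rules in this section */
} tag_class;

tag_class classify_tag(unsigned long tag) {
    if (tag >= 1 && tag <= 32767)
        return TAG_NCSA_ASSIGNED;
    if (tag >= 32768 && tag <= 64999)
        return TAG_USER_DEFINED;
    return TAG_OTHER;
}

/* Hypothetical application-private tag; the value is arbitrary. */
#define MY_PRIVATE_TAG 40000UL

/* Development-time guard (compiled out under NDEBUG): fail loudly if a
 * private tag strays outside the user-definable range. */
unsigned long checked_private_tag(unsigned long tag) {
    assert(classify_tag(tag) == TAG_USER_DEFINED);
    return tag;
}
```

A guard like this only checks the range; it cannot detect a collision with someone else's private tag in the same range.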
If you use tags in the user-definable range, be aware that they may conflict with other people's private tags.

Using Reference Numbers to Organize Data Objects

Note: Users are discouraged from assigning any meaning to reference numbers beyond that imparted by the HDF library.

The HDF library itself uses reference numbers solely to distinguish among objects with the same tag. While application programmers may find it convenient to impart some meaning to reference numbers, they should be forewarned that the HDF library will be ignorant of any such meaning. In other words, any meaning attached to reference numbers exists only at the application program or software tool level, not within the HDF library. Although nothing prevents you from doing so, you are discouraged from assigning any meaning to reference numbers other than that which the HDF library imparts.

A tag/ref provides a unique identifier for any HDF object within a file and can be used for a keyed access mechanism. One can build a table of tag/refs and use those tag/refs as primary keys providing random access to HDF objects.

Reference numbers can also be used to impose an order on HDF objects. Once again, however, the reference number assignment scheme does not guarantee any order, so caution is advised.

Multiple References

Multiple references to a single data element are quite common in HDF. The general purpose routine Hdupdd generates a new reference to data that is already pointed to by another DD (data descriptor). If Hdupdd is used several times, there may be several DDs that point to the same data element.

It is important to note that when a multiply-referenced data element is deleted or moved, the various DDs that previously pointed to the data element are not automatically deleted or adjusted to point to the data element in its new location. Consequently, each DD to be deleted or moved should be checked for multiple references and handled appropriately.

Footnote 1 (Figure 2.1): This is a simplified illustration of the HDF software layers.
Though the basic principles illustrated here continue to apply, the introduction of netCDF and multiple-file HDF data structures renders the implementation considerably more complex.
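The keyed-access idea from "Using Reference Numbers to Organize Data Objects", a table of tag/refs used as primary keys, can be sketched in C with the standard qsort and bsearch routines. This is an application-level sketch under stated assumptions, not HDF library code: tagref_entry, tagref_sort, and tagref_find are hypothetical names, and the location field stands in for whatever the application needs to reach an object. Because reference numbers carry no guaranteed order, the table is sorted by its own comparison rather than relying on the order in which refs were assigned.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical index entry: a tag/ref pair plus whatever the application
 * needs to reach the object (here, an offset). Not an HDF structure. */
typedef struct {
    unsigned short tag;
    unsigned short ref;
    long location;
} tagref_entry;

/* Order entries by tag, then by ref, so the pair acts as a compound key. */
static int tagref_cmp(const void *a, const void *b) {
    const tagref_entry *x = a;
    const tagref_entry *y = b;
    if (x->tag != y->tag)
        return x->tag < y->tag ? -1 : 1;
    if (x->ref != y->ref)
        return x->ref < y->ref ? -1 : 1;
    return 0;
}

/* Sort once after the table is built... */
void tagref_sort(tagref_entry *table, size_t n) {
    assert(table != NULL);
    qsort(table, n, sizeof *table, tagref_cmp);
}

/* ...then look objects up directly by tag/ref with a binary search.
 * Returns NULL if the pair is not in the table. */
const tagref_entry *tagref_find(const tagref_entry *table, size_t n,
                                unsigned short tag, unsigned short ref) {
    tagref_entry key;

    assert(table != NULL);
    key.tag = tag;
    key.ref = ref;
    key.location = 0;
    return bsearch(&key, table, n, sizeof *table, tagref_cmp);
}
```

For example, a table built as {720, 1}, {300, 2}, {300, 1} (tag values chosen arbitrarily) is sorted to {300, 1}, {300, 2}, {720, 1}, after which any object can be found in logarithmic time by its tag/ref.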
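The check recommended under "Multiple References" can be sketched as follows. The sketch assumes that two DDs refer to the same data element exactly when their offset and length match, as would be the case after a DD is duplicated to create a second reference to existing data. dd_entry, dd_share_count, and dd_delete are hypothetical names, and dd_entry is a simplified stand-in, not HDF's internal DD structure. Deleting a DD reports whether the underlying data element may be reclaimed or must be kept because another DD still points to it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical data descriptor (DD): which object it names
 * (tag/ref) and where its data element is stored (offset/length). */
typedef struct {
    unsigned short tag, ref;
    long offset, length;
} dd_entry;

/* Count the DDs that point at the same data element as dds[i]
 * (the count includes dds[i] itself). */
size_t dd_share_count(const dd_entry *dds, size_t n, size_t i) {
    size_t j, count = 0;

    assert(dds != NULL && i < n);
    for (j = 0; j < n; j++)
        if (dds[j].offset == dds[i].offset && dds[j].length == dds[i].length)
            count++;
    return count;
}

/* Remove the DD at index i (the last entry is moved into its slot).
 * Returns 1 if no other DD still points at the data element, so its
 * space may be reclaimed; returns 0 if the element must be kept. */
int dd_delete(dd_entry *dds, size_t *n, size_t i) {
    int last_reference;

    assert(n != NULL && *n > 0);
    last_reference = dd_share_count(dds, *n, i) == 1;
    dds[i] = dds[*n - 1];
    (*n)--;
    return last_reference;
}
```

Moving a data element needs the same scan in the other direction: every DD whose offset and length match the old location would have to be updated to the new one.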