NCSA HDF Specification and Developer's Guide

		Portability Issues

7-1	National Center for Supercomputing Applications

November 8, 1993	7-1

                                                            


Chapter 7	Portability Issues


Chapter Overview

The NCSA implementation of HDF is accessible to both C 
and FORTRAN programs and is implemented on many 
different machines and several operating systems.  There are 
important differences between C and FORTRAN, and among 
implementations of each language, especially FORTRAN. 
There are also important differences among the machines 
and operating systems that HDF supports.  

If HDF is to be a portable tool, these differences must be 
constructively addressed.  This chapter describes many of 
these differences, discusses the problems and issues 
associated with them, and presents the methods employed in 
the HDF implementation to reduce their impact.

The HDF Environment

The list of machines and operating systems on which HDF is 
implemented is steadily growing.  For reasons that this 
chapter will make clear, the number of NCSA-supported HDF 
platforms is growing slowly.  Every time a platform is added, 
additional code must be written to address concerns of 
memory management, operating system and file system 
differences, number representations, and differences in 
FORTRAN and C implementations on that system.  


Supported Platforms 
As of this writing, NCSA supports the platforms listed in 
Table 7.1.

Table 7.1	NCSA-supported HDF Platforms
Hardware Platform	Operating System
Convex	Concentrix
Cray X-MP, Y-MP, Cray 2	UNICOS
DEC Alpha	Ultrix
DECStation	Ultrix
HP 9000	HPUX
IBM PC	MS DOS, Windows 3.1
IBM RS/6000	AIX
IBM RT	UNIX
Macintosh	MPW Shell
NeXT	NeXTStep
Silicon Graphics	UNIX
Sun Sparc	UNIX
Vax	VMS

HDF has also been ported to several platforms that NCSA 
does not currently support.  These include Alliant, Apollo 
(Domain), HP 3000,  Stellar, Amiga, Symbolics, Fujitsu, and 
IBM 3090 (MVS).


Language Standards
Unfortunately, not all compilers are the same.  FORTRAN 
compilers often differ in the ways they pass parameters, in 
the identifier naming conventions they employ, and in the 
number types that they support.  Similarly, though generally 
not as drastically, C compilers differ in the number types that 
they support and in their adherence to the ANSI C standard.  

To minimize the difficulties caused by these differences, the 
HDF source code is written primarily in the following dialects:
•	FORTRAN 77
•	ANSI C
•	The original C defined by Kernighan and Ritchie [1], 
hereafter referred to as old C
Almost all platforms have C and FORTRAN compilers that 
adhere to at least one of these standards.  

When time and resources permit, NCSA attempts to support 
features or variations in other dialects of C and FORTRAN, 
particularly on platforms that are important to NCSA users.  
Much of the remainder of this chapter addresses these efforts.


Guidelines
One cannot overstress the importance of following the 
guidelines outlined in this chapter.  It may take longer to 
write code and it may be difficult to adapt your coding style, 
but the long-term benefits, in terms of portability and 
maintenance costs, will be well worth the effort. 


Organization of Source Files

Three types of files appear in the HDF source code directory:
•	Header files
•	Source code files
•	A makefile
Header files and source code files are organized by 
application area.  All of the functions that apply to a 
particular application area are stored in three source files, 
and all the definitions and declarations that apply to that 
application are stored in a corresponding header file.  The 
makefile describes the dependencies among the source and 
header files and provides the commands required to compile 
the corresponding libraries and utilities.


Header Files
Certain application modules require header files.  The header 
file dfan.h, for example, contains definitions and declarations 
that are unique to the annotation interface.

There are also several general header files that are used in 
compiling the libraries for all application areas:
hdf.h, hdfi.h [2]
hdf.h contains declarations and definitions for the 
common data structures used throughout HDF, 
definitions of the HDF tags, definitions of error numbers, 
and definitions and declarations specific to the general 
purpose interface.  Since hdf.h depends on hdfi.h, it 
includes hdfi.h via #include.
	hdfi.h contains information specific to the various NCSA-
supported HDF computing environments, environmental 
parameters that need to be set to particular values when 
compiling the HDF libraries, and machine dependent 
definitions of such things as number types and macros for 
reading and writing numbers.  
	When porting HDF to a new system, only hdfi.h and the 
makefile should need to be modified, though there may 
be exceptions.  
	It is normally a good idea to include hdf.h  (and 
therefore indirectly hdfi.h) in user programs, though 
users usually need not be aware of its contents.
hproto.h
This file contains ANSI C prototypes for all HDF C 
routines.  It must be included in ANSI C programs that 
call HDF routines. 
constants.i
This file is for use in FORTRAN programs.  It contains 
important constants, such as tag values, that are defined 
in hdf.h.  Systems with FORTRAN preprocessors might 
be able to include this file via #include statements or 
their equivalent.
dffunc.i
This file is for use in FORTRAN programs.  It contains 
declarations of all HDF FORTRAN-callable functions.  
Systems with FORTRAN preprocessors might be able to 
include this file via #include statements or their 
equivalent. 


Source Code Files
All HDF operations are performed by routines written in C.  
Hence, even FORTRAN calls to HDF result in calls to the 
corresponding C routines.  Because of the problems described 
below, the relationships between the C routines and the 
corresponding FORTRAN routines can be confusing.  This 
section discusses the C and FORTRAN source file 
organization.  It is followed by discussions of problems users 
will face in the FORTRAN-C interface.

HDF interfaces typically have three or four associated files.  
For example, the scientific data set (SDS) interface is 
associated with the following files: dfsd.h, dfsd.c, dfsdf.c, 
and dfsdff.f.

These files fill the following roles:
Header files
The *.h files are header files.
Normal C routines
These routines do the actual HDF work.  The others 
are used to transfer control and data from a 
FORTRAN environment to a C environment.  
	These routines are in the *.c files, as in dfsd.c.  
Every call to HDF, whether from C or FORTRAN, 
ultimately results in a call to one of these routines.
C routines that are directly callable from FORTRAN
These routines provide recognizable function names 
to the linker.  They may also perform operations on 
data they receive from the FORTRAN routines that 
call them, such as transferring a FORTRAN string to 
a local C data area.  Examples are provided below. 
	These routines are in the *f.c files, such as dfsdf.c.  
The f means that the routines can be called from 
FORTRAN; the .c means that they are C source 
code.  
FORTRAN routines that perform some operation on the 
parameters that C would be unable to perform, before and/or 
after calling the corresponding C routine
These routines are required, for example, when one of 
the parameters is a string. The corresponding C 
routine has no way of knowing the length of the string 
unless it is explicitly given the length by the 
FORTRAN routine.

These routines are in the *ff.f files, such as 
dfsdff.f.  The ff means that the routines perform 
some FORTRAN operation that C cannot perform 
and that they are to be called from FORTRAN; the 
.f means that they are FORTRAN source code.

The roles of these different source file types will 
become clearer as we look at some of the problems that arise 
in interfacing C and many different implementations of 
FORTRAN.

File Naming Conventions
The naming conventions for HDF library source code files are 
complicated by several factors.  Because HDF must 
accommodate a wide variety of platforms, all files that will 
compile to object modules must have names that are unique 
in the first 8 characters, ignoring case.  The difficulties 
involved in maintaining a FORTRAN-callable interface to a 
library that is primarily written in C further complicate the 
naming of source code files.


Passing Strings Between FORTRAN and C

One of the most important differences between FORTRAN 
and C compilers is in the way strings are represented.  
Different compilers use different data structures for strings, 
and supply string length information in different ways.  


Passing Strings from FORTRAN to C
When strings are passed between FORTRAN and C routines, 
they may need to be converted from one representation to the 
other.  C compilers store strings in an array of type char, 
terminated by a null byte (\0).  The name of a string variable 
is equivalent to a pointer to the first character in the string.  
FORTRAN compilers are not consistent in the ways that they 
store strings.

Two pieces of information must be acquired before FORTRAN 
can pass a string to C: 
The string's length
The string's address

The string's length is determined by invoking the standard 
FORTRAN function len(), which returns the length of a 
string.  Since C expects a null byte at the end of a string, care 
must be taken that this null byte does not overwrite useful 
information in the FORTRAN string.

Determining the string's address is more difficult because of 
the different ways that different FORTRAN implementations 
store strings.  The macro _fcdtocp (FORTRAN character 
descriptor to C pointer) is used to acquire this information.  
_fcdtocp is one of the elements that must be customized for 
each platform.  The following paragraphs discuss several 
existing customized implementations:

•	UNICOS FORTRAN stores strings in a structure called 
_fcd (FORTRAN character descriptor).  _fcdtocp is a 
built-in UNICOS function that returns the string's 
address.  (Since UNICOS provides this function, HDF 
omits the corresponding macro definition on UNICOS 
systems.)

•	VMS FORTRAN uses a string descriptor structure that 
provides the string's address and length.  When compiled 
under VMS, _fcdtocp extracts the string's address from 
that structure. 

•	Most other FORTRAN compilers supported by HDF store 
strings just as C does, in character arrays with the array 
name identifying the array's address.  In such situations, 
nothing special needs to be done to pass a string from 
FORTRAN to C, except to add a null byte.

An HDF FORTRAN call that involves passing a string results 
in the following sequence of actions:

1.	A FORTRAN filter routine determines the length and 
address in memory of the string.  Since this filter is a 
FORTRAN routine, it can be found in the appropriate 
*ff.f file.

2.	The FORTRAN filter then calls a C routine, to which it 
passes all parameters from the initial call along with the 
string's length.

3.	The C routine converts the FORTRAN string to a C string 
by copying it to a C array of type char and appending a 
null byte.  Since this C routine serves as a link between a 
FORTRAN filter and the corresponding C interface call, it  
can be found in the appropriate *f.c file.

4.	This C routine then calls the HDF C routine that performs 
the actual work.

This process is illustrated in Figure 7.1.

Figure 7.1.  Sequence of Events When a FORTRAN Call Includes a String as a Parameter
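
Step 3 of the sequence above can be sketched in C as follows.  
This is a minimal illustration rather than the HDF source.  The 
helper name demo_fstr_to_cstr and its parameters are 
assumptions made for this example, and the sketch assumes a 
platform on which the FORTRAN string arrives as a plain 
character address plus an explicit length.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not an HDF routine): copy a FORTRAN string of
 * `len` characters into newly allocated C storage and append the null
 * byte that C expects.  The FORTRAN buffer is never written to, so the
 * null byte cannot overwrite useful information in the FORTRAN string.
 * Returns NULL if memory cannot be allocated; the caller frees the result.
 */
char *demo_fstr_to_cstr(const char *fstr, size_t len)
{
    char *cstr = (char *)malloc(len + 1);

    if (cstr == NULL)
        return NULL;
    memcpy(cstr, fstr, len);   /* copy exactly len characters */
    cstr[len] = '\0';          /* terminate the copy for C */
    return cstr;
}
```

Trailing blanks in the FORTRAN string are preserved in the copy.  
Whether they should be stripped depends on the calling routine 
and is not addressed in this sketch.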

                                                             
Passing Strings from C to FORTRAN
When strings are passed from C to FORTRAN, the reverse 
procedure is followed.  First, a string pointer is allocated 
within the FORTRAN routine's data area.  (It is assumed 
that the space pointed to has already been allocated, and is 
sufficiently large to hold the string.)   The string is then 
copied from the C data area to the FORTRAN data area.  
Finally, the FORTRAN string's data area is padded with 
blanks, if necessary.
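
The copy-and-pad step can be sketched as follows.  The helper 
name demo_cstr_to_fstr is hypothetical, and truncating a string 
that is too long is an assumption of this sketch; the text above 
assumes that the FORTRAN area is already large enough.

```c
#include <string.h>

/*
 * Hypothetical helper (not an HDF routine): copy the C string `cstr`
 * into a FORTRAN character area of fixed length `flen` that the caller
 * has already allocated.  Unused space is padded with blanks, and no
 * null byte is stored because FORTRAN does not expect one.
 */
void demo_cstr_to_fstr(const char *cstr, char *fstr, size_t flen)
{
    size_t n = strlen(cstr);

    if (n > flen)
        n = flen;                      /* assumption: truncate if too long */
    memcpy(fstr, cstr, n);             /* copy the characters */
    memset(fstr + n, ' ', flen - n);   /* pad the remainder with blanks */
}
```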


Function Return Values between FORTRAN and C

When a FORTRAN routine calls a C function, it always 
expects a return value from that function.  Unfortunately, C 
functions do not always return values in a 
FORTRAN-compatible format.  

To solve this problem, some FORTRAN compilers offer the 
option of controlling the form of the return value from a 
function.  For example, Language Systems FORTRAN for the 
Macintosh requires that all C function declarations be 
prepended by the word pascal so that the return value can 
be recognized by a FORTRAN routine that calls it, as in:

pascal int dsgrang(void *pmax, void *pmin)

Since C always expects return values to be passed by value 
rather than, say, by reference, it is important to coerce 
FORTRAN functions to do the same.  This is accomplished by 
defining a macro FRETVAL that is prepended to the 
declaration of every FORTRAN-callable C function.  For 
example:

FRETVAL(int)
dsgrang(void *pmax, void *pmin)

If Language Systems FORTRAN is to be used, FRETVAL is 
defined in hdfi.h as follows:

#if defined(MAC)        /* with LS FORTRAN */
#   define FRETVAL(x)   pascal x
#endif
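
On platforms that need no special keyword, FRETVAL 
presumably has to expand to the bare type.  The sketch below 
shows one way to arrange that.  The pass-through fallback and 
the function demo_range are assumptions made for illustration, 
not a copy of hdfi.h.

```c
/* MAC is the flag shown in the text; the fallback is an assumption. */
#if defined(MAC)        /* with LS FORTRAN */
#   define FRETVAL(x)   pascal x
#endif
#ifndef FRETVAL
#   define FRETVAL(x)   x      /* ordinary C return convention */
#endif

/* Hypothetical FORTRAN-callable routine.  FORTRAN passes arguments
   by reference, so the parameters are pointers. */
FRETVAL(int)
demo_range(int *pmax, int *pmin)
{
    return *pmax - *pmin;
}
```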


Differences in Routine Names

HDF generally employs standard C conventions in naming 
routines.  But many FORTRAN compilers impose varying 
restrictions on the length, character set, and form of 
identifiers, some of which are considerably more restrictive 
than the C conventions.  Therefore, an extra effort must be 
made to accommodate those FORTRAN compilers.

To address this issue, HDF defines a set of preprocessor flags 
in hdfi.h.  Then conditional compilation, with #ifdef 
statements in the source code, produces routine names that 
the target system's FORTRAN will understand. 


Case Sensitivity
C compilers are case sensitive; uppercase and lowercase letters 
are recognized as different characters.  Many FORTRAN 
compilers are not case sensitive; they allow users to use 
uppercase and lowercase letters while naming routines in the 
source code, but the names are converted to all uppercase or 
all lowercase in the object module symbol tables.  Routine 
name recognition problems are common when routines 
compiled by a case sensitive compiler are to be linked with 
routines compiled by a non-case sensitive compiler.

For example, the UNICOS FORTRAN compiler allows you to 
name routines without regard to case, but produces object 
module symbol tables with the routine names in all 
uppercase.  UNICOS C, on the other hand, performs no such 
conversion. 

Consider the HDF routine Hopen.  Hopen is written in C, so 
the HDF library symbol table contains the name Hopen.  
Suppose you make the following call in your UNICOS 
FORTRAN program:

file_id = Hopen('myfile', ...)

The FORTRAN compiler will create an object module symbol 
table with the routine name HOPEN.  When you link it to the 
HDF library, the linker will find Hopen but not HOPEN, and 
will generate an unsatisfied external reference error.

HDF supports the following non-case sensitive compilers:   
•	VMS FORTRAN
•	UNICOS FORTRAN
•	Language Systems FORTRAN
All of these compilers convert identifiers to all uppercase 
when building an object module symbol table.  In the 
following discussion, they are referred to as all-uppercase 
compilers.

The HDF Solution
HDF addresses the all-uppercase compiler problem in the 
platform-specific section of hdfi.h where the DF_CAPFNAMES 
flag is defined.  With conditional compilation, HDF generates 
all-uppercase routine names and symbol table entries.

Once again, consider UNICOS.  The UNICOS section of 
hdfi.h contains the following line:

#define DF_CAPFNAMES

The *f.c files contain corresponding conditional sections that  
produce all-uppercase routine names.  For example, the 
function name Fun can be redefined as FUN:

#ifdef DF_CAPFNAMES
#   define  Fun  FUN
#endif /* DF_CAPFNAMES */


Appended Underscores
Differing compiler conventions create a similar problem in 
their use of the underscore ( _ ) character.  Many compilers, 
including most C compilers, prepend an underscore to all 
external symbols in the object module symbol table.  The 
linker then looks for external symbols in other symbol tables 
with the prefixed underscore.

Many FORTRAN compilers also append an underscore to 
identify external symbols.  Since C compilers do not generally 
do this, external references in FORTRAN-generated object 
modules will not recognize externals with the same names in 
C-generated modules.

For example, the FORTRAN compiler on the CONVEX 
system places an underscore both at the beginning and at the 
end of routine names, while the C compiler places an 
underscore only at the beginning.

Since FUN is a C function, it appears under the name _FUN in 
the object module containing it.  Now suppose you make the 
following call in a FORTRAN program:

x = FUN(y)

The FORTRAN compiler will create an object module symbol 
table  with the routine name _FUN_.  When you link it to the 
C module, the linker will be unable to link _FUN and _FUN_ 
and will generate an unsatisfied external reference error.


The HDF Solution
Like the all-uppercase compiler problem, this issue is 
resolved in the platform-specific sections of hdfi.h and with 
conditional sections of code that append an underscore to C 
routine names on platforms where the FORTRAN compiler 
expects it.

This is implemented as follows: The FNAME_POST_UNDERSCORE 
flag is defined in the platform-specific section of hdfi.h for 
every platform whose FORTRAN compiler requires appended 
underscores.  Similarly, the FNAME_PRE_UNDERSCORE flag is 
defined on platforms where the FORTRAN compiler expects 
prepended underscores.  The macro FNAME is then defined to 
append and/or prepend underscores as required.

The FNAME macro is then applied to each routine in the 
module in which it is actually defined (including in 
hproto.h), adding the appropriate underscores.

Consider the above example in which Fun was renamed FUN.  
The actual definition appears as follows:

#ifdef DF_CAPFNAMES
#   define  Fun  FNAME(FUN)
#endif /* DF_CAPFNAMES */
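
The text does not show how FNAME itself is defined.  The 
following is one possible definition, offered as a sketch under 
stated assumptions: it uses the ANSI C token-pasting operator 
(##), which old C preprocessors lack, and it pretends to be a 
platform that needs all-uppercase names with an appended 
underscore.  The routine demo_fun is hypothetical.

```c
/* Assumption: these two flags would normally come from the
   platform-specific section of hdfi.h. */
#define DF_CAPFNAMES
#define FNAME_POST_UNDERSCORE

#if defined(FNAME_PRE_UNDERSCORE) && defined(FNAME_POST_UNDERSCORE)
#   define FNAME(x)  _##x##_
#elif defined(FNAME_PRE_UNDERSCORE)
#   define FNAME(x)  _##x
#elif defined(FNAME_POST_UNDERSCORE)
#   define FNAME(x)  x##_
#else
#   define FNAME(x)  x
#endif

#ifdef DF_CAPFNAMES
#   define  demo_fun  FNAME(DEMO_FUN)
#endif /* DF_CAPFNAMES */

/* After preprocessing, this function is named DEMO_FUN_, which is
   the symbol such a FORTRAN compiler would look for. */
int demo_fun(int *y)
{
    return *y * 2;
}
```

C source code can keep calling demo_fun, because the macro 
rewrites every use of the name, not just the definition.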


Short Names vs. Long Names
In the C implementations supported by HDF, identifiers may 
be any length with at least the first 31 characters being 
significant.  FORTRAN compilers differ in the maximum 
lengths of identifiers that they allow, but all of those 
supported by HDF allow identifiers to be at least seven 
characters long.

To deal with the discrepancies between identifier lengths 
allowed by C and those allowed by the various FORTRAN 
compilers, a set of equivalent short names has been created 
for use when programming in FORTRAN.  For every HDF 
routine with a name more than seven characters long, there 
is an identical routine whose name is seven or fewer 
characters long.

For example, the routines DFSDgetdims (in dfsd.c) and 
dsgdims (in dfsdff.f) are functionally identical.


Differences Between ANSI C and Old C

The current HDF release supports both ANSI C and old C 
compilers.  ANSI C is preferred because it has many features 
that help ensure portability; unfortunately, many important 
platforms do not support full ANSI C.  The HDF code 
determines whether ANSI C is available from the flag 
__STDC__.  If ANSI C is available on a platform, then 
__STDC__ is defined by the compiler. [3]  

The most noticeable difference between ANSI C and old C is 
in the way functions are declared.  For example, in ANSI C 
the function DFSDsetdims() is declared with a single line:

int DFSDsetdims(intn rank, int32 dimsizes[])

In old C the same function is declared as follows:

int DFSDsetdims(rank, dimsizes)
intn rank;
int32  dimsizes[];

HDF accommodates these differences by defining the flag 
PROTOTYPE in hdfi.h.  PROTOTYPE is used for every function 
declaration in a manner similar to the following example:

#ifdef PROTOTYPE
int DFSDsetdims(intn rank, int32 dimsizes[])
#else
int DFSDsetdims(rank, dimsizes)
intn rank;
int32  dimsizes[];
#endif /* PROTOTYPE */

Note that prototypes are supported by some C compilers that 
are not otherwise ANSI-conformant.  In such situations, 
PROTOTYPE is defined even though __STDC__ is not.

Another difference between old C and ANSI C is that ANSI C 
supports function prototypes with arguments.  (Old C also 
supports function declarations, but without the argument 
list.)  This feature helps in detecting errors in the number and 
types of arguments.  This difference is handled by means of a 
macro PROTO, which is defined as follows:

#ifdef PROTOTYPE
#define    PROTO(x) x
#else
#define    PROTO(x) ()
#endif

This macro is applied as in the following example:

extern int32 Hopen
PROTO((char *path, intn access, int16 ndds));

When PROTOTYPE is defined, PROTO causes the argument list 
to stay as it is.  When PROTOTYPE is not defined, PROTO 
causes the argument list to disappear.
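
The two mechanisms can be combined in a single source file, as 
in the following sketch.  Deriving PROTOTYPE from __STDC__ 
in this way is an assumption made to keep the example 
self-contained, and demo_sum is a hypothetical routine.

```c
/* Assumption: enable prototypes whenever the compiler claims ANSI C. */
#if defined(__STDC__) && !defined(PROTOTYPE)
#   define PROTOTYPE
#endif

#ifdef PROTOTYPE
#define    PROTO(x) x
#else
#define    PROTO(x) ()
#endif

/* Declaration: note the doubled parentheses, which make the whole
   argument list a single macro argument. */
extern long demo_sum
PROTO((long a, long b));

/* Definition: ANSI C or old C form, selected by PROTOTYPE. */
#ifdef PROTOTYPE
long demo_sum(long a, long b)
#else
long demo_sum(a, b)
long a;
long b;
#endif /* PROTOTYPE */
{
    return a + b;
}
```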


Type Differences

Platforms and compilers also differ in the sizes of numbers 
that they assign to different data types, in their 
representations of different number types, and in the way 
they organize aggregates of numbers (especially structures).


Size differences
The same number type can be different sizes on different 
platforms.  The type int, for example, is 16 bits to many 
IBM PC compilers, 48 bits to some supercomputer compilers, 
and 32 bits on most others.  This can cause problems that 
are difficult to diagnose in code, like the HDF code, that 
depends in many places on numbers being the right size.

HDF handles this problem by fully defining all variable types 
and function data types via typedef, including the number of 
bits occupied.  All parameters, members of structures, and 
static, automatic, and external variables are so defined.  

The HDF data types include the following (types with the 
prefix u are unsigned):

int8
uint8
int16
uint16
int32
uint32
float32
float64
intn
uintn

For each machine, typedefs are declared that map all of the 
data types used into the best available types.  For example, 
int32 is defined as follows for Sun's C compiler:

typedef long int int32;

Unfortunately, the HDF data types do not always map 
exactly to one of the native data types.  For example, the 
Cray UNICOS C compiler does not support a 16-bit data 
type.  In such instances, HDF uses the best available match 
and care is taken to minimize potential problems.

The data types intn and uintn are for situations where it 
can be determined that number type size is unimportant and 
that a 16-bit integer is large enough to hold any value the 
number can have.   In such cases, the native integer type (or 
unsigned integer type) of the host machine is used.  
Experience indicates that substantial performance gains can 
be achieved by using intn or uintn in certain circumstances.
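
As an illustration, a set of integer typedefs for one class of 
platform might look like the following.  The mapping is an 
assumption that holds only where char, short, and int are 8, 16, 
and 32 bits wide, and the demo_check typedefs are a hypothetical 
safeguard, not part of HDF.  The floating-point types are 
omitted.

```c
#include <limits.h>

/* Assumed mapping; other platforms need other native types. */
typedef signed char     int8;
typedef unsigned char   uint8;
typedef short           int16;
typedef unsigned short  uint16;
typedef int             int32;
typedef unsigned int    uint32;
typedef int             intn;    /* native integer, at least 16 bits */
typedef unsigned int    uintn;

/* Compile-time checks that work in ANSI C: an array type with a
   negative size is an error, so a wrong width stops the build. */
typedef char demo_check_int8 [(sizeof(int8)  * CHAR_BIT ==  8) ? 1 : -1];
typedef char demo_check_int16[(sizeof(int16) * CHAR_BIT == 16) ? 1 : -1];
typedef char demo_check_int32[(sizeof(int32) * CHAR_BIT == 32) ? 1 : -1];
```

A platform with no exact match for some width, such as the Cray 
case described above, could not pass checks like these and would 
need the best-available-match approach instead.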


Number Representation
One of the keys to producing a portable file format is to 
ensure that numbers that are represented differently on 
different machines are converted correctly when moved from 
machine to machine.  HDF provides conversion routines to 
convert between native representations and a standard 
representation that is actually used in the HDF file.  This 
ensures that HDF data will always be interpreted correctly, 
regardless of the platform on which it is read or written.  
Details of this process will be included in a later edition of 
this manual.


Byte-order and Structure Representations
Even when the basic bit-representation of constants or 
aggregates like structures is the same across platforms, the 
ways that the bits are packed into a word and the order in 
which the bits are laid out can differ.  For example, DEC and 
Intel-based machines generally order bytes differently from 
most others.  And the C compiler on a Cray, with a 64-bit 
word, packs structures differently from those on 32-bit word 
machines.

Differences in byte order among machines are handled in 
either of two ways.  When the data to be written (or read) 
includes non-integer data and/or a large array of any type of 
data, conversion routines mentioned in the previous section, 
"Number Representation," are invoked.  When an individual 
integer is to be written (or read), an ENCODE or DECODE macro 
is used. 

The following ENCODE and DECODE macros are available 
for 16-bit and 32-bit integers:

INT16ENCODE
UINT16ENCODE
INT32ENCODE
UINT32ENCODE
INT16DECODE
UINT16DECODE
INT32DECODE
UINT32DECODE

The ENCODE macros write integers to an HDF file in a 
standard format regardless of the word-size and byte order of 
the host machine.

Likewise, the DECODE macros read integers from a 
standard format in an HDF file and provide the integers in 
the required byte order and word size to the host machine.

Since the ENCODE and DECODE macros deal with both 
byte order and word size, they are also used in reading and 
writing record-like structures.  For example, an HDF data 
descriptor consists of two 16-bit fields followed by two 32-bit 
fields, as implied by the following C declaration:

struct {
	uint16 tag;
	uint16 ref;
	uint32 offset;
	uint32 length;
}

Even though this structure might occupy 12 bytes on one 
platform or 32 bytes on another (e.g., a Cray), it must occupy 
exactly 12 bytes in an HDF file.  Furthermore, some 
machines represent the numbers internally in different byte 
orders than others, but the byte order must always be big-
endian in an HDF file.  The ENCODE and DECODE macros 
ensure that these values are always represented correctly in 
HDF files and as presented to any host machine.
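
The idea behind the ENCODE and DECODE macros can be 
sketched with ordinary functions.  The code below is not the 
HDF implementation: every name beginning with demo_ is 
hypothetical, and the in-memory field types are chosen only so 
that they are wide enough on any platform.  It writes a data 
descriptor as exactly 12 big-endian bytes and reads it back, 
independent of the host's byte order and structure layout.

```c
/* Hypothetical in-memory descriptor; field widths may exceed the
   widths used in the file. */
typedef struct {
    unsigned int  tag;      /* 16-bit value in the file */
    unsigned int  ref;      /* 16-bit value in the file */
    unsigned long offset;   /* 32-bit value in the file */
    unsigned long length;   /* 32-bit value in the file */
} demo_dd;

#define DEMO_DD_FILE_SIZE 12    /* 2 + 2 + 4 + 4 bytes */

/* Store the low 16 bits of v, most significant byte first. */
static unsigned char *demo_put16(unsigned char *p, unsigned int v)
{
    *p++ = (unsigned char)((v >> 8) & 0xffu);
    *p++ = (unsigned char)(v & 0xffu);
    return p;
}

/* Store the low 32 bits of v, most significant byte first. */
static unsigned char *demo_put32(unsigned char *p, unsigned long v)
{
    *p++ = (unsigned char)((v >> 24) & 0xffu);
    *p++ = (unsigned char)((v >> 16) & 0xffu);
    *p++ = (unsigned char)((v >> 8) & 0xffu);
    *p++ = (unsigned char)(v & 0xffu);
    return p;
}

static unsigned int demo_get16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}

static unsigned long demo_get32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* Write the descriptor into buf, which must hold 12 bytes. */
void demo_dd_encode(const demo_dd *dd, unsigned char *buf)
{
    unsigned char *p = buf;

    p = demo_put16(p, dd->tag);
    p = demo_put16(p, dd->ref);
    p = demo_put32(p, dd->offset);
    (void)demo_put32(p, dd->length);
}

/* Read the descriptor back from its 12-byte file form. */
void demo_dd_decode(const unsigned char *buf, demo_dd *dd)
{
    dd->tag    = demo_get16(buf);
    dd->ref    = demo_get16(buf + 2);
    dd->offset = demo_get32(buf + 4);
    dd->length = demo_get32(buf + 8);
}
```

Because each byte is produced by shifting and masking rather 
than by copying memory, the same code gives the same file 
contents on big-endian and little-endian hosts.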


Access to Library Functions

Despite standardization efforts, function libraries often differ 
in significant ways.  At least three types of functions require 
special treatment in the HDF implementation:

File I/O
Some platforms use 16-bit values for the element size 
and the number of elements to write or read, while others 
use 32-bit values.  This must be considered when 
working with either stream or system level I/O functions 
(i.e., the functions associated with the fopen() and 
open() calls).  

Memory allocation and release
First, 16-bit machines use a 16-bit value to indicate the 
number of bytes to allocate or release at one time.  
Second, certain operating systems (notably MS Windows 
and the Macintosh OS) do not have malloc() and free() calls.  
These operating systems use handles for allocating 
memory and require different function calls.

Memory and string manipulation
These functions (e.g., memcpy(), memcmp(), strcpy(), and 
strlen()) require slightly different function names under 
different memory models in MS DOS and under MS 
Windows than on most other systems.

HDF accommodates these special situations by defining 
appropriate macros in the machine-specific sections of hdfi.h.
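
A macro layer of this kind might look like the following sketch.  
The flag DEMO_DOS_FAR_MODEL and the DEMO_ macro names are 
hypothetical, and the far-model names _fmemcpy and _fstrlen 
are assumed to be supplied by the DOS compiler's library.  Only 
the default branch is exercised on other systems.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical machine-specific section: choose the routine names
   once, so that the rest of the code never mentions them directly. */
#if defined(DEMO_DOS_FAR_MODEL)
#   define DEMO_MEMCPY(d, s, n)  _fmemcpy((d), (s), (n))
#   define DEMO_STRLEN(s)        _fstrlen((s))
#else
#   define DEMO_MEMCPY(d, s, n)  memcpy((d), (s), (n))
#   define DEMO_STRLEN(s)        strlen((s))
#endif

/* Portable code written only in terms of the macros.  Returns a
   newly allocated copy of s, or NULL if allocation fails. */
char *demo_dup(const char *s)
{
    size_t n = DEMO_STRLEN(s) + 1;      /* include the null byte */
    char *copy = (char *)malloc(n);

    if (copy != NULL)
        DEMO_MEMCPY(copy, s, n);
    return copy;
}
```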

[1]	The version of C described in the first edition of The C Programming Language, by Brian Kernighan and Dennis Ritchie, 
published by Prentice-Hall.
[2]	In earlier implementations of HDF, these files were called df.h and dfi.h.  Starting with HDF Version 3.2, the general 
purpose layer of HDF was completely rewritten and all routine names were changed from df* to hdf*.
[3]	__STDC__ is generally defined by ANSI-conforming C compilers.  Some C compilers are not entirely ANSI-conforming, 
yet they conform well enough that the HDF implementation can treat them as if they were.  In such cases, it is permissible 
to define __STDC__ by adding the option -D__STDC__ to the cc line in the makefile.