Data Conversion Of Arithmetic Data Types
Quincey Koziol and Raymond Lu
Revised on
I. Introduction
This document describes the design and behavior of the HDF5 library’s data conversion between arithmetic data types. It is written mainly for HDF5 application users, but it can also be useful to HDF5 library developers.
In this document, arithmetic data types are integers and floating-point numbers. The integers include all of the library’s predefined integer types and any user-defined integer types. The library’s predefined integers include standard, Intel-specific, Alpha-specific, MIPS-specific, ANSI C9x-specific, and native data types. The HDF5 Predefined Datatypes section of the HDF5 Reference Manual lists all of these predefined data types.
For convenience, the native integer types are listed here:
C type                HDF5 type
char                  H5T_NATIVE_CHAR
signed char           H5T_NATIVE_SCHAR
unsigned char         H5T_NATIVE_UCHAR
short                 H5T_NATIVE_SHORT
unsigned short        H5T_NATIVE_USHORT
int                   H5T_NATIVE_INT
unsigned int          H5T_NATIVE_UINT
long                  H5T_NATIVE_LONG
unsigned long         H5T_NATIVE_ULONG
long long             H5T_NATIVE_LLONG
unsigned long long    H5T_NATIVE_ULLONG
The floating-point numbers include all of the library’s predefined floating-point types and any user-defined floating-point types. The HDF5 Reference Manual lists all of the predefined types, such as IEEE, Alpha-specific, MIPS-specific, and native floating-point data types.
The native floating-point types are:
C type                HDF5 type
float                 H5T_NATIVE_FLOAT
double                H5T_NATIVE_DOUBLE
long double           H5T_NATIVE_LDOUBLE
Within the library, data conversion happens in two situations: when data is transferred between memory and disk by H5Dwrite() or H5Dread(), and when data in memory is converted by H5Tconvert(). In either case, data is converted whenever the source and destination data types differ.
II. Hard and Soft Conversions
The HDF5 library has two ways of converting data between a given pair of different data types: hard conversion and soft conversion.
1. Hard vs. Soft Conversions
A hard conversion is essentially a cast performed by the compiler, such as int a = (int)b, where b is declared as float. In contrast, a soft conversion is performed by the HDF5 library’s own conversion functions, which examine the bit sequence of the source data and convert it into the bit sequence of the destination data. Soft conversions tend to be more rigorous, although they are slower than hard conversions because of all the bit operations involved.
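The difference can be illustrated outside HDF5. The following is a minimal sketch, not code from the library, of a soft-style conversion: soft_u32_to_f32_bits() (a hypothetical name) builds the IEEE single-precision bit pattern of an unsigned 32-bit integer using only integer bit operations, while float_bits() (also hypothetical) reads back the bits produced by an ordinary compiler cast. It assumes a 32-bit IEEE 754 float and the default round-to-nearest-even mode.

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern of a float, e.g. one produced by a "hard" compiler cast. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical "soft" conversion: build the IEEE 754 single-precision
 * bit pattern of v using only integer bit operations. */
uint32_t soft_u32_to_f32_bits(uint32_t v)
{
    if (v == 0)
        return 0;                              /* +0: all bits clear */

    int msb = 31;                              /* position of highest set bit */
    while (((v >> msb) & 1u) == 0)
        msb--;

    uint32_t exponent = (uint32_t)msb + 127u;  /* add the bias */
    uint32_t kept;                             /* 24 significant bits incl. implicit 1 */

    if (msb <= 23) {
        kept = v << (23 - msb);                /* fits exactly: shift into place */
    } else {
        int shift = msb - 23;                  /* low bits that must be dropped */
        uint32_t rem  = v & ((1u << shift) - 1u);
        uint32_t half = 1u << (shift - 1);
        kept = v >> shift;
        /* Round to nearest, ties to even. */
        if (rem > half || (rem == half && (kept & 1u))) {
            kept++;
            if (kept == (1u << 24)) {          /* carry out of the mantissa */
                kept >>= 1;
                exponent++;
            }
        }
    }
    return (exponent << 23) | (kept & 0x7FFFFFu);  /* drop the implicit bit */
}
```

For inputs such as 480 or 16777217, both paths should give the same bits; the soft version simply does more work per element, which is the speed cost mentioned above.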
During the development of the library, the terms hardware conversion and compiler conversion have been used for hard conversion, and the terms software conversion and library conversion for soft conversion. These terms may appear in other documents and in the library’s source code.
Internally, the library maintains a list of soft conversion functions for each pair of source and destination data type classes. A data type class is the category to which a data type belongs. For example, the data types H5T_NATIVE_INT and H5T_NATIVE_LONG both belong to the H5T_INTEGER class. Soft conversion is therefore designed to handle any data type in a class, including library-predefined and user-defined data types.
The library also maintains a table of hard conversion functions, one for each pair of source and destination data types; H5T_conv_int_float() is an example of a hard conversion function. The library consults this table first when it chooses a conversion path, i.e., hard conversion is always picked over soft conversion when one is available. Hard conversion can only handle native data types because compilers do not recognize non-native data types.
So keep this in mind: soft conversion is for data type classes while hard conversion is for data types. The library’s default conversion between native data types is hard conversion.
2. Registration and Unregistration of Conversion Functions
Users can register their own conversion functions to the library through the function H5Tregister(). These conversion functions can be either soft or hard conversion functions.
When a soft conversion function is registered with the library through H5Tregister(), it is appended to the list of soft conversion functions. It also goes into the table of hard conversion functions, replacing every conversion function it can apply to. For example, if a new soft conversion function conv_integer_fp(), which converts any integer to any floating-point number, is registered, all hard conversion functions from any integer type to any floating-point type are replaced by this soft function, and all of the library’s conversion paths from integer to floating-point number are updated to use it.
When a hard conversion function is registered with the library, it goes into the table of hard conversion functions and replaces the existing hard conversion function for the same pair of types. For example, if a new function conv_int_float(), which converts data of int type to float type, is registered as a hard function, it replaces the library’s existing conversion function H5T_conv_int_float(). The library then uses conv_int_float() to convert data from int to float whenever hard conversion is selected.
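The lookup and replacement rules above can be modeled with a small table and list. The sketch below is a hypothetical model, not HDF5’s internal data structures: the type names, the class enum, and labels such as library_soft_int_to_float are invented for illustration, and H5T_conv_int_float appears only as a label string. Searching the soft list from the newest entry is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL model of the rules above; not HDF5's internal structures. */
typedef enum { CLASS_INTEGER, CLASS_FLOAT } type_class_t;

typedef struct {
    const char  *name;   /* a specific data type */
    type_class_t cls;    /* the class it belongs to */
} dtype_t;

/* Hard entries are keyed by type pair; soft entries by class pair. */
typedef struct { dtype_t src, dst; const char *func; } hard_entry_t;
typedef struct { type_class_t src, dst; const char *func; } soft_entry_t;

#define MAX_ENTRIES 8
typedef struct {
    hard_entry_t hard[MAX_ENTRIES];
    int          n_hard;
    soft_entry_t soft[MAX_ENTRIES];
    int          n_soft;
} registry_t;

static const dtype_t NATIVE_INT   = { "NATIVE_INT",   CLASS_INTEGER };
static const dtype_t NATIVE_FLOAT = { "NATIVE_FLOAT", CLASS_FLOAT };
static const dtype_t USER_INT24   = { "USER_INT24",   CLASS_INTEGER }; /* non-native */

static int same_type(dtype_t a, dtype_t b)
{
    return strcmp(a.name, b.name) == 0;
}

/* A hard function replaces any existing entry for the same type pair. */
registry_t register_hard(registry_t r, dtype_t src, dtype_t dst, const char *func)
{
    for (int i = 0; i < r.n_hard; i++)
        if (same_type(r.hard[i].src, src) && same_type(r.hard[i].dst, dst)) {
            r.hard[i].func = func;
            return r;
        }
    if (r.n_hard < MAX_ENTRIES)
        r.hard[r.n_hard++] = (hard_entry_t){ src, dst, func };
    return r;
}

/* A soft function is appended to the soft list and also replaces every
 * hard entry whose source and destination classes it covers. */
registry_t register_soft(registry_t r, type_class_t src, type_class_t dst, const char *func)
{
    if (r.n_soft < MAX_ENTRIES)
        r.soft[r.n_soft++] = (soft_entry_t){ src, dst, func };
    for (int i = 0; i < r.n_hard; i++)
        if (r.hard[i].src.cls == src && r.hard[i].dst.cls == dst)
            r.hard[i].func = func;
    return r;
}

/* Hard table first; otherwise the newest matching soft function. */
const char *find_path(registry_t r, dtype_t src, dtype_t dst)
{
    for (int i = 0; i < r.n_hard; i++)
        if (same_type(r.hard[i].src, src) && same_type(r.hard[i].dst, dst))
            return r.hard[i].func;
    for (int i = r.n_soft - 1; i >= 0; i--)
        if (r.soft[i].src == src.cls && r.soft[i].dst == dst.cls)
            return r.soft[i].func;
    return NULL;   /* no conversion path */
}

/* Starting point: one soft int-to-float function and one hard one. */
registry_t builtin_registry(void)
{
    registry_t r = { .n_hard = 0 };
    r = register_soft(r, CLASS_INTEGER, CLASS_FLOAT, "library_soft_int_to_float");
    r = register_hard(r, NATIVE_INT, NATIVE_FLOAT, "H5T_conv_int_float");
    return r;
}
```

In this model a native pair resolves to its hard entry, a non-native integer such as USER_INT24 falls through to the soft list, and registering a soft integer-to-float function rewrites the hard entry as well.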
On the other hand, a user who wants to unregister conversion functions can use H5Tunregister(). This function has the same parameters as H5Tregister(), but all of them are optional. Omitted parameters act as “wild cards” that broaden the matching criteria. For example, a user who wants to disable all hard conversions, so that soft conversions are used instead, can unregister all hard conversions by calling
H5Tunregister(H5T_PERS_HARD, NULL, -1, -1, NULL);
3. Handling Incorrect Hard Conversions
While data conversion was being developed for the HDF5 library, some incorrect hard conversions were discovered. These problems come mainly from incorrect casts generated by compilers. The library needs a reliable way to handle these incorrect conversions instead of giving users corrupted data.
The library handles an incorrect hard conversion by registering no hard conversion function for that pair of types when the problem is detected during configuration. The library’s conversion path then falls back to the library’s soft conversion function, unless users register their own conversion function.
To find out whether the library is using a hard or soft conversion routine for a certain pair of source and destination data types, use the function H5Tis_hard() (a proposed reference manual entry for this new function can be found in the Appendix of this document). To find out which conversion routine the library is using for a certain pair of data types, use the function H5Tfind().
The table below lists the conversions that use soft routines on some systems, and the reason soft conversion was chosen for each. All conversions not listed here are hard conversions.
Source and destination types             System                    Reason
floating-point to floating-point         all Crays                 compiler does not support denormalized values
all integers to long double              all SGIs                  compiler gives some incorrect conversions
unsigned (long) long to floating-point   all SGIs                  compiler gives some incorrect conversions
                                         64-bit Solaris            compiler does different rounding
unsigned long long to floating-point     Windows Visual Studio 6
long double to all integers              all SGIs                  compiler does some incorrect conversions
                                         HP-UX 11.00               compiler generates a floating-point exception
floating-point to unsigned long long     PGI compiler              compiler rounds up when the fractional part is greater than 0.5
III. Handling Exceptions
1. How to handle exceptions
The library lets users handle exceptions that occur during data conversion. A user’s callback function can be registered with the library through the property list function H5Pset_type_conv_cb(). This gives users control over data values whenever an exception happens.
The following code registers an exception callback function, except_func(), with the library:
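hid_t                  dxpl_id = H5Pcreate(H5P_DATASET_XFER);
int                    fill_value = 7;   /* example value written on overflow */
H5T_conv_except_func_t op;
void                  *user_data;

if(H5Pset_type_conv_cb(dxpl_id, except_func, &fill_value) < 0)
    goto error;
/* Read the callback back to confirm that it was registered. */
if(H5Pget_type_conv_cb(dxpl_id, &op, &user_data) < 0)
    goto error;
if(op != except_func || *(int*)user_data != fill_value)
    goto error;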
The code also uses H5Pget_type_conv_cb() to verify that the callback function has been registered successfully. The library defines the prototype of the conversion exception callback as
typedef H5T_conv_ret_t (*H5T_conv_except_func_t)(H5T_conv_except_t except_type,
        hid_t src_id, hid_t dst_id, void *src_buf, void *dst_buf, void *op_data);
The function except_func() used above could then be defined as follows:
#include "hdf5.h"

/* Replace out-of-range, infinite, and NaN values with the fill value passed
 * through user_data; leave truncation and precision loss to the library. */
H5T_conv_ret_t
except_func(H5T_conv_except_t except_type, hid_t src_id, hid_t dst_id,
            void *src_buf, void *dst_buf, void *user_data)
{
    H5T_conv_ret_t ret = H5T_CONV_HANDLED;

    switch(except_type) {
        case H5T_CONV_EXCEPT_RANGE_HI:
        case H5T_CONV_EXCEPT_RANGE_LOW:
        case H5T_CONV_EXCEPT_PINF:
        case H5T_CONV_EXCEPT_NINF:
        case H5T_CONV_EXCEPT_NAN:
            /* only the integer-destination case is handled */
            *(int*)dst_buf = *(int*)user_data;
            break;
        case H5T_CONV_EXCEPT_TRUNCATE:
        case H5T_CONV_EXCEPT_PRECISION:
        default:
            /* let the library apply its default behavior */
            ret = H5T_CONV_UNHANDLED;
            break;
    }
    return ret;
}
This example handles only the cases in which the destination data type is an integer. The source data type can be either an integer or a floating-point number.
2. Cases of Exceptions
The following exceptions may happen during conversion:
H5T_CONV_EXCEPT_RANGE_HI: the source value is positive and too large for the destination. Overflow happens.
H5T_CONV_EXCEPT_RANGE_LOW: the source value is negative and its magnitude is too large for the destination. Overflow happens.
H5T_CONV_EXCEPT_TRUNCATE: the source is a floating-point type, the destination is an integer type, and the floating-point number has a fractional part.
H5T_CONV_EXCEPT_PRECISION: the source is an integer type, the destination is a floating-point type, and the mantissa of the floating-point type is not large enough to hold all the digits of the integer.
H5T_CONV_EXCEPT_PINF: the source is a floating-point type and the value is positive infinity.
H5T_CONV_EXCEPT_NINF: the source is a floating-point type and the value is negative infinity.
H5T_CONV_EXCEPT_NAN: the source is a floating-point type and the value is NaN (not a number).
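Valid return values of the exception handling callback function are H5T_CONV_ABORT, H5T_CONV_UNHANDLED, and H5T_CONV_HANDLED. H5T_CONV_ABORT stops the conversion; H5T_CONV_UNHANDLED tells the library that the callback did not handle the exception, so the library applies its default behavior; and H5T_CONV_HANDLED tells the library that the callback has handled the exception and written the destination value.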
IV. Soft Data Conversions
This section is mainly for advanced users or library developers who want to know the library’s behavior in performing soft data conversion.
1. Understanding Bit Patterns of Arithmetic Data Types
To understand data conversion, it helps to know the bit patterns of arithmetic data types. Integers generally have simple bit patterns. In two’s-complement notation, a signed integer of n bits has a range from -2^{n-1} to 2^{n-1} - 1. The high-order bit is the sign bit, and the remaining n-1 bits are data bits. For unsigned integers, the high-order bit is also a data bit, so all n bits are data bits and an unsigned integer of n bits has a range from 0 to 2^{n} - 1. For example, consider the 1-byte bit sequence 10010111. As a (signed) char, its high-order (leftmost) bit is set, meaning the value is negative (-105). If the same bit sequence represents an unsigned char, the high-order bit becomes a data bit, making the value 151. Every C implementation documents the ranges of its native integer types in the header file limits.h. For example, on a typical two’s-complement system the range of the short type is given by SHRT_MAX = 32,767 and SHRT_MIN = -32,768.
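A short C sketch can check the worked example above: it reads the 8-bit pattern 10010111 (0x97) both as an unsigned char and as a two’s-complement signed char, and computes n-bit ranges from the formulas given. The function names are hypothetical, and the range helpers are only meant for 1 <= n <= 32.

```c
#include <stdint.h>

/* Unsigned reading: all 8 bits are data bits. */
unsigned unsigned_value(uint8_t bits)
{
    return bits;
}

/* Two's-complement reading: the high-order bit weighs -2^7 instead of +2^7. */
int signed_value(uint8_t bits)
{
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}

/* Ranges of an n-bit integer, for 1 <= n <= 32 in this sketch. */
long long signed_min(int n)   { return -(1LL << (n - 1)); }     /* -2^(n-1)    */
long long signed_max(int n)   { return (1LL << (n - 1)) - 1; }  /* 2^(n-1) - 1 */
long long unsigned_max(int n) { return (1LL << n) - 1; }        /* 2^n - 1     */
```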
The floating-point number representation is more complicated. A more thorough description of IEEE standard floating-point numbers can be found in the IEEE Standard 754 document. An IEEE standard floating-point number has three components: the sign, the exponent, and the mantissa. The table below shows the layouts of the IEEE float and double types.
Type      Sign      Exponent      Mantissa      Bias
float     1 [31]    8 [30-23]     23 [22-0]     127
double    1 [63]    11 [62-52]    52 [51-0]     1023
The numbers give the size of each component in bits, and the bit positions are in square brackets. To calculate the true exponent, the bias has to be subtracted from the value represented by the exponent bits. The mantissa holds the precision bits; its leading bit is implied rather than stored, and it is restored when the true value is calculated. Consider this bit sequence for a float in little-endian order, with the most significant byte (byte 3) shown first:
Byte 3      Byte 2      Byte 1      Byte 0
11000011    11110000    00000000    00000000
The high-order (leftmost) bit is the sign bit; it is set, indicating that the number is negative. The eight bits after the sign bit, 10000111 in bytes 3 and 2, are the exponent. Their value is 135; after subtracting the bias 127, the true exponent is 8. The 23 bits after the exponent, 1110000 00000000 00000000 in bytes 2, 1, and 0, are the mantissa. After restoring the implicit leading bit and adding the radix point, the mantissa becomes 1.1110000 00000000 00000000 (binary). The value of this float is therefore -1.111 x 2^{8} = -111100000.0 (binary) = -480.0.
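The decoding steps above can be written out in C. This minimal sketch (hypothetical names, assuming the IEEE single-precision layout) splits a 32-bit pattern into its sign, biased exponent, and stored mantissa, then rebuilds the value of a normalized number by restoring the implicit bit and subtracting the bias. It does not handle the special values described next.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;      /* bit 31 */
    unsigned exponent;  /* bits 30-23, still biased */
    uint32_t mantissa;  /* bits 22-0, implicit leading bit not stored */
} f32_fields_t;

/* Raw bits of a float (assumes a 32-bit IEEE float). */
uint32_t f32_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

f32_fields_t f32_split(uint32_t bits)
{
    f32_fields_t f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

/* Value of a normalized number (exponent field neither all 0s nor all 1s). */
double f32_normal_value(f32_fields_t f)
{
    double significand = 1.0 + (double)f.mantissa / 8388608.0; /* restore implicit 1; 2^23 */
    int    true_exp    = (int)f.exponent - 127;                /* subtract the bias */
    int    steps       = true_exp < 0 ? -true_exp : true_exp;
    double scale       = 1.0;

    for (int i = 0; i < steps; i++)
        scale *= 2.0;
    if (true_exp < 0)
        scale = 1.0 / scale;

    double value = significand * scale;
    return f.sign ? -value : value;
}
```

Applied to 0xC3F00000, the pattern shown above, it yields sign 1, biased exponent 135, mantissa 0x700000, and the value -480.0.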
Floating-point numbers have a few special values:
Denormalized – when the exponent bits are all 0s but the mantissa bits are nonzero. There is no implicit bit for the mantissa.
Zero – when the exponent and mantissa bits are all 0s. There can be both +0 and -0.
Infinity – when the exponent bits are all 1s and the mantissa bits are all 0s. There can be both positive and negative infinity.
NaN (not a number) – when the exponent bits are all 1s and the mantissa bits are nonzero.
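The bit rules for these special values translate directly into a classifier. This is a sketch for the IEEE single-precision layout only, with hypothetical names; it also recognizes NaN (exponent bits all 1s, mantissa bits nonzero), the value behind the H5T_CONV_EXCEPT_NAN exception.

```c
#include <stdint.h>

typedef enum { F32_ZERO, F32_DENORMAL, F32_NORMAL, F32_INFINITY, F32_NAN } f32_kind_t;

/* Classify a single-precision bit pattern by its exponent and mantissa:
 * exponent all 0s -> zero or denormalized; all 1s -> infinity or NaN. */
f32_kind_t f32_classify(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0)
        return mantissa == 0 ? F32_ZERO : F32_DENORMAL;
    if (exponent == 0xFFu)
        return mantissa == 0 ? F32_INFINITY : F32_NAN;
    return F32_NORMAL;
}
```

The sign bit is ignored, so +0 and -0 (and positive and negative infinity) fall into the same class.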
Other predefined or user-defined floating-point types should be similar to the IEEE standard: they should have a sign, an exponent, a mantissa, and a bias, and the exponent bits and the mantissa bits should each be contiguous.
2. Between Integer and Integer Types
Generally, converting from one integer type to another should result in the same mathematical value, but overflow may happen in some cases. If overflow happens and the user has registered an exception handling function, the library passes the exception to that function; the two exceptions for this kind of conversion are H5T_CONV_EXCEPT_RANGE_HI and H5T_CONV_EXCEPT_RANGE_LOW. Otherwise, by default the library assigns the maximal or minimal value of the destination type. To the library, the maximal value is the one with all data bits set to 1. The minimal value is the one with all data bits set to 0 for an unsigned integer, or with only the sign bit set to 1 for a signed integer.
The following table lists all possible scenarios when overflow can happen and the values assigned to the destination.
Source      Destination   Overflow may happen when                                Value assigned on overflow
unsigned    unsigned      source size > destination size                          maximum
signed      unsigned      source data bit size > destination data bit size       maximum
                          source value < 0                                        0
unsigned    signed        source data bit size > destination data bit size       maximum
signed      signed        source value < 0 and source size > destination size     minimum
                          source value > 0 and source size > destination size     maximum
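The default overflow rule in the table can be sketched as a saturating conversion between integers of arbitrary width. This is an illustration, not the library’s soft conversion code: the function names are hypothetical, the source value is assumed to fit in int64_t, and destination widths are limited to 1 to 63 bits. overflow_kind() returns 1 where the library would raise H5T_CONV_EXCEPT_RANGE_HI and -1 where it would raise H5T_CONV_EXCEPT_RANGE_LOW.

```c
#include <stdint.h>

/* Maximal value: all data bits set to 1. */
static int64_t int_max(int bits, int is_signed)
{
    return is_signed ? (int64_t)((UINT64_C(1) << (bits - 1)) - 1)
                     : (int64_t)((UINT64_C(1) << bits) - 1);
}

/* Minimal value: only the sign bit set (signed), or all bits 0 (unsigned). */
static int64_t int_min(int bits, int is_signed)
{
    return is_signed ? -(int64_t)(UINT64_C(1) << (bits - 1)) : 0;
}

/* +1 for RANGE_HI, -1 for RANGE_LOW, 0 when the value fits. */
int overflow_kind(int64_t v, int dst_bits, int dst_signed)
{
    if (v > int_max(dst_bits, dst_signed)) return 1;
    if (v < int_min(dst_bits, dst_signed)) return -1;
    return 0;
}

/* Default behavior without a callback: assign the maximum or minimum. */
int64_t saturate_int(int64_t v, int dst_bits, int dst_signed)
{
    switch (overflow_kind(v, dst_bits, dst_signed)) {
        case 1:  return int_max(dst_bits, dst_signed);
        case -1: return int_min(dst_bits, dst_signed);
        default: return v;
    }
}
```

For example, 300 converted to an 8-bit unsigned integer becomes 255, and -200 converted to an 8-bit signed integer becomes -128.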
3. From Integer To Floating-point Number
When the library converts an integer to a floating-point number, the result should equal the original value except in two cases. The first is when the mantissa of the floating-point type is not large enough to hold all the digits of the integer, so some precision is lost. In this case, the exception H5T_CONV_EXCEPT_PRECISION is passed to the user’s exception handling function if one has been registered with the library. Otherwise, the library rounds the source integer up or down to the closest floating-point number.
The second case is when the integer value is beyond the range of the floating-point type, so overflow happens. The exception H5T_CONV_EXCEPT_RANGE_HI or H5T_CONV_EXCEPT_RANGE_LOW is passed to the user’s exception handling function. If that function is absent, the library assigns positive or negative infinity to the destination. This case is rare, however, because floating-point numbers have broad ranges.
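Whether the mantissa can hold all the digits of an integer depends only on the span between its highest and lowest set bits, since trailing zeros are absorbed by the exponent. The sketch below (hypothetical names, IEEE-style mantissas with an implicit leading bit assumed) tests that condition, which corresponds to the H5T_CONV_EXCEPT_PRECISION case; it does not check the overflow case.

```c
#include <stdint.h>

/* Number of significant bits in |v|, from the highest set bit down to the
 * lowest set bit; trailing zeros cost nothing because the exponent absorbs them. */
int significant_bits(int64_t v)
{
    uint64_t mag = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    if (mag == 0)
        return 0;

    int low = 0, high = 63;
    while (((mag >> low) & 1u) == 0)
        low++;
    while (((mag >> high) & 1u) == 0)
        high--;
    return high - low + 1;
}

/* Would converting v to a floating-point type with mant_bits stored mantissa
 * bits (23 for IEEE float, 52 for double) lose precision? The implicit
 * leading bit gives one extra bit of precision. */
int loses_precision(int64_t v, int mant_bits)
{
    return significant_bits(v) > mant_bits + 1;
}
```

For example, 16777217 (2^{24} + 1) needs 25 significant bits, one more than an IEEE float provides, so it loses precision as a float but not as a double.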
4. From Floating-point Number To Integer
Converting a floating-point number to an integer produces the same value if the floating-point number has no fractional part. If the fractional part is nonzero, the library passes the exception H5T_CONV_EXCEPT_TRUNCATE to the user’s exception handling function; if that function is absent, the library discards the fractional part. Conversion from floating-point number to integer therefore usually involves truncating the fractional part.
Because floating-point numbers normally have greater ranges than integers, overflow may happen. The library passes the exception H5T_CONV_EXCEPT_RANGE_HI or H5T_CONV_EXCEPT_RANGE_LOW to the user’s exception handling function. If that function is absent, the library sets the integer to its maximal value (all data bits set to 1) or minimal value (all data bits set to 0 for an unsigned integer; only the sign bit set to 1 for a signed integer), just as in conversion between integers.
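The float-to-integer rules above can be sketched for a double source and a 32-bit signed destination. The enum values are hypothetical stand-ins for the H5T_CONV_EXCEPT_* cases, not HDF5 definitions. Because this document does not spell out default results for special values, double_to_int32_default() treats infinities like any other out-of-range value and returns 0 for NaN; both are assumptions of the sketch, marked in the code.

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical stand-ins for the exception cases in Section III. */
typedef enum {
    EXC_NONE, EXC_RANGE_HI, EXC_RANGE_LOW, EXC_TRUNCATE,
    EXC_PINF, EXC_NINF, EXC_NAN
} conv_exc_t;

/* Which exception converting x to int32_t would raise. */
conv_exc_t double_to_int32_exception(double x)
{
    if (x != x)             return EXC_NAN;       /* only NaN is unequal to itself */
    if (x > DBL_MAX)        return EXC_PINF;
    if (x < -DBL_MAX)       return EXC_NINF;
    if (x >= 2147483648.0)  return EXC_RANGE_HI;  /* at least INT32_MAX + 1 */
    if (x <= -2147483649.0) return EXC_RANGE_LOW; /* at most INT32_MIN - 1 */
    if ((double)(int32_t)x != x)
        return EXC_TRUNCATE;                      /* nonzero fractional part */
    return EXC_NONE;
}

/* Default result when no callback handles the exception. */
int32_t double_to_int32_default(double x)
{
    if (x != x)             return 0;             /* ASSUMPTION: placeholder for NaN */
    if (x >= 2147483648.0)  return INT32_MAX;     /* maximal: all data bits 1 (also +inf, an assumption) */
    if (x <= -2147483649.0) return INT32_MIN;     /* minimal: only the sign bit 1 (also -inf, an assumption) */
    return (int32_t)x;                            /* C truncates toward zero */
}
```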
Floating-point numbers also have some special values: +/-0, +/-infinity, and NaN.
5. Between Floating-point and Floating-point Numbers
Conversion between two floating-point types involves more issues. Converting from a smaller type like float to a bigger type like double should result in the same value. Converting from a bigger type like double to a smaller type like float raises three problems. First, if the source value is within the range of the destination, some precision can be lost because the source mantissa is bigger than the destination mantissa; the library rounds the result to the value closest to the original. Second, if the source value is beyond the range of the destination, overflow happens. The library passes the exception H5T_CONV_EXCEPT_RANGE_HI or H5T_CONV_EXCEPT_RANGE_LOW to the user’s exception handling function; if no such function is present, the library assigns infinity of the appropriate sign to the destination. Third, if the source value is very small, the library tries to store it as a denormalized value in the destination. If it is still too small for the destination, underflow happens and the library simply assigns 0 to the destination.
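The three problems of narrowing a double to a float can be told apart in C. This sketch (hypothetical names) only classifies the outcome; it lets the compiler’s cast do the rounding, so it is not a soft converter. It follows the text in treating any magnitude above FLT_MAX as overflow, which slightly simplifies IEEE rounding right at the boundary, and it expects a finite, non-NaN input.

```c
#include <float.h>

typedef enum {
    NARROW_EXACT, NARROW_ROUNDED, NARROW_DENORMAL, NARROW_UNDERFLOW, NARROW_OVERFLOW
} narrow_t;

narrow_t classify_double_to_float(double x)
{
    double mag = x < 0 ? -x : x;
    if (mag > FLT_MAX)
        return NARROW_OVERFLOW;          /* beyond the float range */

    float f    = (float)x;               /* in range: the cast rounds to a neighbor */
    float fmag = f < 0 ? -f : f;

    if (f == 0.0f && x != 0.0)
        return NARROW_UNDERFLOW;         /* too small even for a denormalized float */
    if (fmag != 0.0f && fmag < FLT_MIN)
        return NARROW_DENORMAL;          /* stored without the implicit bit */
    if ((double)f != x)
        return NARROW_ROUNDED;           /* mantissa bits were lost */
    return NARROW_EXACT;
}
```

For example, 0.5 narrows exactly, 0.1 is rounded, 1e-40 becomes a denormalized float, 1e-50 underflows to 0, and 1e300 overflows.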
For the special values of floating-point numbers, +/-0, +/-infinity, and NaN,
V. Hard Data Conversions
Because the details are handled by the compiler, the library has little control over hard conversion; its control is mainly over overflow. When the source value is beyond the range of the destination, overflow happens. Just as with soft conversion, the library passes H5T_CONV_EXCEPT_RANGE_HI or H5T_CONV_EXCEPT_RANGE_LOW to the user’s exception handling function. If no such function is present, the library assigns the maximal or minimal value to the destination. These maximal and minimal values come from the C library’s header file limits.h for integers and float.h for floating-point numbers.
For integers, these values can differ from the maximal and minimal values used by soft conversion because they are defined by the specific C library implementation. For example, INT_MAX (the maximal int) can be defined as 32,767 and INT_MIN (the minimal int) as -32,767, whereas the soft conversion’s minimum for a 16-bit signed integer (only the sign bit set) is -32,768.
For floating-point numbers, soft conversion sets the destination to positive or negative infinity when overflow happens, whereas hard conversion sets it to the maximal or minimal value found in C’s header file float.h. For example, FLT_MAX (the maximal float) can be defined as 10^{37}, and the minimal value is simply -FLT_MAX.
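The difference in overflow values between hard and soft conversion can be shown with two clamping helpers. This is a sketch of the behavior described above, not the library’s code; the function names are hypothetical, and the limits come from the standard headers limits.h and float.h.

```c
#include <float.h>
#include <limits.h>

/* Hard-conversion style overflow handling: clamp to the C library limits. */
short hard_long_to_short(long v)
{
    if (v > SHRT_MAX) return SHRT_MAX;
    if (v < SHRT_MIN) return SHRT_MIN;
    return (short)v;
}

float hard_double_to_float(double x)
{
    if (x > FLT_MAX)  return FLT_MAX;    /* soft conversion would give +infinity */
    if (x < -FLT_MAX) return -FLT_MAX;   /* soft conversion would give -infinity */
    return (float)x;
}
```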
VI. Summary
Generally speaking, during data conversion the HDF5 library tries to convert the original value to the same mathematical value, or to a value as close to the original as possible when the same value cannot be represented.
The way the HDF5 library deals with overflow and underflow may not be the same as in some C implementations. The library may also behave differently from some C implementations for the special values of floating-point numbers.
Appendix:
Name: H5Tis_hard
Signature:
htri_t H5Tis_hard(hid_t src_id, hid_t dst_id)
Purpose:
Check whether the library’s default conversion is hard conversion.
Description:
H5Tis_hard finds out whether the library’s conversion function from type src_id to type dst_id is a hard conversion. A hard conversion uses the compiler’s casting; a soft conversion uses the library’s own conversion function.
Parameters:
hid_t src_id      IN: Identifier for the source datatype.
hid_t dst_id      IN: Identifier for the destination datatype.
Returns:
Returns TRUE for hard conversion, FALSE for soft conversion.
Fortran90 Interface:
None.