Data Conversion Of Arithmetic Data Types

                                    Quincey Koziol and Raymond Lu

                                                April 14, 2005

                                    Revised on July 12, 2005

 

I.       Introduction

 

This document describes how the HDF5 library is designed to convert data between arithmetic data types and how it behaves when doing so.  It is written mainly for HDF5 application users, but it can also be useful to HDF5 library developers.

 

In this document, the arithmetic data types refer to both integers and floating-point numbers.  The integers include all the library’s predefined integers and any user-defined integers.  The library’s predefined integers include standard, Intel-specific, Alpha-specific, MIPS-specific, ANSI C9x-specific, and native data types.  The HDF5 Predefined Datatypes section in the HDF5 Reference Manual lists all these predefined data types.    

 

For the convenience of discussion in this document, we repeat all the possible native integers here,

 

            C types                 HDF5 types

            char                    H5T_NATIVE_CHAR
            signed char             H5T_NATIVE_SCHAR
            unsigned char           H5T_NATIVE_UCHAR
            short                   H5T_NATIVE_SHORT
            unsigned short          H5T_NATIVE_USHORT
            int                     H5T_NATIVE_INT
            unsigned int            H5T_NATIVE_UINT
            long                    H5T_NATIVE_LONG
            unsigned long           H5T_NATIVE_ULONG
            long long               H5T_NATIVE_LLONG
            unsigned long long      H5T_NATIVE_ULLONG

 

The floating-point numbers include all the library’s predefined floating-point types and any user-defined types.  The HDF5 Reference Manual lists all the predefined types like IEEE, Alpha-specific, MIPS-specific, and native floating-point data types.     

 

Possible native floating-point types are repeated below,

 

            C types                 HDF5 types

            float                   H5T_NATIVE_FLOAT
            double                  H5T_NATIVE_DOUBLE
            long double             H5T_NATIVE_LDOUBLE

 

For the library, data conversion happens in two scenarios.  One is when data is transferred between memory and disk by H5Dwrite() or H5Dread(); the other is when data is converted in memory by H5Tconvert().  In either case, data conversion takes place only if the source and destination data types are different.

 

II.    Hard and Soft Conversions

 

The HDF5 library has two ways of converting data for a given pair of different data types, hard and soft conversions. 

 

1.      Hard vs. Soft Conversions

 

A hard conversion is basically a cast done by the compiler, like int a = (int)b, where b is declared as a float.  In contrast, a soft conversion is done by the HDF5 library’s own conversion functions, which examine the bit sequence of the source data and convert it into the bit sequence of the destination data.  Soft conversions tend to be more rigorous, but they are slower than hard conversions because of all the bit operations involved.
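The difference between the two styles can be sketched in plain C.  This is only an illustration and not the library's code: the function names are hypothetical, the float type is assumed to be a 32-bit IEEE single, and the soft-style routine truncates bits that do not fit instead of rounding them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "Hard" style: let the compiler's cast do the work, then expose the bits. */
uint32_t hard_int_to_float_bits(int32_t v)
{
    float    f = (float)v;
    uint32_t bits;

    assert(sizeof f == sizeof bits);      /* assumes a 32-bit float */
    memcpy(&bits, &f, sizeof bits);       /* well-defined way to read the bits */
    return bits;
}

/* "Soft" style: assemble sign, biased exponent and mantissa by hand.
 * Simplification: low-order bits that do not fit are truncated, not rounded. */
uint32_t soft_int_to_float_bits(int32_t v)
{
    uint32_t sign, mag, exponent, mantissa;
    int      msb = 31;

    if (v == 0)
        return 0u;
    sign = (v < 0) ? 0x80000000u : 0u;
    mag  = (v < 0) ? 0u - (uint32_t)v : (uint32_t)v;   /* magnitude of v */

    while (((mag >> msb) & 1u) == 0u)     /* find the highest set bit */
        msb--;

    exponent = (uint32_t)(msb + 127);     /* add the bias of IEEE single */
    mantissa = (msb <= 23) ? (mag << (23 - msb)) : (mag >> (msb - 23));
    mantissa &= 0x007FFFFFu;              /* drop the implicit leading bit */

    return sign | (exponent << 23) | mantissa;
}
```

For values that an IEEE single can represent exactly, both routines produce the same bit pattern; they can differ only where the cast rounds and this simplified soft routine truncates.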

 

During the development of the library, the terms hardware conversion and compiler conversion have been used to refer to hard conversion.  The terms software conversion and library conversion have been used for soft conversion.  These terms may be seen in other documents and in the library’s source code.

 

Internally, the library maintains a list of soft conversion functions for each pair of source and destination data type classes.  A data type class is the category to which a data type belongs.  For example, the data types H5T_NATIVE_INT and H5T_NATIVE_LONG both belong to the H5T_INTEGER class.  A soft conversion is therefore designed to handle any data type in a class, including both library predefined and user-defined data types.

 

The library also maintains a table of hard conversion functions for each pair of source and destination data types.  The function H5T_conv_int_float(), for example, is a hard conversion function.  The library’s data type conversion path is taken from this table of hard conversion functions first; that is, the library always picks a hard conversion over a soft conversion when one is available.  Hard conversion can only handle native data types because compilers do not recognize non-native data types.

 

So keep this in mind: soft conversion is for data type classes while hard conversion is for data types.  The library’s default conversion between native data types is hard conversion.   

 

2.      Registration and Un-registration of Conversion Functions

 

Users can register their own conversion functions to the library through the function H5Tregister().  These conversion functions can be either soft or hard conversion functions.

 

When a soft conversion function is registered with the library through H5Tregister(), it is appended to the list of soft conversion functions.  It also goes into the table of hard conversion functions, where it replaces every conversion function to which it can apply.  For example, if a new soft conversion function conv_integer_fp(), which converts any integer to any floating-point number, is registered in the soft list, all hard conversion functions from any integer type to any floating-point type are replaced by this soft function.  All of the library’s conversion paths from integer to floating-point number are updated to use this function.

 

When a hard conversion function is registered with the library, it goes into the table of hard functions and replaces the existing hard conversion function for that pair of types.  For example, if a new function conv_int_float(), which converts data of int type to float type, is registered as a hard function, it replaces the library’s existing conversion function H5T_conv_int_float().  The library then uses conv_int_float() to convert data from int to float whenever hard conversion is selected.
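The registration rules described above can be modeled with a small path table.  This is a hypothetical model for illustration only, not the library's internal data structure: the type and class enumerations, the conv_fn signature, and every function name below are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: four data types grouped into two classes. */
enum type_id  { T_INT, T_LONG, T_FLOAT, T_DOUBLE, T_COUNT };
enum class_id { C_INTEGER, C_FLOAT };

typedef int (*conv_fn)(void);             /* placeholder conversion signature */

enum class_id class_of(enum type_id t)
{
    return (t == T_INT || t == T_LONG) ? C_INTEGER : C_FLOAT;
}

/* One conversion path per (source type, destination type) pair. */
conv_fn path[T_COUNT][T_COUNT];

/* Registering a hard function replaces exactly one path. */
void register_hard(enum type_id src, enum type_id dst, conv_fn fn)
{
    assert(fn != NULL);
    path[src][dst] = fn;
}

/* Registering a soft function replaces every path whose classes match. */
void register_soft(enum class_id src, enum class_id dst, conv_fn fn)
{
    int s, d;

    assert(fn != NULL);
    for (s = 0; s < T_COUNT; s++)
        for (d = 0; d < T_COUNT; d++)
            if (class_of((enum type_id)s) == src &&
                class_of((enum type_id)d) == dst)
                path[s][d] = fn;
}

/* Stand-ins for conversion functions; the return value only identifies them. */
int lib_hard_int_float(void)   { return 1; }
int user_soft_integer_fp(void) { return 2; }
int user_hard_int_float(void)  { return 3; }
```

In this model, registering user_soft_integer_fp for the integer-to-floating-point class pair overwrites the int-to-float path along with every other integer-to-float path, while a later hard registration for int to float overwrites only that single entry.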

 

To un-register conversion functions, users can call H5Tunregister().  This function takes the same parameters as H5Tregister(), but all of them are optional.  Omitted parameters act as “wild cards” that broaden the matching criteria.  For example, to disable all hard conversions so that soft conversions are used instead, a user can simply un-register all hard conversions by calling

 

            H5Tunregister(H5T_PERS_HARD, NULL, -1, -1, NULL);     

 

3.      Handling Incorrect Hard Conversions

 

While developing the data conversion code of the HDF5 library, we discovered some incorrect hard conversions, caused mainly by compilers’ incorrect casting.  The library needs a reliable way to handle these incorrect conversions instead of giving users corrupted data.

 

The library handles an incorrect hard conversion by not registering a hard conversion function when a problematic conversion is detected during configuration.  In that case, the library’s conversion path is the library’s soft conversion function unless users register their own conversion function.

 

To find out whether the library is using a hard or soft conversion routine for a certain pair of source and destination data types, the function H5Tis_hard() can be used (a proposed reference manual entry for this new function can be found in the Appendix of this document).  To find out which conversion routine the library is using for a certain pair of data types, the function H5Tfind() should be used.

 

The table below lists the conversions for which the library uses soft routines on some systems, together with the reason for choosing the soft conversion.  All other conversions not listed here are hard conversions.

 

source and destination data types              system                   reason

floating-point to floating-point number        all Crays                compiler does not support denormalized values
all integers to long double                    all SGIs                 compiler gives some incorrect conversions
unsigned (long) long to floating-point number  all SGIs                 compiler gives some incorrect conversions
                                               64-bit Solaris           compiler does different rounding
unsigned long long to floating-point number    Windows Visual Studio 6
long double to all integers                    all SGIs                 compiler does some incorrect conversions
                                               HP-UX 11.00              compiler generates floating exception
floating-point to unsigned long long           PGI compiler             compiler rounds up when the fractional part is greater than 0.5

 

 

III. Handling Exceptions

 

1.      How to handle exceptions

 

The library gives users the ability to handle exceptions during data conversion.  A user’s callback function can be registered with the library through the property list function H5Pset_type_conv_cb().  This gives users control over data values whenever an exception happens.

 

The following piece of code shows how to register an exception callback function except_func() with the library:

 

        if(H5Pset_type_conv_cb(dxpl_id, except_func, &fill_value)<0)

            goto error;

 

        if(H5Pget_type_conv_cb(dxpl_id, &op, &user_data)<0)

            goto error;

 

        if(op != except_func || *(int*)user_data != fill_value)

            goto error;

 

The code also uses H5Pget_type_conv_cb() to verify that the callback function has been registered successfully.  The library defines the prototype of the conversion exception callback as

 

typedef H5T_conv_ret_t (*H5T_conv_except_func_t)(H5T_conv_except_t except_type, hid_t src_id, hid_t dst_id, void *src_buf, void *dst_buf, void *user_data);

 

So somewhere in the code, the function except_func() is defined as

 

H5T_conv_ret_t
except_func(H5T_conv_except_t except_type, hid_t src_id, hid_t dst_id,
            void *src_buf, void *dst_buf, void *user_data)
{
    H5T_conv_ret_t      ret = H5T_CONV_HANDLED;

    switch(except_type) {
        case H5T_CONV_EXCEPT_RANGE_HI:
        case H5T_CONV_EXCEPT_RANGE_LOW:
        case H5T_CONV_EXCEPT_PINF:
        case H5T_CONV_EXCEPT_NINF:
        case H5T_CONV_EXCEPT_NAN:
            /* only the integer destination case is tested:
             * store the fill value passed in through user_data */
            *(int*)dst_buf = *(int*)user_data;
            break;

        case H5T_CONV_EXCEPT_TRUNCATE:
        case H5T_CONV_EXCEPT_PRECISION:
            /* leave these to the library's default behavior */
            ret = H5T_CONV_UNHANDLED;
            break;

        default:
            break;
    }

    return ret;
}

 

This example only handles the cases in which the destination data type is integer.  The source data type can be either integer or floating-point number. 

 

2.      Cases of Exceptions

 

A number of exceptions may happen during conversion.  These exceptions are:

H5T_CONV_EXCEPT_RANGE_HI: the source value is positive and too large for the destination.  Overflow happens.

H5T_CONV_EXCEPT_RANGE_LOW: the source value is negative and its magnitude is too large for the destination.  Overflow happens.

H5T_CONV_EXCEPT_TRUNCATE: the source is a floating-point type and the destination is an integer.  The floating-point number has a fractional part.

H5T_CONV_EXCEPT_PRECISION: the source is an integer and the destination is a floating-point type.  The mantissa of the floating-point type is not big enough to hold all the digits of the integer.

H5T_CONV_EXCEPT_PINF: the source is a floating-point type and the value is positive infinity.

H5T_CONV_EXCEPT_NINF: the source is a floating-point type and the value is negative infinity.

H5T_CONV_EXCEPT_NAN: the source is a floating-point type and the value is NaN (not a number, including QNaN and SNaN).

 

Valid return values of the exception handling callback function are H5T_CONV_ABORT, H5T_CONV_UNHANDLED and H5T_CONV_HANDLED.  
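How a conversion loop might react to the three return values can be sketched as follows.  This is a simplified model and not the library's code: the CB_* names are hypothetical stand-ins for H5T_CONV_ABORT, H5T_CONV_UNHANDLED and H5T_CONV_HANDLED, and it assumes the usual reading that "abort" stops the conversion, "unhandled" falls back to the library default, and "handled" keeps the value the callback stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three callback return values. */
typedef enum { CB_ABORT = -1, CB_UNHANDLED = 0, CB_HANDLED = 1 } cb_ret_t;

typedef cb_ret_t (*except_cb_t)(int64_t src, int32_t *dst, void *user_data);

/* Convert n 64-bit values to 32-bit.  Returns 0 on success, -1 if aborted. */
int convert_i64_to_i32(const int64_t *src, int32_t *dst, size_t n,
                       except_cb_t cb, void *user_data)
{
    size_t i;

    for (i = 0; i < n; i++) {
        cb_ret_t r;

        if (src[i] >= INT32_MIN && src[i] <= INT32_MAX) {
            dst[i] = (int32_t)src[i];     /* in range: no exception */
            continue;
        }
        /* out of range: ask the callback, if one is registered */
        r = cb ? cb(src[i], &dst[i], user_data) : CB_UNHANDLED;
        if (r == CB_ABORT)
            return -1;                    /* stop the whole conversion */
        if (r == CB_UNHANDLED)            /* fall back to the default value */
            dst[i] = (src[i] > 0) ? INT32_MAX : INT32_MIN;
        /* CB_HANDLED: the callback already stored a value in dst[i] */
    }
    return 0;
}

/* Example callback: replace every out-of-range value with a fill value. */
cb_ret_t fill_cb(int64_t src, int32_t *dst, void *user_data)
{
    (void)src;
    assert(user_data != NULL);
    *dst = *(int32_t *)user_data;
    return CB_HANDLED;
}

/* Example callback: refuse to continue on any exception. */
cb_ret_t abort_cb(int64_t src, int32_t *dst, void *user_data)
{
    (void)src; (void)dst; (void)user_data;
    return CB_ABORT;
}
```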

 

IV.  Soft Data Conversions

 

This section is mainly for advanced users or library developers who want to know the library’s behavior in performing soft data conversion. 

 

1.      Understanding Bit Patterns of Arithmetic Data Types

 

To understand data conversion, it helps to know the bit patterns of arithmetic data types.  Integers generally have simple bit patterns.  In two’s-complement notation, a signed integer of n bits has a range from -2^(n-1) to 2^(n-1) - 1.  The high-order bit is the sign bit, and the remaining n-1 bits are data bits.  For unsigned integers, the high-order bit becomes a data bit, so all n bits are data bits and the range is from 0 to 2^n - 1.  For example, consider the bit sequence 10010111 for a 1-byte (signed) char.  The high-order (leftmost) bit is set to 1, meaning the value is negative (-105 in two’s complement).  If the same bit sequence represents an unsigned char, the high-order bit becomes a data bit, making the value 151.  Every implementation of the C language documents the ranges of its native integer types in the header file limits.h.  For example, the limits of the short type can be SHRT_MAX = 32,767 and SHRT_MIN = -32,767; these are the minimum magnitudes the C standard requires, and two’s-complement implementations typically define SHRT_MIN as -32,768.
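The ranges and the 10010111 example can be checked with a few lines of C.  This is a minimal sketch assuming two's-complement integers; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest value of an n-bit two's-complement signed integer: -2^(n-1). */
int64_t signed_min(int n)
{
    assert(n >= 1 && n <= 63);
    return -((int64_t)1 << (n - 1));
}

/* Largest value of an n-bit signed integer: 2^(n-1) - 1. */
int64_t signed_max(int n)
{
    assert(n >= 1 && n <= 63);
    return ((int64_t)1 << (n - 1)) - 1;
}

/* Largest value of an n-bit unsigned integer: 2^n - 1. */
uint64_t unsigned_max(int n)
{
    assert(n >= 1 && n <= 63);
    return ((uint64_t)1 << n) - 1;
}

/* Read 8 bits as an unsigned value: every bit is a data bit. */
int as_unsigned8(uint8_t bits)
{
    return (int)bits;
}

/* Read the same 8 bits as two's complement: the high-order bit is the sign. */
int as_signed8(uint8_t bits)
{
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}
```

With these helpers, the bit sequence 10010111 (0x97) reads as 151 when unsigned and as -105 when signed.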

 

The floating-point number representation is more complicated.  A more thorough description of IEEE standard floating-point numbers can be found in the IEEE Standard 754 document.  For IEEE standard floating-point numbers, there are three components for a floating-point number, the sign, the exponent, and the mantissa.  The diagram below shows the layouts of IEEE float and double types.

 

Type        Sign        Exponent        Mantissa        Bias

float       1[31]       8[30-23]        23[22-00]       127
double      1[63]       11[62-52]       52[51-00]       1023

 

The numbers are the size of each component.  The bit index is in the square brackets.  To calculate the true exponent value, the bias has to be subtracted from the value represented by the bits of exponent.  The mantissa represents the precision bits.  The leading bit has been implied.  When the true precision is calculated, this implicit bit will be restored.  Consider this bit sequence for float in little-endian order,

 

            byte 3      byte 2      byte 1      byte 0
            11000011    11110000    00000000    00000000

 

The high-order (leftmost) bit is the sign bit.  It is set, indicating that the number is negative.  The eight bits after the sign bit, 10000111 in bytes 3 and 2, are the exponent.  Their value is 135.  After subtracting the bias 127, the true exponent is 8.  The 23 bits after the exponent, 1110000 00000000 00000000 in bytes 2, 1, and 0, are the mantissa.  After restoring the implicit leading bit and adding the radix point, the mantissa becomes 1.1110000 00000000 00000000.  The magnitude is therefore 1.111 x 2^8 = 111100000.0 in binary, which is 480.0 in decimal.  With the sign bit applied, the value of this float number is -480.0.
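The same decoding can be reproduced in C by splitting the 32-bit pattern 0xC3F00000 (the four bytes shown above) into its fields.  This is a sketch that assumes float is a 32-bit IEEE single; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three components of an IEEE single-precision number. */
typedef struct {
    unsigned sign;          /* 1 bit,   bit 31      */
    unsigned biased_exp;    /* 8 bits,  bits 30-23  */
    uint32_t mantissa;      /* 23 bits, bits 22-0   */
} float_fields;

/* Split a 32-bit pattern into sign, biased exponent and mantissa. */
float_fields split_float_bits(uint32_t bits)
{
    float_fields f;

    f.sign       = (unsigned)(bits >> 31);
    f.biased_exp = (unsigned)((bits >> 23) & 0xFFu);
    f.mantissa   = bits & 0x007FFFFFu;
    return f;
}

/* Reinterpret a 32-bit pattern as a float without changing any bit. */
float float_from_bits(uint32_t bits)
{
    float f;

    assert(sizeof f == sizeof bits);      /* assumes a 32-bit float */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For 0xC3F00000 the fields are sign 1, biased exponent 135 (true exponent 135 - 127 = 8), and mantissa bits 1110000 00000000 00000000, and the reinterpreted value is -480.0.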

 

There are a few special values for floating-point numbers,

 

Denormalized – when exponent bits are all 0s but mantissa bits are non-zero.  There will be no implicit bit for the mantissa.

 

Zero – when exponent and mantissa bits are all set to 0s.  There can be both +0 and -0.

 

Infinity – when exponent bits are all 1s and mantissa bits are all 0s.  There can be both positive and negative infinities.

 

NaN(Not a Number) – when exponent bits are all 1s and mantissa bits are not all 0s.  NaN can be either positive or negative. 
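These rules translate directly into a classifier over the exponent and mantissa bits.  The sketch below assumes the 32-bit IEEE single layout; the enumeration and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    KIND_ZERO,
    KIND_DENORMAL,
    KIND_NORMAL,
    KIND_INFINITY,
    KIND_NAN
} fp_kind;

/* Classify an IEEE single-precision bit pattern.  The sign bit is ignored,
 * so +0 and -0, or positive and negative infinity, fall in the same kind. */
fp_kind classify_float_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent == 0u)                   /* exponent bits all 0s */
        return (mantissa == 0u) ? KIND_ZERO : KIND_DENORMAL;
    if (exponent == 0xFFu)                /* exponent bits all 1s */
        return (mantissa == 0u) ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMAL;
}
```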

 

Other predefined or user-defined floating-point types should be similar to the IEEE standard.  They should have a sign, an exponent, a mantissa, and a bias.  The bits of the exponent and of the mantissa should each be contiguous.

             

2.      Between Integer and Integer Types

 

Generally, converting from one integer type to another should result in the same mathematical value.  In some cases, however, overflow may happen.  If overflow happens and the user has registered an exception handling function, the library lets that function handle it.  The two exceptions relating to this kind of conversion are H5T_CONV_EXCEPT_RANGE_HI and H5T_CONV_EXCEPT_RANGE_LOW.  Otherwise, the library’s default is to assign the maximal or minimal value of the destination type.  To the library, the maximal value has all data bits of the integer set to 1.  The minimal value has all data bits of an unsigned integer set to 0, or, for a signed integer, only the sign bit set to 1.

 

The following table lists all possible scenarios when overflow can happen and the values assigned to the destination.

 

Source type   Destination type   When overflow can happen                            Value assigned on overflow

unsigned      unsigned           source size > destination size                      maximum
signed        unsigned           source data bit size > destination data bit size    maximum
                                 source value < 0                                    0
unsigned      signed             source data bit size > destination data bit size    maximum
signed        signed             source value < 0; source size > destination size    minimum
                                 source value > 0; source size > destination size    maximum
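The default values in the table amount to saturating conversions.  The sketch below shows one function per row group for 32-bit sources and 8-bit destinations; the function names are illustrative and the code is a model of the described behavior, not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* unsigned -> unsigned: values that are too large become the maximum. */
uint8_t u32_to_u8(uint32_t v)
{
    return (v > UINT8_MAX) ? UINT8_MAX : (uint8_t)v;
}

/* signed -> unsigned: negative values become 0, too-large values the maximum. */
uint8_t i32_to_u8(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > UINT8_MAX)
        return UINT8_MAX;
    return (uint8_t)v;
}

/* unsigned -> signed: values that are too large become the maximum. */
int8_t u32_to_i8(uint32_t v)
{
    return (v > (uint32_t)INT8_MAX) ? INT8_MAX : (int8_t)v;
}

/* signed -> signed: saturate at the minimum or the maximum. */
int8_t i32_to_i8(int32_t v)
{
    if (v < INT8_MIN)
        return INT8_MIN;
    if (v > INT8_MAX)
        return INT8_MAX;
    return (int8_t)v;
}
```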

  

 

3.      From Integer To Floating-point Number

 

When the library converts an integer to a floating-point number, the result should be equal to the original value except in two cases.  The first is when the mantissa of the floating-point type is not big enough to hold all the digits of the integer, so some precision is lost.  In this case, the exception H5T_CONV_EXCEPT_PRECISION is passed to the user’s exception handling function if one has been registered with the library.  Otherwise, the library rounds the source integer up or down to the closest floating-point number.

 

The second case is when the integer value is beyond the range of the floating-point type, so overflow happens.  The exception H5T_CONV_EXCEPT_RANGE_HI or H5T_CONV_EXCEPT_RANGE_LOW is passed to the user’s exception handling function.  If the user’s exception handling function is absent, the library assigns positive or negative infinity to the destination.  This case does not happen often, however, because floating-point numbers have broad ranges.
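The precision-loss case can be detected by counting significant bits.  The sketch below assumes an IEEE single with 24 significant bits (23 stored plus the implicit leading bit); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v can be stored in an IEEE single without losing precision,
 * 0 if the mantissa is too small to hold all of its digits. */
int fits_in_float(int32_t v)
{
    uint32_t mag = (v < 0) ? 0u - (uint32_t)v : (uint32_t)v;

    if (mag == 0u)
        return 1;
    while ((mag & 1u) == 0u)      /* trailing zeros are absorbed by the exponent */
        mag >>= 1;
    return mag < (1u << 24);      /* what remains must fit in 24 bits */
}
```

For example, 16,777,216 (2^24) fits, while 16,777,217 needs 25 significant bits and is rounded to a neighboring representable value when converted.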

 

4.      From Floating-point Number To Integer

 

The conversion from a floating-point number to an integer results in the same value if the floating-point number does not have a fractional part.  If the fractional part is non-zero, the library passes the exception H5T_CONV_EXCEPT_TRUNCATE to the user’s exception handling function.  If this function is absent, the library discards the fractional part.  In practice, this conversion usually involves truncating the fractional part.

 

Because floating-point numbers normally have greater ranges than integers, overflow may happen.  The library passes the exception H5T_CONV_EXCEPT_RANGE_HI or H5T_CONV_EXCEPT_RANGE_LOW to the user’s exception handling function.  If no such function is present, the library assigns the maximal value (all data bits set to 1) or the minimal value (all data bits set to 0 for an unsigned integer; only the sign bit set to 1 for a signed integer) to the integer.  This is similar to conversion between integer types.

 

Floating-point numbers have some special values: +/-0, +/-infinity, and NaN.  For +/-0, the library simply converts them to 0 for the integer.  For +/-infinity, the exception H5T_CONV_EXCEPT_PINF or H5T_CONV_EXCEPT_NINF is passed to the user’s exception handling function.  If there is no such function, the library assigns the maximal or minimal value to the integer.  For NaN, the library passes the exception H5T_CONV_EXCEPT_NAN.  The library’s default is to assign the value 0 to the integer for NaN.

 

5.      Between Floating-point and Floating-point Numbers

 

The conversion between two floating-point types involves more issues.  Converting from a smaller floating-point type like float to a bigger type like double should result in the same value.  Converting from a bigger type like double to a smaller type like float raises three issues.  First, if the source value is within the range of the destination, some precision can be lost because the source mantissa is bigger than the destination mantissa.  The library rounds so that the result is as close as possible to the original value.  Second, if the source value is beyond the range of the destination, overflow happens.  The library passes the exception H5T_CONV_EXCEPT_RANGE_HI or H5T_CONV_EXCEPT_RANGE_LOW to the user’s exception handling function.  If no such function is present, the library assigns infinity to the destination.  Third, if the source value is very small, the library tries to denormalize the destination.  If the value is still too small for the destination, underflow happens and the library simply assigns 0 to the destination.

 

For the special values of floating-point numbers, +/-0, +/-infinity, and NaN, the library’s default is to assign the same special value to the destination.  If the user’s exception handling function is present, however, the library passes the exception to it.  For positive infinity, the exception is H5T_CONV_EXCEPT_PINF; for negative infinity, it is H5T_CONV_EXCEPT_NINF; for Not a Number, it is H5T_CONV_EXCEPT_NAN.
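The default results for narrowing a double to a float can be modeled in a few lines.  This sketch relies on the C cast for rounding and denormalization on an IEEE platform and handles overflow explicitly; the function name is hypothetical, and values only slightly above FLT_MAX are treated as overflow for simplicity.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Narrow a double to a float with the soft-conversion defaults described in
 * the text: overflow gives infinity, underflow gives a denormal or 0, and
 * special values keep their meaning. */
float narrow_soft_default(double d)
{
    if (d != d)
        return (float)d;        /* NaN stays NaN */
    if (d > FLT_MAX)
        return INFINITY;        /* positive overflow (includes +infinity) */
    if (d < -FLT_MAX)
        return -INFINITY;       /* negative overflow (includes -infinity) */
    return (float)d;            /* rounds to nearest; tiny values become
                                   denormal or 0 */
}
```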

 

V.     Hard Data Conversions

 

Because the details are handled by compilers, the library has little control over hard conversion.  The library’s control is mainly over overflow.  When the source value is beyond the range of the destination, overflow happens.  Just as with soft conversion, the library signals H5T_CONV_EXCEPT_RANGE_HI or H5T_CONV_EXCEPT_RANGE_LOW to the user’s exception handling function.  If no such function is present, the library assigns the maximal or minimal value to the destination.  These maximal and minimal values are found in the C library’s header file limits.h for integers and float.h for floating-point numbers.

 

For integers, these values can be different from the maximal and minimal values of soft conversion because they are defined by specific C library implementation.  For example, the INT_MAX (maximal int) can be defined as 32,767 and the INT_MIN (minimal int) can be defined as -32,767. 

 

For floating-point numbers, the soft conversion sets the destination to positive or negative infinity when overflow happens.  The hard conversion sets it to the maximal value found in C’s header file float.h.  For example, FLT_MAX (maximal float) can be defined as 1E+37, and the minimal value is simply -FLT_MAX.
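The hard-conversion defaults can be modeled as saturation at the limits from limits.h and float.h.  The function names below are hypothetical and the code is an illustration of the described behavior rather than the library's routines.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* Hard-style default for floating-point overflow: saturate at +/-FLT_MAX
 * from float.h instead of producing infinity. */
float hard_double_to_float(double d)
{
    if (d > FLT_MAX)
        return FLT_MAX;
    if (d < -FLT_MAX)
        return -FLT_MAX;
    return (float)d;
}

/* Hard-style default for integer overflow: saturate at INT_MAX or INT_MIN
 * from limits.h. */
int hard_long_long_to_int(long long v)
{
    if (v > INT_MAX)
        return INT_MAX;
    if (v < INT_MIN)
        return INT_MIN;
    return (int)v;
}
```

Under these assumptions a source value of 1e300 yields FLT_MAX here, whereas the soft conversion described earlier would yield positive infinity.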

 

VI.  Summary

 

Generally speaking, during data conversion, the HDF5 library tries to convert the original value to the same mathematical value, or close to the original value if the same value is not possible.

 

The way that HDF5 library deals with overflow and underflow may not be the same as some C implementations.  For the special values of floating-point numbers, the library may behave differently from some C implementations, too.

 

 

Appendix:

 

Name: H5Tis_hard

Signature:

htri_t H5Tis_hard(hid_t src_id, hid_t dst_id)

Purpose:

Check whether the library’s default conversion is hard conversion.

Description:

H5Tis_hard finds out whether the library’s conversion function from type src_id to type dst_id is a hard conversion.  A hard conversion uses compiler’s casting; a soft conversion uses the library’s own conversion function.

Parameters:

hid_t src_id

IN: Identifier for the source datatype.

hid_t dst_id

IN: Identifier for the destination datatype.

Returns:

Returns TRUE for hard conversion, FALSE for soft conversion.

Fortran90 Interface:

None.