Revising Numeric Overflows in HDF5

Quincey Koziol
koziol@ncsa.uiuc.edu
April 23, 2004

  1. Document's Audience:

  2. Background Reading:

    The HDF5 reference manual sections for H5Pset_type_conv_cb and H5Pget_type_conv_cb:
    H5Pset_type_conv_cb
    H5Pget_type_conv_cb
    (H5Tset_overflow and H5Tget_overflow have been removed from the HDF5 Library. This note inserted: January 2010.)
    This section of the HDF5 User’s Guide briefly mentions conversion overflows:
    Datatype Conversion and Selection
    The netCDF user guide section on type conversion:
    Type Conversion
    The netCDF user guide section on fill values:
    Set Fill Mode for Writes: nc_set_fill
  3. Introduction:

    What is this document about?
    This document describes limitations of the current HDF5 datatype conversion overflow behavior, lists new requirements for that behavior, and describes a solution that meets those requirements.

    How does the overflow callback currently operate?

    When the HDF5 library converts a value from one datatype to another, it currently calls the "overflow" callback whenever a source value is outside the range of values representable by the destination datatype. If the overflow callback is not set, or is set but indicates that it has not handled the overflow (by returning a negative value when called), the destination value is set to the maximum or minimum destination value, depending on whether the source value was above the maximum or below the minimum destination value, respectively. The development branch of the HDF5 library (v1.7.x) extends this behavior to its support for int <-> float type conversions.

    For example, when converting from a signed 16-bit integer to an unsigned 16-bit integer, negative source values would trigger a call to the overflow callback, and if the callback didn't exist or didn't handle the overflow, the minimum unsigned 16-bit integer (0) would be used as the destination value. Likewise, when converting from a signed 32-bit integer to a signed 16-bit integer, source values greater than 32767 or less than -32768 would trigger a call to the overflow callback and would be set to 32767 or -32768 (respectively) if the overflow callback didn't exist or didn't handle the exception.
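
    The default clamping in the two examples above can be sketched in plain C. This is only an illustration of the described behavior, not the library's conversion code; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a signed 32-bit source value into the signed
 * 16-bit range, as described above for the case where no overflow callback
 * handles the value. */
static int16_t clamp_i32_to_i16(int32_t src)
{
    if (src > INT16_MAX) return INT16_MAX;  /* above range: use the maximum */
    if (src < INT16_MIN) return INT16_MIN;  /* below range: use the minimum */
    return (int16_t)src;                    /* in range: value is preserved */
}

/* Hypothetical helper: clamp a signed 16-bit source value into the unsigned
 * 16-bit range. Only negative values are out of range here. */
static uint16_t clamp_i16_to_u16(int16_t src)
{
    if (src < 0) return 0;                  /* negative: use the minimum (0) */
    return (uint16_t)src;
}
```

    With these helpers, 40000 clamps to 32767, -40000 clamps to -32768, and a negative 16-bit value becomes 0 in the unsigned destination.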


    What is the prototype for the current conversion callback?

    Here's the prototype for the overflow callback routine:

    herr_t (*H5T_overflow_t)(hid_t src_id, hid_t dst_id, void *src_buf, void *dst_buf);
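
    As a rough sketch of how a callback of this shape interacts with the default behavior, the following self-contained C code models the dispatch for a 32-bit to 16-bit conversion. It does not use the HDF5 headers: demo_herr_t, demo_hid_t and demo_overflow_t are stand-ins for the library's types, the datatype identifiers are passed as placeholder zeros, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the library's types (assumptions, not the HDF5 definitions). */
typedef int  demo_herr_t;
typedef long demo_hid_t;
typedef demo_herr_t (*demo_overflow_t)(demo_hid_t src_id, demo_hid_t dst_id,
                                       void *src_buf, void *dst_buf);

/* Example callback: handles every overflow by writing 0 to the destination. */
static demo_herr_t zero_on_overflow(demo_hid_t src_id, demo_hid_t dst_id,
                                    void *src_buf, void *dst_buf)
{
    (void)src_id; (void)dst_id; (void)src_buf;
    *(int16_t *)dst_buf = 0;
    return 0;               /* non-negative: overflow was handled */
}

/* Example callback: declines to handle the overflow. */
static demo_herr_t decline_overflow(demo_hid_t src_id, demo_hid_t dst_id,
                                    void *src_buf, void *dst_buf)
{
    (void)src_id; (void)dst_id; (void)src_buf; (void)dst_buf;
    return -1;              /* negative: not handled, library should clamp */
}

/* Convert one value, calling the callback only for out-of-range sources. */
static int16_t convert_i32_to_i16(int32_t src, demo_overflow_t cb)
{
    int16_t dst = 0;

    if (src >= INT16_MIN && src <= INT16_MAX)
        return (int16_t)src;                 /* no overflow, no callback */

    if (cb != NULL && cb(0, 0, &src, &dst) >= 0)
        return dst;                          /* callback supplied the value */

    /* No callback, or callback declined: clamp to the nearest limit. */
    return src > INT16_MAX ? INT16_MAX : INT16_MIN;
}
```

    Here convert_i32_to_i16(70000, NULL) yields 32767, while the same call with zero_on_overflow yields 0 because the callback reports that it handled the value.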


  4. New Requirements:

    There are several new requirements listed here, in order of decreasing importance:

  5. Suggested Changes:

    To accommodate the requirements above, the prototype of the "overflow" callback must change, the callback must be attached to a data transfer property list, and the library's handling of the callback's return values must change. I would also suggest renaming it from an "overflow" callback to the more generic "type conversion exception callback".

    After accommodating the changes above, the prototype would look like this:

    H5T_conv_ret_t (*H5T_conv_except_func_t)(H5T_conv_except_t except_type,
            hid_t src_id, hid_t dst_id, void *src_buf, void *dst_buf, void *user_data);
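
    To make the shape of this prototype more concrete, here is a minimal, self-contained sketch of how a conversion routine might dispatch to such a callback. It is not the library's implementation. The demo_* types are stand-ins, and the enumerator names and the three outcomes (abort, unhandled, handled) are assumptions modelled on the values used by the released HDF5 library's H5T_conv_ret_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in enumerations (assumed names, not the HDF5 definitions). */
typedef enum { DEMO_EXCEPT_RANGE_HI, DEMO_EXCEPT_RANGE_LOW } demo_except_t;
typedef enum { DEMO_CONV_ABORT = -1, DEMO_CONV_UNHANDLED = 0,
               DEMO_CONV_HANDLED = 1 } demo_ret_t;
typedef long demo_id_t;   /* stand-in for a datatype identifier */

typedef demo_ret_t (*demo_except_func_t)(demo_except_t except_type,
        demo_id_t src_id, demo_id_t dst_id,
        void *src_buf, void *dst_buf, void *user_data);

/* Example callback: counts exceptions through user_data, aborts the
 * conversion on underflow, and leaves overflow to the default handling. */
static demo_ret_t count_and_abort_low(demo_except_t except_type,
        demo_id_t src_id, demo_id_t dst_id,
        void *src_buf, void *dst_buf, void *user_data)
{
    (void)src_id; (void)dst_id; (void)src_buf; (void)dst_buf;
    if (user_data != NULL)
        ++*(int *)user_data;
    return except_type == DEMO_EXCEPT_RANGE_LOW ? DEMO_CONV_ABORT
                                                : DEMO_CONV_UNHANDLED;
}

/* Convert one value; returns 0 on success, -1 if the callback aborted. */
static int convert_i32_to_i16_ex(int32_t src, int16_t *dst,
                                 demo_except_func_t cb, void *user_data)
{
    demo_except_t kind;

    if (src > INT16_MAX)      kind = DEMO_EXCEPT_RANGE_HI;
    else if (src < INT16_MIN) kind = DEMO_EXCEPT_RANGE_LOW;
    else { *dst = (int16_t)src; return 0; }   /* no exception */

    if (cb != NULL) {
        demo_ret_t ret = cb(kind, 0, 0, &src, dst, user_data);
        if (ret == DEMO_CONV_ABORT)   return -1;  /* stop the conversion */
        if (ret == DEMO_CONV_HANDLED) return 0;   /* callback filled *dst */
    }

    /* Unhandled: fall back to clamping. */
    *dst = (kind == DEMO_EXCEPT_RANGE_HI) ? INT16_MAX : INT16_MIN;
    return 0;
}
```

    The user_data pointer lets the callback carry state (here a counter) without global variables, and the exception type lets one callback treat overflow and underflow differently.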
        

    The parameters are as follows:

    The H5T_conv_except_t enumerated type has the following values:

    The return value from conversion callbacks is H5T_conv_ret_t, a new enumerated type with the following values:

    Additionally, the H5Tset_overflow and H5Tget_overflow routines would be retired in favor of the following routines:




  6. Compatibility With Version 1.6:

    As we switch the API from H5Tset(get)_overflow to H5Pset(get)_type_conv_cb, we may encounter backward compatibility issues. We have a few options here:

    Option 1
    Simply discard the old API functions (H5Tset(get)_overflow). This is an easy solution, provided that few major users call these functions in their programs. Programs that fail as a result have to migrate to the new design with the H5Pset(get)_type_conv_cb functions.

    Option 2
    Keep the old API functions for some time during the transition. User programs do not have to adopt the new functions. If someone sets both the old and the new callback functions, the new one takes precedence.


    Option 3
    Keep the names of the old API functions but return an error when they are called, with a message reminding users that new API functions are available.

    This is the same issue we face whenever we change API functions. We should handle it according to our API change policy, if we have one in hand.
  7. Discussion:

    With the above changes to the library, the datatype conversion code will be able to fulfill all the new requirements listed. Supporting the new type conversion exception callback and detecting the new exception types (loss of precision and truncation) will require some additional coding, but not a huge amount. Adding support for detecting the new exception types will make the int <-> float conversions slightly slower and more complex though.
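
    As an illustration of what detecting the two new exception types might involve for int <-> float conversions, here is a small plain-C sketch. The function names are hypothetical and the checks are one possible approach, not the library's detection code. Truncation is taken to mean that a float-to-integer conversion drops a nonzero fraction, and loss of precision to mean that an integer cannot be represented exactly in the floating-point destination.

```c
#include <stdint.h>

/* Truncation check for double -> int32. Range exceptions are assumed to be
 * detected separately, so out-of-range values and NaN report 0 here. */
static int f64_to_i32_truncates(double src)
{
    if (!(src >= -2147483648.0 && src <= 2147483647.0))
        return 0;
    /* The cast rounds toward zero; a difference means a fraction was lost. */
    return (double)(int32_t)src != src;
}

/* Precision check for int32 -> float. A float has a 24-bit significand, so
 * some integers with larger magnitude cannot be represented exactly. */
static int i32_to_f32_loses_precision(int32_t src)
{
    float converted = (float)src;
    /* double represents every int32 exactly, so compare in double. */
    return (double)converted != (double)src;
}
```

    For instance, 3.75 truncates when converted to an integer, and 16777217 (2^24 + 1) loses precision when converted to a 32-bit float, while 16777216 does not. The extra comparison per element is the kind of added cost mentioned above.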

    If a fill value is needed when a conversion exception happens, it can be passed in through the user data (a void pointer) of the callback, or it can be made a global variable. Finding the fill value is the user's responsibility; it can be retrieved from the dataset creation property list (for example with H5Pget_fill_value).
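
    A minimal sketch of the user-data approach follows. It is self-contained and does not include the HDF5 headers: the callback mirrors the argument order of the proposed prototype, but demo_ret_t, fill_ctx_t and fill_on_exception are hypothetical names, the return values are assumed, and the destination is assumed to be a signed 16-bit integer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in return type (assumed values, not the HDF5 definition). */
typedef enum { DEMO_CONV_ABORT = -1, DEMO_CONV_UNHANDLED = 0,
               DEMO_CONV_HANDLED = 1 } demo_ret_t;

/* Hypothetical context handed to the callback through user_data. */
typedef struct {
    int16_t       fill_value;   /* value to store on any exception */
    unsigned long n_replaced;   /* how many elements were replaced */
} fill_ctx_t;

/* On any conversion exception, write the fill value to the destination. */
static demo_ret_t fill_on_exception(int except_type, long src_id, long dst_id,
                                    void *src_buf, void *dst_buf,
                                    void *user_data)
{
    fill_ctx_t *ctx = (fill_ctx_t *)user_data;

    (void)except_type; (void)src_id; (void)dst_id; (void)src_buf;
    if (ctx == NULL)
        return DEMO_CONV_UNHANDLED;   /* no fill value: use the default */

    /* memcpy avoids alignment assumptions about the destination buffer. */
    memcpy(dst_buf, &ctx->fill_value, sizeof ctx->fill_value);
    ctx->n_replaced++;
    return DEMO_CONV_HANDLED;
}
```

    The application would fill in fill_ctx_t once before the transfer and pass its address as the user data, which keeps the fill value out of global state.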


Last modified: 20 January 2012. Links updated and note added that H5Tset_overflow and H5Tget_overflow have been removed from HDF5.