Allowing Users to Access HDF5’s ID System
Nat Furrer and James Laird
HDF5 has an ID interface to store data that is referenced by a single ID number, essentially providing the functionality of a struct in C. This is implemented in H5I.c, H5Ipublic.h, H5Iprivate.h, and H5Ipkg.h.
These IDs are organized by “group”—that is, by the type of object they point to (Datatype, File, Attribute, etc.). While the IDs can be used outside the library, they can only be created inside the library; valid ID groups are enumerated in H5Ipublic.h and cannot be altered without actually recompiling the library. The majority of functions that access IDs are also private; users can get the type of an ID, but cannot access any of the information stored in it.
Currently, each ID is 32 bits long, with 5 bits devoted to the “group” and 26 bits devoted to the object ID within that group; the remaining bit is used only to mark invalid IDs. The strength of this system is that finding an ID’s group is extremely fast, since the group is stored as part of the ID itself.
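The layout described above can be illustrated with a minimal C sketch. The macro and function names are hypothetical, not HDF5 identifiers, and placing the invalid-ID marker in the top bit is an assumption; only the bit widths come from the text.

```c
#include <stdint.h>

/* Hypothetical names. Widths from the text: 5 group bits, 26 ID bits.
 * Assumption: the remaining (top) bit is the one that marks invalid IDs. */
#define SKETCH32_GROUP_BITS 5
#define SKETCH32_ID_BITS    26
#define SKETCH32_GROUP_MASK ((1u << SKETCH32_GROUP_BITS) - 1u)
#define SKETCH32_ID_MASK    ((1u << SKETCH32_ID_BITS) - 1u)

/* Pack a group number and a per-group serial number into one 32-bit ID. */
static uint32_t sketch32_make_id(uint32_t group, uint32_t serial)
{
    return ((group & SKETCH32_GROUP_MASK) << SKETCH32_ID_BITS) |
           (serial & SKETCH32_ID_MASK);
}

/* Recover the group with one shift and one mask; no table lookup is needed. */
static uint32_t sketch32_group_of(uint32_t id)
{
    return (id >> SKETCH32_ID_BITS) & SKETCH32_GROUP_MASK;
}

/* Recover the object's serial number within its group. */
static uint32_t sketch32_serial_of(uint32_t id)
{
    return id & SKETCH32_ID_MASK;
}
```

Because the group is recovered with a shift and a mask, the cost of finding an ID’s group does not depend on how many IDs exist.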
Since group numbers are enumerated in the source code, they are always equal to the same value and can be referenced as constants. Each group has a hash table containing the data to which the IDs point.
It would be useful to extend this same ID system to store information about user-defined objects without sacrificing the current system’s speed (since IDs are accessed very frequently) and without having to change the functions that are currently used throughout the HDF5 library. This would allow high-level APIs to offer the functionality of structs while remaining compatible with FORTRAN.
We can accomplish this by registering group numbers at run-time instead of hardcoding them. If we also expand the current 32-bit IDs to 64-bit IDs, we will have room to store a much larger number of groups, which will be necessary to accommodate additional groups registered by users at run-time.
Several functions must be added so that users outside the library can register new groups and use IDs in those groups; most of these are simply public wrappers for existing private functions. The H5I_register_group function must now keep track of the number of groups allocated and, if the maximum number of groups has already been allocated, search for available group numbers (those of groups that were allocated and later deleted). Some minor changes must also be made to the existing H5I code to accommodate a growing number of valid groups, including storing the number of allocated groups in a static variable.
The library-wide constants that refer to group numbers (H5I_DATATYPE, H5I_FILE, etc.) must become variables, since we no longer know ahead of time what numbers they will be assigned. A new function, H5I_register_group, must be called when they are initialized instead of H5I_init_group. The only other significant change is that group numbers can no longer be used as case labels in switch statements, since they are no longer constants; a number of existing switch statements must therefore be changed to if-else statements.
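The switch-to-if-else change can be sketched as follows. All names here are hypothetical stand-ins (the real variables would be H5I_FILE, H5I_DATATYPE, and so on), and the counter merely imitates run-time registration.

```c
/* Hypothetical stand-ins for group numbers that are assigned at run time. */
static int sketch_file_group     = -1;
static int sketch_datatype_group = -1;
static int sketch_groups_handed_out = 0;  /* imitates run-time registration */

/* Assign group numbers at initialization instead of hardcoding them. */
static void sketch_init_groups(void)
{
    sketch_file_group     = sketch_groups_handed_out++;
    sketch_datatype_group = sketch_groups_handed_out++;
}

/* In C, case labels must be integer constant expressions, so variables
 * such as sketch_file_group cannot appear in a switch. An if-else chain
 * performs the same dispatch. */
static const char *sketch_group_name(int group)
{
    if (group == sketch_file_group)
        return "file";
    else if (group == sketch_datatype_group)
        return "datatype";
    else
        return "unknown";
}
```

The dispatch logic is unchanged; only its form differs, which is why this part of the conversion is mostly mechanical.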
As proposed, these new 64-bit IDs have 10 bits for group numbers and 53 bits for IDs within groups, so we do not anticipate running out of either groups or IDs in the foreseeable future. Allowing ample room for expansion seems more important to us than the memory that would be saved by keeping hid_t at 32 bits, which would require either limiting the number of user-defined groups to 16 or reducing the number of available IDs.
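To make the proposed capacity concrete, here is a minimal sketch of the 64-bit layout. The names are hypothetical, and leaving the top bit clear (so that IDs stay non-negative in a signed 64-bit type) is an assumption; the 10-bit and 53-bit widths come from the text.

```c
#include <stdint.h>

/* Hypothetical names. Widths from the text: 10 group bits, 53 ID bits.
 * Assumption: the remaining top bit is left clear for valid IDs. */
#define SKETCH64_GROUP_BITS 10
#define SKETCH64_ID_BITS    53
#define SKETCH64_MAX_GROUPS ((uint64_t)1 << SKETCH64_GROUP_BITS) /* 1024 groups */
#define SKETCH64_GROUP_MASK (SKETCH64_MAX_GROUPS - 1)
#define SKETCH64_ID_MASK    (((uint64_t)1 << SKETCH64_ID_BITS) - 1)

/* Pack a group number and a per-group serial number into one 64-bit ID. */
static uint64_t sketch64_make_id(uint64_t group, uint64_t serial)
{
    return ((group & SKETCH64_GROUP_MASK) << SKETCH64_ID_BITS) |
           (serial & SKETCH64_ID_MASK);
}

/* The group is still recovered with one shift and one mask. */
static uint64_t sketch64_group_of(uint64_t id)
{
    return (id >> SKETCH64_ID_BITS) & SKETCH64_GROUP_MASK;
}

/* Recover the object's serial number within its group. */
static uint64_t sketch64_serial_of(uint64_t id)
{
    return id & SKETCH64_ID_MASK;
}
```

Ten group bits allow 1024 group numbers, compared with 32 for the 5-bit field, and the group lookup remains a constant-time operation.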
Since a number of formerly-private functions now have public wrappers, it is possible for users to extract information from internal library structures or even destroy groups of IDs that the library is still using. As long as users use only group numbers they register themselves, this should not be a problem.
Creating new groups does slow down considerably once the maximum number of groups has been allocated, because the system must then search for the numbers of groups that have been deleted. The current implementation of IDs has the same issue, and increasing the number of available groups and IDs (by increasing the size of an hid_t, as we have done) is the simplest solution.
The only other issues we foresee are those resulting from the change from a 32-bit to a 64-bit hid_t, primarily the additional memory use. We believe this change will eventually become unavoidable as users take advantage of this functionality to create their own groups and IDs.
Functions added to H5I.c:
H5Iregister_group (public wrapper for H5I_register_group):
This function finds the next free number for an ID “group,” then creates that group with a call to H5I_init_group, passing along its arguments. If fewer than the maximum number of groups have been allocated, it simply allocates the next group number. Otherwise, it searches through the valid group numbers to find a free group ID. It fails if there are no available group numbers. On success, it returns the number of the newly-created group.
H5I_type_t H5Iregister_group(size_t hash_size, unsigned reserved, H5I_free_t free_func);
H5I_type_t H5I_register_group(size_t hash_size, unsigned reserved, H5I_free_t free_func);
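The allocation policy described for H5I_register_group might look roughly like the sketch below. It is not the library's code: the names are hypothetical, the limit is kept artificially small, and the call to H5I_init_group that would create the group is omitted.

```c
#include <stdbool.h>

/* Hypothetical, deliberately small limit so the sketch is easy to exercise. */
#define SKETCH_MAX_GROUPS 8

static bool sketch_in_use[SKETCH_MAX_GROUPS];
static int  sketch_next_group = 0; /* how many group numbers were ever handed out */

/* Returns a free group number, or -1 if every number is in use. */
static int sketch_register_group(void)
{
    int g;

    /* Fast path: numbers that have never been used are still available. */
    if (sketch_next_group < SKETCH_MAX_GROUPS) {
        g = sketch_next_group++;
        sketch_in_use[g] = true;
        return g;
    }

    /* Slow path: search for the number of a group that was destroyed. */
    for (g = 0; g < SKETCH_MAX_GROUPS; g++) {
        if (!sketch_in_use[g]) {
            sketch_in_use[g] = true;
            return g;
        }
    }
    return -1; /* no group numbers available */
}

/* Frees a group number so that a later registration can reuse it.
 * Returns 0 on success, -1 if the number is not currently in use. */
static int sketch_destroy_group(int g)
{
    if (g < 0 || g >= sketch_next_group || !sketch_in_use[g])
        return -1;
    sketch_in_use[g] = false;
    return 0;
}
```

The linear search runs only after every number has been handed out once, which is the slowdown discussed earlier; a larger group field makes that path correspondingly rarer.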
Other wrapper functions:
All of these functions call the corresponding private functions, descriptions of which can be found in H5I.c. The only private function that is not given a public wrapper is H5I_object, which provides the same functionality as H5Iobject_verify without verifying the ID’s group (the verification provides an extra level of error-checking for the system’s use). H5Idestroy_group frees a group’s number so that it can be used again by H5I_register_group.
hid_t H5Iregister(H5I_type_t grp, void *object);
void *H5Iremove(hid_t id);
void *H5Isearch(H5I_type_t grp, H5I_search_func_t func, void *key);
void *H5Iobject_verify(hid_t id, H5I_type_t id_type);
int H5Inmembers(H5I_type_t grp);
herr_t H5Iclear_group(H5I_type_t grp, hbool_t force);
herr_t H5Idestroy_group(H5I_type_t grp);
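The intended register, verify, and remove cycle can be illustrated with a toy stand-in. Nothing below is HDF5 code: the toy_ names, the fixed-size flat table (in place of per-group hash tables), and the serial counter are all assumptions made so that the sketch is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ID_BITS    53   /* width taken from the proposal */
#define TOY_MAX_GROUPS 1024 /* 10 group bits */
#define TOY_SLOTS      16   /* hypothetical table size */

typedef int64_t toy_hid_t;

static void     *toy_objects[TOY_SLOTS]; /* NULL marks a free slot */
static toy_hid_t toy_ids[TOY_SLOTS];
static int64_t   toy_next_serial = 1;

/* Stand-in for H5Iregister: store an object and return a new ID, or -1. */
static toy_hid_t toy_register(int grp, void *object)
{
    if (grp < 0 || grp >= TOY_MAX_GROUPS || object == NULL)
        return -1;
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (toy_objects[i] == NULL) {
            toy_objects[i] = object;
            toy_ids[i] = ((toy_hid_t)grp << TOY_ID_BITS) | toy_next_serial++;
            return toy_ids[i];
        }
    }
    return -1; /* table full */
}

/* Stand-in for H5Iobject_verify: return the object only if the ID
 * really belongs to the group the caller expects. */
static void *toy_object_verify(toy_hid_t id, int grp)
{
    if (id < 0 || (int)(id >> TOY_ID_BITS) != grp)
        return NULL;
    for (int i = 0; i < TOY_SLOTS; i++)
        if (toy_objects[i] != NULL && toy_ids[i] == id)
            return toy_objects[i];
    return NULL;
}

/* Stand-in for H5Iremove: forget the ID and hand the object back. */
static void *toy_remove(toy_hid_t id)
{
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (toy_objects[i] != NULL && toy_ids[i] == id) {
            void *object = toy_objects[i];
            toy_objects[i] = NULL;
            return object;
        }
    }
    return NULL;
}
```

A caller that passes the wrong group to the verifying lookup gets NULL back rather than a pointer to an object of an unexpected type, which is the kind of error checking that the group verification is meant to provide.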