Reserved File Address Space

Note: this tech note is no longer accurate for HDF5 1.8. While users can still adjust the size of HDF5 addresses, HDF5 no longer uses lazy allocations due to the consequences of doing so in parallel when using the new metadata cache. -JML 2006-09-28

HDF5 files use 8-byte addresses by default, but users can change this to 2, 4, or even 16 bytes. This means that it is possible to have files that only address 64 KB of space, and thus that HDF must handle the case of files that have enough space on disk but not enough internal address space to be written.

Thus, every time space is allocated in a file, HDF needs to check that this allocation is within the file’s address space. If not, HDF should output an error and ensure that all the data currently in the file (everything that is still addressable) is successfully written to disk.

Unfortunately, some structures are stored in memory and do not allocate space for themselves until the file is actually flushed to disk (object headers and the local heap). This is good for efficiency, since these structures can grow without creating the fragmentation that would result from frequent allocation and deallocation, but means that if the library runs out of addressable space while allocating memory, these structures will not be present in the file. Without them, HDF5 does not know how to parse the data in the file, rendering it unreadable.

Thus, HDF keeps track of the space “reserved for allocation” in the file (H5FD_t struct). When a function tries to allocate space in the file, it first checks that the allocation would not overflow the address space, taking the reserved space into account. When object headers or the heap finally allocate the space they have reserved, they free the reserved space before allocating file space.

A given object header is only flushed to disk once, but the heap can be flushed to disk multiple times over the life of the file and will require contiguous space every time. To handle this, the heap keeps track of how much space it has reserved. This allows it to reserve space only when it grows (when it is dirty and needs to be re-written to disk).

For instance, if the heap is flushed to disk, it frees its reserved space. If new data is inserted into the heap in memory, the heap may need to flush to disk again in a new, larger section of memory. Thus, not only does it reserve space in the file for this new data, but also for all of the previously-existing data in the heap to be re-written. The next insert, however, will only need to reserve space for its new data, since the rest of the heap already has space reserved for it.

Potential issues:

  1. This system does not take into accout deleted space. Deleted space can still be allocated as usual, but "reserved" space is always taken off the end of a file. This means that a file that was filled to capacity but then had a significant number of objects deleted will still throw errors if more data is written. This occurs because the file's free space is in the middle of the file, not at the end. A more complete space-reservation system would test if the reserved data can fit into the file's free list, but this would be significantly more complicated to implement.
  2. HDF5 currently claims to support 16-byte addresses, but a number of platforms do not support 16-byte integers, so addresses of this size cannot be represented in memory. This solution does not attempt to address this issue.


Last modifoed: 28 September 2006