
Fast Output through Direct Chunk Writing

About the Project

The problem. DECTRIS, Ltd. builds X-ray pixel detectors for use at synchrotron light sources (specialized particle accelerators). These detectors can produce data at rates of tens of gigabytes per second. Before transferring the data over their network, the detectors compress the data by a factor of 10 or more.

Because their data rates are so high, these detectors tested the limits of HDF5 writing capabilities, which in turn constrained how fast future detectors could become. DECTRIS therefore worked with The HDF Group to find a way to save time when writing pre-compressed data to the file. As a result of the project, the HDF5 Library can write data as fast as the detectors produce it.

The solution: direct chunk writing. An HDF5 dataset may be broken into “chunks” when stored in an HDF5 file. When writing a chunk of data, several operations, data compression for example, can be performed on the data to prepare it for storage in an HDF5 file. These operations can slow down I/O significantly. A new function called H5DOwrite_chunk has been added that avoids these steps by writing data chunks directly to the file from memory. If an application can pre-process the data, then the application can write the data much faster.

In the case of the DECTRIS X-ray pixel detectors, the application compresses the data internally and requires no other services provided by the library. Thus H5DOwrite_chunk allows the application to write its data directly to an HDF5 file.
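The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DECTRIS implementation: it uses h5py's low-level `DatasetID.write_direct_chunk` call as the direct chunk write, compresses with Python's `zlib` module to match a gzip (deflate) filter on the dataset, and the helper names, the dataset name "frames", and the 4 x 4 chunk shape are all hypothetical.

```python
import zlib

import numpy as np
import h5py

CHUNK = (4, 4)  # hypothetical chunk shape, chosen only for illustration


def write_precompressed(dset, chunk_offset, block):
    """Compress one chunk in the application and hand the raw bytes to HDF5."""
    raw = np.ascontiguousarray(block, dtype=dset.dtype).tobytes()
    payload = zlib.compress(raw)
    # filter_mask=0 states that every filter in the dataset's pipeline
    # (here: deflate) has already been applied to this buffer.
    dset.id.write_direct_chunk(chunk_offset, payload, filter_mask=0)


def demo_precompressed_roundtrip():
    data = np.arange(64, dtype="int32").reshape(8, 8)
    # In-memory file (core driver, no backing store) so nothing touches disk.
    with h5py.File("direct_chunk_demo.h5", "w",
                   driver="core", backing_store=False) as f:
        dset = f.create_dataset("frames", shape=data.shape, dtype=data.dtype,
                                chunks=CHUNK, compression="gzip")
        # Offsets are element coordinates of each chunk's first element.
        for r in range(0, data.shape[0], CHUNK[0]):
            for c in range(0, data.shape[1], CHUNK[1]):
                block = data[r:r + CHUNK[0], c:c + CHUNK[1]]
                write_precompressed(dset, (r, c), block)
        # A normal read runs the filter pipeline, so the library decompresses.
        return data, dset[...]
```

Calling `demo_precompressed_roundtrip()` returns the original array and the array read back through the ordinary read path. The two should match, because the bytes written directly are exactly what the deflate filter expects to find on disk.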

Performance results. Normally, chunked data is written using the function H5Dwrite. The following table shows how much H5DOwrite_chunk can improve performance when used instead. Note that in each test the middle bar, which shows the H5DOwrite_chunk speed, indicates a significantly faster write than H5Dwrite achieves. Note also that the H5DOwrite_chunk speed compares favorably with the speed at which a Unix system writes a flat file to disk.

A word of caution. Since H5DOwrite_chunk writes data chunks directly in a file, care must be taken during its use. The function bypasses hyperslab selection, the conversion of data from one datatype to another, and the filter pipeline to write the chunk. Developers should have experience with these processes before they use H5DOwrite_chunk.
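One consequence of bypassing datatype conversion can be shown with a small sketch. It assumes h5py's `DatasetID.write_direct_chunk` as the direct chunk write; the dataset names and the function name are hypothetical. The same float32 values are written to two int32 datasets, once through the normal path and once directly.

```python
import numpy as np
import h5py


def demo_no_conversion():
    floats = np.array([1.0, 2.0, 3.0, 4.0], dtype="float32")
    # In-memory file (core driver, no backing store) so nothing touches disk.
    with h5py.File("no_conversion_demo.h5", "w",
                   driver="core", backing_store=False) as f:
        normal = f.create_dataset("normal", shape=(4,), dtype="int32",
                                  chunks=(4,))
        direct = f.create_dataset("direct", shape=(4,), dtype="int32",
                                  chunks=(4,))
        # Normal path: the library converts float32 values to int32.
        normal[...] = floats
        # Direct path: the 16 raw bytes are stored as they are, unconverted.
        direct.id.write_direct_chunk((0,), floats.tobytes(), filter_mask=0)
        return floats, normal[...], direct[...]
```

The normal write stores the integers 1 through 4. The direct write stores the float32 bit patterns, which a later read interprets as unrelated int32 values. The application is therefore responsible for supplying bytes that already match the dataset's datatype, chunk layout, and filters.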



Last modified: 12 October 2016