HDF5 Last Updated on 2025-12-13
The HDF5 Field Guide
HDF5 Virtual File Drivers



The HDF5 Virtual File Driver Interface

Introduction

The HDF5 Virtual File Driver (VFD) interface provides an abstraction layer for file I/O operations, enabling HDF5 to work with different file storage mechanisms. The VFD layer intercepts all low-level file access operations and forwards them to a specific driver implementation, allowing HDF5 files to be stored in various ways beyond simple POSIX files.

See also
File Drivers (H5FD) Reference Manual

Purpose and Benefits

The Virtual File Driver interface serves several important purposes:

  • Storage Flexibility: Enables HDF5 to work with different storage backends including local files, parallel file systems, memory, and cloud storage.
  • Performance Optimization: Allows selection of file drivers optimized for specific computing environments and I/O patterns.
  • Parallel I/O: Provides support for MPI-based parallel I/O through specialized drivers.
  • Custom Storage: Enables development of custom file drivers for specialized storage requirements.

Built-in File Drivers

HDF5 includes several standard Virtual File Drivers:

  • SEC2 Driver: The default POSIX I/O driver using standard system calls like read() and write(). Suitable for most serial applications on local file systems. Set with H5Pset_fapl_sec2.
  • STDIO Driver: Uses buffered I/O from the C standard library (fread/fwrite). May provide better performance for some applications. Set with H5Pset_fapl_stdio.
  • Core Driver: Stores the HDF5 file entirely in memory, with optional backing store to disk. Provides fastest I/O for temporary files or small datasets. Set with H5Pset_fapl_core.
  • Family Driver: Splits a logical HDF5 file across multiple physical files of equal size. Useful for circumventing file system limitations. Set with H5Pset_fapl_family.
  • Multi Driver: Stores different types of HDF5 data in separate files (metadata, raw data, etc.). Can optimize I/O by placing different data types on different storage devices. Set with H5Pset_fapl_multi.
  • Split Driver: A simplified version of the Multi driver that separates metadata and raw data into two files. Set with H5Pset_fapl_split.
  • Log Driver: Wraps another driver and logs all file access operations. Useful for debugging and I/O profiling. Set with H5Pset_fapl_log.
  • MPI-IO Driver: Enables parallel I/O using MPI-IO for HPC applications. Required for parallel HDF5 operations. Set with H5Pset_fapl_mpio.
  • Subfiling Driver: A parallel driver that splits the logical HDF5 file into multiple subfiles distributed across I/O concentrator nodes. Reduces contention and improves scalability for large-scale applications on parallel file systems. Set with H5Pset_fapl_subfiling.
  • Direct Driver: Uses direct I/O (O_DIRECT) to bypass OS caching. Can improve performance for large sequential I/O. Set with H5Pset_fapl_direct.
  • Onion Driver: Provides revision control for HDF5 files by storing file modifications as separate revisions. Enables tracking changes over time and accessing previous versions. Set with H5Pset_fapl_onion.
  • Splitter Driver: Writes file operations simultaneously to two different channels using different VFDs. Useful for creating redundant copies or logging I/O to separate locations. Set with H5Pset_fapl_splitter.
  • Mirror Driver: Mirrors all file operations to a remote server in real-time over a network connection. Enables remote backup and replication scenarios. Set with H5Pset_fapl_mirror.
  • ROS3 Driver: Read-only driver for accessing HDF5 files in S3-compatible object storage. Set with H5Pset_fapl_ros3.
  • HDFS Driver: Read-only driver for accessing HDF5 files in Hadoop Distributed File System. Set with H5Pset_fapl_hdfs.

Selecting a File Driver

File drivers are selected through the file access property list when opening or creating a file. The basic pattern is:

  • Create a file access property list with H5Pcreate
  • Set the desired file driver using the appropriate H5Pset_fapl_* function
  • Pass the property list to H5Fcreate or H5Fopen

Custom File Drivers

Applications can implement custom file drivers by:

  • Defining an H5FD_class_t structure with function pointers for all required operations
  • Implementing the driver callbacks (open, close, read, write, etc.)
  • Registering the driver with H5FDregister
  • Setting the driver in a file access property list with H5Pset_driver

Custom drivers enable specialized I/O strategies such as:

  • Integration with custom storage systems
  • Transparent encryption or compression at the I/O layer
  • Specialized caching strategies
  • Network-based storage protocols

Parallel File Drivers

For parallel HDF5 applications, the MPI-IO file driver is required (see A Brief Introduction to Parallel HDF5 for details on parallel HDF5 programming). This driver coordinates file access across multiple MPI processes, enabling collective I/O operations and preventing conflicts. Parallel applications must:

  • Build HDF5 with parallel support enabled
  • Use the MPI-IO file driver via H5Pset_fapl_mpio
  • Provide MPI communicator and info objects
  • Coordinate file access across processes

The Subfiling driver provides additional performance benefits for large-scale parallel applications on parallel file systems. It works by:

  • Distributing the HDF5 file across multiple subfiles
  • Designating I/O concentrator processes (typically one per node)
  • Striping data across subfiles to reduce contention
  • Enabling better parallel I/O scaling on Lustre, GPFS, and similar file systems

The Subfiling driver is particularly beneficial when running at scale on parallel file systems where a single shared file can become a bottleneck.

Performance Considerations

Choosing the right file driver can significantly impact I/O performance:

  • Local Files: SEC2 or STDIO drivers typically provide good performance
  • Temporary Data: Core driver provides fastest access by avoiding disk I/O
  • Large Files: Family driver can work around file size limitations
  • Parallel Applications: MPI-IO driver required for coordinated parallel access
  • Large-Scale Parallel: Subfiling driver can substantially improve performance on shared parallel file systems by reducing contention on a single shared file and enabling better striping
  • Network Storage: Consider the ROS3 driver for S3-compatible object storage or the HDFS driver for Hadoop Distributed File System; both are read-only
  • Redundancy: Splitter or Mirror drivers enable real-time backup and replication
  • Versioning: Onion driver enables tracking file revisions for provenance and rollback

Querying File Driver Information

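Applications can query the driver in use by retrieving an open file's access property list with H5Fget_access_plist and passing it to H5Pget_driver, which returns the driver's identifier.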

Summary

The HDF5 Virtual File Driver interface provides:

  • Abstraction of file I/O operations for flexibility and portability
  • Multiple built-in drivers for common storage scenarios
  • Support for parallel I/O via MPI-IO
  • Extensibility through custom driver implementation
  • Performance optimization opportunities through driver selection

Proper selection and configuration of file drivers is essential for optimal HDF5 performance in different computing environments.

