The HDF5 Virtual File Driver Interface
Introduction
The HDF5 Virtual File Driver (VFD) interface provides an abstraction layer for file I/O operations, enabling HDF5 to work with different file storage mechanisms. The VFD layer intercepts all low-level file access operations and forwards them to a specific driver implementation, allowing HDF5 files to be stored in various ways beyond simple POSIX files.
See also: File Drivers (H5FD) Reference Manual
Purpose and Benefits
The Virtual File Driver interface serves several important purposes:
- Storage Flexibility: Enables HDF5 to work with different storage backends including local files, parallel file systems, memory, and cloud storage.
- Performance Optimization: Allows selection of file drivers optimized for specific computing environments and I/O patterns.
- Parallel I/O: Provides support for MPI-based parallel I/O through specialized drivers.
- Custom Storage: Enables development of custom file drivers for specialized storage requirements.
Built-in File Drivers
HDF5 includes several standard Virtual File Drivers:
- SEC2 Driver: The default POSIX I/O driver using standard system calls like read() and write(). Suitable for most serial applications on local file systems. Set with H5Pset_fapl_sec2.
- STDIO Driver: Uses buffered I/O from the C standard library (fread/fwrite). May provide better performance for some applications. Set with H5Pset_fapl_stdio.
- Core Driver: Stores the HDF5 file entirely in memory, with an optional backing store on disk. Provides the fastest I/O for temporary files or small datasets. Set with H5Pset_fapl_core.
- Family Driver: Splits a logical HDF5 file across multiple physical files of equal size. Useful for circumventing file system limitations. Set with H5Pset_fapl_family.
- Multi Driver: Stores different types of HDF5 data in separate files (metadata, raw data, etc.). Can optimize I/O by placing different data types on different storage devices. Set with H5Pset_fapl_multi.
- Split Driver: A simplified version of the Multi driver that separates metadata and raw data into two files. Set with H5Pset_fapl_split.
- Log Driver: A variant of the SEC2 driver that additionally logs file access operations. Useful for debugging and I/O profiling. Set with H5Pset_fapl_log.
- MPI-IO Driver: Enables parallel I/O using MPI-IO for HPC applications. Required for parallel HDF5 operations. Set with H5Pset_fapl_mpio.
- Subfiling Driver: A parallel I/O driver that improves parallel I/O performance on parallel file systems by splitting the logical HDF5 file into multiple subfiles distributed across I/O concentrator nodes. Reduces contention and improves scalability for large-scale parallel applications. Set with H5Pset_fapl_subfiling.
- Direct Driver: Uses direct I/O (O_DIRECT) to bypass OS caching. Can improve performance for large sequential I/O. Set with H5Pset_fapl_direct.
- Onion Driver: Provides revision control for HDF5 files by storing file modifications as separate revisions. Enables tracking changes over time and accessing previous versions. Set with H5Pset_fapl_onion.
- Splitter Driver: Writes file operations simultaneously to two different channels using different VFDs. Useful for creating redundant copies or logging I/O to separate locations. Set with H5Pset_fapl_splitter.
- Mirror Driver: Mirrors all file operations to a remote server in real-time over a network connection. Enables remote backup and replication scenarios. Set with H5Pset_fapl_mirror.
- ROS3 Driver: Read-only driver for accessing HDF5 files in S3-compatible object storage. Set with H5Pset_fapl_ros3.
- HDFS Driver: Read-only driver for accessing HDF5 files in Hadoop Distributed File System. Set with H5Pset_fapl_hdfs.
Selecting a File Driver
File drivers are selected through the file access property list when opening or creating a file. The basic pattern is:
- Create a file access property list with H5Pcreate
- Set the desired file driver using the appropriate H5Pset_fapl_* function
- Pass the property list to H5Fcreate or H5Fopen
Custom File Drivers
Applications can implement custom file drivers by:
- Defining an H5FD_class_t structure with function pointers for all required operations
- Implementing the driver callbacks (open, close, read, write, etc.)
- Registering the driver with H5FDregister
- Setting the driver in a file access property list with H5Pset_driver
Custom drivers enable specialized I/O strategies such as:
- Integration with custom storage systems
- Transparent encryption or compression at the I/O layer
- Specialized caching strategies
- Network-based storage protocols
Parallel File Drivers
For parallel HDF5 applications, the MPI-IO file driver is required (see A Brief Introduction to Parallel HDF5 for details on parallel HDF5 programming). This driver coordinates file access across multiple MPI processes, enabling collective I/O operations and preventing conflicts. Parallel applications must:
- Build HDF5 with parallel support enabled
- Use the MPI-IO file driver via H5Pset_fapl_mpio
- Provide MPI communicator and info objects
- Coordinate file access across processes
The Subfiling driver provides additional performance benefits for large-scale parallel applications on parallel file systems. It works by:
- Distributing the HDF5 file across multiple subfiles
- Designating I/O concentrator processes (typically one per node)
- Striping data across subfiles to reduce contention
- Enabling better parallel I/O scaling on Lustre, GPFS, and similar file systems
The Subfiling driver is particularly beneficial when running at scale on parallel file systems where a single shared file can become a bottleneck.
Performance Considerations
Choosing the right file driver can significantly impact I/O performance:
- Local Files: SEC2 or STDIO drivers typically provide good performance
- Temporary Data: Core driver provides fastest access by avoiding disk I/O
- Large Files: Family driver can work around file size limitations
- Parallel Applications: MPI-IO driver required for coordinated parallel access
- Large-Scale Parallel: Subfiling driver can dramatically improve performance on shared parallel file systems by reducing metadata contention and enabling better striping
- Network Storage: Consider the read-only ROS3 driver for S3-compatible object storage or the read-only HDFS driver for the Hadoop Distributed File System
- Redundancy: Splitter or Mirror drivers enable real-time backup and replication
- Versioning: Onion driver enables tracking file revisions for provenance and rollback
Querying File Driver Information
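Applications can query the driver used by an open file by retrieving the file's access property list with H5Fget_access_plist and calling H5Pget_driver on it, which returns the identifier of the driver.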
Summary
The HDF5 Virtual File Driver interface provides:
- Abstraction of file I/O operations for flexibility and portability
- Multiple built-in drivers for common storage scenarios
- Support for parallel I/O via MPI-IO
- Extensibility through custom driver implementation
- Performance optimization opportunities through driver selection
Proper selection and configuration of file drivers is essential for optimal HDF5 performance in different computing environments.