The purpose of this chapter is to describe how to work with HDF5 data files.
If HDF5 data is to be written to or read from a file, the file must first be explicitly created or opened with the appropriate file driver and access privileges. Once all work with the file is complete, the file must be explicitly closed.
This chapter discusses the following:
This chapter assumes an understanding of the material presented in the data model chapter, “HDF5 Data Model and File Structure.”
There are two issues regarding file access:
Four access modes address these concerns. Two of these modes can be used with H5Fcreate, and two modes can be used with H5Fopen.
H5Fcreate accepts H5F_ACC_EXCL or H5F_ACC_TRUNC.
H5Fopen accepts H5F_ACC_RDONLY or H5F_ACC_RDWR.
The access modes are described in the table below.
Table 1. Access flags and modes
Access Flag | Resulting Access Mode |
H5F_ACC_EXCL | If the file already exists, H5Fcreate fails. If the file does not exist, it is created and opened with read-write access. (Default) |
H5F_ACC_TRUNC | If the file already exists, the file is opened with read-write access, and new data will overwrite any existing data. If the file does not exist, it is created and opened with read-write access. |
H5F_ACC_RDONLY | An existing file is opened with read-only access. If the file does not exist, H5Fopen fails. (Default) |
H5F_ACC_RDWR | An existing file is opened with read-write access. If the file does not exist, H5Fopen fails. |
By default, H5Fopen opens a file for read-only access; passing H5F_ACC_RDWR allows read-write access to the file.
By default, H5Fcreate fails if the file already exists; only passing H5F_ACC_TRUNC allows the truncating of an existing file.
File creation and file access property lists control the more complex aspects of creating and accessing files.
File creation property lists control the characteristics of a file such as the size of the user-block, a user-definable data block; the size of data address parameters; properties of the B-trees that are used to manage the data in the file; and certain HDF5 library versioning information.
See the “File Creation Properties” section below for a more detailed discussion of file creation properties and appropriate references to the HDF5 Reference Manual. If you have no special requirements for these file characteristics, you can simply specify H5P_DEFAULT for the default file creation property list when a file creation property list is called for.
File access property lists control properties and means of accessing a file such as data alignment characteristics, metadata block and cache sizes, data sieve buffer size, garbage collection settings, and parallel I/O. Data alignment, metadata block and cache sizes, and data sieve buffer size are factors in improving I/O performance.
See the “File Access Properties” section below for a more detailed discussion of file access properties and appropriate references to the HDF5 Reference Manual. If you have no special requirements for these file access characteristics, you can simply specify H5P_DEFAULT for the default file access property list when a file access property list is called for.
Figure 1. UML model for an HDF5 file and its property lists
The concept of an HDF5 file is actually rather abstract: the address space for what is normally thought of as an HDF5 file might correspond to any of the following at the storage level:
This HDF5 address space is generally referred to as an HDF5 file regardless of its organization at the storage level.
HDF5 accesses a file (the address space) through various types of low-level file drivers. The default HDF5 file storage layout is as an unbuffered permanent file which is a single, contiguous file on local disk. Alternative layouts are designed to suit the needs of a variety of systems, environments, and applications.
Programming models for creating, opening, and closing HDF5 files are described in the sub-sections below.
The programming model for creating a new HDF5 file can be summarized as follows:
First, consider the simple case where we use the default values for the property lists. See the example below.
file_id = H5Fcreate ("SampleFile.h5", H5F_ACC_EXCL, H5P_DEFAULT, H5P_DEFAULT);
Example 1. Creating an HDF5 file using property list defaults
Note that this example specifies that H5Fcreate should fail if SampleFile.h5 already exists.
A more complex case is shown in the example below. In this example, we define file creation and access property lists (though we do not assign any properties), specify that H5Fcreate should fail if SampleFile.h5 already exists, and create a new file named SampleFile.h5. The example does not specify a driver, so the default driver, H5FD_SEC2, will be used.
fcplist_id = H5Pcreate (H5P_FILE_CREATE);
<...set desired file creation properties...>
faplist_id = H5Pcreate (H5P_FILE_ACCESS);
<...set desired file access properties...>
file_id = H5Fcreate ("SampleFile.h5", H5F_ACC_EXCL, fcplist_id, faplist_id);
Example 2. Creating an HDF5 file using property lists
Notes:
A root group is automatically created in a file when the file is first created.
File property lists, once defined, can be reused when another file is created within the same application.
The programming model for opening an existing HDF5 file can be summarized as follows:
The code in the example below shows how to open an existing file with read-only access.
faplist_id = H5Pcreate (H5P_FILE_ACCESS);
status = H5Pset_fapl_stdio (faplist_id);
file_id = H5Fopen ("SampleFile.h5", H5F_ACC_RDONLY, faplist_id);
Example 3. Opening an HDF5 file
The programming model for closing an HDF5 file is very simple:
We close SampleFile.h5 with the code in the example below.
status = H5Fclose (file_id);
Example 4. Closing an HDF5 file
Note that H5Fclose flushes all unwritten data to storage and that file_id is the identifier returned for SampleFile.h5 by H5Fopen.
More comprehensive discussions regarding all of these steps are provided below.
Using h5dump to View a File
h5dump is a command-line utility that is included in the HDF5 distribution. This program provides a straightforward means of inspecting the contents of an HDF5 file. You can use h5dump to verify that a program is generating the intended HDF5 file. h5dump displays ASCII output formatted according to the HDF5 DDL grammar.
The following h5dump command will display the contents of SampleFile.h5:
h5dump SampleFile.h5
If no datasets or groups have been created in and no data has been written to the file, the output will look something like the following:
HDF5 "SampleFile.h5" {
GROUP "/" {
}
}
Note that the root group, indicated above by /, was automatically created when the file was created.
h5dump is fully described on the Tools page of the HDF5 Reference Manual. The HDF5 DDL grammar is fully described in the document DDL in BNF for HDF5, an element of this HDF5 User’s Guide.
General library functions and macros (H5), file functions (H5F), file related property list functions (H5P), and file driver functions (H5P) are listed below.
Function Listing 1. General library functions and macros (H5) | ||
C Function Fortran Function | Purpose | |
H5check_version
h5check_version_f | Verifies that HDF5 library versions are consistent. | |
H5close
h5close_f | Flushes all data to disk, closes all open identifiers, and cleans up memory. | |
H5dont_atexit
h5dont_atexit_f | Instructs the library not to install the atexit cleanup routine. | |
H5garbage_collect
h5garbage_collect_f | Garbage collects on all free-lists of all types. | |
H5get_libversion
h5get_libversion_f | Returns the HDF library release number. | |
H5open
h5open_f | Initializes the HDF5 library. | |
H5set_free_list_limits
h5set_free_list_limits_f | Sets free-list size limits. | |
H5_VERSION_GE
(none) | Determines whether the version of the library being used is greater than or equal to the specified version. | |
H5_VERSION_LE
(none) | Determines whether the version of the library being used is less than or equal to the specified version. | |
Function Listing 2. File functions (H5F) | ||
C Function Fortran Function | Purpose | |
H5Fclear_elink_file_cache
(none) | Clears the external link open file cache for a file. | |
H5Fclose
h5fclose_f | Closes HDF5 file. | |
H5Fcreate
h5fcreate_f | Creates new HDF5 file. | |
H5Fflush
h5fflush_f | Flushes data to HDF5 file on storage medium. | |
H5Fget_access_plist
h5fget_access_plist_f | Returns a file access property list identifier. | |
H5Fget_create_plist
h5fget_create_plist_f | Returns a file creation property list identifier. | |
H5Fget_file_image
h5fget_file_image_f | Retrieves a copy of the image of an existing, open file. | |
H5Fget_filesize
h5fget_filesize_f | Returns the size of an HDF5 file. | |
H5Fget_freespace
h5fget_freespace_f | Returns the amount of free space in a file. | |
H5Fget_info
(none) | Returns global information for a file. | |
H5Fget_intent
(none) | Determines the read/write or read-only status of a file. | |
H5Fget_mdc_config
(none) | Obtain current metadata cache configuration for target file. | |
H5Fget_mdc_hit_rate
(none) | Obtain target file’s metadata cache hit rate. | |
H5Fget_mdc_size
(none) | Obtain current metadata cache size data for specified file. | |
H5Fget_mpi_atomicity
h5fget_mpi_atomicity_f | Retrieves the atomicity mode in use. | |
H5Fget_name
h5fget_name_f | Retrieves the name of the file to which the object belongs. | |
H5Fget_obj_count
h5fget_obj_count_f | Returns the number of open object identifiers for an open file. | |
H5Fget_obj_ids
h5fget_obj_ids_f | Returns a list of open object identifiers. | |
H5Fget_vfd_handle
(none) | Returns pointer to the file handle from the virtual file driver. | |
H5Fis_hdf5
h5fis_hdf5_f | Determines whether a file is in the HDF5 format. | |
H5Fmount
h5fmount_f | Mounts a file. | |
H5Fopen
h5fopen_f | Opens existing HDF5 file. | |
H5Freopen
h5freopen_f | Returns a new identifier for a previously-opened HDF5 file. | |
H5Freset_mdc_hit_rate_stats
(none) | Reset hit rate statistics counters for the target file. | |
H5Fset_mdc_config
(none) | Use to configure metadata cache of target file. | |
H5Fset_mpi_atomicity
h5fset_mpi_atomicity_f | Use to set the MPI atomicity mode. | |
H5Funmount
h5funmount_f | Unmounts a file. | |
Function Listing 3. File creation property list functions (H5P) | ||
C Function Fortran Function | Purpose | |
H5Pset/get_userblock
h5pset/get_userblock_f | Sets/retrieves size of user-block. | |
H5Pset/get_sizes
h5pset/get_sizes_f | Sets/retrieves byte size of offsets and lengths used to address objects in HDF5 file. | |
H5Pset/get_sym_k
h5pset/get_sym_k_f | Sets/retrieves size of parameters used to control symbol table nodes. | |
H5Pset/get_istore_k
h5pset/get_istore_k_f | Sets/retrieves size of parameter used to control B-trees for indexing chunked datasets. | |
H5Pget_file_image
h5pget_file_image_f | Retrieves a copy of the file image designated as the initial content and structure of a file. | |
H5Pset_file_image
h5pset_file_image_f | Sets an initial file image in a memory buffer. | |
H5Pset_shared_mesg_nindexes
h5pset_shared_mesg_nindexes_f | Sets number of shared object header message indexes. | |
H5Pget_shared_mesg_nindexes
(none) | Retrieves number of shared object header message indexes in file creation property list. | |
H5Pset_shared_mesg_index
h5pset_shared_mesg_index_f | Configures the specified shared object header message index. | |
H5Pget_shared_mesg_index
(none) | Retrieves the configuration settings for a shared message index. | |
H5Pset_shared_mesg_phase_change
(none) | Sets shared object header message storage phase change thresholds. | |
H5Pget_shared_mesg_phase_change
(none) | Retrieves shared object header message phase change information. | |
H5Pget_version
h5pget_version_f | Retrieves version information for various objects for file creation property list. | |
Function Listing 4. File access property list functions (H5P) | ||
C Function Fortran Function | Purpose | |
H5Pset/get_alignment
h5pset/get_alignment_f | Sets/retrieves alignment properties. | |
H5Pset/get_cache
h5pset/get_cache_f | Sets/retrieves metadata cache and raw data chunk cache parameters. | |
H5Pset/get_elink_file_cache_size
(none) | Sets/retrieves the size of the external link open file cache from the specified file access property list. | |
H5Pset/get_fclose_degree
h5pset/get_fclose_degree_f | Sets/retrieves file close degree property. | |
H5Pset/get_gc_references
h5pset/get_gc_references_f | Sets/retrieves garbage collecting references flag. | |
H5Pset_family_offset
h5pset_family_offset_f | Sets offset property for low-level access to a file in a family of files. | |
H5Pget_family_offset
(none) | Retrieves a data offset from the file access property list. | |
H5Pset/get_meta_block_size
h5pset/get_meta_block_size_f | Sets the minimum metadata block size or retrieves the current metadata block size setting. | |
H5Pset_mdc_config
(none) | Set the initial metadata cache configuration in the indicated File Access Property List to the supplied value. | |
H5Pget_mdc_config
(none) | Get the current initial metadata cache configuration from the indicated File Access Property List. | |
H5Pset/get_sieve_buf_size
h5pset/get_sieve_buf_size_f | Sets/retrieves maximum size of data sieve buffer. | |
H5Pset_libver_bounds
h5pset_libver_bounds_f | Sets bounds on library versions, and indirectly format versions, to be used when creating objects. | |
H5Pget_libver_bounds
(none) | Retrieves library version bounds settings that indirectly control the format versions used when creating objects. | |
H5Pset_small_data_block_size
h5pset_small_data_block_size_f | Sets the size of a contiguous block reserved for small data. | |
H5Pget_small_data_block_size
h5pget_small_data_block_size_f | Retrieves the current small data block size setting. | |
Function Listing 5. File driver functions (H5P) | ||
C Function Fortran Function | Purpose | |
H5Pset_driver
(none) | Sets a file driver. | |
H5Pget_driver
h5pget_driver_f | Returns the identifier for the driver used to create a file. | |
H5Pget_driver_info
(none) | Returns a pointer to file driver information. | |
H5Pset/get_fapl_core
h5pset/get_fapl_core_f | Sets driver for buffered memory files (i.e., in RAM) or retrieves information regarding driver. | |
H5Pset_fapl_direct
h5pset_fapl_direct_f | Sets up use of the direct I/O driver. | |
H5Pget_fapl_direct
h5pget_fapl_direct_f | Retrieves direct I/O driver settings. | |
H5Pset/get_fapl_family
h5pset/get_fapl_family_f | Sets driver for file families, designed for systems that do not support files larger than 2 gigabytes, or retrieves information regarding driver. | |
H5Pset_fapl_log
(none) | Sets logging driver. | |
H5Pset/get_fapl_mpio
h5pset/get_fapl_mpio_f | Sets driver for files on parallel file systems (MPI I/O) or retrieves information regarding the driver. | |
H5Pset_fapl_mpiposix
h5pset_fapl_mpiposix_f | No longer available. | |
H5Pget_fapl_mpiposix
h5pget_fapl_mpiposix_f | No longer available. | |
H5Pset/get_fapl_multi
h5pset/get_fapl_multi_f | Sets driver for multiple files, separating categories of metadata and raw data, or retrieves information regarding driver. | |
H5Pset_fapl_sec2
h5pset_fapl_sec2_f | Sets driver for unbuffered permanent files or retrieves information regarding driver. | |
H5Pset_fapl_split
h5pset_fapl_split_f | Sets driver for split files, a limited case of multiple files with one metadata file and one raw data file. | |
H5Pset_fapl_stdio
h5pset_fapl_stdio_f | Sets driver for buffered permanent files. | |
H5Pset_fapl_windows
(none) | Sets the Windows I/O driver. | |
H5Pset_multi_type
(none) | Specifies type of data to be accessed via the MULTI driver enabling more direct access. | |
H5Pget_multi_type
(none) | Retrieves type of data property for MULTI driver. | |
This section describes in more detail how to create and how to open files.
New HDF5 files are created and opened with H5Fcreate; existing files are opened with H5Fopen. Both functions return an object identifier which must eventually be released by calling H5Fclose.
H5Fcreate:
hid_t H5Fcreate (const char *name, unsigned flags, hid_t fcpl_id, hid_t fapl_id)
H5Fcreate creates a new file named name in the current directory. The file is opened with read and write access; if the H5F_ACC_TRUNC flag is set, any pre-existing file of the same name in the same directory is truncated. If a file of the same name exists and H5F_ACC_TRUNC is not set (or H5F_ACC_EXCL is set), H5Fcreate will fail.
The new file is created with the properties specified in the property lists fcpl_id and fapl_id. fcpl is short for file creation property list. fapl is short for file access property list. Specifying H5P_DEFAULT for either the creation or access property list calls for the library’s default creation or access properties.
See “File Property Lists” below for details on setting property list values. See “File Access Modes” above for the list of file access flags and their descriptions.
If H5Fcreate successfully creates the file, it returns a file identifier for the new file. This identifier will be used by the application any time an object identifier, an OID, for the file is required. Once the application has finished working with a file, the identifier should be released and the file closed with H5Fclose.
H5Fopen:
hid_t H5Fopen (const char *name, unsigned flags, hid_t fapl_id)
H5Fopen opens an existing file with read-write access if H5F_ACC_RDWR is set and read-only access if H5F_ACC_RDONLY is set.
fapl_id is the file access property list identifier. Alternatively, H5P_DEFAULT indicates that the application relies on the default I/O access parameters. Creating and changing access property lists is documented further below.
A file can be opened more than once via multiple H5Fopen calls. Each such call returns a unique file identifier and the file can be accessed through any of these file identifiers as long as they remain valid. Each of these file identifiers must be released by calling H5Fclose when it is no longer needed.
H5Fclose both closes a file and releases the file identifier returned by H5Fopen or H5Fcreate. H5Fclose must be called when an application is done working with a file; while the HDF5 Library makes every effort to maintain file integrity, failure to call H5Fclose may result in the file being abandoned in an incomplete or corrupted state.
H5Fclose:
herr_t H5Fclose (hid_t file_id)
This function releases resources associated with an open file. After closing a file, the file identifier, file_id, cannot be used again as it will be undefined.
H5Fclose fulfills three purposes: to ensure that the file is left in an uncorrupted state, to ensure that all data has been written to the file, and to release resources. Use H5Fflush if you wish to ensure that all data has been written to the file but it is premature to close it.
Note regarding serial mode behavior: When H5Fclose is called in serial mode, it closes the file and terminates new access to it, but it does not terminate access to objects that remain individually open within the file. That is, if H5Fclose is called for a file but one or more objects within the file remain open, those objects will remain accessible until they are individually closed. To illustrate, assume that a file, fileA, contains a dataset, data_setA, and that both are open when H5Fclose is called for fileA. data_setA will remain open and accessible, including writable, until it is explicitly closed. The file will be automatically and finally closed once all objects within it have been closed.
Note regarding parallel mode behavior: Once H5Fclose has been called in parallel mode, access is no longer available to any object within the file.
Additional information regarding file structure and access is passed to H5Fcreate and H5Fopen through property list objects. Property lists provide a portable and extensible method of modifying file properties via simple API functions.
There are two kinds of file-related property lists:
In the following sub-sections, we discuss only one file creation property, user-block size, in detail as a model for the user. Other file creation and file access properties are mentioned and defined briefly, but the model is not expanded for each; complete syntax, parameter, and usage information for every property list function is provided in the “H5P: Property List Interface” chapter of the HDF5 Reference Manual.
If you do not wish to rely on the default file creation and access properties, you must first create a property list with H5Pcreate.
hid_t H5Pcreate (hid_t cls_id)
cls_id is the class, or type, of property list being created. In this case, the appropriate values are H5P_FILE_CREATE for a file creation property list and H5P_FILE_ACCESS for a file access property list.
Thus, the following calls create a file creation property list and a file access property list with identifiers fcpl_id and fapl_id, respectively:
fcpl_id = H5Pcreate (H5P_FILE_CREATE);
fapl_id = H5Pcreate (H5P_FILE_ACCESS);
File creation property lists control the file metadata, which is maintained in the superblock of the file. These properties are used only when a file is first created.
User-block size
herr_t H5Pset_userblock (hid_t plist, hsize_t size)
herr_t H5Pget_userblock (hid_t plist, hsize_t *size)
The user-block is a fixed-length block of data located at the beginning of the file and is ignored by the HDF5 Library. This block is specifically set aside for any data or information that developers determine to be useful to their applications but that will not be used by the HDF5 Library. The size of the user-block is defined in bytes and may be set to any power of two with a minimum size of 512 bytes. In other words, user-blocks might be 512, 1024, or 2048 bytes in size.
This property is set with H5Pset_userblock and queried via H5Pget_userblock. For example, if an application needed a 4K user-block, then the following function call could be used:
status = H5Pset_userblock(fcpl_id, 4096);
The property list could later be queried with
status = H5Pget_userblock(fcpl_id, &size);
and the value 4096 would be returned in the variable size.
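The size rule stated above (a power of two, at least 512 bytes) can be checked before calling H5Pset_userblock. The sketch below is plain C and does not call the library; the function name is hypothetical. It also accepts 0, on the assumption that a size of 0 means no user-block, which is the library default.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does `size` satisfy the user-block size rule? */
bool is_valid_userblock_size(uint64_t size)
{
    if (size == 0)
        return true;              /* assumption: 0 means "no user-block" */
    if (size < 512)
        return false;             /* below the documented minimum */
    /* A power of two has exactly one bit set. */
    return (size & (size - 1)) == 0;
}
```

With this check, 512, 1024, and 4096 are accepted, while 256 and 1000 are rejected.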
Other properties, described below, are set and queried in exactly the same manner. Syntax and usage are detailed in the “H5P: Property List Interface” section of the HDF5 Reference Manual.
Offset and length sizes
This property specifies the number of bytes used to store the offset and length of objects in the HDF5 file. Values of 2, 4, and 8 bytes are currently supported to accommodate 16-bit, 32-bit, and 64-bit file address spaces.
These properties are set and queried via H5Pset_sizes and H5Pget_sizes.
The size of symbol table B-trees can be controlled by setting the 1/2-rank and 1/2-node size parameters of the B-tree.
These properties are set and queried via H5Pset_sym_k and H5Pget_sym_k.
The size of indexed storage B-trees can be controlled by setting the 1/2-rank and 1/2-node size parameters of the B-tree.
These properties are set and queried via H5Pset_istore_k and H5Pget_istore_k.
Various objects in an HDF5 file may over time appear in different versions. The HDF5 Library keeps track of the version of each object in the file.
Version information is retrieved via H5Pget_version.
This section discusses file access properties that are not related to the low-level file drivers. File drivers are discussed separately in “Alternate File Storage Layouts and Low-level File Drivers,” later in this chapter.
File access property lists control various aspects of file I/O and structure.
Alignment properties are set with the H5Pset_alignment function. There are two values involved: a threshold and an alignment interval.
Any allocation request at least as large as the threshold will be aligned on an address that is a multiple of the alignment interval.
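The alignment rule can be worked through with a small calculation. The sketch below is plain C that models only the rule as stated; it is not the library's allocator, and the function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the alignment rule: given the next free address
 * and a request size, return the address the request would start at. */
uint64_t aligned_address(uint64_t next_free, uint64_t request_size,
                         uint64_t threshold, uint64_t alignment)
{
    /* Requests smaller than the threshold are not aligned. */
    if (alignment <= 1 || request_size < threshold)
        return next_free;

    /* Round up to the next multiple of the alignment interval. */
    uint64_t remainder = next_free % alignment;
    if (remainder == 0)
        return next_free;
    return next_free + (alignment - remainder);
}
```

For example, with a threshold of 2048 and an interval of 4096, a 4096-byte request at address 1000 would be placed at 4096, while a 100-byte request would stay at 1000.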
H5Pset_meta_block_size sets the minimum size in bytes of metadata block allocations. H5Pget_meta_block_size retrieves the current minimum metadata block allocation size.
H5Pset_cache sets the minimum cache size for both metadata and raw data and a preemption value for raw data chunks. H5Pget_cache retrieves the current values.
H5Pset_sieve_buf_size sets the maximum size in bytes of the data sieve buffer. H5Pget_sieve_buf_size retrieves the current maximum size of the data sieve buffer.
When garbage collection is on (1) and the user passes in an uninitialized value in a reference structure, the heap might become corrupted. When garbage collection is off (0), however, and the user re-uses a reference, the previous heap block will be orphaned and not returned to the free heap space. When garbage collection is on, the user must initialize the reference structures to 0 or risk heap corruption.
H5Pset_gc_references sets the garbage collecting references flag.
The concept of an HDF5 file is actually rather abstract: the address space for what is normally thought of as an HDF5 file might correspond to any of the following:
This HDF5 address space is generally referred to as an HDF5 file regardless of its organization at the storage level.
HDF5 employs an extremely flexible mechanism called the virtual file layer, or VFL, for file I/O. A full understanding of the VFL is only necessary if you plan to write your own drivers (see “Virtual File Layer” and “List of VFL Functions” in the HDF5 Technical Notes). For our purposes here, it is sufficient to know that the low-level drivers used for file I/O reside in the VFL, as illustrated in the following figure. Note that H5FD_STREAM is not available with 1.8.x and later versions of the library.
Figure 2. I/O path from application through VFL and low-level drivers to storage
As mentioned above, HDF5 applications access HDF5 files through various low-level file drivers. The default driver, which handles the default layout of a single, contiguous, unbuffered permanent file on local disk, is the POSIX driver (also known as the SEC2 driver), H5FD_SEC2. Alternative layouts and drivers are designed to suit the needs of a variety of systems, environments, and applications. The drivers are listed in the table below.
Table 2. Supported file drivers
For more information, see the HDF5 Reference Manual entries for the function calls shown in the column on the right in the table above.
Note that the low-level file drivers manage alternative file storage layouts. Dataset storage layouts (chunking, compression, and external dataset storage) are managed independently of file storage layouts.
If an application requires a special-purpose low-level driver, the VFL provides a public API for creating one. For more information on how to create a driver, see “Virtual File Layer” and “List of VFL Functions” in the HDF5 Technical Notes.
When creating a new HDF5 file, no history exists, so the file driver must be specified if it is to be other than the default.
When opening existing files, however, the application may need to determine which low-level driver was used to create the file. The function H5Pget_driver is used for this purpose. See the example below.
hid_t H5Pget_driver (hid_t fapl_id)
Example 5. Identifying a driver
H5Pget_driver returns a constant identifying the low-level driver for the access property list fapl_id. For example, if the file was created with the POSIX (aka SEC2) driver, H5Pget_driver returns H5FD_SEC2.
If the application opens an HDF5 file without both determining the driver used to create the file and setting up the use of that driver, the HDF5 Library will examine the superblock and the driver definition block to identify the driver. See the HDF5 File Format Specification for detailed descriptions of the superblock and the driver definition block.
The POSIX driver, H5FD_SEC2, uses functions from section 2 of the POSIX manual to access unbuffered files stored on a local file system. This driver is also known as the SEC2 driver. The HDF5 Library buffers metadata regardless of the low-level driver, but using this driver prevents data from being buffered again by the lowest layers of the library.
The function H5Pset_fapl_sec2 sets the file access properties to use the POSIX driver. See the example below.
herr_t H5Pset_fapl_sec2 (hid_t fapl_id)
Example 6. Using the POSIX, aka SEC2, driver
Any previously-defined driver properties are erased from the property list.
Additional parameters may be added to this function in the future. Since there are no additional variable settings associated with the POSIX driver, there is no H5Pget_fapl_sec2 function.
The Direct driver, H5FD_DIRECT, functions like the POSIX driver except that data is written to or read from the file synchronously without being cached by the system.
The functions H5Pset_fapl_direct and H5Pget_fapl_direct are used to manage file access properties. See the example below.
herr_t H5Pset_fapl_direct (hid_t fapl_id, size_t alignment, size_t block_size, size_t cbuf_size)
herr_t H5Pget_fapl_direct (hid_t fapl_id, size_t *alignment, size_t *block_size, size_t *cbuf_size)
Example 7. Using the Direct driver
H5Pset_fapl_direct sets the file access properties to use the Direct driver; any previously defined driver properties are erased from the property list. H5Pget_fapl_direct retrieves the file access properties used with the Direct driver.
fapl_id is the file access property list identifier.
alignment is the memory alignment boundary.
block_size is the file system block size.
cbuf_size is the copy buffer size.
Additional parameters may be added to this function in the future.
The Log driver, H5FD_LOG, is designed for situations where it is necessary to log file access activity.
The function H5Pset_fapl_log is used to manage logging properties. See the example below.
herr_t H5Pset_fapl_log (hid_t fapl_id, const char *logfile, unsigned int flags, size_t buf_size)
Example 8. Logging file access
H5Pset_fapl_log sets the file access property list to use the Log driver. File access characteristics are identical to access via the POSIX driver. Any previously defined driver properties are erased from the property list.
Log records are written to the file logfile.
The logging levels set with the verbosity parameter are shown in the table below.
Table 3. Logging levels | |
Level | Comments |
0 | Performs no logging. |
1 | Records where writes and reads occur in the file. |
2 | Records where writes and reads occur in the file and what kind of data is written at each location. This includes raw data or any of several types of metadata (object headers, superblock, B-tree data, local headers, or global headers). |
There is no H5Pget_fapl_log function.
Additional parameters may be added to this function in the future.
The Windows driver, H5FD_WINDOWS, was modified in HDF5-1.8.8 to be a wrapper of the POSIX driver, H5FD_SEC2. In other words, if the Windows driver is used, any file I/O will instead use the functionality of the POSIX driver. This change should be transparent to all user applications. The Windows driver used to be the default driver for Windows systems. The POSIX driver is now the default.
The function H5Pset_fapl_windows sets the file access properties to use the Windows driver. See the example below.
herr_t H5Pset_fapl_windows (hid_t fapl_id)
Example 9. Using the Windows driver
Any previously-defined driver properties are erased from the property list.
Additional parameters may be added to this function in the future. Since there are no additional variable settings associated with the Windows driver, there is no H5Pget_fapl_windows function.
The STDIO driver, H5FD_STDIO, accesses permanent files in a local file system like the POSIX driver does. The STDIO driver also has an additional layer of buffering beneath the HDF5 Library.
The function H5Pset_fapl_stdio sets the file access properties to use the STDIO driver. See the example below.
herr_t H5Pset_fapl_stdio (hid_t fapl_id)
Example 10. Using the STDIO driver
Any previously defined driver properties are erased from the property list.
Additional parameters may be added to this function in the future. Since there are no additional variable settings associated with the STDIO driver, there is no H5Pget_fapl_stdio function.
There are several situations in which it is reasonable, sometimes even required, to maintain a file entirely in system memory. You might want to do so if, for example, either of the following conditions applies:
The Memory driver, H5FD_CORE, provides a mechanism for creating and managing such in-memory files. The functions H5Pset_fapl_core and H5Pget_fapl_core manage file access properties. See the example below.
herr_t H5Pset_fapl_core (hid_t access_properties, size_t block_size, hbool_t backing_store)
herr_t H5Pget_fapl_core (hid_t access_properties, size_t *block_size, hbool_t *backing_store)
Example 11. Managing file access for in-memory files
H5Pset_fapl_core sets the file access property list to use the Memory driver; any previously defined driver properties are erased from the property list.
Memory for the file will always be allocated in units of the specified block_size.
The backing_store Boolean flag is set when the in-memory file is created. backing_store indicates whether to write the file contents to disk when the file is closed. If backing_store is set to 1 (TRUE), the file contents are flushed to a file with the same name as the in-memory file when the file is closed or access to the file is terminated in memory. If backing_store is set to 0 (FALSE), the file is not saved.
The application is allowed to open an existing file with the H5FD_CORE driver. While using H5Fopen to open an existing file, if backing_store is set to 1 and the flag for H5Fopen is set to H5F_ACC_RDWR, changes to the file contents will be saved to the file when the file is closed. If backing_store is set to 0 and the flag for H5Fopen is set to H5F_ACC_RDWR, changes to the file contents will be lost when the file is closed. If the flag for H5Fopen is set to H5F_ACC_RDONLY, no change to the file will be allowed either in memory or on file.
If the file access property list is set to use the Memory driver, H5Pget_fapl_core will return block_size and backing_store with the relevant file access property settings.
Note the following important points regarding in-memory files:
H5Fcreate
or
H5Fopen
backing_store
is set
to 1
Additional parameters may be added to these functions in the future.
See the “HDF5 File Image Operations” section for information on more advanced usage of the Memory file driver, and see the “Modified Region Writes” section for information on how to set write operations so that only modified regions are written to storage.
HDF5 files can become quite large, and this can create problems on systems that do not support files larger than 2 gigabytes. The HDF5 file family mechanism is designed to solve the problems this creates by splitting the HDF5 file address space across several smaller files. This structure does not affect how metadata and raw data are stored: they are mixed in the address space just as they would be in a single, contiguous file.
HDF5 applications access a family of files via the Family driver, H5FD_FAMILY. The functions H5Pset_fapl_family and H5Pget_fapl_family are used to manage file family properties. See the example below.
herr_t H5Pset_fapl_family (hid_t fapl_id, hsize_t memb_size, hid_t member_properties)
herr_t H5Pget_fapl_family (hid_t fapl_id, hsize_t *memb_size, hid_t *member_properties)
Example 12. Managing file family properties
Each member of the family is the same logical size, though the size and disk storage reported by file system listing tools may be substantially smaller. Examples of file system listing tools are 'ls -l' on a Unix system or the detailed folder listing on an Apple Macintosh or Microsoft Windows system.
The name passed to H5Fcreate or H5Fopen should include a printf(3c)-style integer format specifier which will be replaced with the family member number. The first family member is numbered zero (0).
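As a rough illustration of this naming rule, the sketch below expands a printf-style pattern into member file names. The helper family_member_name and the pattern family_%05d.h5 are hypothetical and exist only for this example; an application normally passes the pattern itself to H5Fcreate or H5Fopen and lets the library do the substitution.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand a printf-style family name pattern for one member number.
 * Hypothetical helper: returns 0 on success, -1 if the pattern has no
 * format specifier or the result does not fit in buf. */
int family_member_name(char *buf, size_t buflen, const char *pattern, int member)
{
    int n;

    assert(buf != NULL && pattern != NULL);

    /* A family name needs an integer format specifier such as %d. */
    if (strchr(pattern, '%') == NULL)
        return -1;

    n = snprintf(buf, buflen, pattern, member);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

With the pattern family_%05d.h5, member 0 maps to family_00000.h5 and member 3 maps to family_00003.h5.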
H5Pset_fapl_family sets the access properties to use the Family driver; any previously defined driver properties are erased from the property list. member_properties will serve as the file access property list for each member of the file family. memb_size specifies the logical size, in bytes, of each family member. memb_size is used only when creating a new file or truncating an existing file; otherwise the member size is determined by the size of the first member of the family being opened. Note: If the size of the off_t type is four bytes, the maximum family member size is usually 2^31-1 because the byte at offset 2,147,483,647 is generally inaccessible.
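One way to picture how the HDF5 address space is divided among family members is sketched below. It assumes members are laid out back to back in units of memb_size, so that a logical file address maps to a member number and an offset within that member. The type family_loc and the function family_locate are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Location of a logical file address within a file family (hypothetical type). */
typedef struct {
    uint64_t member;  /* zero-based family member number */
    uint64_t offset;  /* byte offset inside that member */
} family_loc;

/* Map a logical address to a member and offset, assuming every member
 * covers exactly memb_size bytes of the logical address space. */
family_loc family_locate(uint64_t addr, uint64_t memb_size)
{
    family_loc loc;

    assert(memb_size > 0);

    loc.member = addr / memb_size;
    loc.offset = addr % memb_size;
    return loc;
}
```

For example, with a member size of 1,048,576 bytes, logical address 2,621,440 falls in member 2 at offset 524,288.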
H5Pget_fapl_family is used to retrieve file family properties. If the file access property list is set to use the Family driver, member_properties will be returned with a pointer to a copy of the appropriate member access property list. If memb_size is non-null, it will contain the logical size, in bytes, of family members.
Additional parameters may be added to these functions in the future.
It occasionally becomes necessary to repartition a file family. A command-line utility for this purpose, h5repart, is distributed with the HDF5 Library.
h5repart [-v] [-b block_size[suffix]] [-m member_size[suffix]] source destination
h5repart repartitions an HDF5 file by copying the source file or file family to the destination file or file family, preserving holes in the underlying UNIX files. Families are used for the source and/or destination if the name includes a printf-style integer format such as %d.
The -v switch prints input and output file names on the standard error stream for progress monitoring, -b sets the I/O block size (the default is 1KB), and -m sets the output member size if the destination is a family name (the default is 1GB). block_size and member_size may be suffixed with the letters g, m, or k for GB, MB, or KB respectively.
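To make the suffix convention concrete, here is a small sketch of how such a size argument could be interpreted. It is not taken from h5repart itself: the function parse_size is hypothetical, and the sketch assumes that k, m, and g denote multiples of 1024, 1024^2, and 1024^3 bytes.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a size such as "512", "64m", or "1g" into bytes (hypothetical helper).
 * Assumes binary multiples for the k, m, and g suffixes.
 * Returns 0 on success, -1 on malformed input or overflow. */
int parse_size(const char *text, unsigned long long *out)
{
    char              *end = NULL;
    unsigned long long value;
    unsigned long long mult = 1;

    assert(text != NULL && out != NULL);

    /* Require a leading digit so that signs and blanks are rejected. */
    if (text[0] < '0' || text[0] > '9')
        return -1;

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0)
        return -1;

    switch (*end) {
        case 'k': mult = 1ULL << 10; end++; break;
        case 'm': mult = 1ULL << 20; end++; break;
        case 'g': mult = 1ULL << 30; end++; break;
        default:  break;
    }

    /* Anything left after the optional suffix is an error. */
    if (*end != '\0')
        return -1;
    if (value > ULLONG_MAX / mult)
        return -1;

    *out = value * mult;
    return 0;
}
```

Under these assumptions, "1k" yields 1024 bytes and "1g" yields 1,073,741,824 bytes.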
The h5repart utility is described on the Tools page of the HDF5 Reference Manual.
An existing HDF5 file can be split into a family of files by running the file through split(1) on a UNIX system and numbering the output files. However, the HDF5 Library is lazy about extending the size of family members, so a valid file cannot generally be created by concatenation of the family members.
Splitting the file and rejoining the segments by concatenation (split(1) and cat(1) on UNIX systems) does not generate files with holes; holes are preserved only through the use of h5repart.
In some circumstances, it is useful to separate metadata from raw data and some types of metadata from other types of metadata. Situations that would benefit from use of the Multi driver include the following:
In either case, access to the metadata is substantially easier with the smaller, and possibly more localized, metadata files. This often results in improved application performance.
The Multi driver, H5FD_MULTI, provides a mechanism for segregating raw data and different types of metadata into multiple files. The functions H5Pset_fapl_multi and H5Pget_fapl_multi are used to manage access properties for these multiple files. See the example below.
herr_t H5Pset_fapl_multi (hid_t fapl_id, const H5FD_mem_t *memb_map, const hid_t *memb_fapl, const char * const *memb_name, const haddr_t *memb_addr, hbool_t relax)
herr_t H5Pget_fapl_multi (hid_t fapl_id, H5FD_mem_t *memb_map, hid_t *memb_fapl, char **memb_name, haddr_t *memb_addr, hbool_t *relax)
Example 13. Managing access properties for multiple files
H5Pset_fapl_multi sets the file access properties to use the Multi driver; any previously defined driver properties are erased from the property list. With the Multi driver invoked, the application will provide a base name to H5Fopen or H5Fcreate. The files will be named by that base name as modified by the rule indicated in memb_name. File access will be governed by the file access property lists in memb_fapl.
See H5Pset_fapl_multi and H5Pget_fapl_multi in the HDF5 Reference Manual for descriptions of these functions and their usage.
Additional parameters may be added to these functions in the future.
The Split driver, H5FD_SPLIT, is a limited case of the Multi driver where only two files are created. One file holds metadata, and the other file holds raw data.
The function H5Pset_fapl_split is used to manage Split file access properties. See the example below.
herr_t H5Pset_fapl_split (hid_t access_properties, const char *meta_extension, hid_t meta_properties, const char *raw_extension, hid_t raw_properties)
Example 14. Managing access properties for split files
H5Pset_fapl_split sets the file access properties to use the Split driver; any previously defined driver properties are erased from the property list.
With the Split driver invoked, the application will provide a base file name such as file_name to H5Fcreate or H5Fopen. The metadata and raw data files in storage will then be named by appending meta_extension and raw_extension, respectively, to that base name. For example, if meta_extension is defined as .meta and raw_extension is defined as .raw, the final filenames will be file_name.meta and file_name.raw.
Each file can have its own file access property list. This allows the creative use of other low-level file drivers. For instance, the metadata file can be held in RAM and accessed via the Memory driver while the raw data file is stored on disk and accessed via the POSIX driver. Metadata file access will be governed by the file access property list in meta_properties. Raw data file access will be governed by the file access property list in raw_properties.
Additional parameters may be added to this function in the future.
Since there are no additional variable settings associated with the Split driver, there is no H5Pget_fapl_split function.
Parallel environments require a parallel low-level driver. HDF5’s default driver for parallel systems is called the Parallel driver, H5FD_MPIO. This driver uses the MPI standard for both communication and file I/O.
The functions H5Pset_fapl_mpio and H5Pget_fapl_mpio are used to manage file access properties for the H5FD_MPIO driver. See the example below.
herr_t H5Pset_fapl_mpio (hid_t fapl_id, MPI_Comm comm, MPI_Info info)
herr_t H5Pget_fapl_mpio (hid_t fapl_id, MPI_Comm *comm, MPI_Info *info)
Example 15. Managing parallel file access properties
The file access properties managed by H5Pset_fapl_mpio and retrieved by H5Pget_fapl_mpio are the MPI communicator, comm, and the MPI info object, info. comm and info are used for file open. info is an information object much like an HDF5 property list. Both are defined in MPI_FILE_OPEN of MPI-2.
The communicator and the info object are saved in the file access property list fapl_id. fapl_id can then be passed to H5Fcreate or H5Fopen, which use the saved communicator and info object to create and/or open the file through MPI_FILE_OPEN.
H5Pset_fapl_mpio and H5Pget_fapl_mpio are available only in the parallel HDF5 Library and are not collective functions. The Parallel driver is available only in the parallel HDF5 Library.
Additional parameters may be added to these functions in the future.
H5F_ACC_TRUNC Flag
The following example uses the H5F_ACC_TRUNC flag when it creates a new file. The default file creation and file access properties are also used. Using H5F_ACC_TRUNC means the function will look for an existing file with the name specified by the function. In this case, that name is FILE. If the function does not find an existing file, it will create one. If it does find an existing file, it will empty the file in preparation for a new set of data. The identifier for the "new" file will be passed back to the application program. See the "File Access Modes" section for more information.
hid_t  file;    /* identifier */
herr_t status;  /* return value of H5Fclose */

/* Create a new file using H5F_ACC_TRUNC access, default file
 * creation properties, and default file access properties. */
file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

/* Close the file. */
status = H5Fclose(file);
Example 17. Creating a file with default creation and access properties
The example below shows how to create a file with 64-bit object offsets and lengths.
hid_t create_plist;
hid_t file_id;

create_plist = H5Pcreate(H5P_FILE_CREATE);
H5Pset_sizes(create_plist, 8, 8);
file_id = H5Fcreate("test.h5", H5F_ACC_TRUNC, create_plist, H5P_DEFAULT);
. . .
H5Fclose(file_id);
Example 18. Creating a file with 64-bit offsets
This example shows how to open an existing file for independent dataset access by MPI parallel I/O:
hid_t access_plist;
hid_t file_id;

access_plist = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(access_plist, MPI_COMM_WORLD, MPI_INFO_NULL);

/* H5Fopen must be called collectively */
file_id = H5Fopen("test.h5", H5F_ACC_RDWR, access_plist);
. . .
/* H5Fclose must be called collectively */
H5Fclose(file_id);
Example 19. Opening an existing file for parallel I/O
Multiple HDF5 files can be associated so that the files can be worked with as though all the information is in a single HDF5 file. A temporary association can be set up by means of the H5Fmount function. A permanent association can be set up by means of the external link function H5Lcreate_external.
The purpose of this section is to describe what happens when the H5Fmount function is used to mount one file on another.
When a file is mounted on another, the mounted file is mounted at a group, and the root group of the mounted file takes the place of that group until the mounted file is unmounted or until the files are closed.
The figure below shows two files before one is mounted on the other. File1 has two groups and three datasets. The group that is the target of the A link has links, Z and Y, to two of the datasets. The group that is the target of the B link has a link, W, to the other dataset. File2 has three groups and three datasets. The groups in File2 are the targets of the AA, BB, and CC links. The datasets in File2 are the targets of the ZZ, YY, and WW links.
|
Figure 3. Two separate files
|
The figure below shows the two files after File2 has been mounted on File1 at the group that is the target of the B link.
|
Figure 4. File2 mounted on File1
|
Note that the dataset that is the target of the W link is not shown in the figure above. That dataset is masked by the mounted file.
If a file is mounted on a group that has members, those members are hidden until the mounted file is unmounted. There are two ways around this if you need to work with a group member. One is to mount the file on an empty group. Another is to open the group member before you mount the file. Opening the group member will return an identifier that you can use to locate the group member.
The example below shows how H5Fmount might be used to mount File2 onto File1.
status = H5Fmount(loc_id, "/B", child_id, plist_id);
Example 20. Using H5Fmount
Note: loc_id is the file identifier for File1, /B is the link path to the group where File2 is mounted, child_id is the file identifier for File2, and plist_id is a property list identifier.
For more information, see the “HDF5 Groups” chapter, and the H5Fmount, H5Funmount, and H5Lcreate_external functions in the HDF5 Reference Manual.