Contents:
- Description
- Programming Model
- File Creation Property List
- File Access Property List
- Dataset Creation Property List
- Data Access/Transfer Property List
Description
As defined in the Introductory Topics, a property is a characteristic or feature associated with an HDF5 object, and all HDF5 objects have default properties which handle the most common needs.
The default properties can be modified by use of the Property List interface and function parameters.
The Property List API allows a user to take advantage of the more powerful features in HDF5. In HDF5 1.6, it supports unusual cases when:
- Creating Files
- Accessing Files
- Creating Datasets
- Accessing Datasets
There is a programming model for working with property lists in HDF5.
In Summary:
Properties are features of HDF5 objects that can be changed by use of the Property List API and function parameters. Property lists provide a mechanism for adding functionality to HDF5 calls without increasing the number of arguments used for a given call. The Property List API supports unusual cases when creating and accessing HDF5 objects.
Programming Model
Default properties are specified by simply passing in H5P_DEFAULT (C) / H5P_DEFAULT_F (F90) for the property list parameter in those functions for which properties can be changed.
The programming model for changing a property list is as follows:
- Create a copy or "instance" of the desired pre-defined property type, using the H5Pcreate (C) / h5pcreate_f (F90) call. This will return a property list identifier. Please see the Reference Manual entry for H5Pcreate (C) / h5pcreate_f (F90) for a comprehensive list of the property types.
- With the property list identifier, modify the property, using the H5P APIs.
- Modify the object feature by passing the property list identifier into the corresponding HDF5 object function.
- Close the property list when done, using H5Pclose (C) / h5pclose_f (F90).
File Creation Property List
The File Creation property list, H5P_FILE_CREATE, is specified by the third parameter to H5Fcreate and is used to control the file metadata that is maintained in the superblock of the file.
Following are some properties you can change with the File Creation property list:
- The user-block size (set using H5Pset_userblock). The user block stores user-defined information at the beginning of the file. An example of information that could be stored is ASCII text which describes a file. The default size is 0.
- The byte size of offsets and lengths used to address objects in an HDF5 file (set using H5Pset_sizes).
- The size of parameters used to control the symbol table nodes (set using H5Pset_sym_k).
- The size of parameters used to control the B-trees for indexing chunked datasets (set using H5Pset_istore_k).
For a complete list of File Creation properties, refer to the Property List Interface in the HDF5 Reference Manual.
Using the Default File Creation Property List
To use a default File Creation property list with H5Fcreate, you would specify H5P_DEFAULT (C) or H5P_DEFAULT_F (F90) for the third parameter, as shown below (in C):
file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
Some defaults are:
User Block Size | 0 |
Byte Size of offsets and lengths used to address objects | Same as sizeof (hsize_t) in the library (normally 8 bytes) |
Size of parameters controlling the symbol table nodes | 16 |
Size of parameters controlling B-trees for indexing chunked datasets | 32 |
Modifying the File Creation Property List
To modify the File Creation property list, follow these steps:
- Create a copy or instance of the File Creation property list with H5Pcreate. For example, in C:
fcpl = H5Pcreate (H5P_FILE_CREATE);
- Modify the property list with one of the File Creation property list functions. For example, modify the User Block with H5Pset_userblock:
status = H5Pset_userblock (fcpl, 512);
- Call H5Fcreate, passing in the identifier of the property list that was just modified. For example:
file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, fcpl, H5P_DEFAULT);
- Lastly, close the property list with H5Pclose:
status = H5Pclose (fcpl);
The following example shows how to modify the File Creation property list to create a file with 64-bit object offsets and lengths. Note that it follows the programming model:
[ C program ] - h5_filessize.c
[ F90 program ] - filesetsize.f90
File Access Property List
The File Access property list, H5P_FILE_ACCESS, is specified by a parameter to the following functions, and is used to control different methods of performing I/O (unbuffered, buffered, memory, parallel with MPI I/O, data alignment) on files:
hid_t H5Fcreate (const char *name, unsigned flags, hid_t create_id, hid_t access_id)
hid_t H5Fopen (const char *name, unsigned flags, hid_t access_id)
Use of File Access property functions can affect performance:
- H5Pset_cache: Sets the metadata cache and raw data chunk cache parameters. An improper size for these parameters can greatly degrade performance. Ensure that your chunk size is not larger than your chunk cache size. See the section on performance in the FAQ for more details on the metadata cache and chunk cache.
- H5Pset_meta_block_size: Sets the minimum metadata block size. It can reduce the number of small data objects in the file that would otherwise be required for metadata, which will reduce the number of write operations.
- H5Pset_sieve_buf_size: Sets the maximum size of the data sieve buffer. The data sieve is used when performing I/O on datasets in the file. Increasing the size can improve performance when selecting hyperslabs.
- H5Pset_fclose_degree: Sets the file close degree property, which determines how aggressively H5Fclose closes objects within a file. The default property is H5F_CLOSE_WEAK, which indicates that the file is not closed until all objects in the file are closed.
- The Virtual File Layer (VFL) (see VFL below).
The File Access property list also can modify the usage of the low-level I/O libraries and physical storage:
- It can be used to define low level I/O libraries, such as MPI I/O for parallel access (see the Parallel HDF5 Tutorial) and STDIO vs. SEC2.
- It can be used to access the Virtual File Layer (VFL), a public API for writing I/O drivers. If using the VFL, a file does not have to be a file; it can be mapped to multiple files, memory, a network protocol, etc. The VFL API allows users to design and implement their own mapping between the HDF5 format address space and storage, with each mapping being a separate file driver.
- It can be used to modify the physical storage of an HDF5 file to create:
  - A memory driver (HDF5 file in the application's memory).
  - A stream driver (HDF5 file written to a socket).
  - A Family of Files (using H5Pset_fapl_family). This property is useful when an HDF5 file cannot fit on a filesystem. It can be divided into equal sized pieces with the maximum size that will fit on that filesystem. A family member size must be a power of two.
  - Split files (using H5Pset_fapl_split). An HDF5 file can be split into two files, one with just metadata, the other with just data.
Using the Default File Access Property List
To use a default File Access property list with H5Fcreate or H5Fopen, you would specify H5P_DEFAULT (C) or H5P_DEFAULT_F (F90) for the access_id parameter. Examples (in C) are:
file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
file_id = H5Fopen (FILE, H5F_ACC_RDWR, H5P_DEFAULT);
Default sizes are:
Metadata and Raw Data Chunk Cache | 4MB, 1MB |
Metadata Block Size | 2048 |
Maximum Size of Data Sieve Buffer | 64KB |
File Close Degree | H5F_CLOSE_WEAK (for all, except parallel, which is H5F_CLOSE_SEMI) |
Modifying the Default File Access Property List
To modify the File Access property list, follow these steps:
- Create a copy or instance of the File Access property list. For example, in C:
fapl = H5Pcreate (H5P_FILE_ACCESS);
- Modify the property list by calling one of the File Access property list functions (H5Pset_cache, H5Pset_meta_block_size, ...).
- Call H5Fcreate, passing in the identifier of the property list that was just modified. For example (in C):
file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
- Lastly, close the property list:
status = H5Pclose (fapl);
Following is an example of using the H5P_FILE_ACCESS property list for creating HDF5 files with the metadata and data split into different files:
[ C program ] - h5split.c
[ F90 program ] - dsetsplit.f90
Following is an example of using the H5P_FILE_ACCESS property list to create a family of files:
[ C program ] - h5family.c
[ F90 program ] - dsetfamily.f90
Dataset Creation Property List
The Dataset Creation property list, H5P_DATASET_CREATE, is a parameter to the H5Dcreate (C) / h5dcreate_f (F90) call. It is used to control information on how raw data is organized on disk:
hid_t H5Dcreate (hid_t loc_id, const char *name, hid_t type_id, hid_t space_id, hid_t create_plist_id)
Following is a list of the properties that are affected by the Dataset Creation Property List. They are not mutually exclusive of each other:
- Storage
- Filters applied to the raw data
- Space allocation for raw data in a file
- Fill values
An explanation of these properties follows:
Storage:
Datasets are partitioned by their storage layout. The storage layout can affect I/O performance, as well as the size of the HDF5 file. Following are the storage layouts in HDF5:
- H5D_CONTIGUOUS: This is used for data that is large, non-extendible, non-compressible, and non-sparse. It is the default.
  External Files: Contiguous datasets can be stored externally. Advantages of this might be:
  - Ease of including existing data into HDF5.
  - Ease of exporting raw data.
  However, users have to keep track of additional files to preserve the integrity of the HDF5 file.
  A feature of external files is that a dataset can be partitioned into different parts, with each of those parts stored in a separate segment of an external file. The C code to do this might look as follows:
  plist = H5Pcreate (H5P_DATASET_CREATE);
  status = H5Pset_external (plist, "raw_data.ext", 3000, 1000);
  status = H5Pset_external (plist, "raw_data.ext", 0, 2500);
  status = H5Pset_external (plist, "raw_data.ext", 4500, 1500);
- H5D_COMPACT: This is used for small datasets. The data is stored in the object header. This eliminates disk seek/read requests, and can therefore improve I/O.
- H5D_CHUNKED: This is used for data that is large. It is partitioned into chunks so each chunk is the same logical size. Chunked data is required for:
  - Extendible datasets
  - Compression and other filters
  - Improving partial I/O for big datasets
  [Figures: chunking gives better subsetting access time and extendibility; in the illustrated selection, only two chunks are written/read.]
Filters Applied To The Raw Data:
Filters provide a mechanism for manipulating data while transferring it between memory and disk. There are pre-defined filters in HDF5 for using ZLIB and SZIP compression, as well as for using the shuffling and checksum filters. The H5P interface provides calls to work with these pre-defined filters.
Users can also set up their own user-defined filters, using the H5Z and H5P interfaces. The following is a demonstration of how to do this:
Adding BZIP2 Compression to HDF5
General features of filters:
- Datasets that use filters MUST be chunked.
- They can be combined together (e.g., ZLIB + Checksum).
- They are called in the order they are defined for writing and in the reverse order for reading.
- Users are responsible for using them properly. For example, using ZLIB+SZIP+shuffle would not make sense, but it could be done.
Purpose of the pre-defined filters in HDF5:
Compression | ZLIB and SZIP compression of the raw data. |
Checksum | Includes the Fletcher32 checksum algorithm for error detection. |
Shuffling | Changes the byte order in a stream of data. It is intended to be combined with compression. |
Space Allocation For Raw Data In a File:
The H5Pset_alloc_time and H5Pget_alloc_time calls give control over the space allocation in an HDF5 file. This is important for both performance and the size of the files.
Sequential space can be allocated:
- At creation time (H5D_ALLOC_TIME_EARLY). This is the default for compact storage.
- As data is written (H5D_ALLOC_TIME_INCR). This is the default for chunked storage.
- At write time (H5D_ALLOC_TIME_LATE). This is the default for contiguous storage.
For Parallel HDF5 this property is ignored. The space is ALWAYS allocated early.
Fill Values:
A fill value can be applied to chunked, contiguous, or compact data.
Fill value properties are defined with the H5Pset(get)_fill_value and H5Pset(get)_fill_time calls. The fill times that can be specified with H5Pset(get)_fill_time are:
- H5D_FILL_TIME_IFSET: The user defined fill value is written when storage is allocated, if set.
- H5D_FILL_TIME_ALLOC: The fill value is written when the space is allocated.
- H5D_FILL_TIME_NEVER: The fill value is never written.
How and when the fill value is written depends on the storage layout, allocation time setting and access method (sequential or parallel). See the documentation on fill value behavior for more details.
Using the Default Dataset Creation Property List
To use the default Dataset Creation property list with H5Dcreate, you would specify H5P_DEFAULT (C) or H5P_DEFAULT_F (F90) for the create_plist_id parameter. For example (in C):
did = H5Dcreate(fid, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
Modifying the Dataset Creation Property List
To modify the Dataset Creation property list, follow these steps:
- Create a copy or instance of the Dataset Creation property list. For example, in C:
dcpl = H5Pcreate (H5P_DATASET_CREATE);
- Modify the property list with one of the Dataset Creation property list functions. If using Compact or Chunked datasets, then the corresponding property would need to be set with the H5Pset_layout or H5Pset_chunk call. For example (in C):
Compact Datasets:
status = H5Pset_layout (dcpl, H5D_COMPACT);
Chunked Datasets:
status = H5Pset_chunk (dcpl, rank, chunk_dims);
Then any other properties would need to be set, using the same property list identifier (dcpl, in this case).
- Call H5Dcreate, passing in the identifier of the property list that was just modified. For example:
dset_id = H5Dcreate (file_id, DATASETNAME, H5T_NATIVE_INT, dataspace, dcpl);
- Lastly, close the property list:
status = H5Pclose (dcpl);
External Files:
- Contiguous raw data can be stored in an external file using the H5Pset_external function. For example (in C):
status = H5Pset_external (dcpl, "raw_data.ext", offset, size);
Example Program:
[ C program ] - h5_crtextd.c
[ F90 program ] - dsetexternal.f90
Chunked Data:
- Extendible Datasets: A dataset can be extended by specifying a dataspace with unlimited dimensions when calling H5Screate_simple and by using the H5Dextend call. Following is an example:
[ C program ] - h5_extend.c
[ F90 program ] - chunk.f90
- Compression: A dataset can be compressed with ZLIB, SZIP or a user-defined compression. In order to use the pre-defined compression methods you must first have configured and built HDF5 with compression enabled. For example, this will configure and build HDF5 with ZLIB and SZIP compression enabled:
./configure --with-zlib=INCDIR,LIBDIR --with-szlib=INCDIR,LIBDIR
make check >& check.out
make install
Once HDF5 includes ZLIB or SZIP compression, a compressed dataset can be created with the H5Pset_deflate or H5Pset_szip call. Following are programming examples:
ZLIB Compression:
[ C program ] - h5zip.c
[ F90 program ] - dsetzlib.f90
SZIP Compression:
[ C program ] - h5szip.c
[ F90 program ] - dsetszip.f90
- The Checksum Filter (Fletcher32 Checksum Algorithm) for error detection is automatically included in HDF5. To use it you must add it to the filter pipeline with the H5Pset_filter call. See the following example program:
[ C program ] - h5cksum.c
[ F90 program ] - dsetcksum.f90
- Shuffle Filter: The shuffling filter is automatically included in HDF5. To add it to the pipeline, H5Pset_shuffle must be called, followed by the desired compression call (it will not work if you set the compression method first). See the following programming example:
[ C program ] - h5shzip.c
[ F90 program ] - dsetshuffle.f90
Data Access/Transfer Property List
The Data Access/Transfer property list, H5P_DATASET_XFER, is specified by a parameter to H5Dread and H5Dwrite. It is used to control various aspects of I/O, such as caching hints or collective I/O information:
herr_t H5Dread (hid_t dataset_id, hid_t mem_type_id, hid_t mem_space_id, hid_t file_space_id, hid_t xfer_plist_id, void *buf)
herr_t H5Dwrite (hid_t dataset_id, hid_t mem_type_id, hid_t mem_space_id, hid_t file_space_id, hid_t xfer_plist_id, const void *buf)
This property can be used to improve performance, with the following calls:
- H5Pset_buffer: Sets the size of the datatype conversion buffer during I/O. The size should be large enough to hold the slice along the slowest changing dimension. For example, if your hyperslab size is 100x200x300, then the buffer should hold 200x300 elements (the size itself is given in bytes).
- H5Pset_hyper_vector_size: Sets the number of hyperslab offset and length pairs. This can improve performance for partial I/O.
Following are other functions that can modify the Data Access/Transfer properties:
- H5Pset_edc_check: This is used for datasets created with the error detection filter enabled. It enables or disables error checking when reading data.
- H5Pset_dxpl_mpio: This sets the data transfer mode for parallel I/O to either H5FD_MPIO_INDEPENDENT (default) or H5FD_MPIO_COLLECTIVE.
Using the Default Dataset Access/Transfer Property List
To use the default Data Access/Transfer property list, specify H5P_DEFAULT for C and H5P_DEFAULT_F for F90 when calling H5Dread or H5Dwrite. Examples in C are:
status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
Modifying the Data Access/Transfer Property List
To modify the Data Access/Transfer property list, follow these steps:
- Create a copy or instance of the Data Access/Transfer property list. For example, in C:
dtpl = H5Pcreate (H5P_DATASET_XFER);
- Modify the property list by calling one of the Data Access/Transfer property list functions.
- Call H5Dread or H5Dwrite, passing in the identifier of the property list that was just modified.
- Lastly, close the property list:
status = H5Pclose (dtpl);
Following is an example of using the Data Access/Transfer property to set the maximum size for the type conversion buffer and background buffer:
plist_xfer = H5Pcreate (H5P_DATASET_XFER);
status = H5Pset_buffer (plist_xfer, (hsize_t)NX*NY*NZ, NULL, NULL);
status = H5Dread (dataset, H5T_NATIVE_UCHAR, memspace, dataspace, plist_xfer, buf);
The following example uses the Data Access/Transfer property list with the error detection filter.
[ C program ] - h5cksum.c
[ F90 program ] - dsetcksum.f90
The Parallel HDF5 tutorial covers Writing and Reading Hyperslabs which also uses the Data Access/Transfer property list.
Last modified: 21 December 2016