
HDF5 Tutorial: Learning the Basics
Reading from or Writing to a Subset of a Dataset

HDF5 allows you to read from or write to a portion or subset of a dataset. This is done by selecting a subset of the dataset's dataspace in the file, selecting a memory dataspace, and then using the memory and file dataspaces to read from or write to the dataset.

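There are two types of selections in HDF5: hyperslab selections, which select a regular pattern of elements or blocks of elements, and element selections, which select individual elements by their coordinates. They are specified with the H5Sselect_hyperslab (C) / h5sselect_hyperslab_f (Fortran) and H5Sselect_elements (C) / h5sselect_elements_f (Fortran) calls, respectively.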

This tutorial topic shows how to write to a simple subset of data in a dataset. See the Advanced Tutorial topics for a more complex example and an example of using element selection.
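As a rough illustration of how an element selection differs from a hyperslab, the plain-C sketch below stores one value at each coordinate in an explicit list of (row, column) pairs, which is the idea behind H5Sselect_elements. It makes no HDF5 calls, and the function name write_points_2d is hypothetical. A hyperslab, by contrast, describes a regular pattern through its offset, count, stride and block arrays.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of HDF5: models an element (point)
 * selection on a row-major 2-D array with ncols columns. The k-th value
 * in src is stored at the k-th listed (row, col) coordinate. */
void write_points_2d(int *data, size_t ncols, size_t npoints,
                     const size_t coords[][2], const int *src)
{
    for (size_t k = 0; k < npoints; k++) {
        size_t row = coords[k][0];
        size_t col = coords[k][1];
        data[row * ncols + col] = src[k];   /* row-major offset */
    }
}
```

Only the listed elements change; every other element of the array is left untouched.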

Selecting a Subset of the Dataset's Dataspace

The H5Dget_space / h5dget_space_f call obtains the dataspace of a dataset in a file.

A subset of that dataspace can be selected with H5Sselect_hyperslab (C) / h5sselect_hyperslab_f (Fortran). The offset, count, stride and block parameters of this API define the shape and size of the selection. Each is an array with one entry per dimension, so its length must equal the rank of the dataset's dataspace. The four arrays work together to define a selection, so a change to one of them may require changes to the others (for example, a larger block size may require a larger stride).

The offset or start array specifies the offset of the starting element of the specified hyperslab.

The count array determines how many blocks to select from the dataspace in each dimension. If the block size for a dimension is one then the count is the number of elements along that dimension.

The stride array allows you to sample elements along a dimension. For example, a stride of one (or NULL) selects every element along a dimension, a stride of two selects every other element, and a stride of three selects every third element.

The block array determines the size of each block of elements selected from the dataspace. A block size of one (or passing NULL for the block array) means that each block is a single element in that dimension.
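How the four parameters combine along a single dimension can be sketched in plain C. This is a model of the semantics described above, not an HDF5 call, and the function name hyperslab_indices_1d is hypothetical: block b starts at offset + b * stride and covers block consecutive indices. For a single hyperslab, the selection in two or more dimensions is every combination of the indices selected in each dimension.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of HDF5: lists the indices selected along
 * ONE dimension by offset, stride, count and block. Block b (0 <= b < count)
 * begins at offset + b * stride and spans `block` consecutive indices.
 * out must have room for count * block entries; returns the number written. */
size_t hyperslab_indices_1d(size_t offset, size_t stride, size_t count,
                            size_t block, size_t *out)
{
    size_t n = 0;
    for (size_t b = 0; b < count; b++) {
        size_t start = offset + b * stride;   /* first index of block b */
        for (size_t e = 0; e < block; e++)
            out[n++] = start + e;
    }
    return n;
}
```

With offset 2, stride 1, count 4 and block 1 this yields indices 2 through 5; with offset 0, stride 3, count 3 and block 1 it yields 0, 3 and 6, that is, every third element.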

Selecting a Memory Dataspace

You must select a memory dataspace in addition to a file dataspace before you can read a subset from or write a subset to a dataset. A memory dataspace can be specified by calling H5Screate_simple (C) / h5screate_simple_f (Fortran).

The selection in the memory dataspace passed to the read or write call must contain the same number of elements as the selection in the file dataspace. The number of elements in a dataspace selection can be determined with the H5Sget_select_npoints (C) / h5sget_select_npoints_f (Fortran) API.
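For a single hyperslab whose blocks do not overlap, the element count can be worked out by hand: each dimension contributes count * block indices, and the total is the product over all dimensions. The plain-C sketch below models that arithmetic; it does not call H5Sget_select_npoints, and the name hyperslab_npoints is hypothetical. A whole memory dataspace behaves like a selection with count equal to its dimensions and a block of one.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of HDF5: number of elements in one
 * hyperslab selection, assuming its blocks do not overlap. A NULL block
 * array is treated as all ones, matching the meaning of NULL for the block
 * argument of H5Sselect_hyperslab. unsigned long long stands in for hsize_t. */
unsigned long long hyperslab_npoints(int rank,
                                     const unsigned long long *count,
                                     const unsigned long long *block)
{
    unsigned long long n = 1;
    for (int d = 0; d < rank; d++)
        n *= count[d] * (block ? block[d] : 1ULL);
    return n;
}
```

For the mismatch in Example 2 below, a 3 x 4 file selection holds 12 elements while a 4 x 4 memory dataspace holds 16, so the counts differ and the write cannot proceed.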

Reading From or Writing To a Dataset Subset

To read from or write to a dataset subset, the H5Dread (C) / h5dread_f (Fortran) and H5Dwrite (C) / h5dwrite_f (Fortran) routines are used. The memory and file dataspace identifiers from the selections that were made are passed into the read or write call. For example (C):

    status = H5Dwrite (.., .., memspace_id, dataspace_id, .., ..);

Programming Example


This example (for C and C++) creates an 8 x 10 integer dataset in an HDF5 file. It then selects a 3 x 4 subset of that dataset, starting at offset 1 x 2, and writes to it. (If using Fortran, the dimensions are swapped: the dataset is 10 x 8, the subset is 4 x 3, and the offset is 2 x 1.)
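The effect of that subset write on the data can be modelled in plain C without the HDF5 library. The sketch below assumes, as HDF5 does when pairing selections, that the selected file elements are visited in row-major order and filled one by one from a contiguous memory buffer. The function name write_hyperslab_2d is hypothetical, and the model assumes stride >= block in each dimension so that blocks do not overlap.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of HDF5: copies a contiguous buffer src into
 * the hyperslab (offset, stride, count, block) of a row-major 2-D int array
 * with ncols columns. Selected elements are visited in row-major order, and
 * src is consumed in the same order. Assumes stride >= block per dimension. */
void write_hyperslab_2d(int *data, size_t ncols,
                        const size_t offset[2], const size_t stride[2],
                        const size_t count[2], const size_t block[2],
                        const int *src)
{
    size_t k = 0;                                   /* next value in src */
    for (size_t b0 = 0; b0 < count[0]; b0++)
        for (size_t e0 = 0; e0 < block[0]; e0++) {
            size_t row = offset[0] + b0 * stride[0] + e0;
            for (size_t b1 = 0; b1 < count[1]; b1++)
                for (size_t e1 = 0; e1 < block[1]; e1++) {
                    size_t col = offset[1] + b1 * stride[1] + e1;
                    data[row * ncols + col] = src[k++];
                }
        }
}
```

With offset 1 x 2, stride 1 x 1, count 3 x 4 and block 1 x 1 on an 8 x 10 array, rows 1 to 3 and columns 2 to 5 are overwritten and everything else is unchanged.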

PLEASE NOTE that the examples and images below were created using C.

The following image shows the dataset that is written originally and the subset of data that is modified afterwards. Dimension 0 is vertical and Dimension 1 is horizontal, as shown below:

The subset on the right above is created using these values for offset, count, stride, and block: offset = 1 x 2, count = 3 x 4, stride = 1 x 1, block = 1 x 1.

To obtain the example, download:

See HDF5 Introductory Examples for the examples used in the Learning the Basics tutorial. There are examples for several other languages.

For details on compiling an HDF5 application, see the Learning the Basics topic on compiling HDF5 applications.

Experiments with Different Selections

The following experiments modify the example code to show how different selections behave.

Example 1:

By default, the example code selects and writes to a 3 x 4 subset. You can select a different subset by modifying the count parameter, that is, by changing the value of DIM0_SUB (C, C++) / dim0_sub (Fortran) near the top of the example code. Change its value to 7 to create a 7 x 4 subset:

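If you were to change the subset to 8 x 4, the selection would extend beyond the extent of dimension 0: starting at offset 1, eight rows would require rows 1 through 8, but the dataset only has rows 0 through 7: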

The write will fail with the error: "file selection+offset not within extent"
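The kind of bounds check that rejects the 8 x 4 selection can be sketched in plain C: in each dimension the last selected index is offset + (count - 1) * stride + block - 1, and it must be smaller than the dimension size. This is a model of the rule, not HDF5's own code, and the name hyperslab_within_extent is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of HDF5: returns true if every selected
 * index of a single hyperslab lies inside the dataspace dimensions dims[].
 * Assumes count >= 1 and block >= 1 in every dimension. */
bool hyperslab_within_extent(int rank, const size_t *dims,
                             const size_t *offset, const size_t *stride,
                             const size_t *count, const size_t *block)
{
    for (int d = 0; d < rank; d++) {
        /* last index touched in dimension d */
        size_t last = offset[d] + (count[d] - 1) * stride[d] + block[d] - 1;
        if (last >= dims[d])
            return false;
    }
    return true;
}
```

For the 8 x 10 dataset with offset 1 x 2, a count of 7 x 4 ends at row 7 and fits, while a count of 8 x 4 would end at row 8 and fails.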

Example 2:

In the example code provided, the memory and file dataspaces passed to the H5Dwrite call have the same size, 3 x 4 (DIM0_SUB x DIM1_SUB). Change the size of the memory dataspace to 4 x 4 so that the sizes no longer match, then compile and run the program:

    dimsm[0] = DIM0_SUB + 1;
    dimsm[1] = DIM1_SUB;
    memspace_id = H5Screate_simple (RANK, dimsm, NULL); 

The H5Dwrite call will fail at run time with the error: "src and dest data spaces have different sizes"

To see how many elements are selected in the memory and file dataspaces specified above, add these lines:

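    hssize_t    size;   /* declare with the other local variables */

    /* Insert just before the H5Dwrite call.
       hssize_t is a 64-bit signed type, so print it via long long. */
    size = H5Sget_select_npoints (memspace_id);
    printf ("\nmemspace_id size: %lld\n", (long long) size);
    size = H5Sget_select_npoints (dataspace_id);
    printf ("dataspace_id size: %lld\n", (long long) size);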

You should see these lines followed by the error:

    memspace_id size: 16
    dataspace_id size: 12

Example 3:

This example shows the selection that results from changing the values of the offset, count, stride and block parameters in the example code.

This selects two blocks: the count array specifies the number of blocks, and the block array specifies the size of each block. The stride must be increased to accommodate the block size, so that consecutive blocks do not overlap.
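Why the stride has to grow with the block size can be checked with a little arithmetic: along one dimension, consecutive blocks start stride indices apart and each covers block indices, so two or more blocks overlap whenever stride < block. The plain-C sketch below models this; the names blocks_overlap and selection_span are hypothetical, and the sample values (count 2, block 3, stride 1 or 4) are illustrative only, not the values from the original figure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of HDF5: in one dimension, blocks start
 * `stride` indices apart and each spans `block` indices, so they overlap
 * when more than one block is selected and stride < block. */
bool blocks_overlap(size_t stride, size_t count, size_t block)
{
    return count > 1 && stride < block;
}

/* Hypothetical helper: number of indices from the first selected index to
 * the last, inclusive, in one dimension (assumes count >= 1). */
size_t selection_span(size_t stride, size_t count, size_t block)
{
    return (count - 1) * stride + block;
}
```

Keeping the default stride of 1 with two blocks of 3 would overlap, while a stride of 4 separates them; the span of 7 indices then fits in dimension 0 of the 8 x 10 dataset when the offset is 1, since 1 + 7 <= 8.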

Now try modifying the count as shown below. The write will fail because the selection goes beyond the extent of the dimension:

If the offset were 1 x 1 (instead of 1 x 2), the selection would succeed:

The selections above were tested with the h5_subsetbk.c example code. The memory dataspace was defined as one-dimensional.


Last modified: 21 December 2016