HDF5 User's Guide: Attributes

Chapter 7
HDF5 Attributes

1. Introduction

An HDF5 attribute is a small metadata object describing the nature and/or intended usage of a primary data object, which may be a dataset, group, or named datatype.

Attributes are assumed to be very small as data objects go, so storing them as standard HDF5 datasets would be quite inefficient. HDF5 attributes are therefore managed through a special attributes interface, H5A, which is designed to attach attributes easily to primary data objects as small datasets containing metadata and to minimize storage requirements.

Consider, as an example of the simplest case, a set of laboratory readings taken under known temperature and pressure conditions of 18.0 degrees Celsius and 0.5 atmospheres, respectively. The temperature and pressure could be stored as attributes of the dataset and described as the following name/value pairs:

   temp=18.0
   pressure=0.5

While HDF5 attributes are not standard HDF5 datasets, they have much in common:

An attribute has a user-defined dataspace and the included metadata has a user-assigned datatype.
That metadata can be of any valid HDF5 datatype.
Attributes are addressed by name.

Note

Attributes are small datasets but not separate objects; they are contained within the object header of a primary data object. As such, attributes are opened, read, or written only with H5A functions.

But there are some very important differences:

There is no provision for special storage, such as compression or chunking.
There is no partial I/O or subsetting capability for attribute data.
Attributes cannot be shared.
Being small, an attribute is stored in the object header of the object it describes and is thus attached directly to that object.

Large attributes, described below in “Special Issues”, are best stored as separate HDF5 datasets and are not subject to the above limitations.

This chapter discusses or lists the following:

The HDF5 attributes programming model
H5A function summaries
Working with HDF5 attributes
- The structure of an attribute
- Creating, writing, and reading attributes
- Accessing attributes by name or index
- Obtaining information regarding an object's attributes
- Iterating across an object's attributes
- Deleting an attribute
- Closing attributes
Special issues regarding attributes

In the following discussions, attributes are generally attached to datasets. Attributes attached to other primary data objects, i.e., groups or named datatypes, are handled in exactly the same manner.

2. Programming Model

Figure 2: UML model for an HDF5 attribute and its associated dataspace and datatype.

2.1 To create and write a new attribute

Creating an attribute is similar to creating a dataset. To create an attribute, the application must specify the object to which the attribute is attached, the datatype and dataspace of the attribute data, and the attribute creation property list.

The following steps are required to create and write an HDF5 attribute:

Obtain the object identifier for the attribute's primary data object.
Define the characteristics of the attribute and specify the attribute creation property list.
- Define the datatype.
- Define the dataspace.
- Specify the attribute creation property list.
Create the attribute.
Write the attribute data (optional).
Close the attribute (and datatype, dataspace, and attribute creation property list, if necessary).
Close the primary data object (if appropriate).

2.2 To open and read/write an existing attribute

The following steps are required to open and read/write an existing attribute. Since HDF5 attributes allow no partial I/O, you need specify only the attribute and the attribute's memory datatype to read it:

Obtain the object identifier for the attribute's primary data object.
Obtain the attribute's name or index.
Open the attribute.
- Get attribute dataspace and datatype (optional).
Specify the attribute's memory type.
Read and/or write the attribute data.
Close the attribute.
Close the primary data object (if appropriate).

3. Attribute (H5A) Function Summaries

C Function            F90 Function            Purpose
`H5Acreate`           `h5acreate_f`           Creates a dataset as an attribute of another group, dataset, or named datatype.
`H5Awrite`            `h5awrite_f`            Writes an attribute.
`H5Aread`             `h5aread_f`             Reads an attribute.
`H5Aopen_name`        `h5aopen_name_f`        Opens an attribute specified by its name.
`H5Aopen_idx`         `h5aopen_idx_f`         Opens the attribute specified by its index.
`H5Aclose`            `h5aclose_f`            Closes the specified attribute.
`H5Aiterate`          (none)                  Calls a user’s function for each attribute attached to a data object.
`H5Adelete`           `h5adelete_f`           Deletes an attribute.
`H5Aget_name`         `h5aget_name_f`         Gets an attribute name.
`H5Aget_space`        `h5aget_space_f`        Gets a copy of the dataspace for an attribute.
`H5Aget_type`         `h5aget_type_f`         Gets an attribute datatype.
`H5Aget_num_attrs`    `h5aget_num_attrs_f`    Determines the number of attributes attached to a data object.

4. Working with Attributes

4.1 The structure of an attribute

An attribute has two parts:

name
value(s)

HDF5 attributes are sometimes discussed as name/value pairs in the form name=value.

An attribute's name is a null-terminated ASCII character string. Each attribute attached to an object has a unique name.

The value portion of the attribute contains one or more data elements of the same datatype.

HDF5 attributes have all the characteristics of HDF5 datasets except that there is no partial I/O capability; attributes can be written and read only in full, with no subsetting.

4.2 Creating, writing, and reading attributes

If attributes are used at all in an HDF5 file, the functions H5Acreate, H5Awrite, and H5Aread will be employed. H5Acreate and H5Awrite are used together to place the attribute in the file. If an attribute is to be used and it is not currently in memory, H5Aread generally comes into play, usually in concert with one each of the H5Aget_* and H5Aopen_* functions.

To create an attribute, call H5Acreate:

   hid_t H5Acreate (hid_t loc_id, const char *name, hid_t type_id, hid_t space_id, hid_t create_plist)

loc_id identifies the object to which the attribute is to be attached: a dataset, group, or named datatype. This object, incidentally, is known as the primary data object; the attribute is a metadata object. name, type_id, space_id, and create_plist convey, respectively, the attribute's name, datatype, dataspace, and attribute creation property list. The attribute's name must be locally unique, i.e., it must be unique within the context of the object to which it is attached.

H5Acreate creates the attribute in memory; the attribute does not exist in the file until H5Awrite writes it there.

(Note that the attribute property list is currently unused. The only accepted value for create_plist is H5P_DEFAULT.)

To write or read an attribute, call H5Awrite or H5Aread, respectively:

   herr_t H5Awrite (hid_t attr_id, hid_t mem_type_id, const void *buf)
   herr_t H5Aread (hid_t attr_id, hid_t mem_type_id, void *buf)

attr_id identifies the attribute while mem_type_id identifies the in-memory datatype of the attribute data.

H5Awrite writes the attribute data, i.e., the metadata, from the buffer buf to the file; H5Aread reads attribute data from the file into buf.

The HDF5 library converts the metadata between the in-memory datatype, mem_type_id, and the in-file datatype, defined when the attribute was created, without user intervention.

4.3 Accessing attributes by name or index

When accessing attributes, they can be identified by name or by an index value. The use of an index value makes it possible to iterate through all of the attributes associated with a given object.

To access an attribute by its name, call H5Aopen_name:

   hid_t H5Aopen_name (hid_t loc_id, const char *name)

H5Aopen_name returns an attribute identifier that can then be used by any function that must access an attribute, such as H5Aread.

Use the function H5Aget_name, described below, to determine an attribute's name.

To access an attribute by its index value, call H5Aopen_idx:

   hid_t H5Aopen_idx (hid_t loc_id, unsigned index)

To determine an attribute index value when it is not already known, you must first use the function H5Aget_num_attrs, described below, to determine the number of attributes attached to the primary object. The index values of the attributes attached to that object range from 0 through 1 less than the value returned by H5Aget_num_attrs.

H5Aopen_idx is generally used in the course of opening several attributes for later access. Use H5Aiterate, described below, if the intent is to perform the same operation on every attribute attached to an object.

4.4 Obtaining information regarding an object's attributes

In the course of working with HDF5 attributes, one may need to obtain any of several pieces of information:

An attribute name
The dataspace of an attribute
The datatype of an attribute
The number of attributes attached to an object

To obtain an attribute's name, call H5Aget_name with an attribute identifier, attr_id:

   ssize_t H5Aget_name (hid_t attr_id, size_t buf_size, char *buf)

As with other attribute functions, attr_id identifies the attribute. buf is the buffer to which the attribute's name will be read; buf_size defines the size of that buffer.

If the length of the attribute name, and hence the value required for buf_size, is unknown, a first call to H5Aget_name will return that size. If the value of buf_size used in that first call is too small, the name will simply be truncated in buf. A second H5Aget_name call can then be used to retrieve the name in an appropriately-sized buffer.

To determine the dataspace or datatype of an attribute, call H5Aget_space or H5Aget_type, respectively:

   hid_t H5Aget_space (hid_t attr_id)
   hid_t H5Aget_type (hid_t attr_id)

H5Aget_space returns the dataspace identifier for the attribute attr_id.

H5Aget_type returns the datatype identifier for the attribute attr_id.

To determine the number of attributes attached to an object, call H5Aget_num_attrs:

   int H5Aget_num_attrs (hid_t loc_id)

H5Aget_num_attrs returns the number of attributes attached to the object identified by the object identifier loc_id.

A call to H5Aget_num_attrs is generally the preferred first step in determining attribute index values. If the call to H5Aget_num_attrs returns N, the attributes attached to the object loc_id have index values of 0 through N-1.

4.5 Iterating across an object's attributes

It is sometimes useful to be able to perform the identical operation across all of the attributes attached to an object. At the simplest level, you might just want to open each attribute; at a higher level, you might wish to perform a rather complex operation on each attribute as you iterate across the set.

To iterate an operation across the attributes attached to an object, one must make a series of calls to H5Aiterate:

   herr_t H5Aiterate (hid_t loc_id, unsigned *index, H5A_operator_t op_func, void *op_data)

H5Aiterate successively marches across all of the attributes attached to the object specified in loc_id, performing the operation(s) specified in op_func with the data specified in op_data on each attribute.

When H5Aiterate is called, index contains the index of the attribute to be accessed in this call; when H5Aiterate returns, index will contain the index of the next attribute. If index is passed as the null pointer, then all attributes are processed.

op_func is a user-defined operation that adheres to the H5A_operator_t prototype. This prototype and certain requirements imposed on the operator's behavior are described in the H5Aiterate entry in the HDF5 Reference Manual.

op_data is also user-defined to meet the requirements of op_func. Beyond providing a parameter with which to pass this data, HDF5 provides no tools for its management and imposes no restrictions.

4.6 Deleting an attribute

Once an attribute has outlived its usefulness or, for whatever reason, is no longer appropriate, it may become necessary to delete it.

To delete an attribute, call H5Adelete:

   herr_t H5Adelete (hid_t loc_id, const char *name)

H5Adelete removes the attribute specified by name from the group, dataset, or named datatype specified in loc_id.

H5Adelete must not be called if there are any open attribute identifiers on the object loc_id. Such a call can cause the internal attribute indexes to change; future writes to an open attribute would then produce unintended results.

4.7 Closing an attribute

As is the case with all HDF5 objects, once access to an attribute is no longer needed, that attribute must be closed. It is best practice to close it as soon as practicable; it is mandatory that it be closed prior to the H5close call closing the HDF5 library.

To close an attribute, call H5Aclose:

   herr_t H5Aclose (hid_t attr_id)

H5Aclose closes the specified attribute by terminating access to its identifier, attr_id.

Further use of attr_id is illegal; any function employing it will fail.

5. Special Issues

Figure 3: A large or shared HDF5 attribute and its associated dataset(s). DatasetA is an attribute of Dataset1 that may have been too large to store as an attribute. It is associated with Dataset1 by means of an object reference pointer attached as an attribute to Dataset1. Such an attribute can be shared among multiple datasets by means of additional object reference pointers attached to additional datasets.

Large attributes

Attributes are intended to be small objects. A large dataset intended as metadata for another dataset can be stored as a supplemental dataset. An attribute would then be attached to the original dataset indicating the relationship as an object reference pointer. This approach is illustrated in Figure 3.

How small is small and how large is large are not defined by the library; it is left to the user's interpretation. (In considering attributes and size, the HDF5 development team has considered attributes to be up to 16K, but this has never been set as a design or implementation limit.)

Shared attributes

Attributes written and managed through the H5A interface cannot be shared. If shared attributes are required, they must be handled in the manner described above for large attributes and illustrated in Figure 3.

Attribute names

While an attribute name may include any valid ASCII characters, including blanks, it is generally wise to keep readability issues in mind. In C, the name must be terminated with a null character, \0.

No special I/O or storage

HDF5 attributes have all the characteristics of HDF5 datasets except the following:

Attributes are written and read only in full; there is no provision for partial I/O or subsetting.
No special storage capability is provided for attributes; there is no compression or chunking and attributes are not extendable.
