
3.2 The SD Scientific Data Set Data Model

In HDF, any multi-dimensional array qualifies as a scientific data set or SDS if it's associated with a dimension record and a data type. In addition to providing a framework for storing arrays of arbitrary dimensions and data type, the SDS data model supports dimension scales, user-defined attributes and predefined attributes. (See Figure 3a.)

FIGURE 3a The Contents of a Three-Dimensional SD Scientific Data Set

Scientific data sets consist of required and optional objects. The required objects will be covered first. Please note that in this chapter the terms "SDS" and "data set" are used interchangeably.

3.2.1 Required SD SDS Objects

Every SD scientific data set must contain three objects: the SDS array, the data type and the dimension records. Required objects are created automatically from the information provided when the SDS is defined.

3.2.1.1 SDS Array

An SDS array is an n-dimensional data structure that serves as the basic building block of an SDS. When an SDS array is created, the number and size of the dimensions that define its shape are specified, as is its data type. SDS arrays are conceptually equivalent to variables in the netCDF data model.
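As a rough illustration of how rank and dimension sizes determine the shape of an array, the following C sketch models an SDS array's shape and computes its element count and storage size. It is not the HDF library's own representation; the names sds_shape_model, MODEL_MAX_RANK, sds_element_count and sds_byte_size are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on rank, chosen only for this sketch. */
#define MODEL_MAX_RANK 32

/* Illustrative model of an array's shape; not an HDF library type. */
typedef struct {
    int32_t rank;                      /* number of dimensions */
    int32_t dim_sizes[MODEL_MAX_RANK]; /* size of each dimension */
    size_t  elem_bytes;                /* bytes per element, e.g. 4 for a 32-bit float */
} sds_shape_model;

/* Number of elements: the product of all dimension sizes. */
size_t sds_element_count(const sds_shape_model *s)
{
    assert(s->rank >= 1 && s->rank <= MODEL_MAX_RANK);
    size_t n = 1;
    for (int32_t i = 0; i < s->rank; i++) {
        assert(s->dim_sizes[i] > 0); /* every size is a positive integer */
        n *= (size_t)s->dim_sizes[i];
    }
    return n;
}

/* Storage needed for the raw array data, ignoring any file overhead. */
size_t sds_byte_size(const sds_shape_model *s)
{
    return sds_element_count(s) * s->elem_bytes;
}
```

Under this model, a three-dimensional array of 4 x 5 x 6 elements holding 32-bit floating-point numbers has 120 elements and needs 480 bytes for its data.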

3.2.1.2 SDS Array Name

An SDS array has a name consisting of a sequence of case-sensitive alphanumeric characters. The calling program can assign the name; if it does not, the HDF library assigns one. Names are assigned when the data set is created and cannot be changed. SDS names do not have to be unique within a file, but duplicate names make it difficult to distinguish between the scientific data sets in the file.

3.2.1.3 Data Type

The standard data types supported by the SD API are 32- and 64-bit floating-point numbers, 8-, 16- and 32-bit signed integers and 8-, 16- and 32-bit unsigned integers. The SD interface also includes a routine that allows SD data sets with variable bit lengths (1 to 32 bits) to be created.

Before writing an SDS to a file, HDF normally converts its elements from the native format of the host machine to a standard HDF format. The standard representations used by HDF for floating-point numbers are the IEEE 32- and 64-bit floating-point formats. For integers, HDF uses big-endian byte ordering. For signed integers, HDF uses two's-complement representation. Converting to and from the standard formats can result in low-order inaccuracies in the data. For example, data converted from 64-bit to 32-bit floating-point representation is accurate only to a relative error of about 10^-7.
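The two ideas in the preceding paragraph, a fixed byte order for integers and the precision lost when narrowing floating-point values, can be sketched in plain C. This is an illustration only, not the conversion code used by the HDF library; put_be32 and narrowing_relative_error are hypothetical helper names, and the sketch assumes the host uses IEEE floating-point arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian order (most significant byte first).
   Using shifts makes the result independent of the host's byte order. */
void put_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Relative error introduced by narrowing a 64-bit value to 32 bits. */
double narrowing_relative_error(double x)
{
    assert(x != 0.0);
    float narrowed = (float)x;            /* rounds to 32-bit precision */
    double diff = (double)narrowed - x;
    if (diff < 0.0) diff = -diff;
    double mag = (x < 0.0) ? -x : x;
    return diff / mag;
}
```

Converting a signed integer to uint32_t before calling put_be32 yields its two's-complement bit pattern, so -2 is stored as the bytes FF FF FF FE. For values within the normal range of a 32-bit float, the relative error returned by narrowing_relative_error stays below 10^-7, consistent with the accuracy figure quoted above.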

Sometimes users prefer not to have their data automatically converted, either because the conversion slows down processing or because it introduces intolerable inaccuracies. For those instances, HDF provides a "native format" option, whereby numbers are stored "as is" in the file and are tagged accordingly. HDF also provides a "little-endian" option to suppress any rearranging of byte ordering from little- to big-endian. This is primarily for users of Intel-based machines who do not want to incur the cost of reordering data when writing to an HDF file. Because HDF does not support direct conversion between many machine architectures, using a native format can diminish the portability of HDF files. However, note that direct conversions are supported between little-endian and all other byte-order formats supported by HDF.
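To make the cost of reordering concrete, the following C sketch checks the host's byte order and stores a value "as is", which is the idea behind the native and little-endian options. It does not call the HDF library; host_is_little_endian and store_native32 are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Copy a 32-bit value byte for byte, with no reordering. */
void store_native32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}
```

On a little-endian host, the bytes produced by store_native32 are in the reverse of big-endian order, so writing in the standard format requires a swap for every element, while storing them unchanged avoids that work at the price of a file whose contents depend on the byte order in which it was written.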

3.2.1.4 Dimensions

SDS dimensions specify the shape and size of an SDS array. The number of dimensions of an array is known as the rank of the array, and the size of each dimension is a positive integer. Dimension names are not treated in the same way as array names: if a name assigned to a dimension was previously assigned to another dimension, the SD interface treats both dimensions as the same data object, and any changes made to one will be reflected in the other.
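The sharing of dimensions by name can be pictured as a lookup table keyed on the dimension name. The C sketch below is a simplified model of that behavior, not the SD interface's implementation; dim_record, dim_table, dim_lookup_or_create and the units field are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limits, chosen only for this sketch. */
#define MODEL_MAX_DIMS 16
#define MODEL_NAME_LEN 64

/* Illustrative dimension record; not an HDF library type. */
typedef struct {
    char name[MODEL_NAME_LEN];
    int  size;
    char units[MODEL_NAME_LEN]; /* example of information attached to a dimension */
} dim_record;

typedef struct {
    dim_record dims[MODEL_MAX_DIMS];
    int        count;
} dim_table;

/* Return the index of the record called `name`, creating it if it does
   not exist yet. An existing record is reused as is; `size` is only
   applied when a new record is created. Returns -1 if the table is full. */
int dim_lookup_or_create(dim_table *t, const char *name, int size)
{
    assert(size > 0);
    assert(strlen(name) < MODEL_NAME_LEN);
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->dims[i].name, name) == 0) {
            return i; /* same name, same data object */
        }
    }
    if (t->count == MODEL_MAX_DIMS) {
        return -1;
    }
    dim_record *d = &t->dims[t->count];
    strcpy(d->name, name);
    d->size = size;
    d->units[0] = '\0';
    return t->count++;
}
```

In this model, two arrays that both name a dimension "time" receive the same record index, so setting the units through one array's dimension is visible through the other's.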

Also, one dimension of an SDS array can be assigned the predefined size SD_UNLIMITED. Such a dimension is referred to as an unlimited dimension and, as the name suggests, can grow to any length.

3.2.2 Optional SD SDS Objects

There are two types of optional objects available for inclusion in an SDS: dimension scales and attributes. Attributes are either predefined or defined by the user. Optional objects are only created when specifically requested by the calling program.

Dimension Scales

A dimension scale is a sequence of numbers placed along a dimension to demarcate intervals along it. Dimension scales are described in Section 3.9 on page 71.

User-Defined Attributes

Attributes are alphanumeric strings describing the nature and/or intended usage of the file, SDS or dimension they are attached to. User-defined attributes are those defined by the calling program to hold auxiliary information about a file, SDS array or dimension. They are described more fully in Section 3.10 on page 80.

Predefined Attributes

Predefined attributes are attributes that have reserved labels and in some cases predefined data types. Predefined attributes are useful because they establish conventions that applications can depend on. They are further described in Section 3.11 on page 87.

3.2.3 Annotations and the SD Data Model

With the expansion of the SD interface to include user-defined attributes, annotations should no longer be used in conjunction with scientific data sets. Metadata once stored as an annotation is now more conveniently stored as an attribute. However, to ensure backward compatibility with scientific data sets and applications that rely on annotations, the DFAN annotation API, described in Chapter 10, "Annotations (DFAN API)", can still be used to annotate SDSs. There is no cross-compatibility between attributes and annotations; creating one does not automatically create the other.




hdfhelp@ncsa.uiuc.edu
HDF User's Guide - 06/04/97, NCSA HDF Development Group.