
HDF5-SRB - Users Guide


Peter Cao, The HDF Group

Mike Wan, SDSC

September, 2005

The HDF GROUP
1901 S 1st St Ste C-2
Champaign, IL 61820-7406
Phone: 217-840-7815

SDSC
9500 Gilman Drive, MC 0505
La Jolla, CA 92093-0505
Phone: 858-534-5000



    

Table of Contents


1. Introduction

2. The HDF-SRB Model

3. Client and Server Applications


1. Introduction

This document provides information about the HDF-SRB system, and instructions on how to use it.

The HDF-SRB model is a client-server model that provides interactive and efficient access to remote HDF5 files. Like most client/server models, the HDF-SRB model is implemented as a set of client and server APIs together with a message-passing scheme. Unlike other client/server models, however, the HDF-SRB model is object-oriented: the client can access datasets, or subsets of datasets, in large files without transferring entire files to the local machine.
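To make subset access concrete, here is a minimal sketch, assuming a 2-D dataset of doubles stored in row-major order, of how a server could copy out only a requested rectangular block (a "hyperslab" in HDF5 terms) instead of shipping the whole file. The function `extract_block` and its parameters are hypothetical and are not part of the HDF-SRB or HDF5 APIs; in the HDF5 library itself, a selection like this is made with `H5Sselect_hyperslab`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy a count0 x count1 block starting at
 * (start0, start1) from a dims0 x dims1 row-major array of doubles
 * into out. Only the selected elements are copied, which is all the
 * data a server would need to send back to the client. */
size_t extract_block(const double *data, size_t dims0, size_t dims1,
                     size_t start0, size_t start1,
                     size_t count0, size_t count1,
                     double *out)
{
    /* The requested block must lie inside the dataset. */
    assert(start0 + count0 <= dims0);
    assert(start1 + count1 <= dims1);

    size_t n = 0;
    for (size_t i = 0; i < count0; i++) {
        /* First selected element of row (start0 + i). */
        const double *row = data + (start0 + i) * dims1 + start1;
        for (size_t j = 0; j < count1; j++)
            out[n++] = row[j];
    }
    return n; /* number of elements copied: count0 * count1 */
}
```

For a 3 x 4 dataset holding the values 0 to 11, requesting a 2 x 2 block at (1, 1) returns the four values 5, 6, 9, and 10, i.e. 4 of the 12 elements.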

Storing massive data presents two major challenges: managing distributed data systems and providing efficient access to complex data types. The NCSA Hierarchical Data Format (HDF) and the SDSC Storage Resource Broker (SRB) each address one of these challenges. The SRB is client-server middleware (or grid data software) that provides a uniform interface and authorization mechanism for accessing heterogeneous data resources (UNIX FS, HPSS, UniTree, DBMS, etc.) distributed across multiple hosts and diverse platforms. HDF is a file format and software library for storing all kinds of data, from simple integers and floats to complex user-defined compound data types. HDF employs a common data model with standard library APIs, providing efficient data storage and I/O access.
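A compound data type groups several named fields of possibly different types into one record, much like a C `struct`. The sketch below, assuming a hypothetical `particle` record, describes such a record with a table of field names, byte offsets, and sizes, which is the kind of information a file format must store to interpret compound data. The `particle` and `field_desc` types and `find_field` are invented for illustration; HDF5 builds the equivalent description with `H5Tcreate` and `H5Tinsert`, using the `HOFFSET` macro for offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-defined record: one particle sample. */
typedef struct {
    int    id;
    double position[3];
    float  temperature;
} particle;

/* One entry per field: name, byte offset within the record, size. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} field_desc;

static const field_desc particle_fields[] = {
    { "id",          offsetof(particle, id),          sizeof(int) },
    { "position",    offsetof(particle, position),    3 * sizeof(double) },
    { "temperature", offsetof(particle, temperature), sizeof(float) },
};

/* Look up a field by name; returns NULL if the record has no such field. */
const field_desc *find_field(const char *name)
{
    assert(name != NULL);
    size_t n = sizeof particle_fields / sizeof particle_fields[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(particle_fields[i].name, name) == 0)
            return &particle_fields[i];
    return NULL;
}
```

Using `offsetof` rather than hand-computed offsets keeps the description correct even when the compiler inserts padding between fields.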

The HDF and the SRB offer valuable and complementary data management services, but they have not previously been integrated in an effective way. Earlier work had the SRB accessing HDF data either (a) by extracting entire HDF files, or (b) by extracting byte-streams through the SRB's POSIX interface. Approach (a) fails to take advantage of HDF's ability to offer interactive and efficient access to complex collections of objects. Approach (b) has been shown to be far too low-level to perform reasonably for some data extraction operations.

In discussions between NCSA and SDSC, it was determined that a more effective approach is possible: one that uses modified HDF APIs on the server side to extract data from large files at the instruction of client-side HDF APIs, with the SRB serving as middleware to transfer data between server and client. This approach would insert the HDF library and other object-level HDF-based libraries (such as HDF-EOS) between the SRB and a data storage source (such as a file system), making it possible to extract objects rather than files or byte streams. Furthermore, because these libraries typically offer query, subsetting, sub-sampling, and other object-level operations, these services could also be made available.
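Sub-sampling is one of the object-level operations that becomes cheap when it runs next to the data: the server reads every k-th element and sends only those. The function below is a minimal sketch of that idea for a 1-D dataset of floats; `subsample` is a hypothetical name and is not part of HDF5 or SRB. HDF5 itself expresses the same kind of selection through the `stride` argument of `H5Sselect_hyperslab`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical server-side sub-sampling: copy data[start],
 * data[start + stride], data[start + 2*stride], ... into out,
 * writing at most n_out elements. Returns the number written. */
size_t subsample(const float *data, size_t n, size_t start, size_t stride,
                 float *out, size_t n_out)
{
    assert(stride > 0); /* a zero stride would never advance */
    size_t k = 0;
    for (size_t i = start; i < n && k < n_out; i += stride)
        out[k++] = data[i];
    return k;
}
```

Sub-sampling a 10-element dataset from index 1 with stride 3 yields the elements at indices 1, 4, and 7, so only 3 of the 10 values would need to cross the network.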

The HDF-SRB project was funded and sponsored by the National Laboratory for Advanced Data Research (NLADR) and the National Science Foundation (NSF) Partnerships for Advanced Computational Infrastructure (PACI) Project in Support of an NCSA-SDSC Collaboration. The first phase of the project was to provide a working prototype of the HDF-SRB system. For more information about the project, see the project report at NCSA - SRB Reports.

1.1 Overview of SRB

This section gives an overview of SRB. For details, visit the SRB website.

What is SRB

The SDSC Storage Resource Broker (SRB) is client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and accessing replicated data sets. SRB, in conjunction with the Metadata Catalog (MCAT), provides a way to access data sets and resources based on their attributes and/or logical names rather than their names or physical locations. SRB provides:

The SRB has been used to implement data grids for data sharing across multiple resources, digital libraries (to support collection-based management of distributed data), and persistent archives (to manage technology evolution). The SRB is in widespread use, supporting collections that have up to 25 million data objects.
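The key idea behind pairing SRB with the MCAT is indirection: clients name data by a logical name, and the catalog resolves that name to a storage resource and a physical path. The toy catalog below is a minimal sketch of that lookup. The entries, resource names, and the `resolve` function are invented for illustration and do not reflect the actual MCAT schema or the SRB client API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: where a logically named object really lives. */
typedef struct {
    const char *logical_name;  /* name the client uses */
    const char *resource;      /* storage resource, e.g. a file system or archive */
    const char *physical_path; /* location on that resource */
} catalog_entry;

/* Invented sample entries for illustration only. */
static const catalog_entry catalog[] = {
    { "/home/demo/climate.h5", "unix-fs-1", "/data/vault/0042/climate.h5" },
    { "/home/demo/ocean.h5",   "hpss-1",    "/archive/u17/ocean.h5" },
};

/* Resolve a logical name; returns NULL if it is not registered. */
const catalog_entry *resolve(const char *logical_name)
{
    assert(logical_name != NULL);
    size_t n = sizeof catalog / sizeof catalog[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(catalog[i].logical_name, logical_name) == 0)
            return &catalog[i];
    return NULL;
}
```

Because clients only ever see the logical name, an object can be moved or replicated to another resource by updating its catalog entry, without any change to client code.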

SRB architecture

The Storage Resource Broker (SRB) is middleware that provides distributed clients with uniform access to diverse storage resources in a heterogeneous computing environment.



Figure 1 -- A simplified view of the SRB middleware

Figure 1 gives a simplified view of the SRB architecture. The model consists of three components, the metadata catalog (MCAT) service, SRB servers, and SRB clients, connected to each other via a network.

The MCAT stores metadata associated with data sets, users, and resources managed by the SRB. The MCAT server handles requests from the SRB servers. These requests include information queries as well as instructions for metadata creation and update.

Client applications are provided with a set of APIs for sending requests to, and receiving responses from, the SRB servers. The SRB server is responsible for carrying out the tasks that satisfy client requests. These tasks include interacting with the MCAT service and performing I/O on behalf of the clients. A client uses the same common API to access every storage system managed by the SRB; the complex tasks of interacting with the various types of storage systems and OS/hardware architectures are handled by the SRB server.
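One common way to give clients a single API while the server handles the differences between storage systems is a driver table: one set of operations, with one implementation per storage type. The sketch below shows that pattern with a single `read` operation and two stub drivers. All names here (`storage_driver`, `server_read`, the stub drivers and type strings) are hypothetical and are not SRB's internal interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One read operation that every storage driver must provide.
 * Returns the number of bytes placed in buf. */
typedef size_t (*read_fn)(const char *path, char *buf, size_t len);

typedef struct {
    const char *type;  /* storage system type, e.g. "unixfs" */
    read_fn     read;
} storage_driver;

/* Stub drivers: a real server would call the storage system here.
 * Each stub just copies a tag identifying itself into buf. */
static size_t unixfs_read(const char *path, char *buf, size_t len)
{
    (void)path;
    const char msg[] = "unixfs";
    size_t n = len < sizeof msg - 1 ? len : sizeof msg - 1;
    memcpy(buf, msg, n);
    return n;
}

static size_t archive_read(const char *path, char *buf, size_t len)
{
    (void)path;
    const char msg[] = "archive";
    size_t n = len < sizeof msg - 1 ? len : sizeof msg - 1;
    memcpy(buf, msg, n);
    return n;
}

static const storage_driver drivers[] = {
    { "unixfs",  unixfs_read },
    { "archive", archive_read },
};

/* Server-side entry point: the client passes the same arguments no
 * matter which storage system holds the data; the server picks the
 * driver. Returns 0 for an unknown storage type. */
size_t server_read(const char *type, const char *path, char *buf, size_t len)
{
    assert(type != NULL && buf != NULL);
    size_t n = sizeof drivers / sizeof drivers[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(drivers[i].type, type) == 0)
            return drivers[i].read(path, buf, len);
    return 0;
}
```

Supporting a new storage system then means adding one entry to the table; the client-facing `server_read` signature does not change.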

1.2 Overview of HDF5

This section gives a brief overview of HDF5. For more details, visit the HDF website.

What is HDF?

The Hierarchical Data Format (HDF) is a file format for storing and managing scientific data. There are two basic versions of HDF: HDF4 and HDF5. HDF4 is the earlier version and HDF5 is the newer one; the two versions are incompatible. HDF4 is limited to 20,000 objects and 2 gigabytes per file. HDF5 offers improved features and performance. The HDF-SRB system supports HDF5 only.

HDF5 file structure

HDF5 files are organized in a hierarchical structure, with two primary structures: groups and datasets.

Working with groups and group members is similar in many ways to working with directories and files in UNIX. As with UNIX directories and files, objects in an HDF5 file are often described by giving their full (or absolute) path names. For example,

/ -- signifies the root group.
/foo -- signifies a member of the root group called foo.
/foo/zoo -- signifies a member of the group foo, which in turn is a member of the root group.
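Because objects are addressed by absolute paths, locating an object amounts to walking the group tree one path component at a time, much as a UNIX file system resolves directory paths. The sketch below models a tiny in-memory tree and resolves a path against it. The `h5node` structure, the fixed child capacity, and `lookup` are hypothetical simplifications, not the HDF5 API; the HDF5 library performs this resolution internally when an object is opened by name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical in-memory node: a group (with children) or a dataset. */
typedef struct h5node {
    const char    *name;
    int            is_group;
    size_t         n_children;
    struct h5node *children[MAX_CHILDREN];
} h5node;

/* Resolve an absolute path such as "/foo/zoo" starting from the root group.
 * Returns NULL if any component is missing or a non-group is traversed. */
h5node *lookup(h5node *root, const char *path)
{
    assert(root != NULL && path != NULL);
    if (path[0] != '/')
        return NULL;                 /* only absolute paths in this sketch */

    h5node *cur = root;
    const char *p = path + 1;
    while (*p != '\0') {
        const char *slash = strchr(p, '/');
        size_t len = slash ? (size_t)(slash - p) : strlen(p);
        if (len == 0) {              /* skip repeated or trailing '/' */
            p++;
            continue;
        }
        if (!cur->is_group)
            return NULL;             /* datasets have no members */
        h5node *next = NULL;
        for (size_t i = 0; i < cur->n_children; i++) {
            const char *name = cur->children[i]->name;
            if (strlen(name) == len && strncmp(name, p, len) == 0) {
                next = cur->children[i];
                break;
            }
        }
        if (next == NULL)
            return NULL;
        cur = next;
        p += len;
    }
    return cur;
}
```

With a root group containing `foo`, and `foo` containing `zoo`, resolving `/foo/zoo` returns the `zoo` node, and resolving `/` returns the root group itself.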

Any HDF5 group or dataset may have an associated attribute list. HDF5 attributes are small named datasets that are attached to primary datasets, groups, or named datatypes. Attributes can be used to describe the nature and/or the intended usage of a dataset or group.
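An attribute list can be pictured as a small set of name/value pairs attached to a group or dataset. The sketch below models that with a fixed-size list of numeric attributes and a lookup by name. The `attribute` and `attr_list` types, the capacity limits, and the restriction to a single `double` value are simplifying assumptions for illustration; real HDF5 attributes carry their own datatype and dataspace and are managed through the `H5A` functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 8
#define MAX_NAME  32

/* Hypothetical attribute: a short name and a single numeric value. */
typedef struct {
    char   name[MAX_NAME];
    double value;
} attribute;

/* Hypothetical attribute list attached to one group or dataset. */
typedef struct {
    size_t    count;
    attribute items[MAX_ATTRS];
} attr_list;

/* Add or overwrite an attribute.
 * Returns 0 on success, -1 if the list is full or the name is too long. */
int attr_set(attr_list *list, const char *name, double value)
{
    assert(list != NULL && name != NULL);
    if (strlen(name) >= MAX_NAME)
        return -1;
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->items[i].name, name) == 0) {
            list->items[i].value = value;   /* overwrite existing */
            return 0;
        }
    }
    if (list->count == MAX_ATTRS)
        return -1;
    attribute *a = &list->items[list->count++];
    strcpy(a->name, name);                  /* length checked above */
    a->value = value;
    return 0;
}

/* Look up an attribute by name; returns NULL if absent. */
const attribute *attr_get(const attr_list *list, const char *name)
{
    assert(list != NULL && name != NULL);
    for (size_t i = 0; i < list->count; i++)
        if (strcmp(list->items[i].name, name) == 0)
            return &list->items[i];
    return NULL;
}
```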


Figure 2 -- Structure of an example HDF5 file

2. The HDF-SRB Model

The HDF and SRB technologies address different aspects of data management. HDF emphasizes efficient data access and complex data operations on files, while SRB focuses on data distribution and storage. The goal of the HDF-SRB model is to bring these two technologies together to create a client/server system with efficient data access.

The HDF-SRB model is implemented on a set of standard client/server APIs and data structures. The data structures represent HDF5 objects such as groups, datasets, and attributes. Client requests and server results are encapsulated in these data structures and packed into SRB messages. A set of client/server APIs is defined and used to transfer the SRB messages between the client and server.
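To illustrate the "encapsulate, then pack into a message" step, the sketch below serializes a hypothetical dataset-read request (an operation code, the dataset path, and a 1-D start/count) into a flat byte buffer and unpacks it again. Integers are written in big-endian order so that client and server agree regardless of platform. The `read_request` layout, field widths, and function names are assumptions made for this example; they are not the actual HDF-SRB message format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATH_MAX_LEN 64

/* Hypothetical client request: read count elements from start in a 1-D dataset. */
typedef struct {
    uint32_t opcode;
    char     path[PATH_MAX_LEN];
    uint64_t start;
    uint64_t count;
} read_request;

/* Write v as n big-endian bytes. */
static void put_be(unsigned char *p, uint64_t v, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Read n big-endian bytes as an integer. */
static uint64_t get_be(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Layout: opcode (4) | path length (2) | path bytes | start (8) | count (8).
 * Returns bytes written, or 0 if buf is too small. */
size_t pack_request(const read_request *r, unsigned char *buf, size_t cap)
{
    size_t plen = strlen(r->path);
    assert(plen < PATH_MAX_LEN);
    size_t need = 4 + 2 + plen + 8 + 8;
    if (cap < need)
        return 0;
    unsigned char *p = buf;
    put_be(p, r->opcode, 4);  p += 4;
    put_be(p, plen, 2);       p += 2;
    memcpy(p, r->path, plen); p += plen;
    put_be(p, r->start, 8);   p += 8;
    put_be(p, r->count, 8);
    return need;
}

/* Inverse of pack_request; returns 0 on success, -1 on a malformed message. */
int unpack_request(const unsigned char *buf, size_t len, read_request *r)
{
    if (len < 6)
        return -1;
    size_t plen = (size_t)get_be(buf + 4, 2);
    if (plen >= PATH_MAX_LEN || len < 6 + plen + 16)
        return -1;
    r->opcode = (uint32_t)get_be(buf, 4);
    memcpy(r->path, buf + 6, plen);
    r->path[plen] = '\0';
    r->start = get_be(buf + 6 + plen, 8);
    r->count = get_be(buf + 14 + plen, 8);
    return 0;
}
```

Length-prefixing the variable-size path lets the receiver locate the fixed-size fields that follow it without scanning for a terminator.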

2.1 Design requirements

The following is a list of design requirements applied to the HDF-SRB model.