Peter Cao, The HDF Group
Mike Folk, The HDF Group
Nahil Sobh, NCSA
Paul Ricker, NCSA
August 2006
It is common to apply simple data-reduction operations and filters to large datasets, where the result is small relative to the original data. When the data is stored remotely, the cost of moving it to a local platform for such analysis can be very high. A great deal of time can often be saved by performing these operations on the data in place.
An application, "HDF5-on-SRB" [1], addresses this problem by enabling a client to access objects in place in HDF5 files stored remotely in an SRB. The success of the HDF5-SRB project has prompted NCSA and SDSC to take the project to a second phase: supporting selected applications in the Strategic Applications Program (SAP). FLASH [2] is one of the SAP projects selected for initial implementation.
FLASH is a modular, portable, highly scalable, adaptive-mesh simulation code for astrophysical hydrodynamics problems. FLASH Cosmology [3] is a set of FLASH modules for cosmological applications, particularly the simulation of large-scale structure.
There are two challenges in the FLASH cosmology simulations.
- First, due to their large scale, these simulations must run on a supercomputer and generate terabytes of data [4].
- Second, accessing even a small subset of the data may take hours, because the whole data file has to be brought to the local machine.
For example, the FLASH slice extractor, a command-line tool developed by Paul Ricker at NCSA, retrieves a specific 2D slice from a 3D AMR (adaptive mesh refinement) data file produced by FLASH simulations.
Figure 1 shows an example of such a slice of AMR data. This tool requires the data file to be on the local machine. In the work proposed here, we would like to extend this tool to use the HDF5-SRB model to retrieve part of a data file, such as a slice, directly from the machine where the data is generated.
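The core operation of the slice extractor can be illustrated with a small sketch. This is pure Python over nested lists, with a hypothetical `extract_slice` function standing in for the real tool, which operates on HDF5 AMR files; it only shows the idea of pulling one 2D plane out of a 3D volume.

```python
def extract_slice(volume, axis, index):
    """Extract a 2D slice from a 3D dataset (nested lists) at the
    given index along the given axis (0, 1, or 2)."""
    ni, nj, nk = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[volume[index][j][k] for k in range(nk)] for j in range(nj)]
    if axis == 1:
        return [[volume[i][index][k] for k in range(nk)] for i in range(ni)]
    return [[volume[i][j][index] for j in range(nj)] for i in range(ni)]

# A tiny 2x2x2 "volume" where volume[i][j][k] = 100*i + 10*j + k,
# so each extracted plane is easy to check by eye.
volume = [[[100 * i + 10 * j + k for k in range(2)] for j in range(2)]
          for i in range(2)]

print(extract_slice(volume, 0, 1))  # → [[100, 101], [110, 111]]
```

In the proposed extension, this selection would run on the server, so only the 2D plane, not the full 3D volume, crosses the network.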
This project will offer the following major benefits to the FLASH community:
- 1. No need for large storage on the local machine. Terabyte-scale data files reside only on the server machine.
- 2. Fast access to the data and metadata. Since only the selected data and metadata are transferred to the client, instead of the whole multi-terabyte file, network transfer time is dramatically reduced.
- 3. Data sharing among scientists. Since the data resides on the server machine, it can be shared among scientists without multiple local copies of the same file. Everyone also has access to the updated data after each new simulation run.
* The project is sponsored by the NCSA/SDSC-led Cyber-Infrastructure Partnership (CIP) and the National Laboratory for Advanced Data Research (NLADR), an NSF PACI project in support of the NCSA-SDSC collaboration.
Figure 1 -- Slice (gas temperature) and isosurface (gas density) in a 512³ particle/mesh simulation
The goal of this project is to use HDF5-on-SRB to perform simple operations from a remote client on large FLASH files stored in a remote SRB, and in so doing develop a framework that provides similar capabilities to other applications.
If successful, this will enable users to query and analyze FLASH files without incurring the latency and storage costs of fetching and storing the files locally.
The project has the following tasks:
- 1. Improve the current dataset function in the HDF5-SRB model to support selection of FLASH AMR data. The dataset function will be able to slice through a FLASH AMR mesh on the server and return a uniform mesh to the client. A command-line client utility, similar to the current FLASH slice extractor, will be implemented to support remote mesh selection.
Figure 2 explains the basic architecture of the FLASH mesh selection.
- 2. Create a package of other utilities with similar remote-invocation capabilities. This task will need to be scoped out based on the knowledge gained from the first task. Candidate utilities include the following capabilities:
- a. Query and return simple statistics, such as maximum and minimum, without bringing the whole dataset to the client.
- b. Extract and return particle information (metadata). Although one can already query basic dataset information such as dimensions, datatype, and attributes, FLASH-specific information such as particle information is not supported in the current HDF5-SRB model.
- c. Analyze a line segment inside a particular domain, interpolate points within the domain, and return the resultant values.
- d. Support other formats, such as PNG and ASCII text, instead of raw bytes as results of a data selection.
- 3. Create an extensible framework for adding tools that accommodates the FLASH utilities and can be adapted to other applications.
- 4. (Possibly) integrate utilities, as appropriate, into the visualization pipeline. Nahil Sobh and Peter Cao will work out the details of this task.
Figure 2 -- A simplified view of the HDF5-SRB-FLASH model
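The kind of server-side reduction described in task 2a can be sketched in a few lines. This is a hypothetical pure-Python stand-in for the actual HDF5-SRB server function: it scans the dataset where it lives and returns only two numbers, so the client never receives the array itself.

```python
def server_min_max(dataset):
    """Server-side reduction: scan a 2D dataset in place and return
    only its (min, max) pair instead of shipping the data."""
    lo = hi = None
    for row in dataset:
        for v in row:
            if lo is None or v < lo:
                lo = v
            if hi is None or v > hi:
                hi = v
    return lo, hi

# The client receives two numbers -- a few bytes -- rather than
# a potentially multi-terabyte dataset.
data = [[3.2, -1.5, 7.8], [0.0, 9.4, -2.6]]
print(server_min_max(data))  # → (-2.6, 9.4)
```

The same pattern (compute on the server, return a small result) applies to the particle-metadata and line-segment utilities in tasks 2b and 2c.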
This project will be divided into three phases: Phase I, Phase II, and Phase III. Phase I and part of Phase II are required; Phase III and the rest of Phase II are optional, depending on the duration and amount of funding. The time estimates assume a half-time (50%) FTE.
Phase I: a command line client utility and server function to support selection of FLASH AMR mesh data.
| Collaboration between NCSA and SDSC | Start | End |
|-------------------------------------|----------|----------|
| Exploring the current tool | 10/16/06 | 10/27/06 |
| Implementing server function | 10/30/06 | 12/22/06 |
| Implementing client utility | 12/25/06 | 2/16/07 |
| Testing with FLASH AMR mesh data | 2/19/07 | 3/16/07 |
| Deploying to evaluation server | 3/19/07 | 4/13/07 |
| Documentation and release | 4/19/07 | 5/11/07 |
Phase II: enhanced utilities for querying FLASH-specific metadata and statistics, and for performing data interpolation and simple mathematical manipulation.
| Collaboration between NCSA and SDSC | Start | End |
|-------------------------------------|----------|----------|
| Query and return simple statistics | 5/14/07 | 8/17/07 |
| Extract and return particle information | 6/25/07 | 8/17/07 |
| Analyze and interpolate line segment | 8/20/07 | 10/12/07 |
| Add support for PNG and ASCII text | 10/15/07 | 11/23/07 |
Phase III: development of a general framework that can be adapted to other applications. The milestones for this phase will be scoped based on the experience of Phases I and II.
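One way such a framework could be organized is as a registry of named server-side operations, so new utilities plug in without changing the dispatch code. The sketch below is purely illustrative; the names (`register`, `handle_request`) are hypothetical and not part of the existing HDF5-SRB API.

```python
# Registry mapping operation names to server-side handler functions.
REGISTRY = {}

def register(name):
    """Decorator: register a function as a named remote operation."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register("min_max")
def min_max(dataset):
    flat = [v for row in dataset for v in row]
    return min(flat), max(flat)

@register("mean")
def mean(dataset):
    flat = [v for row in dataset for v in row]
    return sum(flat) / len(flat)

def handle_request(op, dataset):
    """Server entry point: look up the named operation and run it in place."""
    return REGISTRY[op](dataset)

print(handle_request("min_max", [[1, 5], [3, 2]]))  # → (1, 5)
```

A new application would add its own handlers (e.g., a FLASH mesh-selection operation) without touching `handle_request`, which is the extensibility property this phase aims for.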
THINGS TO MOVE FORWARD FOR THE CURRENT HDF5-SRB PROJECT
- Improve GUI component in HDFView to open remote files (NCSA)
- Support primitive writing capabilities for HDF5 files in SRB (NCSA and SDSC)
- Write documentation and extensive testing (NCSA and SDSC)
- Release the production system and publish documentation (NCSA and SDSC)