An Overview of the Scientific Data Browser (SDB)
What is the SDB? An Overview
NCSA’s Scientific Data Browser (SDB) provides easy, fast access to a wide range of scientific data via the Web, facilitating data sharing between science teams and the general public. Science teams can download data files from a large data provider, augment this "mini-archive" with their own datasets, and make the whole heterogeneous data collection accessible via the Web. The SDB is a Common Gateway Interface (CGI) program that serves scientific data in a variety of scientific file formats. Since scientific datasets are typically large, the SDB provides a means for users to identify the files of interest to them before downloading an entire dataset. Users can interactively browse the contents of a file, examine a subset of the data, view a thumbnail image of the data, and download the desired subset. The SDB can be coupled with services for extracting, indexing, and searching metadata to create a complete repository service. One such system, the Hughes/NCSA DIAL service, permits users to query a catalog, locate the files of interest, and then browse and retrieve those files.
The DIAL system is a low-cost, small-scale, Web-based client/server package that provides access to collections of science data. The initial implementation is tailored toward Earth science data, specifically toward handling NASA's Earth Observing System (EOS) data. The DIAL repository service consists of tools for extracting and ingesting metadata into a SQL database, a Java JDBC [9] query interface to the SQL database, a Java-based and an HTML-based client graphical user interface for use with a Web browser, and the scientific data server. DIAL's SDB is a full-featured scientific data server: it returns subsets and subsamples of the data and enables users to browse, examine, select, and view thumbnail images of the data in the file. The user may also retrieve the file, or a desired subset of the file, in its original file format or in ASCII. For example, the SDB displays a Hierarchical Data Format (HDF) [10] file by showing the metadata describing the file's global attributes, multidimensional arrays, and table data.
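The catalog step described above (ingest metadata into a SQL database, then query it to locate files of interest) can be illustrated with a minimal sketch. It is written in Python with the standard sqlite3 module purely for brevity; DIAL itself uses a Java JDBC interface, and the table name, column names, and sample records below are hypothetical and do not come from DIAL.

```python
import sqlite3

# Hypothetical catalog schema: one row per data file, with a geographic
# bounding box and an acquisition date. These names are assumptions made
# for illustration only.
SCHEMA = """
CREATE TABLE granules (
    path     TEXT PRIMARY KEY,
    format   TEXT,
    lat_min  REAL, lat_max REAL,
    lon_min  REAL, lon_max REAL,
    acquired TEXT
)
"""

def build_catalog(records):
    """Ingest metadata records into an in-memory SQL catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO granules VALUES (?, ?, ?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn

def find_files(conn, lat, lon, start, end):
    """Return paths of files whose bounding box contains (lat, lon) and
    whose acquisition date (ISO 8601 text) falls within [start, end]."""
    rows = conn.execute(
        """SELECT path FROM granules
           WHERE lat_min <= ? AND lat_max >= ?
             AND lon_min <= ? AND lon_max >= ?
             AND acquired BETWEEN ? AND ?
           ORDER BY path""",
        (lat, lat, lon, lon, start, end),
    )
    return [row[0] for row in rows]

# Made-up sample metadata records.
records = [
    ("a.hdf", "HDF", 30.0, 50.0, -100.0, -80.0, "1996-06-01"),
    ("b.hdf", "HDF", -10.0, 10.0, 20.0, 40.0, "1996-06-15"),
    ("c.nc", "netCDF", 35.0, 45.0, -95.0, -85.0, "1997-01-10"),
]
catalog = build_catalog(records)
matches = find_files(catalog, 40.0, -90.0, "1996-01-01", "1996-12-31")
```

In a complete repository service, the paths returned by such a query would then be handed to the data server for browsing and retrieval.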
The delivery of subsets of data is another desirable capability of a scientific data server. Generally, users need to locate a particular part or parts of a complex dataset, rather than the entire dataset. For instance, the user may need data from a small region within a large geographic area, perhaps for selected instruments or time periods. It is wasteful and impractical to transfer many multi-megabyte datasets over the network in order to select part of the data from a few of the files.
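Subsetting and subsampling of a gridded dataset can be sketched as follows. A start, count, and stride triple is a common way for array libraries to describe such a selection; the function name hyperslab and the sample grid are assumptions made for this illustration, not part of the SDB.

```python
def hyperslab(grid, start, count, stride=(1, 1)):
    """Return a rectangular subset of a 2-D grid (a list of equal-length rows).

    start  -- (row, col) of the first element to keep
    count  -- (rows, cols) number of elements to keep along each axis
    stride -- (row_step, col_step); steps greater than 1 subsample the data
    """
    r0, c0 = start
    nr, nc = count
    rs, cs = stride
    if rs < 1 or cs < 1 or nr < 1 or nc < 1:
        raise ValueError("count and stride values must be positive")
    # Index of the last element touched along each axis.
    last_r = r0 + (nr - 1) * rs
    last_c = c0 + (nc - 1) * cs
    if r0 < 0 or c0 < 0 or last_r >= len(grid) or last_c >= len(grid[0]):
        raise IndexError("requested subset extends beyond the array")
    return [
        [grid[r0 + i * rs][c0 + j * cs] for j in range(nc)]
        for i in range(nr)
    ]

# A 6 x 8 sample grid whose values encode their own position (row*10 + col).
grid = [[r * 10 + c for c in range(8)] for r in range(6)]

# A small region of interest: 2 rows and 3 columns starting at (1, 2).
region = hyperslab(grid, start=(1, 2), count=(2, 3))

# A coarse overview: every second sample along both axes.
overview = hyperslab(grid, start=(0, 0), count=(3, 4), stride=(2, 2))
```

Only the selected elements need to cross the network: here the region holds 6 values and the overview 12, against 48 in the full grid.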
The SDB: How It Works
The SDB is a Common Gateway Interface (CGI) program written in the C programming language. The SDB development team has worked to keep the SDB extensible and modular in design so that it can be modified to read new file formats. NCSA's prototype SDB reads the following scientific file formats: HDF, netCDF, Common Data Format (CDF), and Flexible Image Transport System (FITS) [13,4]. The SDB serves scientific data by calling the appropriate file access library (HDF, CDF, etc.), visualizing the data if necessary, and then formatting the results in HTML to present to the user via the Web. Figure 1 illustrates the current internal architecture of the SDB and how the file access libraries are modularized. Libsdbutil.a contains utilities and HTML presentation functions used by the SDB file format libraries.
Figure 1: SDB Internal Architecture
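The request flow described above (select the file access library, read the data, format the result as HTML) can be sketched roughly as follows. This is a Python illustration only, not the SDB's C source: the reader functions, their return values, the "file" query parameter, and dispatch by file extension are all assumptions made for the sketch. In a real CGI program the query string would typically arrive in the QUERY_STRING environment variable.

```python
import html
import os
from urllib.parse import parse_qs

# Stand-ins for the per-format file access libraries. Each hypothetical
# reader returns (name, description) pairs describing a file's contents.
def read_hdf(path):
    return [("temperature", "2-D array, 180 x 360")]

def read_netcdf(path):
    return [("pressure", "3-D array, 10 x 180 x 360")]

# Dispatch table keyed on file extension (an assumption of this sketch).
READERS = {".hdf": read_hdf, ".nc": read_netcdf}

def handle_request(query_string):
    """Map a CGI query string such as 'file=ocean.hdf' to an HTML fragment."""
    params = parse_qs(query_string)
    path = params.get("file", [""])[0]
    ext = os.path.splitext(path)[1].lower()
    reader = READERS.get(ext)
    if reader is None:
        return "<p>Unsupported file format: %s</p>" % html.escape(path)
    # Escape all text taken from the request or the file before emitting HTML.
    items = "".join(
        "<li>%s: %s</li>" % (html.escape(name), html.escape(desc))
        for name, desc in reader(path)
    )
    return "<h1>%s</h1><ul>%s</ul>" % (html.escape(path), items)

page = handle_request("file=ocean.hdf")
```

Supporting another format in this sketch means writing one more reader function and adding one entry to the dispatch table.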
Currently, modifying the SDB to read any new file format is a fairly significant programming task, largely because each file format defines its own interface for accessing standard scientific data structures, such as multi-dimensional arrays. Each file access library presents its own API to the base SDB code, and much of the SDB data extraction, visualization, and presentation code is duplicated with each additional file format module.
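To make the interface mismatch concrete, the sketch below defines two hypothetical access libraries that expose the same kind of two-dimensional array through different calls, then wraps each in a thin adapter so that one presentation routine serves both. All class and method names are invented for illustration. The adapter layer is a conventional remedy shown for contrast; it is not a description of the SDB's code.

```python
class FormatALibrary:
    """Hypothetical library: arrays addressed by index, returned as rows."""
    def __init__(self):
        self._data = [[1, 2, 3], [4, 5, 6]]

    def get_dims(self, index):
        return (2, 3)

    def read_array(self, index):
        return self._data


class FormatBLibrary:
    """Hypothetical library: variables addressed by name, returned flat."""
    def var_shape(self, name):
        return [2, 2]

    def var_values(self, name):
        return [7, 8, 9, 10]


class ArrayFromA:
    """Adapter exposing shape() and rows() on top of FormatALibrary."""
    def __init__(self, lib, index):
        self.lib, self.index = lib, index

    def shape(self):
        return tuple(self.lib.get_dims(self.index))

    def rows(self):
        return [list(row) for row in self.lib.read_array(self.index)]


class ArrayFromB:
    """Adapter exposing shape() and rows() on top of FormatBLibrary."""
    def __init__(self, lib, name):
        self.lib, self.name = lib, name

    def shape(self):
        return tuple(self.lib.var_shape(self.name))

    def rows(self):
        # Rebuild rows from the flat, row-major value list.
        nrows, ncols = self.shape()
        flat = self.lib.var_values(self.name)
        return [flat[i * ncols:(i + 1) * ncols] for i in range(nrows)]


def to_ascii(array):
    """Presentation code written once against the shared interface."""
    return "\n".join(" ".join(str(v) for v in row) for row in array.rows())


text_a = to_ascii(ArrayFromA(FormatALibrary(), 0))
text_b = to_ascii(ArrayFromB(FormatBLibrary(), "height"))
```

Without such a layer, a routine like to_ascii would have to be rewritten against each library's own calls, which is the duplication described above.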
ACKNOWLEDGEMENTS
This work was funded by a cooperative agreement between NASA and the University of Illinois, and by Hughes STX. The following individuals contributed their time through creative discussion and computer programming: George Velamparampil, Mike Folk (NCSA), Robert E. McGrath (NCSA), Ramachandran Suresh (Hughes STX), Doug Ilg (Hughes STX), Liping Di (Hughes STX), Khoa Doan (Hughes STX).
REFERENCES
An Experimental Data Server, Data and Information Access Link (DIAL), Hughes STX, http://hops.stx.com:8080/dialhome.html
The HDF WWW Scientific Data Browser (CGI), http://opus.ncsa.uiuc.edu:4321/
HDF-FITS Conversion Page, http://hdf/fits/index.html
Hierarchical Data Format, HDF, http://hdf.ncsa.uiuc.edu
The Common Data Format (CDF), http://nssdc.gsfc.nasa.gov/cdf/
CDF Browser, http://hdf/projects.html
FITS, Flexible Image Transport System, http://www.gsfc.nasa.gov/astro/fits/fits_home.html
EOSDIS Information Architecture, HDF-EOS, Hierarchical Data Format for the Earth Observing System, http://spsosun.gsfc.nasa.gov//InfoArch.html
National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Illinois, USA, nyeager@ncsa.uiuc.edu