ncsa.hdf.object
Class Dataset

java.lang.Object
  extended by ncsa.hdf.object.HObject
      extended by ncsa.hdf.object.Dataset
All Implemented Interfaces:
DataFormat, java.io.Serializable
Direct Known Subclasses:
CompoundDS, ScalarDS

public abstract class Dataset
extends HObject

The abstract class Dataset holds general information about a dataset object, such as its datatype and dimensions, and defines common operations on the dataset, such as reading and writing data values.

Dataset has two subclasses: ScalarDS and CompoundDS. A ScalarDS is a multi-dimensional array of atomic data such as int, float, char, etc.

How to Select a Subset

Dataset defines APIs to read, write, and subset a dataset. No dedicated function is defined to select a subset of a data array; instead, the selection is made implicitly. Calls that return dimension information, such as getSelectedDims(), return a reference to the array held inside the dataset object rather than a copy. Changing that array outside the dataset object therefore directly changes the values in the dataset object, much like writing through a pointer in C.
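
The reference behaviour described above can be illustrated with a minimal, self-contained sketch. SelectionHolder is a hypothetical stand-in, not a class in ncsa.hdf.object; it only mimics a getter that hands out its internal array instead of a copy.

```java
// Hypothetical stand-in for a dataset; not part of ncsa.hdf.object.
class SelectionHolder {
    private final long[] selectedDims = {200, 100, 50, 10};

    // Returns the internal array itself, not a copy.
    long[] getSelectedDims() {
        return selectedDims;
    }

    public static void main(String[] args) {
        SelectionHolder holder = new SelectionHolder();
        long[] selected = holder.getSelectedDims();
        selected[1] = 50; // writes into the array held inside the holder
        System.out.println(holder.getSelectedDims()[1]); // prints 50
    }
}
```

Because the caller and the holder share one array, no setter is needed to change the selection.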

The following example shows how to select a subset. The dataset has four dimensions of size [200][100][50][10], i.e. dims[0]=200; dims[1]=100; dims[2]=50; dims[3]=10;
We want to select every other data point along dims[1] and dims[2].

     int rank = dataset.getRank();   // number of dimensions of the dataset
     long[] dims = dataset.getDims(); // the dimension sizes of the dataset
     long[] selected = dataset.getSelectedDims(); // the selected size of the dataset
     long[] start = dataset.getStartDims(); // the offset of the selection
     long[] stride = dataset.getStride(); // the stride of the selection
     int[]  selectedIndex = dataset.getSelectedIndex(); // the selected dimensions for display

     // select dim1 and dim2 as 2D data for display, and slice through dim0
     selectedIndex[0] = 1;
     selectedIndex[1] = 2;
     selectedIndex[2] = 0;

     // reset the selection arrays
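     for (int i=0; i<rank; i++) {
         start[i] = 0;
         selected[i] = 1;
         stride[i] = 1;
     }

     // set the stride to 2 on dim1 and dim2 so that every other data point is selected
     stride[1] = 2;
     stride[2] = 2;

     // set the selection size of dim1 and dim2
     selected[1] = dims[1]/stride[1];
     selected[2] = dims[2]/stride[2];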

 

See Also:
ScalarDS, CompoundDS, Serialized Form

Field Summary
protected  long[] chunkSize
          The chunk size of each dimension
protected  java.lang.String compression
          Compression level.
protected  java.lang.Object data
          The buffer that holds the content of this dataset.
protected  Datatype datatype
          The datatype of this dataset.
protected  java.lang.String[] dimNames
          Names of the dimensions.
protected  long[] dims
          The current dimension sizes of this dataset
static java.lang.String H5Z_FILTER_CONFIG_DECODE_ENABLED
           
static java.lang.String H5Z_FILTER_CONFIG_ENCODE_ENABLED
           
protected  int rank
          The rank of this dataset.
protected  long[] selectedDims
          The number of data points of each dimension of the selected subset.
protected  int[] selectedIndex
          Indices of selected dimensions.
protected  long[] selectedStride
          The number of elements to move from the start location in each dimension.
protected  long[] startDims
          The starting position of each dimension of the selected subset.
 
Fields inherited from class ncsa.hdf.object.HObject
hasAttribute, oid, separator
 
Constructor Summary
Dataset(FileFormat fileFormat, java.lang.String name, java.lang.String path)
           
Dataset(FileFormat fileFormat, java.lang.String name, java.lang.String path, long[] oid)
          Creates a Dataset object with a given file and dataset name and path.
 
Method Summary
static java.lang.String[] byteToString(byte[] bytes, int length)
          Converts an array of bytes into an array of String.
 void clearData()
          Removes the data value of this dataset in memory.
static java.lang.Object convertFromUnsignedC(java.lang.Object data_in)
          Converts a one-dimensional array of unsigned C integers to the appropriate Java integers.
static java.lang.Object convertToUnsignedC(java.lang.Object data_in)
          Converts Java integer data back to unsigned C integer data.
abstract  Dataset copy(Group pgroup, java.lang.String name, long[] dims, java.lang.Object data)
          Copy this dataset to another group.
 long[] getChunkSize()
          Returns the chunk sizes.
 java.lang.String getCompression()
          Returns the compression level.
 java.lang.Object getData()
          Returns the data value, loading it into memory first if it is not already loaded.
abstract  Datatype getDatatype()
          Returns the datatype of this dataset.
 java.lang.String[] getDimNames()
          Returns the names of all dimensions.
 long[] getDims()
          Returns the current dimension sizes of this dataset.
 int getHeight()
          Returns the height of the dataset.
 int getRank()
          Returns the rank of this dataset.
 long[] getSelectedDims()
          Returns the dimension sizes of the selected subset.
 int[] getSelectedIndex()
          Indices of the selected dimensions.
 long[] getStartDims()
          Returns the starting position of the selected subset.
 long[] getStride()
          Returns the stride (selectedStride) of the selected subset.
 int getWidth()
          Returns the width of the dataset.
abstract  void init()
          Initializes the dataset, for example its dimension sizes.
abstract  java.lang.Object read()
          Loads and returns the data value from file.
abstract  byte[] readBytes()
          Reads the data values of this dataset into a byte array.
 void setData(java.lang.Object d)
           
static byte[] stringToByte(java.lang.String[] strings, int length)
          Converts a string array into an array of bytes.
 void write()
          Writes the internal data values of this dataset from memory to file.
abstract  void write(java.lang.Object buf)
          Writes data values to file.
 
Methods inherited from class ncsa.hdf.object.HObject
close, equalsOID, getFID, getFile, getFileFormat, getName, getOID, getPath, hasAttribute, open, setName, setPath, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface ncsa.hdf.object.DataFormat
getMetadata, removeMetadata, writeMetadata
 

Field Detail

H5Z_FILTER_CONFIG_DECODE_ENABLED

public static final java.lang.String H5Z_FILTER_CONFIG_DECODE_ENABLED
See Also:
Constant Field Values

H5Z_FILTER_CONFIG_ENCODE_ENABLED

public static final java.lang.String H5Z_FILTER_CONFIG_ENCODE_ENABLED
See Also:
Constant Field Values

data

protected java.lang.Object data
The buffer that holds the content of this dataset. The type of the data object is defined by the implementing classes.


rank

protected int rank
The rank of this dataset.


dims

protected long[] dims
The current dimension sizes of this dataset


selectedDims

protected long[] selectedDims
The number of data points of each dimension of the selected subset. The selected size must be less than or equal to the current dimension size. Together, the starting position and the selected sizes fully define a rectangular selection.

For example, given a 4 x 5 dataset

     0,  1,  2,  3,  4
    10, 11, 12, 13, 14
    20, 21, 22, 23, 24
    30, 31, 32, 33, 34
 long[] dims = {4, 5};
 long[] startDims = {1, 2};
 long[] selectedDims = {3, 3};

 then the following subset is selected by the startDims and selectedDims
     12, 13, 14
     22, 23, 24
     32, 33, 34
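
The example above can be reproduced in plain Java. This is a minimal sketch that works on an in-memory int[][] rather than on a Dataset; the class SubsetSketch and its select method are hypothetical and only illustrate how the start position and the selected sizes define the rectangle.

```java
import java.util.Arrays;

// Illustrative only: selects a rectangle from a 2D array in plain Java.
class SubsetSketch {
    static int[][] select(int[][] data, long[] startDims, long[] selectedDims) {
        int rows = (int) selectedDims[0];
        int cols = (int) selectedDims[1];
        int[][] subset = new int[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // offset each index by the starting position of its dimension
                subset[r][c] = data[(int) startDims[0] + r][(int) startDims[1] + c];
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        int[][] data = {
            { 0,  1,  2,  3,  4},
            {10, 11, 12, 13, 14},
            {20, 21, 22, 23, 24},
            {30, 31, 32, 33, 34}
        };
        int[][] subset = select(data, new long[] {1, 2}, new long[] {3, 3});
        System.out.println(Arrays.deepToString(subset));
        // prints [[12, 13, 14], [22, 23, 24], [32, 33, 34]]
    }
}
```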


startDims

protected long[] startDims
The starting position of each dimension of the selected subset. With both the starting position and selected sizes, the subset of a rectangle selection is fully defined.


selectedIndex

protected final int[] selectedIndex
Indices of selected dimensions. selectedIndex[] is provided for two purposes:
  1. selectedIndex[] indicates the order of dimensions for display: selectedIndex[0] = row, selectedIndex[1] = column and selectedIndex[2] = depth. For example, for a four-dimensional dataset, if selectedIndex[] = {1, 2, 3}, then dim[1] is selected as the row index, dim[2] as the column index and dim[3] as the depth index.
  2. selectedIndex[] also selects which dimensions are displayed for datasets with three or more dimensions. We assume that an application such as HDFView can display data in at most three dimensions (a 2D spreadsheet/image plus a third dimension from which the 2D spreadsheet/image is cut). For a dataset with more than three dimensions, selectedIndex[] stores which three dimensions are chosen for display.
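
The mapping from selectedIndex[] to display roles can be sketched as follows. SelectedIndexSketch and its describe method are hypothetical helpers written for illustration; they use the four-dimensional sizes [200][100][50][10] from the class description and do not touch any Dataset object.

```java
// Illustrative only: shows how a display-order array maps dimensions to roles.
class SelectedIndexSketch {
    static String describe(long[] dims, int[] selectedIndex) {
        String[] roles = {"row", "column", "depth"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < selectedIndex.length && i < roles.length; i++) {
            int d = selectedIndex[i]; // which dimension plays this role
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(roles[i]).append("=dim[").append(d)
              .append("] (size ").append(dims[d]).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long[] dims = {200, 100, 50, 10};
        int[] selectedIndex = {1, 2, 3};
        System.out.println(describe(dims, selectedIndex));
        // prints row=dim[1] (size 100), column=dim[2] (size 50), depth=dim[3] (size 10)
    }
}
```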


selectedStride

protected long[] selectedStride
The number of elements to move from the start location in each dimension. For example, if selectedStride[0] = 2, every other data point is selected along dim[0].
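
As a rough illustration of the stride, the sketch below lists the positions visited along one dimension. It assumes the i-th selected point sits at start + i * stride, which matches the "every other data point" example above; StrideSketch is a hypothetical helper, not part of the library.

```java
import java.util.Arrays;

// Illustrative only: positions visited along one dimension for a given stride.
class StrideSketch {
    static long[] selectedPositions(long start, long stride, long count) {
        long[] positions = new long[(int) count];
        for (int i = 0; i < count; i++) {
            positions[i] = start + i * stride; // assumed stride semantics
        }
        return positions;
    }

    public static void main(String[] args) {
        // A dimension of size 10, starting at 0 with stride 2, yields 5 points.
        System.out.println(Arrays.toString(selectedPositions(0, 2, 5)));
        // prints [0, 2, 4, 6, 8]
    }
}
```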


chunkSize

protected long[] chunkSize
The chunk size of each dimension


compression

protected java.lang.String compression
Compression level.


datatype

protected Datatype datatype
The datatype of this dataset.


dimNames

protected java.lang.String[] dimNames
Names of the dimensions.

Constructor Detail

Dataset

public Dataset(FileFormat fileFormat,
               java.lang.String name,
               java.lang.String path)

Dataset

public Dataset(FileFormat fileFormat,
               java.lang.String name,
               java.lang.String path,
               long[] oid)
Creates a Dataset object with a given file and dataset name and path.

Parameters:
fileFormat - the HDF file.
name - the name of this Dataset.
path - the full path of this Dataset.
oid - the unique identifier of this dataset.
Method Detail

init

public abstract void init()
Initializes the dataset, for example its dimension sizes. Subclasses must implement this method. HDF4 and HDF5 datasets call their respective libraries to perform the detailed initialization.

The Dataset is designed around "ask and load". When a data object is retrieved from a file, its datatype and dataspace information and its data values are not loaded into memory. When the object is asked to load the data, it first calls init() to fill in the datatype and dataspace information, and then loads the data content.
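
The "ask and load" idea can be sketched with a small, hypothetical class. LazyDatasetSketch is not part of ncsa.hdf.object and does no file I/O; the counters and the placeholder values exist only to make the order of calls visible.

```java
// Hypothetical class illustrating "ask and load"; not part of ncsa.hdf.object.
class LazyDatasetSketch {
    private boolean initialized = false;
    private int[] data = null;
    int initCalls = 0;
    int loadCalls = 0;

    // Stands in for reading datatype and dataspace information from a file.
    void init() {
        if (initialized) {
            return;
        }
        initCalls++;
        initialized = true;
    }

    // Loads the data on the first request and returns the cached copy afterwards.
    int[] getData() {
        if (data == null) {
            init();
            loadCalls++;
            data = new int[] {1, 2, 3}; // placeholder for values read from a file
        }
        return data;
    }

    // Drops the in-memory copy so that the next getData() loads it again.
    void clearData() {
        data = null;
    }

    public static void main(String[] args) {
        LazyDatasetSketch ds = new LazyDatasetSketch();
        ds.getData();
        ds.getData();
        System.out.println(ds.loadCalls); // prints 1
        ds.clearData();
        ds.getData();
        System.out.println(ds.loadCalls); // prints 2
    }
}
```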


getRank

public final int getRank()
Returns the rank of this dataset.


getDims

public final long[] getDims()
Returns the current dimension sizes of this dataset.


getSelectedDims

public final long[] getSelectedDims()
Returns the dimension sizes of the selected subset. The selected dims give the number of data points of the selected subset in each dimension. The selected size must be less than or equal to the current dimension size. Together, the starting position and the selected sizes fully define a rectangular selection.

For example, given a 4 x 5 dataset

     0,  1,  2,  3,  4
    10, 11, 12, 13, 14
    20, 21, 22, 23, 24
    30, 31, 32, 33, 34
 long[] dims = {4, 5};
 long[] startDims = {1, 2};
 long[] selectedDims = {3, 3};

 then the following subset is selected by the startDims and selectedDims
     12, 13, 14
     22, 23, 24
     32, 33, 34


getStartDims

public final long[] getStartDims()
Returns the starting position of the selected subset.


getStride

public final long[] getStride()
Returns the stride (selectedStride) of the selected subset.


read

public abstract java.lang.Object read()
                               throws java.lang.Exception,
                                      java.lang.OutOfMemoryError
Loads and returns the data value from file.

Throws:
java.lang.Exception
java.lang.OutOfMemoryError

readBytes

public abstract byte[] readBytes()
                          throws java.lang.Exception
Reads the data values of this dataset into a byte array. readBytes() loads the data as an array of bytes instead of an array of its datatype. For example, for a one-dimensional 32-bit integer dataset of size 5, readBytes() returns a byte array of size 20 instead of an int array of size 5.

readBytes() is mostly used for copying data values, in which case the data does not need to be changed or displayed.
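
The size arithmetic in the example above can be checked with a short sketch. It uses java.nio.ByteBuffer on an in-memory array and is not how readBytes() is implemented; BytesSketch is a hypothetical helper, and ByteBuffer's default big-endian order is an assumption that need not match the byte order of a given file.

```java
import java.nio.ByteBuffer;

// Illustrative only: five 32-bit integers occupy 5 * 4 = 20 bytes.
class BytesSketch {
    static byte[] toBytes(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buffer.putInt(v); // big-endian by default
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5};
        System.out.println(toBytes(values).length); // prints 20
    }
}
```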

Throws:
java.lang.Exception

write

public abstract void write(java.lang.Object buf)
                    throws java.lang.Exception
Writes data values to file.

Parameters:
buf - the data to write
Throws:
java.lang.Exception

write

public final void write()
                 throws java.lang.Exception
Writes the internal data values of this dataset from memory to file.

Throws:
java.lang.Exception

copy

public abstract Dataset copy(Group pgroup,
                             java.lang.String name,
                             long[] dims,
                             java.lang.Object data)
                      throws java.lang.Exception
Copies this dataset to another group.

Parameters:
pgroup - the group which the dataset is copied to.
name - the name of the new dataset.
dims - the dimension sizes of the new dataset.
data - the data to be copied.
Returns:
the new dataset.
Throws:
java.lang.Exception

getDatatype

public abstract Datatype getDatatype()
Returns the datatype of this dataset.


getData

public final java.lang.Object getData()
                               throws java.lang.Exception,
                                      java.lang.OutOfMemoryError
Returns the data value. If the data is already loaded into memory, it is returned directly; otherwise it is first loaded into memory and then returned.

Throws:
java.lang.Exception
java.lang.OutOfMemoryError

setData

public final void setData(java.lang.Object d)

clearData

public void clearData()
Removes the data value of this dataset in memory.


getHeight

public final int getHeight()
Returns the height of the dataset.


getWidth

public final int getWidth()
Returns the width of the dataset.


getSelectedIndex

public final int[] getSelectedIndex()
Indices of the selected dimensions. selectedIndex[] is provided for two purposes:
  1. selectedIndex[] indicates the order of dimensions for display: selectedIndex[0] = row, selectedIndex[1] = column and selectedIndex[2] = depth. For example, for a four-dimensional dataset, if selectedIndex[] = {1, 2, 3}, then dim[1] is selected as the row index, dim[2] as the column index and dim[3] as the depth index.
  2. selectedIndex[] also selects which dimensions are displayed for datasets with three or more dimensions. We assume that an application such as HDFView can display data in at most three dimensions (a 2D spreadsheet/image plus a third dimension from which the 2D spreadsheet/image is cut). For a dataset with more than three dimensions, selectedIndex[] stores which three dimensions are chosen for display.


getCompression

public final java.lang.String getCompression()
Returns the compression level.


getChunkSize

public final long[] getChunkSize()
Returns the chunk sizes.


convertFromUnsignedC

public static java.lang.Object convertFromUnsignedC(java.lang.Object data_in)
Converts a one-dimensional array of unsigned C integers to the appropriate Java integers. Because Java does not support unsigned integers, unsigned C integers must be converted to an appropriate Java integer type; otherwise the data values will not be displayed correctly. For example, if an unsigned C byte x = 200 is stored in a signed Java byte, its value is -56 instead of 200. The following table maps unsigned C integer types to Java integer types.
Mapping Unsigned C Integers to Java Integers
Unsigned C Integer    Java Integer
unsigned byte         signed short
unsigned short        signed int
unsigned int          signed long
unsigned long         signed long
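
The first row of the table can be demonstrated with plain Java arithmetic. This is a minimal sketch of the widening idea, not the implementation of convertFromUnsignedC or convertToUnsignedC; UnsignedSketch and its widen and narrow methods are hypothetical names.

```java
// Illustrative only: widening unsigned bytes to shorts and narrowing them back.
class UnsignedSketch {
    // Interprets each byte as unsigned and stores it in a wider signed short.
    static short[] widen(byte[] in) {
        short[] out = new short[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (short) (in[i] & 0xFF); // mask off the sign extension
        }
        return out;
    }

    // Converts the widened values back to the raw one-byte representation.
    static byte[] narrow(short[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) in[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte raw = (byte) 200;                          // bit pattern of unsigned 200
        System.out.println(raw);                        // prints -56
        System.out.println(widen(new byte[] {raw})[0]); // prints 200
    }
}
```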

Parameters:
data_in - the input 1D array of unsigned C integers.
Returns:
the converted 1D array of Java integer data.

convertToUnsignedC

public static java.lang.Object convertToUnsignedC(java.lang.Object data_in)
Converts Java integer data back to unsigned C integer data. It is used when Java data converted from unsigned C is written back to file.

Parameters:
data_in - the input Java integer data to be converted.
Returns:
the converted unsigned C integer.
See Also:
convertFromUnsignedC(Object data_in)

byteToString

public static final java.lang.String[] byteToString(byte[] bytes,
                                                    int length)
Converts an array of bytes into an array of strings. For example,
 byte[] bytes = {65, 66, 67, 97, 98, 99};
  String[] strs = byteToString(bytes, 3);

  The result "strs" is an array of two strings with the values
  strs[0]="ABC", and strs[1]="abc";
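
The fixed-length split shown above, and the reverse direction used by stringToByte, can be sketched in plain Java. ByteStringSketch is a hypothetical helper, not the library code; the use of US-ASCII and the zero padding of short strings are assumptions made for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: fixed-length conversion between bytes and strings.
class ByteStringSketch {
    // Cuts the byte array into consecutive pieces of the given length.
    static String[] split(byte[] bytes, int length) {
        int n = bytes.length / length;
        String[] strs = new String[n];
        for (int i = 0; i < n; i++) {
            strs[i] = new String(bytes, i * length, length, StandardCharsets.US_ASCII);
        }
        return strs;
    }

    // Packs each string into a slot of the given length (assumed zero padded).
    static byte[] join(String[] strings, int length) {
        byte[] bytes = new byte[strings.length * length];
        for (int i = 0; i < strings.length; i++) {
            byte[] src = strings[i].getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(src, 0, bytes, i * length, Math.min(src.length, length));
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = {65, 66, 67, 97, 98, 99};
        String[] strs = split(bytes, 3);
        System.out.println(strs[0] + " " + strs[1]); // prints ABC abc
    }
}
```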

 

Parameters:
bytes - the array of bytes
length - the length of string
Returns:
the array of string.

stringToByte

public static final byte[] stringToByte(java.lang.String[] strings,
                                        int length)
Converts a string array into an array of bytes.

Parameters:
strings - the array of string
length - the length of string
Returns:
the array of bytes.
See Also:
byteToString(byte[] bytes, int length)

getDimNames

public final java.lang.String[] getDimNames()
Returns the names of all dimensions.