public abstract class Dataset extends HObject
This class provides two convenience functions, read() and write(), to read and write data values. Reading or writing data may take many library calls if the library APIs are used directly. The read() and write() functions hide the details of these calls from users.
For more details on datasets, see the HDF5 User's Guide.
ScalarDS, CompoundDS, Serialized Form
Modifier and Type | Field and Description |
---|---|
protected long[] |
chunkSize
The array of dimension sizes for a chunk.
|
protected String |
compression
The compression information.
|
static String |
compression_gzip_txt |
protected boolean |
convertByteToString
Flag to indicate if the byte[] array is converted to strings
|
protected Object |
convertedBuf
The array that holds the converted data of unsigned C-type integers.
|
protected Object |
data
The memory buffer that holds the raw data of the dataset.
|
protected Datatype |
datatype
The datatype object of the dataset.
|
protected String[] |
dimNames
Array of strings that represent the dimension names.
|
protected long[] |
dims
The current dimension sizes of the dataset
|
protected boolean |
enumConverted
Flag to indicate if the enum data is converted to strings.
|
protected String |
filters
The filters information.
|
protected boolean |
isDataLoaded
Flag to indicate if data values are loaded into memory.
|
protected long[] |
maxDims
The max dimension sizes of the dataset
|
protected long |
nPoints
The number of data points in the memory buffer.
|
protected Object |
originalBuf
The data buffer that contains the raw data read directly from file
(before any data conversion).
|
protected int |
rank
The number of dimensions of the dataset.
|
protected long[] |
selectedDims
Array that contains the number of data points selected (for read/write)
in each dimension.
|
protected int[] |
selectedIndex
Array that contains the indices of the dimensions selected for display.
|
protected long[] |
selectedStride
The number of elements to move from the start location in each dimension.
|
protected long[] |
startDims
The starting position of each dimension of a selected subset.
|
protected String |
storage
The storage information.
|
protected String |
storage_layout
The storage layout information.
|
fileFormat, linkTargetObjName, oid, separator
Constructor and Description |
---|
Dataset(FileFormat theFile,
String name,
String path)
Constructs a Dataset object with a given file, name and path.
|
Dataset(FileFormat theFile,
String name,
String path,
long[] oid)
Deprecated.
Not for public use in the future.
Use Dataset(FileFormat, String, String) instead. |
Modifier and Type | Method and Description |
---|---|
static String[] |
byteToString(byte[] bytes,
int length)
Converts an array of bytes into an array of Strings for a fixed string
dataset.
|
void |
clear()
Clears memory held by the dataset, such as the data buffer.
|
void |
clearData()
Clears the current data buffer in memory and forces the next read() to load
the data from file.
|
static Object |
convertFromUnsignedC(Object data_in)
Deprecated.
Not for public use in the future.
Use convertFromUnsignedC(Object, Object) instead. |
static Object |
convertFromUnsignedC(Object data_in,
Object data_out)
Converts a one-dimensional array of unsigned C-type integers to a new array
of the appropriate Java integer type in memory.
|
static Object |
convertToUnsignedC(Object data_in)
Deprecated.
Not for public use in the future.
Use convertToUnsignedC(Object, Object) instead. |
static Object |
convertToUnsignedC(Object data_in,
Object data_out)
Converts the array of converted unsigned integers back to unsigned C-type
integer data in memory.
|
abstract Dataset |
copy(Group pgroup,
String name,
long[] dims,
Object data)
Creates a new dataset and writes the data buffer to the new dataset.
|
long[] |
getChunkSize()
Returns the array that contains the dimension sizes of the chunk of the
dataset.
|
String |
getCompression()
Returns the string representation of compression information.
|
boolean |
getConvertByteToString()
Returns the flag that indicates if a byte array is converted to a string
array.
|
Object |
getData()
Returns the data buffer of the dataset in memory.
|
abstract Datatype |
getDatatype()
Returns the datatype object of the dataset.
|
String[] |
getDimNames()
Returns the array of strings that represent the dimension names.
|
long[] |
getDims()
Returns the array that contains the dimension sizes of the dataset.
|
String |
getFilters()
Returns the string representation of filter information.
|
int |
getHeight()
Returns the dimension size of the vertical axis.
|
long[] |
getMaxDims()
Returns the array that contains the max dimension sizes of the dataset.
|
Class |
getOriginalClass()
Get Class of the original data buffer if converted.
|
int |
getRank()
Returns the rank (number of dimensions) of the dataset.
|
long[] |
getSelectedDims()
Returns the dimension sizes of the selected subset.
|
int[] |
getSelectedIndex()
Returns the indices of display order.
|
int |
getSize(int tid)
Returns the size in bytes of a given datatype.
|
long[] |
getStartDims()
Returns the starting position of a selected subset.
|
String |
getStorage()
Returns the string representation of storage information.
|
String |
getStorageLayout()
Returns the string representation of storage layout information.
|
long[] |
getStride()
Returns the selectedStride of the selected dataset.
|
int |
getWidth()
Returns the dimension size of the horizontal axis.
|
abstract void |
init()
Retrieves datatype and dataspace information from file and sets the
dataset in memory.
|
boolean |
isEnumConverted()
Gets the flag that indicates if enum data is converted to strings.
|
boolean |
isString(int tid)
Checks if a given datatype is a string.
|
abstract Object |
read()
Reads the data from file.
|
abstract byte[] |
readBytes()
Reads the raw data of the dataset from file to a byte array.
|
void |
setConvertByteToString(boolean b)
Sets the flag that indicates if a byte array is converted to a string
array.
|
void |
setData(Object d)
Deprecated.
Not for public use in the future.
setData() is not safe to use because it changes memory buffer of the dataset object. Dataset operations such as write/read will fail if the buffer type or size is changed. |
void |
setEnumConverted(boolean b)
Sets the flag that indicates if enum data is converted to strings.
|
static byte[] |
stringToByte(String[] strings,
int length)
Converts a string array into an array of bytes for a fixed string
dataset.
|
void |
write()
Writes the memory buffer of this dataset to file.
|
abstract void |
write(Object buf)
Writes a memory buffer to the dataset in file.
|
close, debug, equals, equalsOID, getFID, getFile, getFileFormat, getFullName, getLinkTargetObjName, getName, getOID, getPath, open, setLinkTargetObjName, setName, setPath, toString
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
getMetadata, hasAttribute, removeMetadata, updateMetadata, writeMetadata
protected int rank
protected long[] dims
protected long[] maxDims
protected long[] selectedDims
The selected size must be less than or equal to the current dimension size. A subset of a rectangle selection is defined by the starting position and selected sizes.
For example, if a 4 X 5 dataset is as follows:
    0,  1,  2,  3,  4
    10, 11, 12, 13, 14
    20, 21, 22, 23, 24
    30, 31, 32, 33, 34

    long[] dims = {4, 5};
    long[] startDims = {1, 2};
    long[] selectedDims = {3, 3};

then the following subset is selected by the startDims and selectedDims above:

    12, 13, 14
    22, 23, 24
    32, 33, 34
protected long[] startDims
protected final int[] selectedIndex
selectedIndex[] is provided for two purposes:
protected long[] selectedStride
protected long[] chunkSize
protected String compression
public static final String compression_gzip_txt
protected String storage_layout
protected String[] dimNames
protected boolean convertByteToString
protected boolean isDataLoaded
protected long nPoints
protected Object originalBuf
protected Object convertedBuf
For example, suppose that the original data is an array of unsigned 16-bit short integers. Since Java does not support unsigned integers, the data is converted to an array of 32-bit signed integers. In that case, the converted buffer is the array of 32-bit signed integers.
protected boolean enumConverted
public Dataset(FileFormat theFile, String name, String path)
theFile
- the file that contains the dataset.
name
- the name of the Dataset, e.g. "dset1".
path
- the full group path of this Dataset, e.g. "/arrays/".
@Deprecated public Dataset(FileFormat theFile, String name, String path, long[] oid)
Dataset(FileFormat, String, String)
theFile
- the file that contains the dataset.
name
- the name of the Dataset, e.g. "dset1".
path
- the full group path of this Dataset, e.g. "/arrays/".
oid
- the oid of this Dataset.
public void clear()
public abstract void init()
init() is designed to support lazy loading in a dataset object. When a data object is retrieved from file, the datatype, dataspace and raw data are not loaded into memory. When the raw data is requested, init() is called first to get the datatype and dataspace information, and then the raw data is loaded from file.
init() is also used to reset the selection of a dataset (start, stride and count) to the default, which is the entire dataset for 1D or 2D datasets. In the following example, init() at step 1) retrieves datatype and dataspace information from file. getData() at step 3) reads only one data point. init() at step 4) resets the selection to the whole dataset, clearData() at step 5) discards the buffered data point, and getData() at step 6) reads the values of the whole dataset into memory.
    dset = (Dataset) file.get(NAME_DATASET);

    // 1) get datatype and dataspace information from file
    dset.init();
    rank = dset.getRank(); // rank = 2, a 2D dataset
    count = dset.getSelectedDims();
    start = dset.getStartDims();
    dims = dset.getDims();

    // 2) select only one data point
    for (int i = 0; i < rank; i++) {
        start[i] = 0;
        count[i] = 1;
    }

    // 3) read one data point
    data = dset.getData();

    // 4) reset selection to the whole dataset
    dset.init();

    // 5) clean the memory data buffer
    dset.clearData();

    // 6) read the whole dataset
    data = dset.getData();
public final int getRank()
public final long[] getDims()
public final long[] getMaxDims()
public final long[] getSelectedDims()
selectedDims holds the number of data points selected in each dimension. Applications can use this array to change the size of the selected subset. The selected size must be less than or equal to the current dimension size. Together with the starting position and stride, the selected sizes fully define a rectangular subset.
For example, if a 4 X 5 dataset is as follows:
    0,  1,  2,  3,  4
    10, 11, 12, 13, 14
    20, 21, 22, 23, 24
    30, 31, 32, 33, 34

    long[] dims = {4, 5};
    long[] startDims = {1, 2};
    long[] selectedDims = {3, 3};
    long[] selectedStride = {1, 1};

then the following subset is selected by the startDims and selectedDims:

    12, 13, 14
    22, 23, 24
    32, 33, 34
public final long[] getStartDims()
Applications can use this array to change the starting position of a selection. Together with the selected sizes and stride, the starting position fully defines a rectangular subset.
For example, if a 4 X 5 dataset is as follows:
    0,  1,  2,  3,  4
    10, 11, 12, 13, 14
    20, 21, 22, 23, 24
    30, 31, 32, 33, 34

    long[] dims = {4, 5};
    long[] startDims = {1, 2};
    long[] selectedDims = {3, 3};
    long[] selectedStride = {1, 1};

then the following subset is selected by the startDims and selectedDims:

    12, 13, 14
    22, 23, 24
    32, 33, 34
public final long[] getStride()
Applications can use this array to change how many elements to move in each dimension. Combined with the starting position and selected sizes, the subset of a rectangle selection is defined.
For example, if a 4 X 5 dataset is as follows:
    0,  1,  2,  3,  4
    10, 11, 12, 13, 14
    20, 21, 22, 23, 24
    30, 31, 32, 33, 34

    long[] dims = {4, 5};
    long[] startDims = {0, 0};
    long[] selectedDims = {2, 2};
    long[] selectedStride = {2, 3};

then the following subset is selected by the startDims and selectedDims:

    0,  3
    20, 23
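The start, size and stride examples above can be reproduced without the library. The following is a minimal sketch, assuming the dataset is already held in memory as a plain long[][]; the class SubsetSketch and its select method are hypothetical names used for illustration and are not part of the Dataset API.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Dataset API: applies a
// start/count/stride selection to an in-memory 2D array.
final class SubsetSketch {

    static long[][] select(long[][] data, long[] start, long[] count, long[] stride) {
        long[][] out = new long[(int) count[0]][(int) count[1]];
        for (int r = 0; r < count[0]; r++) {
            for (int c = 0; c < count[1]; c++) {
                // Move stride[d] elements from the start location in each dimension.
                int row = (int) (start[0] + r * stride[0]);
                int col = (int) (start[1] + c * stride[1]);
                out[r][c] = data[row][col];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] data = {
            {0, 1, 2, 3, 4},
            {10, 11, 12, 13, 14},
            {20, 21, 22, 23, 24},
            {30, 31, 32, 33, 34}
        };
        // startDims = {1, 2}, selectedDims = {3, 3}, selectedStride = {1, 1}
        System.out.println(Arrays.deepToString(
            select(data, new long[] {1, 2}, new long[] {3, 3}, new long[] {1, 1})));
        // startDims = {0, 0}, selectedDims = {2, 2}, selectedStride = {2, 3}
        System.out.println(Arrays.deepToString(
            select(data, new long[] {0, 0}, new long[] {2, 2}, new long[] {2, 3})));
    }
}
```

The first call yields the 3 x 3 block starting at 12, and the second yields the four points 0, 3, 20 and 23, matching the examples in the text.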
public final void setConvertByteToString(boolean b)
In a string dataset, the raw data from file is stored in a byte array. By default, this byte array is converted to an array of strings. For a large dataset (e.g. more than one million strings), the conversion takes a long time and requires a lot of memory space to store the strings. In some applications, such a conversion can be delayed. For example, a GUI application may convert only the part of the strings that is visible to the users, not the entire data array.
setConvertByteToString(boolean b) allows users to set the flag so that applications can choose whether to perform the byte-to-string conversion. If the flag is set to false, getData() returns an array of bytes instead of an array of strings.
b
- convert bytes to strings if b is true; otherwise, if false, do
not convert bytes to strings.
public final boolean getConvertByteToString()
public abstract Object read() throws Exception, OutOfMemoryError
read() reads the data from file to a memory buffer and returns the memory buffer. The dataset object does not hold the memory buffer. To store the memory buffer in the dataset object, one must call getData().
By default, the whole dataset is read into memory. Users can also select a subset to read. Subsetting is done in an implicit way.
How to Select a Subset
A selection is specified by three arrays: start, stride and count.
The following example shows how to make a subset. In the example, the
dataset is a 4-dimensional array of [200][100][50][10], i.e. dims[0]=200;
dims[1]=100; dims[2]=50; dims[3]=10;
We want to select every other data point in dims[1] and dims[2]
    int rank = dataset.getRank(); // number of dimensions of the dataset
    long[] dims = dataset.getDims(); // the dimension sizes of the dataset
    long[] selected = dataset.getSelectedDims(); // the selected size of the dataset
    long[] start = dataset.getStartDims(); // the offset of the selection
    long[] stride = dataset.getStride(); // the stride of the dataset
    int[] selectedIndex = dataset.getSelectedIndex(); // the selected dimensions for display

    // select dim1 and dim2 as 2D data for display, and slice through dim0
    selectedIndex[0] = 1;
    selectedIndex[1] = 2;
    selectedIndex[2] = 0;

    // reset the selection arrays
    for (int i = 0; i < rank; i++) {
        start[i] = 0;
        selected[i] = 1;
        stride[i] = 1;
    }

    // set stride to 2 on dim1 and dim2 so that every other data point is
    // selected.
    stride[1] = 2;
    stride[2] = 2;

    // set the selection size of dim1 and dim2
    selected[1] = dims[1] / stride[1];
    selected[2] = dims[2] / stride[2];

    // when dataset.getData() is called, the selection above will be used since
    // the dimension arrays are passed by reference. Changes of these arrays
    // outside the dataset object directly change the values of these arrays
    // in the dataset object.
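The selection sizes in this example can be checked with plain arithmetic. The sketch below is an illustration only; SelectionSizeSketch, countAlong and totalPoints are hypothetical names, not library methods. It uses ceiling division so that a dimension whose size is not a multiple of the stride still counts its last reachable point, whereas a plain dims[i] / stride[i] rounds down.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the Dataset API.
final class SelectionSizeSketch {

    // Number of points reachable along one dimension of size dim when
    // starting at index start and stepping by stride (ceiling division).
    static long countAlong(long dim, long start, long stride) {
        return (dim - start + stride - 1) / stride;
    }

    // Total number of selected points: the product of the per-dimension counts.
    static long totalPoints(long[] selected) {
        long n = 1;
        for (long s : selected) {
            n *= s;
        }
        return n;
    }

    public static void main(String[] args) {
        long[] dims = {200, 100, 50, 10};
        long[] start = {0, 0, 0, 0};
        long[] stride = {1, 2, 2, 1};
        long[] selected = {1, 1, 1, 1}; // a single point in dim0 and dim3

        selected[1] = countAlong(dims[1], start[1], stride[1]);
        selected[2] = countAlong(dims[2], start[2], stride[2]);

        System.out.println(Arrays.toString(selected)); // [1, 50, 25, 1]
        System.out.println(totalPoints(selected));     // 1250
    }
}
```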
For ScalarDS, the memory data buffer is a one-dimensional array of byte, short, int, float, double or String type based on the datatype of the dataset.
For CompoundDS, the memory data object is a java.util.List object. Each element of the list is a data array that corresponds to a compound field.
For example, if compound dataset "comp" has the following nested structure and member datatypes

    comp --> m01 (int)
    comp --> m02 (float)
    comp --> nest1 --> m11 (char)
    comp --> nest1 --> m12 (String)
    comp --> nest1 --> nest2 --> m21 (long)
    comp --> nest1 --> nest2 --> m22 (double)

getData() returns a list of six arrays: {int[], float[], char[], String[], long[] and double[]}.
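To make the flattened layout concrete, here is a small sketch that builds a stand-in for such a list and reads two fields back by position. It does not call the library; CompoundBufferSketch and sample are hypothetical names, and the element type List&lt;Object&gt; is an assumption chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the list described for the "comp" dataset: one array per
// leaf member, in declaration order, with nested members flattened.
final class CompoundBufferSketch {

    static List<Object> sample(int numRecords) {
        List<Object> fields = new ArrayList<>();
        fields.add(new int[numRecords]);    // comp --> m01
        fields.add(new float[numRecords]);  // comp --> m02
        fields.add(new char[numRecords]);   // comp --> nest1 --> m11
        fields.add(new String[numRecords]); // comp --> nest1 --> m12
        fields.add(new long[numRecords]);   // comp --> nest1 --> nest2 --> m21
        fields.add(new double[numRecords]); // comp --> nest1 --> nest2 --> m22
        return fields;
    }

    public static void main(String[] args) {
        List<Object> fields = sample(3);
        // Each element must be cast to the array type of its member.
        int[] m01 = (int[]) fields.get(0);
        double[] m22 = (double[]) fields.get(5);
        System.out.println(fields.size() + " fields, " + m01.length + " records");
        System.out.println("m22[0] = " + m22[0]);
    }
}
```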
Exception
- if object can not be read
OutOfMemoryError
- if memory is exhausted
getData()
public abstract byte[] readBytes() throws Exception
readBytes() reads raw data into an array of bytes instead of an array of its datatype. For example, for a one-dimensional 32-bit integer dataset of size 5, readBytes() returns a byte array of size 20 instead of an int array of size 5.
readBytes() can be used to copy data from one dataset to another efficiently. Because the raw data is not converted to its native type, it saves memory space and CPU time.
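The size relationship in the example (5 integers, 20 bytes) can be illustrated with java.nio.ByteBuffer. This is a sketch only: RawBytesSketch is a hypothetical class, and the byte order is passed in explicitly because the order of the raw bytes depends on the datatype stored in the file, which this example does not know.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration of the raw-byte view of a 32-bit integer array.
final class RawBytesSketch {

    // Packs each int into Integer.BYTES (4) consecutive bytes.
    static byte[] toBytes(int[] values, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES).order(order);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Interprets every 4 bytes as one int, using the same byte order.
    static int[] toInts(byte[] raw, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(order);
        int[] out = new int[raw.length / Integer.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = toBytes(new int[] {1, 2, 3, 4, 5}, ByteOrder.LITTLE_ENDIAN);
        System.out.println(raw.length); // 20
        System.out.println(Arrays.toString(toInts(raw, ByteOrder.LITTLE_ENDIAN)));
    }
}
```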
Exception
- if data can not be read
public abstract void write(Object buf) throws Exception
buf
- the data to write
Exception
- if data can not be written
public final void write() throws Exception
Exception
- if buffer can not be written
public abstract Dataset copy(Group pgroup, String name, long[] dims, Object data) throws Exception
This function allows applications to create a new dataset for a given data buffer. For example, users can select a specific interesting part from a large image and create a new image with the selection.
The new dataset retains the datatype and dataset creation properties of this dataset.
pgroup
- the group which the dataset is copied to.
name
- the name of the new dataset.
dims
- the dimension sizes of the new dataset.
data
- the data values of the subset to be copied.
Exception
- if dataset can not be copied
public abstract Datatype getDatatype()
public final Object getData() throws Exception, OutOfMemoryError
If data is already loaded into memory, returns the data; otherwise, calls read() to read data from file into a memory buffer and returns the memory buffer.
By default, the whole dataset is read into memory. Users can also select a subset to read. Subsetting is done in an implicit way.
How to Select a Subset
A selection is specified by three arrays: start, stride and count.
The following example shows how to make a subset. In the example, the
dataset is a 4-dimensional array of [200][100][50][10], i.e. dims[0]=200;
dims[1]=100; dims[2]=50; dims[3]=10;
We want to select every other data point in dims[1] and dims[2]
    int rank = dataset.getRank(); // number of dimensions of the dataset
    long[] dims = dataset.getDims(); // the dimension sizes of the dataset
    long[] selected = dataset.getSelectedDims(); // the selected size of the dataset
    long[] start = dataset.getStartDims(); // the offset of the selection
    long[] stride = dataset.getStride(); // the stride of the dataset
    int[] selectedIndex = dataset.getSelectedIndex(); // the selected dimensions for display

    // select dim1 and dim2 as 2D data for display, and slice through dim0
    selectedIndex[0] = 1;
    selectedIndex[1] = 2;
    selectedIndex[2] = 0;

    // reset the selection arrays
    for (int i = 0; i < rank; i++) {
        start[i] = 0;
        selected[i] = 1;
        stride[i] = 1;
    }

    // set stride to 2 on dim1 and dim2 so that every other data point is
    // selected.
    stride[1] = 2;
    stride[2] = 2;

    // set the selection size of dim1 and dim2
    selected[1] = dims[1] / stride[1];
    selected[2] = dims[2] / stride[2];

    // when dataset.getData() is called, the selection above will be used since
    // the dimension arrays are passed by reference. Changes of these arrays
    // outside the dataset object directly change the values of these arrays
    // in the dataset object.
For ScalarDS, the memory data buffer is a one-dimensional array of byte, short, int, float, double or String type based on the datatype of the dataset.
For CompoundDS, the memory data object is a java.util.List object. Each element of the list is a data array that corresponds to a compound field.
For example, if compound dataset "comp" has the following nested structure and member datatypes

    comp --> m01 (int)
    comp --> m02 (float)
    comp --> nest1 --> m11 (char)
    comp --> nest1 --> m12 (String)
    comp --> nest1 --> nest2 --> m21 (long)
    comp --> nest1 --> nest2 --> m22 (double)

getData() returns a list of six arrays: {int[], float[], char[], String[], long[] and double[]}.
Exception
- if object can not be read
OutOfMemoryError
- if memory is exhausted
@Deprecated public final void setData(Object d)
setData() is not safe to use because it changes memory buffer of the dataset object. Dataset operations such as write/read will fail if the buffer type or size is changed.
d
- the object data
public void clearData()
The function read() loads data from file into memory only if the data has not been read yet. If the data is already in memory, read() just returns the memory buffer. Sometimes we want to force read() to re-read data from file, for example when the selection has changed.
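The load-once behavior and the effect of clearing the buffer can be sketched as follows. This is not the library's implementation: LazyBufferSketch is a hypothetical class, the loader stands in for the file read, and the loads counter exists only to make the behavior observable.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a lazily loaded buffer with an explicit reset.
final class LazyBufferSketch {
    private final Supplier<Object> loader; // stands in for reading from file
    private Object data;
    private boolean isDataLoaded;
    int loads; // number of times the loader actually ran

    LazyBufferSketch(Supplier<Object> loader) {
        this.loader = loader;
    }

    // Returns the cached buffer, loading it only on first use.
    Object getData() {
        if (!isDataLoaded) {
            data = loader.get();
            isDataLoaded = true;
            loads++;
        }
        return data;
    }

    // Drops the buffer so that the next getData() loads again.
    void clearData() {
        data = null;
        isDataLoaded = false;
    }

    public static void main(String[] args) {
        LazyBufferSketch dset = new LazyBufferSketch(() -> new int[] {1, 2, 3});
        dset.getData();
        dset.getData();                 // served from memory
        System.out.println(dset.loads); // 1
        dset.clearData();
        dset.getData();                 // loaded again
        System.out.println(dset.loads); // 2
    }
}
```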
public final int getHeight()
This function is used by GUI applications such as HDFView. GUI applications display a dataset in a 2D table or 2D image. The display order is specified by the index array selectedIndex as follows:

    int[] selectedIndex = dataset.getSelectedIndex();
    selectedIndex[0] = 0;
    selectedIndex[1] = 1;

getSelectedIndex(), getWidth()
public final int getWidth()
This function is used by GUI applications such as HDFView. GUI applications display a dataset in a 2D table or 2D image. The display order is specified by the index array selectedIndex as follows:

    int[] selectedIndex = dataset.getSelectedIndex();
    selectedIndex[0] = 0;
    selectedIndex[1] = 1;

getSelectedIndex(), getHeight()
public final int[] getSelectedIndex()
selectedIndex[] is provided for two purposes:
For example, for a four-dimensional dataset, if selectedIndex[] = {1, 2, 3}, then dim[1] is selected as row index, dim[2] is selected as column index and dim[3] is selected as depth index.
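One way to read this mapping is sketched below. It assumes that the displayed height is the selected size of the dimension named by selectedIndex[0] (rows) and the width is the selected size of the dimension named by selectedIndex[1] (columns); that assumption, the class DisplayAxesSketch and its methods are illustrative and are not taken from the library source.

```java
// Hypothetical sketch: derive the 2D display extent from the selection
// sizes and the display-order index array.
final class DisplayAxesSketch {

    // Rows: selected size of the dimension chosen as the row index.
    static long height(long[] selectedDims, int[] selectedIndex) {
        return selectedDims[selectedIndex[0]];
    }

    // Columns: selected size of the dimension chosen as the column index.
    static long width(long[] selectedDims, int[] selectedIndex) {
        return selectedDims[selectedIndex[1]];
    }

    public static void main(String[] args) {
        long[] selectedDims = {1, 50, 25, 1}; // selection sizes per dimension
        int[] selectedIndex = {1, 2, 0};      // dim1 = rows, dim2 = columns, dim0 = depth
        System.out.println(height(selectedDims, selectedIndex)); // 50
        System.out.println(width(selectedDims, selectedIndex));  // 25
    }
}
```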
public final String getCompression()
For example, "SZIP: Pixels per block = 8: H5Z_FILTER_CONFIG_DECODE_ENABLED".
public final String getFilters()
public final String getStorageLayout()
public final String getStorage()
public final long[] getChunkSize()
@Deprecated public static Object convertFromUnsignedC(Object data_in)
convertFromUnsignedC(Object, Object)
data_in
- the object data
public static Object convertFromUnsignedC(Object data_in, Object data_out)
Since Java does not support unsigned integers, values of unsigned C-type integers must be converted into the appropriate Java integer type. Otherwise, the data values will not be displayed correctly. For example, if an unsigned C byte, x = 200, is stored into a Java byte y, y will be -56 instead of the correct value of 200.
Unsigned C integers are upgraded to Java integers according to the following table:
Unsigned C Integer | Java Integer |
---|---|
unsigned byte | signed short |
unsigned short | signed int |
unsigned int | signed long |
unsigned long | signed long |
If memory data of unsigned integers is converted by convertFromUnsignedC(), convertToUnsignedC() must be called to convert the data back to unsigned C before data is written into file.
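The first three rows of the table can be demonstrated with standard Java methods. The sketch below is an illustration under that assumption and is not the body of convertFromUnsignedC(); UnsignedWideningSketch and its widen overloads are hypothetical names. It relies on Byte.toUnsignedInt, Short.toUnsignedInt and Integer.toUnsignedLong, which are available from Java 8.

```java
// Hypothetical illustration of widening unsigned C integers into the
// next larger signed Java type so that their values are preserved.
final class UnsignedWideningSketch {

    // unsigned byte -> signed short
    static short[] widen(byte[] in) {
        short[] out = new short[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (short) Byte.toUnsignedInt(in[i]);
        }
        return out;
    }

    // unsigned short -> signed int
    static int[] widen(short[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Short.toUnsignedInt(in[i]);
        }
        return out;
    }

    // unsigned int -> signed long
    static long[] widen(int[] in) {
        long[] out = new long[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Integer.toUnsignedLong(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte y = (byte) 200;                               // bit pattern of unsigned 200
        System.out.println(y);                             // -56
        System.out.println(widen(new byte[] {y})[0]);      // 200
    }
}
```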
data_in
- the input 1D array of the unsigned C-type integers.
data_out
- the output converted (or upgraded) 1D array of Java integers.
convertToUnsignedC(Object, Object)
@Deprecated public static Object convertToUnsignedC(Object data_in)
convertToUnsignedC(Object, Object)
data_in
- the input 1D array of the unsigned C-type integers.
public static Object convertToUnsignedC(Object data_in, Object data_out)
If memory data of unsigned integers is converted by convertFromUnsignedC(), convertToUnsignedC() must be called to convert the data back to unsigned C before data is written into file.
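The reverse direction amounts to a narrowing cast that keeps the low-order bits. The following sketch shows the idea with plain Java casts; it is not the body of convertToUnsignedC(), and UnsignedNarrowingSketch and its narrow overloads are hypothetical names.

```java
// Hypothetical illustration of narrowing widened Java integers back to the
// bit patterns of the original unsigned C integers.
final class UnsignedNarrowingSketch {

    // signed short -> unsigned byte bit pattern
    static byte[] narrow(short[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) in[i]; // keeps the low 8 bits
        }
        return out;
    }

    // signed int -> unsigned short bit pattern
    static short[] narrow(int[] in) {
        short[] out = new short[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (short) in[i]; // keeps the low 16 bits
        }
        return out;
    }

    // signed long -> unsigned int bit pattern
    static int[] narrow(long[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (int) in[i]; // keeps the low 32 bits
        }
        return out;
    }

    public static void main(String[] args) {
        byte raw = narrow(new short[] {200})[0];
        // The Java value prints as -56, but the stored bits are 0xC8 (200 unsigned).
        System.out.println(raw + " -> 0x" + Integer.toHexString(raw & 0xFF));
    }
}
```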
data_in
- the input array of the Java integer.
data_out
- the output array of the unsigned C-type integer.
convertFromUnsignedC(Object, Object)
public static final String[] byteToString(byte[] bytes, int length)
A C-string is an array of chars while a Java String is an object. When a string dataset is read into a Java application, the data is stored in an array of Java bytes. byteToString() is used to convert the array of bytes into an array of Java strings so that applications can display and modify the data content.
For example, the content of a two-element C string dataset is {"ABC", "abc"}. Java applications will read the data into a byte array of {65, 66, 67, 97, 98, 99}. byteToString(bytes, 3) returns an array of Java Strings with strs[0]="ABC" and strs[1]="abc".
If memory data of strings is converted to Java Strings, stringToByte() must be called to convert the memory data back to byte array before data is written to file.
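The fixed-length split described above can be sketched with the standard library alone. This is an illustration, not the library's byteToString(); FixedStringSketch and split are hypothetical names, and two details are assumptions made for the example: bytes are decoded as US-ASCII, and each string ends at the first NUL byte inside its slot.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: split a byte array into strings of a fixed slot length.
final class FixedStringSketch {

    static String[] split(byte[] bytes, int length) {
        int n = bytes.length / length;
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            int offset = i * length;
            // Assumption: a NUL byte terminates the string within its slot.
            int len = 0;
            while (len < length && bytes[offset + len] != 0) {
                len++;
            }
            out[i] = new String(bytes, offset, len, StandardCharsets.US_ASCII);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = {65, 66, 67, 97, 98, 99};
        String[] strs = split(bytes, 3);
        System.out.println(strs[0] + " " + strs[1]); // ABC abc
    }
}
```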
bytes
- the array of bytes to convert.
length
- the length of string.
stringToByte(String[], int)
public static final byte[] stringToByte(String[] strings, int length)
If memory data of strings is converted to Java Strings, stringToByte() must be called to convert the memory data back to byte array before data is written to file.
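The packing direction can be sketched in the same way. Again this is an illustration rather than the library's stringToByte(); FixedStringPackSketch and pack are hypothetical names, and the handling of short and long strings (zero padding and truncation to the slot length) and the US-ASCII encoding are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: pack strings into consecutive fixed-length byte slots.
final class FixedStringPackSketch {

    static byte[] pack(String[] strings, int length) {
        byte[] out = new byte[strings.length * length]; // zero-filled by default
        for (int i = 0; i < strings.length; i++) {
            byte[] src = strings[i].getBytes(StandardCharsets.US_ASCII);
            // Assumption: longer strings are truncated, shorter ones NUL-padded.
            System.arraycopy(src, 0, out, i * length, Math.min(src.length, length));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = pack(new String[] {"ABC", "abc"}, 3);
        System.out.println(Arrays.toString(bytes)); // [65, 66, 67, 97, 98, 99]
    }
}
```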
strings
- the array of string.
length
- the length of string.
byteToString(byte[] bytes, int length)
public final String[] getDimNames()
Some datasets have pre-defined names for each dimension such as "Latitude" and "Longitude". getDimNames() returns these pre-defined names.
public boolean isString(int tid)
tid
- The data type identifier.
public int getSize(int tid)
tid
- The data type identifier.
public final Class getOriginalClass()
public boolean isEnumConverted()
public void setEnumConverted(boolean b)
b
- the enumConverted to set
Copyright © 2017. All Rights Reserved.