h5repack
Introduction
With h5repack, you can write an HDF5 file to a new file and change the layout for objects in the new file.
Usage
h5repack [OPTIONS] file1 file2
- file1 Input HDF5 File
- file2 Output HDF5 File
Error Report Option
- --enable-error-stack Prints messages from the HDF5 error stack as they occur. The optional value 2 also prints file open errors (--enable-error-stack=2).
Options
- --help Print a usage message and exit
- --verbose=N Verbose mode, print object information. N is an integer greater than 1; a value of 2 displays read/write timing
- --version Print the library version number and exit
- --native Use a native HDF5 type when repacking
- --page-buffer-size=N Set the page buffer cache size; N is a non-negative integer
- --src-vol-value Value (ID) of the VOL connector to use for opening the input HDF5 file specified
- --src-vol-name Name of the VOL connector to use for opening the input HDF5 file specified
- --src-vol-info VOL-specific info to pass to the VOL connector used for opening the input HDF5 file specified
- --dst-vol-value Value (ID) of the VOL connector to use for opening the output HDF5 file specified
- --dst-vol-name Name of the VOL connector to use for opening the output HDF5 file specified
- --dst-vol-info VOL-specific info to pass to the VOL connector used for opening the output HDF5 file specified
- --src-vfd-value Value (ID) of the VFL driver to use for opening the input HDF5 file specified
- --src-vfd-name Name of the VFL driver to use for opening the input HDF5 file specified
- --src-vfd-info VFD-specific info to pass to the VFL driver used for opening the input HDF5 file specified
- --dst-vfd-value Value (ID) of the VFL driver to use for opening the output HDF5 file specified
- --dst-vfd-name Name of the VFL driver to use for opening the output HDF5 file specified
- --dst-vfd-info VFD-specific info to pass to the VFL driver used for opening the output HDF5 file specified
- --latest Use latest version of file format. This option takes precedence over the options --low and --high
- --low=BOUND The low bound for library release versions to use when creating objects in the file (default is H5F_LIBVER_EARLIEST)
- --high=BOUND The high bound for library release versions to use when creating objects in the file (default is H5F_LIBVER_LATEST)
- --merge Follow external soft link recursively and merge data
- --prune Do not follow external soft links and remove link
- --merge --prune Follow external link, merge data and remove dangling link
- --compact=L1 Maximum number of links in header messages
- --indexed=L2 Minimum number of links in the indexed format
- --ssize=S[:F] Shared object header message minimum size
- --minimum=M Do not apply the filter to datasets smaller than M
- --file=E Name of file E with the --file and --layout options
- --ublock=U Name of file U with user block data to be added
- --block=B Size of user block to be added
- --metadata_block_size=A Metadata block size for H5Pset_meta_block_size
- --threshold=T Threshold value for H5Pset_alignment
- --alignment=A Alignment value for H5Pset_alignment
- --sort_by=Q Sort groups and attributes by index Q
- --sort_order=Z Sort groups and attributes by order Z
- --filter=FILT Filter type
- --layout=LAYT Layout type
- --fs_strategy=FS_STRATEGY File space management strategy for H5Pset_file_space_strategy
- --fs_persist=FS_PERSIST Persisting or not persisting free-space for H5Pset_file_space_strategy
- --fs_threshold=FS_THRESHOLD Free-space section threshold for H5Pset_file_space_strategy
- --fs_pagesize=FS_PAGESIZE File space page size for H5Pset_file_space_page_size
Arguments to Certain Options
- M - is an integer greater than 1, size of dataset in bytes (default is 0)
- E - is a filename.
- S - is an integer
- U - is a filename.
- T - is an integer
- A - is an integer greater than zero
- Q - is the sort index type for the input file. It can be "name" or "creation_order" (default)
- Z - is the sort order type for the input file. It can be "descending" or "ascending" (default)
- B - is the user block size, any value that is 512 or greater and is a power of 2 (1024 default)
- F - is the shared object header message type, any of <dspace|dtype|fill|pline|attr>. If F is not specified, S applies to all messages
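The constraint on B (a power of 2 that is 512 or greater) can be checked before h5repack is called. The following is a minimal sketch in POSIX shell; the function name is_valid_ublock_size and the file name ublock.bin are hypothetical and not part of the HDF5 tools.

```shell
# Hypothetical helper: succeeds if $1 is a valid user block size for --block,
# i.e. a decimal integer that is a power of 2 and at least 512.
is_valid_ublock_size() {
    n="$1"
    # Reject empty input and anything that is not a plain non-negative integer.
    case "$n" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Enforce the lower bound first; this also keeps 0 out of the loop below.
    [ "$n" -ge 512 ] || return 1
    # Divide out factors of 2; a power of 2 reduces exactly to 1.
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# Possible use (left as a comment so the sketch runs without HDF5 installed):
#   is_valid_ublock_size 2048 && h5repack --ublock=ublock.bin --block=2048 file1 file2
```

The check is done in the shell only to catch typos early; h5repack remains the authority on which values it accepts.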
Library Version Bounds
BOUND is an integer indicating the library release versions to use when creating objects in the file (see H5Pset_libver_bounds()). For example, 0 selects H5F_LIBVER_EARLIEST and 1 selects H5F_LIBVER_V18.
File Strategy Settings
FS_STRATEGY is a string indicating the file space strategy used:
- FSM_AGGR The mechanisms used in managing file space are free-space managers, aggregators and virtual file driver.
- PAGE The mechanisms used in managing file space are free-space managers with embedded paged aggregation and virtual file driver.
- AGGR The mechanisms used in managing file space are aggregators and virtual file driver.
- NONE The mechanisms used in managing file space are virtual file driver.
- The default strategy when not set is FSM_AGGR without persisting free-space.
- FS_PERSIST is 1 for persisting free-space or 0 for not persisting free-space. The default when not set is not persisting free-space. The value is ignored for AGGR and NONE strategies.
- FS_THRESHOLD is the minimum size (in bytes) of free-space sections to be tracked by the library. The default when not set is 1. The value is ignored for AGGR and NONE strategies.
- FS_PAGESIZE is the size (in bytes) >=512 that is used by the library when the file space strategy PAGE is used. The default when not set is 4096.
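The four file space options are usually given together. The sketch below assembles one such command line for the PAGE strategy; the specific values (persisted free-space, a threshold of 1, an 8192-byte page) are illustrative assumptions, not recommendations. The command is printed rather than executed so the snippet does not require HDF5 to be installed.

```shell
# Illustrative values (assumptions chosen for this example).
fs_strategy="PAGE"   # one of FSM_AGGR, PAGE, AGGR, NONE
fs_persist=1         # 1 = persist free-space, 0 = do not persist
fs_threshold=1       # minimum size in bytes of tracked free-space sections
fs_pagesize=8192     # must be >= 512; used with the PAGE strategy

# Assemble the command line from the individual settings.
cmd="h5repack --fs_strategy=${fs_strategy} --fs_persist=${fs_persist} --fs_threshold=${fs_threshold} --fs_pagesize=${fs_pagesize} file1 file2"

# Print the command; run it manually once file1 exists.
echo "$cmd"
```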
Applying a Third-party Filter
FILT - is a string with the format:
- <objects list>:<name of filter>=<filter parameters>
- <objects list> is a comma separated list of object names, meaning apply compression only to those objects. If no names are specified, the filter is applied to all objects
- <name of filter> can be:
  - GZIP to apply the HDF5 GZIP filter (GZIP compression)
  - SZIP to apply the HDF5 SZIP filter (SZIP compression)
  - SHUF to apply the HDF5 shuffle filter
  - FLET to apply the HDF5 checksum filter
  - NBIT to apply the HDF5 NBIT filter (NBIT compression)
  - SOFF to apply the HDF5 Scale/Offset filter
  - UD to apply a user defined filter
  - NONE to remove all filters
- <filter parameters> is optional filter parameter information
  - GZIP=<deflation level> from 1-9
  - SZIP=<pixels per block,coding> pixels per block is an even number in 2-32 and coding method is either EC or NN
  - SHUF (no parameter)
  - FLET (no parameter)
  - NBIT (no parameter)
  - SOFF=<scale_factor,scale_type> scale_factor is an integer and scale_type is either IN or DS
  - UD=<filter_number,filter_flag,cd_value_count,value1[,value2,...,valueN]>
    - Required values: filter_number, filter_flag, cd_value_count, value1
    - Optional values: value2 to valueN
    - filter_flag: 1 is OPTIONAL, 0 is MANDATORY
  - NONE (no parameter)
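The FILT format above can be assembled mechanically from its three parts. The following is a minimal sketch; the helper name make_filt is hypothetical, and it only concatenates strings without validating filter names or parameters.

```shell
# Hypothetical helper: build a FILT string of the form
#   <objects list>:<name of filter>=<filter parameters>
# $1 = comma separated object names (empty means all objects)
# $2 = filter name (GZIP, SZIP, SHUF, FLET, NBIT, SOFF, UD, NONE)
# $3 = filter parameters (empty for filters that take none)
make_filt() {
    objects="$1"
    name="$2"
    params="$3"
    filt="$name"
    # Append "=<parameters>" only when parameters were supplied.
    if [ -n "$params" ]; then
        filt="${filt}=${params}"
    fi
    # Prefix "<objects>:" only when an object list was supplied.
    if [ -n "$objects" ]; then
        filt="${objects}:${filt}"
    fi
    echo "$filt"
}

# Examples of the strings produced:
make_filt "" GZIP 1                   # GZIP=1
make_filt dset1 SZIP 8,NN             # dset1:SZIP=8,NN
make_filt dset3,dset4,dset5 NONE ""   # dset3,dset4,dset5:NONE
make_filt "" UD 307,0,1,9             # UD=307,0,1,9
```

The result would be passed as `--filter="$(make_filt dset1 SZIP 8,NN)"`.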
Layout Settings
LAYT - is a string with the format:
- <objects list>:<layout type>=<layout parameters>
- <objects list> is a comma separated list of object names, meaning that layout information is supplied for those objects. If no names are specified, the layout type is applied to all objects
- <layout type> can be:
  - CHUNK to apply chunking layout
  - COMPA to apply compact layout
  - CONTI to apply contiguous layout
- <layout parameters> is optional layout information
  - CHUNK=DIM[xDIM...xDIM], the chunk size of each dimension
  - COMPA (no parameter)
  - CONTI (no parameter)
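For chunked layouts, the DIM[xDIM...xDIM] part is simply the chunk dimensions joined with "x". The sketch below shows one way to build the LAYT string in POSIX shell; the helper name make_chunk_layout is hypothetical, and the dimensions are not validated.

```shell
# Hypothetical helper: build "<objects list>:CHUNK=DIMxDIM...".
# $1 = comma separated object names (empty means all objects)
# remaining arguments = chunk size of each dimension, in order
make_chunk_layout() {
    objects="$1"
    shift
    dims=""
    # Join the remaining arguments with "x".
    for d in "$@"; do
        if [ -z "$dims" ]; then
            dims="$d"
        else
            dims="${dims}x${d}"
        fi
    done
    layt="CHUNK=${dims}"
    # Prefix "<objects>:" only when an object list was supplied.
    if [ -n "$objects" ]; then
        layt="${objects}:${layt}"
    fi
    echo "$layt"
}

make_chunk_layout dset1,dset2 20 10   # dset1,dset2:CHUNK=20x10
```

The result would be passed as `--layout="$(make_chunk_layout dset1,dset2 20 10)"`.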
NOTE
The environment variable H5TOOLS_BUFSIZE can be set to the number of MBs to change the default hyperslab buffer size from 32MB.
For example, in csh, setenv H5TOOLS_BUFSIZE 64 doubles the hyperslab buffer.
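In sh-compatible shells (sh, bash, zsh) the variable is set with export rather than setenv. A minimal sketch, assuming a 64 MB buffer is wanted:

```shell
# POSIX sh equivalent of the csh form "setenv H5TOOLS_BUFSIZE 64".
export H5TOOLS_BUFSIZE=64

# To limit the setting to a single run, prefix the command instead
# (shown as a comment so this sketch does not require h5repack):
#   H5TOOLS_BUFSIZE=64 h5repack --filter=GZIP=1 file1 file2

echo "H5TOOLS_BUFSIZE=${H5TOOLS_BUFSIZE}"
```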
Usage Examples
- 1) h5repack --verbose --filter=GZIP=1 file1 file2
GZIP compression with level 1 to all objects
- 2) h5repack --verbose --filter=dset1:SZIP=8,NN file1 file2
SZIP compression with 8 pixels per block and NN coding method to object dset1
- 3) h5repack --verbose --layout=dset1,dset2:CHUNK=20x10 --filter=dset3,dset4,dset5:NONE file1 file2
Chunked layout, with a chunk size of 20x10, to objects dset1 and dset2
and remove filters from objects dset3, dset4, dset5
- 4) h5repack --latest --compact=10 --ssize=20:dtype file1 file2
Using latest file format with maximum compact group size of 10 and
minimum shared datatype size of 20
- 5) h5repack --filter=SHUF --filter=GZIP=1 file1 file2
Add both filters SHUF and GZIP in this order to all datasets
- 6) h5repack --filter=UD=307,0,1,9 file1 file2
Add bzip2 filter to all datasets
- 7) h5repack --low=0 --high=1 file1 file2
Set low=H5F_LIBVER_EARLIEST and high=H5F_LIBVER_V18 via
H5Pset_libver_bounds() when creating the repacked file, file2