Sample Perl wrappers for HDF5 were created to illustrate how one might store genomic sequence data in HDF5, and to engage the bioinformatics community in these investigations.
The software distribution contains two modules: HDFPerl (wrappers) and BioHDF_Perl (high-level APIs).
HDFPerl. Wrappers for a subset of the HDF5 functions have been developed to provide a simple Perl interface to HDF5.
BioHDF_Perl. A second Perl API has been implemented to illustrate how one might import genomic sequence data from FASTA format files into HDF5. This API also creates indexes in HDF5 that allow limited search operations on the data.
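The import-and-index idea can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the BioHDF_Perl API: the function names (parse_fasta, build_store, fetch) and the in-memory layout are assumptions made for illustration. A plain byte buffer stands in for an HDF5 dataset, and a dictionary stands in for an index stored alongside it.

```python
import io

def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    seq_id, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(parts)
            # Use the first whitespace-delimited token of the header as the identifier.
            tokens = line[1:].split()
            seq_id, parts = (tokens[0] if tokens else ""), []
        elif seq_id is not None:
            parts.append(line)
    if seq_id is not None:
        yield seq_id, "".join(parts)

def build_store(handle):
    """Concatenate all sequences into one buffer and index them by identifier.

    The buffer is a stand-in for a dataset; the index maps each identifier
    to its (offset, length) within that buffer.
    """
    data = bytearray()
    index = {}
    for seq_id, seq in parse_fasta(handle):
        index[seq_id] = (len(data), len(seq))
        data.extend(seq.encode("ascii"))
    return bytes(data), index

def fetch(data, index, seq_id, start=0, length=None):
    """Return a sequence, or a slice of it, located through the index."""
    offset, seq_len = index[seq_id]
    end = seq_len if length is None else min(seq_len, start + length)
    return data[offset + start:offset + end].decode("ascii")

# Hypothetical two-record input, used only to exercise the sketch.
fasta_text = ">seq1 example record\nACGTAC\nGTAC\n>seq2\nTTGACA\n"
data, index = build_store(io.StringIO(fasta_text))
print(fetch(data, index, "seq2", start=2, length=3))
```

Because the index records where each sequence begins, a lookup by identifier does not require rescanning the text, which is the kind of limited search operation such an index is meant to support.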
Performance Study: HDF5 vs. FASTA
A performance study compared HDF5 with the FASTA format in terms of (a) storage use and (b) time to access genomic sequence data, using traditional text-management tools for FASTA and BioHDF_Perl for HDF5. The results show that HDF5 can provide storage efficiency through compression while still allowing fast random access, because it can store indexes alongside compressed, chunked data.
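Why chunking preserves random access under compression can be shown with a small sketch. It uses Python's standard zlib module rather than HDF5, and the chunk size, function names, and synthetic input are all assumptions for illustration; it does not reproduce the study or its measurements. Each chunk is compressed independently, so reading a range requires decompressing only the chunks that overlap it.

```python
import zlib

CHUNK_SIZE = 1024  # hypothetical chunk size in bytes, chosen only for illustration

def compress_chunks(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and compress each one independently."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def read_range(chunks, start, length, chunk_size=CHUNK_SIZE):
    """Return data[start:start+length], decompressing only the overlapping chunks."""
    if length <= 0:
        return b""
    first = start // chunk_size
    last = min((start + length - 1) // chunk_size, len(chunks) - 1)
    block = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    local = start - first * chunk_size  # position of `start` within the block
    return block[local:local + length]

# Synthetic, highly repetitive input; real sequence data will compress differently.
sequence = b"ACGT" * 5000
chunks = compress_chunks(sequence)
stored = sum(len(c) for c in chunks)

print(len(sequence), stored, read_range(chunks, 4098, 8))
```

Smaller chunks reduce the amount of data decompressed per read but generally compress less well, so the chunk size is a trade-off between storage use and access time.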