BioHDF version 0.3 alpha
Scalable NGS Data Storage Based on HDF5
bioh5g_reads

Represents NGS reads (FASTQ, etc. entries).

Data Structures

struct  bioh5g_read_data
 Read data container.

Typedefs

typedef struct _bioh5g_reads * bioh5g_reads
 BioHDF reads collection handle.
typedef struct _bioh5g_reads_iterator * bioh5g_reads_iterator
 BioHDF reads iterator handle.
typedef struct _bioh5g_reads_properties * bioh5g_reads_properties
 BioHDF reads creation properties.

Enumerations

enum  bioh5g_reads_type { BASE_SPACE, COLOR_SPACE }
 Describes the data "space" of the reads (base, color, etc.)
enum  bioh5g_reads_format { FASTQ_FORMAT, FASTA_FORMAT }
 The text file format for read I/O.

Functions

BIOHDF_API biohdf_error BIOH5Gcheck_reads_presence (biohdf_file file, const char *path, int *presence)
 Test if a reads collection exists.
BIOHDF_API biohdf_error BIOH5Gcreate_reads_collection (biohdf_file file, bioh5g_reads_properties properties, const char *path, bioh5g_reads *reads)
 Create (and open) a new reads collection.
BIOHDF_API biohdf_error BIOH5Gopen_reads_collection (biohdf_file file, const char *path, biohdf_open_mode mode, bioh5g_reads *reads)
 Open an existing reads collection.
BIOHDF_API biohdf_error BIOH5Gclose_reads_collection (bioh5g_reads *reads)
 Close an open reads collection.
BIOHDF_API biohdf_error BIOH5Gget_reads_count (const bioh5g_reads reads, int64_t *count)
 Get the number of stored reads in a collection.
BIOHDF_API biohdf_error BIOH5Gcreate_reads_iterator (const bioh5g_reads reads, bioh5g_reads_iterator *iter)
 Create an iterator for a reads collection.
BIOHDF_API biohdf_error BIOH5Gdestroy_reads_iterator (bioh5g_reads_iterator *iter)
 Destroy an iterator for a reads collection.
BIOHDF_API biohdf_error BIOH5Gadd_read (const bioh5g_reads reads, const bioh5g_read_data *data)
 Add a read to a collection.
BIOHDF_API biohdf_error BIOH5Gget_index_of_last_added_read (const bioh5g_reads reads, int64_t *index)
 Get the index of the last read that was added.
BIOHDF_API biohdf_error BIOH5Gget_next_read (bioh5g_reads_iterator iter, int64_t *index, bioh5g_read_data **data)
 Get the next read from a reads collection.
BIOHDF_API biohdf_error BIOH5Gget_read (const bioh5g_reads reads, int64_t index, bioh5g_read_data **data)
 Given a read index, get the read from a reads collection.
BIOHDF_API biohdf_error BIOH5Gfree_read (bioh5g_read_data **data)
 Free read data that has been obtained from the library.
BIOHDF_API biohdf_error BIOH5Gcreate_read_string (const bioh5g_read_data *read, bioh5g_reads_format format, char **read_string)
 Create a read string in a given format (FASTQ, etc.)
BIOHDF_API biohdf_error BIOH5Gwrite_read_to_stream (const bioh5g_read_data *read, bioh5g_reads_format format, FILE *stream)
 Output a read to a stream in a given format.

Functions: Data Accessors

BIOHDF_API biohdf_error BIOH5Gcreate_read_data (bioh5g_read_data **data)
 Create a read data container.
BIOHDF_API biohdf_error BIOH5Gget_read_identifier (bioh5g_read_data *data, char **identifier)
 Get the read identifier.
BIOHDF_API biohdf_error BIOH5Gset_read_identifier (bioh5g_read_data *data, char *identifier)
 Set the read identifier.
BIOHDF_API biohdf_error BIOH5Gget_read_sequence (bioh5g_read_data *data, char **sequence)
 Get the read sequence.
BIOHDF_API biohdf_error BIOH5Gset_read_sequence (bioh5g_read_data *data, char *sequence)
 Set the read sequence.
BIOHDF_API biohdf_error BIOH5Gget_read_quality_values (bioh5g_read_data *data, char **quality_values)
 Get the read quality values.
BIOHDF_API biohdf_error BIOH5Gset_read_quality_values (bioh5g_read_data *data, char *quality_values)
 Set the read quality values.

Functions: Collection Creation Properties

BIOHDF_API biohdf_error BIOH5Gcreate_reads_properties (bioh5g_reads_properties *props)
 Create a reads properties container.
BIOHDF_API biohdf_error BIOH5Gdestroy_reads_properties (bioh5g_reads_properties *props)
 Destroy a reads properties container.
BIOHDF_API biohdf_error BIOH5Gset_reads_properties_reads_type (bioh5g_reads_properties props, bioh5g_reads_type reads_type)
 Set reads properties reads type.
BIOHDF_API biohdf_error BIOH5Gset_reads_properties_chunk_size (bioh5g_reads_properties props, int64_t chunk_size)
 Set reads properties chunk size.
BIOHDF_API biohdf_error BIOH5Gset_reads_properties_compression_level (bioh5g_reads_properties props, compression_level level)
 Set reads properties compression level.
BIOHDF_API biohdf_error BIOH5Gset_reads_properties_sequences_scheme (bioh5g_reads_properties props, biohdf_string_storage_scheme scheme)
 Set reads properties sequences storage scheme.
BIOHDF_API biohdf_error BIOH5Gset_reads_properties_identifiers_scheme (bioh5g_reads_properties props, biohdf_string_storage_scheme scheme)
 Set reads properties identifiers storage scheme.
BIOHDF_API biohdf_error BIOH5Gset_reads_properties_sequences_length (bioh5g_reads_properties props, size_t length)
 Set reads properties sequences length.
BIOHDF_API biohdf_error BIOH5Gset_reads_properties_identifiers_length (bioh5g_reads_properties props, size_t length)
 Set reads properties identifiers length.

Detailed Description

Represents NGS reads (FASTQ, etc. entries).


Enumeration Type Documentation

enum bioh5g_reads_format

The text file format for read I/O.

BioHDF stores the FASTQ quality values verbatim and makes no attempt to convert or normalize them between quality score schemes.

Enumerator:
FASTQ_FORMAT  FASTQ format.
FASTA_FORMAT  FASTA (or CSFASTA) format.

enum bioh5g_reads_type

Describes the data "space" of the reads (base, color, etc.)

Reads are stored verbatim and not converted between spaces.

Enumerator:
BASE_SPACE   Base/sequence space reads.
COLOR_SPACE  Color space (SOLiD) reads.


Function Documentation

BIOHDF_API biohdf_error BIOH5Gadd_read ( const bioh5g_reads  reads,
const bioh5g_read_data *  data
)

Add a read to a collection.

Parameters:
reads  A BioHDF reads handle
data   A BioHDF read
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure


BIOHDF_API biohdf_error BIOH5Gcheck_reads_presence ( biohdf_file  file,
const char *  path,
int *  presence 
)

Test if a reads collection exists.

This function reports TRUE through presence if a collection of the same type exists at the named location. If any other HDF5 or BioHDF object with that same name exists, presence is also set to TRUE and an error code is returned. The assumption is that Bad Code(tm) that does not check return values will then be more likely to attempt to open the object (and fail) than to create one (which may partially succeed, making a mess).

Parameters:
file            A BioHDF file handle
path            The BioHDF path to the collection
[out] presence  TRUE if the collection exists, FALSE if it does not.
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
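
The presence rules above suggest a simple decision for calling code: look at the return value first, and only then at presence. The following is a minimal sketch of that decision, not library code. The names sketch_action, sketch_choose_action and SKETCH_NO_ERROR are hypothetical, and modeling BIOHDF_NO_ERROR as 0 and TRUE/FALSE as 1/0 is an assumption made for illustration.

```c
#include <assert.h>

/* Assumption: success is modeled as 0; the real value of BIOHDF_NO_ERROR
 * is not given on this page. */
#define SKETCH_NO_ERROR 0

/* Hypothetical outcomes a caller might derive from a presence check. */
typedef enum {
    SKETCH_OPEN_EXISTING,  /* a reads collection is there: open it        */
    SKETCH_CREATE_NEW,     /* nothing is there: safe to create            */
    SKETCH_PATH_BLOCKED    /* the check failed, or another object type
                              already uses the path: do neither           */
} sketch_action;

/*
 * Decide what to do from the two outputs of a presence check.
 * err      : the returned error code
 * presence : the value written to the presence output parameter
 */
sketch_action sketch_choose_action(int err, int presence)
{
    /* An error wins over presence. This covers the documented case where
     * presence is TRUE but the object at the path is not a reads
     * collection. */
    if (err != SKETCH_NO_ERROR)
        return SKETCH_PATH_BLOCKED;

    /* Assumption: TRUE and FALSE are 1 and 0. */
    assert(presence == 0 || presence == 1);

    return presence ? SKETCH_OPEN_EXISTING : SKETCH_CREATE_NEW;
}
```

Code that ignores the return value and reads only presence would see TRUE in the blocked case and try to open the object, which is the failure mode the description prefers over a partial create.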


BIOHDF_API biohdf_error BIOH5Gclose_reads_collection ( bioh5g_reads *  reads)

Close an open reads collection.

This function will set the collection handle to NULL after freeing it.

Parameters:
[in,out] reads  A BioHDF reads handle
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
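
Taking a pointer to the handle is what allows the function to reset the caller's handle to NULL. The sketch below shows that general C pattern with hypothetical stand-ins (sketch_handle, sketch_open, sketch_close, sketch_roundtrip). It is not the BioHDF implementation, and rejecting an already-NULL handle with an error is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque handle, shaped like the typedefs on this page. */
typedef struct sketch_handle_ { int is_open; } *sketch_handle;

/* Allocate a handle and hand it back through the out-parameter. */
int sketch_open(sketch_handle *handle)
{
    if (handle == NULL)
        return -1;
    *handle = malloc(sizeof **handle);
    if (*handle == NULL)
        return -1;
    (*handle)->is_open = 1;
    return 0;
}

/* Free the handle and set the caller's copy to NULL. */
int sketch_close(sketch_handle *handle)
{
    /* Assumption: a NULL handle is reported as an error. */
    if (handle == NULL || *handle == NULL)
        return -1;
    free(*handle);
    *handle = NULL;
    return 0;
}

/* Open, close, then close again; returns the result of the second close. */
int sketch_roundtrip(void)
{
    sketch_handle h = NULL;
    if (sketch_open(&h) != 0)
        return 0;
    if (sketch_close(&h) != 0)
        return 0;
    assert(h == NULL);       /* the caller's handle was reset */
    return sketch_close(&h); /* rejected instead of freeing twice */
}
```

Because the handle is NULL after the first close, an accidental second close is caught by the parameter check rather than becoming a double free.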


BIOHDF_API biohdf_error BIOH5Gcreate_read_string ( const bioh5g_read_data *  read,
bioh5g_reads_format  format,
char **  read_string 
)

Create a read string in a given format (FASTQ, etc.)

The read sequence and identifier must not be NULL. If the format is FASTQ, the quality values must not be NULL. The sequence and quality values must also have the same length for FASTQ output.

Parameters:
read               The BioHDF read
format             The format for the output string
[out] read_string  The output string
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
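
The preconditions above can be illustrated outside the library. The following is a minimal, self-contained sketch and not BioHDF's implementation: sketch_format and sketch_read_string are hypothetical stand-ins for bioh5g_reads_format and BIOH5Gcreate_read_string, plain C strings stand in for bioh5g_read_data, and the four-line FASTQ and two-line FASTA layouts are the conventional ones rather than something this page specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for bioh5g_reads_format. */
typedef enum { SKETCH_FASTQ, SKETCH_FASTA } sketch_format;

/* Write one record into buf, or only measure it when buf is NULL and
 * size is 0. Returns the record length without the terminating NUL. */
static int sketch_format_record(char *buf, size_t size, const char *id,
                                const char *seq, const char *qual,
                                sketch_format fmt)
{
    if (fmt == SKETCH_FASTQ)
        return snprintf(buf, size, "@%s\n%s\n+\n%s\n", id, seq, qual);
    return snprintf(buf, size, ">%s\n%s\n", id, seq);
}

/*
 * Build a newly allocated record string.
 * Returns 0 on success, -1 if a precondition fails or allocation fails.
 * On success the caller releases *out with free().
 */
int sketch_read_string(const char *id, const char *seq, const char *qual,
                       sketch_format fmt, char **out)
{
    if (out == NULL)
        return -1;
    *out = NULL;

    /* The identifier and sequence are required for every format. */
    if (id == NULL || seq == NULL)
        return -1;

    /* FASTQ also needs one quality character per sequence character. */
    if (fmt == SKETCH_FASTQ &&
        (qual == NULL || strlen(qual) != strlen(seq)))
        return -1;

    /* First pass measures the record, second pass fills the buffer. */
    int length = sketch_format_record(NULL, 0, id, seq, qual, fmt);
    if (length < 0)
        return -1;

    char *buf = malloc((size_t)length + 1);
    if (buf == NULL)
        return -1;

    int written = sketch_format_record(buf, (size_t)length + 1,
                                       id, seq, qual, fmt);
    assert(written == length);
    (void)written; /* silences the unused warning when NDEBUG is set */

    *out = buf;
    return 0;
}
```

Quality values are optional for FASTA output in this sketch, which matches the rule that they are only required when the format is FASTQ.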


BIOHDF_API biohdf_error BIOH5Gcreate_reads_collection ( biohdf_file  file,
bioh5g_reads_properties  properties,
const char *  path,
bioh5g_reads *  reads
)

Create (and open) a new reads collection.

The collection handle returned by this function will be ready to accept I/O.

Parameters:
file         A BioHDF file handle
properties   Collection creation properties
path         The BioHDF path to the collection
[out] reads  A BioHDF reads handle
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure


BIOHDF_API biohdf_error BIOH5Gcreate_reads_iterator ( const bioh5g_reads  reads,
bioh5g_reads_iterator *  iter
)

Create an iterator for a reads collection.

Parameters:
reads       A BioHDF reads handle
[out] iter  An iterator for a reads collection
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure


BIOHDF_API biohdf_error BIOH5Gdestroy_reads_iterator ( bioh5g_reads_iterator *  iter)

Destroy an iterator for a reads collection.

The iterator is set to NULL as a part of deletion.

Parameters:
[in,out] iter  An iterator for a reads collection
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gfree_read ( bioh5g_read_data **  data)

Free read data that has been obtained from the library.

The data is set to NULL as a part of deletion.

Parameters:
[in,out] data  The BioHDF read
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gget_index_of_last_added_read ( const bioh5g_reads  reads,
int64_t *  index 
)

Get the index of the last read that was added.

Useful for adding SAM lines where a link to a given read must be created.

Parameters:
reads        A BioHDF reads handle
[out] index  The index of the last read that was added
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure


BIOHDF_API biohdf_error BIOH5Gget_next_read ( bioh5g_reads_iterator  iter,
int64_t *  index,
bioh5g_read_data **  data 
)

Get the next read from a reads collection.

When the iterator has finished traversing the collection, data will be NULL, index will be -1 and the return value will be BIOHDF_NO_ERROR.

Parameters:
iter         An iterator for a reads collection
[out] index  The index of this read
[out] data   The BioHDF read
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
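
Because the end of the collection is signaled with a success code, a traversal loop has to test the data pointer as well as the return value. The sketch below imitates that contract with hypothetical stand-ins (sketch_read, sketch_iterator, sketch_next_read, sketch_total_bases) over an in-memory array. It is not the library's iterator, and it assumes 0 as the success code. With the real API, each read obtained from the library would also be released with BIOH5Gfree_read; here the reads are borrowed from the array, so there is nothing to free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a read and an iterator. */
typedef struct { const char *sequence; } sketch_read;
typedef struct {
    const sketch_read *items;  /* backing array         */
    int64_t count;             /* number of reads in it */
    int64_t position;          /* next index to return  */
} sketch_iterator;

/*
 * Imitates the documented contract: at the end of the collection,
 * *data is NULL, *index is -1, and the call still reports success (0).
 */
int sketch_next_read(sketch_iterator *iter, int64_t *index,
                     const sketch_read **data)
{
    if (iter == NULL || index == NULL || data == NULL)
        return -1;
    if (iter->position >= iter->count) {
        *index = -1;
        *data = NULL;
        return 0;
    }
    *index = iter->position;
    *data = &iter->items[iter->position];
    iter->position++;
    return 0;
}

/* Sum the sequence lengths of all remaining reads; -1 on error. */
int64_t sketch_total_bases(sketch_iterator *iter)
{
    int64_t total = 0;
    int64_t index = 0;
    const sketch_read *read = NULL;

    for (;;) {
        if (sketch_next_read(iter, &index, &read) != 0)
            return -1;          /* a real failure */
        if (read == NULL)
            break;              /* end of collection, not an error */
        total += (int64_t)strlen(read->sequence);
    }
    assert(index == -1);        /* the end marker promised by the contract */
    return total;
}
```

Looping only while the return value indicates success would never terminate under this contract, which is why the NULL test on the data pointer is the loop's exit condition.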


BIOHDF_API biohdf_error BIOH5Gget_read ( const bioh5g_reads  reads,
int64_t  index,
bioh5g_read_data **  data 
)

Given a read index, get the read from a reads collection.

Parameters:
reads       A BioHDF reads handle
index       The index of this read
[out] data  The BioHDF read
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure


BIOHDF_API biohdf_error BIOH5Gget_reads_count ( const bioh5g_reads  reads,
int64_t *  count 
)

Get the number of stored reads in a collection.

Parameters:
reads        A BioHDF reads handle
[out] count  The number of reads in the collection
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure


BIOHDF_API biohdf_error BIOH5Gopen_reads_collection ( biohdf_file  file,
const char *  path,
biohdf_open_mode  mode,
bioh5g_reads *  reads
)

Open an existing reads collection.

Parameters:
file         A BioHDF file handle
path         The BioHDF path to the collection
mode         The access mode (read-only | read-write)
[out] reads  A BioHDF reads handle
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure


BIOHDF_API biohdf_error BIOH5Gwrite_read_to_stream ( const bioh5g_read_data *  read,
bioh5g_reads_format  format,
FILE *  stream 
)

Output a read to a stream in a given format.

This saves you from having to create temp strings that will just be dumped to a stream.

The read sequence and identifier must not be NULL. If the format is FASTQ, the quality values must not be NULL. The sequence and quality values must also have the same length for FASTQ output.

Parameters:
read    The BioHDF read
format  The format for the output string
stream  The output stream (can be stdout)
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
