BioHDF version 0.3 alpha
Scalable NGS Data Storage Based on HDF5
bioh5g_alignments

Represents NGS alignments (SAM, etc. entries).

Data Structures

struct  bioh5g_alignment_data
 Alignment data container.

Defines

#define BIOH5G_READS_PATH_ATTR   "READS_PATH"
 Attribute name: BioHDF path to the associated reads.
#define BIOH5G_INDEX_METHOD_ATTR   "INDEX_METHOD"
 Attribute name: Describes how the alignments are indexed.

Typedefs

typedef struct _bioh5g_alignments * bioh5g_alignments
 BioHDF alignments collection handle.
typedef struct _bioh5g_alignments_creation_properties * bioh5g_alignments_creation_properties
 BioHDF alignments creation properties.
typedef struct _bioh5g_alignments_iterator * bioh5g_alignments_iterator
 BioHDF alignments iterator handle.

Enumerations

enum  bioh5g_alignments_format { SAM_FORMAT }
 The text file format for alignment I/O.
enum  bioh5g_alignments_index_method { UNINDEXED = 0, REF_POS_SECONDARY = 1 }
 The indexing scheme used on the alignments in this collection.

Functions

BIOHDF_API biohdf_error BIOH5Gcheck_alignments_presence (const biohdf_file file, const char *path, int *presence)
 Test if an alignments collection exists.
BIOHDF_API biohdf_error BIOH5Gcreate_alignments_collection (const biohdf_file file, const bioh5g_alignments_creation_properties props, const char *path, bioh5g_alignments *aligns)
 Create (and open) a new alignments collection.
BIOHDF_API biohdf_error BIOH5Gopen_alignments_collection (const biohdf_file file, const char *path, biohdf_open_mode mode, bioh5g_alignments *aligns)
 Open an existing alignments collection.
BIOHDF_API biohdf_error BIOH5Gclose_alignments_collection (bioh5g_alignments *aligns)
 Close an open alignments collection.
BIOHDF_API biohdf_error BIOH5Gget_reads_path (const bioh5g_alignments aligns, char **reads_path)
 Get the path to the associated reads.
BIOHDF_API biohdf_error BIOH5Gget_alignments_count (const bioh5g_alignments aligns, int64_t *count)
 Get the number of stored alignments in a collection.
BIOHDF_API biohdf_error BIOH5Gcreate_alignments_iterator (const bioh5g_alignments aligns, bioh5g_alignments_iterator *iter)
 Create an iterator for an alignments collection.
BIOHDF_API biohdf_error BIOH5Gadd_alignments_iterator_range_filter (bioh5g_alignments_iterator iter, const char *reference, int32_t start, int32_t end)
 Add a reference region filter to an alignments iterator.
BIOHDF_API biohdf_error BIOH5Gadd_alignments_iterator_mapq_filter (bioh5g_alignments_iterator iter, unsigned char min_mapq)
 Add a SAM MAPQ filter to an alignments iterator.
BIOHDF_API biohdf_error BIOH5Gadd_alignments_iterator_flags_filter (bioh5g_alignments_iterator iter, uint32_t mask)
 Add a SAM FLAGS filter to an alignments iterator.
BIOHDF_API biohdf_error BIOH5Gdestroy_alignments_iterator (bioh5g_alignments_iterator *iter)
 Destroy an iterator for an alignments collection.
BIOHDF_API biohdf_error BIOH5Gadd_alignment (const bioh5g_alignments aligns, const bioh5g_alignment_data *data)
 Add an alignment to a collection.
BIOHDF_API biohdf_error BIOH5Gget_index_of_last_added_alignment (const bioh5g_alignments aligns, int64_t *index)
 Get the index of the last alignment that was added.
BIOHDF_API biohdf_error BIOH5Gget_next_alignment (bioh5g_alignments_iterator iter, int64_t *index, bioh5g_alignment_data **data)
 Get the next alignment from an alignments collection.
BIOHDF_API biohdf_error BIOH5Gget_alignment (const bioh5g_alignments aligns, int64_t index, bioh5g_alignment_data **data)
 Given an alignment index, get the alignment from an alignment collection.
BIOHDF_API biohdf_error BIOH5Gfree_alignment_data (bioh5g_alignment_data **data)
 Free alignment data that has been obtained from the library.
BIOHDF_API biohdf_error BIOH5Gcreate_alignments_index (bioh5g_alignments aligns, bioh5g_alignments_index_method method, biohdf_index_creation_properties props)
 Create an index for an alignments collection.
BIOHDF_API biohdf_error BIOH5Gstore_alignment_file_header (const bioh5g_alignments aligns, bioh5g_alignments_format format, const char *header)
 Store a file header from an alignment file (e.g. SAM).
BIOHDF_API biohdf_error BIOH5Gget_alignment_file_header (const bioh5g_alignments aligns, bioh5g_alignments_format *format, char **header)
 Get a stored alignment file header.
BIOHDF_API biohdf_error BIOH5Gcreate_alignment_string (const bioh5g_alignment_data *alignment, const bioh5g_read_data *read, bioh5g_alignments_format format, char **alignment_string)
 Create an alignment string in a particular format.
BIOHDF_API biohdf_error BIOH5Gwrite_alignment_to_stream (const bioh5g_alignment_data *alignment, const bioh5g_read_data *read, bioh5g_alignments_format format, FILE *stream)
 Write an alignment string in a particular format to an output stream.

Functions: Data Accessors

BIOHDF_API biohdf_error BIOH5Gcreate_alignment_data (bioh5g_alignment_data **data)
BIOHDF_API biohdf_error BIOH5Gget_alignment_read_index (bioh5g_alignment_data *data, int64_t *read_index)
BIOHDF_API biohdf_error BIOH5Gset_alignment_read_index (bioh5g_alignment_data *data, int64_t read_index)
BIOHDF_API biohdf_error BIOH5Gget_alignment_reference (bioh5g_alignment_data *data, char **reference)
BIOHDF_API biohdf_error BIOH5Gset_alignment_reference (bioh5g_alignment_data *data, char *reference)
BIOHDF_API biohdf_error BIOH5Gget_alignment_position (bioh5g_alignment_data *data, int32_t *position)
BIOHDF_API biohdf_error BIOH5Gset_alignment_position (bioh5g_alignment_data *data, int32_t position)
BIOHDF_API biohdf_error BIOH5Gget_alignment_length (bioh5g_alignment_data *data, int32_t *length)
BIOHDF_API biohdf_error BIOH5Gset_alignment_length (bioh5g_alignment_data *data, int32_t length)
BIOHDF_API biohdf_error BIOH5Gget_alignment_sam_mapq (bioh5g_alignment_data *data, unsigned char *sam_mapq)
BIOHDF_API biohdf_error BIOH5Gset_alignment_sam_mapq (bioh5g_alignment_data *data, unsigned char sam_mapq)
BIOHDF_API biohdf_error BIOH5Gget_alignment_sam_flags (bioh5g_alignment_data *data, uint32_t *sam_flags)
BIOHDF_API biohdf_error BIOH5Gset_alignment_sam_flags (bioh5g_alignment_data *data, uint32_t sam_flags)
BIOHDF_API biohdf_error BIOH5Gget_alignment_sam_cigar (bioh5g_alignment_data *data, char **sam_cigar)
BIOHDF_API biohdf_error BIOH5Gset_alignment_sam_cigar (bioh5g_alignment_data *data, char *sam_cigar)
BIOHDF_API biohdf_error BIOH5Gget_alignment_sam_tags (bioh5g_alignment_data *data, char **sam_tags)
BIOHDF_API biohdf_error BIOH5Gset_alignment_sam_tags (bioh5g_alignment_data *data, char *sam_tags)
BIOHDF_API biohdf_error BIOH5Gget_alignment_sam_rnext (bioh5g_alignment_data *data, char **sam_rnext)
BIOHDF_API biohdf_error BIOH5Gset_alignment_sam_rnext (bioh5g_alignment_data *data, char *sam_rnext)
BIOHDF_API biohdf_error BIOH5Gget_alignment_sam_pnext (bioh5g_alignment_data *data, int32_t *sam_pnext)
BIOHDF_API biohdf_error BIOH5Gset_alignment_sam_pnext (bioh5g_alignment_data *data, int32_t sam_pnext)
BIOHDF_API biohdf_error BIOH5Gget_alignment_sam_tlen (bioh5g_alignment_data *data, int32_t *sam_tlen)
BIOHDF_API biohdf_error BIOH5Gset_alignment_sam_tlen (bioh5g_alignment_data *data, int32_t sam_tlen)

Functions: Collection Creation Properties

BIOHDF_API biohdf_error BIOH5Gcreate_alignments_properties (bioh5g_alignments_creation_properties *props)
BIOHDF_API biohdf_error BIOH5Gdestroy_alignments_properties (bioh5g_alignments_creation_properties *props)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_reads_path (bioh5g_alignments_creation_properties props, char *reads_path)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_refs_scheme (bioh5g_alignments_creation_properties props, biohdf_string_storage_scheme scheme)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_tags_scheme (bioh5g_alignments_creation_properties props, biohdf_string_storage_scheme scheme)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_cigar_scheme (bioh5g_alignments_creation_properties props, biohdf_string_storage_scheme scheme)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_refs_length (bioh5g_alignments_creation_properties props, size_t length)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_tags_length (bioh5g_alignments_creation_properties props, size_t length)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_cigar_length (bioh5g_alignments_creation_properties props, size_t length)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_chunk_size (bioh5g_alignments_creation_properties props, int64_t chunk_size)
BIOHDF_API biohdf_error BIOH5Gset_alignments_properties_compression_level (bioh5g_alignments_creation_properties props, compression_level level)

Detailed Description

Represents NGS alignments (SAM, etc. entries).


Enumeration Type Documentation

enum bioh5g_alignments_format

The text file format for alignment I/O.

Enumerator:
SAM_FORMAT  SAM format.

enum bioh5g_alignments_index_method

The indexing scheme used on the alignments in this collection.

Enumerator:
UNINDEXED  No indexing; all queries are linear scans.
REF_POS_SECONDARY  Sorted on the reference (strcmp order) and start position. Secondary index (no physical data ordering).
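
The idea of a secondary index that leaves the stored data unordered can be sketched in plain C. This is only an illustration of the concept and does not describe BioHDF's actual index layout. The names example_alignment, example_index_entry, and build_secondary_index are hypothetical and are not part of the library.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stored alignment (not a BioHDF type). */
typedef struct {
    const char *reference;
    int32_t     position;
} example_alignment;

/* One entry of the sketch index: sort keys plus the record's original index. */
typedef struct {
    int64_t     index;
    const char *reference;
    int32_t     position;
} example_index_entry;

/* Order by reference (strcmp order), then start position, then original index. */
static int compare_entries(const void *a, const void *b)
{
    const example_index_entry *x = (const example_index_entry *)a;
    const example_index_entry *y = (const example_index_entry *)b;
    int c = strcmp(x->reference, y->reference);
    if (c != 0)
        return c;
    if (x->position != y->position)
        return (x->position < y->position) ? -1 : 1;
    return (x->index > y->index) - (x->index < y->index);
}

/* Fill entries[] from records[] and sort only the entries.
 * The records themselves are never moved. */
void build_secondary_index(const example_alignment *records, int64_t count,
                           example_index_entry *entries)
{
    int64_t i;
    for (i = 0; i < count; i++) {
        entries[i].index     = i;
        entries[i].reference = records[i].reference;
        entries[i].position  = records[i].position;
    }
    qsort(entries, (size_t)count, sizeof entries[0], compare_entries);
}
```

Note that strcmp order is not natural chromosome order: "chr10" sorts before "chr2".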


Function Documentation

BIOHDF_API biohdf_error BIOH5Gadd_alignment ( const bioh5g_alignments  aligns,
const bioh5g_alignment_data *  data 
)

Add an alignment to a collection.

Parameters:
aligns  A BioHDF alignments handle
data  A BioHDF alignment
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gadd_alignments_iterator_flags_filter ( bioh5g_alignments_iterator  iter,
uint32_t  mask 
)

Add a SAM FLAGS filter to an alignments iterator.

Only alignments which have all the bits in the mask set (an AND mask) will be returned.

See the SAM spec for the meanings of the individual flags.

Parameters:
iter  An iterator for an alignments collection
mask  A SAM flags mask. Note that adding a subsequent mask to an iterator clobbers the old one. They are NOT combined with logical OR.
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
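
The AND-mask rule described above can be sketched as a small predicate. This is a minimal illustration of the stated semantics, not the library's implementation. The helper example_flags_match and the EXAMPLE_SAM_* macro names are hypothetical; the bit values are those given in the SAM specification.

```c
#include <stdint.h>

/* SAM FLAG bits (values from the SAM specification; macro names are hypothetical). */
#define EXAMPLE_SAM_PAIRED       0x1u   /* template has multiple segments */
#define EXAMPLE_SAM_PROPER_PAIR  0x2u   /* each segment properly aligned */
#define EXAMPLE_SAM_REVERSE      0x10u  /* SEQ is reverse complemented */

/* Hypothetical helper: returns 1 only when every bit of mask is set in flags. */
int example_flags_match(uint32_t flags, uint32_t mask)
{
    return (flags & mask) == mask;
}
```

Because a later mask replaces the earlier one, a caller who wants alignments that are both paired and on the reverse strand would combine the bits first (EXAMPLE_SAM_PAIRED | EXAMPLE_SAM_REVERSE) and add that single mask once.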

BIOHDF_API biohdf_error BIOH5Gadd_alignments_iterator_mapq_filter ( bioh5g_alignments_iterator  iter,
unsigned char  min_mapq 
)

Add a SAM MAPQ filter to an alignments iterator.

Set a minimum MAPQ level on an iterator. When set, only alignments with a MAPQ score greater than or equal to the minimum will be returned.

NOTE: Allowable MAPQ values are from 0 to 255.

Parameters:
iter  An iterator for an alignments collection
min_mapq  The minimum acceptable MAPQ value (inclusive)
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
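
The inclusive minimum can be written as a one-line predicate. This sketch only restates the rule above; example_mapq_passes is a hypothetical name and is not a BioHDF function.

```c
/* Hypothetical helper: the minimum is inclusive, so mapq == min_mapq passes. */
int example_mapq_passes(unsigned char mapq, unsigned char min_mapq)
{
    return mapq >= min_mapq;
}
```

Under this rule a min_mapq of 0 accepts every alignment.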

BIOHDF_API biohdf_error BIOH5Gadd_alignments_iterator_range_filter ( bioh5g_alignments_iterator  iter,
const char *  reference,
int32_t  start,
int32_t  end 
)

Add a reference region filter to an alignments iterator.

Add reference region filters one at a time. If no reference regions are specified, all alignments are returned.

Parameters:
iter  An iterator for an alignments collection
reference  The reference name
start  The start point of the region (1-based, inclusive)
end  The end point of the region (1-based, inclusive)
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
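
The 1-based, inclusive coordinates can be illustrated with a small region test. This sketch rests on several assumptions that the documentation above does not confirm: that an alignment matches a region when it overlaps it (rather than being fully contained), that an alignment spans position to position + length - 1 on its reference, and that multiple regions are combined as a union. The names example_region, example_in_region, and example_passes_regions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical region record: 1-based, inclusive on both ends. */
typedef struct {
    const char *reference;
    int32_t     start;
    int32_t     end;
} example_region;

/* Assumed overlap test: the alignment covers [pos, pos + len - 1]. */
int example_in_region(const char *aln_ref, int32_t aln_pos, int32_t aln_len,
                      const example_region *region)
{
    int64_t aln_end = (int64_t)aln_pos + aln_len - 1; /* widen to avoid overflow */
    if (strcmp(aln_ref, region->reference) != 0)
        return 0;
    return aln_pos <= region->end && aln_end >= region->start;
}

/* With no regions, everything passes; otherwise any one match is enough
 * (union semantics are an assumption of this sketch). */
int example_passes_regions(const char *aln_ref, int32_t aln_pos, int32_t aln_len,
                           const example_region *regions, size_t n_regions)
{
    size_t i;
    if (n_regions == 0)
        return 1;
    for (i = 0; i < n_regions; i++) {
        if (example_in_region(aln_ref, aln_pos, aln_len, &regions[i]))
            return 1;
    }
    return 0;
}
```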

BIOHDF_API biohdf_error BIOH5Gcheck_alignments_presence ( const biohdf_file  file,
const char *  path,
int *  presence 
)

Test if an alignments collection exists.

This function sets presence to TRUE if a collection of the same type exists at the named location. If any other HDF5 or BioHDF object with that same name exists, presence is also set to TRUE and an error code is returned. The assumption is that Bad Code(tm) that does not check return values will then be more likely to attempt to open the object (and fail) than to create things (which may partially succeed, making a mess).

Parameters:
file  A BioHDF file handle
path  The BioHDF path to the collection
[out]  presence  TRUE if the collection exists, FALSE if it does not.
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
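
The reasoning above suggests a calling pattern in which the error code is examined before the presence flag. The following sketch captures only that decision logic, without calling the library. The example_action type and example_choose_action function are hypothetical; call_succeeded stands for "the call returned BIOHDF_NO_ERROR".

```c
/* Hypothetical outcomes for a caller that wants to open-or-create a collection. */
typedef enum {
    EXAMPLE_CREATE,  /* nothing at the path: safe to create */
    EXAMPLE_OPEN,    /* an alignments collection exists: open it */
    EXAMPLE_ABORT    /* the check failed, e.g. another object has that name */
} example_action;

/* call_succeeded: nonzero when the presence check returned BIOHDF_NO_ERROR.
 * presence:       the value written to the presence out-parameter. */
example_action example_choose_action(int call_succeeded, int presence)
{
    if (!call_succeeded)
        return EXAMPLE_ABORT;   /* never create when the check itself failed */
    return presence ? EXAMPLE_OPEN : EXAMPLE_CREATE;
}
```

Code that skips the error check and looks only at presence would attempt an open when a different object occupies the path, which is the failure mode the documentation above prefers over a partial create.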

BIOHDF_API biohdf_error BIOH5Gclose_alignments_collection ( bioh5g_alignments *  aligns)

Close an open alignments collection.

This function will set the collection handle to NULL after freeing it.

Parameters:
[in,out]  aligns  A BioHDF alignments handle
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gcreate_alignment_string ( const bioh5g_alignment_data *  alignment,
const bioh5g_read_data *  read,
bioh5g_alignments_format  format,
char **  alignment_string 
)

Create an alignment string in a particular format.

Read data is normally required for correct SAM output. If no read data is supplied, the QNAME will be an arbitrary integer and SEQ and QUAL will both be '*'.

Parameters:
alignment  The alignment data
read  The read data
format  The format of the output string
[out]  alignment_string  The alignment string
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
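
To show what a SAM line looks like when no read data is supplied, the sketch below formats the eleven mandatory SAM fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL, tab-separated, per the SAM specification) with SEQ and QUAL set to '*'. This is an illustration only and is not the library's implementation: example_format_sam_without_read is a hypothetical helper, optional tags are omitted, and the integer used for QNAME is chosen by the caller here.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a SAM line with placeholder SEQ and QUAL.
 * Returns the value of snprintf (the length the full line needs). */
int example_format_sam_without_read(char *buf, size_t size, long long qname,
                                    uint32_t flags, const char *rname,
                                    int32_t pos, unsigned char mapq,
                                    const char *cigar, const char *rnext,
                                    int32_t pnext, int32_t tlen)
{
    return snprintf(buf, size, "%lld\t%u\t%s\t%d\t%u\t%s\t%s\t%d\t%d\t*\t*",
                    qname, (unsigned)flags, rname, (int)pos, (unsigned)mapq,
                    cigar, rnext, (int)pnext, (int)tlen);
}
```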
BIOHDF_API biohdf_error BIOH5Gcreate_alignments_collection ( const biohdf_file  file,
const bioh5g_alignments_creation_properties  props,
const char *  path,
bioh5g_alignments *  aligns 
)

Create (and open) a new alignments collection.

The collection handle returned by this function will be ready to accept I/O.

Parameters:
file  A BioHDF file handle
props  Collection creation properties
path  The BioHDF path to the collection
[out]  aligns  A BioHDF alignments handle
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gcreate_alignments_index ( bioh5g_alignments  aligns,
bioh5g_alignments_index_method  method,
biohdf_index_creation_properties  props 
)

Create an index for an alignments collection.

A pre-existing index of the same method/type will be deleted.

Parameters:
aligns  The alignments collection
method  The indexing method to use
props  Index creation properties
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gcreate_alignments_iterator ( const bioh5g_alignments  aligns,
bioh5g_alignments_iterator *  iter 
)

Create an iterator for an alignments collection.

Parameters:
aligns  A BioHDF alignments handle
[out]  iter  An iterator for an alignments collection
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gdestroy_alignments_iterator ( bioh5g_alignments_iterator *  iter)

Destroy an iterator for an alignments collection.

The iterator is set to NULL as a part of deletion.

Parameters:
[in,out]  iter  An iterator for an alignments collection
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gfree_alignment_data ( bioh5g_alignment_data **  data)

Free alignment data that has been obtained from the library.

The data is set to NULL as a part of deletion.

Parameters:
[in,out]  data  The BioHDF alignment
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gget_alignment ( const bioh5g_alignments  aligns,
int64_t  index,
bioh5g_alignment_data **  data 
)

Given an alignment index, get the alignment from an alignment collection.

Parameters:
aligns  A BioHDF alignments handle
index  The index of this alignment
[out]  data  The BioHDF alignment
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gget_alignment_file_header ( const bioh5g_alignments  aligns,
bioh5g_alignments_format *  format,
char **  header 
)

Get a stored alignment file header.

NOTE: Headers are stored verbatim and are not parsed or generated. This function can only return a previously stored header; it will not generate a header from scratch.

Parameters:
aligns  The alignments collection
[out]  format  The format of the header
[out]  header  The header string
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gget_alignments_count ( const bioh5g_alignments  aligns,
int64_t *  count 
)

Get the number of stored alignments in a collection.

Parameters:
aligns  A BioHDF alignments handle
[out]  count  The number of alignments in the collection
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gget_index_of_last_added_alignment ( const bioh5g_alignments  aligns,
int64_t *  index 
)

Get the index of the last alignment that was added.

Useful for higher level structures where links must be created.

Parameters:
aligns  A BioHDF alignments handle
[out]  index  The index of the last alignment that was added
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gget_next_alignment ( bioh5g_alignments_iterator  iter,
int64_t *  index,
bioh5g_alignment_data **  data 
)

Get the next alignment from an alignments collection.

Parameters:
iter  An iterator for an alignments collection
[out]  index  The index of this alignment
[out]  data  The BioHDF alignment
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gget_reads_path ( const bioh5g_alignments  aligns,
char **  reads_path 
)

Get the path to the associated reads.

Each BioHDF alignments collection includes a link to the associated reads collection.

An empty string ("\0") is returned if the link is not present.

Parameters:
aligns  A BioHDF alignments handle
[out]  reads_path  The BioHDF path to the associated reads
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gopen_alignments_collection ( const biohdf_file  file,
const char *  path,
biohdf_open_mode  mode,
bioh5g_alignments *  aligns 
)

Open an existing alignments collection.

Parameters:
file  A BioHDF file handle
path  The BioHDF path to the collection
mode  The access mode (read-only | read-write)
[out]  aligns  A BioHDF alignments handle
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gstore_alignment_file_header ( const bioh5g_alignments  aligns,
bioh5g_alignments_format  format,
const char *  header 
)

Store a file header from an alignment file (e.g. SAM).

NOTE: Headers are stored verbatim and are not parsed.

Parameters:
aligns  The alignments collection
format  The format of the header
header  The header string to store
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure

BIOHDF_API biohdf_error BIOH5Gwrite_alignment_to_stream ( const bioh5g_alignment_data *  alignment,
const bioh5g_read_data *  read,
bioh5g_alignments_format  format,
FILE *  stream 
)

Write an alignment string in a particular format to an output stream.

This saves you from having to create temp strings that will just be dumped to a stream.

Read data is normally required for correct SAM output. If no read data is supplied, the QNAME will be an arbitrary integer and SEQ and QUAL will both be '*'.

Parameters:
alignment  The alignment data
read  The read data
format  The format of the output string
stream  The output stream (can be stdout)
Returns:
BIOHDF_NO_ERROR on success, a BioHDF error code on failure
