Check if input objects are ready for further analyses

Usage

check_input(seq = NULL, annotation = NULL, gene_field = "gene_id")

Arguments

seq

A list of AAStringSet objects, each list element containing protein sequences for a given species. This list must have names (not NULL), and names of each list element must match the names of list elements in annotation.

annotation

A GRangesList, CompressedGRangesList, or list of GRanges with the annotation for the sequences in seq. This list must have names (not NULL), and names of each list element must match the names of list elements in seq.

gene_field

Character, name of the column in the GRanges objects that contains gene IDs. Default: "gene_id".

Value

TRUE if the objects pass the check.

Details

This function checks the input data for 3 required conditions:

  1. Names of the seq list (i.e., names(seq)) match the names of the annotation GRangesList/CompressedGRangesList (i.e., names(annotation)).

  2. For each species (list element), the number of sequences in seq is not greater than the number of genes in annotation. This guards against users supplying translated sequences for multiple isoforms of the same gene (generated by alternative splicing). Ideally, the number of sequences in seq should equal the number of genes in annotation, but this does not always hold because annotations can also include non-protein-coding genes.

  3. For each species, sequence names (i.e., names(seq[[x]]), equivalent to FASTA headers) match gene names in annotation.
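
The three conditions above can be sketched in base R. This is only an illustration, not the package's implementation: sketch_check() is a hypothetical helper, plain character vectors stand in for the names of the AAStringSet objects and for the gene ID column of the GRanges objects, and the sketch returns FALSE on failure, which is an assumption about failure handling rather than documented behavior of check_input().

```r
# Hypothetical base-R stand-in for the three checks; NOT the package's code.
# seq_names: named list of character vectors (sequence names per species).
# gene_ids:  named list of character vectors (gene IDs per species).
sketch_check <- function(seq_names, gene_ids) {
    # 1. Both lists must be named, and the species names must agree
    #    (compared here as sets, so order is ignored; this is an assumption)
    if (is.null(names(seq_names)) || is.null(names(gene_ids))) return(FALSE)
    if (!setequal(names(seq_names), names(gene_ids))) return(FALSE)

    for (sp in names(seq_names)) {
        genes <- unique(gene_ids[[sp]])
        # 2. No more sequences than genes (flags multiple isoforms per gene)
        if (length(seq_names[[sp]]) > length(genes)) return(FALSE)
        # 3. Every sequence name must be a gene ID in the annotation
        if (!all(seq_names[[sp]] %in% genes)) return(FALSE)
    }
    TRUE
}

# Toy data with made-up species and gene names
genes <- list(speciesA = c("g1", "g2", "g3"), speciesB = c("h1", "h2"))
seqs_ok <- list(speciesA = c("g1", "g2"), speciesB = c("h1"))
# Four isoform-level sequences for three genes: fails condition 2
seqs_isoforms <- list(speciesA = c("g1.1", "g1.2", "g2", "g3"),
                      speciesB = c("h1"))

ok_result <- sketch_check(seqs_ok, genes)
isoform_result <- sketch_check(seqs_isoforms, genes)
```

In this toy data, seqs_ok satisfies all three conditions, while seqs_isoforms has more sequences than genes for speciesA, which is the situation condition 2 is meant to catch.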

Examples

data(annotation)
data(proteomes)
check_input(proteomes, annotation)
#> [1] TRUE