Skip to contents

Classify gene pairs originating from transposon-derived duplications

Usage

get_transposed(
  pairs,
  blast_inter,
  annotation,
  evalue = 1e-10,
  anchors = 5,
  max_gaps = 25,
  collinearity_dir = NULL,
  outgroup_coverage = 70
)

Arguments

pairs

A 3-column data frame with columns dup1, dup2, and type indicating duplicated gene 1, duplicated gene 2, and the mode of duplication associated with the pair. This data frame is returned by get_tandem_proximal().

blast_inter

A list of data frames of length 1 containing BLAST tabular output for the comparison between the target species and an outgroup. Names of list elements must match the names of list elements in annotation. BLASTp, DIAMOND or simular programs must be run on processed sequence data as returned by syntenet::process_input().

annotation

A processed GRangesList or CompressedGRangesList object as returned by syntenet::process_input().

evalue

Numeric scalar indicating the E-value threshold. Default: 1e-10.

anchors

Numeric indicating the minimum required number of genes to call a syntenic block, as in syntenet::infer_syntenet. Default: 5.

max_gaps

Numeric indicating the number of upstream and downstream genes to search for anchors, as in syntenet::infer_syntenet. Default: 25.

collinearity_dir

Character indicating the path to the directory where .collinearity files will be stored. If NULL, files will be stored in a subdirectory of tempdir(). Default: NULL.

outgroup_coverage

Numeric indicating the minimum percentage of outgroup species to use to consider genes as transposed duplicates. Only valid if multiple outgroup species are present (see details below). Values should range from 0 to 100. Default: 70.

Value

A 3-column data frame with the following variables:

dup1

Character, duplicated gene 1.

dup2

Character, duplicated gene 2.

type

Factor of duplication types, with levels "SD" (segmental duplication), "TD" (tandem duplication), "PD" (proximal duplication), "TRD" (transposon-derived duplication), and "DD" (dispersed duplication).

Details

If the list of interspecies DIAMOND tables contain comparisons of the same species to multiple outgroups (e.g., 'speciesA_speciesB', 'speciesA_speciesC'), this function will check if gene pairs are classified as transposed (i.e., only one gene is an ancestral locus) in each of the outgroup species, and then calculate the percentage of outgroup species in which each pair is considered 'transposed'. For instance, gene pair 1 is transposed based on 30\ on 100\ based on 0\ Parameter outgroup_coverage lets you choose a minimum percentage cut-off to classify pairs as transposed.

Examples

data(diamond_inter)
data(diamond_intra)
data(yeast_seq)
data(yeast_annot)
data(fungi_kaks)
scerevisiae_kaks <- fungi_kaks$saccharomyces_cerevisiae

# Get processed annotation
pdata <- syntenet::process_input(yeast_seq, yeast_annot)
annotation <- pdata$annotation

# Get duplicated pairs
pairs <- scerevisiae_kaks[, c("dup1", "dup2", "type")]
pairs$dup1 <- paste0("Sce_", pairs$dup1)
pairs$dup2 <- paste0("Sce_", pairs$dup2)

# Classify pairs
trd <- get_transposed(pairs, diamond_inter, annotation)

annotation <- c(annotation, list(Cglabrata2 = annotation$Cglabrata))
blast_inter <- c(diamond_inter, list(Scerevisiae_Cglabrata2 = diamond_inter[[1]]))