Classify gene pairs originating from transposon-derived duplications
Source: R/utils_duplicate_classification.R
get_transposed.Rd
Classify gene pairs originating from transposon-derived duplications
Usage
get_transposed(
pairs,
blast_inter,
annotation,
evalue = 1e-10,
anchors = 5,
max_gaps = 25,
collinearity_dir = NULL,
outgroup_coverage = 70
)
Arguments
- pairs
A 3-column data frame with columns dup1, dup2, and type indicating duplicated gene 1, duplicated gene 2, and the mode of duplication associated with the pair. This data frame is returned by get_tandem_proximal().
- blast_inter
A list of data frames of length 1 containing BLAST tabular output for the comparison between the target species and an outgroup. Names of list elements must match the names of list elements in annotation. BLASTp, DIAMOND, or similar programs must be run on processed sequence data as returned by syntenet::process_input().
- annotation
A processed GRangesList or CompressedGRangesList object as returned by syntenet::process_input().
- evalue
Numeric scalar indicating the E-value threshold. Default: 1e-10.
- anchors
Numeric indicating the minimum required number of genes to call a syntenic block, as in syntenet::infer_syntenet. Default: 5.
- max_gaps
Numeric indicating the number of upstream and downstream genes to search for anchors, as in syntenet::infer_syntenet. Default: 25.
- collinearity_dir
Character indicating the path to the directory where .collinearity files will be stored. If NULL, files will be stored in a subdirectory of tempdir(). Default: NULL.
- outgroup_coverage
Numeric indicating the minimum percentage of outgroup species in which a pair must be classified as transposed for it to be considered a transposed duplicate. Only used if multiple outgroup species are present (see Details). Values should range from 0 to 100. Default: 70.
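The naming requirement for blast_inter can be checked before calling the function. The snippet below is a minimal sketch in base R, assuming list names follow a "target_outgroup" pattern such as "Scerevisiae_Cglabrata" and that species names contain no underscores. The helper check_blast_names() is hypothetical and not part of the package, and plain character vectors stand in for names(annotation) and names(blast_inter).

```r
# Hypothetical helper (not part of the package): checks that each name in
# `blast_names` has the form "<target>_<outgroup>" and that both species
# appear in `annot_names`. Assumes species names contain no underscores.
check_blast_names <- function(blast_names, annot_names) {
    parts <- strsplit(blast_names, "_", fixed = TRUE)
    ok <- vapply(parts, function(p) {
        length(p) == 2 && all(p %in% annot_names)
    }, logical(1))
    names(ok) <- blast_names
    ok
}

# Toy names standing in for names(annotation) and names(blast_inter)
annot_names <- c("Scerevisiae", "Cglabrata")
blast_names <- c("Scerevisiae_Cglabrata", "Scerevisiae_Kpastoris")

name_check <- check_blast_names(blast_names, annot_names)
print(name_check)
```

Any element flagged FALSE points to a comparison whose species are missing from the annotation list (here, the made-up "Kpastoris" outgroup).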
Value
A 3-column data frame with the following variables:
- dup1
Character, duplicated gene 1.
- dup2
Character, duplicated gene 2.
- type
Factor of duplication types, with levels "SD" (segmental duplication), "TD" (tandem duplication), "PD" (proximal duplication), "TRD" (transposon-derived duplication), and "DD" (dispersed duplication).
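As an illustration of the return structure, the sketch below builds a toy data frame with the same three columns and factor levels, then tabulates the duplication modes and extracts transposed pairs. The gene IDs and the object names toy, type_counts, and trd_only are made up for this example; real output comes from get_transposed().

```r
# Toy data frame mimicking the documented return structure
# (gene IDs are invented for illustration)
toy <- data.frame(
    dup1 = c("Sce_g1", "Sce_g3", "Sce_g5", "Sce_g7"),
    dup2 = c("Sce_g2", "Sce_g4", "Sce_g6", "Sce_g8"),
    type = factor(
        c("SD", "TRD", "DD", "TRD"),
        levels = c("SD", "TD", "PD", "TRD", "DD")
    )
)

# Count pairs per duplication mode (unused levels are reported as 0)
type_counts <- table(toy$type)
print(type_counts)

# Keep only transposon-derived duplicates
trd_only <- toy[toy$type == "TRD", ]
print(trd_only)
```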
Details
If the list of interspecies DIAMOND tables contains comparisons of the same species to multiple outgroups (e.g., 'speciesA_speciesB', 'speciesA_speciesC'), this function checks whether gene pairs are classified as transposed (i.e., only one gene is an ancestral locus) in each of the outgroup species, and then calculates the percentage of outgroup species in which each pair is considered 'transposed'. For instance, gene pair 1 is transposed based on 30% of the outgroup species, gene pair 2 based on 100%, and gene pair 3 based on 0%. The outgroup_coverage parameter lets you choose a minimum percentage cut-off to classify pairs as transposed.
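The percentage calculation can be pictured with a small base R sketch. It is not the package's internal code: the logical matrix calls, the comparison names, and the use of >= for the cut-off are assumptions made for illustration only.

```r
# Toy matrix: rows are gene pairs, columns are outgroup comparisons.
# TRUE means the pair was called transposed against that outgroup.
calls <- matrix(
    c(TRUE,  TRUE,  TRUE,  FALSE,
      TRUE,  FALSE, FALSE, FALSE,
      FALSE, FALSE, FALSE, FALSE),
    nrow = 3, byrow = TRUE,
    dimnames = list(
        c("pair1", "pair2", "pair3"),
        c("spA_spB", "spA_spC", "spA_spD", "spA_spE")
    )
)

# Percentage of outgroups supporting a transposed call for each pair
coverage <- rowMeans(calls) * 100

# Apply the cut-off (70 mirrors the default of outgroup_coverage);
# whether the real function uses >= or > is an assumption here
outgroup_coverage <- 70
is_transposed <- coverage >= outgroup_coverage
print(data.frame(coverage, is_transposed))
```

With four outgroups, a pair supported by three of them reaches 75% and passes the default cut-off, while a pair supported by only one (25%) does not.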
Examples
data(diamond_inter)
data(diamond_intra)
data(yeast_seq)
data(yeast_annot)
data(fungi_kaks)
scerevisiae_kaks <- fungi_kaks$saccharomyces_cerevisiae
# Get processed annotation
pdata <- syntenet::process_input(yeast_seq, yeast_annot)
annotation <- pdata$annotation
# Get duplicated pairs
pairs <- scerevisiae_kaks[, c("dup1", "dup2", "type")]
pairs$dup1 <- paste0("Sce_", pairs$dup1)
pairs$dup2 <- paste0("Sce_", pairs$dup2)
# Classify pairs
trd <- get_transposed(pairs, diamond_inter, annotation)
# Simulate a second outgroup by duplicating the Cglabrata data (illustration only)
annotation <- c(annotation, list(Cglabrata2 = annotation$Cglabrata))
blast_inter <- c(diamond_inter, list(Scerevisiae_Cglabrata2 = diamond_inter[[1]]))