
Compare inferred orthogroups to a reference set

Usage

compare_orthogroups(ref_orthogroups = NULL, test_orthogroups = NULL)

Arguments

ref_orthogroups

Reference orthogroups in a 3-column data frame with columns Orthogroup, Species, and Gene. This data frame can be created from the 'Orthogroups.tsv' file generated by OrthoFinder with the function read_orthogroups().

test_orthogroups

Test orthogroups that will be compared to ref_orthogroups in the same 3-column data frame format.
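For illustration, here is a minimal sketch of an input in this format, built by hand with base R. Each row is one gene. The orthogroup, species, and gene IDs are hypothetical placeholders.

```r
# Hypothetical toy orthogroups in the 3-column format
# (Orthogroup, Species, Gene), one row per gene
ref_toy <- data.frame(
    Orthogroup = c("OG1", "OG1", "OG2", "OG2"),
    Species    = c("sp1", "sp2", "sp1", "sp2"),
    Gene       = c("sp1_g1", "sp2_g1", "sp1_g2", "sp2_g2")
)
str(ref_toy)
```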

Value

A 2-column data frame with the following variables:

Orthogroup

Character vector of orthogroup IDs.

Preserved

A logical vector of preservation status. It is TRUE if the orthogroup in the reference set is fully preserved in the test set, and FALSE otherwise.

Details

This function compares a test set of orthogroups to a reference set and reports which reference orthogroups are fully preserved in the test set (i.e., some test orthogroup contains exactly the same genes) and which are not. Species names (column 2) must be spelled identically in the reference and test sets. Species present in only one of the two sets are excluded from the comparison.
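The preservation check described above can be sketched in base R as follows. This is a minimal illustration, not cogeqc's actual implementation. `preserved_sketch()` is a hypothetical helper. It assumes gene IDs contain no commas, because each orthogroup's genes are collapsed into a sorted, comma-separated key.

```r
# Hypothetical sketch of the preservation logic; NOT the code used by
# compare_orthogroups(). Assumes gene IDs contain no commas.
preserved_sketch <- function(ref, test) {
    # Keep only species present in both sets
    shared <- intersect(unique(ref$Species), unique(test$Species))
    ref  <- ref[ref$Species %in% shared, ]
    test <- test[test$Species %in% shared, ]

    # Collapse each orthogroup into a sorted, comma-separated gene key
    og_keys <- function(df) {
        vapply(
            split(df$Gene, df$Orthogroup),
            function(g) paste(sort(unique(g)), collapse = ","),
            character(1)
        )
    }
    ref_keys  <- og_keys(ref)
    test_keys <- og_keys(test)

    # A reference orthogroup is preserved if an identical gene set
    # exists among the test orthogroups
    data.frame(
        Orthogroup = names(ref_keys),
        Preserved  = ref_keys %in% test_keys,
        row.names  = NULL
    )
}

# Toy usage: OG1 is kept intact in the test set, OG2 is split in two.
# Gene g5 belongs to sp3, which is absent from the test set, so it is ignored.
ref <- data.frame(
    Orthogroup = c("OG1", "OG1", "OG1", "OG2", "OG2"),
    Species    = c("sp1", "sp2", "sp3", "sp1", "sp2"),
    Gene       = c("g1", "g2", "g5", "g3", "g4")
)
test <- data.frame(
    Orthogroup = c("X", "X", "Y", "Z"),
    Species    = c("sp1", "sp2", "sp1", "sp2"),
    Gene       = c("g1", "g2", "g3", "g4")
)
res <- preserved_sketch(ref, test)
res
```

Sorting the genes before collapsing makes the key independent of row order, so two orthogroups match whenever they hold the same gene set.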

Examples

library(cogeqc)
set.seed(123)
data(og)
og <- og[1:5000, ]
ref <- og
# Shuffle orthogroup labels across genes to simulate a different test set
test <- data.frame(
    Orthogroup = sample(og$Orthogroup, nrow(og), replace = FALSE),
    Species = og$Species,
    Gene = og$Gene
)
comparison <- compare_orthogroups(ref, test)

# Proportion of reference orthogroups that are fully preserved (0 to 1)
sum(comparison$Preserved) / length(comparison$Preserved)
#> [1] 0
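The last line of the example returns a proportion between 0 and 1. Because the mean of a logical vector is the fraction of `TRUE` values, the same quantity can be written as `mean()` and scaled by 100 to obtain a percentage. A minimal sketch with a made-up `Preserved` vector:

```r
# Hypothetical Preserved column for four reference orthogroups
preserved <- c(TRUE, FALSE, TRUE, TRUE)

# mean() of a logical vector is the fraction of TRUE values;
# multiplying by 100 gives a percentage
100 * mean(preserved)
#> [1] 75
```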