Find group-specific clusters based on user-defined species classification
Source: R/05_phylogenomic_profiling.R
Arguments
- profile_matrix
A matrix of phylogenomic profiles obtained with phylogenomic_profile().
- species_annotation
A 2-column data frame with species IDs in the first column (matching the column names of the profile matrix) and species annotation (e.g., higher-level taxonomic information) in the second column.
- min_percentage
Numeric scalar giving the minimum percentage of species in a group in which a cluster must be present for it to be considered specific to that group. For instance, if a given cluster is present in only one group of species, but in fewer than min_percentage percent of the species in that group, it will not be considered a group-specific cluster. This filtering criterion helps distinguish group-specific clusters (e.g., family-specific) from subgroup-specific clusters (e.g., genus-specific). Default: 50.
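The min_percentage rule can be illustrated with a small base-R sketch. This is a hypothetical reimplementation for illustration only, not the package's own code. It assumes that the profile matrix has clusters in rows and species in columns, and that any value greater than zero means the cluster is present in that species; the helper name gs_sketch() and the toy data are made up.

```r
# Hypothetical sketch of the group-specificity rule (not the package's code).
# Assumes rows of `pm` are clusters, columns are species IDs, and a value > 0
# means the cluster is present in that species.
gs_sketch <- function(pm, annot, min_percentage = 50) {
    # Group label for each column (species) of the profile matrix
    groups <- as.character(annot[[2]][match(colnames(pm), annot[[1]])])
    rows <- lapply(rownames(pm), function(cl) {
        present <- pm[cl, ] > 0
        g <- unique(groups[present])
        # Keep only clusters found in exactly one group
        if (length(g) != 1) return(NULL)
        # Percentage of that group's species in which the cluster is present
        pct <- 100 * sum(present) / sum(groups == g, na.rm = TRUE)
        if (pct < min_percentage) return(NULL)
        data.frame(Group = g, Percentage = pct, Cluster = cl,
                   stringsAsFactors = FALSE)
    })
    do.call(rbind, rows)  # NULL if no cluster qualifies
}

# Toy profile matrix: 4 clusters x 5 species (gene counts per species)
pm <- matrix(
    c(1, 2, 0, 0, 0,   # c1: sp1 and sp2 (2 of 3 species in group A)
      1, 0, 0, 0, 0,   # c2: sp1 only (1 of 3 species in group A)
      0, 0, 0, 1, 1,   # c3: sp4 and sp5 (all of group B)
      1, 0, 0, 1, 0),  # c4: spans groups A and B
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("c", 1:4), paste0("sp", 1:5))
)
annot <- data.frame(Species = paste0("sp", 1:5),
                    Group = c("A", "A", "A", "B", "B"),
                    stringsAsFactors = FALSE)

res <- gs_sketch(pm, annot)
```

With the default threshold, only c1 (group A, about 66.7%) and c3 (group B, 100%) are kept: c4 spans two groups, and c2 is present in just one of the three species of group A (about 33%). Lowering min_percentage to 30 would also keep c2, which is how the argument separates group-specific from subgroup-specific clusters.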
Value
A data frame with the following variables:
- Group
Group of species to which the cluster is specific.
- Percentage
Percentage of species in the group that are represented in the cluster.
- Cluster
Cluster ID.
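Because the result is an ordinary data frame, standard base-R tools can summarize it. The minimal sketch below uses a small made-up data frame with the documented Group, Percentage and Cluster columns (the values are hypothetical, not output from the package) to count group-specific clusters per group, split cluster IDs by group, and list the clusters present in every species of their group.

```r
# Hypothetical result with the documented columns; values are made up
gs <- data.frame(
    Group = c("A", "A", "B", "C"),
    Percentage = c(100, 60, 75, 100),
    Cluster = c("c1", "c2", "c3", "c4"),
    stringsAsFactors = FALSE
)

# Number of group-specific clusters per group
n_per_group <- table(gs$Group)

# Cluster IDs split by group
clusters_by_group <- split(gs$Cluster, gs$Group)

# Clusters found in all species of their group
fully_conserved <- gs$Cluster[gs$Percentage == 100]
```

Filtering on Percentage after the fact is a way to apply a stricter threshold than the one passed as min_percentage without recomputing the profiles.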
Examples
data(clusters)
profile_matrix <- phylogenomic_profile(clusters)

# Species annotation (family of each species ID in the profile matrix)
species_order <- c(
    "vra", "van", "pvu", "gma", "cca", "tpr", "mtr", "adu", "lja",
    "Lang", "car", "pmu", "ppe", "pbr", "mdo", "roc", "fve",
    "Mnot", "Zjuj", "hlu", "jcu", "mes", "rco", "lus", "ptr"
)
species_annotation <- data.frame(
    Species = species_order,
    Family = c(
        rep("Fabaceae", 11), rep("Rosaceae", 6),
        "Moraceae", "Rhamnaceae", "Cannabaceae",
        rep("Euphorbiaceae", 3), "Linaceae", "Salicaceae"
    )
)

# Find family-specific clusters (default min_percentage = 50)
gs_clusters <- find_GS_clusters(profile_matrix, species_annotation)