Since Ensembl doesn’t provide species trees for all its instances, we inferred a species tree using a standard approach in phylogenomics. The following fragment was extracted from the manuscript associated with this publication:
BUSCO scores (Manni et al. 2021) for each species were obtained and visualized using the Bioconductor package cogeqc (Almeida‐Silva and Van de Peer 2023). BUSCO genes shared by >90% of the species were aligned with MAFFT (Katoh and Standley 2013), and multiple sequence alignments were concatenated and trimmed to remove alignment columns with >50% gaps. Filtered supermatrices were used for phylogeny inference with IQ-TREE2 (Minh et al. 2020). Oomycetes, red algae, Giardia lamblia, Mnemiopsis leidyi, and Saccharomyces cerevisiae were used as outgroups for Ensembl Fungi, Ensembl Plants, Ensembl Protists, Ensembl Metazoa, and Ensembl, respectively.
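To make the filtering steps in this fragment concrete, here is a minimal sketch in Python of the occupancy filter (genes shared by >90% of species), the concatenation into a supermatrix, and the removal of columns with >50% gaps. It is not the code used for the manuscript: the function names (`select_shared_genes`, `concatenate`, `trim_gappy_columns`), the toy data, and the choice to pad missing species with gaps are assumptions made for illustration. The real alignments came from MAFFT and the trees from IQ-TREE2.

```python
from typing import Dict, List, Set

GAP = "-"

def select_shared_genes(presence: Dict[str, Set[str]], n_species: int,
                        min_frac: float = 0.9) -> List[str]:
    """Return genes present in more than `min_frac` of the species."""
    return sorted(g for g, sp in presence.items()
                  if len(sp) / n_species > min_frac)

def concatenate(alignments: Dict[str, Dict[str, str]],
                species: List[str]) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix.

    Assumption: a species missing from a gene alignment is padded
    with gaps so that all rows keep the same length.
    """
    parts: Dict[str, List[str]] = {s: [] for s in species}
    for gene in sorted(alignments):
        aln = alignments[gene]
        length = len(next(iter(aln.values())))
        for s in species:
            parts[s].append(aln.get(s, GAP * length))
    return {s: "".join(chunks) for s, chunks in parts.items()}

def trim_gappy_columns(matrix: Dict[str, str],
                       max_gap_frac: float = 0.5) -> Dict[str, str]:
    """Drop alignment columns whose gap fraction exceeds `max_gap_frac`."""
    names = list(matrix)
    length = len(matrix[names[0]])
    keep = [i for i in range(length)
            if sum(matrix[s][i] == GAP for s in names) / len(names) <= max_gap_frac]
    return {s: "".join(matrix[s][i] for i in keep) for s in names}

# Toy example with four hypothetical species and two hypothetical genes
species = ["A", "B", "C", "D"]
alignments = {
    "g1": {"A": "MK-L", "B": "MK-L", "C": "M--L", "D": "MKQL"},
    "g2": {"A": "AC", "B": "A-", "C": "-C"},  # species D lacks this gene
}
presence = {gene: set(aln) for gene, aln in alignments.items()}

shared = select_shared_genes(presence, n_species=len(species))
supermatrix = concatenate(alignments, species)
trimmed = trim_gappy_columns(supermatrix)
```

In this toy case only `g1` passes the >90% occupancy filter, and the third column of the supermatrix is dropped because three of its four rows are gaps. The toy concatenation uses both genes only to show the gap padding; in the workflow described above, only the genes that pass the occupancy filter would be concatenated.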
Although inferring trees from BUSCO genes is common practice, this method has limitations and can lead to incorrect topologies. Our goal here was not to test hypotheses on the phylogeny of each Ensembl instance, which would require more sophisticated methods such as partition models, microsynteny-based phylogenies, and gene tree-species tree reconciliation. Instead, we simply wanted species trees that could serve as a phylogenetic context to explore major patterns in the frequency of duplicated genes by mode.
If you have a curated species tree that includes all species in one (or more) of the Ensembl instances used here, we’d be happy to update the tree in this app. You can contribute data by opening an issue in this GitHub repo.