set.seed(123) # for reproducibility
# Load required packages
library(tidyverse)
library(BioNERO)
library(SummarizedExperiment)
library(here)
3 Inference and analysis of gene regulatory networks (GRNs)
In this lesson, you will learn how to infer gene regulatory networks (GRNs) from bulk RNA-seq data. At the end of the lesson, you will be able to:
- infer GRNs using different popular algorithms
- explore GRNs to extract subgraphs of interest
- visualize GRNs
Let’s start by loading the packages we will use.
3.1 Getting to know the example data
Here, we will use an example data set available in data/se_PRJNA800609.rda
in the Github repo associated with this course. This experiment comprises soybean (Glycine max) pods infected with the pathogenic fungus Colletotrichum truncatum, and data were downloaded from The Soybean Expression Atlas v2 (Almeida-Silva, Pedrosa-Silva, and Venancio 2023) using the BioProject accession PRJNA800609. The original data were generated by Zhu et al. (2022).
# Load expression data
load(here("data", "se_PRJNA800609.rda"))
# Rename object to a simpler name
<- se_PRJNA800609
exp rm(se_PRJNA800609)
# Take a look at the object
exp
class: SummarizedExperiment
dim: 31422 60
metadata(0):
assays(1): ''
rownames(31422): Glyma.15G153300 Glyma.15G153400 ... Glyma.09G145600
Glyma.09G145700
rowData names(0):
colnames(60): SAMN25263487 SAMN25263488 ... SAMN25263525 SAMN25263526
colData names(4): Part Cultivar Treatment Timepoint
We will also use soybean transcription factors obtained from PlantTFDB 4.0 (Jin et al. 2016), which are stored in data/gma_tfs.rda
.
# Load transcription factors
load(here("data", "gma_tfs.rda"))
head(gma_tfs)
[1] "Glyma.10G204400" "Glyma.02G099500" "Glyma.20G186200" "Glyma.01G087500"
[5] "Glyma.20G247300" "Glyma.01G000600"
length(gma_tfs)
[1] 3747
3.2 GRN inference
To infer GRNs, we need two input objects:
- A gene expression matrix (or a
SummarizedExperiment
object) - A vector of known regulators (e.g., TFs).
First, let’s preprocess our expression data with exp_preprocess()
. Here, for the sake of demonstration, we will only use the top 2000 genes with the highest variances.
# Preprocess the input data
<- exp_preprocess(
final_exp
exp,min_exp = 5,
variance_filter = TRUE,
n = 2000
)
Now we’re good to go. With BioNERO, one can infer GRNs using 3 popular algorithms: GENIE3 (huynh2010inferring?), ARACNE (Margolin et al. 2006), and CLR (Faith et al. 2007). Users can also infer GRNs using a combination of methods (a.k.a. “wisdom of the crowds” principle), which has been shown to lead to more accurate results (Marbach et al. 2012). Let’s demonstrate each of these approaches.
3.2.1 Individual algorithms
To infer GRNs using individual algorithms, you’d use the function grn_infer()
and specify the method in the method parameter. Below you can find a short description of each algorithm and how to run them.
GENIE3: a regression-tree based algorithm that decomposes the prediction of GRNs for n genes into n regression problems. For each regression problem, the expression profile of a target gene is predicted from the expression profiles of all other genes using random forests (default) or extra-trees.
# Infer a GRN with GENIE3
<- grn_infer(
grn_genie3
final_exp, method = "genie3",
regulators = gma_tfs,
1nTrees = 10
)
head(grn_genie3)
- 1
- Here, for demonstration purposes, we’re using only 10 trees, but one should use at least 1000 trees (default).
Node1 Node2 Weight
109961 Glyma.20G209700 Glyma.03G245300 0.3373682
160260 Glyma.12G216100 Glyma.07G149600 0.3122377
105394 Glyma.02G217800 Glyma.03G189100 0.2984574
211825 Glyma.16G012600 Glyma.14G201800 0.2816710
186303 Glyma.06G034000 Glyma.13G149100 0.2773383
51236 Glyma.11G049300 Glyma.19G121600 0.2736725
ARACNE: information-theoretic algorithm that aims to remove indirect interactions inferred by coexpression.
# Infer a GRN with ARACNE
<- grn_infer(
grn_aracne
final_exp, method = "aracne",
regulators = gma_tfs
)
head(grn_aracne)
Node1 Node2 Weight
70935 Glyma.20G209700 Glyma.01G037200 1.728753
89918 Glyma.04G226700 Glyma.02G149600 1.643472
238984 Glyma.07G212400 Glyma.10G224500 1.623464
211176 Glyma.07G212400 Glyma.14G195200 1.607392
91633 Glyma.20G209700 Glyma.02G195300 1.552133
290894 Glyma.04G226700 Glyma.09G143800 1.542347
CLR: extension of the relevance networks algorithm that uses mutual information to identify regulatory interactions.
# Infer a GRN with CLR
<- grn_infer(
grn_clr
final_exp,method = "clr",
regulators = gma_tfs
)
head(grn_clr)
Node1 Node2 Weight
260329 Glyma.15G152000 Glyma.12G009200 14.83700
154366 Glyma.19G260900 Glyma.07G049400 13.98161
197182 Glyma.16G017400 Glyma.13G289600 13.27572
97603 Glyma.15G029500 Glyma.02G309500 12.62377
55258 Glyma.13G344700 Glyma.19G211600 12.54761
226254 Glyma.16G017400 Glyma.09G227100 12.49256
Note that these functions return a fully connected graph, so you’ll need to filter the edge list to remove spurious edges. One way to do it is by removing edges below a particular value for the Weight
variable. However, since choosing a cut-off is not straightforward and often arbitrary, you can use the function grn_filter()
to perform a graph-based filtering. Briefly, this function splits the network in N subnetworks of increasing size and calculates the scale-free topology fit for each subgraph. In the end, the subgraph with the best scale-free topology fit will be chosen as the optimum.
This is how you can use it:
# Filter the GENIE3-derived GRN based on optimal SFT fit
<- grn_filter(grn_genie3)
filtered_grn head(filtered_grn)
Node1 Node2
109961 Glyma.20G209700 Glyma.03G245300
160260 Glyma.12G216100 Glyma.07G149600
105394 Glyma.02G217800 Glyma.03G189100
211825 Glyma.16G012600 Glyma.14G201800
186303 Glyma.06G034000 Glyma.13G149100
51236 Glyma.11G049300 Glyma.19G121600
- Filter the GRN obtained with GENIE3 to keep only edges with weight \(\ge\) 0.2. Then, answer the questions below:
- What is the top TF in number of targets?
- What genes are regulated by the top TF?
- What is the rank of the edge Glyma.07G212400 -> Glyma.10G224500 in all three networks? What is the mean rank?
3.2.2 Wisdom of the crowds
The “wisdom of the crowds” principle consists in combining inferences from multiple methods to obtain robust and more accurate predictions. As Marbach et al. (2012) stated in their paper:
We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets.
In BioNERO, this is performed by computing average edge ranks across different networks, exactly as you did in the previous practice problem, and it can be executed with the function exp2grn()
. After computing average ranks for each edge, exp2grn()
sorts the edges based on ranks (in increasing order) and uses the graph-based filtering approach (as in grn_filter()
) to remove spurious edges.
# Infer GRN
<- exp2grn(
grn exp = final_exp,
regulators = gma_tfs,
nTrees = 10 # again, using only 10 trees for demonstration purposes
)
head(grn)
Regulator Target
376 Glyma.04G226700 Glyma.02G149600
2474 Glyma.20G209700 Glyma.01G037200
260 Glyma.03G247100 Glyma.02G022000
2020 Glyma.17G144100 Glyma.04G228400
1403 Glyma.12G117000 Glyma.11G137300
1056 Glyma.09G011800 Glyma.09G143800
Explore the output of the GRN inferred with exp2grn()
and answer the following questions.
- What regulator has the greatest number of targets?
- What genes are regulated by the top regulator?
- What is the minimum, maximum, mean, and median number of regulators per target?
3.3 GRN analyses
After you have a GRN, there are many things you can do with it. The downstream analyses you should do will really depend on the question you’re trying to address. Nevertheless, in this section you will see examples of common analyses people do, but keep in mind that these are not the only ones; you can (and should) be creative and explore different aspects of the GRN based on your research project.
3.3.1 Finding GRN hubs
Finding hubs in GRNs is a very common practice in publications, and the goal here is to identify the so-called master regulators of particular biological processes.
This can be performed with the function get_hubs_grn()
, which returns the top 10% regulators (adjustable) based on degree.
# Find GRN hubs
<- get_hubs_grn(grn)
grn_hubs
head(grn_hubs)
Gene Degree
1 Glyma.15G019400 13
2 Glyma.05G103300 12
3 Glyma.20G006400 12
4 Glyma.20G051500 12
5 Glyma.20G209700 11
6 Glyma.19G022200 11
What is the minimum, maximum, mean, and median degree for GRN hubs?
3.3.2 Exploring and visualizing GRNs
Another common analysis is to extract subgraphs for a particular group of genes (usually known to be involved in a process of interest) and plot them.
First, let’s plot the entire network. This is feasible here because we specifically filtered the expression data to have a small network (for demonstration purposes), but it is often not feasible in real-world networks.
# Plot the entire network
<- plot_grn(grn)
p_all
p_all
By default, only the top 5 hubs are labeled, but you can change that based on your preferences. For example:
# Label the top 10 hubs
plot_grn(grn, top_n_hubs = 10)
# Hide all labels
plot_grn(grn, show_labels = "none")
Now, we will visualize the subgraph for the top hub and its targets. You can do that by filtering the edge list to keep only edges containing the gene(s) you’re interested (here, the top hub).
# Keep only edges containing the top hub
<- grn_hubs$Gene[1]
top_hub <- grn[grn$Regulator == top_hub, ]
edges_top_hub
head(edges_top_hub)
Regulator Target
1789 Glyma.15G019400 Glyma.19G089800
1785 Glyma.15G019400 Glyma.13G116200
1770 Glyma.15G019400 Glyma.01G211300
1783 Glyma.15G019400 Glyma.11G252200
1779 Glyma.15G019400 Glyma.08G330600
1781 Glyma.15G019400 Glyma.10G070900
# Visualize the subgraph
plot_grn(edges_top_hub)
With smaller subgraphs, it’s nice to play with parameters in plot_grn()
to customize your visualization. For example:
# Show all nodes
plot_grn(edges_top_hub, show_labels = "all")
Finally, since the output of plot_grn()
is a ggplot object (same for plot_gcn()
and plot_ppi()
), you can store it in an object and modify it later. For example:
<- plot_grn(edges_top_hub)
p_grn
# Change colors of regulators and targets
+
p_grn scale_fill_manual(values = c("dodgerblue3", "firebrick3"))
# Remove legend
+
p_grn theme(legend.position = "none")
Create a subgraph with the top 10 hubs and their targets. Then, create different network visualizations based on the following instructions:
Network 1: Default parameters in plot_grn()
Network 2: Label all hubs.
Network 3: Label all hubs and change node colors (the fill aesthetics) so that regulators have the colors “black” and targets have the color “grey70”.
Network 4: Label all hubs and add the following plot title: “Network representation of the top 10 hubs and their targets”
Network 5: Label all hubs, change the network layout function to with_gem
, and move the legend to the bottom of the plot.
Session information
This chapter was created under the following conditions:
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os Ubuntu 22.04.3 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Brussels
date 2024-05-27
pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.2)
annotate 1.80.0 2023-10-24 [1] Bioconductor
AnnotationDbi 1.64.1 2023-11-03 [1] Bioconductor
backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.2)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.3.2)
Biobase * 2.62.0 2023-10-24 [1] Bioconductor
BiocGenerics * 0.48.1 2023-11-01 [1] Bioconductor
BiocParallel 1.37.0 2024-01-19 [1] Github (Bioconductor/BiocParallel@79a1b2d)
BioNERO * 1.11.3 2024-03-25 [1] Bioconductor
Biostrings 2.70.2 2024-01-28 [1] Bioconductor 3.18 (R 4.3.2)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.2)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.2)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.2)
blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.2)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.2)
checkmate 2.3.1 2023-12-04 [1] CRAN (R 4.3.2)
circlize 0.4.15 2022-05-10 [1] CRAN (R 4.3.2)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2)
clue 0.3-65 2023-09-23 [1] CRAN (R 4.3.2)
cluster 2.1.5 2023-11-27 [4] CRAN (R 4.3.2)
coda 0.19-4.1 2024-01-31 [1] CRAN (R 4.3.2)
codetools 0.2-19 2023-02-01 [4] CRAN (R 4.2.2)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.2)
ComplexHeatmap 2.18.0 2023-10-24 [1] Bioconductor
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.2)
data.table 1.15.0 2024-01-30 [1] CRAN (R 4.3.2)
DBI 1.2.1 2024-01-12 [1] CRAN (R 4.3.2)
DelayedArray 0.28.0 2023-10-24 [1] Bioconductor
digest 0.6.34 2024-01-11 [1] CRAN (R 4.3.2)
doParallel 1.0.17 2022-02-07 [1] CRAN (R 4.3.2)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
dynamicTreeCut 1.63-1 2016-03-11 [1] CRAN (R 4.3.2)
edgeR 4.0.15 2024-02-11 [1] Bioconductor 3.18 (R 4.3.2)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.2)
fastcluster 1.2.6 2024-01-12 [1] CRAN (R 4.3.2)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.2)
foreach 1.5.2 2022-02-02 [1] CRAN (R 4.3.2)
foreign 0.8-86 2023-11-28 [4] CRAN (R 4.3.2)
Formula 1.2-5 2023-02-24 [1] CRAN (R 4.3.2)
genefilter 1.84.0 2023-10-24 [1] Bioconductor
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.2)
GENIE3 1.24.0 2023-10-24 [1] Bioconductor
GenomeInfoDb * 1.38.6 2024-02-08 [1] Bioconductor 3.18 (R 4.3.2)
GenomeInfoDbData 1.2.11 2023-12-21 [1] Bioconductor
GenomicRanges * 1.54.1 2023-10-29 [1] Bioconductor
GetoptLong 1.0.5 2020-12-15 [1] CRAN (R 4.3.2)
ggdendro 0.1.23 2022-02-16 [1] CRAN (R 4.3.2)
ggnetwork 0.5.13 2024-02-14 [1] CRAN (R 4.3.2)
ggplot2 * 3.5.0 2024-02-23 [1] CRAN (R 4.3.2)
ggrepel 0.9.5 2024-01-10 [1] CRAN (R 4.3.2)
GlobalOptions 0.1.2 2020-06-10 [1] CRAN (R 4.3.2)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
GO.db 3.18.0 2024-01-09 [1] Bioconductor
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.2)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.2)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.3.2)
Hmisc 5.1-1 2023-09-12 [1] CRAN (R 4.3.2)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.2)
htmlTable 2.4.2 2023-10-29 [1] CRAN (R 4.3.2)
htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.2)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.2)
igraph 2.0.1.1 2024-01-30 [1] CRAN (R 4.3.2)
impute 1.76.0 2023-10-24 [1] Bioconductor
intergraph 2.0-4 2024-02-01 [1] CRAN (R 4.3.2)
IRanges * 2.36.0 2023-10-24 [1] Bioconductor
iterators 1.0.14 2022-02-05 [1] CRAN (R 4.3.2)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.2)
KEGGREST 1.42.0 2023-10-24 [1] Bioconductor
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.2)
lattice 0.22-5 2023-10-24 [4] CRAN (R 4.3.1)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2)
limma 3.58.1 2023-10-31 [1] Bioconductor
locfit 1.5-9.8 2023-06-11 [1] CRAN (R 4.3.2)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.2)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2)
MASS 7.3-60 2023-05-04 [4] CRAN (R 4.3.1)
Matrix 1.6-3 2023-11-14 [4] CRAN (R 4.3.2)
MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor
matrixStats * 1.2.0 2023-12-11 [1] CRAN (R 4.3.2)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.2)
mgcv 1.9-0 2023-07-11 [4] CRAN (R 4.3.1)
minet 3.60.0 2023-10-24 [1] Bioconductor
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.2)
NetRep 1.2.7 2023-08-19 [1] CRAN (R 4.3.2)
network 1.18.2 2023-12-05 [1] CRAN (R 4.3.2)
nlme 3.1-163 2023-08-09 [4] CRAN (R 4.3.1)
nnet 7.3-19 2023-05-03 [4] CRAN (R 4.3.1)
patchwork 1.2.0 2024-01-08 [1] CRAN (R 4.3.2)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.2)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.2)
png 0.1-8 2022-11-29 [1] CRAN (R 4.3.2)
preprocessCore 1.64.0 2023-10-24 [1] Bioconductor
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.2)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.2)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.2)
RCurl 1.98-1.14 2024-01-09 [1] CRAN (R 4.3.2)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.2)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.2)
RhpcBLASctl 0.23-42 2023-02-11 [1] CRAN (R 4.3.2)
rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.2)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2)
rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.2)
rpart 4.1.21 2023-10-09 [4] CRAN (R 4.3.1)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.2)
RSQLite 2.3.5 2024-01-21 [1] CRAN (R 4.3.2)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.2)
S4Arrays 1.2.0 2023-10-24 [1] Bioconductor
S4Vectors * 0.40.2 2023-11-23 [1] Bioconductor 3.18 (R 4.3.2)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.2)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
shape 1.4.6 2021-05-19 [1] CRAN (R 4.3.2)
SparseArray 1.2.4 2024-02-11 [1] Bioconductor 3.18 (R 4.3.2)
statmod 1.5.0 2023-01-06 [1] CRAN (R 4.3.2)
statnet.common 4.9.0 2023-05-24 [1] CRAN (R 4.3.2)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.2)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
SummarizedExperiment * 1.32.0 2023-10-24 [1] Bioconductor
survival 3.5-7 2023-08-14 [4] CRAN (R 4.3.1)
sva 3.50.0 2023-10-24 [1] Bioconductor
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.2)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.2)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.2)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.2)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.2)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.2)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.2)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
WGCNA 1.72-5 2023-12-07 [1] CRAN (R 4.3.2)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2)
xfun 0.42 2024-02-08 [1] CRAN (R 4.3.2)
XML 3.99-0.16.1 2024-01-22 [1] CRAN (R 4.3.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.2)
XVector 0.42.0 2023-10-24 [1] Bioconductor
yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2)
zlibbioc 1.48.0 2023-10-24 [1] Bioconductor
[1] /home/faalm/R/x86_64-pc-linux-gnu-library/4.3
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
──────────────────────────────────────────────────────────────────────────────