class: center, middle, inverse, title-slide

# cageminer: mining candidate genes by integrating GWAS and gene coexpression networks
### Fabrício Almeida-Silva
@almeidasilvaf
### UENF, Brazil
### August 6, 2021

---

## The problem

<br />

.center[
GWAS can identify SNPs associated with traits, but not the underlying genes.

Hundreds of putative candidate genes. What then?
]

<img src="figs/drowning.png" width="65%" style="display: block; margin: auto;" />

---

## *cageminer* to the rescue!

<img src="figs/Fig1.png" width="95%" style="display: block; margin: auto;" />

---

## Required input data

.panelset.sideways[
.panel[.panel-name[SNP positions]

A GRanges or GRangesList object.

```r
set.seed(123)
library(cageminer)
data(snp_pos)
snp_pos
```

```
## GRanges object with 116 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##     2    Chr02 149068682      *
##     3    Chr03   5274098      *
##     4    Chr05  27703815      *
##     5    Chr05  27761792      *
##     6    Chr05  27807397      *
##   ...      ...       ...    ...
##   114    Chr12 230514706      *
##   115    Chr12 230579178      *
##   116    Chr12 230812962      *
##   117    Chr12 230887290      *
##   118    Chr12 231022812      *
##   -------
##   seqinfo: 8 sequences from an unspecified genome; no seqlengths
```
]

.panel[.panel-name[Gene coordinates]

A GRanges object.

```r
data(gene_ranges)
head(gene_ranges)
```

```
## GRanges object with 6 ranges and 6 metadata columns:
##       seqnames        ranges strand |   source     type     score     phase
##          <Rle>     <IRanges>  <Rle> | <factor> <factor> <numeric> <integer>
##   [1]    Chr01   63209-63880      - |  PGA1.55     gene        NA      <NA>
##   [2]    Chr01 112298-112938      - |  PGA1.55     gene        NA      <NA>
##   [3]    Chr01 117979-118392      + |  PGA1.55     gene        NA      <NA>
##   [4]    Chr01 119464-119712      + |  PGA1.55     gene        NA      <NA>
##   [5]    Chr01 119892-120101      + |  PGA1.55     gene        NA      <NA>
##   [6]    Chr01 132987-134420      - |  PGA1.55     gene        NA      <NA>
##                ID          Parent
##       <character> <CharacterList>
##   [1]  CA01g00010
##   [2]  CA01g00020
##   [3]  CA01g00030
##   [4]  CA01g00040
##   [5]  CA01g00050
##   [6]  CA01g00060
##   -------
##   seqinfo: 12 sequences from an unspecified genome; no seqlengths
```
]

.panel[.panel-name[Guide genes]

Character vector or a 2-column data frame with genes in column 1 and gene functional class (e.g., pathway, GO) in column 2.

```r
data(guides)
head(guides)
```

```
##         Gene                               Description
## 1 CA10g07770                      response to stimulus
## 2 CA10g07770                        response to stress
## 3 CA10g07770             cellular response to stimulus
## 4 CA10g07770               cellular response to stress
## 6 CA10g07770 regulation of cellular response to stress
## 8 CA10g07770        regulation of response to stimulus
```
]

.panel[.panel-name[Expression data]

SummarizedExperiment object or matrix/data frame with genes in rows and samples in columns.

```r
data(pepper_se)
pepper_se
```

```
## class: SummarizedExperiment
## dim: 3892 45
## metadata(0):
## assays(1): ''
## rownames(3892): CA02g23440 CA02g05510 ... CA03g35110 CA02g12750
## rowData names(0):
## colnames(45): PL1 PL2 ... TMV-P0-3D TMV-P0-Up
## colData names(1): Condition
```
]

.panel[.panel-name[Coexpression network]

List object returned by `BioNERO::exp2gcn()`.

```r
# Infer network
gcn <- BioNERO::exp2gcn(pepper_se, SFTpower = 12)
```
]
]

---

## Exploratory analysis

<br />

🤔 Are SNPs evenly distributed across chromosomes?

.pull-left[
```r
plot_snp_distribution(snp_pos)
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
]

---

## Exploratory analysis

<br />

🤔 Are SNPs evenly distributed across chromosomes? (Handling multiple traits)

.pull-left[
```r
# Simulate multiple traits by resampling
snp_list <- GenomicRanges::GRangesList(
  Trait1 = sample(snp_pos, 20),
  Trait2 = sample(snp_pos, 20),
  Trait3 = sample(snp_pos, 20),
  Trait4 = sample(snp_pos, 20)
)

# Visualize SNP distribution
plot_snp_distribution(snp_list)
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
]

---

## Exploratory analysis

<br />

🤔 Are SNPs physically close to each other?

.pull-left[
```r
data(chr_length)
plot_snp_circos(chr_length, gene_ranges, snp_pos)
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
]

---

## Exploratory analysis

<br />

🤔 Are SNPs physically close to each other? (handling multiple traits)

.pull-left[
```r
data(chr_length)
plot_snp_circos(chr_length, gene_ranges,
*               snp_list)
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />
]

---

## Gene mining

.panelset.sideways[
.panel[.panel-name[Step 1]

```r
candidates1 <- mine_step1(gene_ranges, snp_pos)
length(candidates1)
```

```
## [1] 1265
```
]

.panel[.panel-name[Step 2]

```r
candidates2 <- mine_step2(pepper_se, gcn = gcn, guides = guides$Gene,
                          candidates = candidates1$ID)
str(candidates2)
```

```
## List of 2
##  $ candidates: chr [1:37] "CA10g08490" "CA03g01790" "CA10g12640" "CA12g21230" ...
##  $ enrichment:'data.frame': 1 obs. of 7 variables:
##   ..$ TermID: chr "guide"
##   ..$ genes : num 323
##   ..$ all   : num 1303
##   ..$ pval  : num 2.58e-05
##   ..$ padj  : num 5.15e-05
##   ..$ GeneID: chr "CA01g00320,CA01g00480,CA01g00740,CA01g01120,CA01g01470,CA01g03330,CA01g03580,CA01g05790,CA01g06190,CA01g07040,C"| __truncated__
##   ..$ Module: chr "cyan"
```
]

.panel[.panel-name[Step 3]

```r
candidates3 <- mine_step3(pepper_se, candidates = candidates2$candidates,
                          sample_group = "PRR_stress")
```

<img src="slides_files/figure-html/unnamed-chunk-15-1.png" width="30%" style="display: block; margin: auto;" />

```r
candidates3
```

```
##           gene      trait        cor       pvalue
## 243 CA12g18400 PRR_stress  0.5963394 1.540534e-05
## 187 CA10g02780 PRR_stress  0.3201048 3.205995e-02
## 19  CA02g16620 PRR_stress  0.3113993 3.732128e-02
## 33  CA02g16900 PRR_stress -0.3204388 3.187100e-02
## 103 CA03g03310 PRR_stress -0.3983772 6.720204e-03
```
]

.panel[.panel-name[Wrapping up]

`mine_candidates()` is a wrapper that calls all 3 `mine_step*()` functions sequentially.

```r
candidates <- mine_candidates(gene_ranges = gene_ranges,
                              marker_ranges = snp_pos,
                              exp = pepper_se,
                              gcn = gcn, guides = guides$Gene,
                              sample_group = "PRR_stress")
```
]
]

---

## Scoring and ranking candidates

<br />

Useful for picking the top *n* genes for validation, for instance.

.font140[
`$$S_i = r_{pb} \kappa$$`
]

.center[
where `\(r_{pb}\)` is the gene-trait correlation from step 3 (the `cor` column) and:

`\(\kappa\)` = 2 if the gene encodes a transcription factor

`\(\kappa\)` = 2 if the gene is a hub

`\(\kappa\)` = 3 if the gene is a hub + encodes a transcription factor
]

---

## Scoring and ranking candidates

```r
data(tfs)
data(hubs)

# Score and rank candidates
scored <- score_genes(candidates3, hubs$Gene, tfs$Gene_ID)
scored
```

```
##           gene      trait        cor       pvalue      score
## 243 CA12g18400 PRR_stress  0.5963394 1.540534e-05  0.5963394
## 103 CA03g03310 PRR_stress -0.3983772 6.720204e-03 -0.3983772
## 33  CA02g16900 PRR_stress -0.3204388 3.187100e-02 -0.3204388
## 187 CA10g02780 PRR_stress  0.3201048 3.205995e-02  0.3201048
## 19  CA02g16620 PRR_stress  0.3113993 3.732128e-02  0.3113993
```

---

## Summary

From 1265 genes, `cageminer` found 5 high-confidence candidates (>99% reduction).

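
As a rough recap of how these five candidates were ranked, the scoring rule from the scoring slides can be reproduced in base R. This is only a sketch under stated assumptions: the `cor` values are copied (rounded) from the `candidates3` output, the `hubs` and `tfs` vectors are hypothetical, `\(\kappa\)` = 1 for genes that are neither hubs nor transcription factors is inferred from the unchanged scores in the `scored` output, and `score_sketch()` is an illustrative helper, not the package's `score_genes()`. The last line checks the reduction quoted above.

```r
# Correlations copied (rounded to 4 decimals) from the `candidates3` output
cand <- data.frame(
  gene = c("CA12g18400", "CA10g02780", "CA02g16620", "CA02g16900", "CA03g03310"),
  cor = c(0.5963, 0.3201, 0.3114, -0.3204, -0.3984),
  stringsAsFactors = FALSE
)

# Hypothetical hub and TF sets, for illustration only
hubs <- c("CA12g18400", "CA02g16900")
tfs <- c("CA02g16900", "CA10g02780")

# Illustrative helper (not cageminer's score_genes())
score_sketch <- function(cand, hubs, tfs) {
  is_hub <- cand$gene %in% hubs
  is_tf <- cand$gene %in% tfs
  # kappa: 3 if hub + TF, 2 if hub or TF, 1 otherwise (the last case is an assumption)
  kappa <- ifelse(is_hub & is_tf, 3, ifelse(is_hub | is_tf, 2, 1))
  cand$score <- cand$cor * kappa
  # Rank by absolute score, matching the ordering seen in the `scored` output
  cand[order(abs(cand$score), decreasing = TRUE), ]
}

ranked <- score_sketch(cand, hubs, tfs)
ranked

# Reduction from the 1265 step-1 genes to the 5 final candidates, in percent
reduction <- (1 - 5 / 1265) * 100
reduction
```

With real data, the hub and TF sets would come from `data(hubs)` and `data(tfs)` as on the previous slide; in that output none of the five candidates is a hub or a transcription factor, so the scores equal the correlations.
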
<img src="slides_files/figure-html/unnamed-chunk-18-1.png" width="50%" style="display: block; margin: auto;" /> --- class: sydney-yellow, middle, center ## Here's where you can find me: <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> [@almeidasilvaf](https://twitter.com/almeidasilvaf) <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 
2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> [almeidasilvaf](https://github.com/almeidasilvaf/) <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg> [almeidasilvaf.github.io/home](https://almeidasilvaf.github.io/home/) <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 
45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path></svg> [fabricio_almeidasilva@hotmail.com](mailto:fabricio_almeidasilva@hotmail.com) <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="m 336.6206,194.53756 c -7.12991,-3.32734 -13.8671,-5.55949 -20.25334,-6.61343 -6.36534,-1.09517 -16.57451,-1.61223 -30.71059,-1.61223 h -36.70409 v 152.74712 h 37.63425 c 14.6735,0 26.08126,-1.01267 34.22385,-3.01709 8.14259,-2.00442 14.92159,-4.52592 20.35674,-7.62608 5.43519,-3.07925 10.416,-6.8615 14.94192,-11.38742 14.4876,-14.71475 21.74129,-33.27334 21.74129,-55.7176 0,-22.05151 -7.44016,-40.05177 -22.34085,-53.98159 -5.49732,-5.16674 -11.82143,-9.44459 -18.88918,-12.79281 z M 255.99999,8.0000031 C 119.02153,8.0000031 8.0000034,119.04185 8.0000034,255.99998 8.0000034,392.95812 119.02153,504 255.99999,504 392.97849,504 504,392.95812 504,255.99998 504,119.04185 392.97849,8.0000031 255.99999,8.0000031 Z M 173.66372,365.51268 H 144.27546 V 160.1481 h 29.38826 z M 158.94954,138.69619 c -11.13935,0 -20.21208,-9.01056 -20.21208,-20.21208 0,-11.11841 9.05183,-20.191181 20.21208,-20.191181 11.18058,0 20.23244,9.051831 20.23244,20.191181 -0.0219,11.22184 -9.05186,20.21208 -20.23244,20.21208 z m 241.3866,163.59715 c -5.29051,12.54475 -12.83407,23.58066 -22.65053,33.08742 -9.98203,9.83734 -21.59659,17.19443 -34.84378,22.19616 -7.74983,3.01709 -14.83852,5.06335 -21.30725,6.11726 -6.4891,1.01267 -18.82759,1.50883 -37.07593,1.50883 H 219.5033 V 160.1481 h 69.23318 c 27.96195,0 50.03378,4.1541 66.31951,12.54476 16.26485,8.36977 29.18144,20.72859 38.79164,36.97254 9.61013,16.26483 
14.4254,34.01757 14.4254,53.19607 0.0227,13.76426 -2.66619,26.90802 -7.93576,39.43187 z" style="stroke-width:0.07717"></path> </g></svg> [Fabricio Almeida-Silva](https://orcid.org/0000-0002-5314-2964)