Working with the Gene Ontology
Kevin Rue-Albrecht
University of Oxford
kevin.rue-albrecht@imm.ox.ac.uk
16 October 2024
Source: vignettes/gene-ontology.Rmd
Scenario
In this vignette, we demonstrate how one may use the package GO.db to dynamically display additional information about selected pathways in the interactive user interface.
Demonstration
Example data
First, we generate pathway analysis results for simulated data using fgsea.
In particular, we use the package org.Hs.eg.db to fetch real gene sets. To reduce the memory footprint, we retain only the gene sets associated with more than 15 and fewer than 500 genes.
Then, we simulate a score for each gene present in any of the remaining gene sets. In practice, that score could be the log2 fold-change of the gene in a differential expression analysis (among other possibilities).
Finally, we perform an FGSEA on the simulated data.
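The data preparation steps described above can be sketched on a toy example before running them on real annotations. This is only an illustration in base R: the gene symbols, the `TERM:` identifiers, and the size bounds are hypothetical and much smaller than those used in the vignette.

```r
# Toy gene-to-term annotation table (hypothetical identifiers, not real GO data)
annotations <- data.frame(
  SYMBOL = c("geneA", "geneB", "geneC", "geneA", "geneD", "geneA"),
  TERM_ID = c("TERM:1", "TERM:1", "TERM:1", "TERM:2", "TERM:2", "TERM:3"),
  stringsAsFactors = FALSE
)

# One character vector of gene symbols per term
toy_pathways <- split(annotations$SYMBOL, annotations$TERM_ID)

# Keep only the gene sets whose size falls strictly between two bounds
set_sizes <- lengths(toy_pathways)
toy_pathways <- toy_pathways[set_sizes > 1 & set_sizes < 4]

# Simulate one score per gene found in any remaining gene set
set.seed(1)
toy_genes <- unique(unlist(toy_pathways))
toy_stats <- rnorm(length(toy_genes))
names(toy_stats) <- toy_genes
baseline <- toy_stats

# Shift the scores of the genes in one chosen set to simulate enrichment
enriched_genes <- toy_pathways[["TERM:2"]]
toy_stats[enriched_genes] <- toy_stats[enriched_genes] + 1
```

Here, `TERM:3` is dropped because it contains a single gene, and only the genes of `TERM:2` have their score increased by 1 relative to `baseline`.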
library("org.Hs.eg.db")
library("fgsea")
# Example data ----
## Pathways
pathways <- select(org.Hs.eg.db, keys(org.Hs.eg.db, "SYMBOL"), c("GOALL"), keytype = "SYMBOL")
pathways <- subset(pathways, ONTOLOGYALL == "BP")
pathways <- unique(pathways[, c("SYMBOL", "GOALL")])
pathways <- split(pathways$SYMBOL, pathways$GOALL)
len_pathways <- lengths(pathways)
pathways <- pathways[len_pathways > 15 & len_pathways < 500]
## Features
set.seed(1)
# simulate a score for all genes found across all pathways
feature_stats <- rnorm(length(unique(unlist(pathways))))
names(feature_stats) <- unique(unlist(pathways))
# arbitrarily select a pathway to simulate enrichment
pathway_id <- "GO:0046324"
pathway_genes <- pathways[[pathway_id]]
# increase score of genes in the selected pathway to simulate enrichment
feature_stats[pathway_genes] <- feature_stats[pathway_genes] + 1
# fgsea ----
set.seed(42)
fgseaRes <- fgsea(pathways = pathways,
                  stats = feature_stats,
                  minSize = 15,
                  maxSize = 500)
head(fgseaRes[order(pval), ])
#> pathway pval padj log2err ES NES size
#> <char> <num> <num> <num> <num> <num> <int>
#> 1: GO:0046324 5.580596e-10 2.747886e-06 0.8012156 0.6285002 2.579677 60
#> 2: GO:0046326 3.235498e-08 7.965797e-05 0.7195128 0.6752985 2.492665 37
#> 3: GO:0010827 5.741078e-08 9.423023e-05 0.7195128 0.5326197 2.321216 78
#> 4: GO:0010828 3.204527e-07 3.944773e-04 0.6749629 0.6033620 2.334222 44
#> 5: GO:0046323 5.839380e-06 5.750622e-03 0.6105269 0.4855652 2.116329 79
#> 6: GO:0008645 4.383171e-05 3.597122e-02 0.5573322 0.3990657 1.864469 121
#> leadingEdge
#> <list>
#> 1: TNF, KLF....
#> 2: KLF15, F....
#> 3: TNF, KLF....
#> 4: KLF15, F....
#> 5: TNF, KLF....
#> 6: TNF, KLF....
Then, we embed the fgsea results in a SummarizedExperiment object.
In this case, we create an empty SummarizedExperiment object, without any count data or metadata, as we will not be using any of those data in this example.
We then embed the pathway analysis results in the newly created SummarizedExperiment object.
But first, we reorder the results by increasing p-value. Although not essential, this implicitly defines the default ordering of the table in the live app.
library("SummarizedExperiment")
library("iSEEpathways")
se <- SummarizedExperiment()
fgseaRes <- fgseaRes[order(pval), ]
se <- embedPathwaysResults(fgseaRes, se, name = "fgsea", class = "fgsea", pathwayType = "GO")
Pathway information
In this example, we configure the app option PathwaysTable.select.details to define a function that, given the identifier of the GO term currently selected in a panel, displays information about that GO term.
Although not essential, this is a user-friendly and immediate way to ‘translate’ machine-friendly database identifiers into human-friendly descriptions.
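The logic of such a function can be sketched in base R alone, independently of GO.db and shiny. In this minimal sketch, the table `toy_terms`, the function `describe_term()`, and the `TERM:` identifiers are all hypothetical. The function returns plain character strings, whereas the function given to the app option returns HTML tags.

```r
# Hypothetical annotation table standing in for a database such as GO.db
toy_terms <- data.frame(
  ID = c("TERM:1", "TERM:2"),
  NAME = c("first toy term", "second toy term"),
  DEFINITION = c("A made-up definition.", NA),
  stringsAsFactors = FALSE
)

# Return the lines to display for one identifier.
# The definition is skipped when missing, and unknown identifiers
# produce a fallback message instead of an error.
describe_term <- function(id, terms = toy_terms) {
  info <- terms[terms$ID == id, , drop = FALSE]
  if (nrow(info) == 0L) {
    return(paste0(id, ": no information available"))
  }
  lines <- paste0(info$ID, ": ", info$NAME)
  if (!is.na(info$DEFINITION)) {
    lines <- c(lines, info$DEFINITION)
  }
  lines
}

describe_term("TERM:1")
describe_term("TERM:2")
describe_term("TERM:9")
```

The fallback for unknown identifiers is a design choice of this sketch. Whether a lookup against a real annotation database needs the same guard depends on how that database handles identifiers it does not contain.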
library("iSEE")
library("GO.db")
library("shiny")
go_details <- function(x) {
    info <- select(GO.db, x, c("TERM", "ONTOLOGY", "DEFINITION"), "GOID")
    html <- list(p(strong(info$GOID), ":", info$TERM, paste0("(", info$ONTOLOGY, ")")))
    if (!is.na(info$DEFINITION)) {
        html <- append(html, list(p(info$DEFINITION)))
    }
    tagList(html)
}
se <- registerAppOptions(se, PathwaysTable.select.details = go_details)
Live app
Finally, we configure the app initial state and launch the live app.
app <- iSEE(se, initial = list(
    PathwaysTable(ResultName = "fgsea", Selected = "GO:0046324", PanelWidth = 12L)
))
if (interactive()) {
    shiny::runApp(app)
}
Reproducibility
The iSEEpathways package (Rue-Albrecht and Soneson, 2024) was made possible thanks to:
- R (R Core Team, 2024)
- BiocStyle (Oleś, 2024)
- knitr (Xie, 2024)
- RefManageR (McLean, 2017)
- rmarkdown (Allaire, Xie, Dervieux, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2024)
- sessioninfo (Wickham, Chang, Flight, Müller, and Hester, 2021)
- testthat (Wickham, 2011)
This package was developed using biocthis.
Code for creating the vignette
## Create the vignette
library("rmarkdown")
system.time(render("gene-ontology.Rmd", "BiocStyle::html_document"))
## Extract the R code
library("knitr")
knit("gene-ontology.Rmd", tangle = TRUE)
Date the vignette was generated.
#> [1] "2024-10-16 14:47:19 UTC"
Wallclock time spent generating the vignette.
#> Time difference of 34.019 secs
R session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os Ubuntu 22.04.5 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz UTC
#> date 2024-10-16
#> pandoc 3.4 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [1] RSPM (R 4.4.0)
#> AnnotationDbi * 1.67.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> backports 1.5.0 2024-05-23 [1] RSPM (R 4.4.0)
#> bibtex 0.5.1 2023-01-26 [1] RSPM (R 4.4.0)
#> Biobase * 2.65.1 2024-08-28 [1] Bioconductor 3.20 (R 4.4.1)
#> BiocGenerics * 0.51.3 2024-10-02 [1] Bioconductor 3.20 (R 4.4.1)
#> BiocManager 1.30.25 2024-08-28 [2] CRAN (R 4.4.1)
#> BiocParallel 1.39.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> BiocStyle * 2.33.1 2024-06-12 [1] Bioconductor 3.20 (R 4.4.0)
#> Biostrings 2.73.2 2024-09-26 [1] Bioconductor 3.20 (R 4.4.1)
#> bit 4.5.0 2024-09-20 [1] RSPM (R 4.4.0)
#> bit64 4.5.2 2024-09-22 [1] RSPM (R 4.4.0)
#> blob 1.2.4 2023-03-17 [1] RSPM (R 4.4.0)
#> bookdown 0.41 2024-10-16 [1] RSPM (R 4.4.0)
#> bslib 0.8.0 2024-07-29 [2] RSPM (R 4.4.0)
#> cachem 1.1.0 2024-05-16 [2] RSPM (R 4.4.0)
#> circlize 0.4.16 2024-02-20 [1] RSPM (R 4.4.0)
#> cli 3.6.3 2024-06-21 [2] RSPM (R 4.4.0)
#> clue 0.3-65 2023-09-23 [1] RSPM (R 4.4.0)
#> cluster 2.1.6 2023-12-01 [3] CRAN (R 4.4.1)
#> codetools 0.2-20 2024-03-31 [3] CRAN (R 4.4.1)
#> colorspace 2.1-1 2024-07-26 [1] RSPM (R 4.4.0)
#> colourpicker 1.3.0 2023-08-21 [1] RSPM (R 4.4.0)
#> ComplexHeatmap 2.21.1 2024-09-24 [1] Bioconductor 3.20 (R 4.4.1)
#> cowplot 1.1.3 2024-01-22 [1] RSPM (R 4.4.0)
#> crayon 1.5.3 2024-06-20 [2] RSPM (R 4.4.0)
#> data.table 1.16.2 2024-10-10 [1] RSPM (R 4.4.0)
#> DBI 1.2.3 2024-06-02 [1] RSPM (R 4.4.0)
#> DelayedArray 0.31.14 2024-10-03 [1] Bioconductor 3.20 (R 4.4.1)
#> desc 1.4.3 2023-12-10 [2] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [2] RSPM (R 4.4.0)
#> doParallel 1.0.17 2022-02-07 [1] RSPM (R 4.4.0)
#> dplyr 1.1.4 2023-11-17 [1] RSPM (R 4.4.0)
#> DT 0.33 2024-04-04 [1] RSPM (R 4.4.0)
#> evaluate 1.0.1 2024-10-10 [2] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [2] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.4.0)
#> fastmatch 1.1-4 2023-08-18 [1] RSPM (R 4.4.0)
#> fgsea * 1.31.6 2024-10-09 [1] Bioconductor 3.20 (R 4.4.1)
#> fontawesome 0.5.2 2023-08-19 [2] RSPM (R 4.4.0)
#> foreach 1.5.2 2022-02-02 [1] RSPM (R 4.4.0)
#> fs 1.6.4 2024-04-25 [2] RSPM (R 4.4.0)
#> generics 0.1.3 2022-07-05 [1] RSPM (R 4.4.0)
#> GenomeInfoDb * 1.41.2 2024-10-02 [1] Bioconductor 3.20 (R 4.4.1)
#> GenomeInfoDbData 1.2.13 2024-10-16 [1] Bioconductor
#> GenomicRanges * 1.57.2 2024-10-09 [1] Bioconductor 3.20 (R 4.4.1)
#> GetoptLong 1.0.5 2020-12-15 [1] RSPM (R 4.4.0)
#> ggplot2 3.5.1 2024-04-23 [1] RSPM (R 4.4.0)
#> ggrepel 0.9.6 2024-09-07 [1] RSPM (R 4.4.0)
#> GlobalOptions 0.1.2 2020-06-10 [1] RSPM (R 4.4.0)
#> glue 1.8.0 2024-09-30 [2] RSPM (R 4.4.0)
#> GO.db * 3.20.0 2024-10-16 [1] Bioconductor
#> gtable 0.3.5 2024-04-22 [1] RSPM (R 4.4.0)
#> highr 0.11 2024-05-26 [2] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [2] RSPM (R 4.4.0)
#> httpuv 1.6.15 2024-03-26 [2] RSPM (R 4.4.0)
#> httr 1.4.7 2023-08-15 [1] RSPM (R 4.4.0)
#> igraph 2.0.3 2024-03-13 [1] RSPM (R 4.4.0)
#> IRanges * 2.39.2 2024-07-17 [1] Bioconductor 3.20 (R 4.4.1)
#> iSEE * 2.17.4 2024-09-03 [1] Bioconductor 3.20 (R 4.4.1)
#> iSEEpathways * 1.3.1 2024-10-16 [1] Bioconductor
#> iterators 1.0.14 2022-02-05 [1] RSPM (R 4.4.0)
#> jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.4.0)
#> jsonlite 1.8.9 2024-09-20 [2] RSPM (R 4.4.0)
#> KEGGREST 1.45.1 2024-06-17 [1] Bioconductor 3.20 (R 4.4.0)
#> knitr 1.48 2024-07-07 [2] RSPM (R 4.4.0)
#> later 1.3.2 2023-12-06 [2] RSPM (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [3] CRAN (R 4.4.1)
#> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.4.0)
#> listviewer 4.0.0 2023-09-30 [1] RSPM (R 4.4.0)
#> lubridate 1.9.3 2023-09-27 [1] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.4.0)
#> Matrix 1.7-0 2024-04-26 [3] CRAN (R 4.4.1)
#> MatrixGenerics * 1.17.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> matrixStats * 1.4.1 2024-09-08 [1] RSPM (R 4.4.0)
#> memoise 2.0.1 2021-11-26 [2] RSPM (R 4.4.0)
#> mgcv 1.9-1 2023-12-21 [3] CRAN (R 4.4.1)
#> mime 0.12 2021-09-28 [2] RSPM (R 4.4.0)
#> miniUI 0.1.1.1 2018-05-18 [2] RSPM (R 4.4.0)
#> munsell 0.5.1 2024-04-01 [1] RSPM (R 4.4.0)
#> nlme 3.1-166 2024-08-14 [2] RSPM (R 4.4.0)
#> org.Hs.eg.db * 3.20.0 2024-10-16 [1] Bioconductor
#> pillar 1.9.0 2023-03-22 [2] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.4.0)
#> pkgdown 2.1.1 2024-09-17 [2] RSPM (R 4.4.0)
#> plyr 1.8.9 2023-10-02 [1] RSPM (R 4.4.0)
#> png 0.1-8 2022-11-29 [1] RSPM (R 4.4.0)
#> promises 1.3.0 2024-04-05 [2] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [2] RSPM (R 4.4.0)
#> ragg 1.3.3 2024-09-11 [2] RSPM (R 4.4.0)
#> RColorBrewer 1.1-3 2022-04-03 [1] RSPM (R 4.4.0)
#> Rcpp 1.0.13 2024-07-17 [2] RSPM (R 4.4.0)
#> RefManageR * 1.4.0 2022-09-30 [1] RSPM (R 4.4.0)
#> rintrojs 0.3.4 2024-01-11 [1] RSPM (R 4.4.0)
#> rjson 0.2.23 2024-09-16 [1] RSPM (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [2] RSPM (R 4.4.0)
#> rmarkdown 2.28 2024-08-17 [2] RSPM (R 4.4.0)
#> RSQLite 2.3.7 2024-05-27 [1] RSPM (R 4.4.0)
#> S4Arrays 1.5.11 2024-10-14 [1] Bioconductor 3.20 (R 4.4.1)
#> S4Vectors * 0.43.2 2024-07-17 [1] Bioconductor 3.20 (R 4.4.1)
#> sass 0.4.9 2024-03-15 [2] RSPM (R 4.4.0)
#> scales 1.3.0 2023-11-28 [1] RSPM (R 4.4.0)
#> sessioninfo * 1.2.2 2021-12-06 [2] RSPM (R 4.4.0)
#> shape 1.4.6.1 2024-02-23 [1] RSPM (R 4.4.0)
#> shiny * 1.9.1 2024-08-01 [2] RSPM (R 4.4.0)
#> shinyAce 0.4.2 2022-05-06 [1] RSPM (R 4.4.0)
#> shinydashboard 0.7.2 2021-09-30 [1] RSPM (R 4.4.0)
#> shinyjs 2.1.0 2021-12-23 [1] RSPM (R 4.4.0)
#> shinyWidgets 0.8.7 2024-09-23 [1] RSPM (R 4.4.0)
#> SingleCellExperiment * 1.27.2 2024-05-24 [1] Bioconductor 3.20 (R 4.4.0)
#> SparseArray 1.5.44 2024-10-06 [1] Bioconductor 3.20 (R 4.4.1)
#> stringi 1.8.4 2024-05-06 [2] RSPM (R 4.4.0)
#> stringr 1.5.1 2023-11-14 [2] RSPM (R 4.4.0)
#> SummarizedExperiment * 1.35.4 2024-10-09 [1] Bioconductor 3.20 (R 4.4.1)
#> systemfonts 1.1.0 2024-05-15 [2] RSPM (R 4.4.0)
#> textshaping 0.4.0 2024-05-24 [2] RSPM (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [2] RSPM (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [1] RSPM (R 4.4.0)
#> timechange 0.3.0 2024-01-18 [1] RSPM (R 4.4.0)
#> UCSC.utils 1.1.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [2] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.4.0)
#> vipor 0.4.7 2023-12-18 [1] RSPM (R 4.4.0)
#> viridisLite 0.4.2 2023-05-02 [1] RSPM (R 4.4.0)
#> xfun 0.48 2024-10-03 [2] RSPM (R 4.4.0)
#> xml2 1.3.6 2023-12-04 [2] RSPM (R 4.4.0)
#> xtable 1.8-4 2019-04-21 [2] RSPM (R 4.4.0)
#> XVector 0.45.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [2] RSPM (R 4.4.0)
#> zlibbioc 1.51.1 2024-06-05 [1] Bioconductor 3.20 (R 4.4.0)
#>
#> [1] /__w/_temp/Library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/local/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bibliography
This vignette was generated using BiocStyle (Oleś, 2024) with knitr (Xie, 2024) and rmarkdown (Allaire, Xie, Dervieux et al., 2024) running behind the scenes.
Citations made with RefManageR (McLean, 2017).
[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.28. 2024. URL: https://github.com/rstudio/rmarkdown.
[2] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.
[3] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.33.1. 2024. DOI: 10.18129/B9.bioc.BiocStyle. URL: https://bioconductor.org/packages/BiocStyle.
[4] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2024. URL: https://www.R-project.org/.
[5] K. Rue-Albrecht and C. Soneson. iSEEpathways: iSEE extension for panels related to pathway analysis. R package version 1.3.1. 2024. URL: https://github.com/iSEE/iSEEpathways.
[6] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[7] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.2, https://r-lib.github.io/sessioninfo/. 2021. URL: https://github.com/r-lib/sessioninfo#readme.
[8] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.48. 2024. URL: https://yihui.org/knitr/.