15 clusterProfiler and enrichplot
clusterProfiler is a comprehensive suite of enrichment tools. It has functions to run ORA or GSEA over commonly used databases (GO, KEGG, KEGG Modules, Pathway Commons, WikiPathways; DAVID is supported for ORA only) as well as universal enrichment functions to perform ORA or GSEA with novel species or custom gene sets. We will use the universal tools in the final activity of this workshop; the present activity focuses on the supported organisms and databases.
One of the key advantages of using R over web tools is flexibility with visualisations.
The same authors have released enrichplot, a package dedicated to plotting enrichment results. In this activity, we will perform GSEA with clusterProfiler, then explore many different visualisation options. At the end of the activity, we will have a poll to see which of the many plot types are the favourites!
15.1 Supported databases, species and namespaces
One of the challenges when working with clusterProfiler for FEA is that each enrichment function supports different organisms and has different namespace requirements, so you cannot necessarily run every function on the same gene list. In this activity, we will review the FEA functions and investigate their requirements, before performing a gene ID conversion with the bitr function to enable compatibility with our Pezzini et al. 2017 dataset.
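As a rough illustration of the conversion step, the sketch below maps three human Ensembl gene IDs to Entrez IDs with bitr. The three IDs (TP53, BRCA1, EGFR) are stand-ins chosen for illustration rather than genes taken from the Pezzini dataset, and the code assumes clusterProfiler and org.Hs.eg.db are already installed from Bioconductor.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Illustrative Ensembl gene IDs (TP53, BRCA1, EGFR); not from the workshop dataset
ensembl_ids <- c("ENSG00000141510", "ENSG00000012048", "ENSG00000146648")

# Map ENSEMBL -> ENTREZID using the human annotation package
id_map <- bitr(
  ensembl_ids,
  fromType = "ENSEMBL",
  toType   = "ENTREZID",
  OrgDb    = org.Hs.eg.db
)

# id_map is a data frame with one column per key type
print(id_map)

# IDs without a match are dropped by default, so count how many were lost
n_unmapped <- length(setdiff(ensembl_ids, id_map$ENSEMBL))
print(n_unmapped)
```

With a real dataset, some IDs usually fail to map or map to more than one target, so it is worth checking the size of the output against the input before continuing.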
Let's start by reviewing the dedicated database enrichment functions and what inbuilt support they have.
Database | GO | KEGG | KEGG Modules | Pathway Commons | WikiPathways | DAVID |
---|---|---|---|---|---|---|
GSEA function | gseGO | gseKEGG | gseMKEGG | gsePC | gseWP | NA |
ORA function | enrichGO | enrichKEGG | enrichMKEGG | enrichPC | enrichWP | enrichDAVID |
Supported species | Those with a Bioconductor annotation package (20) | KEGG Organisms (thousands) | KEGG Organisms (thousands) | Human (or convert to UniProt IDs) | Those listed by get_wp_organisms() | Those supported by DAVID |
Supported namespaces | Differs depending on species | "kegg" (compatible with entrez), "ncbi-geneid", "ncbi-proteinid" or "uniprot" | "kegg" (compatible with entrez), "ncbi-geneid", "ncbi-proteinid" or "uniprot" | "hgnc" or "uniprot" | entrez | Those supported by DAVID |
To obtain this information, 3 sources were required:
- The clusterProfiler user guide
- External websites (Bioconductor and KEGG Organisms)
- clusterProfiler functions (e.g. get_wp_organisms(), keytypes(org.Hs.eg.db))
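The function-based lookups mentioned in the list above could be run along the following lines. This is a minimal sketch assuming clusterProfiler and org.Hs.eg.db are installed; get_wp_organisms() needs internet access, and search_kegg_organism() is an additional clusterProfiler helper that the text above does not mention.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Key types (namespaces) available in the human annotation package
human_keytypes <- keytypes(org.Hs.eg.db)
print(human_keytypes)

# Organism names accepted by the WikiPathways functions (needs internet access)
wp_organisms <- get_wp_organisms()
print(wp_organisms)

# Look up the KEGG organism code for human by scientific name
kegg_human <- search_kegg_organism("Homo sapiens", by = "scientific_name")
print(kegg_human)
```

Swapping org.Hs.eg.db for another annotation package shows why the GO namespaces differ by species.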
This highlights the need to carefully review the tool you are using and explore the user guide and functionality.
15.2 Activity overview
Since we have covered ORA with gprofiler, we will perform a GSEA with clusterProfiler using gseKEGG.
- Explore the functions of clusterProfiler, including which FEA functions support which organisms and which namespaces
- Load input dataset (a gene matrix with adjusted P values and log2 fold change values)
- Extract the gene IDs and sort by log2 fold change to create the GSEA ranked gene list R object
- Use bitr to convert gene IDs from ENSEMBL to ENTREZ for compatibility with gseKEGG
- Perform GSEA with gseKEGG
- Visualise results with many different plot types from enrichplot
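The ranking, GSEA and plotting steps listed above can be sketched as follows. The toy table, its column names (ENTREZID, log2FoldChange) and the two helper functions are hypothetical and only illustrate the shape of the objects; the workshop notebook may name things differently. The toy IDs are already Entrez IDs, as if the bitr conversion had been applied, and the helpers are defined but not called because four genes are far too few for a meaningful enrichment test.

```r
# Toy differential expression table; column names are assumptions
de_table <- data.frame(
  ENTREZID       = c("7157", "672", "1956", "4609"),
  log2FoldChange = c(-1.2, 2.5, 0.3, -0.4),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are the ranking statistic, names are gene IDs
ranked <- de_table$log2FoldChange
names(ranked) <- de_table$ENTREZID

# GSEA functions expect the vector sorted in decreasing order
ranked <- sort(ranked, decreasing = TRUE)
print(ranked)

# Hypothetical helper wrapping the GSEA call (requires clusterProfiler and
# internet access to KEGG when it is actually run)
run_kegg_gsea <- function(ranked_entrez) {
  clusterProfiler::gseKEGG(
    geneList     = ranked_entrez,
    organism     = "hsa",   # KEGG code for human
    keyType      = "kegg",  # compatible with Entrez IDs
    pvalueCutoff = 0.05
  )
}

# Hypothetical helper returning a few enrichplot views of a GSEA result
plot_kegg_gsea <- function(gsea_result) {
  list(
    dot   = enrichplot::dotplot(gsea_result, showCategory = 10),
    ridge = enrichplot::ridgeplot(gsea_result, showCategory = 10),
    gsea  = enrichplot::gseaplot2(gsea_result, geneSetID = 1)
  )
}
```

With a full ranked list, `plots <- plot_kegg_gsea(run_kegg_gsea(ranked))` would return the three plots in a named list.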
Let's head over to RStudio now and try out some functions!
➤ Go back to your RStudio interface and clear your environment by selecting Session → Quit session → Don't save → Start new session
➤ Open the clusterProfiler.Rmd notebook
Instructions for the analysis will continue from the R notebook.
15.3 End of activity summary
- We have explored the supported organisms, namespaces and databases of the clusterProfiler enrichment functions
- We have extracted a ranked gene list for GSEA and converted the gene IDs for compatibility with gseKEGG
- We have performed GSEA on the KEGG database with gseKEGG and visualised the results with multiple plot types
- We have captured all version details relevant to the session within the R notebook and knit the file to HTML for record keeping
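One common way to capture version details in an R notebook is a final chunk that calls sessionInfo(). This is a minimal sketch of that approach; the workshop notebook may record versions differently.

```r
# Record the R version, platform and attached package versions
session_details <- sessionInfo()
print(session_details)
```

Because the chunk runs when the notebook is knit, the versions end up in the HTML record alongside the results.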