18 Workshop wrap-up
Over the last two days, and in the October webinar, we’ve explored the statistical background, key considerations, and practical implementation of functional enrichment analysis, and gained hands-on experience with multiple web-based and R tools.
18.1 Summary of key messages
ORA and GSEA are different statistical analyses, and their inputs differ
- GSEA: Kolmogorov-Smirnov-like test, requires a ranked yet unfiltered gene list
- ORA: Hypergeometric or Fisher’s Exact test, requires a filtered, unranked gene list and an experimental background gene list
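To make the difference in inputs concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not how any particular tool implements these tests: the function names and toy gene identifiers are hypothetical, the ORA part computes the upper tail of the hypergeometric distribution, and the GSEA part is the unweighted running-sum statistic only. Real GSEA implementations typically weight the running sum by the ranking metric and assess significance by permutation, neither of which is shown.

```python
from math import comb


def ora_p_value(hits, list_size, set_size, background_size):
    """Upper-tail hypergeometric P value, P(X >= hits).

    hits: genes of interest that are in the gene set
    list_size: size of the filtered gene list
    set_size: gene-set members present in the background
    background_size: size of the experimental background
    """
    max_hits = min(list_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / comb(background_size, list_size)


def running_sum_es(ranked_genes, gene_set):
    """Unweighted KS-like enrichment score for a ranked, unfiltered list.

    Walks down the ranking, stepping up at gene-set members and down
    otherwise, and returns the largest deviation from zero.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one member and one non-member")
    running, best = 0.0, 0.0
    for hit in in_set:
        running += 1 / n_hit if hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# ORA input: counts derived from a filtered list plus a background (toy numbers).
p = ora_p_value(hits=3, list_size=4, set_size=5, background_size=20)
print(f"ORA P value: {p:.4f}")  # about 0.032

# GSEA input: every gene, ordered by a ranking metric (toy identifiers).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es = running_sum_es(ranked, {"g1", "g2"})
print(f"Enrichment score: {es}")
```

Note that the ORA function never sees gene order, while the running-sum function never sees a significance cut-off. That is why the two methods need differently prepared inputs.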
Always correct for multiple testing
Never use unadjusted P values, as this will introduce many false positives. Different tools offer different multiple testing corrections, such as the Benjamini-Hochberg (BH) false discovery rate (FDR) or the more stringent Bonferroni correction. Always report your chosen method and the significance threshold applied to terms.
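As a rough illustration of how two common corrections differ in stringency, the sketch below adjusts the same set of P values with a Bonferroni correction and with the Benjamini-Hochberg step-up procedure. The function names and the four toy P values are hypothetical; in practice you would rely on the correction built into your chosen tool and simply report which one it was.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """BH-adjusted P values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the
    # adjusted values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # toy P values for four terms
for name, adj in [("Bonferroni", bonferroni(raw)), ("BH", benjamini_hochberg(raw))]:
    print(name, [round(p, 4) for p in adj])
```

With these toy values, one term passes a 0.05 threshold under Bonferroni and one under BH, but the BH-adjusted values are never larger than the Bonferroni ones, which is what "more stringent" means here.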
Different analysis methods will return different results
This is expected, due to underlying differences in database, algorithm, P value methods etc. As long as your methods are robust, sensible and reproducible, you can have confidence that they will stand up to scrutiny under peer review.
Ensure reproducibility
Lack of reproducibility through under-reporting of methods is a common issue in this field (see Wijesooriya et al, linked below). Be sure to record all methodological details while you are working, including all the parameters and arguments applied, how the gene lists were generated, versions of databases and tools etc. If using R, set a seed so that the random number generation used by GSEA gives the same results on every run.
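The effect of a seed, and the habit of writing run details to a file, can be sketched as follows. This is an illustrative Python example only: the permutation test is a simple mean-score test rather than GSEA itself, and the function name, toy scores, and record fields are all hypothetical. In R, `set.seed()` and `sessionInfo()` play the corresponding roles.

```python
import json
import platform
import random


def permutation_p_value(observed, scores, set_size, n_perm, seed):
    """Fraction of random gene sets whose mean score is >= observed."""
    rng = random.Random(seed)  # seeded generator: same seed, same draws
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(scores, set_size)
        if sum(sample) / set_size >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)


scores = [2.1, 1.8, 1.5, 0.9, 0.4, 0.1, -0.2, -0.7, -1.1, -1.6]  # toy ranking metric
observed = (2.1 + 1.8 + 1.5) / 3  # mean score of a toy gene set

first = permutation_p_value(observed, scores, set_size=3, n_perm=1000, seed=42)
second = permutation_p_value(observed, scores, set_size=3, n_perm=1000, seed=42)
print(first == second)  # identical because the seed is fixed

# Record what was run alongside the results (fields are placeholders).
run_record = {
    "analysis": "permutation test (illustrative)",
    "seed": 42,
    "n_perm": 1000,
    "python_version": platform.python_version(),
    "gene_set_database": "PLACEHOLDER: name and release of the database used",
}
print(json.dumps(run_record, indent=2))
```

Saving a record like this next to each results file means the methods section can be written from what was actually run rather than from memory.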
Interpret your results in their biological context
Functional categories are often broad and redundant. Use the FEA results as a guide, not the end point. Use visualisations and explore term redundancy methods to help focus results. Validate through additional means according to the nature of your experiment, with the gold standard being wet-lab rather than in silico validation methods. Keep in mind the limitations of the input data when working with novel species with uncurated resources.
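One simple way to see term redundancy is to measure how much the gene members of two enriched terms overlap. The sketch below uses Jaccard similarity for this. It is a generic illustration rather than the approach of any specific tool, and the term names, gene identifiers, and 0.5 threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared genes divided by all genes across the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundant_pairs(term_genes, threshold=0.5):
    """List term pairs whose gene overlap meets the threshold."""
    pairs = []
    for t1, t2 in combinations(sorted(term_genes), 2):
        score = jaccard(term_genes[t1], term_genes[t2])
        if score >= threshold:
            pairs.append((t1, t2, score))
    return pairs


# Toy enriched terms mapped to the genes driving each enrichment.
terms = {
    "term A": {"g1", "g2", "g3", "g4"},
    "term B": {"g1", "g2", "g3"},
    "term C": {"g7", "g8"},
}
print(redundant_pairs(terms))  # [('term A', 'term B', 0.75)]
```

Pairs flagged this way are candidates for being summarised as a single biological theme when interpreting the results.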
There are many databases and tool choices available
Suitability to your experiment depends on many factors, including:
- Your species, and what tools support it
- What databases and gene sets are relevant to your experiment, from the general (e.g. GO) to the specific (e.g. cancer pathways)
- Any privacy restrictions imposed on your data
- Your skill level in R, or your willingness to write R code
- How much flexibility you want or require with visualisation
18.2 Relevant papers for further reading
Urgent need for consistent standards in functional enrichment analysis
Multiple sources of bias confound functional enrichment analysis of global -omics data
The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling
A Comparison of Gene Set Analysis Methods in Terms of Sensitivity, Prioritization and Specificity
Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges
A critical comparison of topology-based pathway analysis methods
Ranking metrics in Gene Set Enrichment Analysis: Do they matter?
Methods and approaches in the topology-based analysis of biological pathways
Gene set analysis methods: Statistical models and methodological differences
A Comparative Study of Topology-based Pathway Enrichment Analysis Methods
Toward a Gold Standard for Benchmarking Gene Set Enrichment Analysis