Making sense of gene and proteins lists with functional enrichment analysis

6 Defining the Gene List

Starting from the differential expression results here, how do we obtain a gene list for enrichment analysis?

6.1 Activities

Today’s exercise follows the process of getting the differentially expressed gene list using Excel. You could use another spreadsheet program, or you may prefer using a programming language like R.

  1. Download the full table of data from either Degust or the CSV file here: Pezzini2016_SHSY5Ycelldiff_DE_table.csv. Import into Excel.

  2. How many genes are differentially expressed? In these results, the FDR column contains the corrected p-value, and the ‘differentiated’ column shows the log₂ fold change of differentiated cells versus untreated cells (log₂(diff) – log₂(undiff)); 0 indicates no change, 1 represents a doubling, and –1 a halving.

    • How many are significant at FDR < 0.01?

    • That’s a particularly large number of genes - perhaps not unexpected, given how much the cells are changed in this experiment. How many significant genes also have at least a 2-fold change in expression (a log₂ fold change of 1 or more in either direction)?

    • For this workshop, get the genes with an FDR < 0.01 and a log₂ fold change of at least 2 in either direction (log₂(4) = 2, i.e. a 4-fold change). Note - most experiments yield far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p < 0.01 (and occasionally a 2-fold change), and you might see tens to hundreds of results. Here, this stricter, arbitrary threshold produces a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.

Answer: There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 792 genes left.

  3. How many genes are tested? This is your background.

Answer: 14420 genes tested.
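
If you prefer a script to a spreadsheet, the same filtering steps can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the workshop's own method. The column names `FDR` and `differentiated` are the ones described above, but the gene column name `gene` is an assumption, and the rows in the toy table are made up so that the example runs on its own.

```python
import csv
import io

# Made-up rows for illustration only; the real table has one row per tested gene.
# "FDR" and "differentiated" follow the column names described in the text;
# "gene" is an assumed name for the gene symbol column.
TOY_TABLE = """gene,differentiated,FDR
GENE_A,2.5,0.0001
GENE_B,-2.1,0.004
GENE_C,1.3,0.002
GENE_D,-1.0,0.009
GENE_E,0.4,0.008
GENE_F,3.0,0.2
GENE_G,0.1,0.9
"""


def load_rows(handle):
    """Read the DE table and convert the numeric columns to floats."""
    rows = []
    for row in csv.DictReader(handle):
        row["differentiated"] = float(row["differentiated"])
        row["FDR"] = float(row["FDR"])
        rows.append(row)
    return rows


def filter_genes(rows, max_fdr=0.01, min_abs_log2fc=0.0):
    """Keep genes with FDR < max_fdr and |log2 fold change| >= min_abs_log2fc."""
    return [
        r["gene"]
        for r in rows
        if r["FDR"] < max_fdr and abs(r["differentiated"]) >= min_abs_log2fc
    ]


def top_n_genes(rows, n):
    """Alternative approach: take the n genes with the smallest FDR."""
    return [r["gene"] for r in sorted(rows, key=lambda r: r["FDR"])[:n]]


rows = load_rows(io.StringIO(TOY_TABLE))
background = [r["gene"] for r in rows]             # every tested gene
significant = filter_genes(rows)                   # FDR < 0.01 only
two_fold = filter_genes(rows, min_abs_log2fc=1.0)  # plus at least 2-fold change
four_fold = filter_genes(rows, min_abs_log2fc=2.0) # plus at least 4-fold change
top_three = top_n_genes(rows, 3)

print(len(background), len(significant), len(two_fold), len(four_fold))
```

To try this on the downloaded table, you could replace `io.StringIO(TOY_TABLE)` with a file handle opened via `open("Pezzini2016_SHSY5Ycelldiff_DE_table.csv", newline="")`, after checking the real column names and adjusting `"gene"` to match. Because the `csv` module reads every field as plain text, gene symbols are not reinterpreted as dates along the way.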

6.2 Common gotcha

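Can you find SEPT4? If the table was imported with Excel's default settings, the gene symbol will have been converted to a date. This is the problem described in the paper "Gene name errors are widespread in the scientific literature".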

You can’t revert the converted gene names automatically (try converting the column back to text!). You have to avoid this issue in the first place by importing gene columns as ‘text’ in Excel. See the video from HUGO: https://www.genenames.org/help/faq/#!/#tocAnchor-1-25-1
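
As a quick sanity check, a gene list that has passed through a spreadsheet can be scanned for values that look like dates. The sketch below is a heuristic only: it assumes the damaged symbols appear in day-month or month-day form with English month abbreviations (for example `4-Sep` in place of SEPT4), which may not match every Excel locale or export format. The function name `find_date_like` and the example list are made up for illustration.

```python
import re

# Heuristic pattern: "4-Sep" or "Sep-4" style values with English month abbreviations.
# This is an assumption about how date-converted symbols look in an exported file.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(
    rf"^(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{1,2}})$",
    re.IGNORECASE,
)


def find_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


# Illustrative list mixing intact symbols with date-converted ones.
example = ["SEPT4", "4-Sep", "MARCH1", "1-Mar", "TP53", "DEC1"]
suspicious = find_date_like(example)
print(suspicious)
```

An empty result does not prove the list is clean, since other conversions (such as identifiers turned into numbers) are not covered by this pattern.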


6.3 Example

An example Excel document showing this filtering process is available here: Pezzini2016_SHSY5Ycelldiff_DE_table_filtering.xlsx.