6 Defining the Gene List
Starting from the differential expression results here, how do we obtain a gene list for enrichment analysis?
6.1 Activities
Today’s exercise follows the process of getting the differentially expressed gene list using Excel. You could use another spreadsheet program, or you may prefer using a programming language like R.
Download the full table of data from either Degust or the CSV file here: Pezzini2016_SHSY5Ycelldiff_DE_table.csv. Import into Excel.
- How many genes are differentially expressed? In these results, the FDR column contains the corrected p-value, and the ‘differentiated’ column shows the log₂ fold change of differentiated cells versus untreated cells (log₂(diff) – log₂(undiff)); 0 indicates no change, 1 represents a doubling, and –1 a halving. How many genes are significant at FDR < 0.01?
That’s a particularly large number of genes - perhaps not unexpected, given how much the cells are changed in this experiment. How many significant genes also have 2-fold change in expression?
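As a quick check of the log₂ scale used in the ‘differentiated’ column, the conversion between log₂ fold change and linear fold change can be sketched in Python. This is only an illustration alongside the Excel exercise; the function names are made up for this example and are not part of the workshop materials.

```python
import math


def fold_change(log2_fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


def log2_fold_change(diff, undiff):
    """Compute log2(diff) - log2(undiff) from two expression values."""
    return math.log2(diff) - math.log2(undiff)


# 0 means no change, 1 a doubling, -1 a halving, 2 a four-fold increase.
for lfc in (0, 1, -1, 2):
    print(lfc, fold_change(lfc))
```

A "2-fold change" filter therefore corresponds to an absolute log₂ fold change of at least 1, assuming changes in both directions are kept.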
For this workshop, get the genes with an FDR < 0.01 and at least a 4-fold change (an absolute log₂ fold change of at least log₂(4) = 2). Note: most experiments yield far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p < 0.01 (and occasionally at a 2-fold change), and you might see tens to hundreds of results. Here, the stricter and admittedly arbitrary fold change threshold produces a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.
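For those who prefer a programming language over a spreadsheet, the filtering described above could be sketched roughly as follows, using only Python's standard library. The column names `FDR` and `differentiated` come from the text; the `gene` column name, the tiny in-memory table, the use of the absolute log₂ fold change, and ranking the "top N" genes by FDR are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical miniature table standing in for the real DE results file.
# 'FDR' and 'differentiated' follow the text; 'gene' is an assumed column name.
EXAMPLE_CSV = """gene,differentiated,FDR
GENE_A,2.5,0.0001
GENE_B,-3.1,0.004
GENE_C,1.2,0.002
GENE_D,0.1,0.8
GENE_E,-2.0,0.02
"""


def read_rows(handle):
    """Read the table, keeping gene names as text and parsing the numbers."""
    rows = []
    for row in csv.DictReader(handle):
        row["differentiated"] = float(row["differentiated"])
        row["FDR"] = float(row["FDR"])
        rows.append(row)
    return rows


def filter_genes(rows, max_fdr=0.01, min_abs_log2fc=0.0):
    """Genes with FDR below the cutoff and |log2 fold change| at or above the cutoff."""
    return [
        r["gene"]
        for r in rows
        if r["FDR"] < max_fdr and abs(r["differentiated"]) >= min_abs_log2fc
    ]


def top_genes(rows, n=500):
    """Alternative: take the n genes with the smallest FDR (assumed ranking)."""
    return [r["gene"] for r in sorted(rows, key=lambda r: r["FDR"])[:n]]


rows = read_rows(io.StringIO(EXAMPLE_CSV))
background = [r["gene"] for r in rows]               # every tested gene
significant = filter_genes(rows)                     # FDR < 0.01 only
two_fold = filter_genes(rows, min_abs_log2fc=1.0)    # plus 2-fold change
four_fold = filter_genes(rows, min_abs_log2fc=2.0)   # plus log2(4) = 2 cutoff
print(len(background), len(significant), len(two_fold), len(four_fold))
```

To try this on the downloaded table, `io.StringIO(EXAMPLE_CSV)` would be replaced with an opened file handle, and the gene column name adjusted to match the real header.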
Answer: There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 792 genes left.
- How many genes are tested? This is your background.
Answer: 14420 genes tested.
6.2 Common gotcha
Can you find SEPT4? Excel silently converts gene symbols that look like dates, such as SEPT4, into dates on import. This highlights how "Gene name errors are widespread in the scientific literature".
You can’t recover the original gene names automatically once they have been converted (try converting the cells back to text!). You have to avoid this issue in the first place by importing gene columns as ‘text’ in Excel. See the video from HUGO: https://www.genenames.org/help/faq/#!/#tocAnchor-1-25-1
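One way to spot this problem in a gene list that has already passed through a spreadsheet is to scan for symbols that look like dates. The sketch below is a rough heuristic only: it assumes the mangled values appear in a day-month form such as `4-Sep`, which may differ with locale and Excel settings, and the function name and example list are hypothetical.

```python
import re

# Heuristic pattern for values like "4-Sep" or "1-Mar" (assumed display form).
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)


def find_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


# Hypothetical gene column after a careless spreadsheet round trip.
genes = ["TP53", "4-Sep", "SEPT4", "1-Mar", "GAPDH"]
print(find_date_like(genes))
```

Any hits indicate that the column was not imported as text; the original symbols then need to be restored from the source file rather than guessed.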
6.3 Example
An example Excel document showing this filtering process is available here: Pezzini2016_SHSY5Ycelldiff_DE_table_filtering.xlsx.