7 Defining the genelist

Starting from those differential expression results here, how do we go about getting a genelist to calculate enrichment on?

7.1 Activities

Todays exercise follows the process of getting the differentially expressed gene list using excel. You could use another spreadsheet program, or some may prefer a programming language like R .

  1. Download the full table of data from either degust, or the csv file here: Pezzini2016_SHSY5Ycelldiff_DE_table.csv. Import into excel.

  2. How many genes are differentially expressed? In these results the FDR Column contains the corrected p-value, and the ‘differentiated’ column shows the log2 fold-change of differentiated cells vs untreated cells (log2(diff)-log2(undiff)); 0 is unchanged, 1 is doubled, -1 is halved.

    • Significant at 0.01?

    • That’s a particularly large number of genes - perhaps not unexpected given how much the cells are changed this experiment. How many significant genes also have 2-fold change in expression?

    • For this workshop, get the genes with a FDR <1x10^-4 and 2x fold change. Note that this is a ridiculous threshold - most experiments yeild far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p<0.01 (and occasionally 2-fold change) - you might see 10s to 100s of results. However, this arbitrary threshold gives a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.

Show There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 198 genes left.
  1. How many genes are tested? This is your background.
Show 14420 genes tested.

7.2 Common gotcha

Can you find SEPT4? Because Gene name errors are widespread in the scientific literature

You can’t revert the gene names automatically (try converting it to text!). You have to avoid it in the first place by importing gene columns as ‘text’ columns in excel. See video from HUGO : https://www.genenames.org/help/faq/#!/#tocAnchor-1-25-1


7.3 Example

An example excel document showing this filtering process is here: Pezzini2016_SHSY5Ycelldiff_DE_table_filtering.xlsx.