14 Cell cycle Assignment
In some datasets, the phase of cell cycle that a cell is in (G1/G2M/S) can account for alot of the observed transcriptomic variation. There may be clustering by phase, or separation in the UMAP by phase.
Seurat provides a simple method for assigning cell cycle state to each cell. Other methods are available.
More information about assigning cell cycle states to cells is in the cell cycle vignette
# A list of cell cycle markers, from Tirosh et al, 2015, is loaded with Seurat. We can
# segregate this list into markers of G2/M phase and markers of S phase
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
# Use those lists with the cell cycle scoring function in Seurat.
seurat_object <- CellCycleScoring(seurat_object, s.features = s.genes, g2m.features = g2m.genes)
#> Warning: The following features are not present in the
#> object: TYMS, MCM2, MCM4, UNG, GINS2, CDCA7, DTL, UHRF1,
#> MLF1IP, HELLS, RAD51AP1, GMNN, WDR76, CCNE2, ATAD2, RAD51,
#> RRM2, CDC45, CDC6, EXO1, DSCC1, BLM, CASP8AP2, CLSPN,
#> POLA1, CHAF1B, BRIP1, E2F8, not searching for symbol
#> synonyms
#> Warning: The following features are not present in the
#> object: CDK1, NUSAP1, UBE2C, BIRC5, TPX2, TOP2A, NDC80,
#> NUF2, MKI67, FAM64A, CCNB2, CKAP2L, AURKB, BUB1, KIF11,
#> GTSE1, HJURP, CDCA3, CDC20, TTK, CDC25C, KIF2C, NCAPD2,
#> DLGAP5, CDCA2, CDCA8, ECT2, KIF23, HMMR, AURKA, PSRC1,
#> ANLN, CENPE, NEK2, GAS2L3, CENPA, not searching for symbol
#> synonyms
Which adds S.Score, G2M.Score and Phase calls to the metadata.
head(seurat_object@meta.data)
#> orig.ident nCount_RNA nFeature_RNA ind
#> AGGGCGCTATTTCC-1 SeuratProject 2053 532 1256
#> GGAGACGATTCGTT-1 SeuratProject 881 392 1256
#> CACCGTTGTCGTAG-1 SeuratProject 3130 1005 1016
#> TATCGTACACGCAT-1 SeuratProject 1042 549 1488
#> TACGAGACCTATTC-1 SeuratProject 2425 777 1244
#> GTACTACTCATACG-1 SeuratProject 3951 1064 1256
#> stim cell multiplets
#> AGGGCGCTATTTCC-1 stim CD14+ Monocytes singlet
#> GGAGACGATTCGTT-1 stim CD4 T cells singlet
#> CACCGTTGTCGTAG-1 ctrl FCGR3A+ Monocytes singlet
#> TATCGTACACGCAT-1 stim B cells singlet
#> TACGAGACCTATTC-1 stim CD4 T cells singlet
#> GTACTACTCATACG-1 ctrl FCGR3A+ Monocytes singlet
#> percent.mt RNA_snn_res.0.5 seurat_clusters
#> AGGGCGCTATTTCC-1 1.6336634 1 0
#> GGAGACGATTCGTT-1 4.8809524 6 15
#> CACCGTTGTCGTAG-1 1.0655473 4 11
#> TATCGTACACGCAT-1 3.0662710 3 4
#> TACGAGACCTATTC-1 1.0837849 0 8
#> GTACTACTCATACG-1 0.7137395 4 11
#> pca_clusters harmony_clusters
#> AGGGCGCTATTTCC-1 3 1
#> GGAGACGATTCGTT-1 0 6
#> CACCGTTGTCGTAG-1 9 4
#> TATCGTACACGCAT-1 6 3
#> TACGAGACCTATTC-1 0 0
#> GTACTACTCATACG-1 9 4
#> RNA_snn_res.0.1 RNA_snn_res.0.2
#> AGGGCGCTATTTCC-1 1 1
#> GGAGACGATTCGTT-1 0 0
#> CACCGTTGTCGTAG-1 4 4
#> TATCGTACACGCAT-1 3 3
#> TACGAGACCTATTC-1 0 0
#> GTACTACTCATACG-1 4 4
#> RNA_snn_res.0.3 RNA_snn_res.0.4
#> AGGGCGCTATTTCC-1 1 1
#> GGAGACGATTCGTT-1 0 6
#> CACCGTTGTCGTAG-1 4 4
#> TATCGTACACGCAT-1 3 3
#> TACGAGACCTATTC-1 0 0
#> GTACTACTCATACG-1 4 4
#> RNA_snn_res.0.6 RNA_snn_res.0.7
#> AGGGCGCTATTTCC-1 1 1
#> GGAGACGATTCGTT-1 7 7
#> CACCGTTGTCGTAG-1 5 6
#> TATCGTACACGCAT-1 3 4
#> TACGAGACCTATTC-1 0 3
#> GTACTACTCATACG-1 5 6
#> RNA_snn_res.0.8 RNA_snn_res.0.9
#> AGGGCGCTATTTCC-1 1 1
#> GGAGACGATTCGTT-1 9 9
#> CACCGTTGTCGTAG-1 7 7
#> TATCGTACACGCAT-1 3 3
#> TACGAGACCTATTC-1 0 0
#> GTACTACTCATACG-1 7 7
#> RNA_snn_res.1 RNA_snn_res.1.1
#> AGGGCGCTATTTCC-1 0 1
#> GGAGACGATTCGTT-1 9 8
#> CACCGTTGTCGTAG-1 8 7
#> TATCGTACACGCAT-1 3 4
#> TACGAGACCTATTC-1 6 0
#> GTACTACTCATACG-1 8 7
#> RNA_snn_res.1.2 RNA_snn_res.1.3
#> AGGGCGCTATTTCC-1 0 2
#> GGAGACGATTCGTT-1 10 11
#> CACCGTTGTCGTAG-1 8 9
#> TATCGTACACGCAT-1 3 3
#> TACGAGACCTATTC-1 2 7
#> GTACTACTCATACG-1 8 9
#> RNA_snn_res.1.4 RNA_snn_res.1.5
#> AGGGCGCTATTTCC-1 0 1
#> GGAGACGATTCGTT-1 12 12
#> CACCGTTGTCGTAG-1 10 10
#> TATCGTACACGCAT-1 1 3
#> TACGAGACCTATTC-1 2 4
#> GTACTACTCATACG-1 10 10
#> RNA_snn_res.1.6 RNA_snn_res.1.7
#> AGGGCGCTATTTCC-1 1 0
#> GGAGACGATTCGTT-1 12 13
#> CACCGTTGTCGTAG-1 10 11
#> TATCGTACACGCAT-1 2 1
#> TACGAGACCTATTC-1 9 7
#> GTACTACTCATACG-1 10 11
#> RNA_snn_res.1.8 RNA_snn_res.1.9
#> AGGGCGCTATTTCC-1 0 7
#> GGAGACGATTCGTT-1 13 14
#> CACCGTTGTCGTAG-1 11 12
#> TATCGTACACGCAT-1 1 0
#> TACGAGACCTATTC-1 8 8
#> GTACTACTCATACG-1 11 12
#> RNA_snn_res.2 Naive_CD4_T1 cell_label
#> AGGGCGCTATTTCC-1 0 -0.3540023 <NA>
#> GGAGACGATTCGTT-1 15 2.1376158 Naive_CD4_T
#> CACCGTTGTCGTAG-1 11 -1.1487836 <NA>
#> TATCGTACACGCAT-1 4 0.5557941 <NA>
#> TACGAGACCTATTC-1 8 1.4250065 Naive_CD4_T
#> GTACTACTCATACG-1 11 -0.2082793 <NA>
#> SingleR.labels S.Score G2M.Score
#> AGGGCGCTATTTCC-1 Monocyte -0.06808986 -0.08878508
#> GGAGACGATTCGTT-1 T_cells 0.09975063 -0.06812238
#> CACCGTTGTCGTAG-1 Monocyte -0.12770457 0.16052598
#> TATCGTACACGCAT-1 Neutrophils 0.08685571 -0.08316351
#> TACGAGACCTATTC-1 T_cells -0.02572695 0.02074759
#> GTACTACTCATACG-1 Monocyte -0.07294007 -0.01517046
#> Phase
#> AGGGCGCTATTTCC-1 G1
#> GGAGACGATTCGTT-1 S
#> CACCGTTGTCGTAG-1 G2M
#> TATCGTACACGCAT-1 S
#> TACGAGACCTATTC-1 G2M
#> GTACTACTCATACG-1 G1
We can then check the cell phase on the UMAP. In this dataset, phase isn’t driving the clustering, and would not require any further handling.
DimPlot(seurat_object, reduction = 'umap_harmony', group.by = "Phase")
Where a bias is present, your course of action depends on the task at hand. It might involve ‘regressing out’ the cell cycle variation when scaling data ScaleData(kang, vars.to.regress="Phase")
, omitting cell-cycle dominated clusters, or just accounting for it in your differential expression calculations.
If you are working with non-human data, you will need to convert these gene lists, or find new cell cycle associated genes in your species.