14 Cell cycle Assignment

In some datasets, the phase of cell cycle that a cell is in (G1/G2M/S) can account for alot of the observed transcriptomic variation. There may be clustering by phase, or separation in the UMAP by phase.

Seurat provides a simple method for assigning cell cycle state to each cell. Other methods are available.

More information about assigning cell cycle states to cells is in the cell cycle vignette

# A list of cell cycle markers, from Tirosh et al, 2015, is loaded with Seurat.  We can
# segregate this list into markers of G2/M phase and markers of S phase
s.genes   <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

# Use those lists with the cell cycle scoring function in Seurat.
seurat_object <- CellCycleScoring(seurat_object, s.features = s.genes, g2m.features = g2m.genes)
#> Warning: The following features are not present in the
#> object: TYMS, MCM2, MCM4, UNG, GINS2, CDCA7, DTL, UHRF1,
#> MLF1IP, HELLS, RAD51AP1, GMNN, WDR76, CCNE2, ATAD2, RAD51,
#> RRM2, CDC45, CDC6, EXO1, DSCC1, BLM, CASP8AP2, CLSPN,
#> POLA1, CHAF1B, BRIP1, E2F8, not searching for symbol
#> synonyms
#> Warning: The following features are not present in the
#> object: CDK1, NUSAP1, UBE2C, BIRC5, TPX2, TOP2A, NDC80,
#> NUF2, MKI67, FAM64A, CCNB2, CKAP2L, AURKB, BUB1, KIF11,
#> GTSE1, HJURP, CDCA3, CDC20, TTK, CDC25C, KIF2C, NCAPD2,
#> DLGAP5, CDCA2, CDCA8, ECT2, KIF23, HMMR, AURKA, PSRC1,
#> ANLN, CENPE, NEK2, GAS2L3, CENPA, not searching for symbol
#> synonyms

Which adds S.Score, G2M.Score and Phase calls to the metadata.

head(seurat_object@meta.data)
#>                     orig.ident nCount_RNA nFeature_RNA  ind
#> AGGGCGCTATTTCC-1 SeuratProject       2053          532 1256
#> GGAGACGATTCGTT-1 SeuratProject        881          392 1256
#> CACCGTTGTCGTAG-1 SeuratProject       3130         1005 1016
#> TATCGTACACGCAT-1 SeuratProject       1042          549 1488
#> TACGAGACCTATTC-1 SeuratProject       2425          777 1244
#> GTACTACTCATACG-1 SeuratProject       3951         1064 1256
#>                  stim              cell multiplets
#> AGGGCGCTATTTCC-1 stim   CD14+ Monocytes    singlet
#> GGAGACGATTCGTT-1 stim       CD4 T cells    singlet
#> CACCGTTGTCGTAG-1 ctrl FCGR3A+ Monocytes    singlet
#> TATCGTACACGCAT-1 stim           B cells    singlet
#> TACGAGACCTATTC-1 stim       CD4 T cells    singlet
#> GTACTACTCATACG-1 ctrl FCGR3A+ Monocytes    singlet
#>                  percent.mt RNA_snn_res.0.5 seurat_clusters
#> AGGGCGCTATTTCC-1  1.6336634               1               0
#> GGAGACGATTCGTT-1  4.8809524               6              15
#> CACCGTTGTCGTAG-1  1.0655473               4              11
#> TATCGTACACGCAT-1  3.0662710               3               4
#> TACGAGACCTATTC-1  1.0837849               0               8
#> GTACTACTCATACG-1  0.7137395               4              11
#>                  pca_clusters harmony_clusters
#> AGGGCGCTATTTCC-1            3                1
#> GGAGACGATTCGTT-1            0                6
#> CACCGTTGTCGTAG-1            9                4
#> TATCGTACACGCAT-1            6                3
#> TACGAGACCTATTC-1            0                0
#> GTACTACTCATACG-1            9                4
#>                  RNA_snn_res.0.1 RNA_snn_res.0.2
#> AGGGCGCTATTTCC-1               1               1
#> GGAGACGATTCGTT-1               0               0
#> CACCGTTGTCGTAG-1               4               4
#> TATCGTACACGCAT-1               3               3
#> TACGAGACCTATTC-1               0               0
#> GTACTACTCATACG-1               4               4
#>                  RNA_snn_res.0.3 RNA_snn_res.0.4
#> AGGGCGCTATTTCC-1               1               1
#> GGAGACGATTCGTT-1               0               6
#> CACCGTTGTCGTAG-1               4               4
#> TATCGTACACGCAT-1               3               3
#> TACGAGACCTATTC-1               0               0
#> GTACTACTCATACG-1               4               4
#>                  RNA_snn_res.0.6 RNA_snn_res.0.7
#> AGGGCGCTATTTCC-1               1               1
#> GGAGACGATTCGTT-1               7               7
#> CACCGTTGTCGTAG-1               5               6
#> TATCGTACACGCAT-1               3               4
#> TACGAGACCTATTC-1               0               3
#> GTACTACTCATACG-1               5               6
#>                  RNA_snn_res.0.8 RNA_snn_res.0.9
#> AGGGCGCTATTTCC-1               1               1
#> GGAGACGATTCGTT-1               9               9
#> CACCGTTGTCGTAG-1               7               7
#> TATCGTACACGCAT-1               3               3
#> TACGAGACCTATTC-1               0               0
#> GTACTACTCATACG-1               7               7
#>                  RNA_snn_res.1 RNA_snn_res.1.1
#> AGGGCGCTATTTCC-1             0               1
#> GGAGACGATTCGTT-1             9               8
#> CACCGTTGTCGTAG-1             8               7
#> TATCGTACACGCAT-1             3               4
#> TACGAGACCTATTC-1             6               0
#> GTACTACTCATACG-1             8               7
#>                  RNA_snn_res.1.2 RNA_snn_res.1.3
#> AGGGCGCTATTTCC-1               0               2
#> GGAGACGATTCGTT-1              10              11
#> CACCGTTGTCGTAG-1               8               9
#> TATCGTACACGCAT-1               3               3
#> TACGAGACCTATTC-1               2               7
#> GTACTACTCATACG-1               8               9
#>                  RNA_snn_res.1.4 RNA_snn_res.1.5
#> AGGGCGCTATTTCC-1               0               1
#> GGAGACGATTCGTT-1              12              12
#> CACCGTTGTCGTAG-1              10              10
#> TATCGTACACGCAT-1               1               3
#> TACGAGACCTATTC-1               2               4
#> GTACTACTCATACG-1              10              10
#>                  RNA_snn_res.1.6 RNA_snn_res.1.7
#> AGGGCGCTATTTCC-1               1               0
#> GGAGACGATTCGTT-1              12              13
#> CACCGTTGTCGTAG-1              10              11
#> TATCGTACACGCAT-1               2               1
#> TACGAGACCTATTC-1               9               7
#> GTACTACTCATACG-1              10              11
#>                  RNA_snn_res.1.8 RNA_snn_res.1.9
#> AGGGCGCTATTTCC-1               0               7
#> GGAGACGATTCGTT-1              13              14
#> CACCGTTGTCGTAG-1              11              12
#> TATCGTACACGCAT-1               1               0
#> TACGAGACCTATTC-1               8               8
#> GTACTACTCATACG-1              11              12
#>                  RNA_snn_res.2 Naive_CD4_T1  cell_label
#> AGGGCGCTATTTCC-1             0   -0.3540023        <NA>
#> GGAGACGATTCGTT-1            15    2.1376158 Naive_CD4_T
#> CACCGTTGTCGTAG-1            11   -1.1487836        <NA>
#> TATCGTACACGCAT-1             4    0.5557941        <NA>
#> TACGAGACCTATTC-1             8    1.4250065 Naive_CD4_T
#> GTACTACTCATACG-1            11   -0.2082793        <NA>
#>                  SingleR.labels     S.Score   G2M.Score
#> AGGGCGCTATTTCC-1       Monocyte -0.06808986 -0.08878508
#> GGAGACGATTCGTT-1        T_cells  0.09975063 -0.06812238
#> CACCGTTGTCGTAG-1       Monocyte -0.12770457  0.16052598
#> TATCGTACACGCAT-1    Neutrophils  0.08685571 -0.08316351
#> TACGAGACCTATTC-1        T_cells -0.02572695  0.02074759
#> GTACTACTCATACG-1       Monocyte -0.07294007 -0.01517046
#>                  Phase
#> AGGGCGCTATTTCC-1    G1
#> GGAGACGATTCGTT-1     S
#> CACCGTTGTCGTAG-1   G2M
#> TATCGTACACGCAT-1     S
#> TACGAGACCTATTC-1   G2M
#> GTACTACTCATACG-1    G1

We can then check the cell phase on the UMAP. In this dataset, phase isn’t driving the clustering, and would not require any further handling.

DimPlot(seurat_object, reduction = 'umap_harmony', group.by = "Phase")

Where a bias is present, your course of action depends on the task at hand. It might involve ‘regressing out’ the cell cycle variation when scaling data ScaleData(kang, vars.to.regress="Phase"), omitting cell-cycle dominated clusters, or just accounting for it in your differential expression calculations.

If you are working with non-human data, you will need to convert these gene lists, or find new cell cycle associated genes in your species.