12 SingleR

# install.packages("BiocManager")
# BiocManager::install(c("SingleCellExperiment", "SingleR", "celldex"), ask = FALSE)
library(SingleCellExperiment)
library(SingleR)
library(celldex)

In this workshop we have focused on the Seurat package. However, there is a whole other ecosystem of R packages for single cell analysis within Bioconductor. We won’t go into any detail on these packages in this workshop, but good material describing them and their shared object type is available online in the book Orchestrating Single-Cell Analysis with Bioconductor (OSCA).

For now, we’ll just convert our Seurat object into a SingleCellExperiment object. Popular Bioconductor packages that work with this type include slingshot, scran, and scater.

sce <- as.SingleCellExperiment(pbmc)
sce
#> class: SingleCellExperiment 
#> dim: 13714 2638 
#> metadata(0):
#> assays(3): counts logcounts scaledata
#> rownames(13714): AL627309.1 AP006222.2 ... PNRC2.1
#>   SRSF10.1
#> rowData names(0):
#> colnames(2638): AAACATACAACCAC-1 AAACATTGAGCTAC-1 ...
#>   TTTGCATGAGAGGC-1 TTTGCATGCCTCAC-1
#> colData names(28): orig.ident nCount_RNA ...
#>   cell_label ident
#> reducedDimNames(2): PCA UMAP
#> mainExpName: RNA
#> altExpNames(0):
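
To make the printed summary above easier to read, here is a minimal sketch that builds a tiny SingleCellExperiment by hand and pulls out the same slots (assays, colData, reducedDims). The matrix values, gene and cell names, and the `cluster` column are made up for illustration; only the accessor functions carry over to the `sce` object from the workshop.

```r
library(SingleCellExperiment)

# Toy count matrix: 4 genes x 3 cells (made-up values).
set.seed(1)
counts <- matrix(rpois(12, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Assays hold the expression matrices; colData holds per-cell metadata.
toy <- SingleCellExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(cluster = c("a", "a", "b"),
                      row.names = colnames(counts))
)

# Low-dimensional embeddings are stored next to the assays, one row per cell.
reducedDim(toy, "PCA") <- matrix(rnorm(6), nrow = 3,
                                 dimnames = list(colnames(counts), NULL))

assayNames(toy)       # which assays are stored
dim(counts(toy))      # genes x cells
toy$cluster           # shortcut for colData(toy)$cluster
reducedDimNames(toy)  # which embeddings are stored
```

The same accessors should work on the converted object, for example `colData(sce)` for what Seurat calls the metadata and `reducedDim(sce, "UMAP")` for the UMAP coordinates.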

We will now use a package called SingleR to label each cell. SingleR compares each cell's expression profile against a reference data set of samples with known cell types and assigns the best-matching label. A convenient collection of cell type references is provided by the celldex package, which currently contains the following sets:

ls('package:celldex')
#> [1] "BlueprintEncodeData"             
#> [2] "DatabaseImmuneCellExpressionData"
#> [3] "HumanPrimaryCellAtlasData"       
#> [4] "ImmGenData"                      
#> [5] "MonacoImmuneData"                
#> [6] "MouseRNAseqData"                 
#> [7] "NovershternHematopoieticData"

In this example, we’ll use the HumanPrimaryCellAtlasData set, which contains both high-level (“main”) and fine-grained (“fine”) labels. Let’s download the reference dataset.

# The reference is a SummarizedExperiment, the class that
# SingleCellExperiment extends; its colData is equivalent to
# Seurat's meta.data
ref.set <- celldex::HumanPrimaryCellAtlasData()
#> see ?celldex and browseVignettes('celldex') for documentation
#> loading from cache
#> see ?celldex and browseVignettes('celldex') for documentation
#> loading from cache

The “main” labels are the high-level cell types:

unique(ref.set$label.main)
#>  [1] "DC"                   "Smooth_muscle_cells" 
#>  [3] "Epithelial_cells"     "B_cell"              
#>  [5] "Neutrophils"          "T_cells"             
#>  [7] "Monocyte"             "Erythroblast"        
#>  [9] "BM & Prog."           "Endothelial_cells"   
#> [11] "Gametocytes"          "Neurons"             
#> [13] "Keratinocytes"        "HSC_-G-CSF"          
#> [15] "Macrophage"           "NK_cell"             
#> [17] "Embryonic_stem_cells" "Tissue_stem_cells"   
#> [19] "Chondrocytes"         "Osteoblasts"         
#> [21] "BM"                   "Platelets"           
#> [23] "Fibroblasts"          "iPS_cells"           
#> [25] "Hepatocytes"          "MSC"                 
#> [27] "Neuroepithelial_cell" "Astrocyte"           
#> [29] "HSC_CD34+"            "CMP"                 
#> [31] "GMP"                  "MEP"                 
#> [33] "Myelocyte"            "Pre-B_cell_CD34-"    
#> [35] "Pro-B_cell_CD34+"     "Pro-Myelocyte"

The “fine” labels are more specific subtypes. The first few are:

head(unique(ref.set$label.fine))
#> [1] "DC:monocyte-derived:immature"       
#> [2] "DC:monocyte-derived:Galectin-1"     
#> [3] "DC:monocyte-derived:LPS"            
#> [4] "DC:monocyte-derived"                
#> [5] "Smooth_muscle_cells:bronchial:vit_D"
#> [6] "Smooth_muscle_cells:bronchial"

Now we’ll label our cells using the SingleCellExperiment object, with the above reference set.

pred.cnts <- SingleR::SingleR(test = sce, ref = ref.set, labels = ref.set$label.main)
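
To give some intuition for what this call does, here is a deliberately simplified sketch of correlation-based labelling in base R. It is not SingleR's actual implementation, which also selects marker genes, scores cells against individual reference samples, and fine-tunes close calls. The toy matrices, gene names, and values below are invented for illustration.

```r
# Toy reference: expression of 5 genes for 3 labels (one column per label).
ref <- matrix(
  c(9, 8, 1, 1, 2,   # T_cells
    1, 2, 9, 8, 1,   # B_cell
    2, 1, 1, 3, 9),  # Monocyte
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("T_cells", "B_cell", "Monocyte"))
)

# Toy test data: two unlabelled cells measured on the same genes.
test <- matrix(
  c(7, 9, 0, 2, 1,   # cell1
    2, 0, 1, 3, 9),  # cell2
  nrow = 5,
  dimnames = list(rownames(ref), c("cell1", "cell2"))
)

# Spearman correlation of every cell (rows) with every reference label (columns).
scores <- cor(test, ref, method = "spearman")

# Assign each cell the label with the highest correlation.
labels <- colnames(ref)[apply(scores, 1, which.max)]
names(labels) <- rownames(scores)
labels
# cell1 -> "T_cells", cell2 -> "Monocyte"
```

Rank-based correlation is a natural choice for this kind of comparison because it is insensitive to differences in scale between the test data and the reference.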

We keep only the labels assigned to more than 10 cells and group the rest as 'Other'. We then store the labels in our Seurat object and plot them on our UMAP.

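# Named logical, one entry per label: TRUE if more than 10 cells got that label
lbls.keep <- table(pred.cnts$labels)>10
# Look up each cell's label by name; cells with rare labels become 'Other'
pbmc$SingleR.labels <- ifelse(lbls.keep[pred.cnts$labels], pred.cnts$labels, 'Other')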
DimPlot(pbmc, reduction='umap', group.by='SingleR.labels')

It is nice to see that even though SingleR does not use the clusters we computed earlier, the labels do seem to match those clusters reasonably well.
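
The visual impression can also be checked numerically. The sketch below uses made-up label and cluster vectors (not the workshop data) to show the two steps: collapsing rare labels into 'Other' by indexing a named logical by label, and cross-tabulating labels against clusters. With the real objects the equivalent cross-tabulation would be something like `table(pbmc$SingleR.labels, Idents(pbmc))`, assuming the active identities still hold the clusters computed earlier.

```r
# Made-up per-cell labels and cluster assignments for 25 cells.
labels   <- c(rep("T_cells", 12), rep("B_cell", 11), rep("Platelets", 2))
clusters <- c(rep("0", 11), rep("1", 12), rep("2", 2))

# Named logical: which labels were assigned to more than 10 cells?
keep <- table(labels) > 10

# Index by label name to get one TRUE/FALSE per cell, then relabel rare types.
filtered <- as.vector(ifelse(keep[labels], labels, "Other"))

# Cross-tabulate labels against clusters.
tab <- table(label = filtered, cluster = clusters)
tab

# Fraction of each cluster taken up by its most common label.
purity <- apply(tab, 2, max) / colSums(tab)
purity
```

Values of `purity` close to 1 indicate that a cluster is dominated by a single label, which is what "matching reasonably well" means in practice.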
