12 SingleR

#install.packages("BiocManager")
#BiocManager::install(c("SingleCellExperiment","SingleR","celldex"),ask=F)
library(SingleCellExperiment)
library(SingleR)
library(celldex)

In this workshop we have focused on the Seurat package. However, there is another whole ecosystem of R packages for single cell analysis within Bioconductor. We won’t go into any detail on these packages in this workshop, but there is good material describing the object type online : OSCA.

For now, we’ll just convert our Seurat object into an object called SingleCellExperiment. Some popular packages from Bioconductor that work with this type are Slingshot, Scran, Scater.

sce <- as.SingleCellExperiment(pbmc)
sce
#> class: SingleCellExperiment 
#> dim: 13714 2638 
#> metadata(0):
#> assays(3): counts logcounts scaledata
#> rownames(13714): AL627309.1 AP006222.2 ... PNRC2.1
#>   SRSF10.1
#> rowData names(0):
#> colnames(2638): AAACATACAACCAC-1 AAACATTGAGCTAC-1 ...
#>   TTTGCATGAGAGGC-1 TTTGCATGCCTCAC-1
#> colData names(28): orig.ident nCount_RNA ...
#>   cell_label ident
#> reducedDimNames(2): PCA UMAP
#> mainExpName: RNA
#> altExpNames(0):

We will now use a package called SingleR to label each cell. SingleR uses a reference data set of cell types with expression data to infer the best label for each cell. A convenient collection of cell type reference is in the celldex package which currently contains the follow sets:

ls('package:celldex')
#> [1] "BlueprintEncodeData"             
#> [2] "DatabaseImmuneCellExpressionData"
#> [3] "HumanPrimaryCellAtlasData"       
#> [4] "ImmGenData"                      
#> [5] "MonacoImmuneData"                
#> [6] "MouseRNAseqData"                 
#> [7] "NovershternHematopoieticData"

In this example, we’ll use the HumanPrimaryCellAtlasData set, which contains high-level, and fine-grained label types. Lets download the reference dataset

# This too is a sce object,
# colData is equivalent to seurat's metadata
ref.set <- celldex::HumanPrimaryCellAtlasData()
#> see ?celldex and browseVignettes('celldex') for documentation
#> loading from cache
#> see ?celldex and browseVignettes('celldex') for documentation
#> loading from cache

The “main” labels.

unique(ref.set$label.main)
#>  [1] "DC"                   "Smooth_muscle_cells" 
#>  [3] "Epithelial_cells"     "B_cell"              
#>  [5] "Neutrophils"          "T_cells"             
#>  [7] "Monocyte"             "Erythroblast"        
#>  [9] "BM & Prog."           "Endothelial_cells"   
#> [11] "Gametocytes"          "Neurons"             
#> [13] "Keratinocytes"        "HSC_-G-CSF"          
#> [15] "Macrophage"           "NK_cell"             
#> [17] "Embryonic_stem_cells" "Tissue_stem_cells"   
#> [19] "Chondrocytes"         "Osteoblasts"         
#> [21] "BM"                   "Platelets"           
#> [23] "Fibroblasts"          "iPS_cells"           
#> [25] "Hepatocytes"          "MSC"                 
#> [27] "Neuroepithelial_cell" "Astrocyte"           
#> [29] "HSC_CD34+"            "CMP"                 
#> [31] "GMP"                  "MEP"                 
#> [33] "Myelocyte"            "Pre-B_cell_CD34-"    
#> [35] "Pro-B_cell_CD34+"     "Pro-Myelocyte"

An example of the types of “fine” labels.

head(unique(ref.set$label.fine))
#> [1] "DC:monocyte-derived:immature"       
#> [2] "DC:monocyte-derived:Galectin-1"     
#> [3] "DC:monocyte-derived:LPS"            
#> [4] "DC:monocyte-derived"                
#> [5] "Smooth_muscle_cells:bronchial:vit_D"
#> [6] "Smooth_muscle_cells:bronchial"

Now we’ll label our cells using the SingleCellExperiment object, with the above reference set.

pred.cnts <- SingleR::SingleR(test = sce, ref = ref.set, labels = ref.set$label.main)

Keep any types that have more than 10 cells to the label, and put those labels back on our Seurat object and plot our on our umap.

lbls.keep <- table(pred.cnts$labels)>10
pbmc$SingleR.labels <- ifelse(lbls.keep[pred.cnts$labels], pred.cnts$labels, 'Other')
DimPlot(pbmc, reduction='umap', group.by='SingleR.labels')

It is nice to see that even though SingleR does not use the clusters we computed earlier, the labels do seem to match those clusters reasonably well.