Introduction
Recent advancements in multiplexed tissue imaging allow for examination of tissue microenvironments in great detail. These cutting-edge technologies offer invaluable insights into cellular heterogeneity and spatial architectures, playing a crucial role in decoding mechanisms of treatment response and disease progression.
However, gaining a deep understanding of complex spatial patterns remains challenging. SpaTopic implements a novel spatial topic model that integrates cell type and spatial information to identify complex spatial tissue structures without human intervention. A collapsed Gibbs sampling algorithm is used for model inference. In contrast to computationally intensive K-nearest-neighbor-based cell neighborhood analysis approaches, SpaTopic scales better to large image datasets because it does not extract neighborhood information for every single cell. SpaTopic can be applied either to a single image or across multiple images.
Set-up
We use a non-small cell lung cancer image to illustrate how to use SpaTopic. The data object can be downloaded from here, with the original public resources available on the NanoString website. These images were generated using a 960-plex CosMx RNA panel on the NanoString CosMx Spatial Molecular Imager platform. We selected the Lung5-1 sample and annotated cells using Azimuth based on the human lung reference v1.0. The Lung5-1 sample contains 38 annotated cell types. Since we used healthy lung tissue as the reference, tumor cells were labeled as 'basal' cells. More information can be found here.
## We use the Seurat v5 package to visualize the results.
## If you still use Seurat v4, you will get an error.
library(Seurat, quietly = TRUE);packageVersion("Seurat")
#> [1] '5.0.2'
## Load the Seurat object for the image
load("~/Documents/Research/github/SpaTopic_data/nanostring_example.rdata")
## for large dataset
options(future.globals.maxSize = 1e9)
We can use the Seurat function ImageDimPlot() to visualize the distribution of cell types on the image.
library(ggplot2)
celltype.plot <- ImageDimPlot(nano.obj, fov = "lung5.rep1", axes = TRUE, cols = "glasbey", dark.background = TRUE)
celltype.plot + theme(legend.position = "bottom", legend.direction = "vertical")
Topic Inference on a Single Image
Now our data is ready. Below we show an example of how to use SpaTopic to identify tissue architectures from multiplexed images.
Input
The required input of SpaTopic is a data frame containing the cells of a single image, or a list of data frames for multiple images. Each data frame consists of four columns: the image ID, the X and Y cell coordinates, and the cell type.
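When the cells are not stored in a Seurat object, the same four-column layout can be assembled by hand. The sketch below is a minimal illustration only: the column names image, X, Y, and type follow the example output printed in this vignette, and all coordinates and cell types are made up.

```r
## Toy input in the four-column layout (all values are made up)
toy <- data.frame(
  image = "image1",
  X     = c(10.5, 42.0, 87.3, 15.2),
  Y     = c(200.1, 180.4, 150.9, 210.7),
  type  = c("Macrophage", "CD4 T", "Macrophage", "Dendritic"),
  stringsAsFactors = FALSE
)
## Basic sanity checks before passing the data frame on
stopifnot(identical(colnames(toy), c("image", "X", "Y", "type")))
stopifnot(is.numeric(toy$X), is.numeric(toy$Y), !anyNA(toy))
## Number of cells per cell type
table(toy$type)
```

Checking the column names and missing values up front is a cheap way to catch layout problems before running inference.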
You may use the function Seurat5obj_to_SpaTopic() to extract input data from a typical Seurat v5 object. The column name for the cell type information needs to be provided via the group.by option.
library(SpaTopic);packageVersion("SpaTopic")
#> [1] '1.1.0.9900'
library(sf)
## Prepare input from Seurat Object
dataset<-Seurat5obj_to_SpaTopic(object = nano.obj, group.by = "predicted.annotation.l1",image = "image1")
head(dataset)
#> image X Y type
#> 1_1 image1 4215.889 158847.7 Dendritic
#> 2_1 image1 6092.889 158834.7 Macrophage
#> 3_1 image1 7214.889 158843.7 Neuroendocrine
#> 4_1 image1 7418.889 158813.7 Macrophage
#> 5_1 image1 7446.889 158845.7 Macrophage
#> 6_1 image1 3254.889 158838.7 CD4 T
Gibbs Sampling
This step takes around 90 seconds on a regular laptop (the run shown below finished in about 66 seconds).
## Gibbs sampling for SpaTopic
system.time(gibbs.res<-SpaTopic_inference(dataset, ntopics = 7, sigma = 50, region_radius = 400))
#> number of cells per image:
#> 100149
#> Start initialization...
#> Numer of Initializations:
#> 10
#> Min perplexity during initialization:
#> 11.6302259709245
#> number of region centers selected:
#> 971
#> number of cells per region on average:
#> 103.140061791967
#> Finish initialization. Start Gibbs sampling...
#> Gibbs sampling done.
#> Output model perplexity:
#> 11.3156329380619
#> user system elapsed
#> 64.992 0.374 65.689
Topic Content and Distribution
SpaTopic identifies seven topics from the image. Below we use a heatmap to show the cell type composition within each topic.
library(pheatmap)
m <- as.data.frame(gibbs.res$Beta)
pheatmap::pheatmap(t(m))
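As a complement to the heatmap, the dominant cell types of each topic can be listed directly. The sketch below uses a small made-up matrix as a stand-in for gibbs.res$Beta and assumes that rows index cell types and columns index topics; transpose first if your matrix is oriented the other way. The names beta and top.types are hypothetical.

```r
## Toy stand-in for gibbs.res$Beta (hypothetical values)
## Assumption: rows are cell types, columns are topics
beta <- matrix(
  c(0.70, 0.05,
    0.20, 0.15,
    0.10, 0.80),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Macrophage", "CD4 T", "basal"), c("topic1", "topic2"))
)
## For each topic (column), keep the two cell types with the largest weight
top.types <- apply(beta, 2, function(w) names(sort(w, decreasing = TRUE))[1:2])
top.types
```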
We assign each cell to the topic with the highest posterior probability and visualize the distribution of cell topics over the image.
prob<-as.matrix(gibbs.res$Z.trace)
nano.obj$Topic<-as.factor(apply(prob,1,which.max))
library(ggplot2)
palatte<- c("#0000FFFF","#FF0000FF","#00FF00FF","#009FFFFF","#FF00B6FF","#005300FF","#FFD300FF")
ImageDimPlot(nano.obj, fov = "lung5.rep1", group.by = "Topic", axes = TRUE,
             dark.background = TRUE, cols = palatte) + ggtitle("Topic")
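The row-wise assignment used above can be checked on a small example. The sketch below applies the same apply(..., 1, which.max) step to a made-up matrix with one row per cell and one column per topic; prob.toy and topic.toy are hypothetical names. Note that which.max() returns the first maximum, so ties go to the lower-numbered topic.

```r
## Toy posterior matrix: 3 cells (rows) x 3 topics (columns); values are made up
prob.toy <- matrix(
  c(0.1, 0.7, 0.2,
    0.6, 0.3, 0.1,
    0.2, 0.2, 0.6),
  nrow = 3, byrow = TRUE
)
## Index of the largest entry in each row, stored as a factor of topic labels
topic.toy <- as.factor(apply(prob.toy, 1, which.max))
topic.toy
## Number of cells assigned to each topic
table(topic.toy)
```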
Compare to BuildNicheAssay() in Seurat v5
We compare SpaTopic to the function BuildNicheAssay() in Seurat v5. BuildNicheAssay() took around 5 minutes on the same laptop.
### NOT RUN!! We use the pre-computed result
system.time(nano.obj <- BuildNicheAssay(object = nano.obj, fov = "lung5.rep1", group.by = "predicted.annotation.l1",
                                        niches.k = 7, neighbors.k = 100))
We also visualize the distribution of seven niches over the same image.
nano.obj$niches<-factor(nano.obj$niches)
nano.obj$niches<-ordered(nano.obj$niches,levels = c(1,2,3,4,5,6,7))
## try to match the colors of topics
palatte2<- c("#FF00B6FF","#0000FFFF","#FFD300FF","#009FFFFF","#FF0000FF","#005300FF","#00FF00FF")
ImageDimPlot(nano.obj, fov = "lung5.rep1", group.by = "niches", axes = TRUE, dark.background = TRUE, cols = palatte2) + ggtitle("Niches")
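One way to see which niche corresponds to which topic, for instance when matching colors between the two plots, is a cross-tabulation of the per-cell labels. The sketch below is an illustration on made-up labels, not the procedure used to choose the palette above; in practice the two vectors would be nano.obj$Topic and nano.obj$niches. The names overlap and best.niche are hypothetical.

```r
## Toy per-cell labels (made up)
topic <- factor(c(1, 1, 2, 2, 2, 3))
niche <- factor(c(2, 2, 1, 1, 3, 3))
## Rows are topics, columns are niches; entries are cell counts
overlap <- table(topic = topic, niche = niche)
overlap
## For each topic, the niche that shares the most cells with it
best.niche <- colnames(overlap)[apply(overlap, 1, which.max)]
best.niche
```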
Topic Inference on Multiple Images
SpaTopic can identify common tissue patterns across multiple images. The input should be a list of data frames. See an example below (not run).
## tissue1, tissue2 are data frames of two different images.
gibbs.res<-SpaTopic_inference(list(A = tissue1, B = tissue2), ntopics = 7, sigma = 50, region_radius = 400)
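If the cells from several images sit in one combined data frame, the list input can be built with split(). The sketch below uses made-up values and assumes the same image, X, Y, and type columns as the single-image input; cells and tissue.list are hypothetical names.

```r
## Toy combined table with two images (all values are made up)
cells <- data.frame(
  image = c("A", "A", "B", "B", "B"),
  X     = c(1, 2, 3, 4, 5),
  Y     = c(5, 4, 3, 2, 1),
  type  = c("CD4 T", "Macrophage", "CD4 T", "CD4 T", "Dendritic")
)
## split() returns a named list with one data frame per image
tissue.list <- split(cells, cells$image)
names(tissue.list)
## Number of cells per image
sapply(tissue.list, nrow)
```

A list in this form could then be passed as the first argument of SpaTopic_inference(), as in the call above.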
For more examples, see the SpaTopic home page.