Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics

[

Spatially resolved transcriptomics (SRT) is revolutionizing genomics by enabling the high-throughput measurement of gene expression while preserving spatial context. Unlike single-cell RNA sequencing (scRNA-seq), which captures transcriptomes without spatial location information, SRT allows researchers to map gene expression to precise locations within a tissue, providing insights into tissue organization, cellular interactions, and spatially coordinated gene activity. The increasing volume and complexity of SRT data necessitate the development of robust statistical and computational methods, making this field highly relevant to data scientists, statisticians, and machine learning (ML) professionals. Techniques such as spatial statistics, graph-based models, and deep learning have been applied to extract meaningful biological insights from these data.

A key step in SRT analysis is the detection of spatially variable genes (SVGs)—genes whose expression varies non-randomly across spatial locations. Identifying SVGs is crucial for characterizing tissue architecture, functional gene modules, and cellular heterogeneity. However, despite the rapid development of computational methods for SVG detection, these methods vary widely in their definitions and statistical frameworks, leading to inconsistent results and challenges in interpretation.

In our recent review published in Nature Communications [1], we systematically examined 34 peer-reviewed SVG detection methods and introduced a classification framework that clarifies the biological significance of various SVG types. This article provides a summary of our findings, focusing on the three major categories of SVGs and the statistical principles underlying their detection.

SVG detection methods aim to uncover genes whose spatial expression reflects biological patterns rather than technical noise. Based on our review of 34 peer-reviewed methods, we categorize SVGs into three groups: Overall SVGs, Cell-Type-Specific SVGs, and Spatial-Domain-Marker SVGs (Figure 2).

Image created by the authors, adapted from [1]. Publication timeline of 34 SVG detection methods. Colours represent three SVG categories: overall SVGs (green), cell-type-specific SVGs (red), and spatial-domain-marker SVGs (purple).

Methods for detecting the three SVG categories serve different purposes (Fig. 3). First, the detection of overall SVGs screens informative genes for downstream analyses, including the identification of spatial domains and functional gene modules. Second, detecting cell-type-specific SVGs aims to reveal spatial variation within a cell type and helps identify distinct cell subpopulations or states within cell types. Third, spatial-domain-marker SVG detection is used to find marker genes to annotate and interpret spatial domains that have already been detected. These markers help explain the molecular mechanisms underlying spatial domains and assist in annotating tissue layers in other datasets.

Image created by the authors, adapted from [1]. Conceptual visualization of three SVG categories: overall SVGs, cell-type-specific SVGs, and spatial-domain-marker SVGs. The left column shows a tissue slice with two cell types and three spatial domains. The right column shows exemplar genes, with colours representing the expression levels, for an overall SVG, a cell-type-specific SVG, and a spatial-domain-marker SVG, respectively.

The relationship among the three SVG categories depends on the detection methods, particularly the null and alternative hypotheses they employ. If an overall SVG detection method uses the null hypothesis that a non-SVG’s expression is independent of spatial location and the alternative hypothesis that any deviation from this independence indicates an SVG, then its SVGs should theoretically include both cell-type-specific SVGs and spatial-domain-marker SVGs. For instance, DESpace [2] is a method that detects both overall SVGs and spatial-domain-marker SVGs, and its detected overall SVGs must be marker genes for some spatial domains. This inclusion relationship holds true except in extreme scenarios, such as when a gene exhibits opposite cell-type-specific spatial patterns that effectively cancel one another out. However, if an overall SVG detection method’s alternative hypothesis is defined for a specific spatial expression pattern, then its SVGs may not include some cell-type-specific SVGs or spatial-domain-marker SVGs.
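
The cancellation scenario can be made concrete with a toy simulation. This is only an illustrative sketch under assumed settings, not part of the review: the one-dimensional coordinate `s`, the two evenly mixed cell types, and the opposite linear gradients are all hypothetical choices, and Pearson correlation merely stands in for an overall test that targets a trend in the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000                                   # number of spots (hypothetical toy data)
s = rng.uniform(0.0, 1.0, size=n)          # 1-D spatial coordinate, for simplicity
cell_type = rng.integers(0, 2, size=n)     # two cell types, mixed evenly in space

# Opposite gradients: expression rises with s in type 0 and falls in type 1.
slope = np.where(cell_type == 0, 1.0, -1.0)
y = slope * (s - 0.5) + rng.normal(scale=0.1, size=n)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

overall = corr(s, y)                                     # all spots pooled
within_0 = corr(s[cell_type == 0], y[cell_type == 0])    # cell type 0 only
within_1 = corr(s[cell_type == 1], y[cell_type == 1])    # cell type 1 only
print(f"overall: {overall:+.2f}, type 0: {within_0:+.2f}, type 1: {within_1:+.2f}")
```

The pooled correlation is close to zero while the within-type correlations are strongly positive and negative. In this toy example the spread of expression still changes with location, so a fully general dependence test could in principle still flag the gene; the cancellation affects tests that look for a trend in the mean.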

To understand how SVGs are detected, we categorized the statistical approaches into three major types of hypothesis tests:

Dependence Test – Examines the dependence between a gene’s expression level and the spatial location.
Regression Fixed-Effect Test – Examines whether some or all of the fixed-effect covariates, for example, spatial location, contribute to the mean of the response variable, i.e., a gene’s expression.
Regression Random-Effect Test (Variance Component Test) – Examines whether the random-effect covariates, for example, spatial location, contribute to the variance of the response variable, i.e., a gene’s expression.

To further explain how these tests are used for SVG detection, we denote Y as a gene’s expression level and S as the spatial locations. The dependence test is the most general hypothesis test for SVG detection. For a given gene, it decides whether the gene’s expression level Y is independent of the spatial location S, i.e., the null hypothesis is:

\[ H_0 : Y \perp S \]
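
One way to test independence between Y and S is a permutation test: shuffling expression values across spots breaks any link to location and yields a null distribution for a chosen statistic. The following is a minimal sketch, not a method from the review. It assumes Moran's I with Gaussian-kernel weights as the statistic; the function names, the `bandwidth` value, and the simulated genes are hypothetical. Moran's I is sensitive to positive spatial autocorrelation only, so this is one possible dependence test rather than a fully general one.

```python
import numpy as np

def morans_i(y, w):
    """Moran's I for values y and a spatial weight matrix w with zero diagonal."""
    z = y - y.mean()
    return float(len(y) / w.sum() * (z @ w @ z) / (z @ z))

def permutation_dependence_test(y, coords, n_perm=499, bandwidth=0.1, seed=0):
    """One-sided permutation test of H0: expression is independent of location."""
    rng = np.random.default_rng(seed)
    # Gaussian-kernel weights between all pairs of spots (an assumed choice).
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * bandwidth**2))
    np.fill_diagonal(w, 0.0)
    observed = morans_i(y, w)
    # Shuffling y across spots simulates the null of no spatial dependence.
    null = np.array([morans_i(rng.permutation(y), w) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)

# Hypothetical toy data: one gene with a smooth spatial pattern, one without.
rng = np.random.default_rng(1)
coords = rng.uniform(size=(300, 2))
svg = np.sin(2 * np.pi * coords[:, 0]) + rng.normal(scale=0.5, size=300)
non_svg = rng.normal(size=300)

stat_svg, p_svg = permutation_dependence_test(svg, coords)
stat_null, p_null = permutation_dependence_test(non_svg, coords)
print(f"patterned gene: I = {stat_svg:.3f}, p = {p_svg:.3f}")
print(f"unpatterned gene: I = {stat_null:.3f}, p = {p_null:.3f}")
```

With 499 permutations the smallest attainable p-value is 1/500, which is why the p-value formula adds one to both the numerator and the denominator.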

There are two types of regression tests: fixed-effect tests, where the effect of the spatial location is assumed to be fixed, and random-effect tests, which assume the effect of the spatial location is random. To explain these two types of tests, we use a linear mixed model for a given gene as an example:

\[ Y_i = \beta_0 + x_i^\top \beta + z_i^\top \gamma + \epsilon_i \]

where the response variable \( Y_i \) is the gene’s expression level at spot \( i \), \( x_i \in \mathbb{R}^p \) indicates the fixed-effect covariates of spot \( i \), \( z_i \in \mathbb{R}^q \) denotes the random-effect covariates of spot \( i \), and \( \epsilon_i \) is the random measurement error at spot \( i \) with zero mean. Among the model parameters, \( \beta_0 \) is the (fixed) intercept, \( \beta \in \mathbb{R}^p \) indicates the fixed effects, and \( \gamma \in \mathbb{R}^q \) denotes the random effects with zero means and the covariance matrix \( \operatorname{Cov}(\gamma) \).

In this linear mixed model, the random effects are assumed to be independent of the random errors, and the random errors are assumed to be independent of one another.
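
To make the roles of the model terms tangible, the linear mixed model can be simulated directly. The sketch below is illustrative only and rests on several assumptions that are not in the review: the fixed-effect covariates are taken to be the raw coordinates, the random-effect covariates are Gaussian basis functions centred at hypothetical `knots`, the random effects have covariance `tau2` times the identity, and the errors are Gaussian with variance `sigma2`. It checks numerically that the mean of \( Y_i \) is driven by the fixed effects and that its variance splits into a random-effect part and an error part.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 2, 5                          # spots, fixed covariates, random covariates
coords = rng.uniform(size=(n, 2))            # hypothetical spot locations

x = coords                                   # x_i: assumed to be the raw coordinates
knots = rng.uniform(size=(q, 2))             # hypothetical basis centres
d2 = ((coords[:, None, :] - knots[None, :, :]) ** 2).sum(axis=-1)
z = np.exp(-d2 / (2.0 * 0.2**2))             # z_i: Gaussian basis values, shape (n, q)

beta0, beta = 1.0, np.array([0.5, -0.3])     # fixed intercept and fixed effects
tau2, sigma2 = 0.4, 0.25                     # assumed Cov(gamma) = tau2 * I, Var(eps_i) = sigma2

def simulate(n_draws):
    """Draw n_draws independent realisations of (Y_1, ..., Y_n) from the model."""
    gamma = rng.normal(scale=np.sqrt(tau2), size=(n_draws, q))   # random effects
    eps = rng.normal(scale=np.sqrt(sigma2), size=(n_draws, n))   # measurement errors
    return beta0 + x @ beta + gamma @ z.T + eps                  # shape (n_draws, n)

y = simulate(20000)
mean_theory = beta0 + x @ beta                       # E[Y_i] under zero-mean gamma and eps
var_theory = tau2 * (z**2).sum(axis=1) + sigma2      # Var(z_i' gamma) + Var(eps_i)

print("max |empirical mean - theory|:", np.abs(y.mean(axis=0) - mean_theory).max())
print("max |empirical var  - theory|:", np.abs(y.var(axis=0) - var_theory).max())
```

Setting `beta` to zero removes the spatial trend in the mean, and setting `tau2` to zero removes the spatial contribution to the variance; these are the two situations the regression tests treat as null hypotheses.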

Fixed-effect tests examine whether some or all of the fixed-effect covariates \( x_i \) (depending on spatial locations S) contribute to the mean of the response variable. If all fixed-effect covariates make no contribution, then \( \beta = 0 \).

The null hypothesis

\[ H_0 : \beta = 0 \]

implies

\[ E[Y_i \mid x_i] = E[Y_i] = \beta_0 \]
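
A fixed-effect test of this kind can be sketched with ordinary least squares. The example below is a simplified illustration under stated assumptions, not a method from the review: it drops the random-effect term, uses the raw coordinates as the fixed-effect covariates, and calibrates the usual F statistic by permutation instead of the F reference distribution, which keeps the sketch dependent on NumPy only. The function names and simulated genes are hypothetical.

```python
import numpy as np

def f_statistic(y, x):
    """F statistic comparing an intercept-only fit with intercept + covariates x."""
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss_full = np.sum((y - design @ coef) ** 2)     # residual sum of squares, full model
    rss_null = np.sum((y - y.mean()) ** 2)          # residual sum of squares, beta = 0
    return float(((rss_null - rss_full) / p) / (rss_full / (n - p - 1)))

def fixed_effect_test(y, x, n_perm=999, seed=0):
    """Permutation p-value for H0: beta = 0 (covariates do not shift the mean)."""
    rng = np.random.default_rng(seed)
    observed = f_statistic(y, x)
    null = np.array([f_statistic(rng.permutation(y), x) for _ in range(n_perm)])
    return observed, float((1 + np.sum(null >= observed)) / (n_perm + 1))

# Hypothetical toy data: a gene with a linear gradient and a gene with a flat mean.
rng = np.random.default_rng(2)
coords = rng.uniform(size=(300, 2))
gradient_gene = 1.0 + 0.8 * coords[:, 0] + rng.normal(scale=0.3, size=300)
flat_gene = 1.0 + rng.normal(scale=0.3, size=300)

f_grad, p_grad = fixed_effect_test(gradient_gene, coords)
f_flat, p_flat = fixed_effect_test(flat_gene, coords)
print(f"gradient gene: F = {f_grad:.1f}, p = {p_grad:.3f}")
print(f"flat gene:     F = {f_flat:.1f}, p = {p_flat:.3f}")
```

Because the covariates here are linear in the coordinates, this sketch can only detect linear gradients, which illustrates how an alternative hypothesis defined for a specific spatial pattern limits which SVGs a method can find.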

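Random-effect tests examine whether the random-effect covariates \( z_i \) (depending on spatial locations S) contribute to the variance of the response variable \( \operatorname{Var}(Y_i) \), focusing on the decomposition:

\[ \operatorname{Var}(Y_i) = \operatorname{Var}(z_i^\top \gamma) + \operatorname{Var}(\epsilon_i) \]

and testing if the contribution of the random-effect covariates is zero. The null hypothesis:

\[ H_0 : \operatorname{Cov}(\gamma) = 0 \]

implies

\[ \operatorname{Var}(z_i^\top \gamma) = 0, \quad \text{so that} \quad \operatorname{Var}(Y_i) = \operatorname{Var}(\epsilon_i) \]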
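
A variance component test can be sketched in the same spirit. The example below is an assumption-laden illustration, not a method from the review: the random-effect covariates are Gaussian basis functions at hypothetical knots, the random effects are assumed to share a common variance, and the statistic is a score-type quadratic form in the residuals of the intercept-only model, calibrated by permutation. All function names and simulation settings are hypothetical.

```python
import numpy as np

def gaussian_basis(coords, knots, bandwidth=0.2):
    """Random-effect covariates z_i: Gaussian bumps centred at the given knots."""
    d2 = ((coords[:, None, :] - knots[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def variance_component_stat(y, z):
    """Score-type statistic r' Z Z' r / r' r, with r the intercept-only residuals."""
    r = y - y.mean()
    zr = z.T @ r
    return float(zr @ zr / (r @ r))

def random_effect_test(y, z, n_perm=999, seed=0):
    """Permutation p-value for H0: the random effects contribute no variance."""
    rng = np.random.default_rng(seed)
    observed = variance_component_stat(y, z)
    null = np.array(
        [variance_component_stat(rng.permutation(y), z) for _ in range(n_perm)]
    )
    return observed, float((1 + np.sum(null >= observed)) / (n_perm + 1))

# Hypothetical toy data: one gene with spatial random effects, one without.
rng = np.random.default_rng(3)
coords = rng.uniform(size=(300, 2))
knots = rng.uniform(size=(10, 2))
z = gaussian_basis(coords, knots)

gamma = rng.normal(scale=1.0, size=10)                         # random effects
svg = 1.0 + z @ gamma + rng.normal(scale=0.5, size=300)        # nonzero variance component
flat = 1.0 + rng.normal(scale=0.5, size=300)                   # zero variance component

stat_svg, p_svg = random_effect_test(svg, z)
stat_flat, p_flat = random_effect_test(flat, z)
print(f"gene with random effects: stat = {stat_svg:.2f}, p = {p_svg:.3f}")
print(f"gene without:             stat = {stat_flat:.2f}, p = {p_flat:.3f}")
```

The statistic grows when the residuals align with the directions spanned by the basis functions, so the choice of `knots` and `bandwidth` determines which spatial patterns the test is sensitive to.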

Among the 23 methods that use frequentist hypothesis tests, dependence tests and random-effect regression tests have been primarily applied to detect overall SVGs, whereas fixed-effect regression tests have been used across all three SVG categories. Understanding these distinctions is essential to choosing the right method for a specific research question.

Improving SVG detection methods requires balancing detection power, specificity, and scalability while addressing key challenges in spatial transcriptomics analysis. Future developments should focus on adapting methods to different SRT technologies and tissue types, as well as extending support for multi-sample SRT data to enhance biological insights. Moreover, strengthening statistical rigor and validation frameworks will be crucial for ensuring the reliability of SVG detection. Benchmarking studies also need refinement, with clearer evaluation metrics and standardized datasets to provide robust method comparisons.

References

[1] Yan, G., Hua, S.H. & Li, J.J. (2025). Categorization of 34 computational methods to detect spatially variable genes from spatially resolved transcriptomics data. Nature Communications, 16, 1141. https://doi.org/10.1038/s41467-025-56080-w

[2] Cai, P., Robinson, M. D., & Tiberi, S. (2024). DESpace: spatially variable gene detection via differential expression testing of spatial clusters. Bioinformatics, 40(2). https://doi.org/10.1093/bioinformatics/btae027

]
