Synthetic dataset templates
GeneVector ships with a small catalog of synthetic spatial transcriptomics datasets, each modeling a different biological pathology. They share a common output contract (AnnData + ground-truth dict) and are useful for benchmarking, testing, and feature development across spatial-omics tools.
from genevector.benchmarks.synthetic import (
    build_paracrine_dataset,
    build_niche_dataset,
    build_gradient_dataset,
    build_pathology,
    list_templates,
)
print(list_templates()) # name → description map
adata, ground_truth = build_paracrine_dataset(seed=42)
The full reference, including the ground-truth schema and per-template parameter tables, lives in docs/synthetic_templates.md.
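To get oriented with the shared output contract, it can help to print the AnnData dimensions and the top-level keys of the ground-truth dict. The following is a minimal sketch: `summarize_dataset` is a hypothetical helper, not part of GeneVector. It assumes only that the first return value exposes `n_obs` and `n_vars` (as AnnData objects do) and that the second is a plain dict. It makes no assumption about the ground-truth schema itself.

```python
from typing import Any, Dict, Mapping


def summarize_dataset(adata: Any, ground_truth: Mapping[str, Any]) -> Dict[str, Any]:
    """Summarize an (adata, ground_truth) pair.

    Hypothetical helper: relies only on `n_obs` / `n_vars`, which AnnData
    provides, and on `ground_truth` being a mapping.
    """
    return {
        "n_cells": int(adata.n_obs),
        "n_genes": int(adata.n_vars),
        # Map each top-level key to the type name of its value.
        "ground_truth_keys": {k: type(v).__name__ for k, v in ground_truth.items()},
    }


# Usage with GeneVector installed (not executed here):
# from genevector.benchmarks.synthetic import build_paracrine_dataset
# adata, ground_truth = build_paracrine_dataset(seed=42)
# print(summarize_dataset(adata, ground_truth))
```

Reporting only key names and value types keeps the helper usable across templates even if their ground-truth contents differ.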
Templates
- build_paracrine_dataset: two intermixed cell types with N ligand-receptor pairs and configurable mixing.
- build_niche_dataset: tumor blob with surrounding T cells; niche genes induced by local tumor density.
- build_gradient_dataset: 1D axial pathology, cells along a line with smooth monotone and peaked gene expression.
- build_pathology: full grafiti-derived FOV with paracrine, niche, T-rare, and housekeeping overlays composed.
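Because the templates share one output contract, a benchmark can loop over them uniformly. The sketch below shows one possible way to do that. `run_templates` is a hypothetical helper, and it assumes every builder accepts a `seed` keyword, which the quick-start confirms only for `build_paracrine_dataset`; check the per-template parameter tables before relying on it.

```python
from typing import Any, Callable, Dict, Mapping, Tuple

# A builder returns (adata, ground_truth), matching the shared contract.
Builder = Callable[..., Tuple[Any, Mapping[str, Any]]]


def run_templates(builders: Mapping[str, Builder], seed: int = 0) -> Dict[str, dict]:
    """Call each builder with the same seed and record basic shape info."""
    results: Dict[str, dict] = {}
    for name, build in builders.items():
        # Assumption: every builder accepts a `seed` keyword argument.
        adata, ground_truth = build(seed=seed)
        results[name] = {
            "shape": (int(adata.n_obs), int(adata.n_vars)),
            "ground_truth_keys": sorted(ground_truth),
        }
    return results


# Usage with GeneVector installed (not executed here):
# from genevector.benchmarks.synthetic import (
#     build_paracrine_dataset, build_niche_dataset, build_gradient_dataset,
# )
# summary = run_templates(
#     {
#         "paracrine": build_paracrine_dataset,
#         "niche": build_niche_dataset,
#         "gradient": build_gradient_dataset,
#     },
#     seed=42,
# )
```

Passing the builders in as a mapping keeps the loop independent of any particular template, so stubs can stand in for the real builders in unit tests.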