GeneVector
GeneVector is a Python library for single-cell RNA sequencing analysis that learns distributed gene representations using a neural embedding approach. It enables gene co-expression analysis, cell type annotation, and metagene discovery through vector arithmetic operations.
Key Features
Gene embeddings from co-expression patterns using mutual information, correlation, or custom metrics
Automated cell type annotation using marker gene sets with probabilistic assignment
Metagene discovery through embedding clustering
Graph-aware targets for spatial transcriptomics and TCR/immune profiling data
High-performance backends: Rust (PyO3), Numba JIT, CUDA GPU, and vectorized NumPy
Score caching to disk for instant re-runs
Vector arithmetic for intuitive gene relationship analysis
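# Minimal workflow. `adata` is assumed to be an AnnData object of single-cell counts loaded beforehand.
from genevector.data import GeneVectorDataset
from genevector.model import GeneVector
from genevector.embedding import GeneEmbedding, CellEmbedding

# Wrap the AnnData object and train gene embeddings, which are written to genes.vec.
dataset = GeneVectorDataset(adata)
model = GeneVector(dataset, output_file="genes.vec", emb_dimension=100)
model.train(1000, threshold=1e-6)  # up to 1000 epochs, or until the loss change falls below the threshold

# Load the trained gene vectors, then derive cell embeddings from them.
gene_embed = GeneEmbedding("genes.vec", dataset, vector="average")
cell_embed = CellEmbedding(dataset, gene_embed)
adata = cell_embed.get_adata()  # AnnData object carrying the cell embedding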
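To make the "vector arithmetic" idea concrete, here is a minimal sketch in plain NumPy. It does not use the GeneVector API: the four-dimensional vectors are made-up values standing in for learned embeddings, and the helper names (`cosine`, `combine`, `rank_by_similarity`) are hypothetical. It only illustrates the general technique of averaging marker-gene vectors into a signature and ranking genes by cosine similarity to it.

```python
import numpy as np

# Toy vectors standing in for learned gene embeddings (values are invented).
toy_vectors = {
    "CD8A": np.array([0.9, 0.1, 0.0, 0.2]),
    "CD8B": np.array([0.8, 0.2, 0.1, 0.1]),
    "CD4": np.array([0.1, 0.9, 0.0, 0.2]),
    "MS4A1": np.array([0.1, 0.0, 0.9, 0.3]),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def combine(genes, vectors):
    """Average the vectors of several genes into one signature vector."""
    return np.mean([vectors[g] for g in genes], axis=0)


def rank_by_similarity(query, vectors):
    """Return (gene, score) pairs sorted from most to least similar to query."""
    scores = {gene: cosine(query, vec) for gene, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Build a signature from two marker genes and rank every gene against it.
cd8_signature = combine(["CD8A", "CD8B"], toy_vectors)
ranking = rank_by_similarity(cd8_signature, toy_vectors)

for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

With real embeddings, the same pattern would be applied to the vectors loaded from the trained model rather than to hand-written arrays. The two marker genes rank closest to their own signature, which is the behaviour that marker-based annotation relies on.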