GeneVector


GeneVector is a Python library for single-cell RNA sequencing analysis that learns distributed gene representations using a neural embedding approach. It enables gene co-expression analysis, cell type annotation, and metagene discovery through vector arithmetic operations.

Key Features

  • Gene embeddings from co-expression patterns using mutual information, correlation, or custom metrics

  • Automated cell type annotation using marker gene sets with probabilistic assignment

  • Metagene discovery through embedding clustering

  • Graph-aware targets for spatial transcriptomics and TCR/immune profiling data

  • High-performance backends: Rust (PyO3), Numba JIT, CUDA GPU, and vectorized NumPy

  • Score caching to disk, so repeated runs can skip recomputing co-expression scores

  • Vector arithmetic for intuitive gene relationship analysis
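The vector arithmetic in the last feature can be pictured with a small NumPy sketch. It does not use the GeneVector API: the gene vectors are hand-written toy values, and `cosine` and `most_similar` are hypothetical helpers introduced only for illustration. The idea is that adding the vectors of two related genes gives a query vector, and the genes closest to that query by cosine similarity are candidates for the same program.

```python
import numpy as np

# Toy 3-dimensional gene vectors (hypothetical values, not learned embeddings).
vectors = {
    "CD8A": np.array([1.0, 0.0, 0.0]),
    "CD8B": np.array([0.9, 0.1, 0.0]),
    "GZMK": np.array([0.8, 0.0, 0.2]),
    "CD4": np.array([0.0, 1.0, 0.0]),
    "MS4A1": np.array([0.0, 0.0, 1.0]),
}


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def most_similar(query, vectors, exclude=()):
    """Rank genes by cosine similarity to a query vector, highest first."""
    scores = {
        gene: cosine(query, vec)
        for gene, vec in vectors.items()
        if gene not in exclude
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Combine two marker genes into a single query vector.
query = vectors["CD8A"] + vectors["CD8B"]

# Rank the remaining genes against the combined query.
ranked = most_similar(query, vectors, exclude=("CD8A", "CD8B"))
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

With these toy values, `GZMK` ranks first because its vector points in nearly the same direction as the combined query. Subtracting a vector works the same way and pushes the query away from an unwanted gene.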

from genevector.data import GeneVectorDataset
from genevector.model import GeneVector
from genevector.embedding import GeneEmbedding, CellEmbedding

# `adata` is assumed to be an AnnData object loaded beforehand.
dataset = GeneVectorDataset(adata)

# Train the gene vectors and write them to genes.vec.
model = GeneVector(dataset, output_file="genes.vec", emb_dimension=100)
model.train(1000, threshold=1e-6)

# Load the trained gene embedding, then build cell embeddings from it.
gene_embed = GeneEmbedding("genes.vec", dataset, vector="average")
cell_embed = CellEmbedding(dataset, gene_embed)
adata = cell_embed.get_adata()
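To give a feel for how cell vectors and probabilistic labels could follow from gene vectors, here is a minimal NumPy sketch. It is an assumption-laden illustration, not a description of what `CellEmbedding` does internally: cells are embedded as expression-weighted averages of gene vectors, each cell type is represented by the mean vector of its marker genes, and a softmax over cosine similarities gives per-cell probabilities. All vectors, counts, and marker sets are toy values.

```python
import numpy as np

genes = ["CD3E", "CD8A", "MS4A1", "CD79A"]

# Toy gene vectors, one row per gene (hypothetical values).
gene_vecs = np.array([
    [1.0, 0.2, 0.0],  # CD3E
    [0.8, 0.0, 0.1],  # CD8A
    [0.0, 1.0, 0.1],  # MS4A1
    [0.1, 0.9, 0.0],  # CD79A
])

# Toy expression matrix: rows are cells, columns follow `genes`.
expr = np.array([
    [5.0, 3.0, 0.0, 0.0],  # cell expressing T cell markers
    [0.0, 0.0, 4.0, 6.0],  # cell expressing B cell markers
])

# Cell vector = expression-weighted average of gene vectors (an assumption).
weights = expr / expr.sum(axis=1, keepdims=True)
cell_vecs = weights @ gene_vecs

# Cell type vector = mean of its marker gene vectors.
markers = {"T cell": ["CD3E", "CD8A"], "B cell": ["MS4A1", "CD79A"]}
index = {gene: i for i, gene in enumerate(genes)}
type_vecs = np.stack([
    gene_vecs[[index[g] for g in gene_set]].mean(axis=0)
    for gene_set in markers.values()
])


def normalize(matrix):
    """Scale each row to unit length so dot products are cosine similarities."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)


# Cosine similarity of every cell to every cell type.
sims = normalize(cell_vecs) @ normalize(type_vecs).T

# Softmax across cell types turns similarities into probabilities.
shifted = np.exp(sims - sims.max(axis=1, keepdims=True))
probs = shifted / shifted.sum(axis=1, keepdims=True)

type_names = list(markers)
labels = [type_names[i] for i in probs.argmax(axis=1)]
print(labels)
print(probs.round(3))
```

Each row of `probs` sums to one, so it can be read as a soft assignment over the marker sets rather than a hard label.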