Co-expression Targets
=====================

GeneVector can train gene embeddings on different co-expression metrics. The target function defines what relationship between gene pairs the model learns to reproduce with dot products in the embedding space.

Built-in Targets
----------------

.. list-table::
   :header-rows: 1
   :widths: 15 15 70

   * - Target
     - Speed
     - Description
   * - ``mi``
     - varies
     - Mutual information (default). Captures nonlinear statistical dependence. Multiple backends available.
   * - ``pearson``
     - instant
     - Pearson correlation coefficient. Linear co-expression.
   * - ``spearman``
     - instant
     - Spearman rank correlation. Monotonic co-expression, robust to outliers.
   * - ``jaccard``
     - instant
     - Jaccard index on binarized expression (detected / not detected).
   * - ``cosine``
     - instant
     - Cosine similarity between gene expression vectors across cells.

Usage
-----

.. code-block:: python

   from genevector.data import GeneVectorDataset

   # Default: signed mutual information
   dataset = GeneVectorDataset(adata, target="mi", signed_mi=True)

   # Pearson correlation
   dataset = GeneVectorDataset(adata, target="pearson")

   # Spearman rank correlation
   dataset = GeneVectorDataset(adata, target="spearman")

   # Jaccard index
   dataset = GeneVectorDataset(adata, target="jaccard")

   # Cosine similarity
   dataset = GeneVectorDataset(adata, target="cosine")

The ``mi_backend`` parameter only applies when ``target="mi"``. The matrix-based targets (Pearson, Spearman, Jaccard, cosine) compute in seconds via BLAS regardless of gene count.

Graph-Aware Targets
-------------------

Graph-aware targets measure co-expression across graph neighbors rather than within individual cells. The ``graph`` parameter accepts any scipy sparse adjacency matrix: spatial neighbors, TCR similarity, or custom graphs.

.. code-block:: python

   import squidpy as sq

   # Build a spatial neighbor graph
   sq.gr.spatial_neighbors(adata, n_neighs=10, coord_type="generic")
   graph = adata.obsp["spatial_connectivities"]

   # Cross-correlation between self-expression and neighbor-aggregated expression
   dataset = GeneVectorDataset(adata, target="graph_xcorr",
                               target_kwargs={"graph": graph, "aggr": "mean"})

The graph is **domain-agnostic**: the same target works on spatial, TCR, or any graph topology.

.. code-block:: python

   from genevector.graphs import build_clonotype_graph

   # Same target, different graph
   clone_graph = build_clonotype_graph(adata, clone_key="clone_id")
   dataset = GeneVectorDataset(adata, target="graph_xcorr",
                               target_kwargs={"graph": clone_graph})

Custom Targets
--------------

Register a custom target function:

.. code-block:: python

   from genevector.metrics import register_target

   @register_target("my_metric")
   def my_target(X, gene_names, **kwargs):
       # Compute pairwise scores
       # Must return dict[str, dict[str, float]]
       scores = {}
       # ... your computation ...
       return scores

   dataset = GeneVectorDataset(adata, target="my_metric")

Or pass a callable directly without registration:

.. code-block:: python

   dataset = GeneVectorDataset(adata, target=lambda X, names, **kw: my_score_function(X, names))

Caching
-------

All computed target scores are cached automatically to ``~/.genevector/cache/``. Cache keys incorporate the expression matrix, gene list, target function name, and all parameters, so different configurations never collide.

.. code-block:: python

   # Disable caching
   dataset = GeneVectorDataset(adata, use_cache=False)

   # Clear the cache
   from genevector.cache import clear_cache
   clear_cache()
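
To make the cache-key idea concrete, the following is a minimal sketch of how a key could be derived from the expression matrix, gene list, target name, and parameters. It is not GeneVector's actual implementation: ``make_cache_key`` is a hypothetical helper, and the sketch assumes a dense NumPy matrix and JSON-serializable parameters.

```python
import hashlib
import json

import numpy as np


def make_cache_key(X, gene_names, target_name, params):
    """Hypothetical helper: hash every input that affects the target scores."""
    X = np.ascontiguousarray(X)  # assumption: dense array, not a sparse matrix
    parts = [
        # Shape and dtype are included so reshaped or recast data gets a new key.
        str(X.shape).encode(),
        str(X.dtype).encode(),
        X.tobytes(),
        "\x1f".join(gene_names).encode(),
        target_name.encode(),
        # sort_keys makes the key independent of keyword order.
        json.dumps(params, sort_keys=True).encode(),
    ]
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part)
        digest.update(b"\x00")  # separator between fields
    return digest.hexdigest()


# Toy inputs for illustration only
X = np.arange(6, dtype=np.float32).reshape(3, 2)
genes = ["CD3E", "CD8A"]

key_mi = make_cache_key(X, genes, "mi", {"signed_mi": True})
key_pearson = make_cache_key(X, genes, "pearson", {})
```

Because every input feeds the hash, changing the target, a parameter, or a single matrix entry yields a different key, while repeating the same configuration reproduces the same key and can reuse the cached scores.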