genevector.metrics

genevector/metrics.py — co-expression target functions.

compute_mi_gpu(X, gene_names, n_bins=10, signed=False, device='cuda')

GPU-accelerated mutual information (MI) computation; joint histograms are built with PyTorch scatter_add.

compute_mi_numba(X, gene_names, n_bins=10, signed=False)

Numba-accelerated MI computation.

compute_mi_rust(X, gene_names, n_bins=10, signed=False)

Rust-accelerated MI computation via PyO3.

compute_mi_vectorized(X, gene_names, n_bins=10, signed=False)

Compute MI for all gene pairs using vectorized discretization.

Parameters:
  • X (sparse or dense matrix, shape (n_cells, n_genes))

  • gene_names (list of str)

  • n_bins (int)

  • signed (bool) – If True, multiply MI by sign of Pearson correlation.

Returns:

mi_scores – Nested mapping in which mi_scores[gene_a][gene_b] is the MI score (float) for that gene pair.

Return type:

dict of dict
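
For orientation, the quantity computed for a single gene pair can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions, not the code behind compute_mi_vectorized: the helper name mi_pair_sketch, its optional x and y arguments, the use of natural logarithms, and the np.bincount joint histogram are all hypothetical choices. The inputs a and b are integer bin indices such as those produced by a discretization step.

```python
import numpy as np

def mi_pair_sketch(a, b, x=None, y=None):
    """Hypothetical helper: MI (in nats) between two integer-binned vectors.

    If the raw expression vectors x and y are supplied, the result is
    multiplied by the sign of their Pearson correlation, which mirrors
    the documented meaning of signed=True.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    na, nb = a.max() + 1, b.max() + 1
    # Joint histogram: flatten each (a, b) pair into one index and count.
    joint = np.bincount(a * nb + b, minlength=na * nb).reshape(na, nb)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (na, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, nb)
    mask = p_ab > 0                         # skip empty cells so log(0) never occurs
    mi = np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask]))
    if x is not None and y is not None:
        mi = mi * np.sign(np.corrcoef(x, y)[0, 1])
    return float(mi)
```

Two identical binary vectors give log(2) nats, two independent ones give 0, and a perfectly anti-correlated pair with x and y supplied gives -log(2).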

discretize_genes(X, n_bins=10)

Discretize each gene’s expression into integer bin indices.

Parameters:
  • X (scipy.sparse.csr_matrix or np.ndarray) – Cells x genes expression matrix.

  • n_bins (int) – Number of bins per gene (excluding the zero bin).

Returns:

  • X_disc (np.ndarray, shape (n_cells, n_genes), dtype=np.int32) – Discretized expression. 0 = zero expression, 1..n_bins = quantile bins of nonzero expression.

  • n_bins_per_gene (np.ndarray, shape (n_genes,), dtype=np.int32) – Actual number of bins used per gene (may be < n_bins if a gene has fewer unique nonzero values).
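
The binning scheme described above (a reserved zero bin plus quantile bins over the nonzero values) can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the name discretize_sketch, the use of np.quantile for bin edges, the tie handling, and the dense-only input are assumptions made for illustration.

```python
import numpy as np

def discretize_sketch(X, n_bins=10):
    """Hypothetical illustration: bin 0 for zeros, bins 1..k for nonzero values."""
    X = np.asarray(X, dtype=float)
    n_cells, n_genes = X.shape
    X_disc = np.zeros((n_cells, n_genes), dtype=np.int32)
    n_bins_per_gene = np.zeros(n_genes, dtype=np.int32)
    for j in range(n_genes):
        col = X[:, j]
        nz = col != 0
        if not nz.any():
            continue  # gene never expressed: every cell stays in the zero bin
        # Interior quantile edges of the nonzero values; duplicate edges
        # collapse when the gene has few unique nonzero values.
        qs = np.linspace(0, 1, n_bins + 1)[1:-1]
        edges = np.unique(np.quantile(col[nz], qs))
        raw = np.searchsorted(edges, col[nz], side="left")
        # Relabel so the bins actually used are contiguous: 1..k.
        _, compact = np.unique(raw, return_inverse=True)
        X_disc[nz, j] = compact + 1
        n_bins_per_gene[j] = compact.max() + 1
    return X_disc, n_bins_per_gene

# Toy input: gene 0 has four distinct nonzero values, gene 1 only one.
X_disc, used = discretize_sketch(
    np.array([[0, 1], [2, 1], [4, 0], [6, 1], [8, 0]]), n_bins=2
)
```

In this toy input the second gene ends up with a single nonzero bin even though n_bins=2 was requested, which is the situation the n_bins_per_gene return value reports.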

get_target_function(name)

Look up a registered target function by name.

Parameters:

name (str) – Name of the registered target.

Returns:

The target function.

Return type:

callable

Raises:

ValueError – If name is not registered.

register_target(name)

Decorator to register a target function.
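
A name-based registry of this kind is commonly built from a decorator factory that stores functions in a dictionary, paired with a lookup function. The sketch below shows that general pattern only; it is an assumption about how such a registry can work, not the module's actual code. The _REGISTRY dictionary, the error message, and the toy target_constant function (including its nested-dict return value) are hypothetical.

```python
# Hypothetical re-implementation of the registry pattern, for illustration only.
_REGISTRY = {}

def register_target(name):
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(func):
        _REGISTRY[name] = func
        return func  # hand the function back unchanged
    return decorator

def get_target_function(name):
    """Look up a registered function; unknown names raise ValueError."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"Unknown target {name!r}; registered: {sorted(_REGISTRY)}"
        ) from None

@register_target("constant")
def target_constant(X, gene_names, **kwargs):
    # Toy target (not part of genevector): every gene pair scores 1.0.
    return {a: {b: 1.0 for b in gene_names} for a in gene_names}

func = get_target_function("constant")
scores = func(None, ["g1", "g2"])
```

Returning the function unchanged from the decorator keeps it callable under its original name as well as through the registry.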

target_cosine(X, gene_names, **kwargs)

Cosine similarity between gene expression vectors (each gene is a vector across cells).
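
As a rough illustration of the measure, gene-by-gene cosine similarity on a dense cells x genes matrix can be sketched with NumPy as follows. The helper name cosine_sketch and the choice to give all-zero genes a similarity of 0 instead of NaN are assumptions, not documented behavior of target_cosine.

```python
import numpy as np

def cosine_sketch(X):
    """Hypothetical helper: cosine similarity between the columns (genes) of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)   # one L2 norm per gene
    norms[norms == 0] = 1.0             # assumption: all-zero genes score 0, not NaN
    Xn = X / norms                      # unit-length gene vectors
    return Xn.T @ Xn                    # (n_genes, n_genes) similarity matrix

# Gene 2 is gene 0 scaled by 2, so their cosine similarity is 1.
S = cosine_sketch([[1, 0, 2],
                   [0, 1, 0],
                   [1, 0, 2]])
```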

target_jaccard(X, gene_names, **kwargs)

Jaccard index on binarized expression (gene detected / not detected).
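
To make the definition concrete: for two genes, the Jaccard index is the number of cells in which both are detected divided by the number of cells in which at least one is detected. The NumPy sketch below is a hypothetical illustration; the name jaccard_sketch, the detection threshold of > 0, and the value 0 for pairs that are never detected are assumptions.

```python
import numpy as np

def jaccard_sketch(X):
    """Hypothetical helper: Jaccard index between binarized gene columns."""
    B = (np.asarray(X) > 0).astype(np.int64)   # 1 where the gene is detected
    inter = B.T @ B                            # cells where both genes are detected
    counts = B.sum(axis=0)                     # cells where each gene is detected
    union = counts[:, None] + counts[None, :] - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        # Assumption: pairs with an empty union get 0 rather than NaN.
        return np.where(union > 0, inter / union, 0.0)

# Gene 0 is detected in cells {0, 2}, gene 1 in {1}, gene 2 in {0, 1}.
J = jaccard_sketch([[1, 0, 3],
                    [0, 2, 1],
                    [5, 0, 0],
                    [0, 0, 0]])
```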

target_mi(X, gene_names, signed=False, backend='auto', device='cpu', n_bins=10, **kwargs)

Mutual information (optionally signed by Pearson correlation).

target_pearson(X, gene_names, **kwargs)

Pearson correlation between all gene pairs.
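
For a dense cells x genes matrix, the same all-pairs quantity can be obtained directly from NumPy, which may help when checking results by hand. This is a sketch with invented toy values, and it makes no claim about how target_pearson computes or packages its output.

```python
import numpy as np

# Toy cells x genes matrix (values invented for illustration).
X = np.array([[1.0, 2.0, 5.0],
              [2.0, 4.0, 3.0],
              [3.0, 6.0, 1.0]])

# rowvar=False tells NumPy that columns (genes) are the variables
# and rows (cells) are the observations.
R = np.corrcoef(X, rowvar=False)
```

Here gene 1 is gene 0 doubled (correlation 1) and gene 2 decreases linearly as gene 0 increases (correlation -1).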

target_spearman(X, gene_names, **kwargs)

Spearman rank correlation between all gene pairs.
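
Spearman correlation is Pearson correlation applied to ranks, with tied values sharing their average rank. A minimal NumPy-only sketch of that definition follows; the helper names average_ranks and spearman_sketch are hypothetical, and the code is not taken from target_spearman.

```python
import numpy as np

def average_ranks(v):
    """Hypothetical helper: 1-based ranks, ties replaced by their mean rank."""
    v = np.asarray(v)
    order = np.argsort(v, kind="stable")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    # Group equal values and average the ranks within each group.
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman_sketch(X):
    """Hypothetical helper: Spearman correlation between the columns (genes) of X."""
    X = np.asarray(X, dtype=float)
    ranked = np.column_stack([average_ranks(X[:, j]) for j in range(X.shape[1])])
    return np.corrcoef(ranked, rowvar=False)

# Gene 1 is a monotone (quadratic) function of gene 0; gene 2 decreases.
R = spearman_sketch([[1, 1, 9],
                     [2, 4, 7],
                     [3, 9, 2],
                     [4, 16, 1]])
```

Because only ranks matter, the nonlinear but monotone relationship between the first two genes still yields a correlation of 1.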