genevector.model

GeneVector neural embedding model for gene co-expression learning.

class GeneVector(dataset, output_file, emb_dimension=100, batch_size=None, gain=1, c=100.0, device='cpu', init_ortho=False)[source]

Bases: object

GeneVector framework for training a gene embedding.

Parameters:
  • dataset (GeneVector.dataset.GeneVectorDataset) – GeneVector dataset.

  • output_file (str) – Flat file in which to store the gene embedding. Input weights are written to this file; output weights are written to a second file with a “2” suffix.

  • emb_dimension (int) – Number of hidden units and dimension of the latent representation.

  • batch_size (int, optional) – Number of gene pairs per batch. Defaults to all gene pairs.

  • gain (int) – Scale factor of orthogonal weight initialization.

  • device (str) – Torch device to use (“cpu”, “cuda:0”, “mps”).

  • init_ortho (bool) – If True, use orthogonal weight initialization.

__init__(dataset, output_file, emb_dimension=100, batch_size=None, gain=1, c=100.0, device='cpu', init_ortho=False)[source]

Constructor method

load(filepath)[source]

Load model state dict from file.

Parameters:

filepath (str) – Path to saved model state dict.

plot(fname=None, log=False)[source]

Plot training loss curve.

Parameters:
  • fname (str, optional) – File path to save figure.

  • log (bool) – If True, use log scale for x-axis.

save(filepath)[source]

Save model state dict to file.

Parameters:

filepath (str) – Output file path.
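
A state dict round trip in plain PyTorch looks roughly like the sketch below. It assumes that save and load wrap the standard torch.save and load_state_dict calls, which this page does not confirm; the Embedding module and the file name model.pt are stand-ins chosen for illustration.

```python
import os
import tempfile

import torch

# Stand-in module; GeneVector's own model is not used here.
model = torch.nn.Embedding(4, 2)

# Write the parameters (not the module object) to disk.
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# A fresh module with the same shape receives the saved parameters.
restored = torch.nn.Embedding(4, 2)
restored.load_state_dict(torch.load(path))
```

Because only the parameters are stored, the receiving module has to be built with the same dimensions before loading.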

train(epochs, threshold=None, update_interval=20, alpha=0.0, beta=0.0)[source]

Trains the model for the specified number of epochs or until the loss falls below the threshold.

Parameters:
  • epochs (int) – Maximum number of epochs.

  • threshold (float, optional) – Stopping criterion: training ends once the loss falls below this value.

  • update_interval (int) – Number of epochs between printing loss to stdout.

  • alpha (float) – Coefficient of orthogonality penalty.

  • beta (float) – Coefficient of magnitude scaling.
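
The interaction between epochs, threshold and update_interval can be sketched in plain Python. This is an illustration of the stopping rule described above, not GeneVector's implementation: run_training and step_fn are hypothetical names, and the sketch assumes the loss is checked once per epoch.

```python
def run_training(step_fn, epochs, threshold=None, update_interval=20):
    """Call step_fn once per epoch and stop early when loss < threshold.

    step_fn is a hypothetical callable that returns the epoch's loss.
    """
    history = []
    for epoch in range(1, epochs + 1):
        loss = step_fn()
        history.append(loss)
        # Report progress every update_interval epochs.
        if epoch % update_interval == 0:
            print(f"epoch {epoch}: loss {loss:.4f}")
        # With threshold=None the loop always runs for all epochs.
        if threshold is not None and loss < threshold:
            break
    return history


# Toy loss that halves on every call, starting at 1.0.
state = {"loss": 2.0}


def halve():
    state["loss"] /= 2
    return state["loss"]


history = run_training(halve, epochs=100, threshold=0.01, update_interval=5)
```

Here the loop ends well before the 100-epoch limit, because the toy loss drops below 0.01 first.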

class GeneVectorModel(num_embeddings, embedding_dim, gain=1.0, init_ortho=True)[source]

Bases: Module

GeneVector PyTorch model.

Parameters:
  • num_embeddings (int) – Number of genes (vocabulary size).

  • embedding_dim (int) – Dimension of gene embedding vectors.

  • gain (float) – Scale factor for orthogonal weight initialization.

  • init_ortho (bool) – If True, use orthogonal initialization. Otherwise uniform(-1, 1).

__init__(num_embeddings, embedding_dim, gain=1.0, init_ortho=True)[source]

Initialize the embedding model.

Parameters:
  • num_embeddings (int) – Number of genes (vocabulary size).

  • embedding_dim (int) – Dimension of gene embedding vectors.

  • gain (float) – Scale factor for orthogonal weight initialization.

  • init_ortho (bool) – If True, use orthogonal initialization. Otherwise uniform(-1, 1).
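
The two initialization modes can be illustrated with standard torch.nn.init calls. This is a sketch under the assumption that the weights live in a torch.nn.Embedding table; init_weights is a hypothetical helper, not part of GeneVector.

```python
import torch


def init_weights(num_embeddings, embedding_dim, gain=1.0, init_ortho=True):
    """Hypothetical helper: build an embedding table with either init mode."""
    emb = torch.nn.Embedding(num_embeddings, embedding_dim)
    if init_ortho:
        # Orthogonal initialization, scaled by gain.
        torch.nn.init.orthogonal_(emb.weight, gain=gain)
    else:
        # Uniform initialization on the interval [-1, 1].
        torch.nn.init.uniform_(emb.weight, -1.0, 1.0)
    return emb


ortho = init_weights(4, 8, gain=1.0, init_ortho=True)
uniform = init_weights(4, 8, init_ortho=False)

# With fewer rows than columns and gain=1, the rows are orthonormal,
# so this product is close to the 4 x 4 identity matrix.
gram = ortho.weight @ ortho.weight.T
```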

forward(i_indices, j_indices)[source]

Compute dot product between gene embedding pairs.

Parameters:
  • i_indices (torch.LongTensor) – Indices for first gene in each pair.

  • j_indices (torch.LongTensor) – Indices for second gene in each pair.

Returns:

Dot product scores for each gene pair.

Return type:

torch.Tensor
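
A pairwise dot product of this kind can be written in a few lines of PyTorch. The sketch below assumes two separate embedding tables, named wi and wj after the input and output weights mentioned under save_embedding; whether forward looks up the first gene in wi and the second in wj is an assumption, and the sizes are arbitrary.

```python
import torch

torch.manual_seed(0)
num_genes, dim = 5, 3

# Assumed layout: one table for input weights, one for output weights.
wi = torch.nn.Embedding(num_genes, dim)
wj = torch.nn.Embedding(num_genes, dim)

# Three gene pairs: (0, 3), (1, 4), (2, 0).
i_indices = torch.tensor([0, 1, 2], dtype=torch.long)
j_indices = torch.tensor([3, 4, 0], dtype=torch.long)

# Elementwise product summed over the embedding dimension
# gives one dot product per pair.
scores = torch.sum(wi(i_indices) * wj(j_indices), dim=1)
```

The result holds one score per pair, so its length matches the length of the index tensors.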

save_embedding(id2word, file_name, layer)[source]

Save embedding weights to a .vec text file.

Parameters:
  • id2word (dict) – Mapping from gene index to gene symbol.

  • file_name (str) – Output file path.

  • layer (int) – 0 for input weights (wi), 1 for output weights (wj).
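
A text export of this kind might look like the sketch below. The layout is an assumption borrowed from the word2vec text format (a header line with the vector count and dimension, then one line per gene); this page does not specify the exact format. write_vec, the gene symbols, and the weights are illustrative.

```python
import os
import tempfile


def write_vec(id2word, weights, file_name):
    """Hypothetical writer: header 'count dim', then 'symbol v1 v2 ...'."""
    dim = len(weights[0])
    with open(file_name, "w") as f:
        f.write(f"{len(id2word)} {dim}\n")
        for idx, symbol in id2word.items():
            values = " ".join(str(v) for v in weights[idx])
            f.write(f"{symbol} {values}\n")


# Illustrative mapping and weights, not real embedding values.
id2word = {0: "CD3E", 1: "MS4A1"}
weights = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

path = os.path.join(tempfile.mkdtemp(), "genes.vec")
write_vec(id2word, weights, path)

with open(path) as f:
    lines = f.read().splitlines()
```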