genevector.model

GeneVector neural embedding model for gene co-expression learning.

class GeneVector(dataset, output_file, emb_dimension=100, batch_size=None, gain=1, c=100.0, device='cpu', init_ortho=False)

Bases: object

GeneVector framework for training a gene embedding.

Parameters:

dataset (GeneVector.dataset.GeneVectorDataset) – GeneVector dataset.
output_file (str) – Flat file in which to store the gene embedding. Input and output weights are written to separate files; the output-weight file uses a “2” suffix.
emb_dimension (int) – Number of hidden units, i.e. the dimension of the latent representation.
batch_size (int or None) – Number of gene pairs per batch. Defaults to None, which uses all gene pairs in a single batch.
gain (int) – Scale factor of orthogonal weight initialization.
device (str) – Torch device to use (“cpu”, “cuda:0”, “mps”).
init_ortho (bool) – If True, use orthogonal weight initialization.
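The batch_size behaviour can be illustrated with a small, self-contained sketch. The helper iter_pair_batches is hypothetical and not part of genevector; it assumes gene pairs are consumed in order, and whether GeneVector shuffles pairs between epochs is not stated here.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def iter_pair_batches(
    pairs: Sequence[Pair], batch_size: Optional[int] = None
) -> Iterator[Sequence[Pair]]:
    """Yield consecutive batches of gene pairs (hypothetical helper).

    batch_size=None mirrors the documented default: all pairs in one batch.
    """
    if batch_size is None or batch_size >= len(pairs):
        yield pairs
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


# Five (gene_i, gene_j) index pairs split into batches of two.
pairs: List[Pair] = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
batches = list(iter_pair_batches(pairs, batch_size=2))
```

With batch_size=2 the five pairs form batches of sizes 2, 2 and 1; with the default None they form a single batch.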

__init__(dataset, output_file, emb_dimension=100, batch_size=None, gain=1, c=100.0, device='cpu', init_ortho=False)

Constructor method.

load(filepath)

Load model state dict from file.

Parameters:

filepath (str) – Path to saved model state dict.

plot(fname=None, log=False)

Plot training loss curve.

Parameters:

fname (str, optional) – File path to save figure.
log (bool) – If True, use a log scale for the x-axis.

save(filepath)

Save model state dict to file.

Parameters:

filepath (str) – Output file path.

train(epochs, threshold=None, update_interval=20, alpha=0.0, beta=0.0)

Train the model for up to the given number of epochs, stopping early if the loss falls below threshold.

Parameters:

epochs (int) – Maximum number of epochs.
threshold (float, optional) – Stopping criterion: training stops once the loss falls below this value.
update_interval (int) – Number of epochs between printing loss to stdout.
alpha (float) – Coefficient of orthogonality penalty.
beta (float) – Coefficient of magnitude scaling.
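The stopping rule described for train() can be sketched as a plain loop. This is a minimal sketch, not the library's implementation: step is a hypothetical callable standing in for one optimisation epoch that returns the loss, and the strict "below threshold" comparison and the print format are assumptions. The alpha and beta penalty terms are not modelled.

```python
from typing import Callable, List, Optional


def train_loop(
    step: Callable[[int], float],
    epochs: int,
    threshold: Optional[float] = None,
    update_interval: int = 20,
) -> List[float]:
    """Run up to `epochs` epochs, stopping early once loss < threshold.

    `step` is a hypothetical stand-in for one optimisation epoch; it
    receives the epoch index and returns that epoch's loss.
    """
    losses: List[float] = []
    for epoch in range(epochs):
        loss = step(epoch)
        losses.append(loss)
        # Report progress every `update_interval` epochs.
        if update_interval and epoch % update_interval == 0:
            print(f"epoch {epoch}: loss {loss:.6f}")
        # Early stopping: assumed to be a strict "below threshold" test.
        if threshold is not None and loss < threshold:
            break
    return losses


# Toy loss that halves every epoch; it first drops below 1e-3 at epoch 10.
history = train_loop(lambda e: 1.0 / 2 ** e, epochs=100, threshold=1e-3)
```

Here the loop stops after 11 epochs instead of running all 100, which is the behaviour the threshold parameter is meant to provide.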

class GeneVectorModel(num_embeddings, embedding_dim, gain=1.0, init_ortho=True)

Bases: Module

GeneVector PyTorch model.

Parameters:

num_embeddings (int) – Number of genes (vocabulary size).
embedding_dim (int) – Dimension of gene embedding vectors.
gain (float) – Scale factor for orthogonal weight initialization.
init_ortho (bool) – If True, use orthogonal initialization; otherwise, weights are drawn from uniform(-1, 1).

__init__(num_embeddings, embedding_dim, gain=1.0, init_ortho=True)

Initialize the embedding model.

Parameters:

num_embeddings (int) – Number of genes (vocabulary size).
embedding_dim (int) – Dimension of gene embedding vectors.
gain (float) – Scale factor for orthogonal weight initialization.
init_ortho (bool) – If True, use orthogonal initialization; otherwise, weights are drawn from uniform(-1, 1).
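The two initialization schemes can be compared with a NumPy sketch. It follows the usual QR-based construction of a scaled orthogonal matrix (the same idea as torch.nn.init.orthogonal_); whether GeneVectorModel uses that exact PyTorch initializer internally is an assumption, and init_weights and its seed argument are hypothetical.

```python
import numpy as np


def init_weights(num_embeddings, embedding_dim, gain=1.0, init_ortho=True, seed=0):
    """Sketch of the two documented initialization schemes (hypothetical helper).

    Orthogonal: QR-decompose a Gaussian matrix and scale by `gain`.
    Otherwise: draw each weight independently from uniform(-1, 1).
    """
    rng = np.random.default_rng(seed)
    if not init_ortho:
        return rng.uniform(-1.0, 1.0, size=(num_embeddings, embedding_dim))
    rows, cols = num_embeddings, embedding_dim
    # QR of the tall orientation yields a matrix with orthonormal columns.
    flat = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(flat)
    # Fix column signs so the result is not biased by the QR convention.
    q *= np.sign(np.diag(r))
    if rows < cols:
        q = q.T  # wide case: orthonormal rows instead of columns
    return gain * q


# 50 genes embedded in 10 dimensions: columns are orthogonal with norm `gain`.
W = init_weights(50, 10, gain=2.0)
```

For the usual case of more genes than dimensions, W.T @ W equals gain² times the identity, so gain directly sets the scale of every embedding direction.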

forward(i_indices, j_indices)

Compute dot product between gene embedding pairs.

Parameters:

i_indices (torch.LongTensor) – Indices for first gene in each pair.
j_indices (torch.LongTensor) – Indices for second gene in each pair.

Returns:

Dot product scores for each gene pair.

Return type:

torch.Tensor
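The pairwise score can be sketched in NumPy. This assumes, as the wi/wj naming in save_embedding suggests, that the first gene of each pair is looked up in the input weights and the second in the output weights; the model's actual lookup is not documented here, and pair_scores is a hypothetical helper. In PyTorch the same row-wise product is typically written as (a * b).sum(dim=1).

```python
import numpy as np


def pair_scores(wi, wj, i_indices, j_indices):
    """Row-wise dot product for each (i, j) gene pair.

    Assumes gene i is looked up in the input weights `wi` and gene j
    in the output weights `wj` (an assumption, see lead-in).
    """
    a = wi[np.asarray(i_indices)]  # (n_pairs, dim)
    b = wj[np.asarray(j_indices)]  # (n_pairs, dim)
    return np.einsum("nd,nd->n", a, b)


# Three genes with 2-dimensional input and output embeddings.
wi = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
wj = np.array([[3.0, 1.0], [0.5, 0.5], [2.0, -1.0]])
scores = pair_scores(wi, wj, [0, 1, 2], [0, 2, 1])  # one score per pair
```

The result has one entry per pair, here 3.0, -2.0 and 1.0, matching the documented "dot product scores for each gene pair".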

save_embedding(id2word, file_name, layer)

Save embedding weights to a .vec text file.

Parameters:

id2word (dict) – Mapping from gene index to gene symbol.
file_name (str) – Output file path.
layer (int) – 0 for input weights (wi), 1 for output weights (wj).
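The .vec layout is not specified above. A common convention for .vec files is the word2vec text format: a header line with the number of vectors and their dimension, followed by one line per symbol with its values separated by spaces. The sketch below assumes that convention; write_vec and read_vec are hypothetical helpers, not genevector functions, and the gene symbols are only illustrative.

```python
import os
import tempfile
from typing import Dict, List, Sequence


def write_vec(id2word: Dict[int, str], weights: Sequence[Sequence[float]], file_name: str) -> None:
    """Write embeddings in word2vec-style text format (assumed .vec layout)."""
    dim = len(weights[0]) if len(weights) else 0
    with open(file_name, "w") as fh:
        # Header: number of vectors and their dimension.
        fh.write(f"{len(id2word)} {dim}\n")
        for idx in sorted(id2word):
            values = " ".join(repr(float(x)) for x in weights[idx])
            fh.write(f"{id2word[idx]} {values}\n")


def read_vec(file_name: str) -> Dict[str, List[float]]:
    """Parse the same format back into a symbol -> vector mapping."""
    with open(file_name) as fh:
        n, dim = map(int, fh.readline().split())
        vectors: Dict[str, List[float]] = {}
        for line in fh:
            symbol, *values = line.split()
            vectors[symbol] = [float(v) for v in values]
    if len(vectors) != n or any(len(v) != dim for v in vectors.values()):
        raise ValueError("header does not match file contents")
    return vectors


# Round trip: index -> symbol mapping plus one weight row per gene index.
id2word = {0: "CD3E", 1: "MS4A1"}
weights = [[0.1, -0.2, 0.3], [1.0, 0.0, -1.5]]
path = os.path.join(tempfile.mkdtemp(), "genes.vec")
write_vec(id2word, weights, path)
vecs = read_vec(path)
```

Writing floats with repr keeps the round trip exact, so reading the file back returns the original vectors keyed by gene symbol.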