genevector.model
GeneVector neural embedding model for gene co-expression learning.
- class GeneVector(dataset, output_file, emb_dimension=100, batch_size=None, gain=1, c=100.0, device='cpu', init_ortho=False)
Bases: object
GeneVector framework for training a gene embedding.
- Parameters:
dataset (genevector.dataset.GeneVectorDataset) – GeneVector dataset.
output_file (str) – Flat file to store the gene embedding. Input weights are written to this file; output weights are written to a second file with a “2” suffix.
emb_dimension (int) – Number of hidden units and dimension of the latent representation.
batch_size (int or None) – Number of gene pairs per batch; defaults to all gene pairs.
gain (int) – Scale factor of orthogonal weight initialization.
device (str) – Torch device to use, e.g. “cpu”, “cuda:0”, or “mps”.
- __init__(dataset, output_file, emb_dimension=100, batch_size=None, gain=1, c=100.0, device='cpu', init_ortho=False)
Constructor method.
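The batch_size behaviour described in the parameter list can be illustrated without PyTorch. The following is a minimal sketch, not GeneVector's own code: the helper batch_gene_pairs is hypothetical, and the use of unordered index pairs is an assumption made only for the example.

```python
from itertools import combinations


def batch_gene_pairs(pairs, batch_size=None):
    """Yield lists of gene pairs; a single batch with every pair when batch_size is None."""
    pairs = list(pairs)
    if batch_size is None:
        # Assumed meaning of "defaults to all gene pairs": one full-size batch.
        batch_size = max(len(pairs), 1)
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


genes = ["CD3E", "CD8A", "MS4A1", "LYZ"]            # example gene symbols
pairs = list(combinations(range(len(genes)), 2))    # 6 unordered index pairs

all_at_once = list(batch_gene_pairs(pairs))               # one batch of 6 pairs
in_fours = list(batch_gene_pairs(pairs, batch_size=4))    # batches of 4 and 2 pairs
```

Leaving batch_size unset avoids partial batches but keeps every pair in memory at once, which may matter for large gene vocabularies.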
- load(filepath)
Load model state dict from file.
- Parameters:
filepath (str) – Path to saved model state dict.
- plot(fname=None, log=False)
Plot training loss curve.
- Parameters:
fname (str, optional) – File path to save figure.
log (bool) – If True, use log scale for x-axis.
- save(filepath)
Save model state dict to file.
- Parameters:
filepath (str) – Output file path.
- train(epochs, threshold=None, update_interval=20, alpha=0.0, beta=0.0)
Trains the model for the specified number of epochs or until the loss falls below the threshold.
- Parameters:
epochs (int) – Maximum number of epochs.
threshold (float) – Stopping criterion; training stops once the loss falls below this value.
update_interval (int) – Number of epochs between printing loss to stdout.
alpha (float) – Coefficient of orthogonality penalty.
beta (float) – Coefficient of magnitude scaling.
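The interaction between epochs, threshold, and update_interval can be sketched as a plain loop. This is an illustration under stated assumptions rather than the library's implementation: run_training and fake_step are hypothetical names, the loss is assumed to be checked once per epoch, and the alpha and beta penalty terms are left out.

```python
def run_training(step, epochs, threshold=None, update_interval=20):
    """Call step() once per epoch; stop early once the loss drops below threshold."""
    history = []
    for epoch in range(1, epochs + 1):
        loss = step()                       # one pass of optimisation, returns the loss
        history.append(loss)
        if epoch % update_interval == 0:
            print(f"epoch {epoch}: loss {loss:.4f}")
        if threshold is not None and loss < threshold:
            break                           # stopping criterion met before the epoch limit
    return history


# Stand-in for a real optimisation pass: the loss simply halves each epoch.
state = {"loss": 1.0}


def fake_step():
    state["loss"] *= 0.5
    return state["loss"]


history = run_training(fake_step, epochs=100, threshold=0.01, update_interval=5)
```

With threshold left as None, the loop always runs for the full number of epochs.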
- class GeneVectorModel(num_embeddings, embedding_dim, gain=1.0, init_ortho=True)
Bases: Module
GeneVector PyTorch model.
- Parameters:
num_embeddings (int) – Number of genes (vocabulary size).
embedding_dim (int) – Dimension of gene embedding vectors.
gain (float) – Scale factor for orthogonal weight initialization.
init_ortho (bool) – If True, use orthogonal initialization; otherwise uniform(-1, 1).
- __init__(num_embeddings, embedding_dim, gain=1.0, init_ortho=True)
Initialize the embedding model.
- Parameters:
num_embeddings (int) – Number of genes (vocabulary size).
embedding_dim (int) – Dimension of gene embedding vectors.
gain (float) – Scale factor for orthogonal weight initialization.
init_ortho (bool) – If True, use orthogonal initialization; otherwise initialize from uniform(-1, 1).
- forward(i_indices, j_indices)
Compute dot product between gene embedding pairs.
- Parameters:
i_indices (torch.LongTensor) – Indices for first gene in each pair.
j_indices (torch.LongTensor) – Indices for second gene in each pair.
- Returns:
Dot product scores for each gene pair.
- Return type:
torch.Tensor
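The per-pair dot product can be shown with plain Python lists standing in for tensors. This is a sketch only: pair_scores is a hypothetical helper, and the use of separate input and output weight tables is an assumption based on the "input weights and output weights" mentioned for GeneVector's output_file.

```python
def pair_scores(input_weights, output_weights, i_indices, j_indices):
    """Dot product of input_weights[i] and output_weights[j] for each (i, j) pair."""
    if len(i_indices) != len(j_indices):
        raise ValueError("i_indices and j_indices must have the same length")
    return [
        sum(a * b for a, b in zip(input_weights[i], output_weights[j]))
        for i, j in zip(i_indices, j_indices)
    ]


# Three genes with 2-dimensional embeddings (illustrative values only).
input_weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
output_weights = [[2.0, 1.0], [0.0, 1.0], [1.0, 1.0]]

# Score the gene pairs (0, 1), (1, 2) and (2, 0).
scores = pair_scores(input_weights, output_weights, [0, 1, 2], [1, 2, 0])
print(scores)  # [0.0, 1.0, 2.0]
```

The result has one score per pair, so its length matches that of the index lists passed in.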