Advanced ESM Platform
A scientific computing skill for protein analysis using ESM (Evolutionary Scale Modeling) — Meta AI's family of protein language models that generate embeddings, predict structure, and annotate function from amino acid sequences alone without requiring multiple sequence alignments.
When to Use This Skill
Choose Advanced ESM Platform when:
- Generating protein embeddings for downstream ML tasks
- Predicting protein structure from sequence (ESMFold)
- Computing variant effect predictions for protein engineering
- Performing zero-shot protein function annotation
Consider alternatives when:
- You need the highest accuracy structure prediction (use AlphaFold2)
- You need protein-protein complex prediction (use AlphaFold-Multimer)
- You need protein design/engineering (use ProteinMPNN or RFdiffusion)
- You need sequence alignment or homology search (use BLAST/HHblits)
Quick Start
```bash
claude "Generate ESM embeddings for a protein sequence and predict structure"
```

```python
import torch
import esm

# Load ESM-2 model
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Prepare sequence
data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGD")]
batch_labels, batch_strs, batch_tokens = batch_converter(data)

# Generate embeddings
with torch.no_grad():
    results = model(batch_tokens, repr_layers=[33], return_contacts=True)

# Per-residue embeddings (for downstream tasks)
embeddings = results["representations"][33]
print(f"Embedding shape: {embeddings.shape}")  # (batch, seq_len, 1280)

# Contact prediction
contacts = results["contacts"]
print(f"Contact map shape: {contacts.shape}")
```
Core Concepts
ESM Model Family
| Model | Parameters | Use Case |
|---|---|---|
| ESM-2 (8M) | 8M | Fast embedding, limited tasks |
| ESM-2 (150M) | 150M | Good balance of speed and quality |
| ESM-2 (650M) | 650M | High-quality embeddings |
| ESM-2 (3B) | 3B | Best embeddings (GPU required) |
| ESMFold | — | Structure prediction from sequence |
| ESM-1v | 650M | Variant effect prediction |
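The table above can be tied to concrete loaders. As a sketch, the ESM-2 variants map to fair-esm pretrained loader names and final-layer indices as follows; the `ESM2_VARIANTS` mapping and `pick_esm2` helper are conveniences added here, not part of the esm package, so verify the names against your installed `esm.pretrained` module:

```python
# fair-esm names encode layer count and parameter count: t<layers>_<params>.
# The final layer index (for repr_layers) differs per model size.
ESM2_VARIANTS = {
    "8M":   ("esm2_t6_8M_UR50D",    6),
    "150M": ("esm2_t30_150M_UR50D", 30),
    "650M": ("esm2_t33_650M_UR50D", 33),
    "3B":   ("esm2_t36_3B_UR50D",   36),
}

def pick_esm2(size):
    """Return (pretrained loader name, final repr layer) for a size key."""
    name, last_layer = ESM2_VARIANTS[size]
    return name, last_layer
```

The returned name can then be looked up on `esm.pretrained` (e.g. `getattr(esm.pretrained, name)()`), and the layer index passed as `repr_layers=[last_layer]`.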
ESMFold Structure Prediction
```python
import torch
import esm

# Load ESMFold
model = esm.pretrained.esmfold_v1()
model = model.eval().cuda()

# Predict structure
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGD"
with torch.no_grad():
    output = model.infer_pdb(sequence)

# Save PDB file
with open("prediction.pdb", "w") as f:
    f.write(output)

# Check confidence (pLDDT)
with torch.no_grad():
    output = model.infer(sequence)
plddt = output["plddt"].mean().item()
print(f"Mean pLDDT: {plddt:.1f}")
```
Variant Effect Prediction
```python
import torch
import esm

# Load ESM-1v for variant scoring
model, alphabet = esm.pretrained.esm1v_t33_650M_UR90S_1()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Score mutations using masked marginal probability
def score_variant(sequence, position, wt_aa, mut_aa):
    """Score a single amino acid substitution (0-indexed position)."""
    data = [("protein", sequence)]
    _, _, tokens = batch_converter(data)

    # Mask the position of interest
    tokens[0, position + 1] = alphabet.mask_idx  # +1 for BOS token

    with torch.no_grad():
        logits = model(tokens)["logits"]

    # Log-likelihood ratio
    wt_score = logits[0, position + 1, alphabet.get_idx(wt_aa)].item()
    mut_score = logits[0, position + 1, alphabet.get_idx(mut_aa)].item()
    return mut_score - wt_score  # Positive = favorable mutation

score = score_variant("MKTAYIAKQRQ...", 5, "I", "V")
print(f"I5V score: {score:.3f}")
```
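`score_variant` scores one substitution; a full deep mutational scan scores all 19·L single mutants of an L-residue protein. A minimal enumeration helper (`enumerate_variants` is a hypothetical convenience, not part of esm) that yields positions in the same 0-indexed convention:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def enumerate_variants(sequence):
    """Yield (position, wt_aa, mut_aa) for every single substitution.

    Positions are 0-indexed, matching score_variant's convention.
    """
    for pos, wt in enumerate(sequence):
        for mut in AMINO_ACIDS:
            if mut != wt:
                yield pos, wt, mut

# A full scan of an L-residue protein has 19 * L variants:
variants = list(enumerate_variants("MKT"))
print(len(variants))  # 3 positions x 19 substitutions = 57
```

Each yielded tuple can be passed straight to `score_variant(sequence, pos, wt, mut)`; for large proteins, batch the masked sequences rather than calling the model once per variant.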
Configuration
| Parameter | Description | Default |
|---|---|---|
| `model_name` | ESM model variant | `esm2_t33_650M_UR50D` |
| `repr_layers` | Embedding layers to extract | `[33]` (last layer) |
| `return_contacts` | Compute contact predictions | `False` |
| `device` | CPU or CUDA GPU | Auto-detect |
| `batch_size` | Sequences per batch | 1 |
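These parameters can be collected into a plain dictionary. The `CONFIG` dict below is an illustrative fragment mirroring the table, not an esm API; "auto-detect" for the device is expressed with `torch.cuda.is_available()`:

```python
import torch

# Hypothetical defaults mirroring the configuration table above.
CONFIG = {
    "model_name": "esm2_t33_650M_UR50D",
    "repr_layers": [33],  # last layer of the 650M model
    "return_contacts": False,
    "device": "cuda" if torch.cuda.is_available() else "cpu",
    "batch_size": 1,
}
```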
Best Practices
- Choose model size based on your GPU memory. ESM-2 650M requires ~4GB of GPU memory and 3B requires ~12GB. If you lack a GPU, use the 150M model on CPU; it's slower but still produces useful embeddings.
- Use the last layer for general-purpose embeddings. Layer 33 (final) embeddings capture the most abstract protein features, while earlier layers capture more local sequence patterns. For contact prediction, use the model's built-in contact head.
- Average per-residue embeddings for protein-level features. For classification tasks (function prediction, localization), mean-pool the per-residue embeddings to get a fixed-size protein representation. This works better than using only the CLS token.
- Use ESMFold for speed, AlphaFold2 for accuracy. ESMFold predicts structure in seconds without MSA computation, making it ideal for screening and rapid prototyping. For publication-quality structures, validate top candidates with AlphaFold2.
- Batch sequences of similar length. Padding short sequences to match long ones in a batch wastes computation. Group sequences by length and process each group separately for optimal throughput.
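The length-grouping advice in the last bullet can be sketched as a sort-then-sweep bucketing pass. `bucket_by_length` and its `tolerance` knob are illustrative helpers, not esm parameters:

```python
def bucket_by_length(named_seqs, tolerance=32):
    """Group (name, sequence) pairs so lengths within a bucket differ by
    at most `tolerance` residues, limiting padding waste.
    """
    ordered = sorted(named_seqs, key=lambda item: len(item[1]))
    buckets, current = [], []
    for item in ordered:
        # Start a new bucket once the spread exceeds the tolerance
        if current and len(item[1]) - len(current[0][1]) > tolerance:
            buckets.append(current)
            current = []
        current.append(item)
    if current:
        buckets.append(current)
    return buckets
```

Each bucket can then be fed to `batch_converter` as one batch, so padding only stretches sequences by at most `tolerance` residues.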
Common Issues
CUDA out of memory on long sequences. ESM models' memory usage scales quadratically with sequence length. For proteins >1000 residues, use the 150M model instead of 650M, or split into domains. ESMFold has a practical limit around 400 residues on consumer GPUs.
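Splitting a long protein can be approximated with overlapping windows when domain boundaries are unknown. `split_into_windows` is a hypothetical helper for embedding extraction only; windows break long-range contacts, so stitch embeddings, not contact maps, downstream:

```python
def split_into_windows(sequence, max_len=1000, overlap=100):
    """Split a long sequence into overlapping windows that fit in memory.

    Returns a list of (start_offset, subsequence) pairs covering the
    full sequence; adjacent windows share `overlap` residues.
    """
    if len(sequence) <= max_len:
        return [(0, sequence)]
    step = max_len - overlap
    windows = []
    for start in range(0, len(sequence) - overlap, step):
        windows.append((start, sequence[start:start + max_len]))
    return windows
```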
Embedding quality seems poor for short peptides. ESM was trained on full-length protein sequences. Very short peptides (<20 residues) may not produce meaningful embeddings. For short peptides, consider using specialized peptide models.
ESMFold pLDDT is low for the entire protein. Low overall pLDDT may indicate an intrinsically disordered protein or a multi-domain protein where ESMFold struggles with inter-domain positioning. Check per-residue pLDDT to identify which regions are confident and which are uncertain.
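Checking per-residue pLDDT can be reduced to scanning for sustained high-confidence runs. A sketch, assuming you have already averaged ESMFold's pLDDT output down to one score per residue; `confident_regions` and the 70.0 cutoff are illustrative, not an esm API:

```python
def confident_regions(plddt_per_residue, threshold=70.0, min_len=10):
    """Return (start, end) half-open spans where per-residue pLDDT stays
    at or above `threshold` for at least `min_len` residues.
    """
    regions, start = [], None
    for i, score in enumerate(plddt_per_residue):
        if score >= threshold and start is None:
            start = i  # entering a confident run
        elif score < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(plddt_per_residue) - start >= min_len:
        regions.append((start, len(plddt_per_residue)))
    return regions
```

Long confident spans separated by low-confidence stretches are the typical signature of well-folded domains with uncertain linkers or inter-domain placement.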