Comprehensive MLOps with Weights & Biases
Overview
Weights & Biases (W&B) is the leading MLOps platform for experiment tracking, hyperparameter optimization, model versioning, and team collaboration across the entire machine learning lifecycle. With over 200,000 ML practitioners and 100+ framework integrations, W&B provides a unified dashboard for logging metrics, comparing runs, managing artifacts, and orchestrating hyperparameter sweeps. This template covers everything from basic experiment tracking through advanced production MLOps workflows including model registry, dataset versioning, and automated evaluation pipelines.
When to Use
- Experiment tracking: You need to log, visualize, and compare training metrics across dozens or hundreds of runs
- Hyperparameter optimization: You want automated sweep strategies (Bayesian, grid, random) with early stopping
- Model registry: You need versioned model artifacts with lineage tracking from data to deployment
- Team collaboration: Multiple researchers need shared dashboards, reports, and reproducible experiments
- Production monitoring: You need to track model performance, data drift, and pipeline health over time
- Artifact management: You need to version datasets, models, and evaluation results with full provenance
Choose alternatives when you need fully open-source self-hosted solutions (consider MLflow), or when you only need basic TensorBoard-style visualization.
Quick Start
```bash
# Install W&B
pip install wandb

# Authenticate (opens browser for API key)
wandb login

# Or set API key via environment variable
export WANDB_API_KEY=your_api_key_here
```
```python
import wandb

# Initialize a tracked run
run = wandb.init(
    project="my-first-project",
    config={
        "learning_rate": 0.001,
        "epochs": 10,
        "batch_size": 32,
        "architecture": "ResNet50"
    }
)

# Log metrics during training
for epoch in range(10):
    train_loss = train_one_epoch(model, train_loader)
    val_loss = evaluate(model, val_loader)
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "val/loss": val_loss
    })

# Finish the run
wandb.finish()
```
Core Concepts
Projects and Runs
Every experiment execution is a Run that belongs to a Project. Runs capture configuration, metrics, system information, and artifacts automatically.
```python
run = wandb.init(
    project="image-classification",
    name="resnet50-baseline-v2",        # Human-readable run name
    tags=["baseline", "resnet", "v2"],  # Filterable tags
    group="architecture-comparison",    # Group related runs
    job_type="train",                   # Categorize run type
    notes="Testing ResNet50 with augmented data pipeline"
)

# Access run metadata
print(f"Run ID: {run.id}")
print(f"Run URL: {run.url}")
print(f"Run name: {run.name}")
```
Configuration Tracking
W&B captures hyperparameters and makes them searchable and comparable across all runs.
```python
config = {
    "model": {
        "architecture": "ResNet50",
        "pretrained": True,
        "dropout": 0.3
    },
    "training": {
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 50,
        "optimizer": "AdamW",
        "weight_decay": 0.01
    },
    "data": {
        "dataset": "ImageNet",
        "augmentation": "randaugment",
        "image_size": 224
    }
}

wandb.init(project="my-project", config=config)

# Access nested config
lr = wandb.config["training"]["learning_rate"]

# Update config mid-run
wandb.config.update({"training.warmup_steps": 500})
```
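The W&B UI displays nested configs as dot-separated keys (e.g. `config.training.learning_rate`) for filtering and comparison. The sketch below mimics that flattening locally; `flatten_config` is a hypothetical helper, not part of the wandb API:

```python
def flatten_config(config, parent_key=""):
    """Flatten a nested config dict into dot-separated keys,
    mirroring how W&B displays nested configs in the UI."""
    items = {}
    for key, value in config.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the prefix along
            items.update(flatten_config(value, full_key))
        else:
            items[full_key] = value
    return items

config = {"training": {"learning_rate": 0.001, "batch_size": 32}}
print(flatten_config(config))
# {'training.learning_rate': 0.001, 'training.batch_size': 32}
```

This is also the key format `wandb.config.update` accepts for targeted updates, as in the `"training.warmup_steps"` example above.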
Metric Logging
```python
# Log scalar metrics
wandb.log({"loss": 0.5, "accuracy": 0.92})

# Log with explicit step
wandb.log({"loss": 0.3}, step=500)

# Log images
wandb.log({"predictions": [
    wandb.Image(img, caption=f"Pred: {pred}")
    for img, pred in zip(images, predictions)
]})

# Log histograms
wandb.log({"weight_distribution": wandb.Histogram(model.fc.weight.data.cpu())})

# Log confusion matrix
wandb.log({"conf_mat": wandb.plot.confusion_matrix(
    probs=None,
    y_true=ground_truth,
    preds=predictions,
    class_names=["cat", "dog", "bird"]
)})

# Log tables for detailed inspection
table = wandb.Table(
    columns=["image", "prediction", "ground_truth", "confidence"],
    data=[
        [wandb.Image(img), pred, gt, conf]
        for img, pred, gt, conf in zip(images, preds, gts, confs)
    ]
)
wandb.log({"prediction_table": table})
```
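To keep the `train/`, `val/`, `test/` prefixes consistent across a codebase, a small helper can namespace a metrics dict before it reaches `wandb.log`. This is a hypothetical convenience function, not part of the wandb API:

```python
def namespaced(prefix, metrics):
    """Prefix every metric name with 'prefix/' so charts group
    automatically under that section in the W&B dashboard."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}

print(namespaced("train", {"loss": 0.41, "accuracy": 0.88}))
# {'train/loss': 0.41, 'train/accuracy': 0.88}

# Usage (assumes an active run):
# wandb.log(namespaced("val", {"loss": val_loss, "accuracy": val_acc}))
```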
Hyperparameter Sweeps
```python
# Define sweep configuration
sweep_config = {
    "method": "bayes",  # Bayesian optimization
    "metric": {
        "name": "val/accuracy",
        "goal": "maximize"
    },
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-2
        },
        "batch_size": {"values": [16, 32, 64, 128]},
        "optimizer": {"values": ["adam", "sgd", "adamw"]},
        "dropout": {
            "distribution": "uniform",
            "min": 0.1,
            "max": 0.5
        }
    },
    "early_terminate": {
        "type": "hyperband",
        "min_iter": 5,
        "eta": 3
    }
}

# Create sweep
sweep_id = wandb.sweep(sweep_config, project="my-project")

# Define training function
def train():
    run = wandb.init()
    config = wandb.config
    model = build_model(config)
    optimizer = get_optimizer(config.optimizer, config.learning_rate)
    for epoch in range(50):
        train_loss = train_epoch(model, optimizer, config.batch_size)
        val_acc = validate(model)
        wandb.log({"train/loss": train_loss, "val/accuracy": val_acc})

# Launch sweep agent (runs 100 trials)
wandb.agent(sweep_id, function=train, count=100)
```
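A malformed sweep config only fails once `wandb.sweep` is called, so it can help to sanity-check the dict locally first. The sketch below is a hypothetical validator covering just the fields used above, not an exhaustive check of the sweep schema:

```python
def validate_sweep_config(cfg):
    """Return a list of problems found in a W&B-style sweep config
    dict; an empty list means the basic fields look well-formed."""
    errors = []
    if cfg.get("method") not in {"bayes", "grid", "random"}:
        errors.append("method must be 'bayes', 'grid', or 'random'")
    if cfg.get("metric", {}).get("goal") not in {"minimize", "maximize"}:
        errors.append("metric.goal must be 'minimize' or 'maximize'")
    if not cfg.get("parameters"):
        errors.append("parameters must be a non-empty dict")
    return errors

good = {
    "method": "bayes",
    "metric": {"name": "val/accuracy", "goal": "maximize"},
    "parameters": {"learning_rate": {"values": [1e-3, 1e-4]}},
}
assert validate_sweep_config(good) == []
print(validate_sweep_config({"method": "genetic"}))
# reports three problems: bad method, missing goal, missing parameters
```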
Artifacts and Model Registry
```python
# Log a dataset artifact
dataset_artifact = wandb.Artifact(
    name="training-data-v3",
    type="dataset",
    description="Cleaned and augmented training split",
    metadata={"size": "50K images", "split": "train", "version": "3.0"}
)
dataset_artifact.add_dir("data/train/")
wandb.log_artifact(dataset_artifact)

# Log a model artifact
model_artifact = wandb.Artifact(
    name="resnet50-classifier",
    type="model",
    metadata={"accuracy": 0.95, "architecture": "ResNet50"}
)
model_artifact.add_file("checkpoints/best_model.pth")
wandb.log_artifact(model_artifact, aliases=["best", "production"])

# Use artifacts in downstream runs
run = wandb.init(project="evaluation")
artifact = run.use_artifact("training-data-v3:latest")
data_dir = artifact.download()

# Link model to registry
run.link_artifact(model_artifact, "model-registry/production-classifier")
```
Framework Integrations
```python
# --- HuggingFace Transformers ---
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    report_to="wandb",
    run_name="bert-finetune-v1",
    logging_steps=50,
    save_steps=500
)
trainer = Trainer(model=model, args=training_args, train_dataset=ds)
trainer.train()

# --- PyTorch Lightning ---
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="lightning-project", log_model="all")
trainer = pl.Trainer(logger=wandb_logger, max_epochs=10)
trainer.fit(model, datamodule=dm)

# --- Keras / TensorFlow ---
from wandb.integration.keras import WandbMetricsLogger

model.fit(x_train, y_train, callbacks=[WandbMetricsLogger()])
```
Configuration Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| `project` | str | None | Project name for grouping runs |
| `name` | str | Auto | Human-readable run name |
| `config` | dict | None | Hyperparameters and settings |
| `tags` | list | `[]` | Filterable labels for runs |
| `group` | str | None | Group name for related runs |
| `job_type` | str | None | Run category (train, eval, etc.) |
| `mode` | str | `"online"` | `"online"`, `"offline"`, or `"disabled"` |
| `resume` | str | None | `"allow"`, `"must"`, `"never"`, or a run ID |
| `save_code` | bool | False | Save main script and git patch |
| `notes` | str | None | Markdown notes for the run |
| Sweep Parameter | Values | Description |
|---|---|---|
| `method` | `"bayes"`, `"grid"`, `"random"` | Search strategy |
| `metric.goal` | `"minimize"`, `"maximize"` | Optimization direction |
| `early_terminate.type` | `"hyperband"` | Early stopping strategy |
| `early_terminate.eta` | int | Reduction factor for Hyperband |
| `parameters.*.distribution` | `"uniform"`, `"log_uniform_values"`, `"normal"`, `"categorical"` | Parameter sampling |
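For `method: "grid"`, the total number of trials is the product of the value counts of the swept parameters, which is worth estimating before launching. A quick sketch (the helper is hypothetical and assumes every parameter lists explicit `values`; continuous distributions are not enumerable and would need discretizing first):

```python
from math import prod

def grid_size(parameters):
    """Number of trials a grid sweep will run over parameters
    that each enumerate explicit 'values'."""
    return prod(len(spec["values"]) for spec in parameters.values())

params = {
    "batch_size": {"values": [16, 32, 64, 128]},
    "optimizer": {"values": ["adam", "sgd", "adamw"]},
}
print(grid_size(params))  # 4 * 3 = 12 trials
```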
Best Practices
- Use hierarchical metric names: Prefix metrics with `train/`, `val/`, `test/` to organize dashboards cleanly and enable automatic grouping in the W&B UI.
- Log system metrics alongside training metrics: Track GPU utilization, memory usage, and throughput to identify bottlenecks early and correlate hardware performance with model quality.
- Version everything as artifacts: Datasets, model checkpoints, evaluation results, and configuration files should all be versioned artifacts with metadata for complete reproducibility.
- Use sweep early termination: Configure Hyperband early stopping in sweeps to kill underperforming trials quickly and focus compute budget on promising hyperparameter regions.
- Create shareable reports: Use W&B Reports to combine charts, tables, and markdown narratives into documents that serve as experiment summaries for team reviews.
- Tag runs systematically: Establish a tagging convention (e.g., experiment phase, model family, dataset version) so runs remain filterable as your project grows to hundreds of experiments.
- Set `save_code=True`: Enable code saving to capture the exact script and git diff for every run, ensuring you can always reproduce any experiment.
- Use `wandb.watch()` for gradient tracking: Call `wandb.watch(model)` to log gradient and parameter histograms, helping diagnose vanishing or exploding gradients during training.
- Configure offline mode for clusters: Use `WANDB_MODE=offline` for training on nodes without internet, then sync with `wandb sync` when connectivity is available.
- Separate sweep agents from training logic: Keep your training function pure and parameterized by `wandb.config` so the same code works for manual runs and automated sweeps.
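For the offline-cluster practice, one common pattern is to set the relevant environment variables in the job script before `wandb.init()` runs. `WANDB_MODE` is documented; the `/scratch/wandb` path is an assumed cluster scratch directory, and `WANDB_DIR` controls where run files are written:

```python
import os

# Force offline logging on nodes without internet access;
# runs are written to local disk instead of streamed to the server.
os.environ["WANDB_MODE"] = "offline"
os.environ["WANDB_DIR"] = "/scratch/wandb"  # assumed scratch path

# import wandb                             # init picks up the env vars
# run = wandb.init(project="cluster-training")
# ... train and wandb.log(...) as usual ...
# Later, from a connected node: `wandb sync /scratch/wandb/wandb/offline-run-*`
```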
Troubleshooting
Runs not appearing in dashboard
Ensure `wandb.init()` is called before any logging. Check that `WANDB_MODE` is not set to `disabled`. Verify your API key with `wandb login --verify`.
Sweep agent exits immediately
Confirm the sweep ID is correct and the sweep has not been stopped in the UI. Check that the training function calls `wandb.init()` at the start.
Large artifacts failing to upload
Break large datasets into multiple smaller artifacts. Use `artifact.add_reference()` for cloud-stored data instead of uploading directly. Increase the timeout with `WANDB_HTTP_TIMEOUT=300`.
Offline runs not syncing
Run `wandb sync --sync-all` in the directory containing the `wandb/` folder. Ensure the offline runs directory has not been moved or renamed.
Duplicate metric names causing chart issues
Use consistent metric naming across all runs in a project. Avoid logging the same metric name with different step frequencies in a single run.
Memory usage growing during long training
Call `wandb.log()` with `commit=True` (the default) to flush data. For very long runs, consider increasing the `_stats_sample_rate_seconds` setting.
Config not showing in UI
Pass config to `wandb.init(config=...)` rather than logging it as a metric. For nested configs, use dot notation in the UI filter: `config.model.architecture`.