Pro PyMC Workspace
Build Bayesian statistical models using PyMC, a probabilistic programming library for Python. This skill covers model specification, prior selection, MCMC sampling, posterior analysis, model comparison, and hierarchical modeling for data analysis and scientific inference.
When to Use This Skill
Choose Pro PyMC Workspace when you need to:
- Fit Bayesian regression, hierarchical, or mixture models to data
- Quantify uncertainty in parameter estimates with credible intervals
- Perform Bayesian model comparison and selection
- Build custom probabilistic models beyond standard frequentist methods
Consider alternatives when:
- You need simple frequentist statistics (use scipy.stats or statsmodels)
- You need deep probabilistic models at scale (use NumPyro or Pyro)
- You need Gaussian process modeling specifically (use GPyTorch)
Quick Start
```bash
pip install pymc arviz matplotlib
```
```python
import pymc as pm
import numpy as np
import arviz as az

# Generate sample data
np.random.seed(42)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + np.random.normal(0, 1.5, 50)

# Build Bayesian linear regression
with pm.Model() as model:
    # Priors
    slope = pm.Normal("slope", mu=0, sigma=10)
    intercept = pm.Normal("intercept", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=5)

    # Likelihood
    mu = slope * x + intercept
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)

    # Sample posterior
    trace = pm.sample(2000, tune=1000, cores=4, random_seed=42)

# Summarize results
print(az.summary(trace, var_names=["slope", "intercept", "sigma"]))

# Plot posteriors
az.plot_posterior(trace, var_names=["slope", "intercept"])
```
Core Concepts
Model Components
| Component | PyMC Class | Purpose |
|---|---|---|
| Prior | pm.Normal, pm.HalfNormal, pm.Uniform | Parameter beliefs before data |
| Likelihood | pm.Normal(observed=...) | Data-generating process |
| Deterministic | pm.Deterministic | Derived quantities to track |
| Sampler | pm.sample() | MCMC posterior sampling (NUTS) |
| Prior predictive | pm.sample_prior_predictive() | Simulate from priors |
| Posterior predictive | pm.sample_posterior_predictive() | Generate predictions |
Hierarchical Model
```python
import pymc as pm
import numpy as np
import arviz as az

# Data: test scores from multiple schools
n_schools = 8
school_effects = np.array([28, 8, -3, 7, -1, 1, 18, 12])
school_se = np.array([15, 10, 16, 11, 9, 11, 10, 18])

with pm.Model() as hierarchical_model:
    # Hyperpriors (population-level)
    mu = pm.Normal("mu", mu=0, sigma=15)
    tau = pm.HalfCauchy("tau", beta=10)

    # School-level effects (partial pooling)
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=n_schools)

    # Likelihood
    y = pm.Normal("y", mu=theta, sigma=school_se, observed=school_effects)

    # Sample
    trace = pm.sample(4000, tune=2000, cores=4)

# Analyze
print(az.summary(trace, var_names=["mu", "tau", "theta"]))
az.plot_forest(trace, var_names=["theta"], combined=True, hdi_prob=0.95)
```
Model Comparison
```python
import pymc as pm
import arviz as az
import numpy as np

# Fit two competing models
x = np.random.randn(100)
y = 3 * x + 0.5 * x**2 + np.random.randn(100) * 0.5

# Model 1: Linear
with pm.Model() as linear_model:
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    mu = a + b * x
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    trace_linear = pm.sample(2000, tune=1000)
    pm.compute_log_likelihood(trace_linear)

# Model 2: Quadratic
with pm.Model() as quad_model:
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    c = pm.Normal("c", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    mu = a + b * x + c * x**2
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    trace_quad = pm.sample(2000, tune=1000)
    pm.compute_log_likelihood(trace_quad)

# Compare using LOO-CV
comparison = az.compare(
    {"linear": trace_linear, "quadratic": trace_quad},
    ic="loo",
)
print(comparison)
```
Configuration
| Parameter | Description | Default |
|---|---|---|
| draws | Number of posterior samples per chain | 1000 |
| tune | Number of tuning (warmup) samples, discarded | 1000 |
| chains | Number of MCMC chains | max(2, cores) |
| cores | Chains run in parallel | min(4, CPU count) |
| target_accept | NUTS acceptance rate target | 0.8 |
| random_seed | Reproducibility seed | None |
| init | Initialization method | "auto" (jitter+adapt_diag) |
Best Practices
- Run prior predictive checks — Before fitting, call `pm.sample_prior_predictive()` and plot the implied data range. If your priors allow physically impossible values (negative heights, probabilities > 1), tighten them. Good priors constrain the model to plausible parameter space without being overly informative.
- Check convergence diagnostics — After sampling, verify that R-hat < 1.01 for all parameters, effective sample size > 400, and no divergent transitions. Use `az.summary(trace)` and `az.plot_trace(trace)` to diagnose. Divergences indicate model misspecification or difficult posterior geometry.
- Reparameterize when sampling is slow — Centered parameterizations of hierarchical models often have funnel-shaped posteriors. Use the non-centered parameterization: `offset = pm.Normal("offset", 0, 1); theta = mu + tau * offset` instead of `theta = pm.Normal("theta", mu, tau)`.
- Use informative priors when you have domain knowledge — Weakly informative priors (wide normals) work for exploration, but incorporating domain knowledge through informative priors improves estimation efficiency and prevents unreasonable estimates, especially with small datasets.
- Run posterior predictive checks — After fitting, generate predictions with `pm.sample_posterior_predictive(trace)` and compare them to the observed data. If the model can't reproduce key features of your data (mean, variance, shape), the model is misspecified.
Common Issues
Divergent transitions during sampling — Divergences indicate the sampler is struggling with the posterior geometry. Increase `target_accept` to 0.95-0.99, reparameterize hierarchical models (use the non-centered form), or simplify the model. Never ignore divergences — they indicate unreliable posterior estimates.
Sampling is extremely slow — PyMC's NUTS sampler scales poorly with high-dimensional models (>1000 parameters). Use the JAX/NumPyro backend for speed: `pm.sample(nuts_sampler="numpyro")`. This can be 10-100x faster for complex models.
ArviZ plotting fails with InferenceData errors — Ensure you're using compatible versions of PyMC and ArviZ. PyMC v5+ returns InferenceData objects by default. If older code expects MultiTrace, update plotting calls to use ArviZ functions (`az.plot_posterior`) instead of PyMC's deprecated plotting methods.