Pro PyMC Workspace
Build Bayesian statistical models using PyMC, a probabilistic programming library for Python. This skill covers model specification, prior selection, MCMC sampling, posterior analysis, model comparison, and hierarchical modeling for data analysis and scientific inference.
When to Use This Skill
Choose Pro PyMC Workspace when you need to:
- Fit Bayesian regression, hierarchical, or mixture models to data
- Quantify uncertainty in parameter estimates with credible intervals
- Perform Bayesian model comparison and selection
- Build custom probabilistic models beyond standard frequentist methods
Consider alternatives when:
- You need simple frequentist statistics (use scipy.stats or statsmodels)
- You need deep probabilistic models at scale (use NumPyro or Pyro)
- You need Gaussian process modeling specifically (use GPyTorch)
Quick Start
```bash
pip install pymc arviz matplotlib
```
```python
import pymc as pm
import numpy as np
import arviz as az

# Generate sample data
np.random.seed(42)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + np.random.normal(0, 1.5, 50)

# Build Bayesian linear regression
with pm.Model() as model:
    # Priors
    slope = pm.Normal("slope", mu=0, sigma=10)
    intercept = pm.Normal("intercept", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=5)

    # Likelihood
    mu = slope * x + intercept
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)

    # Sample posterior
    trace = pm.sample(2000, tune=1000, cores=4, random_seed=42)

# Summarize results
print(az.summary(trace, var_names=["slope", "intercept", "sigma"]))

# Plot posteriors
az.plot_posterior(trace, var_names=["slope", "intercept"])
```
Core Concepts
Model Components
| Component | PyMC Class | Purpose |
|---|---|---|
| Prior | pm.Normal, pm.HalfNormal, pm.Uniform | Parameter beliefs before data |
| Likelihood | pm.Normal(observed=...) | Data-generating process |
| Deterministic | pm.Deterministic | Derived quantities to track |
| Sampler | pm.sample() | MCMC posterior sampling (NUTS) |
| Prior predictive | pm.sample_prior_predictive() | Simulate from priors |
| Posterior predictive | pm.sample_posterior_predictive() | Generate predictions |
Hierarchical Model
```python
import pymc as pm
import numpy as np
import arviz as az

# Data: test scores from multiple schools
n_schools = 8
school_effects = np.array([28, 8, -3, 7, -1, 1, 18, 12])
school_se = np.array([15, 10, 16, 11, 9, 11, 10, 18])

with pm.Model() as hierarchical_model:
    # Hyperpriors (population-level)
    mu = pm.Normal("mu", mu=0, sigma=15)
    tau = pm.HalfCauchy("tau", beta=10)

    # School-level effects (partial pooling)
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=n_schools)

    # Likelihood
    y = pm.Normal("y", mu=theta, sigma=school_se, observed=school_effects)

    # Sample
    trace = pm.sample(4000, tune=2000, cores=4)

# Analyze
print(az.summary(trace, var_names=["mu", "tau", "theta"]))
az.plot_forest(trace, var_names=["theta"], combined=True, hdi_prob=0.95)
```
Model Comparison
```python
import pymc as pm
import arviz as az
import numpy as np

# Fit two competing models
x = np.random.randn(100)
y = 3 * x + 0.5 * x**2 + np.random.randn(100) * 0.5

# Model 1: Linear
with pm.Model() as linear_model:
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    mu = a + b * x
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    trace_linear = pm.sample(2000, tune=1000)
    pm.compute_log_likelihood(trace_linear)

# Model 2: Quadratic
with pm.Model() as quad_model:
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    c = pm.Normal("c", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    mu = a + b * x + c * x**2
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    trace_quad = pm.sample(2000, tune=1000)
    pm.compute_log_likelihood(trace_quad)

# Compare using LOO-CV
comparison = az.compare(
    {"linear": trace_linear, "quadratic": trace_quad},
    ic="loo",
)
print(comparison)
```
Configuration
| Parameter | Description | Default |
|---|---|---|
| draws | Number of posterior samples per chain | 1000 |
| tune | Number of tuning (warmup) samples, discarded | 1000 |
| chains | Number of MCMC chains | max(2, cores) |
| cores | Chains run in parallel | min(4, CPU count) |
| target_accept | NUTS acceptance rate target | 0.8 |
| random_seed | Reproducibility seed | None |
| init | Initialization method | "auto" (jitter+adapt_diag) |
Best Practices
- Run prior predictive checks — Before fitting, call `pm.sample_prior_predictive()` and plot the implied data range. If your priors allow physically impossible values (negative heights, probabilities > 1), tighten them. Good priors constrain the model to plausible parameter space without being overly informative.
- Check convergence diagnostics — After sampling, verify that R-hat < 1.01 for all parameters, effective sample size > 400, and no divergent transitions. Use `az.summary(trace)` and `az.plot_trace(trace)` to diagnose. Divergences indicate model misspecification or difficult posterior geometry.
- Reparameterize when sampling is slow — Centered parameterizations of hierarchical models often have funnel-shaped posteriors. Use the non-centered parameterization: `offset = pm.Normal("offset", 0, 1); theta = mu + tau * offset` instead of `theta = pm.Normal("theta", mu, tau)`.
- Use informative priors when you have domain knowledge — Weakly informative priors (wide normals) work for exploration, but incorporating domain knowledge through informative priors improves estimation efficiency and prevents unreasonable estimates, especially with small datasets.
- Run posterior predictive checks — After fitting, generate predictions with `pm.sample_posterior_predictive(trace)` and compare them to the observed data. If the model can't reproduce key features of your data (mean, variance, shape), the model is misspecified.
Common Issues
Divergent transitions during sampling — Divergences indicate the sampler is struggling with the posterior geometry. Increase `target_accept` to 0.95-0.99, reparameterize hierarchical models (use the non-centered form), or simplify the model. Never ignore divergences — they indicate unreliable posterior estimates.
Sampling is extremely slow — PyMC's NUTS sampler scales poorly with high-dimensional models (>1000 parameters). Use the JAX/NumPyro backend for speed: `pm.sample(nuts_sampler="numpyro")`. This can be 10-100x faster for complex models.
ArviZ plotting fails with InferenceData errors — Ensure you're using compatible versions of PyMC and ArviZ. PyMC v5+ returns InferenceData objects by default. If older code expects MultiTrace, update plotting calls to use ArviZ functions (`az.plot_posterior`) instead of PyMC's deprecated plotting methods.