
Pro PyMC Workspace

All-in-one skill covering Bayesian modeling and model building with PyMC. Includes structured workflows, validation checks, and reusable patterns for scientific inference.

Skill · Cliptics · scientific · v1.0.0 · MIT

Pro PyMC Workspace

Build Bayesian statistical models using PyMC, a probabilistic programming library for Python. This skill covers model specification, prior selection, MCMC sampling, posterior analysis, model comparison, and hierarchical modeling for data analysis and scientific inference.

When to Use This Skill

Choose Pro PyMC Workspace when you need to:

  • Fit Bayesian regression, hierarchical, or mixture models to data
  • Quantify uncertainty in parameter estimates with credible intervals
  • Perform Bayesian model comparison and selection
  • Build custom probabilistic models beyond standard frequentist methods

Consider alternatives when:

  • You need simple frequentist statistics (use scipy.stats or statsmodels)
  • You need deep probabilistic models at scale (use NumPyro or Pyro)
  • You need Gaussian process modeling specifically (use GPyTorch)

Quick Start

```shell
pip install pymc arviz matplotlib
```

```python
import pymc as pm
import numpy as np
import arviz as az

# Generate sample data
np.random.seed(42)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + np.random.normal(0, 1.5, 50)

# Build Bayesian linear regression
with pm.Model() as model:
    # Priors
    slope = pm.Normal("slope", mu=0, sigma=10)
    intercept = pm.Normal("intercept", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=5)

    # Likelihood
    mu = slope * x + intercept
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)

    # Sample posterior
    trace = pm.sample(2000, tune=1000, cores=4, random_seed=42)

# Summarize results
print(az.summary(trace, var_names=["slope", "intercept", "sigma"]))

# Plot posteriors
az.plot_posterior(trace, var_names=["slope", "intercept"])
```

Core Concepts

Model Components

| Component | PyMC Class | Purpose |
| --- | --- | --- |
| Prior | `pm.Normal`, `pm.HalfNormal`, `pm.Uniform` | Parameter beliefs before data |
| Likelihood | `pm.Normal(observed=...)` | Data-generating process |
| Deterministic | `pm.Deterministic` | Derived quantities to track |
| Sampler | `pm.sample()` | MCMC posterior sampling (NUTS) |
| Prior predictive | `pm.sample_prior_predictive()` | Simulate from priors |
| Posterior predictive | `pm.sample_posterior_predictive()` | Generate predictions |

Hierarchical Model

```python
import pymc as pm
import numpy as np
import arviz as az

# Data: test scores from multiple schools
n_schools = 8
school_effects = np.array([28, 8, -3, 7, -1, 1, 18, 12])
school_se = np.array([15, 10, 16, 11, 9, 11, 10, 18])

with pm.Model() as hierarchical_model:
    # Hyperpriors (population-level)
    mu = pm.Normal("mu", mu=0, sigma=15)
    tau = pm.HalfCauchy("tau", beta=10)

    # School-level effects (partial pooling)
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=n_schools)

    # Likelihood
    y = pm.Normal("y", mu=theta, sigma=school_se, observed=school_effects)

    # Sample
    trace = pm.sample(4000, tune=2000, cores=4)

# Analyze
print(az.summary(trace, var_names=["mu", "tau", "theta"]))
az.plot_forest(trace, var_names=["theta"], combined=True, hdi_prob=0.95)
```

Model Comparison

```python
import pymc as pm
import arviz as az
import numpy as np

# Fit two competing models
x = np.random.randn(100)
y = 3 * x + 0.5 * x**2 + np.random.randn(100) * 0.5

# Model 1: Linear
with pm.Model() as linear_model:
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    mu = a + b * x
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    trace_linear = pm.sample(2000, tune=1000)
    pm.compute_log_likelihood(trace_linear)

# Model 2: Quadratic
with pm.Model() as quad_model:
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    c = pm.Normal("c", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    mu = a + b * x + c * x**2
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    trace_quad = pm.sample(2000, tune=1000)
    pm.compute_log_likelihood(trace_quad)

# Compare using LOO-CV
comparison = az.compare(
    {"linear": trace_linear, "quadratic": trace_quad},
    ic="loo",
)
print(comparison)
```

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `draws` | Number of posterior samples per chain | 1000 |
| `tune` | Number of tuning (warmup) samples | 1000 |
| `cores` | CPU cores used for parallel chains | 4 |
| `chains` | Number of MCMC chains | 4 |
| `target_accept` | NUTS acceptance-rate target | 0.8 |
| `random_seed` | Reproducibility seed | None |
| `init` | Initialization method | "jitter+adapt_diag" |

Best Practices

  1. Run prior predictive checks — Before fitting, call pm.sample_prior_predictive() and plot the implied data range. If your priors allow physically impossible values (negative heights, probabilities > 1), tighten them. Good priors constrain the model to plausible parameter space without being overly informative.

  2. Check convergence diagnostics — After sampling, verify that R-hat < 1.01 for all parameters, effective sample size > 400, and no divergent transitions. Use az.summary(trace) and az.plot_trace(trace) to diagnose. Divergences indicate model misspecification or difficult posterior geometry.

  3. Reparameterize when sampling is slow — Centered parameterizations of hierarchical models often have funnel-shaped posteriors. Use non-centered parameterization: offset = pm.Normal("offset", 0, 1); theta = mu + tau * offset instead of theta = pm.Normal("theta", mu, tau).

  4. Use informative priors when you have domain knowledge — Weakly informative priors (wide normals) work for exploration, but incorporating domain knowledge through informative priors improves estimation efficiency and prevents unreasonable estimates, especially with small datasets.

  5. Run posterior predictive checks — After fitting, generate predictions with pm.sample_posterior_predictive(trace) and compare to observed data. If the model can't reproduce key features of your data (mean, variance, shape), the model is misspecified.

Common Issues

Divergent transitions during sampling — Divergences indicate the sampler is struggling with the posterior geometry. Increase target_accept to 0.95-0.99, reparameterize hierarchical models (use non-centered form), or simplify the model. Never ignore divergences — they indicate unreliable posterior estimates.

Sampling is extremely slow — PyMC's NUTS sampler scales poorly with high-dimensional models (>1000 parameters). Use the JAX/NumPyro backend for speed: pm.sample(nuts_sampler="numpyro"). This can be 10-100x faster for complex models.

ArviZ plotting fails with InferenceData errors — Ensure you're using compatible versions of PyMC and ArviZ. PyMC v5+ returns InferenceData objects by default. If older code expects MultiTrace, update plotting calls to use ArviZ functions (az.plot_posterior) instead of PyMC's deprecated plotting methods.
