Statistical Analysis Dynamic
A skill for rigorous statistical analysis in Python: hypothesis testing, assumption checking, and result reporting. Includes structured workflows, validation checks, and reusable patterns for scientific work.
Conduct rigorous statistical analysis with hypothesis testing, regression modeling, ANOVA, non-parametric tests, and effect size estimation using Python. This skill covers test selection, assumption checking, multiple comparison correction, power analysis, and result reporting in APA format.
When to Use This Skill
Choose Statistical Analysis Dynamic when you need to:
- Select and execute the appropriate statistical test for your experimental design
- Check test assumptions (normality, homoscedasticity, independence) and choose alternatives
- Perform multiple comparison corrections (Bonferroni, FDR, Tukey HSD)
- Report results with effect sizes, confidence intervals, and proper statistical formatting
Consider alternatives when:
- You need Bayesian inference and posterior distributions (use PyMC)
- You need time-series specific methods (use statsmodels TSA module)
- You need machine learning predictive models (use scikit-learn)
Quick Start
```bash
pip install scipy statsmodels pingouin numpy pandas
```
```python
import numpy as np
import pandas as pd
from scipy import stats
import pingouin as pg

# Generate sample data
np.random.seed(42)
control = np.random.normal(loc=100, scale=15, size=30)
treatment = np.random.normal(loc=110, scale=15, size=30)

# 1. Check normality
_, p_control = stats.shapiro(control)
_, p_treatment = stats.shapiro(treatment)
print(f"Normality p-values: control={p_control:.3f}, treatment={p_treatment:.3f}")

# 2. Check equal variances
_, p_levene = stats.levene(control, treatment)
print(f"Levene's test p-value: {p_levene:.3f}")

# 3. Independent samples t-test
t_stat, p_value = stats.ttest_ind(control, treatment)
# Cohen's d from pooled sample SDs (ddof=1 for the sample estimate)
cohens_d = (treatment.mean() - control.mean()) / np.sqrt(
    (control.std(ddof=1)**2 + treatment.std(ddof=1)**2) / 2
)
print(f"\nt({len(control) + len(treatment) - 2}) = {t_stat:.3f}, p = {p_value:.3f}")
print(f"Cohen's d = {cohens_d:.3f}")
print(f"Control: M = {control.mean():.1f}, SD = {control.std(ddof=1):.1f}")
print(f"Treatment: M = {treatment.mean():.1f}, SD = {treatment.std(ddof=1):.1f}")

# Using pingouin for comprehensive output
result = pg.ttest(treatment, control, paired=False)
print(f"\n{result.to_string()}")
```
Core Concepts
Test Selection Guide
| Design | Parametric Test | Non-parametric Alternative |
|---|---|---|
| 2 independent groups | Independent t-test | Mann-Whitney U |
| 2 paired groups | Paired t-test | Wilcoxon signed-rank |
| 3+ independent groups | One-way ANOVA | Kruskal-Wallis H |
| 3+ paired groups | Repeated measures ANOVA | Friedman test |
| 2 categorical variables | Chi-square test | Fisher's exact test |
| Correlation | Pearson r | Spearman rho / Kendall tau |
| Prediction (continuous) | Linear regression | — |
| Prediction (binary) | Logistic regression | — |
| 2+ factors | Factorial ANOVA | Aligned rank transform |
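For the two-group row of the table, the selection logic can be sketched as a small helper; this is a minimal decision flow, assuming the same 0.05 threshold for the assumption checks, not the skill's internal implementation:

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Pick an independent two-group test based on assumption checks."""
    # Parametric route requires approximate normality in both groups
    normal = (stats.shapiro(a).pvalue > alpha and
              stats.shapiro(b).pvalue > alpha)
    # Equal-variance check decides between Student's and Welch's t-test
    equal_var = stats.levene(a, b).pvalue > alpha
    if normal:
        res = stats.ttest_ind(a, b, equal_var=equal_var)
        name = "t-test" if equal_var else "Welch t-test"
    else:
        res = stats.mannwhitneyu(a, b, alternative='two-sided')
        name = "Mann-Whitney U"
    return name, res.statistic, res.pvalue

rng = np.random.default_rng(42)
name, stat, p = compare_two_groups(rng.normal(100, 15, 30),
                                   rng.normal(110, 15, 30))
print(name, round(stat, 3), round(p, 4))
```

The same pattern extends to the other rows: check the assumptions first, then dispatch to the parametric test or its non-parametric alternative.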
ANOVA with Post-hoc Tests
```python
import numpy as np
import pandas as pd
from scipy import stats
import pingouin as pg

# Create data for 4 treatment groups
np.random.seed(42)
data = pd.DataFrame({
    'score': np.concatenate([
        np.random.normal(50, 10, 25),
        np.random.normal(55, 10, 25),
        np.random.normal(60, 10, 25),
        np.random.normal(52, 10, 25),
    ]),
    'group': np.repeat(['Placebo', 'Low Dose', 'High Dose', 'Combination'], 25)
})

# Check ANOVA assumptions
# Normality per group
for group in data['group'].unique():
    subset = data[data['group'] == group]['score']
    _, p = stats.shapiro(subset)
    print(f"  {group}: Shapiro p = {p:.3f}")

# Homogeneity of variances
_, p_levene = stats.levene(*[data[data['group'] == g]['score']
                             for g in data['group'].unique()])
print(f"Levene's p = {p_levene:.3f}")

# One-way ANOVA
aov = pg.anova(data=data, dv='score', between='group', detailed=True)
print("\nANOVA Results:")
print(aov.to_string())

# Post-hoc: Tukey HSD
posthoc = pg.pairwise_tukey(data=data, dv='score', between='group')
print("\nTukey HSD Post-hoc:")
print(posthoc[['A', 'B', 'diff', 'p-tukey', 'hedges']].to_string())
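If the Shapiro or Levene checks above fail, the Kruskal-Wallis H test from the selection table is the usual fallback. A minimal sketch on the same long-format layout, assuming three illustrative groups:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Long-format data with a grouping column, as in the ANOVA example
rng = np.random.default_rng(42)
data = pd.DataFrame({
    'score': np.concatenate([rng.normal(50, 10, 25),
                             rng.normal(55, 10, 25),
                             rng.normal(60, 10, 25)]),
    'group': np.repeat(['A', 'B', 'C'], 25),
})

# Kruskal-Wallis takes one array per group
groups = [g['score'].to_numpy() for _, g in data.groupby('group')]
h_stat, p = stats.kruskal(*groups)
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p:.4f}")
```

Pingouin offers the same test as `pg.kruskal(data=data, dv='score', between='group')` if you prefer its DataFrame output.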
Configuration
| Parameter | Description | Default |
|---|---|---|
| `alpha` | Significance level | 0.05 |
| `alternative` | Test directionality (`two-sided`, `greater`, `less`) | `"two-sided"` |
| `correction` | Multiple comparison method (`bonferroni`, `fdr_bh`, `holm`) | `"fdr_bh"` |
| `effect_size` | Effect size measure (`cohen_d`, `eta_squared`, `r`) | Test-dependent |
| `confidence_level` | CI level for estimates | 0.95 |
| `normality_test` | Normality check method | `"shapiro"` |
| `variance_test` | Homoscedasticity test | `"levene"` |
| `power_target` | Target statistical power | 0.80 |
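The `correction` values in the table map directly onto method names in statsmodels. A sketch of how such a config might be applied; the `config` dict itself is illustrative, but `multipletests` and its method strings are real:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical config mirroring the table above
config = {"alpha": 0.05, "correction": "fdr_bh"}

pvals = [0.001, 0.012, 0.030, 0.045, 0.200]
reject, p_adj, _, _ = multipletests(pvals,
                                    alpha=config["alpha"],
                                    method=config["correction"])
print(list(reject))
print([round(q, 4) for q in p_adj])
```

Swapping `"fdr_bh"` for `"bonferroni"` or `"holm"` changes only the method string, which is what makes the correction a single config knob.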
Best Practices
- **Always check assumptions before running parametric tests.** Run Shapiro-Wilk for normality and Levene's test for homoscedasticity. If assumptions are violated (p < 0.05), use non-parametric alternatives or robust methods; violating assumptions inflates Type I error rates and produces unreliable p-values.
- **Report effect sizes alongside p-values.** P-values depend on sample size and don't indicate practical importance. Always report Cohen's d for t-tests (0.2 = small, 0.5 = medium, 0.8 = large), eta-squared for ANOVA, and odds ratios for logistic regression. A statistically significant result with a tiny effect size may not be meaningful.
- **Apply multiple comparison correction when testing more than one hypothesis.** Without correction, testing 20 comparisons at α = 0.05 yields one false positive on average. Use Benjamini-Hochberg FDR for exploratory analyses (controls the false discovery rate) and Bonferroni for confirmatory analyses (controls the family-wise error rate).
- **Conduct power analysis before collecting data.** Use `statsmodels.stats.power` or `pingouin.power_ttest` to determine required sample sizes. Underpowered studies waste resources and produce unreliable results. Target at least 80% power for the minimum meaningful effect size in your domain.
- **Use pingouin for clean, comprehensive statistical output.** Pingouin provides publication-ready output with effect sizes, confidence intervals, and Bayes factors in a single function call. It's more concise than scipy.stats and includes assumption checks. Use it for routine statistical analysis and scipy for custom procedures.
Common Issues
**P-value is exactly 0.000 or displays as 0.** Very small p-values underflow the display precision. Report them as "p < 0.001" rather than "p = 0.000". The p-value returned by functions like `stats.ttest_ind(a, b)` is an ordinary float, so for exact values format it in scientific notation: `f"p = {p:.2e}"` (e.g., p = 3.45e-12).
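A minimal formatting helper following that rule plus the APA convention of dropping the leading zero; the name `format_p` is illustrative, not part of any library:

```python
def format_p(p):
    """Format a p-value in APA style: 'p < .001' below the threshold,
    otherwise three decimals with the leading zero dropped."""
    if p < 0.001:
        return "p < .001"
    return f"p = {p:.3f}".replace("0.", ".")

print(format_p(3.45e-12))  # p < .001
print(format_p(0.042))     # p = .042
```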
**ANOVA is significant but no post-hoc pairs are significant.** ANOVA's omnibus test has more power than pairwise comparisons because it tests the global null. Post-hoc corrections (Tukey, Bonferroni) further reduce power. This is normal with borderline significance. Report the ANOVA result and note that specific pairwise differences couldn't be isolated.
**Non-parametric tests disagree with parametric tests.** Parametric and non-parametric tests answer slightly different questions (means vs. distributions/ranks). Disagreement often means the effect is weak or assumption-dependent. Report both results transparently and let the reader evaluate which is more appropriate for the data.