Statistical Analysis Dynamic
A skill for rigorous statistical analysis in Python: hypothesis testing, assumption checking, and result reporting. Includes structured workflows, validation checks, and reusable patterns for scientific work.
Conduct rigorous statistical analysis with hypothesis testing, regression modeling, ANOVA, non-parametric tests, and effect size estimation using Python. This skill covers test selection, assumption checking, multiple comparison correction, power analysis, and result reporting in APA format.
When to Use This Skill
Choose Statistical Analysis Dynamic when you need to:
- Select and execute the appropriate statistical test for your experimental design
- Check test assumptions (normality, homoscedasticity, independence) and choose alternatives
- Perform multiple comparison corrections (Bonferroni, FDR, Tukey HSD)
- Report results with effect sizes, confidence intervals, and proper statistical formatting
Consider alternatives when:
- You need Bayesian inference and posterior distributions (use PyMC)
- You need time-series specific methods (use statsmodels TSA module)
- You need machine learning predictive models (use scikit-learn)
Quick Start
```bash
pip install scipy statsmodels pingouin numpy pandas
```
```python
import numpy as np
import pandas as pd
from scipy import stats
import pingouin as pg

# Generate sample data
np.random.seed(42)
control = np.random.normal(loc=100, scale=15, size=30)
treatment = np.random.normal(loc=110, scale=15, size=30)

# 1. Check normality
_, p_control = stats.shapiro(control)
_, p_treatment = stats.shapiro(treatment)
print(f"Normality p-values: control={p_control:.3f}, treatment={p_treatment:.3f}")

# 2. Check equal variances
_, p_levene = stats.levene(control, treatment)
print(f"Levene's test p-value: {p_levene:.3f}")

# 3. Independent samples t-test
t_stat, p_value = stats.ttest_ind(control, treatment)
# Cohen's d from pooled sample SDs (ddof=1 for the sample estimate)
cohens_d = (treatment.mean() - control.mean()) / np.sqrt(
    (control.std(ddof=1)**2 + treatment.std(ddof=1)**2) / 2
)
print(f"\nt({len(control) + len(treatment) - 2}) = {t_stat:.3f}, p = {p_value:.3f}")
print(f"Cohen's d = {cohens_d:.3f}")
print(f"Control: M = {control.mean():.1f}, SD = {control.std(ddof=1):.1f}")
print(f"Treatment: M = {treatment.mean():.1f}, SD = {treatment.std(ddof=1):.1f}")

# Using pingouin for comprehensive output
result = pg.ttest(treatment, control, paired=False)
print(f"\n{result.to_string()}")
```
Core Concepts
Test Selection Guide
| Design | Parametric Test | Non-parametric Alternative |
|---|---|---|
| 2 independent groups | Independent t-test | Mann-Whitney U |
| 2 paired groups | Paired t-test | Wilcoxon signed-rank |
| 3+ independent groups | One-way ANOVA | Kruskal-Wallis H |
| 3+ paired groups | Repeated measures ANOVA | Friedman test |
| 2 categorical variables | Chi-square test | Fisher's exact test |
| Correlation | Pearson r | Spearman rho / Kendall tau |
| Prediction (continuous) | Linear regression | — |
| Prediction (binary) | Logistic regression | — |
| 2+ factors | Factorial ANOVA | Aligned rank transform |
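For the two-group row of the table, the selection logic can be sketched as a small helper; this is a minimal decision flow, assuming the same 0.05 threshold for the assumption checks, not the skill's internal implementation:

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Pick an independent two-group test based on assumption checks."""
    # Parametric route requires approximate normality in both groups
    normal = (stats.shapiro(a).pvalue > alpha and
              stats.shapiro(b).pvalue > alpha)
    # Equal-variance check decides between Student's and Welch's t-test
    equal_var = stats.levene(a, b).pvalue > alpha
    if normal:
        res = stats.ttest_ind(a, b, equal_var=equal_var)
        name = "t-test" if equal_var else "Welch t-test"
    else:
        res = stats.mannwhitneyu(a, b, alternative='two-sided')
        name = "Mann-Whitney U"
    return name, res.statistic, res.pvalue

rng = np.random.default_rng(42)
name, stat, p = compare_two_groups(rng.normal(100, 15, 30),
                                   rng.normal(110, 15, 30))
print(name, round(stat, 3), round(p, 4))
```

The same pattern extends to the other rows: check the assumptions first, then dispatch to the parametric test or its non-parametric alternative.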
ANOVA with Post-hoc Tests
```python
import numpy as np
import pandas as pd
from scipy import stats
import pingouin as pg

# Create data for 4 treatment groups
np.random.seed(42)
data = pd.DataFrame({
    'score': np.concatenate([
        np.random.normal(50, 10, 25),
        np.random.normal(55, 10, 25),
        np.random.normal(60, 10, 25),
        np.random.normal(52, 10, 25),
    ]),
    'group': np.repeat(['Placebo', 'Low Dose', 'High Dose', 'Combination'], 25)
})

# Check ANOVA assumptions
# Normality per group
for group in data['group'].unique():
    subset = data[data['group'] == group]['score']
    _, p = stats.shapiro(subset)
    print(f"  {group}: Shapiro p = {p:.3f}")

# Homogeneity of variances
_, p_levene = stats.levene(*[data[data['group'] == g]['score']
                             for g in data['group'].unique()])
print(f"Levene's p = {p_levene:.3f}")

# One-way ANOVA
aov = pg.anova(data=data, dv='score', between='group', detailed=True)
print("\nANOVA Results:")
print(aov.to_string())

# Post-hoc: Tukey HSD
posthoc = pg.pairwise_tukey(data=data, dv='score', between='group')
print("\nTukey HSD Post-hoc:")
print(posthoc[['A', 'B', 'diff', 'p-tukey', 'hedges']].to_string())
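If the Shapiro or Levene checks above fail, the Kruskal-Wallis H test from the selection table is the usual fallback. A minimal sketch on the same long-format layout, assuming three illustrative groups:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Long-format data with a grouping column, as in the ANOVA example
rng = np.random.default_rng(42)
data = pd.DataFrame({
    'score': np.concatenate([rng.normal(50, 10, 25),
                             rng.normal(55, 10, 25),
                             rng.normal(60, 10, 25)]),
    'group': np.repeat(['A', 'B', 'C'], 25),
})

# Kruskal-Wallis takes one array per group
groups = [g['score'].to_numpy() for _, g in data.groupby('group')]
h_stat, p = stats.kruskal(*groups)
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p:.4f}")
```

Pingouin offers the same test as `pg.kruskal(data=data, dv='score', between='group')` if you prefer its DataFrame output.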
Configuration
| Parameter | Description | Default |
|---|---|---|
| `alpha` | Significance level | 0.05 |
| `alternative` | Test directionality (`two-sided`, `greater`, `less`) | `"two-sided"` |
| `correction` | Multiple comparison method (`bonferroni`, `fdr_bh`, `holm`) | `"fdr_bh"` |
| `effect_size` | Effect size measure (`cohen_d`, `eta_squared`, `r`) | Test-dependent |
| `confidence_level` | CI level for estimates | 0.95 |
| `normality_test` | Normality check method | `"shapiro"` |
| `variance_test` | Homoscedasticity test | `"levene"` |
| `power_target` | Target statistical power | 0.80 |
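The `correction` values in the table map directly onto method names in statsmodels. A sketch of how such a config might be applied; the `config` dict itself is illustrative, but `multipletests` and its method strings are real:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical config mirroring the table above
config = {"alpha": 0.05, "correction": "fdr_bh"}

pvals = [0.001, 0.012, 0.030, 0.045, 0.200]
reject, p_adj, _, _ = multipletests(pvals,
                                    alpha=config["alpha"],
                                    method=config["correction"])
print(list(reject))
print([round(q, 4) for q in p_adj])
```

Swapping `"fdr_bh"` for `"bonferroni"` or `"holm"` changes only the method string, which is what makes the correction a single config knob.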
Best Practices
- **Always check assumptions before running parametric tests.** Run Shapiro-Wilk for normality and Levene's test for homoscedasticity. If assumptions are violated (p < 0.05), use non-parametric alternatives or robust methods; violating assumptions inflates Type I error rates and produces unreliable p-values.
- **Report effect sizes alongside p-values.** P-values depend on sample size and don't indicate practical importance. Always report Cohen's d for t-tests (0.2 = small, 0.5 = medium, 0.8 = large), eta-squared for ANOVA, and odds ratios for logistic regression. A statistically significant result with a tiny effect size may not be meaningful.
- **Apply multiple comparison correction when testing more than one hypothesis.** Without correction, testing 20 comparisons at α = 0.05 yields one false positive on average. Use Benjamini-Hochberg FDR for exploratory analyses (controls the false discovery rate) and Bonferroni for confirmatory analyses (controls the family-wise error rate).
- **Conduct power analysis before collecting data.** Use `statsmodels.stats.power` or `pingouin.power_ttest` to determine required sample sizes. Underpowered studies waste resources and produce unreliable results. Target at least 80% power for the minimum meaningful effect size in your domain.
- **Use pingouin for clean, comprehensive statistical output.** Pingouin provides publication-ready output with effect sizes, confidence intervals, and Bayes factors in a single function call. It's more concise than scipy.stats and includes assumption checks. Use it for routine statistical analysis and scipy for custom procedures.
Common Issues
**P-value is exactly 0.000 or displays as 0.** Very small p-values underflow the display precision. Report them as "p < 0.001" rather than "p = 0.000". The p-value returned by functions like `stats.ttest_ind(a, b)` is an ordinary float, so for exact values format it in scientific notation: `f"p = {p:.2e}"` (e.g., p = 3.45e-12).
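A minimal formatting helper following that rule plus the APA convention of dropping the leading zero; the name `format_p` is illustrative, not part of any library:

```python
def format_p(p):
    """Format a p-value in APA style: 'p < .001' below the threshold,
    otherwise three decimals with the leading zero dropped."""
    if p < 0.001:
        return "p < .001"
    return f"p = {p:.3f}".replace("0.", ".")

print(format_p(3.45e-12))  # p < .001
print(format_p(0.042))     # p = .042
```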
**ANOVA is significant but no post-hoc pairs are significant.** ANOVA's omnibus test has more power than pairwise comparisons because it tests the global null. Post-hoc corrections (Tukey, Bonferroni) further reduce power. This is normal with borderline significance. Report the ANOVA result and note that specific pairwise differences couldn't be isolated.
**Non-parametric tests disagree with parametric tests.** Parametric and non-parametric tests answer slightly different questions (means vs. distributions/ranks). Disagreement often means the effect is weak or assumption-dependent. Report both results transparently and let the reader evaluate which is more appropriate for the data.