Scientific Critical Thinking Studio
Enterprise-grade skill for evaluating research with scientific rigor. Includes structured workflows, validation checks, and reusable appraisal patterns.
Evaluate research papers, experimental designs, and scientific claims using structured critical analysis frameworks. This skill covers study design assessment, statistical evaluation, logical fallacy detection, evidence quality grading, and constructing evidence-based arguments.
When to Use This Skill
Choose Scientific Critical Thinking Studio when you need to:
- Evaluate the quality and validity of published research studies
- Identify methodological weaknesses, biases, and logical fallacies in scientific arguments
- Grade evidence quality using established frameworks (GRADE, Oxford levels)
- Construct rigorous, evidence-based arguments for proposals or reviews
Consider alternatives when:
- You need to conduct a systematic literature review (use Literature Review Complete)
- You need statistical analysis of your own data (use statistical analysis tools)
- You need peer review writing specifically (use Master Peer Suite)
Quick Start
```python
from dataclasses import dataclass, field
from typing import List
from enum import Enum

class EvidenceLevel(Enum):
    HIGH = "High - RCT or systematic review"
    MODERATE = "Moderate - Cohort study or downgraded RCT"
    LOW = "Low - Case-control or case series"
    VERY_LOW = "Very Low - Expert opinion or mechanistic reasoning"

@dataclass
class CriticalAppraisal:
    title: str
    study_type: str
    sample_size: int
    evidence_level: EvidenceLevel
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)
    biases: List[str] = field(default_factory=list)
    verdict: str = ""

    def summary(self):
        print(f"Study: {self.title}")
        print(f"Type: {self.study_type}, N={self.sample_size}")
        print(f"Evidence: {self.evidence_level.value}")
        print("\nStrengths:")
        for s in self.strengths:
            print(f"  + {s}")
        print("Weaknesses:")
        for w in self.weaknesses:
            print(f"  - {w}")
        print("Biases:")
        for b in self.biases:
            print(f"  ! {b}")
        print(f"\nVerdict: {self.verdict}")

appraisal = CriticalAppraisal(
    title="Effect of Drug X on Blood Pressure",
    study_type="Randomized Controlled Trial",
    sample_size=150,
    evidence_level=EvidenceLevel.HIGH,
    strengths=[
        "Double-blind, placebo-controlled design",
        "Pre-registered primary endpoint",
        "Adequate follow-up period (12 months)",
    ],
    weaknesses=[
        "Single-center study limits generalizability",
        "High dropout rate (18%) may introduce attrition bias",
        "Secondary outcomes not corrected for multiple testing",
    ],
    biases=["Potential selection bias - strict inclusion criteria"],
    verdict="Moderate confidence - results support efficacy but external validity limited",
)
appraisal.summary()
```
Core Concepts
Critical Analysis Framework
| Dimension | Questions to Ask | Red Flags |
|---|---|---|
| Study Design | RCT? Blinding? Control group? | No control, unblinded |
| Sample | Size adequate? Representative? | Small N, convenience sample |
| Methods | Reproducible? Validated measures? | Vague methods, novel metrics |
| Statistics | Appropriate tests? Effect sizes? | p-hacking, no corrections |
| Results | Consistent with methods? All reported? | Selective reporting, HARKing |
| Interpretation | Claims match evidence? Alternative explanations? | Overclaiming, causal language |
| Conflicts | Funding sources? Author interests? | Industry-funded, undisclosed |
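The dimensions above can be turned into a mechanical first-pass screen. The following is a minimal sketch, not a standard instrument: the `study` keys and the flag conditions are illustrative assumptions that mirror a few rows of the table.

```python
# Hypothetical sketch: walk a few appraisal dimensions from the table above
# and collect red flags. The study-dict keys are illustrative, not a schema.

DIMENSIONS = {
    "Study Design": lambda s: "No control group" if not s.get("control_group") else None,
    "Sample": lambda s: f"Small sample (N={s.get('n', 0)})" if s.get("n", 0) < 30 else None,
    "Statistics": lambda s: "No effect sizes reported" if not s.get("effect_sizes") else None,
    "Interpretation": lambda s: (
        "Causal language without experimental design"
        if s.get("causal_claims") and not s.get("randomized") else None
    ),
    "Conflicts": lambda s: "Undisclosed funding" if not s.get("funding_disclosed") else None,
}

def red_flags(study):
    """Return (dimension, message) pairs for every triggered red flag."""
    flags = []
    for dim, check in DIMENSIONS.items():
        msg = check(study)
        if msg:
            flags.append((dim, msg))
    return flags

study = {"control_group": True, "n": 24, "effect_sizes": True,
         "causal_claims": True, "randomized": False, "funding_disclosed": True}
for dim, msg in red_flags(study):
    print(f"{dim}: {msg}")
```

A screen like this only surfaces candidates; each flag still needs the human judgment the table's "Questions to Ask" column implies.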
Bias Detection Checklist
```python
def assess_biases(study_info):
    """Evaluate potential biases in a research study."""
    biases = []

    # Selection bias
    if not study_info.get("randomization"):
        biases.append(("Selection bias", "HIGH",
                       "No randomization — groups may differ systematically"))
    elif not study_info.get("allocation_concealment"):
        biases.append(("Selection bias", "MODERATE",
                       "Randomized but allocation not concealed"))

    # Performance bias
    if not study_info.get("blinding_participants"):
        biases.append(("Performance bias", "HIGH",
                       "Participants knew their group assignment"))

    # Detection bias
    if not study_info.get("blinding_assessors"):
        biases.append(("Detection bias", "MODERATE",
                       "Outcome assessors were not blinded"))

    # Attrition bias
    dropout = study_info.get("dropout_rate", 0)
    if dropout > 20:
        biases.append(("Attrition bias", "HIGH",
                       f"Dropout rate {dropout}% exceeds 20% threshold"))
    elif dropout > 10:
        biases.append(("Attrition bias", "MODERATE",
                       f"Dropout rate {dropout}% is moderate"))

    # Reporting bias
    if not study_info.get("pre_registered"):
        biases.append(("Reporting bias", "MODERATE",
                       "Study was not pre-registered"))
    if study_info.get("selective_reporting"):
        biases.append(("Reporting bias", "HIGH",
                       "Not all pre-registered outcomes were reported"))

    # Conflict of interest
    if study_info.get("industry_funded"):
        biases.append(("Funding bias", "MODERATE",
                       "Industry-funded — potential conflict of interest"))

    return biases

study = {
    "randomization": True,
    "allocation_concealment": True,
    "blinding_participants": True,
    "blinding_assessors": False,
    "dropout_rate": 15,
    "pre_registered": True,
    "selective_reporting": False,
    "industry_funded": True,
}

biases = assess_biases(study)
for name, risk, explanation in biases:
    print(f"[{risk}] {name}: {explanation}")
```
Configuration
| Parameter | Description | Default |
|---|---|---|
| `framework` | Appraisal framework (GRADE, Cochrane, CASP) | `"GRADE"` |
| `evidence_hierarchy` | Evidence level system | Oxford Levels |
| `bias_domains` | Bias categories to assess | Cochrane RoB 2 |
| `significance_threshold` | p-value interpretation cutoff | `0.05` |
| `effect_size_benchmarks` | Small/medium/large definitions | Cohen's conventions |
| `output_format` | Appraisal report format | `"structured"` |
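The parameters above could be represented as a plain defaults dictionary. This is an illustrative sketch: the key names mirror the table, but the dict structure and the `make_config` helper are assumptions, not a documented API.

```python
# Illustrative defaults mirroring the configuration table; the structure is
# an assumption, not a documented interface.
DEFAULT_CONFIG = {
    "framework": "GRADE",            # GRADE, Cochrane, or CASP
    "evidence_hierarchy": "oxford",  # Oxford Levels of Evidence
    "bias_domains": "rob2",          # Cochrane RoB 2 domains
    "significance_threshold": 0.05,  # p-value interpretation cutoff
    "effect_size_benchmarks": {      # Cohen's conventions for d
        "small": 0.2, "medium": 0.5, "large": 0.8,
    },
    "output_format": "structured",
}

def make_config(**overrides):
    """Return a config dict with defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}

cfg = make_config(framework="Cochrane", significance_threshold=0.01)
print(cfg["framework"], cfg["significance_threshold"])
```

Rejecting unknown keys up front catches typos like `signficance_threshold` before they silently fall back to the default.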
Best Practices
- **Evaluate the study design before the results** — The most rigorous statistics can't fix a fundamentally flawed design. Assess randomization, blinding, controls, and sample size before looking at p-values. A well-designed null result is more informative than a significant finding from a biased study.
- **Distinguish correlation from causation explicitly** — Observational studies can only show associations. Only randomized experiments with proper controls can establish causality. Flag any study that uses causal language ("X causes Y") without an experimental design.
- **Check for multiple testing problems** — When a study tests many hypotheses, some will appear significant by chance. Check whether the authors applied Bonferroni, FDR, or other multiple comparison corrections. If 20 outcomes are tested at α=0.05, expect 1 false positive on average.
- **Look for pre-registration** — Pre-registered studies (on clinicaltrials.gov or OSF) specified their hypotheses and analysis plan before seeing data. This prevents p-hacking and HARKing (Hypothesizing After Results are Known). Absence of pre-registration is a yellow flag, not a red flag.
- **Consider the full body of evidence** — Never base conclusions on a single study. Look for systematic reviews, meta-analyses, and replication studies. A striking finding that hasn't been replicated should be treated with skepticism regardless of its p-value.
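The multiple-testing arithmetic from the practices above can be made concrete. This is a minimal sketch of the expected-false-positive count and the Bonferroni correction; the p-values are made up for illustration.

```python
# Multiple-testing arithmetic: with m independent tests at level alpha, the
# expected number of false positives under the null is m * alpha, and the
# Bonferroni correction tests each hypothesis at alpha / m instead.

def expected_false_positives(m, alpha=0.05):
    """Expected false positives across m independent null tests at level alpha."""
    return m * alpha

def bonferroni_threshold(m, alpha=0.05):
    """Per-test significance cutoff after Bonferroni correction."""
    return alpha / m

def significant_after_bonferroni(p_values, alpha=0.05):
    """Return only the p-values that survive the corrected threshold."""
    cutoff = bonferroni_threshold(len(p_values), alpha)
    return [p for p in p_values if p < cutoff]

p_values = [0.001, 0.004, 0.03, 0.045, 0.20]
print(expected_false_positives(20))            # → 1.0, the "20 outcomes" case above
print(significant_after_bonferroni(p_values))  # → [0.001, 0.004]; cutoff is 0.05/5 = 0.01
```

Note that 0.03 and 0.045 would each look "significant" in isolation but fall away once the family of five tests is accounted for.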
Common Issues
Confusing statistical significance with practical importance — A p-value of 0.001 means the result is unlikely under the null hypothesis, not that the effect is large or meaningful. Always check the effect size, confidence interval, and clinical/practical significance alongside the p-value.
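A toy calculation makes the point: with a large enough sample, a negligible effect can be highly statistically significant. The numbers below are invented for illustration, and the z-statistic is the standard two-sample formula with equal group sizes and a known pooled SD.

```python
# Toy illustration: a tiny standardized effect (Cohen's d = 0.05, well below
# the "small" benchmark of 0.2) reaches z = 5 when each group has N = 20,000.
import math

def cohens_d(mean_diff, pooled_sd):
    """Standardized mean difference."""
    return mean_diff / pooled_sd

def z_statistic(mean_diff, pooled_sd, n_per_group):
    """Two-sample z for equal group sizes with known pooled SD."""
    se = pooled_sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / se

d = cohens_d(0.5, 10.0)            # 0.05 — practically negligible
z = z_statistic(0.5, 10.0, 20000)  # 5.0  — p well below 0.001
print(f"d = {d:.2f}, z = {z:.2f}")
```

The p-value answers "is the effect distinguishable from zero?", while the effect size answers "is it big enough to matter?" — two different questions.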
Survivorship bias in study selection — Published studies skew toward positive results (publication bias). The absence of negative results in the literature doesn't mean the intervention works — it may mean negative results weren't published. Look for funnel plot asymmetry in meta-analyses.
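Publication bias can be demonstrated with a small simulation. This is a hedged, purely illustrative sketch: the true effect is exactly zero, but filtering for "positive and significant" studies produces a published literature with a clearly positive mean effect.

```python
# Simulated publication bias: 2000 studies of a null effect, but only studies
# that are positive and "significant" (|t| > ~2) get published. Illustrative only.
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

def run_study(n=20):
    """Simulate one study of a true-null effect; return (mean_effect, is_significant)."""
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return mean, abs(mean / se) > 2.0

results = [run_study() for _ in range(2000)]
published = [m for m, sig in results if sig and m > 0]  # positive + significant only

print(f"All studies, mean effect: {statistics.fmean(m for m, _ in results):+.3f}")
print(f"Published only, mean effect: {statistics.fmean(published):+.3f}")
```

The full set of studies averages near zero, while the "published" subset shows a substantial positive effect — exactly the asymmetry a funnel plot is designed to reveal.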
Ecological fallacy in interpreting group-level data — Associations found at the group level don't necessarily apply to individuals. A country with higher chocolate consumption and more Nobel laureates doesn't mean chocolate makes individuals smarter. Always check whether individual-level data supports group-level claims.