Scientific Critical Thinking Studio
Enterprise-grade skill for evaluating research with scientific rigor. Includes structured workflows, validation checks, and reusable appraisal patterns.
Evaluate research papers, experimental designs, and scientific claims using structured critical analysis frameworks. This skill covers study design assessment, statistical evaluation, logical fallacy detection, evidence quality grading, and constructing evidence-based arguments.
When to Use This Skill
Choose Scientific Critical Thinking Studio when you need to:
- Evaluate the quality and validity of published research studies
- Identify methodological weaknesses, biases, and logical fallacies in scientific arguments
- Grade evidence quality using established frameworks (GRADE, Oxford levels)
- Construct rigorous, evidence-based arguments for proposals or reviews
Consider alternatives when:
- You need to conduct a systematic literature review (use Literature Review Complete)
- You need statistical analysis of your own data (use statistical analysis tools)
- You need peer review writing specifically (use Master Peer Suite)
Quick Start
```python
from dataclasses import dataclass, field
from typing import List
from enum import Enum

class EvidenceLevel(Enum):
    HIGH = "High - RCT or systematic review"
    MODERATE = "Moderate - Cohort study or downgraded RCT"
    LOW = "Low - Case-control or case series"
    VERY_LOW = "Very Low - Expert opinion or mechanistic reasoning"

@dataclass
class CriticalAppraisal:
    title: str
    study_type: str
    sample_size: int
    evidence_level: EvidenceLevel
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)
    biases: List[str] = field(default_factory=list)
    verdict: str = ""

    def summary(self):
        print(f"Study: {self.title}")
        print(f"Type: {self.study_type}, N={self.sample_size}")
        print(f"Evidence: {self.evidence_level.value}")
        print("\nStrengths:")
        for s in self.strengths:
            print(f"  + {s}")
        print("Weaknesses:")
        for w in self.weaknesses:
            print(f"  - {w}")
        print("Biases:")
        for b in self.biases:
            print(f"  ! {b}")
        print(f"\nVerdict: {self.verdict}")

appraisal = CriticalAppraisal(
    title="Effect of Drug X on Blood Pressure",
    study_type="Randomized Controlled Trial",
    sample_size=150,
    evidence_level=EvidenceLevel.HIGH,
    strengths=[
        "Double-blind, placebo-controlled design",
        "Pre-registered primary endpoint",
        "Adequate follow-up period (12 months)",
    ],
    weaknesses=[
        "Single-center study limits generalizability",
        "High dropout rate (18%) may introduce attrition bias",
        "Secondary outcomes not corrected for multiple testing",
    ],
    biases=["Potential selection bias - strict inclusion criteria"],
    verdict="Moderate confidence - results support efficacy but external validity limited",
)
appraisal.summary()
```
Core Concepts
Critical Analysis Framework
| Dimension | Questions to Ask | Red Flags |
|---|---|---|
| Study Design | RCT? Blinding? Control group? | No control, unblinded |
| Sample | Size adequate? Representative? | Small N, convenience sample |
| Methods | Reproducible? Validated measures? | Vague methods, novel metrics |
| Statistics | Appropriate tests? Effect sizes? | p-hacking, no corrections |
| Results | Consistent with methods? All reported? | Selective reporting, HARKing |
| Interpretation | Claims match evidence? Alternative explanations? | Overclaiming, causal language |
| Conflicts | Funding sources? Author interests? | Industry-funded, undisclosed |
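The dimensions above can be turned into a mechanical first-pass screen. The following is a minimal sketch, not a standard instrument: the `study` keys and the flag conditions are illustrative assumptions that mirror a few rows of the table.

```python
# Hypothetical sketch: walk a few appraisal dimensions from the table above
# and collect red flags. The study-dict keys are illustrative, not a schema.

DIMENSIONS = {
    "Study Design": lambda s: "No control group" if not s.get("control_group") else None,
    "Sample": lambda s: f"Small sample (N={s.get('n', 0)})" if s.get("n", 0) < 30 else None,
    "Statistics": lambda s: "No effect sizes reported" if not s.get("effect_sizes") else None,
    "Interpretation": lambda s: (
        "Causal language without experimental design"
        if s.get("causal_claims") and not s.get("randomized") else None
    ),
    "Conflicts": lambda s: "Undisclosed funding" if not s.get("funding_disclosed") else None,
}

def red_flags(study):
    """Return (dimension, message) pairs for every triggered red flag."""
    flags = []
    for dim, check in DIMENSIONS.items():
        msg = check(study)
        if msg:
            flags.append((dim, msg))
    return flags

study = {"control_group": True, "n": 24, "effect_sizes": True,
         "causal_claims": True, "randomized": False, "funding_disclosed": True}
for dim, msg in red_flags(study):
    print(f"{dim}: {msg}")
```

A screen like this only surfaces candidates; each flag still needs the human judgment the table's "Questions to Ask" column implies.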
Bias Detection Checklist
```python
def assess_biases(study_info):
    """Evaluate potential biases in a research study."""
    biases = []

    # Selection bias
    if not study_info.get("randomization"):
        biases.append(("Selection bias", "HIGH",
                       "No randomization — groups may differ systematically"))
    elif not study_info.get("allocation_concealment"):
        biases.append(("Selection bias", "MODERATE",
                       "Randomized but allocation not concealed"))

    # Performance bias
    if not study_info.get("blinding_participants"):
        biases.append(("Performance bias", "HIGH",
                       "Participants knew their group assignment"))

    # Detection bias
    if not study_info.get("blinding_assessors"):
        biases.append(("Detection bias", "MODERATE",
                       "Outcome assessors were not blinded"))

    # Attrition bias
    dropout = study_info.get("dropout_rate", 0)
    if dropout > 20:
        biases.append(("Attrition bias", "HIGH",
                       f"Dropout rate {dropout}% exceeds 20% threshold"))
    elif dropout > 10:
        biases.append(("Attrition bias", "MODERATE",
                       f"Dropout rate {dropout}% is moderate"))

    # Reporting bias
    if not study_info.get("pre_registered"):
        biases.append(("Reporting bias", "MODERATE",
                       "Study was not pre-registered"))
    if study_info.get("selective_reporting"):
        biases.append(("Reporting bias", "HIGH",
                       "Not all pre-registered outcomes were reported"))

    # Conflict of interest
    if study_info.get("industry_funded"):
        biases.append(("Funding bias", "MODERATE",
                       "Industry-funded — potential conflict of interest"))

    return biases

study = {
    "randomization": True,
    "allocation_concealment": True,
    "blinding_participants": True,
    "blinding_assessors": False,
    "dropout_rate": 15,
    "pre_registered": True,
    "selective_reporting": False,
    "industry_funded": True,
}

biases = assess_biases(study)
for name, risk, explanation in biases:
    print(f"[{risk}] {name}: {explanation}")
```
Configuration
| Parameter | Description | Default |
|---|---|---|
| `framework` | Appraisal framework (GRADE, Cochrane, CASP) | `"GRADE"` |
| `evidence_hierarchy` | Evidence level system | Oxford Levels |
| `bias_domains` | Bias categories to assess | Cochrane RoB 2 |
| `significance_threshold` | p-value interpretation cutoff | `0.05` |
| `effect_size_benchmarks` | Small/medium/large definitions | Cohen's conventions |
| `output_format` | Appraisal report format | `"structured"` |
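The parameters above could be represented as a plain defaults dictionary. This is an illustrative sketch: the key names mirror the table, but the dict structure and the `make_config` helper are assumptions, not a documented API.

```python
# Illustrative defaults mirroring the configuration table; the structure is
# an assumption, not a documented interface.
DEFAULT_CONFIG = {
    "framework": "GRADE",            # GRADE, Cochrane, or CASP
    "evidence_hierarchy": "oxford",  # Oxford Levels of Evidence
    "bias_domains": "rob2",          # Cochrane RoB 2 domains
    "significance_threshold": 0.05,  # p-value interpretation cutoff
    "effect_size_benchmarks": {      # Cohen's conventions for d
        "small": 0.2, "medium": 0.5, "large": 0.8,
    },
    "output_format": "structured",
}

def make_config(**overrides):
    """Return a config dict with defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}

cfg = make_config(framework="Cochrane", significance_threshold=0.01)
print(cfg["framework"], cfg["significance_threshold"])
```

Rejecting unknown keys up front catches typos like `signficance_threshold` before they silently fall back to the default.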
Best Practices
- **Evaluate the study design before the results** — The most rigorous statistics can't fix a fundamentally flawed design. Assess randomization, blinding, controls, and sample size before looking at p-values. A well-designed null result is more informative than a significant finding from a biased study.
- **Distinguish correlation from causation explicitly** — Observational studies can only show associations. Only randomized experiments with proper controls can establish causality. Flag any study that uses causal language ("X causes Y") without an experimental design.
- **Check for multiple testing problems** — When a study tests many hypotheses, some will appear significant by chance. Check whether the authors applied Bonferroni, FDR, or other multiple comparison corrections. If 20 outcomes are tested at α=0.05, expect 1 false positive on average.
- **Look for pre-registration** — Pre-registered studies (on clinicaltrials.gov or OSF) specified their hypotheses and analysis plan before seeing data. This prevents p-hacking and HARKing (Hypothesizing After Results are Known). Absence of pre-registration is a yellow flag, not a red flag.
- **Consider the full body of evidence** — Never base conclusions on a single study. Look for systematic reviews, meta-analyses, and replication studies. A striking finding that hasn't been replicated should be treated with skepticism regardless of its p-value.
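The multiple-testing arithmetic from the practices above can be made concrete. This is a minimal sketch of the expected-false-positive count and the Bonferroni correction; the p-values are made up for illustration.

```python
# Multiple-testing arithmetic: with m independent tests at level alpha, the
# expected number of false positives under the null is m * alpha, and the
# Bonferroni correction tests each hypothesis at alpha / m instead.

def expected_false_positives(m, alpha=0.05):
    """Expected false positives across m independent null tests at level alpha."""
    return m * alpha

def bonferroni_threshold(m, alpha=0.05):
    """Per-test significance cutoff after Bonferroni correction."""
    return alpha / m

def significant_after_bonferroni(p_values, alpha=0.05):
    """Return only the p-values that survive the corrected threshold."""
    cutoff = bonferroni_threshold(len(p_values), alpha)
    return [p for p in p_values if p < cutoff]

p_values = [0.001, 0.004, 0.03, 0.045, 0.20]
print(expected_false_positives(20))            # → 1.0, the "20 outcomes" case above
print(significant_after_bonferroni(p_values))  # → [0.001, 0.004]; cutoff is 0.05/5 = 0.01
```

Note that 0.03 and 0.045 would each look "significant" in isolation but fall away once the family of five tests is accounted for.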
Common Issues
Confusing statistical significance with practical importance — A p-value of 0.001 means the result is unlikely under the null hypothesis, not that the effect is large or meaningful. Always check the effect size, confidence interval, and clinical/practical significance alongside the p-value.
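A toy calculation makes the point: with a large enough sample, a negligible effect can be highly statistically significant. The numbers below are invented for illustration, and the z-statistic is the standard two-sample formula with equal group sizes and a known pooled SD.

```python
# Toy illustration: a tiny standardized effect (Cohen's d = 0.05, well below
# the "small" benchmark of 0.2) reaches z = 5 when each group has N = 20,000.
import math

def cohens_d(mean_diff, pooled_sd):
    """Standardized mean difference."""
    return mean_diff / pooled_sd

def z_statistic(mean_diff, pooled_sd, n_per_group):
    """Two-sample z for equal group sizes with known pooled SD."""
    se = pooled_sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / se

d = cohens_d(0.5, 10.0)            # 0.05 — practically negligible
z = z_statistic(0.5, 10.0, 20000)  # 5.0  — p well below 0.001
print(f"d = {d:.2f}, z = {z:.2f}")
```

The p-value answers "is the effect distinguishable from zero?", while the effect size answers "is it big enough to matter?" — two different questions.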
Survivorship bias in study selection — Published studies skew toward positive results (publication bias). The absence of negative results in the literature doesn't mean the intervention works — it may mean negative results weren't published. Look for funnel plot asymmetry in meta-analyses.
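Publication bias can be demonstrated with a small simulation. This is a hedged, purely illustrative sketch: the true effect is exactly zero, but filtering for "positive and significant" studies produces a published literature with a clearly positive mean effect.

```python
# Simulated publication bias: 2000 studies of a null effect, but only studies
# that are positive and "significant" (|t| > ~2) get published. Illustrative only.
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

def run_study(n=20):
    """Simulate one study of a true-null effect; return (mean_effect, is_significant)."""
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return mean, abs(mean / se) > 2.0

results = [run_study() for _ in range(2000)]
published = [m for m, sig in results if sig and m > 0]  # positive + significant only

print(f"All studies, mean effect: {statistics.fmean(m for m, _ in results):+.3f}")
print(f"Published only, mean effect: {statistics.fmean(published):+.3f}")
```

The full set of studies averages near zero, while the "published" subset shows a substantial positive effect — exactly the asymmetry a funnel plot is designed to reveal.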
Ecological fallacy in interpreting group-level data — Associations found at the group level don't necessarily apply to individuals. A country with higher chocolate consumption and more Nobel laureates doesn't mean chocolate makes individuals smarter. Always check whether individual-level data supports group-level claims.