Advanced Hypothesis Generation

A scientific computing skill for systematic hypothesis generation: structured approaches to formulating testable predictions from observations, data patterns, and theoretical frameworks. It provides workflows for exploratory analysis, pattern recognition, and formal hypothesis construction across scientific domains.

When to Use This Skill

Choose Advanced Hypothesis Generation when:

Formulating research questions from exploratory data analysis
Structuring observations into formal, testable hypotheses
Designing experiments to discriminate between competing hypotheses
Building hypothesis registries for systematic research programs

Consider alternatives when:

You need AI-automated hypothesis generation (use Hypogenic)
You need statistical hypothesis testing (use scipy.stats or statsmodels)
You need literature-based hypothesis mining (use text mining tools)
You need causal inference (use DoWhy or causal discovery algorithms)

Quick Start


claude "Help me generate hypotheses from this gene expression data analysis"


# Structured hypothesis generation workflow
import pandas as pd
import numpy as np
from scipy import stats

# Step 1: Exploratory observation
data = pd.read_csv("expression_data.csv")
correlation = data["gene_a"].corr(data["gene_b"])
print(f"Observation: Gene A and Gene B are correlated (r={correlation:.3f})")

# Step 2: Generate competing hypotheses
hypotheses = [
    {
        "id": "H1",
        "statement": "Gene A directly regulates Gene B expression",
        "mechanism": "Transcriptional activation via promoter binding",
        "prediction": "Knockout of Gene A reduces Gene B expression >50%",
        "experiment": "CRISPR knockout + qRT-PCR",
        "falsifiable": True
    },
    {
        "id": "H2",
        "statement": "Gene A and Gene B are co-regulated by a shared upstream factor",
        "mechanism": "Common transcription factor drives both genes",
        "prediction": "TF knockdown reduces both Gene A and B equally",
        "experiment": "TF siRNA + dual qRT-PCR",
        "falsifiable": True
    },
    {
        "id": "H3",
        "statement": "Correlation is confounded by cell type composition",
        "mechanism": "Both genes are markers of the same cell type",
        "prediction": "Correlation disappears after cell type deconvolution",
        "experiment": "Single-cell RNA-seq or deconvolution analysis",
        "falsifiable": True
    }
]

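# Step 3: Review each hypothesis alongside its prediction and test
# (ranking is left to the researcher; this loop only prints the candidates)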
for h in hypotheses:
    print(f"\n{h['id']}: {h['statement']}")
    print(f"  Prediction: {h['prediction']}")
    print(f"  Test: {h['experiment']}")

Core Concepts

Hypothesis Structure

Component	Description	Example
Observation	What you measured/observed	"Gene A correlates with Gene B (r=0.82)"
Statement	The proposed explanation	"Gene A activates Gene B transcription"
Mechanism	How it would work	"Via direct promoter binding"
Prediction	Testable consequence	"Gene A KO reduces Gene B >50%"
Experiment	How to test	"CRISPR knockout + qPCR"
Null Hypothesis	Default claim of no effect, which the experiment tries to reject	"No causal relationship exists"
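
The components in the table above can be held in a small record type so that incomplete hypotheses are easy to spot. This is a minimal sketch, not part of the skill itself: the `Hypothesis` class, its field names, and the `missing_components` helper are assumptions introduced for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class Hypothesis:
    """One field per component of the table (names are illustrative)."""
    observation: str
    statement: str
    mechanism: str
    prediction: str
    experiment: str
    null_hypothesis: str

    def missing_components(self):
        """Return the names of components that were left empty."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Example values taken from the table; the null hypothesis is left blank on purpose
h = Hypothesis(
    observation="Gene A correlates with Gene B (r=0.82)",
    statement="Gene A activates Gene B transcription",
    mechanism="Via direct promoter binding",
    prediction="Gene A KO reduces Gene B >50%",
    experiment="CRISPR knockout + qPCR",
    null_hypothesis="",
)
print(h.missing_components())
```

A check like this could run before a hypothesis is registered, so that entries without a prediction or experiment are caught early.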

Hypothesis Generation Strategies


# Strategy 1: Pattern-based (data-driven)
def pattern_hypotheses(data):
    """Generate hypotheses from statistical patterns"""
    hypotheses = []
    # Find strong correlations
    corr = data.corr(numeric_only=True)  # ignore non-numeric columns such as sample IDs
    for i, col1 in enumerate(corr.columns):
        for j, col2 in enumerate(corr.columns):
            if i < j and abs(corr.iloc[i, j]) > 0.7:
                hypotheses.append(f"{col1} and {col2} share a regulatory mechanism")
    return hypotheses

# Strategy 2: Anomaly-based (surprise-driven)
def anomaly_hypotheses(data, expected_model):
    """Generate hypotheses from unexpected observations"""
    residuals = data["observed"] - expected_model.predict(data)
    outliers = data[abs(residuals) > 2 * residuals.std()]
    return [f"Sample {idx} deviates due to an unmodeled factor"
            for idx in outliers.index]

# Strategy 3: Comparative (difference-driven)
def comparative_hypotheses(group1, group2, features):
    """Generate hypotheses from group differences"""
    hypotheses = []
    for feature in features:
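        # Two-sided rank test; no direction is assumed in advance
        stat, pval = stats.mannwhitneyu(group1[feature], group2[feature], alternative="two-sided")
        if pval < 0.001:
            # Compare medians, since Mann-Whitney is rank-based rather than mean-based
            direction = "higher" if group1[feature].median() > group2[feature].median() else "lower"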
            hypotheses.append(f"{feature} is {direction} in group 1, suggesting differential regulation")
    return hypotheses

Hypothesis Registry


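import json
import os
from datetime import datetime

class HypothesisRegistry:
    """JSON-backed list of hypotheses with status and evidence tracking."""

    def __init__(self, filepath="hypothesis_registry.json"):
        self.filepath = filepath
        self.hypotheses = self._load()

    def _load(self):
        # Start with an empty registry if the file does not exist yet
        if not os.path.exists(self.filepath):
            return []
        with open(self.filepath) as f:
            return json.load(f)

    def _save(self):
        # Rewrite the whole file so it always mirrors the in-memory list
        with open(self.filepath, "w") as f:
            json.dump(self.hypotheses, f, indent=2)

    def add(self, statement, mechanism, prediction, experiment, priority="medium"):
        entry = {
            "id": f"H{len(self.hypotheses)+1:03d}",
            "statement": statement,
            "mechanism": mechanism,
            "prediction": prediction,
            "experiment": experiment,
            "priority": priority,
            "status": "proposed",
            "created": datetime.now().isoformat(),
            "evidence_for": [],
            "evidence_against": []
        }
        self.hypotheses.append(entry)
        self._save()
        return entry["id"]

    def update_status(self, hyp_id, status, evidence=None):
        for h in self.hypotheses:
            if h["id"] == hyp_id:
                h["status"] = status
                if evidence:
                    if status == "supported":
                        h["evidence_for"].append(evidence)
                    elif status == "refuted":
                        h["evidence_against"].append(evidence)
                self._save()  # persist the change, as add() does
                return True
        return False  # no hypothesis with that ID was found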

Configuration

Parameter	Description	Default
`generation_strategy`	Pattern, anomaly, or comparative	`pattern`
`significance_threshold`	P-value cutoff for patterns	`0.001`
`correlation_threshold`	Minimum correlation for hypotheses	`0.7`
`max_hypotheses`	Maximum hypotheses to generate	`10`
`require_falsifiable`	Only generate falsifiable hypotheses	`true`
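
How these parameters are consumed is not shown in this document. As a rough sketch, the defaults could be gathered in one object and applied when selecting correlated pairs. The `GenerationConfig` class and the `top_correlated_pairs` function are hypothetical names introduced here, and the data is synthetic.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class GenerationConfig:
    # Defaults copied from the table above; the class itself is hypothetical
    generation_strategy: str = "pattern"
    significance_threshold: float = 0.001
    correlation_threshold: float = 0.7
    max_hypotheses: int = 10
    require_falsifiable: bool = True


def top_correlated_pairs(data, config):
    """Return up to max_hypotheses column pairs with |r| above the threshold."""
    corr = data.corr(numeric_only=True)
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) > config.correlation_threshold:
                pairs.append((cols[i], cols[j], float(r)))
    # Strongest correlations first, then truncate to the configured limit
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[: config.max_hypotheses]


# Synthetic data for illustration only
rng = np.random.default_rng(0)
base = rng.normal(size=200)
data = pd.DataFrame({
    "gene_a": base,
    "gene_b": base + rng.normal(scale=0.1, size=200),  # tightly coupled to gene_a
    "gene_c": rng.normal(size=200),                     # independent noise
})
pairs = top_correlated_pairs(data, GenerationConfig(max_hypotheses=5))
for col1, col2, r in pairs:
    print(f"{col1} ~ {col2}: r={r:.3f}")
```

Sorting before truncating means `max_hypotheses` keeps the strongest patterns rather than whichever pairs happen to come first in column order.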

Best Practices

Generate multiple competing hypotheses. Never test a single hypothesis in isolation. Generate at least 3 competing explanations for each observation. This prevents confirmation bias and ensures experiments can discriminate between alternatives.
Make predictions specific and quantitative. "Gene B will decrease" is weak. "Gene B expression will decrease >50% within 48h of Gene A knockout" is testable and falsifiable. Specific predictions enable clear experimental interpretation.
Include a null/confounding hypothesis. Always include a hypothesis that the observed pattern is due to a confounding factor (batch effects, sample composition, technical artifact). This keeps the analysis honest and often reveals important controls to include.
Document hypotheses before testing. Register hypotheses with predictions before running experiments (pre-registration). This prevents post-hoc rationalization and p-hacking, and creates an audit trail of the scientific reasoning process.
Update the registry with results. After each experiment, update the hypothesis status with supporting or refuting evidence. This creates a living document of the research program's evolution and prevents revisiting already-tested ideas.
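
A quantitative prediction such as "decrease >50%" can be written as an explicit check before the experiment is run, which makes the pass/fail criterion unambiguous. The sketch below assumes expression values normalized to control; the function names and the numbers are illustrative only, and a check like this complements a proper statistical test rather than replacing it.

```python
import numpy as np


def percent_reduction(control, treated):
    """Percent drop of the treated mean relative to the control mean."""
    control_mean = float(np.mean(control))
    treated_mean = float(np.mean(treated))
    return 100.0 * (control_mean - treated_mean) / control_mean


def prediction_holds(control, treated, min_reduction_pct=50.0):
    """True if the observed reduction exceeds the pre-registered threshold."""
    return percent_reduction(control, treated) > min_reduction_pct


# Made-up relative expression values, for illustration only
control = [1.00, 0.95, 1.05]
knockout = [0.40, 0.35, 0.45]

print(f"Reduction: {percent_reduction(control, knockout):.1f}%")
print(f"Prediction holds: {prediction_holds(control, knockout)}")
```

Fixing `min_reduction_pct` at registration time is what keeps the threshold from drifting after the data are seen.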

Common Issues

Too many hypotheses generated from high-dimensional data. In genomics and metabolomics, thousands of features produce millions of correlations. Apply strict significance thresholds, require biological plausibility, and limit to the top N most interesting patterns.
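
One common way to tighten thresholds when many patterns are tested at once is to control the false discovery rate. The sketch below implements the Benjamini-Hochberg procedure with NumPy as one possible approach; this document does not prescribe a specific correction, and the p-values are illustrative.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of p-values rejected at FDR level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices of p-values, smallest first
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest rank meeting its threshold
        rejected[order[: k + 1]] = True         # reject everything up to that rank
    return rejected


# Illustrative p-values, e.g. from pairwise correlation tests
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
mask = benjamini_hochberg(pvals, alpha=0.05)
print(mask)
```

Only the patterns that survive the correction would then be passed on to plausibility review and top-N selection.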

Hypotheses are unfalsifiable. Every hypothesis must have a clear experiment that could disprove it. "Gene X plays a role in cancer" is too vague to disprove, because no single result would contradict it; "Gene X knockout reduces tumor growth >30% in mouse xenografts" is falsifiable.

Competing hypotheses aren't truly independent. If H1 being true automatically makes H2 true, they're not competing. Ensure each hypothesis proposes a distinct mechanism that can be independently verified or refuted.
