Advanced Hypothesis Generation

A skill for generating and formulating testable hypotheses. Includes structured workflows, validation checks, and reusable patterns for scientific work.

Skill | Cliptics | scientific | v1.0.0 | MIT

A scientific computing skill for systematic hypothesis generation: structured approaches to formulating testable predictions from observations, data patterns, and theoretical frameworks. It provides workflows for exploratory analysis, pattern recognition, and formal hypothesis construction in any scientific domain.

When to Use This Skill

Choose Advanced Hypothesis Generation when:

  • Formulating research questions from exploratory data analysis
  • Structuring observations into formal, testable hypotheses
  • Designing experiments to discriminate between competing hypotheses
  • Building hypothesis registries for systematic research programs

Consider alternatives when:

  • You need AI-automated hypothesis generation (use Hypogenic)
  • You need statistical hypothesis testing (use scipy.stats or statsmodels)
  • You need literature-based hypothesis mining (use text mining tools)
  • You need causal inference (use DoWhy or causal discovery algorithms)

Quick Start

    claude "Help me generate hypotheses from this gene expression data analysis"

    # Structured hypothesis generation workflow
    import pandas as pd
    import numpy as np
    from scipy import stats

    # Step 1: Exploratory observation
    data = pd.read_csv("expression_data.csv")
    correlation = data["gene_a"].corr(data["gene_b"])
    print(f"Observation: Gene A and Gene B are correlated (r={correlation:.3f})")

    # Step 2: Generate competing hypotheses
    hypotheses = [
        {
            "id": "H1",
            "statement": "Gene A directly regulates Gene B expression",
            "mechanism": "Transcriptional activation via promoter binding",
            "prediction": "Knockout of Gene A reduces Gene B expression >50%",
            "experiment": "CRISPR knockout + qRT-PCR",
            "falsifiable": True
        },
        {
            "id": "H2",
            "statement": "Gene A and Gene B are co-regulated by a shared upstream factor",
            "mechanism": "Common transcription factor drives both genes",
            "prediction": "TF knockdown reduces both Gene A and B equally",
            "experiment": "TF siRNA + dual qRT-PCR",
            "falsifiable": True
        },
        {
            "id": "H3",
            "statement": "Correlation is confounded by cell type composition",
            "mechanism": "Both genes are markers of the same cell type",
            "prediction": "Correlation disappears after cell type deconvolution",
            "experiment": "Single-cell RNA-seq or deconvolution analysis",
            "falsifiable": True
        }
    ]

    # Step 3: Evaluate and rank
    for h in hypotheses:
        print(f"\n{h['id']}: {h['statement']}")
        print(f"  Prediction: {h['prediction']}")
        print(f"  Test: {h['experiment']}")

Core Concepts

Hypothesis Structure

| Component       | Description                                   | Example                                   |
|-----------------|-----------------------------------------------|-------------------------------------------|
| Observation     | What you measured/observed                    | "Gene A correlates with Gene B (r=0.82)"  |
| Statement       | The proposed explanation                      | "Gene A activates Gene B transcription"   |
| Mechanism       | How it would work                             | "Via direct promoter binding"             |
| Prediction      | Testable consequence                          | "Gene A KO reduces Gene B >50%"           |
| Experiment      | How to test                                   | "CRISPR knockout + qPCR"                  |
| Null Hypothesis | Default position that the effect is absent    | "No causal relationship exists"           |
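
The components in the table above could be captured in a small record type so that incomplete hypotheses are easy to spot. This is a minimal sketch, not part of the skill itself: the `Hypothesis` class and its `missing_components` method are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Hypothesis:
    """One hypothesis record; fields mirror the component table (hypothetical type)."""
    observation: str
    statement: str
    mechanism: str
    prediction: str
    experiment: str
    null_hypothesis: str

    def missing_components(self) -> List[str]:
        """Return the names of components that are empty or whitespace-only."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Example values are taken from the table's Example column
h = Hypothesis(
    observation="Gene A correlates with Gene B (r=0.82)",
    statement="Gene A activates Gene B transcription",
    mechanism="Via direct promoter binding",
    prediction="Gene A KO reduces Gene B >50%",
    experiment="CRISPR knockout + qPCR",
    null_hypothesis="No causal relationship exists",
)
print(h.missing_components())  # an empty list means every component is filled in
```

Keeping the null hypothesis as a required field makes it harder to register a hypothesis without stating what "no effect" would look like.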

Hypothesis Generation Strategies

    from scipy import stats

    # Strategy 1: Pattern-based (data-driven)
    def pattern_hypotheses(data):
        """Generate hypotheses from statistical patterns"""
        hypotheses = []
        # Find strong correlations among the numeric columns
        corr = data.corr(numeric_only=True)
        for i, col1 in enumerate(corr.columns):
            for j, col2 in enumerate(corr.columns):
                # i < j visits each pair once and skips the diagonal
                if i < j and abs(corr.iloc[i, j]) > 0.7:
                    hypotheses.append(f"{col1} and {col2} share a regulatory mechanism")
        return hypotheses

    # Strategy 2: Anomaly-based (surprise-driven)
    def anomaly_hypotheses(data, expected_model):
        """Generate hypotheses from unexpected observations"""
        # expected_model is any fitted model whose predict() accepts the data
        residuals = data["observed"] - expected_model.predict(data)
        # Flag samples whose residual exceeds two standard deviations
        outliers = data[abs(residuals) > 2 * residuals.std()]
        return [f"Sample {idx} deviates due to an unmodeled factor"
                for idx in outliers.index]

    # Strategy 3: Comparative (difference-driven)
    def comparative_hypotheses(group1, group2, features):
        """Generate hypotheses from group differences"""
        hypotheses = []
        for feature in features:
            stat, pval = stats.mannwhitneyu(group1[feature], group2[feature])
            if pval < 0.001:
                direction = "higher" if group1[feature].mean() > group2[feature].mean() else "lower"
                hypotheses.append(f"{feature} is {direction} in group 1, suggesting differential regulation")
        return hypotheses

Hypothesis Registry

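    import json
    import os
    from datetime import datetime

    class HypothesisRegistry:
        def __init__(self, filepath="hypothesis_registry.json"):
            self.filepath = filepath
            self.hypotheses = self._load()

        # Assumed helper: the listing calls _load() but does not show it.
        def _load(self):
            # Start with an empty registry if the file does not exist yet
            if not os.path.exists(self.filepath):
                return []
            with open(self.filepath) as f:
                return json.load(f)

        # Assumed helper: the listing calls _save() but does not show it.
        def _save(self):
            with open(self.filepath, "w") as f:
                json.dump(self.hypotheses, f, indent=2)

        def add(self, statement, mechanism, prediction, experiment, priority="medium"):
            entry = {
                "id": f"H{len(self.hypotheses)+1:03d}",
                "statement": statement,
                "mechanism": mechanism,
                "prediction": prediction,
                "experiment": experiment,
                "priority": priority,
                "status": "proposed",
                "created": datetime.now().isoformat(),
                "evidence_for": [],
                "evidence_against": []
            }
            self.hypotheses.append(entry)
            self._save()
            return entry["id"]

        def update_status(self, hyp_id, status, evidence=None):
            for h in self.hypotheses:
                if h["id"] == hyp_id:
                    h["status"] = status
                    if evidence:
                        if status == "supported":
                            h["evidence_for"].append(evidence)
                        elif status == "refuted":
                            h["evidence_against"].append(evidence)
                    # Persist the change so the file matches the in-memory state
                    self._save()
                    return True
            return False  # no entry with that id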
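
Once a registry holds more than a handful of entries, a quick status summary helps decide what to test next. The sketch below is illustrative only: `summarize_registry` is a hypothetical helper, it assumes entries are dictionaries with the `id`, `status`, and `priority` keys used by `HypothesisRegistry.add`, and the priority values "high" and "low" are assumptions (the listing only shows "medium").

```python
from collections import Counter


def summarize_registry(entries):
    """Count entries by status and list untested high-priority hypotheses."""
    status_counts = Counter(e.get("status", "unknown") for e in entries)
    open_high = [
        e["id"]
        for e in entries
        if e.get("status") == "proposed" and e.get("priority") == "high"
    ]
    return dict(status_counts), open_high


# Hypothetical entries that reuse the registry's key names
entries = [
    {"id": "H001", "status": "proposed", "priority": "high"},
    {"id": "H002", "status": "supported", "priority": "medium"},
    {"id": "H003", "status": "proposed", "priority": "low"},
]
counts, open_high = summarize_registry(entries)
print(counts)
print(open_high)
```

With a registry instance, the same function could be called on its `hypotheses` list.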

Configuration

| Parameter              | Description                          | Default |
|------------------------|--------------------------------------|---------|
| generation_strategy    | Pattern, anomaly, or comparative     | pattern |
| significance_threshold | P-value cutoff for patterns          | 0.001   |
| correlation_threshold  | Minimum correlation for hypotheses   | 0.7     |
| max_hypotheses         | Maximum hypotheses to generate       | 10      |
| require_falsifiable    | Only generate falsifiable hypotheses | true    |
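
The source does not say how these parameters are supplied, so the following is only one possible representation: a hypothetical `GenerationConfig` dataclass that carries the defaults from the table and rejects obviously invalid values. The validation ranges are assumptions, not documented behavior.

```python
from dataclasses import dataclass

VALID_STRATEGIES = {"pattern", "anomaly", "comparative"}


@dataclass
class GenerationConfig:
    """Hypothetical container for the parameters in the Configuration table."""
    generation_strategy: str = "pattern"
    significance_threshold: float = 0.001
    correlation_threshold: float = 0.7
    max_hypotheses: int = 10
    require_falsifiable: bool = True

    def __post_init__(self):
        # Assumed sanity checks; the source does not specify valid ranges
        if self.generation_strategy not in VALID_STRATEGIES:
            raise ValueError(f"unknown strategy: {self.generation_strategy!r}")
        if not 0 < self.significance_threshold < 1:
            raise ValueError("significance_threshold must be between 0 and 1")
        if not 0 <= self.correlation_threshold <= 1:
            raise ValueError("correlation_threshold must be between 0 and 1")
        if self.max_hypotheses < 1:
            raise ValueError("max_hypotheses must be at least 1")


config = GenerationConfig(generation_strategy="comparative", max_hypotheses=5)
print(config)
```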

Best Practices

  1. Generate multiple competing hypotheses. Never test a single hypothesis in isolation. Generate at least 3 competing explanations for each observation. This prevents confirmation bias and ensures experiments can discriminate between alternatives.

  2. Make predictions specific and quantitative. "Gene B will decrease" is weak. "Gene B expression will decrease >50% within 48h of Gene A knockout" is testable and falsifiable. Specific predictions enable clear experimental interpretation.

  3. Include a null/confounding hypothesis. Always include a hypothesis that the observed pattern is due to a confounding factor (batch effects, sample composition, technical artifact). This keeps the analysis honest and often reveals important controls to include.

  4. Document hypotheses before testing. Register hypotheses with predictions before running experiments (pre-registration). This prevents post-hoc rationalization and p-hacking, and creates an audit trail of the scientific reasoning process.

  5. Update the registry with results. After each experiment, update the hypothesis status with supporting or refuting evidence. This creates a living document of the research program's evolution and prevents revisiting already-tested ideas.
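
Practice 4 (documenting hypotheses before testing) can be made checkable by fingerprinting each hypothesis at registration time. The sketch below is one possible approach, not something the skill prescribes: `preregister` and `verify` are hypothetical helpers, and the SHA-256 fingerprint is only meaningful if the digest is also stored somewhere the analyst cannot quietly edit (a lab notebook, a commit, a shared channel).

```python
import hashlib
import json
from datetime import datetime, timezone


def _fingerprint(hypothesis: dict) -> str:
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(hypothesis, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def preregister(hypothesis: dict) -> dict:
    """Wrap a hypothesis with a digest and a UTC registration timestamp."""
    return {
        "hypothesis": hypothesis,
        "sha256": _fingerprint(hypothesis),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(record: dict) -> bool:
    """Return True if the hypothesis still matches its registered digest."""
    return _fingerprint(record["hypothesis"]) == record["sha256"]


record = preregister({
    "statement": "Gene A directly regulates Gene B expression",
    "prediction": "Knockout of Gene A reduces Gene B expression >50%",
    "experiment": "CRISPR knockout + qRT-PCR",
})
print(verify(record))
```

If the prediction text is edited after the experiment, `verify` returns `False`, which makes post-hoc changes visible.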

Common Issues

Too many hypotheses generated from high-dimensional data. In genomics and metabolomics, thousands of features produce millions of correlations. Apply strict significance thresholds, require biological plausibility, and limit to the top N most interesting patterns.
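
One way to limit output to the top N patterns is to rank feature pairs by absolute correlation and truncate. This is a minimal sketch under stated assumptions: `top_correlated_pairs` is a hypothetical function, its parameter names are borrowed from the Configuration table, and ranking by |r| alone does not replace significance testing or a biological plausibility check.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(data, correlation_threshold=0.7, max_hypotheses=10):
    """Return the strongest correlated feature pairs, at most max_hypotheses rows."""
    corr = data.corr(numeric_only=True)
    cols = corr.columns
    # Indices of the strict upper triangle: each pair once, no self-pairs
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.DataFrame({
        "feature_a": cols[i],
        "feature_b": cols[j],
        "r": corr.to_numpy()[i, j],
    })
    strong = pairs[pairs["r"].abs() > correlation_threshold]
    ranked = strong.sort_values("r", key=lambda s: s.abs(), ascending=False)
    return ranked.head(max_hypotheses).reset_index(drop=True)


# Synthetic demo data: gene_b and gene_d are linear in gene_a, gene_c is not
x = np.arange(10.0)
demo = pd.DataFrame({
    "gene_a": x,
    "gene_b": 2 * x + 1,
    "gene_c": np.tile([1.0, -1.0], 5),
    "gene_d": -x,
})
top = top_correlated_pairs(demo, max_hypotheses=2)
print(top)
```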

Hypotheses are unfalsifiable. Every hypothesis must have a clear experiment that could disprove it. "Gene X plays a role in cancer" is unfalsifiable — "Gene X knockout reduces tumor growth >30% in mouse xenografts" is falsifiable.
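
A rough automated check can catch the most obvious cases before a hypothesis is registered. The function below is a heuristic sketch only: `falsifiability_issues` is a hypothetical helper, the field names follow the hypothesis dictionaries in Quick Start, and "contains a number" is merely a proxy for a quantitative prediction, so it will flag some valid qualitative predictions and miss vague ones that happen to include a digit.

```python
import re
from typing import List

REQUIRED_FIELDS = ("statement", "prediction", "experiment")


def falsifiability_issues(hypothesis: dict) -> List[str]:
    """Return a list of problems; an empty list means no obvious issue was found."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not str(hypothesis.get(field, "")).strip():
            issues.append(f"missing {field}")
    prediction = str(hypothesis.get("prediction", ""))
    # Heuristic: a prediction without any digit probably lacks a threshold
    if prediction.strip() and not re.search(r"\d", prediction):
        issues.append("prediction has no numeric threshold")
    return issues


vague = {"statement": "Gene X plays a role in cancer"}
specific = {
    "statement": "Gene X drives tumor growth",
    "prediction": "Gene X knockout reduces tumor growth >30% in mouse xenografts",
    "experiment": "Gene X knockout in mouse xenografts",  # illustrative wording
}
print(falsifiability_issues(vague))
print(falsifiability_issues(specific))
```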

Competing hypotheses aren't truly independent. If H1 being true automatically makes H2 true, they're not competing. Ensure each hypothesis proposes a distinct mechanism that can be independently verified or refuted.
