GWAS Database Kit
A scientific computing skill for querying the GWAS Catalog — the comprehensive repository of published genome-wide association studies maintained by NHGRI-EBI. GWAS Database Kit helps you search for trait-associated genetic variants, retrieve association statistics, and build genetic risk profiles from GWAS data.
When to Use This Skill
Choose GWAS Database Kit when:
- Searching for genetic variants associated with specific traits or diseases
- Retrieving GWAS summary statistics (p-values, effect sizes, risk alleles)
- Analyzing shared genetic architecture across multiple traits
- Building polygenic risk score variant lists from published GWAS
Consider alternatives when:
- You need individual-level genotype data (use UK Biobank, dbGaP)
- You need variant annotations (use ClinVar, gnomAD)
- You need functional annotations (use ENCODE, Roadmap)
- You need fine-mapping results (use specific study supplements)
Quick Start
```bash
claude "Find GWAS associations for Type 2 Diabetes with genome-wide significance"
```
```python
import requests
import pandas as pd

# GWAS Catalog REST API
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Look up the EFO trait matching a search term
response = requests.get(
    f"{base_url}/efoTraits/search/findBySearchTerm",
    params={"searchTerm": "type 2 diabetes"},
)
response.raise_for_status()
traits = response.json().get("_embedded", {}).get("efoTraits", [])
trait_id = traits[0]["shortForm"]  # e.g., "EFO_0001360"

# Get associations for that trait
assoc_resp = requests.get(
    f"{base_url}/efoTraits/{trait_id}/associations",
    params={"size": 20},
)
assoc_resp.raise_for_status()
associations = assoc_resp.json().get("_embedded", {}).get("associations", [])

for assoc in associations[:10]:
    # The lead variant is labelled "rsID-allele" under loci -> strongestRiskAlleles
    alleles = [
        ra
        for locus in assoc.get("loci", [])
        for ra in locus.get("strongestRiskAlleles", [])
    ]
    snp = alleles[0].get("riskAlleleName", "N/A") if alleles else "N/A"
    pval = assoc.get("pvalue", "N/A")
    risk = assoc.get("riskFrequency", "N/A")
    print(f"  {snp}: p={pval}, risk allele freq={risk}")
```
Core Concepts
GWAS Catalog Data Model
| Entity | Description | Key Fields |
|---|---|---|
| Study | Published GWAS | PMID, trait, sample size |
| Association | SNP-trait link | rsID, p-value, OR/beta |
| SNP | Genetic variant | rsID, chromosome, position |
| EFO Trait | Standardized trait term | EFO ID, trait name |
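In association records, the SNP and the risk allele are usually given together as one label such as `rs7903146-T`, with `?` when the allele was not reported. That label layout is an assumption here; this kit does not document it. The hypothetical helper `parse_risk_allele` below splits such a label into an rsID and an allele so the SNP and Association entities can be joined on rsID.

```python
from typing import Optional, Tuple


def parse_risk_allele(name: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a label such as "rs7903146-T" into (rsID, allele).

    Assumes a "variant-allele" layout; "?" (allele not reported) maps to None.
    """
    if not name:
        return None, None
    # rpartition splits on the last "-", so the allele is always the suffix
    variant, sep, allele = name.strip().rpartition("-")
    if not sep:
        # No allele suffix: treat the whole label as the variant ID
        return name.strip(), None
    allele = allele.strip()
    return variant.strip() or None, (None if allele in ("", "?") else allele)


print(parse_risk_allele("rs7903146-T"))
print(parse_risk_allele("rs12345-?"))
```

Splitting on the last hyphen, rather than the first, keeps variant IDs that themselves contain hyphens intact.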
Association Retrieval
```python
def get_gwas_associations(trait_term, p_threshold=5e-8):
    """Return genome-wide significant associations for a trait as a DataFrame."""
    # Find the trait's EFO ID
    traits_resp = requests.get(
        f"{base_url}/efoTraits/search/findBySearchTerm",
        params={"searchTerm": trait_term},
    )
    traits_resp.raise_for_status()
    traits = traits_resp.json().get("_embedded", {}).get("efoTraits", [])
    if not traits:
        return pd.DataFrame()
    trait_id = traits[0]["shortForm"]

    # Page through associations; a response without page metadata is one page
    all_assocs = []
    page = 0
    while True:
        resp = requests.get(
            f"{base_url}/efoTraits/{trait_id}/associations",
            params={"size": 100, "page": page},
        )
        resp.raise_for_status()
        data = resp.json()
        all_assocs.extend(data.get("_embedded", {}).get("associations", []))
        total_pages = data.get("page", {}).get("totalPages", 1)
        if page >= total_pages - 1:
            break
        page += 1

    # Keep associations at or below the p-value threshold
    results = []
    for a in all_assocs:
        if a.get("pvalue") is None:
            continue
        pval = float(a["pvalue"])
        if pval > p_threshold:
            continue
        # The lead variant is labelled "rsID-allele" under loci -> strongestRiskAlleles
        alleles = [
            ra
            for locus in a.get("loci", [])
            for ra in locus.get("strongestRiskAlleles", [])
        ]
        risk_allele = alleles[0].get("riskAlleleName", "") if alleles else ""
        results.append({
            "rsid": risk_allele.split("-")[0] or None,
            "pvalue": pval,
            "risk_allele": risk_allele,
            "or_beta": a.get("orPerCopyNum") or a.get("betaNum"),
            # Study details are a linked resource; keep the link instead of one request per row
            "study_link": a.get("_links", {}).get("study", {}).get("href", ""),
        })
    return pd.DataFrame(results)


t2d_snps = get_gwas_associations("type 2 diabetes")
print(f"Genome-wide significant SNPs: {len(t2d_snps)}")
```
Cross-Trait Analysis
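```python
def shared_loci(trait1_term, trait2_term):
    """Find lead variants reported for both traits (exact rsID match).

    Nearby but distinct lead variants are not merged into one locus;
    that requires chromosome positions for each rsID.
    """
    assocs1 = get_gwas_associations(trait1_term)
    assocs2 = get_gwas_associations(trait2_term)
    if assocs1.empty or assocs2.empty:
        return pd.DataFrame()
    left = assocs1[["rsid", "pvalue"]].rename(columns={"pvalue": f"{trait1_term}_pval"})
    right = assocs2[["rsid", "pvalue"]].rename(columns={"pvalue": f"{trait2_term}_pval"})
    # Inner join keeps rsIDs present for both traits; rows without an rsID are dropped
    return left.dropna(subset=["rsid"]).merge(
        right.dropna(subset=["rsid"]), on="rsid", how="inner"
    )
```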
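Matching on identical rsIDs misses loci where the two traits report different lead variants in the same region. A minimal sketch of positional matching follows, assuming each hit table carries hypothetical `chrom` and `pos` (base-pair) columns; the association records above do not include positions, so those would have to be added from a separate variant lookup. The variant IDs and coordinates in the example are made up for illustration.

```python
import pandas as pd


def shared_loci_by_window(hits1: pd.DataFrame, hits2: pd.DataFrame,
                          window_kb: int = 500) -> pd.DataFrame:
    """Pair hits from two traits on the same chromosome within window_kb.

    Expects hypothetical columns "rsid", "chrom", "pos" (bp) and "pvalue".
    """
    # Pair every hit with every other-trait hit on the same chromosome
    pairs = hits1.merge(hits2, on="chrom", suffixes=("_1", "_2"))
    # Keep only pairs whose positions fall within the window
    close = (pairs["pos_1"] - pairs["pos_2"]).abs() <= window_kb * 1000
    cols = ["chrom", "rsid_1", "pos_1", "pvalue_1", "rsid_2", "pos_2", "pvalue_2"]
    return pairs.loc[close, cols].reset_index(drop=True)


# Illustrative inputs (not real coordinates)
t1 = pd.DataFrame({"rsid": ["varA", "varB"], "chrom": ["10", "6"],
                   "pos": [1_000_000, 5_000_000], "pvalue": [1e-10, 3e-9]})
t2 = pd.DataFrame({"rsid": ["varC", "varD"], "chrom": ["10", "6"],
                   "pos": [1_300_000, 6_000_000], "pvalue": [2e-12, 4e-8]})
print(shared_loci_by_window(t1, t2))
```

Chromosome labels must use the same convention in both tables ("10" vs "chr10"), and positions must come from the same genome build.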
Configuration
| Parameter | Description | Default |
|---|---|---|
| api_base_url | GWAS Catalog API base URL | https://www.ebi.ac.uk/gwas/rest/api |
| p_threshold | Significance threshold | 5e-8 |
| page_size | Results per API page | 100 |
| include_ancestry | Include sample ancestry info | true |
| trait_ontology | Trait ontology (EFO or another) | EFO |
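The kit does not say how these settings are loaded. One hedged way to keep them together in Python is a frozen dataclass whose field names and defaults copy the table above; the class name `GwasKitConfig` and the range checks are illustrative additions, not part of the kit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GwasKitConfig:
    """Hypothetical settings holder mirroring the Configuration table."""
    api_base_url: str = "https://www.ebi.ac.uk/gwas/rest/api"
    p_threshold: float = 5e-8    # genome-wide significance
    page_size: int = 100         # results per API page
    include_ancestry: bool = True
    trait_ontology: str = "EFO"

    def __post_init__(self):
        # Reject values that cannot be a p-value threshold or a page size
        if not 0 < self.p_threshold <= 1:
            raise ValueError("p_threshold must be in (0, 1]")
        if self.page_size < 1:
            raise ValueError("page_size must be positive")


cfg = GwasKitConfig()
exploratory = GwasKitConfig(p_threshold=1e-5)  # relaxed threshold, exploratory use only
```

Freezing the dataclass keeps a run's settings from changing partway through a query session.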
Best Practices
- Use EFO trait IDs for precise queries. Trait names can be ambiguous: "diabetes" matches multiple conditions. Search for the specific EFO term first, then query by EFO ID for clean results.
- Apply the genome-wide significance threshold. The standard GWAS threshold is p < 5×10⁻⁸; including sub-threshold associations inflates false positives. Relax the threshold only for exploratory analyses, and state that you did.
- Account for linkage disequilibrium. Multiple significant SNPs near each other may tag the same causal variant. Clump associations by LD, using a reference panel such as 1000 Genomes, to identify independent signals.
- Check sample ancestry. GWAS results are ancestry-specific because LD patterns and allele frequencies differ between populations; associations discovered in European-ancestry cohorts may not transfer to other populations. Record the discovery and replication ancestries.
- Combine with functional annotations. GWAS identifies associated loci, not causal variants. Overlay GWAS hits with functional data (eQTLs, chromatin accessibility, protein function) to prioritize candidate causal variants.
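True LD clumping (for example PLINK's `--clump`) needs pairwise r² from a reference panel. When no panel is at hand, a cruder stand-in is distance-based pruning: walk the hits from most to least significant and drop any hit within a fixed window of an already-kept one. The sketch below is that swapped-in technique, not LD clumping; it assumes hypothetical `chrom`/`pos` columns and uses made-up example values. Because it ignores LD, it can merge independent nearby signals or split one long LD block.

```python
import pandas as pd


def distance_clump(hits: pd.DataFrame, window_kb: int = 500) -> pd.DataFrame:
    """Greedy distance-based pruning: keep the strongest hit in each window.

    Expects hypothetical columns "rsid", "chrom", "pos" (bp) and "pvalue",
    and a unique index on `hits`.
    """
    window_bp = window_kb * 1000
    kept_idx = []
    kept_pos = []  # (chrom, pos) of index hits chosen so far
    # Visit hits from most to least significant
    for idx, hit in hits.sort_values("pvalue").iterrows():
        if any(c == hit["chrom"] and abs(p - hit["pos"]) <= window_bp
               for c, p in kept_pos):
            continue  # within the window of a stronger index hit
        kept_idx.append(idx)
        kept_pos.append((hit["chrom"], hit["pos"]))
    return hits.loc[kept_idx].reset_index(drop=True)


# Illustrative inputs (not real coordinates)
hits = pd.DataFrame({
    "rsid": ["varA", "varB", "varC", "varD"],
    "chrom": ["10", "10", "10", "6"],
    "pos": [1_000_000, 1_200_000, 2_000_000, 1_100_000],
    "pvalue": [1e-20, 1e-12, 3e-9, 2e-8],
})
clumped = distance_clump(hits)
print(clumped["rsid"].tolist())
```

Here `varB` lies 200 kb from the stronger `varA` and is pruned; `varC` is 1 Mb away and `varD` is on another chromosome, so both survive.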
Common Issues
API returns no associations for a known trait. The trait name may not match EFO terminology. Search the EFO ontology browser (ebi.ac.uk/ols) for the correct term. "Heart attack" should be "myocardial infarction" in EFO.
Too many associations for well-studied traits. Traits like height or BMI have thousands of associations. Filter by study size, ancestry, or specific genomic regions. Use the /associations/search/byPvalueAndPubmedId endpoint for targeted queries.
Effect sizes not comparable across studies. Some GWAS report odds ratios (binary traits), others report beta coefficients (continuous traits). Ensure you're comparing the same effect metric and that the effect allele orientation is consistent.
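A minimal sketch of the orientation step, assuming biallelic variants with both sources on the same strand. Odds ratios are moved to the additive log-odds scale with ln(OR); on that scale, reporting the effect for the other allele simply negates it, because the OR for the other allele is 1/OR. The function names are hypothetical. This does not make log-odds from binary traits comparable with betas from continuous traits, and palindromic SNPs (A/T, C/G) cannot be aligned this way without allele frequencies.

```python
import math
from typing import Optional


def log_effect(or_value: Optional[float] = None,
               beta: Optional[float] = None) -> Optional[float]:
    """Return ln(OR) for an odds ratio, else the beta unchanged (None if neither)."""
    if or_value is not None:
        if or_value <= 0:
            raise ValueError("odds ratio must be positive")
        return math.log(or_value)
    return beta


def align_effect(effect: float, reported_allele: str,
                 effect_allele: str, other_allele: str) -> Optional[float]:
    """Re-express an additive-scale effect for effect_allele.

    Returns None when the reported allele matches neither allele
    (e.g. a strand mismatch), so the caller can drop the variant.
    """
    reported = reported_allele.upper()
    if reported == effect_allele.upper():
        return effect
    if reported == other_allele.upper():
        return -effect  # swapping the allele negates ln(OR) or beta
    return None


# A study reports OR 1.4 for allele T; express it for allele C of a C/T SNP
b = align_effect(log_effect(or_value=1.4), "T", "C", "T")
```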