KEGG Database Elite
Battle-tested skill for direct REST access to KEGG. Includes structured workflows, validation checks, and reusable patterns for scientific work.
Master biological pathway analysis and molecular interaction networks using the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. This skill helps you query pathway maps, retrieve gene and compound data, perform pathway enrichment analysis, and visualize metabolic networks for systems biology research.
When to Use This Skill
Choose KEGG Database Elite when you need to:
- Map genes or proteins to biological pathways and functional annotations
- Perform pathway enrichment analysis on gene expression datasets
- Retrieve metabolic pathway diagrams and reaction networks
- Cross-reference molecular data across KEGG's integrated databases (GENES, COMPOUND, REACTION, PATHWAY)
Consider alternatives when:
- You need protein structure data (use UniProt or PDB skills)
- You need literature-based pathway analysis (use Reactome or WikiPathways)
- You need drug-target interaction data primarily (use DrugBank or ChEMBL)
Quick Start

    # Install the KEGG API client
    pip install bioservices requests

Then query KEGG from Python:

    from bioservices import KEGG

    k = KEGG()

    # Find pathways for a gene
    pathways = k.get_pathway_by_gene("7157", "hsa")  # TP53 in human
    print(pathways)

    # Get pathway details
    pathway = k.get("hsa04115")  # p53 signaling pathway
    print(pathway)

    # Search for compounds
    results = k.find("compound", "glucose")
    print(results)

Core Concepts
KEGG Database Structure
| Database | Content | Example ID |
|---|---|---|
| PATHWAY | Metabolic and signaling pathway maps | hsa04110 |
| GENES | Gene catalogs for complete genomes | hsa:7157 |
| COMPOUND | Small molecules and metabolites | C00031 |
| REACTION | Biochemical reactions | R00259 |
| ENZYME | Enzyme nomenclature (EC numbers) | 2.7.1.1 |
| DISEASE | Human diseases with molecular basis | H00004 |
| DRUG | Approved and experimental drugs | D00123 |
| ORTHOLOGY | Ortholog groups across species | K04451 |
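
The ID shapes in the table above are regular enough to check before sending a query. The following is a minimal sketch of such a validation step; the helper name `guess_kegg_database` and the pattern list are illustrative assumptions, not part of bioservices or the KEGG API, and the patterns cover only the common forms shown in the table.

```python
import re

# Hypothetical helper: patterns mirror the example IDs in the table above.
# Order matters: the specific letter-plus-five-digit forms are tested first.
_KEGG_ID_PATTERNS = [
    ("COMPOUND", re.compile(r"^C\d{5}$")),
    ("REACTION", re.compile(r"^R\d{5}$")),
    ("DISEASE", re.compile(r"^H\d{5}$")),
    ("DRUG", re.compile(r"^D\d{5}$")),
    ("ORTHOLOGY", re.compile(r"^K\d{5}$")),
    # Fully specified EC numbers only (partial classes such as 2.7.1.- are not handled)
    ("ENZYME", re.compile(r"^\d+\.\d+\.\d+\.\d+$")),
    # Prefix (organism code, or map/ko/ec/rn) followed by five digits
    ("PATHWAY", re.compile(r"^[a-z]{2,4}\d{5}$")),
    # organism_code:gene_identifier
    ("GENES", re.compile(r"^[a-z]{3,4}:\S+$")),
]


def guess_kegg_database(entry_id):
    """Return the KEGG database an ID most likely belongs to, or None."""
    for database, pattern in _KEGG_ID_PATTERNS:
        if pattern.match(entry_id):
            return database
    return None


if __name__ == "__main__":
    for example in ["hsa04110", "hsa:7157", "C00031", "TP53"]:
        print(example, "->", guess_kegg_database(example))
```

A `None` result (as for a bare gene symbol) is a hint that the identifier needs converting before it is sent to KEGG.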
Pathway Enrichment Analysis

    from bioservices import KEGG
    from scipy.stats import hypergeom
    import pandas as pd

    k = KEGG()

    def kegg_enrichment(gene_list, organism="hsa", background_size=20000):
        """Perform KEGG pathway enrichment analysis."""
        query = {str(g) for g in gene_list}

        # Fetch every gene-to-pathway link for the organism in one request.
        # Each line is a tab-separated pair such as "hsa:7157\tpath:hsa04115".
        links = k.link("pathway", organism)
        if not isinstance(links, str):
            raise RuntimeError(f"KEGG link request failed: {links!r}")

        pathway_genes = {}
        for line in links.strip().split("\n"):
            if "\t" not in line:
                continue
            gene, pathway = line.split("\t")
            pathway_id = pathway.split(":")[-1]
            pathway_genes.setdefault(pathway_id, set()).add(gene.split(":", 1)[-1])

        results = []
        for pathway_id, pathway_gene_ids in pathway_genes.items():
            # Calculate overlap
            overlap = query & pathway_gene_ids
            if len(overlap) < 2:
                continue

            # Hypergeometric test: P(X >= k_hits)
            M = background_size  # approximate genome size
            n = len(pathway_gene_ids)
            N = len(query)
            k_hits = len(overlap)
            pval = hypergeom.sf(k_hits - 1, M, n, N)

            results.append({
                "pathway": pathway_id,
                "overlap": k_hits,
                "pathway_size": n,
                "p_value": pval,
                "genes": sorted(overlap),
            })

        # Passing columns keeps sort_values from failing when nothing is enriched
        columns = ["pathway", "overlap", "pathway_size", "p_value", "genes"]
        df = pd.DataFrame(results, columns=columns)
        return df.sort_values("p_value").reset_index(drop=True)

    # Example: enrichment for a set of cancer-related genes
    cancer_genes = ["7157", "672", "675", "5728", "3845"]
    enriched = kegg_enrichment(cancer_genes)
    print(enriched.head(10))

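
The p-value in the enrichment test is the upper tail of a hypergeometric distribution: the probability of seeing at least `k_hits` pathway genes when `N` genes are drawn without replacement from a background of `M` genes, `n` of which belong to the pathway. As a rough cross-check of what `hypergeom.sf(k_hits - 1, M, n, N)` computes, the same quantity can be written out with binomial coefficients. The function name `hypergeom_tail` is an illustrative assumption.

```python
from math import comb


def hypergeom_tail(k_hits, M, n, N):
    """Return P(X >= k_hits) for a hypergeometric draw.

    M: background size, n: genes in the pathway, N: genes in the query list.
    """
    upper = min(n, N)  # the overlap cannot exceed either set size
    total = comb(M, N)  # number of ways to draw N genes from the background
    favorable = sum(
        comb(n, i) * comb(M - n, N - i)  # i pathway genes, N - i non-pathway genes
        for i in range(k_hits, upper + 1)
    )
    return favorable / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 5 in the pathway, 5 drawn, all 5 hit.
    print(hypergeom_tail(5, 10, 5, 5))
```

Exact integer arithmetic keeps this version simple, but it is slower than SciPy for large inputs; it is intended for checking small cases by hand.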
Cross-Database Linking
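
    # Link genes to compounds via pathways
    # (reuses the KEGG client `k` created earlier)
    def gene_to_compounds(gene_id, organism="hsa"):
        """Find compounds related to a gene through shared pathways."""
        # get_pathway_by_gene returns {pathway_id: name}; fall back to an
        # empty dict when the gene has no pathway annotation.
        pathways = k.get_pathway_by_gene(gene_id, organism) or {}
        compounds = {}
        for pathway_id in pathways:
            # Assumption: compound links are attached to reference maps, so
            # swap the organism prefix for "map" (hsa04115 -> map04115).
            map_id = "map" + pathway_id[-5:]
            links = k.link("compound", map_id)
            if not isinstance(links, str):
                continue  # request failed for this pathway
            for line in links.strip().split("\n"):
                if "\t" not in line:
                    continue  # no compounds linked to this map
                cpd = line.split("\t")[1].split(":")[-1]
                compounds.setdefault(cpd, []).append(pathway_id)
        return compounds

    # Get compounds related to BRCA1
    cpds = gene_to_compounds("672")
    print(f"Found {len(cpds)} related compounds")
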
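
KEGG's `link` operation returns plain text with two tab-separated columns, one pair of linked entries per line. A small parser makes that output easier to reuse across databases. This is a sketch only: the function names are hypothetical, and the sample text is an illustrative stand-in for a real response.

```python
from collections import defaultdict


def parse_kegg_links(text):
    """Group the second column of `link` output by the first column."""
    links = defaultdict(set)
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split("\t", 1)
        links[source.strip()].add(target.strip())
    return dict(links)


def invert_links(links):
    """Swap the direction of a mapping produced by parse_kegg_links."""
    inverted = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            inverted[target].add(source)
    return dict(inverted)


if __name__ == "__main__":
    # Illustrative sample in the gene-to-pathway layout
    sample = (
        "hsa:7157\tpath:hsa04115\n"
        "hsa:7157\tpath:hsa04110\n"
        "hsa:672\tpath:hsa03440\n"
    )
    gene_to_pathways = parse_kegg_links(sample)
    print(gene_to_pathways)
    print(invert_links(gene_to_pathways))
```

Keeping the parser separate from the network call means the same code handles gene-to-pathway, pathway-to-compound, or any other pair of linked databases.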
Configuration
| Parameter | Description | Default |
|---|---|---|
| organism | KEGG organism code | "hsa" (human) |
| database | Target KEGG database | "pathway" |
| output_format | Response format | "text" |
| cache_results | Cache API responses locally | true |
| max_retries | Retry count for API failures | 3 |
| batch_size | Genes per batch query | 10 |
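
One way to carry these settings through an analysis script is a small immutable config object. The sketch below assumes the parameter names and defaults from the table; the class name `KeggConfig` and its validation rules are hypothetical and are not defined by bioservices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeggConfig:
    """Hypothetical settings container mirroring the table above."""

    organism: str = "hsa"
    database: str = "pathway"
    output_format: str = "text"
    cache_results: bool = True
    max_retries: int = 3
    batch_size: int = 10

    def __post_init__(self):
        # Assumed bounds: at least one gene per batch, at most 10 per request
        if not 1 <= self.batch_size <= 10:
            raise ValueError("batch_size must be between 1 and 10")
        if self.max_retries < 0:
            raise ValueError("max_retries must not be negative")


if __name__ == "__main__":
    print(KeggConfig())
    print(KeggConfig(organism="mmu", cache_results=False))
```

Freezing the dataclass prevents settings from changing midway through a long run, which makes cached results easier to trust.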
Best Practices
- Batch your API requests: KEGG rate-limits individual queries, and its `get` operation accepts at most 10 entries per request. Group gene lookups into batches of 10 and add small delays between requests to avoid being blocked.
- Cache pathway maps locally: Pathway data changes infrequently. Store retrieved pathway information in a local SQLite or JSON cache to reduce redundant API calls during iterative analysis.
- Use organism-specific codes: Always prefix gene IDs with the correct organism code (hsa for human, mmu for mouse, sce for yeast). Omitting the prefix returns ambiguous cross-species results.
- Apply multiple-testing correction: Pathway enrichment runs one test per pathway, so apply Benjamini-Hochberg FDR correction to the p-values. Raw p-values from dozens or hundreds of pathway tests will produce false positives.
- Combine with Gene Ontology: KEGG pathways capture curated metabolic and signaling routes but miss many biological processes. Cross-reference your enrichment results with GO term analysis for a more complete functional picture.
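
The Benjamini-Hochberg correction recommended above fits in a few lines. The sketch below uses only the standard library; the function name is an illustrative choice, and the input is assumed to be the list of raw p-values from an enrichment run, one per tested pathway.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  q={q:.3f}")
```

With a pandas result table, the adjusted values could be stored in a new column, for example `df["q_value"] = benjamini_hochberg(df["p_value"].tolist())`, and pathways filtered on that column instead of the raw p-value.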
Common Issues
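Empty results for valid gene IDs: KEGG addresses gene entries as organism:identifier (for example hsa:7157). For human the identifier is the NCBI Gene (Entrez) ID, but many other organisms use locus tags, and Ensembl IDs or gene symbols are not accepted directly. Build a lookup with k.conv("hsa", "ncbi-geneid") and convert your identifiers before querying. Also verify that the organism code matches your species.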
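
The conversion step can be wrapped so that unconvertible IDs are reported instead of silently dropped. This sketch assumes the conversion result is a dict whose keys are prefixed external IDs and whose values are KEGG gene IDs, such as `{"ncbi-geneid:7157": "hsa:7157"}`. That shape is an assumption about what `k.conv` returns, so check it against your bioservices version. Both function names are hypothetical.

```python
def build_id_lookup(conv_result):
    """Map bare external IDs (prefix removed) to KEGG gene IDs."""
    lookup = {}
    for external_id, kegg_id in conv_result.items():
        bare_id = external_id.split(":", 1)[-1]  # "ncbi-geneid:7157" -> "7157"
        lookup[bare_id] = kegg_id
    return lookup


def to_kegg_ids(ids, lookup):
    """Split IDs into converted KEGG IDs and those with no mapping."""
    converted, missing = [], []
    for raw_id in ids:
        kegg_id = lookup.get(str(raw_id))
        if kegg_id is None:
            missing.append(raw_id)
        else:
            converted.append(kegg_id)
    return converted, missing


if __name__ == "__main__":
    # Illustrative stand-in for a conversion table
    sample = {"ncbi-geneid:7157": "hsa:7157", "ncbi-geneid:672": "hsa:672"}
    lookup = build_id_lookup(sample)
    print(to_kegg_ids(["7157", 672, "ENSG00000141510"], lookup))
```

A non-empty `missing` list usually points to the wrong identifier namespace or the wrong organism code.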
Pathway image rendering fails: The KEGG REST API returns pathway images as PNG binary data, not URLs. Write the response bytes to a file opened in binary mode, for example with open("pathway.png", "wb") as f: f.write(response), rather than trying to decode them as text.
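
A minimal sketch of the binary-safe save, using only the standard library and the REST endpoint `https://rest.kegg.jp/get/<pathway_id>/image`. The function names and the PNG signature check are illustrative additions; the `fetch` parameter exists so the download step can be swapped out, for instance for a cached or rate-limited fetcher.

```python
from urllib.request import urlopen

# Every PNG file starts with these eight bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def fetch_pathway_image(pathway_id, timeout=30):
    """Download the raw image bytes for a pathway map."""
    url = f"https://rest.kegg.jp/get/{pathway_id}/image"
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def save_pathway_image(pathway_id, path, fetch=fetch_pathway_image):
    """Write a pathway image to `path` in binary mode; return the byte count."""
    data = fetch(pathway_id)
    if not data.startswith(PNG_SIGNATURE):
        # An error page or text response would otherwise be saved as a broken image
        raise ValueError(f"Response for {pathway_id} is not a PNG image")
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)
```

Calling `save_pathway_image("hsa04115", "pathway.png")` performs a live request, so it is best combined with the caching and retry advice in this guide.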
Rate limiting and connection timeouts: KEGG's public API can throttle or block clients that send many requests in quick succession. Implement exponential backoff with time.sleep(2 ** attempt) between retries. For large-scale analyses, prefer bulk operations (such as list or link over a whole database) or the KEGG FTP data files, which require a subscription, over thousands of individual API calls.
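
The retry pattern described above can be sketched as a small wrapper, paired with a helper that splits a gene list into batches of 10. Both function names are hypothetical, and the `sleep` parameter is injectable so the delay can be replaced when testing. The delays follow the `2 ** attempt` schedule: 1, 2, then 4 seconds.

```python
import time


def chunked(items, size=10):
    """Split a sequence into consecutive batches of at most `size` items."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def with_backoff(call, max_retries=3, sleep=time.sleep, retry_on=(Exception,)):
    """Run `call()`, retrying with exponential backoff on failure.

    Makes up to max_retries + 1 attempts and re-raises the last error.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            sleep(2 ** attempt)  # wait 1 s, 2 s, 4 s, ...


if __name__ == "__main__":
    gene_ids = [f"hsa:{n}" for n in range(1, 26)]
    for batch in chunked(gene_ids):
        # Entries in one request are joined with "+"
        print("+".join(batch))
```

With a bioservices client, a batch could be fetched as `with_backoff(lambda: k.get("+".join(batch)))`. Narrowing `retry_on` to network-related exceptions avoids retrying on genuine programming errors.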