KEGG Database Elite

Master biological pathway analysis and molecular interaction networks using the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. This skill helps you query pathway maps, retrieve gene and compound data, perform pathway enrichment analysis, and visualize metabolic networks for systems biology research.

When to Use This Skill

Choose KEGG Database Elite when you need to:

Map genes or proteins to biological pathways and functional annotations
Perform pathway enrichment analysis on gene expression datasets
Retrieve metabolic pathway diagrams and reaction networks
Cross-reference molecular data across KEGG's integrated databases (GENES, COMPOUND, REACTION, PATHWAY)

Consider alternatives when:

You need protein structure data (use UniProt or PDB skills)
You need literature-based pathway analysis (use Reactome or WikiPathways)
You need drug-target interaction data primarily (use DrugBank or ChEMBL)

Quick Start


# Install KEGG API client
pip install bioservices requests


from bioservices import KEGG

k = KEGG()

# Find pathways for a gene
pathways = k.get_pathway_by_gene("7157", "hsa")  # TP53 in human
print(pathways)

# Get pathway details
pathway = k.get("hsa04115")  # p53 signaling pathway
print(pathway)

# Search for compounds
results = k.find("compound", "glucose")
print(results)
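
The find call above returns plain text with one tab-separated record per line rather than a Python structure. A minimal sketch of turning that text into a dictionary follows; the helper name parse_kegg_table is illustrative and not part of bioservices, and the sample string is an abbreviated stand-in for a real response.

```python
def parse_kegg_table(text):
    """Parse KEGG's tab-separated 'id<TAB>description' lines into a dict.

    Descriptions that list several names separated by ';' are split
    into a list of names.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry_id, _, description = line.partition("\t")
        entries[entry_id] = [name.strip() for name in description.split(";")]
    return entries


# Abbreviated sample in the shape returned by find("compound", ...)
sample = "cpd:C00031\tD-Glucose; Grape sugar; Dextrose\ncpd:C00221\tbeta-D-Glucose\n"
parsed = parse_kegg_table(sample)
print(parsed["cpd:C00031"])
```

The same helper should work on the output of k.list, which uses the same two-column layout.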

Core Concepts

KEGG Database Structure

Database	Content	Example ID
PATHWAY	Metabolic and signaling pathway maps	hsa04110
GENES	Gene catalogs for complete genomes	hsa:7157
COMPOUND	Small molecules and metabolites	C00031
REACTION	Biochemical reactions	R00259
ENZYME	Enzyme nomenclature (EC numbers)	2.7.1.1
DISEASE	Human diseases with molecular basis	H00004
DRUG	Approved and experimental drugs	D00123
ORTHOLOGY	Ortholog groups across species	K04451

Pathway Enrichment Analysis


from bioservices import KEGG
from scipy.stats import hypergeom
import pandas as pd

k = KEGG()

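def kegg_enrichment(gene_list, organism="hsa"):
    """Perform KEGG pathway enrichment analysis with a hypergeometric test."""
    # Fetch every gene-to-pathway link for the organism in a single request.
    # Each line looks like "hsa:7157<TAB>path:hsa04115".
    links = k.link("pathway", organism)
    if not isinstance(links, str):
        raise RuntimeError(f"KEGG link request failed: {links!r}")

    # Build {pathway_id: set of gene IDs without the organism prefix}
    pathway_genes = {}
    for line in links.strip().splitlines():
        gene, pathway = line.split("\t")
        pathway_id = pathway.split(":")[-1]
        pathway_genes.setdefault(pathway_id, set()).add(gene.split(":")[1])

    # Background: genes annotated to at least one pathway, instead of a
    # hard-coded genome size that would only suit one organism
    background = set().union(*pathway_genes.values())
    query = {str(g) for g in gene_list} & background
    M = len(background)
    N = len(query)

    results = []
    for pathway_id, members in pathway_genes.items():
        # Calculate overlap
        overlap = query & members
        if len(overlap) < 2:
            continue

        # Hypergeometric test: probability of k_hits or more by chance
        n = len(members)
        k_hits = len(overlap)
        pval = hypergeom.sf(k_hits - 1, M, n, N)

        results.append({
            "pathway": pathway_id,
            "overlap": k_hits,
            "pathway_size": n,
            "p_value": pval,
            "genes": sorted(overlap)
        })

    # Explicit columns keep the sort from failing when nothing is enriched
    columns = ["pathway", "overlap", "pathway_size", "p_value", "genes"]
    df = pd.DataFrame(results, columns=columns)
    return df.sort_values("p_value").reset_index(drop=True)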

# Example: enrichment for a set of cancer-related genes
cancer_genes = ["7157", "672", "675", "5728", "3845"]
enriched = kegg_enrichment(cancer_genes)
print(enriched.head(10))
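
Because kegg_enrichment tests many pathways at once, the p_value column should be adjusted for multiple testing before it is interpreted. The following is a minimal sketch of the Benjamini-Hochberg procedure using only NumPy; the function name bh_adjust and the q_value column mentioned afterwards are illustrative choices, not part of bioservices or SciPy.

```python
import numpy as np


def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return p
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity starting from the largest p-value, then cap at 1
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


example = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(example)
```

Applied to the enrichment table, this would be enriched["q_value"] = bh_adjust(enriched["p_value"]). Note that the correction then covers only the pathways that passed the overlap filter, which is a common but not universal convention.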

Cross-Database Linking


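# Link genes to compounds via pathways (reuses the k = KEGG() client created above)
def gene_to_compounds(gene_id, organism="hsa"):
    """Find compounds related to a gene through shared pathways."""
    # get_pathway_by_gene returns {pathway_id: name}, or None when nothing is found
    pathways = k.get_pathway_by_gene(gene_id, organism) or {}

    compounds = {}
    for pathway_id in pathways:
        # Compound links are attached to reference maps, so hsa04115 becomes map04115
        map_id = "map" + pathway_id[len(organism):]
        links = k.link("compound", map_id)
        if not isinstance(links, str):
            continue  # request failed for this map
        # Each line looks like "path:map00010<TAB>cpd:C00031"
        for line in links.strip().splitlines():
            cpd = line.split("\t")[-1]
            compounds.setdefault(cpd, []).append(pathway_id)

    return compounds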

# Get compounds related to BRCA1
cpds = gene_to_compounds("672")
print(f"Found {len(cpds)} related compounds")

Configuration

Parameter	Description	Default
`organism`	KEGG organism code	`"hsa"` (human)
`database`	Target KEGG database	`"pathway"`
`output_format`	Response format	`"text"`
`cache_results`	Cache API responses locally	`true`
`max_retries`	Retry count for API failures	`3`
`batch_size`	Genes per batch query	`10`

Best Practices

Batch your API requests: KEGG's get operation accepts up to 10 entries per request (IDs joined with +), so group gene lookups into batches of 10 and add a short delay between requests to avoid being throttled.
Cache pathway maps locally: Pathway data changes infrequently. Store retrieved pathway information in a local SQLite or JSON cache to reduce redundant API calls during iterative analysis.
Use organism-specific codes: Always prefix gene IDs with the correct organism code (hsa for human, mmu for mouse, sce for yeast), as in hsa:7157. A bare number such as 7157 does not identify a KEGG GENES entry on its own, so the lookup fails or comes back empty.
Apply multiple-testing correction: Enrichment tests every pathway separately, so apply Benjamini-Hochberg FDR correction to the p-values regardless of how long the gene list is. With hundreds of pathways tested, raw p-values at a 0.05 threshold will produce false positives by chance alone.
Combine with Gene Ontology: KEGG pathways capture curated metabolic and signaling routes but miss many biological processes. Cross-reference your enrichment results with GO term analysis for a more complete functional picture.
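
The batching advice above can be sketched as a small helper. This is an illustration under stated assumptions rather than a bioservices feature: the names chunked and batched_get are hypothetical, the 0.5 second delay is an arbitrary starting point, and fetch stands for any callable that takes a query string, such as k.get.

```python
import time


def chunked(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batched_get(entry_ids, fetch, batch_size=10, delay=0.5):
    """Call `fetch` once per batch, joining IDs with '+', pausing between calls."""
    responses = []
    for index, batch in enumerate(chunked(list(entry_ids), batch_size)):
        if index > 0:
            time.sleep(delay)  # short pause between consecutive requests
        responses.append(fetch("+".join(batch)))
    return responses


# Offline demonstration with a stand-in fetch function that records its queries
queries = []
batched_get([f"hsa:{i}" for i in range(23)], fetch=queries.append, delay=0)
print(len(queries), queries[-1])
```

With a real client the call would look like batched_get(ids, k.get). Each response then holds several flat-file entries, which KEGG separates with a line containing ///.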

Common Issues

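Empty results for valid gene IDs: KEGG gene IDs take the form organism:identifier. For human the identifier is the NCBI Gene (Entrez) ID, so TP53 is hsa:7157, but many other organisms use locus tags (for example eco:b0001), and gene symbols or Ensembl IDs are not accepted directly. Build a lookup with k.conv("hsa", "ncbi-geneid") before querying (conv also supports ncbi-proteinid and uniprot), and map Ensembl IDs to one of those namespaces first. Also verify the organism code matches your species.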
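
A minimal sketch of applying such a lookup follows. It assumes the mapping is available as a dict keyed like ncbi-geneid:7157, which is the shape bioservices' conv is expected to return; if your version returns raw text, split each line on the tab first. The helper name to_kegg_ids is hypothetical.

```python
def to_kegg_ids(gene_ids, mapping, source_db="ncbi-geneid"):
    """Translate external gene IDs to KEGG IDs.

    Returns (converted, missing): a dict of input ID -> KEGG ID, and a
    list of input IDs that have no entry in the mapping.
    """
    converted, missing = {}, []
    for gene_id in gene_ids:
        key = f"{source_db}:{gene_id}"
        if key in mapping:
            converted[str(gene_id)] = mapping[key]
        else:
            missing.append(str(gene_id))
    return converted, missing


# Small hand-written mapping standing in for k.conv("hsa", "ncbi-geneid")
mapping = {"ncbi-geneid:7157": "hsa:7157", "ncbi-geneid:672": "hsa:672"}
converted, missing = to_kegg_ids(["7157", 672, "ENSG00000141510"], mapping)
print(converted, missing)
```

Reporting the unmatched IDs separately makes it easier to spot inputs from the wrong namespace, such as the Ensembl ID in this example.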

Pathway image rendering fails: The KEGG REST API returns pathway images as PNG binary data, not URLs. Write the response bytes to a file opened in binary mode, for example with open("pathway.png", "wb") as f: f.write(response), rather than trying to decode them as text.
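
One way to catch a failed image request early is to check the PNG signature before writing the file. The sketch below assumes the response body is already in hand as bytes; the helper name save_png is hypothetical.

```python
# Every PNG file starts with this fixed 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(data, path):
    """Write raw PNG bytes to `path`, refusing anything that is not PNG data."""
    if not isinstance(data, (bytes, bytearray)) or not data.startswith(PNG_SIGNATURE):
        raise ValueError("response is not PNG data; check the request and its status code")
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

With the requests package from the install step, the bytes could come from requests.get("https://rest.kegg.jp/get/hsa04115/image", timeout=30).content. An error page or a status code passed in by mistake then raises ValueError instead of producing a corrupt file.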

Rate limiting and connection timeouts: KEGG's REST API is offered for academic use and may throttle or block clients that send many requests in quick succession. Implement exponential backoff with time.sleep(2 ** attempt) between retries. For large-scale analyses, prefer a few bulk list and link requests over thousands of per-gene calls. KEGG's FTP data files are an alternative, but they require a paid subscription.
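
The backoff pattern can be sketched as a small wrapper. This is an illustration that assumes the wrapped call raises an exception on failure; some clients return a status code instead, in which case the call should be wrapped so that it raises. The name with_backoff and its parameters are hypothetical, with max_retries defaulting to the value in the Configuration table.

```python
import time


def with_backoff(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()`, retrying on exceptions with exponentially growing waits.

    Waits base_delay * 2**attempt seconds after each failed attempt and
    re-raises the last exception once max_retries retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:  # narrow this to network errors in real code
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call such as with_backoff(lambda: k.get("hsa04115")) would then retry up to three times. The sleep parameter exists so the waiting can be replaced in tests.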
