KEGG Database Elite

Battle-tested skill for direct REST access to KEGG. Includes structured workflows, validation checks, and reusable patterns for scientific analysis.

Skill | Cliptics | scientific | v1.0.0 | MIT
Master biological pathway analysis and molecular interaction networks using the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. This skill helps you query pathway maps, retrieve gene and compound data, perform pathway enrichment analysis, and visualize metabolic networks for systems biology research.

When to Use This Skill

Choose KEGG Database Elite when you need to:

  • Map genes or proteins to biological pathways and functional annotations
  • Perform pathway enrichment analysis on gene expression datasets
  • Retrieve metabolic pathway diagrams and reaction networks
  • Cross-reference molecular data across KEGG's integrated databases (GENES, COMPOUND, REACTION, PATHWAY)

Consider alternatives when:

  • You need protein structure data (use UniProt or PDB skills)
  • You need literature-based pathway analysis (use Reactome or WikiPathways)
  • You need drug-target interaction data primarily (use DrugBank or ChEMBL)

Quick Start

    # Install the KEGG API client (run in a shell)
    pip install bioservices requests

    from bioservices import KEGG

    k = KEGG()

    # Find pathways for a gene
    pathways = k.get_pathway_by_gene("7157", "hsa")  # TP53 in human
    print(pathways)

    # Get pathway details
    pathway = k.get("hsa04115")  # p53 signaling pathway
    print(pathway)

    # Search for compounds
    results = k.find("compound", "glucose")
    print(results)

Core Concepts

KEGG Database Structure

| Database  | Content                               | Example ID |
|-----------|---------------------------------------|------------|
| PATHWAY   | Metabolic and signaling pathway maps  | hsa04110   |
| GENES     | Gene catalogs for complete genomes    | hsa:7157   |
| COMPOUND  | Small molecules and metabolites       | C00031     |
| REACTION  | Biochemical reactions                 | R00259     |
| ENZYME    | Enzyme nomenclature (EC numbers)      | 2.7.1.1    |
| DISEASE   | Human diseases with molecular basis   | H00004     |
| DRUG      | Approved and experimental drugs       | D00123     |
| ORTHOLOGY | Ortholog groups across species        | K04451     |
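The identifier shapes in the table are regular enough to recognize with simple patterns. The sketch below is one way to guess which database an ID belongs to. `guess_kegg_database` is a hypothetical helper, not part of bioservices, and the patterns are inferred from the example IDs above rather than taken from an official KEGG grammar, so treat the result as a hint.

```python
import re

# Database prefixes that KEGG puts in front of some IDs (e.g. "cpd:C00031").
_PREFIXES = {
    "path": "PATHWAY", "cpd": "COMPOUND", "rn": "REACTION",
    "ec": "ENZYME", "ds": "DISEASE", "dr": "DRUG", "ko": "ORTHOLOGY",
}

# Patterns inferred from the example IDs in the table (an assumption).
# The first matching pattern wins.
_PATTERNS = [
    ("COMPOUND", re.compile(r"^C\d{5}$")),
    ("REACTION", re.compile(r"^R\d{5}$")),
    ("DISEASE", re.compile(r"^H\d{5}$")),
    ("DRUG", re.compile(r"^D\d{5}$")),
    ("ORTHOLOGY", re.compile(r"^K\d{5}$")),
    ("ENZYME", re.compile(r"^\d+\.\d+\.\d+\.\d+$")),
    ("PATHWAY", re.compile(r"^[a-z]{2,4}\d{5}$")),  # organism code + 5 digits
    ("GENES", re.compile(r"^[a-z]{3,4}:\S+$")),     # organism code + ":" + gene
]


def guess_kegg_database(identifier):
    """Return the likely KEGG database name for an ID, or None if unknown."""
    prefix, sep, _ = identifier.partition(":")
    if sep and prefix in _PREFIXES:
        return _PREFIXES[prefix]
    for name, pattern in _PATTERNS:
        if pattern.match(identifier):
            return name
    return None


for example in ["hsa04110", "hsa:7157", "C00031", "2.7.1.1", "cpd:C00031"]:
    print(example, guess_kegg_database(example))
```

Checking the prefix first keeps `cpd:C00031` from being mistaken for a gene entry, since `cpd` would otherwise look like a three-letter organism code.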

Pathway Enrichment Analysis

    from collections import defaultdict

    import pandas as pd
    from bioservices import KEGG
    from scipy.stats import hypergeom

    k = KEGG()

    def kegg_enrichment(gene_list, organism="hsa"):
        """Perform KEGG pathway enrichment analysis."""
        # Fetch every gene-pathway pair for the organism in one request.
        # Each line looks like "hsa:7157<TAB>path:hsa04115".
        pathway_genes = defaultdict(set)
        for line in k.link("pathway", organism).splitlines():
            if "\t" not in line:
                continue
            gene, pathway = line.split("\t")
            pathway_genes[pathway.replace("path:", "")].add(gene.split(":")[1])

        query = {str(g) for g in gene_list}
        results = []
        for pathway_id, pathway_gene_ids in pathway_genes.items():
            # Calculate overlap
            overlap = query & pathway_gene_ids
            if len(overlap) < 2:
                continue

            # Hypergeometric test
            M = 20000  # approximate genome size
            n = len(pathway_gene_ids)
            N = len(query)
            k_hits = len(overlap)
            pval = hypergeom.sf(k_hits - 1, M, n, N)

            results.append({
                "pathway": pathway_id,
                "overlap": k_hits,
                "pathway_size": n,
                "p_value": pval,
                "genes": sorted(overlap),
            })

        # Naming the columns keeps sort_values from failing on an empty result
        columns = ["pathway", "overlap", "pathway_size", "p_value", "genes"]
        df = pd.DataFrame(results, columns=columns)
        return df.sort_values("p_value")

    # Example: enrichment for a set of cancer-related genes
    cancer_genes = ["7157", "672", "675", "5728", "3845"]
    enriched = kegg_enrichment(cancer_genes)
    print(enriched.head(10))
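The p-value in the enrichment function is the upper tail of a hypergeometric distribution: the probability of seeing at least `k_hits` pathway genes when `N` genes are drawn from a background of `M`, of which `n` belong to the pathway. The sketch below computes the same tail with only the standard library, which can help when checking results by hand. `hypergeom_tail` is an illustrative name, and the small numbers are chosen so the arithmetic is easy to follow.

```python
from math import comb


def hypergeom_tail(k_hits, M, n, N):
    """P(X >= k_hits) when N genes are drawn from M, n of which are in the pathway."""
    upper = min(n, N)  # the overlap cannot exceed either set size
    favourable = sum(
        comb(n, i) * comb(M - n, N - i) for i in range(k_hits, upper + 1)
    )
    return favourable / comb(M, N)


# Background of 10 genes, 4 in the pathway, 3 drawn, at least 2 hits:
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120
print(f"{hypergeom_tail(2, 10, 4, 3):.4f}")  # 0.3333
```

This corresponds to `hypergeom.sf(k_hits - 1, M, n, N)`: the survival function gives P(X > k_hits - 1), which is why the code subtracts one.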

Cross-Database Linking

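    from bioservices import KEGG

    k = KEGG()

    # Link genes to compounds via pathways
    def gene_to_compounds(gene_id, organism="hsa"):
        """Find compounds related to a gene through shared pathways."""
        # Returns {pathway_id: name}, or None when no pathway is found
        pathways = k.get_pathway_by_gene(gene_id, organism) or {}

        compounds = {}
        for pathway_id in pathways:
            # Look up compounds on the reference map (map04115 for hsa04115)
            links = k.link("compound", "map" + pathway_id[-5:])
            if not isinstance(links, str):
                continue  # skip non-text responses such as an error status
            for line in links.splitlines():
                if "\t" not in line:
                    continue
                cpd = line.split("\t")[1]  # e.g. "cpd:C00031"
                compounds.setdefault(cpd, []).append(pathway_id)

        return compounds

    # Get compounds related to BRCA1
    cpds = gene_to_compounds("672")
    print(f"Found {len(cpds)} related compounds")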
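KEGG's link operation returns cross-database links as tab-separated text, one pair per line. The sketch below shows how that text can be turned into lookup tables without touching the network. `parse_kegg_link` and `invert` are hypothetical helpers, and the sample string is hand-written to imitate the format, not captured from a live response.

```python
def parse_kegg_link(text):
    """Group the second column of link output by the first column."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, target = parts
        mapping.setdefault(source, []).append(target)
    return mapping


def invert(mapping):
    """Turn {source: [targets]} into {target: [sources]}."""
    inverted = {}
    for source, targets in mapping.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# Hand-written sample in the link output format (illustrative only).
sample = (
    "path:map00010\tcpd:C00022\n"
    "path:map00010\tcpd:C00031\n"
    "path:map00020\tcpd:C00022\n"
)

by_pathway = parse_kegg_link(sample)
by_compound = invert(by_pathway)
print(by_pathway)
print(by_compound)
```

Keeping the parsing separate from the request makes it easy to test on saved responses and to reuse for any pair of linked databases.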

Configuration

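| Parameter     | Description                 | Default       |
|---------------|-----------------------------|---------------|
| organism      | KEGG organism code          | "hsa" (human) |
| database      | Target KEGG database        | "pathway"     |
| output_format | Response format             | "text"        |
| cache_results | Cache API responses locally | true          |
| max_retries   | Retry count for API failures| 3             |
| batch_size    | Genes per batch query       | 10            |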
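The source does not say where these parameters are set, so the sketch below simply assumes they live in a small settings object. `KeggConfig` and `batched` are hypothetical names for illustration. The example also shows how `batch_size` would split a gene list into request-sized groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeggConfig:
    """Settings mirroring the table above (a hypothetical container)."""
    organism: str = "hsa"
    database: str = "pathway"
    output_format: str = "text"
    cache_results: bool = True
    max_retries: int = 3
    batch_size: int = 10


def batched(ids, size):
    """Split a list of IDs into consecutive chunks of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


config = KeggConfig(organism="mmu")  # override one default, keep the rest
gene_ids = [f"{config.organism}:{n}" for n in range(1, 24)]  # 23 made-up IDs

batches = batched(gene_ids, config.batch_size)
queries = ["+".join(batch) for batch in batches]  # one "+"-joined string per request
print([len(batch) for batch in batches])  # [10, 10, 3]
```

A frozen dataclass keeps the defaults in one place and prevents accidental changes halfway through an analysis.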

Best Practices

  1. Batch your API requests: KEGG's get operation accepts at most 10 entries per request, so group gene lookups into batches of 10 (joined with +) and add a short delay between requests to avoid being blocked.

  2. Cache pathway maps locally: Pathway data changes infrequently. Store retrieved pathway information in a local SQLite or JSON cache to reduce redundant API calls during iterative analysis.

  3. Use organism-specific codes: Always prefix gene IDs with the correct organism code (hsa for human, mmu for mouse, sce for yeast), as in hsa:7157. Without the prefix, an ID such as 7157 is ambiguous across species.

  4. Apply multiple-testing correction: When performing pathway enrichment, apply Benjamini-Hochberg FDR correction to the p-values. Every pathway is a separate test, so a raw p < 0.05 cutoff across dozens or hundreds of pathways is expected to produce false positives.

  5. Combine with Gene Ontology: KEGG pathways capture curated metabolic and signaling routes but miss many biological processes. Cross-reference your enrichment results with GO term analysis for a more complete functional picture.
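The Benjamini-Hochberg correction from practice 4 is short enough to write out. The sketch below is a plain-Python version for illustration. Libraries such as statsmodels provide an equivalent routine, and `benjamini_hochberg` here is just a local helper name.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

Each p-value is scaled by `m / rank`, and the running minimum ensures a smaller raw p-value never ends up with a larger adjusted value than a bigger one. The adjusted values can be stored as an extra column next to `p_value` in the enrichment table.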

Common Issues

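Empty results for valid gene IDs: KEGG gene IDs are organism-specific. For human they coincide with NCBI (Entrez) Gene IDs (for example hsa:7157), but many other organisms use locus tags, and Ensembl IDs are not accepted directly. Build a mapping with k.conv("hsa", "ncbi-geneid") and translate your identifiers before querying. Also verify that the organism code matches your species.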
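Once a conversion mapping is available, translating a gene list and spotting the IDs that failed to map takes only a few lines. The sketch below assumes the mapping is a dict keyed by `ncbi-geneid:<id>` with KEGG gene IDs as values, which is the shape expected from `k.conv("hsa", "ncbi-geneid")`. The mapping shown is hand-written, and `to_kegg_gene_ids` is a hypothetical helper.

```python
def to_kegg_gene_ids(entrez_ids, conv_mapping):
    """Translate NCBI Gene IDs to KEGG gene IDs.

    Returns (found, missing): a dict of translated IDs and a list of
    input IDs that have no entry in the mapping.
    """
    found, missing = {}, []
    for entrez_id in entrez_ids:
        key = f"ncbi-geneid:{entrez_id}"
        if key in conv_mapping:
            found[str(entrez_id)] = conv_mapping[key]
        else:
            missing.append(str(entrez_id))
    return found, missing


# Hand-written stand-in for a conversion mapping (assumed shape).
sample_mapping = {
    "ncbi-geneid:7157": "hsa:7157",
    "ncbi-geneid:672": "hsa:672",
}

found, missing = to_kegg_gene_ids(["7157", 672, "ENSG00000141510"], sample_mapping)
print(found)
print(missing)
```

Reporting the unmapped IDs separately makes it obvious when an empty result comes from the identifier type and not from the query itself.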

Pathway image rendering fails: The KEGG REST API returns pathway images as binary PNG data, not URLs. Write the response bytes to a file opened in binary mode, for example with open("pathway.png", "wb") as f: f.write(response), and do not try to decode them as text.
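A small guard before writing the file catches the common case where the response is an error message or decoded text, not image bytes. The sketch below checks for the standard eight-byte PNG signature. `save_pathway_image` is a hypothetical helper, and the bytes used in the demonstration are a stand-in, not a real KEGG response.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def save_pathway_image(data, path):
    """Write PNG bytes to `path`, rejecting text or non-PNG responses."""
    if not isinstance(data, (bytes, bytearray)):
        raise ValueError("expected bytes, got " + type(data).__name__)
    if not bytes(data).startswith(PNG_SIGNATURE):
        raise ValueError("response does not start with the PNG signature")
    with open(path, "wb") as handle:  # binary mode: no text decoding
        handle.write(data)
    return path


# Stand-in for a real response: the signature plus arbitrary payload bytes.
fake_response = PNG_SIGNATURE + b"not a real image"
out_path = os.path.join(tempfile.mkdtemp(), "pathway.png")
save_pathway_image(fake_response, out_path)
print("wrote", os.path.getsize(out_path), "bytes")
```

Failing early with a clear message is usually easier to debug than an image viewer refusing to open a corrupt file later.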

Rate limiting and connection timeouts: KEGG's public API is intended for academic use and may throttle or block clients that send many requests in a short time. Implement exponential backoff with time.sleep(2 ** attempt) between retries. For large-scale analyses, work from bulk KEGG data files (KEGG FTP access requires a subscription) and avoid making thousands of API calls.
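The backoff pattern can be wrapped around any request function. The sketch below is one minimal way to do it. `with_backoff` is a hypothetical helper, it treats any raised exception as a retryable failure (an assumption; a client that returns status codes without raising would need a check on the return value), and the `sleep` parameter exists only so the delays can be observed without waiting.

```python
import time


def with_backoff(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()`, retrying on exceptions with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay


# Demonstration with a function that fails twice before succeeding.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated timeout")
    return "ok"


delays = []
result = with_backoff(flaky_request, sleep=delays.append)  # record, do not wait
print(result, delays)  # ok [1.0, 2.0]
```

In real use the default `time.sleep` applies, and `max_retries` can follow the value from the Configuration table.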
