Master HMDB Database

A scientific computing skill for querying the Human Metabolome Database (HMDB) — the comprehensive resource for human metabolite data including chemical, clinical, and molecular biology information for over 220,000 metabolite entries.

When to Use This Skill

Choose Master HMDB Database when:

Looking up metabolite properties (structure, classification, pathways)
Finding disease-associated metabolites and biomarkers
Retrieving metabolite concentration ranges in biofluids
Identifying metabolites from mass spectrometry or NMR data

Consider alternatives when:

You need drug information (use DrugBank)
You need enzyme kinetics (use BRENDA)
You need metabolic pathway analysis (use KEGG)
You need metabolomics data processing (use MetaboAnalyst or XCMS)

Quick Start


claude "Look up glucose in HMDB and find its concentration in blood"


import requests
import xml.etree.ElementTree as ET

# HMDB API
hmdb_id = "HMDB0000122"  # Glucose
url = f"https://hmdb.ca/metabolites/{hmdb_id}.xml"

response = requests.get(url, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
root = ET.fromstring(response.content)

# Parse metabolite info
ns = {"hmdb": "http://www.hmdb.ca"}
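# findtext returns the default instead of raising when an element is missing
name = root.findtext("hmdb:name", default="N/A", namespaces=ns)
formula = root.findtext("hmdb:chemical_formula", default="N/A", namespaces=ns)
avg_mass = root.findtext("hmdb:average_molecular_weight", default="N/A", namespaces=ns)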

print(f"Name: {name}")
print(f"Formula: {formula}")
print(f"Molecular Weight: {avg_mass}")

# Get normal concentrations
concentrations = root.findall(".//hmdb:normal_concentration", ns)
for conc in concentrations[:5]:
    biofluid = conc.find("hmdb:biospecimen", ns)
    value = conc.find("hmdb:concentration_value", ns)
    units = conc.find("hmdb:concentration_units", ns)
    if biofluid is not None and value is not None:
        print(f"  {biofluid.text}: {value.text} {units.text if units is not None else ''}")
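
The lookups above assume every element lives in the `http://www.hmdb.ca` namespace. This document does not confirm that every HMDB export declares it, so the sketch below removes namespace prefixes from tag names before searching, which lets plain element names match either way. The helper name `strip_namespaces` and the sample XML are illustrative assumptions, not HMDB output.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Remove '{namespace}' prefixes from every tag so plain names match."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


# Hand-written sample for illustration; real entries carry many more fields.
SAMPLE_XML = """<metabolite xmlns="http://www.hmdb.ca">
  <accession>HMDB0000122</accession>
  <name>D-Glucose</name>
  <chemical_formula>C6H12O6</chemical_formula>
</metabolite>"""

root = strip_namespaces(ET.fromstring(SAMPLE_XML))
name = root.findtext("name", default="unknown")
formula = root.findtext("chemical_formula", default="unknown")
print(f"{name}: {formula}")
```

The same helper leaves a document without any namespace unchanged, so it can be applied unconditionally.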

Core Concepts

HMDB Data Categories

Category	Description	Example
Chemical Data	Structure, formula, properties	MW, SMILES, InChI
Taxonomy	Chemical classification	Organic acids, amino acids
Biological Properties	Pathways, enzymes, diseases	Glycolysis, diabetes
Concentrations	Normal ranges in biofluids	Blood glucose: 3.9-5.6 mmol/L
Spectra	NMR, MS reference spectra	Mass fragments, chemical shifts
Ontology	Metabolite function classification	Energy metabolism

Search and Identification


# Search metabolites by name
search_url = "https://hmdb.ca/unearth/q"
response = requests.get(search_url, params={
    "query": "tryptophan",
    "searcher": "metabolites",
    "button": ""
}, timeout=30)
response.raise_for_status()

# Mass-based identification
def search_by_mass(exact_mass, tolerance_ppm=10, adduct_type="[M+H]+"):
    """Search HMDB by m/z for metabolite identification.

    The tolerance is sent to the server with the query, so no local
    mass window is computed here.
    """
    url = "https://hmdb.ca/spectra/ms/search"
    response = requests.get(url, params={
        "utf8": "✓",
        "query_masses": str(exact_mass),
        "tolerance": str(tolerance_ppm),
        "tolerance_units": "ppm",
        "adduct_type": adduct_type,
        "commit": "Search"
    }, timeout=30)
    response.raise_for_status()
    return response

# Example: Search for metabolite with m/z 205.097
results = search_by_mass(205.097)
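
The search function above leaves the mass matching to the server. When candidates are matched locally instead (for example against a downloaded table of monoisotopic masses), the arithmetic is small enough to sketch: widen the observed m/z by the ppm tolerance, then subtract the adduct's mass shift to get a neutral-mass window. The names `ADDUCT_SHIFTS` and `neutral_mass_window` are hypothetical, the shifts cover only the three singly charged adducts named in this document, and the tryptophan mass is computed from its formula C11H12N2O2.

```python
# Mass added to the neutral monoisotopic mass by each singly charged adduct (Da).
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,    # proton
    "[M-H]-": -1.007276,   # loss of a proton
    "[M+Na]+": 22.989218,  # sodium cation
}


def neutral_mass_window(mz, adduct="[M+H]+", tolerance_ppm=10.0):
    """Return (low, high) neutral monoisotopic masses consistent with an m/z."""
    shift = ADDUCT_SHIFTS[adduct]
    delta = mz * tolerance_ppm / 1e6  # ppm tolerance expressed in Da
    return mz - delta - shift, mz + delta - shift


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


TRYPTOPHAN_MONOISOTOPIC = 204.089878  # C11H12N2O2

low, high = neutral_mass_window(205.097, "[M+H]+", tolerance_ppm=10)
is_candidate = low <= TRYPTOPHAN_MONOISOTOPIC <= high
print(f"window: {low:.6f} to {high:.6f}, tryptophan matches: {is_candidate}")
```

Trying several adducts for the same peak is a matter of calling `neutral_mass_window` once per key in `ADDUCT_SHIFTS` and pooling the candidates.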

Pathway Context


def get_metabolite_pathways(hmdb_id):
    """Get metabolic pathways for a metabolite"""
    url = f"https://hmdb.ca/metabolites/{hmdb_id}.xml"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    root = ET.fromstring(response.content)
    ns = {"hmdb": "http://www.hmdb.ca"}

    pathways = []
    for pathway in root.findall(".//hmdb:pathway", ns):
        name = pathway.find("hmdb:name", ns)
        smpdb = pathway.find("hmdb:smpdb_id", ns)
        kegg = pathway.find("hmdb:kegg_map_id", ns)
        if name is not None:
            pathways.append({
                "name": name.text,
                "smpdb_id": smpdb.text if smpdb is not None else None,
                "kegg_id": kegg.text if kegg is not None else None
            })
    return pathways

Configuration

Parameter	Description	Default
`api_format`	Response format (xml, json)	`xml`
`mass_tolerance_ppm`	Mass search tolerance	`10`
`adduct_type`	MS adduct for mass search	`[M+H]+`
`biospecimen_filter`	Filter by biofluid type	None (all)
`include_spectra`	Include reference spectra	`false`

Best Practices

Use HMDB IDs for unambiguous lookup. Metabolite names can be ambiguous (e.g., "glucose" vs. "D-glucose" vs. "alpha-D-glucose"). Use the HMDB ID (HMDB0000122) for precise identification.
Check concentration units carefully. HMDB reports concentrations in various units (µM, mM, mg/dL) depending on the biofluid and study. Always verify units before comparing values across metabolites or studies.
Use mass-based search for untargeted metabolomics. When identifying unknown peaks from LC-MS data, search by exact mass with appropriate adduct types ([M+H]+, [M-H]-, [M+Na]+) and a mass tolerance of 5-10 ppm.
Cross-reference with KEGG for pathway context. HMDB provides metabolite-level detail; KEGG provides pathway-level context. Use HMDB for identification and properties, KEGG for understanding metabolic context and enzyme connections.
Download the full database for batch analyses. For metabolomics studies identifying hundreds of metabolites, download the HMDB XML dump rather than making individual API calls. Parse the XML locally for much faster batch lookups.
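
For the batch approach in the last point, one way to keep memory use low is to stream the dump with `xml.etree.ElementTree.iterparse` and discard each entry once it has been read. The sketch below assumes the dump is a sequence of `<metabolite>` elements whose direct children include `<accession>` and `<name>`; the function name `index_metabolites` and the two-entry sample are hand-written for illustration. In practice `source` would be the path to the downloaded file.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def index_metabolites(source, wanted_ids):
    """Stream an XML dump and return {accession: name} for the wanted IDs."""
    wanted = set(wanted_ids)
    found = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "metabolite":
            continue  # wait until the whole entry has been read
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("accession") in wanted:
            found[fields["accession"]] = fields.get("name", "")
        elem.clear()  # release the finished entry's children
    return found


# Hand-written stand-in for the real dump file.
SAMPLE_DUMP = b"""<hmdb xmlns="http://www.hmdb.ca">
  <metabolite><accession>HMDB0000122</accession><name>D-Glucose</name></metabolite>
  <metabolite><accession>HMDB0000929</accession><name>L-Tryptophan</name></metabolite>
</hmdb>"""

hits = index_metabolites(io.BytesIO(SAMPLE_DUMP), ["HMDB0000929"])
print(hits)
```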

Common Issues

XML parsing fails on special characters. Some HMDB entries contain non-standard characters in descriptions. Use a lenient XML parser or encode the response content properly before parsing.
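
A minimal sketch of the "encode properly before parsing" route, using only the standard library: decode the bytes while replacing undecodable sequences, remove control characters that XML 1.0 forbids, then parse. It assumes the response is meant to be UTF-8; `parse_lenient` is a hypothetical helper name and the sample bytes are made up. A recovering parser such as lxml's `XMLParser(recover=True)` is an alternative if a third-party dependency is acceptable.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 does not allow (tab, LF and CR are permitted).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_lenient(raw_bytes):
    """Parse XML bytes after replacing bad UTF-8 and dropping illegal characters."""
    text = raw_bytes.decode("utf-8", errors="replace")  # bad bytes become U+FFFD
    text = _ILLEGAL_XML_CHARS.sub("", text)
    return ET.fromstring(text.encode("utf-8"))


# Made-up sample with a vertical tab and an invalid UTF-8 byte in the description.
RAW = b"<metabolite><description>Found in urine\x0b and blood \xff</description></metabolite>"

root = parse_lenient(RAW)
description = root.findtext("description")
print(repr(description))
```

Characters that are replaced or dropped are lost, so this suits descriptive fields better than identifiers or numeric values.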

Concentration ranges vary widely. Normal metabolite concentrations depend on age, sex, diet, and measurement method. Report ranges rather than single values, and note the population and analytical method from the source study.
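
To report a range across studies that use different units, values first have to be brought to a common unit. The sketch below converts µM, mM, and mg/dL readings to µM and returns the minimum and maximum. The function names, the accepted unit spellings, and the three readings are illustrative assumptions; mg/dL conversion needs the metabolite's molecular weight (about 180.156 g/mol for glucose).

```python
def to_micromolar(value, units, mol_weight=None):
    """Convert a concentration to micromolar (umol/L)."""
    units = units.strip()
    if units in ("uM", "\u00b5M", "\u03bcM"):  # ASCII, micro sign, Greek mu
        return float(value)
    if units == "mM":
        return value * 1000.0
    if units == "mg/dL":
        if mol_weight is None:
            raise ValueError("mol_weight (g/mol) is required to convert mg/dL")
        # mg/dL -> mg/L (x10) -> mmol/L (/ g/mol) -> umol/L (x1000)
        return value * 10.0 / mol_weight * 1000.0
    raise ValueError(f"Unsupported unit: {units!r}")


def concentration_range(readings, mol_weight=None):
    """Return (min, max) in micromolar for (value, units) readings."""
    converted = [to_micromolar(v, u, mol_weight) for v, u in readings]
    return min(converted), max(converted)


GLUCOSE_MW = 180.156  # g/mol

# Hypothetical readings from three sources, each in a different unit.
readings = [(3.9, "mM"), (5600, "uM"), (90, "mg/dL")]
low, high = concentration_range(readings, mol_weight=GLUCOSE_MW)
print(f"{low:.0f} to {high:.0f} uM")
```

Unknown units raise an error rather than passing through silently, which keeps mismatched values out of the reported range.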

Mass search returns too many candidates. Exact mass alone often identifies multiple possible metabolites. Narrow results by: matching retention time to standards, checking isotope patterns, using MS/MS fragmentation data, and filtering by biological plausibility for the sample type.
