Master HMDB Database

An all-in-one skill for accessing the Human Metabolome Database (HMDB). Includes structured workflows, validation checks, and reusable patterns for scientific work.

Skill · Cliptics · scientific · v1.0.0 · MIT

A scientific computing skill for querying the Human Metabolome Database (HMDB), a comprehensive resource for human metabolite data that includes chemical, clinical, and molecular biology information for over 220,000 metabolite entries.

When to Use This Skill

Choose Master HMDB Database when:

  • Looking up metabolite properties (structure, classification, pathways)
  • Finding disease-associated metabolites and biomarkers
  • Retrieving metabolite concentration ranges in biofluids
  • Identifying metabolites from mass spectrometry or NMR data

Consider alternatives when:

  • You need drug information (use DrugBank)
  • You need enzyme kinetics (use BRENDA)
  • You need metabolic pathway analysis (use KEGG)
  • You need metabolomics data processing (use MetaboAnalyst or XCMS)

Quick Start

From the command line:

    claude "Look up glucose in HMDB and find its concentration in blood"

Or directly in Python:

    import requests
    import xml.etree.ElementTree as ET

    # Fetch a single metabolite record as XML
    hmdb_id = "HMDB0000122"  # D-Glucose
    url = f"https://hmdb.ca/metabolites/{hmdb_id}.xml"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)

    # The record may or may not declare the http://www.hmdb.ca namespace,
    # so take the "{namespace}" prefix (possibly empty) from the root tag.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""

    # Basic metabolite info (findtext returns None if an element is missing)
    print(f"Name: {root.findtext(ns + 'name')}")
    print(f"Formula: {root.findtext(ns + 'chemical_formula')}")
    print(f"Molecular Weight: {root.findtext(ns + 'average_molecular_weight')}")

    # Normal concentrations are <concentration> elements nested under
    # <normal_concentrations>
    path = f".//{ns}normal_concentrations/{ns}concentration"
    for conc in root.findall(path)[:5]:
        biofluid = conc.findtext(ns + "biospecimen")
        value = conc.findtext(ns + "concentration_value")
        units = conc.findtext(ns + "concentration_units") or ""
        if biofluid and value:
            print(f"  {biofluid}: {value} {units}")

Core Concepts

HMDB Data Categories

| Category | Description | Example |
| --- | --- | --- |
| Chemical Data | Structure, formula, properties | MW, SMILES, InChI |
| Taxonomy | Chemical classification | Organic acids, amino acids |
| Biological Properties | Pathways, enzymes, diseases | Glycolysis, diabetes |
| Concentrations | Normal ranges in biofluids | Blood glucose: 3.9-5.6 mmol/L |
| Spectra | NMR, MS reference spectra | Mass fragments, chemical shifts |
| Ontology | Metabolite function classification | Energy metabolism |

Search and Identification

    import requests

    # Search metabolites by name (the response is an HTML results page)
    search_url = "https://hmdb.ca/unearth/q"
    response = requests.get(
        search_url,
        params={"query": "tryptophan", "searcher": "metabolites", "button": ""},
        timeout=30,
    )

    # Mass-based identification
    def search_by_mass(mz, tolerance_ppm=10, adduct_type="[M+H]+"):
        """Search HMDB by observed m/z for metabolite identification.

        The tolerance is passed to HMDB, so the search window
        (mz * tolerance_ppm / 1e6 on either side) is not computed locally.
        """
        url = "https://hmdb.ca/spectra/ms/search"
        response = requests.get(
            url,
            params={
                "utf8": "✓",
                "query_masses": str(mz),
                "tolerance": str(tolerance_ppm),
                "tolerance_units": "ppm",
                "adduct_type": adduct_type,
                "commit": "Search",
            },
            timeout=30,
        )
        response.raise_for_status()
        return response

    # Example: search for a metabolite with m/z 205.097
    results = search_by_mass(205.097)
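The relationship between an observed m/z, the adduct, and the ppm tolerance can be checked locally before querying HMDB. The following is a minimal sketch, not part of the skill itself: the helper names (`neutral_mass`, `adduct_mz`, `ppm_window`, `ppm_error`) are hypothetical, and the adduct shifts are standard monoisotopic values for singly charged ions rounded to about five decimals.

```python
# Monoisotopic mass shifts (Da) for common singly charged adducts.
PROTON = 1.007276        # mass of a proton
SODIUM_ION = 22.98922    # Na+ (sodium atom minus one electron)

ADDUCT_SHIFT = {
    "[M+H]+": PROTON,
    "[M-H]-": -PROTON,
    "[M+Na]+": SODIUM_ION,
}

def neutral_mass(mz, adduct="[M+H]+"):
    """Convert an observed m/z to the neutral monoisotopic mass."""
    return mz - ADDUCT_SHIFT[adduct]

def adduct_mz(mass, adduct="[M+H]+"):
    """Convert a neutral monoisotopic mass to the expected m/z."""
    return mass + ADDUCT_SHIFT[adduct]

def ppm_window(mass, tolerance_ppm=10):
    """Return the (low, high) mass window for a tolerance in ppm."""
    delta = mass * tolerance_ppm / 1e6
    return mass - delta, mass + delta

def ppm_error(observed, theoretical):
    """Signed mass error of an observation, in ppm."""
    return (observed - theoretical) / theoretical * 1e6

# m/z 205.097 as [M+H]+ compared with tryptophan (C11H12N2O2),
# whose monoisotopic mass is 204.089878 Da.
observed = neutral_mass(205.097, "[M+H]+")
low, high = ppm_window(204.089878, tolerance_ppm=10)
print(f"neutral mass: {observed:.6f}")
print(f"inside 10 ppm window: {low < observed < high}")
print(f"error: {ppm_error(observed, 204.089878):.2f} ppm")
```

Computing the error locally makes it easy to re-check a hit under a tighter tolerance without repeating the web query.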

Pathway Context

    import requests
    import xml.etree.ElementTree as ET

    def get_metabolite_pathways(hmdb_id):
        """Get metabolic pathways for a metabolite."""
        url = f"https://hmdb.ca/metabolites/{hmdb_id}.xml"
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        root = ET.fromstring(response.content)

        # "{namespace}" prefix taken from the root tag (empty if the record
        # does not declare the http://www.hmdb.ca namespace)
        ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""

        pathways = []
        for pathway in root.iter(f"{ns}pathway"):
            name = pathway.findtext(f"{ns}name")
            if name is not None:
                pathways.append({
                    "name": name,
                    # Missing or empty cross-references are stored as None
                    "smpdb_id": pathway.findtext(f"{ns}smpdb_id") or None,
                    "kegg_id": pathway.findtext(f"{ns}kegg_map_id") or None,
                })
        return pathways

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| api_format | Response format (xml, json) | xml |
| mass_tolerance_ppm | Mass search tolerance | 10 |
| adduct_type | MS adduct for mass search | [M+H]+ |
| biospecimen_filter | Filter by biofluid type | None (all) |
| include_spectra | Include reference spectra | false |
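One way to carry these settings through a script is a small immutable config object. This is only a sketch: the `HMDBConfig` class is hypothetical and not something the skill provides, and the validation rules are assumptions based on the values listed in the table.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class HMDBConfig:
    """Hypothetical container mirroring the parameters in the table."""
    api_format: str = "xml"
    mass_tolerance_ppm: float = 10.0
    adduct_type: str = "[M+H]+"
    biospecimen_filter: Optional[str] = None  # None means all biofluids
    include_spectra: bool = False

    def __post_init__(self):
        # Reject values the table does not list or that make no sense.
        if self.api_format not in ("xml", "json"):
            raise ValueError(f"api_format must be 'xml' or 'json', got {self.api_format!r}")
        if self.mass_tolerance_ppm <= 0:
            raise ValueError("mass_tolerance_ppm must be positive")

# Defaults match the table; override only what differs.
default_cfg = HMDBConfig()
urine_cfg = HMDBConfig(mass_tolerance_ppm=5, biospecimen_filter="Urine")
print(default_cfg)
print(urine_cfg)
```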

Best Practices

  1. Use HMDB IDs for unambiguous lookup. Metabolite names can be ambiguous (e.g., "glucose" vs. "D-glucose" vs. "alpha-D-glucose"). Use the HMDB ID (HMDB0000122) for precise identification.

  2. Check concentration units carefully. HMDB reports concentrations in various units (µM, mM, mg/dL) depending on the biofluid and study. Always verify units before comparing values across metabolites or studies.

  3. Use mass-based search for untargeted metabolomics. When identifying unknown peaks from LC-MS data, search by exact mass with appropriate adduct types ([M+H]+, [M-H]-, [M+Na]+) and a mass tolerance of 5-10 ppm.

  4. Cross-reference with KEGG for pathway context. HMDB provides metabolite-level detail; KEGG provides pathway-level context. Use HMDB for identification and properties, KEGG for understanding metabolic context and enzyme connections.

  5. Download the full database for batch analyses. For metabolomics studies identifying hundreds of metabolites, download the HMDB XML dump rather than making individual API calls. Parse the XML locally for much faster batch lookups.
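For the batch approach in item 5, the XML dump is large enough that streaming it is usually preferable to loading it whole. The sketch below uses `xml.etree.ElementTree.iterparse` from the standard library; the function name `iter_metabolites` is hypothetical, and it assumes the dump wraps `<metabolite>` records in the `http://www.hmdb.ca` namespace used elsewhere in this document. A tiny in-memory sample stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET

HMDB_NS = "{http://www.hmdb.ca}"  # assumed namespace of the dump

def iter_metabolites(source, wanted_ids=None):
    """Yield basic fields for each <metabolite>, one record at a time.

    source: a file path or binary file object containing the XML dump.
    wanted_ids: optional set of HMDB accessions to keep (None keeps all).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != f"{HMDB_NS}metabolite":
            continue
        accession = elem.findtext(f"{HMDB_NS}accession")
        if wanted_ids is None or accession in wanted_ids:
            yield {
                "accession": accession,
                "name": elem.findtext(f"{HMDB_NS}name"),
                "formula": elem.findtext(f"{HMDB_NS}chemical_formula"),
            }
        elem.clear()  # drop the record's children to keep memory low

# Small stand-in for the real dump, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <accession>HMDB0000122</accession>
    <name>D-Glucose</name>
    <chemical_formula>C6H12O6</chemical_formula>
  </metabolite>
  <metabolite>
    <accession>HMDB0000929</accession>
    <name>L-Tryptophan</name>
    <chemical_formula>C11H12N2O2</chemical_formula>
  </metabolite>
</hmdb>"""

for record in iter_metabolites(io.BytesIO(SAMPLE), wanted_ids={"HMDB0000929"}):
    print(record)
```

With the real dump, pass the path of the downloaded XML file instead of the `BytesIO` object and build a dictionary keyed by accession for repeated lookups.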

Common Issues

XML parsing fails on special characters. Some HMDB entries contain non-standard characters in descriptions. Use a lenient XML parser, or decode the response with an explicit encoding and strip characters that are invalid in XML before parsing.
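A standard-library way to approximate lenient parsing is to clean the raw bytes first. This is a sketch under two assumptions: that the response is meant to be UTF-8, and that the offending characters are stray control characters or invalid byte sequences. The helper name `parse_xml_leniently` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 forbids (everything below 0x20
# except tab, line feed, and carriage return).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def parse_xml_leniently(raw: bytes) -> ET.Element:
    """Parse XML after replacing bad bytes and removing illegal characters."""
    # Invalid UTF-8 sequences become U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    text = _ILLEGAL_XML_CHARS.sub("", text)
    # Re-encode so an encoding="UTF-8" declaration stays truthful.
    return ET.fromstring(text.encode("utf-8"))

# A record with a vertical tab and an invalid byte in the description.
raw = (b"<metabolite><name>D-Glucose</name>"
       b"<description>bad\x0bchar \xff</description></metabolite>")
root = parse_xml_leniently(raw)
print(root.findtext("name"))
print(repr(root.findtext("description")))
```

Cleaning changes the text slightly, so keep the raw response if the exact original description matters.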

Concentration ranges vary widely. Normal metabolite concentrations depend on age, sex, diet, and measurement method. Report ranges rather than single values, and note the population and analytical method from the source study.
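Ranges from different studies can only be compared once they share a unit. The following sketch converts the units mentioned in this document (µM, mM, mg/dL) to µM. The function `to_micromolar` is hypothetical, and it assumes the numeric value has already been extracted from the concentration field; mg/dL conversion needs the metabolite's molecular weight in g/mol.

```python
def to_micromolar(value, units, mol_weight=None):
    """Convert a concentration to micromolar (µmol/L).

    value: numeric concentration.
    units: one of µM, uM, mM, mg/dL (case-insensitive).
    mol_weight: molecular weight in g/mol, required for mg/dL.
    """
    # Normalise the micro sign (U+00B5) and Greek mu (U+03BC) to "u".
    key = units.strip().replace("\u00b5", "u").replace("\u03bc", "u").lower()
    if key in ("um", "umol/l"):
        return float(value)
    if key in ("mm", "mmol/l"):
        return float(value) * 1000.0
    if key == "mg/dl":
        if mol_weight is None:
            raise ValueError("mol_weight (g/mol) is required to convert mg/dL")
        # mg/dL -> mg/L (x10) -> mmol/L (/ MW) -> µmol/L (x1000)
        return float(value) * 10.0 / mol_weight * 1000.0
    raise ValueError(f"Unsupported unit: {units!r}")

# Glucose (average molecular weight 180.156 g/mol):
# 90 mg/dL is roughly 5 mmol/L.
print(round(to_micromolar(90, "mg/dL", mol_weight=180.156), 1))
print(to_micromolar(5.0, "mM"))
```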

Mass search returns too many candidates. Exact mass alone often identifies multiple possible metabolites. Narrow results by: matching retention time to standards, checking isotope patterns, using MS/MS fragmentation data, and filtering by biological plausibility for the sample type.
