Ultimate Metabolomics Workbench Database
Access and analyze metabolomics research data from the Metabolomics Workbench, a comprehensive NIH-funded repository of metabolomics studies. This skill covers programmatic API access to studies, metabolite data, chemical structures, and analysis results for metabolomics research.
When to Use This Skill
Choose Ultimate Metabolomics Workbench Database when you need to:
- Query publicly available metabolomics study data and results
- Search for metabolite concentrations across diseases and tissues
- Download and reanalyze raw or processed metabolomics datasets
- Cross-reference metabolites with pathways and biological functions
Consider alternatives when:
- You need metabolite identification from raw spectra (use matchms or SIRIUS)
- You need metabolic pathway analysis (use KEGG or MetaboAnalyst)
- You need protein-level metabolomics data (use proteomics databases)
Quick Start
```shell
# Install required packages (scipy is needed for the differential analysis below)
pip install requests pandas matplotlib scipy
```
```python
import requests
import pandas as pd

BASE_URL = "https://www.metabolomicsworkbench.org/rest"

# Search for studies whose title mentions cancer
response = requests.get(f"{BASE_URL}/study/study_title/cancer/summary")
studies = response.json()
print(f"Found {len(studies)} cancer-related studies")

# Get metabolites from a specific study
study_id = "ST000001"
response = requests.get(f"{BASE_URL}/study/study_id/{study_id}/metabolites")
metabolites = response.json()
df = pd.DataFrame(metabolites)
print(f"Study {study_id}: {len(df)} metabolites")
print(df[["metabolite_name", "refmet_name"]].head(10))
```
Core Concepts
REST API Endpoints
| Endpoint | Description | Example |
|---|---|---|
| `/study/study_id/{id}/summary` | Study metadata and design | Study details |
| `/study/study_id/{id}/metabolites` | Metabolites measured in study | Metabolite list |
| `/study/study_id/{id}/data` | Concentration/intensity data | Raw measurements |
| `/compound/name/{name}/all` | Compound information by name | Chemical properties |
| `/compound/regno/{id}/all` | Compound by registry number | Cross-references |
| `/refmet/name/{name}` | Standardized metabolite name | RefMet mapping |
| `/study/disease/{disease}/summary` | Studies by disease category | Disease search |
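Most rows in this table follow a consistent context / input item / input value / output pattern, which can be wrapped in a small helper. This is a sketch: the helper names and the `timeout` default are illustrative, not part of the Workbench API (note that the `/refmet/name/{name}` row uses only three segments and would be built directly).

```python
import requests

BASE_URL = "https://www.metabolomicsworkbench.org/rest"

def endpoint_url(context, input_item, input_value, output_item):
    """Build a REST URL from the four path segments used by most endpoints."""
    return f"{BASE_URL}/{context}/{input_item}/{input_value}/{output_item}"

def fetch_json(url, timeout=30):
    """GET an endpoint and return parsed JSON, or None on an HTTP error."""
    response = requests.get(url, timeout=timeout)
    return response.json() if response.ok else None

# Study metadata for ST000001, per the first row of the table
url = endpoint_url("study", "study_id", "ST000001", "summary")
```

Building the URL separately from fetching it makes the request logic easy to test and to wrap with caching later.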
Metabolite Data Retrieval and Analysis
```python
import requests
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


class MetabolomicsWorkbench:
    BASE_URL = "https://www.metabolomicsworkbench.org/rest"

    def get_study_data(self, study_id):
        """Get the full metabolomics data matrix for a study."""
        response = requests.get(f"{self.BASE_URL}/study/study_id/{study_id}/data")
        return response.json()

    def get_study_factors(self, study_id):
        """Get experimental factors (group assignments) for a study."""
        response = requests.get(f"{self.BASE_URL}/study/study_id/{study_id}/factors")
        return response.json()

    def search_metabolite(self, name):
        """Search for a metabolite by name across all studies."""
        response = requests.get(f"{self.BASE_URL}/compound/name/{name}/all")
        return response.json()

    def differential_analysis(self, study_id):
        """Perform a basic two-group differential analysis."""
        data = self.get_study_data(study_id)
        factors = self.get_study_factors(study_id)
        df_data = pd.DataFrame(data)
        df_factors = pd.DataFrame(factors)

        # Merge measurements with group assignments
        merged = df_data.merge(df_factors, on="sample_id")

        groups = merged["group"].unique()
        if len(groups) < 2:
            return None

        metabolite_cols = [c for c in merged.columns
                           if c not in ("sample_id", "group")]
        results = []
        for col in metabolite_cols:
            g1 = merged.loc[merged["group"] == groups[0], col].astype(float)
            g2 = merged.loc[merged["group"] == groups[1], col].astype(float)
            if len(g1) > 1 and len(g2) > 1:
                stat, pval = mannwhitneyu(g1, g2, alternative="two-sided")
                fc = g2.mean() / g1.mean() if g1.mean() > 0 else np.inf
                results.append({
                    "metabolite": col,
                    "fold_change": fc,
                    "log2_fc": np.log2(fc) if fc > 0 else np.nan,
                    "p_value": pval,
                    "group1_mean": g1.mean(),
                    "group2_mean": g2.mean(),
                })

        df_results = pd.DataFrame(results)
        # Benjamini-Hochberg FDR: p * n / rank, made monotone and capped at 1
        df_results = df_results.sort_values("p_value").reset_index(drop=True)
        n = len(df_results)
        pvals = df_results["p_value"].to_numpy()
        bh = pvals * n / np.arange(1, n + 1)
        df_results["fdr"] = np.minimum(1.0, np.minimum.accumulate(bh[::-1])[::-1])
        return df_results


mw = MetabolomicsWorkbench()
```
Cross-Study Metabolite Comparison
```python
def compare_metabolite_across_studies(metabolite_name, study_ids):
    """Compare a metabolite's levels across multiple studies."""
    mw = MetabolomicsWorkbench()
    comparison = []
    for study_id in study_ids:
        try:
            data = mw.get_study_data(study_id)
            df = pd.DataFrame(data)
            if metabolite_name in df.columns:
                values = df[metabolite_name].astype(float).dropna()
                comparison.append({
                    "study_id": study_id,
                    "n_samples": len(values),
                    "mean": values.mean(),
                    "std": values.std(),
                    "median": values.median(),
                })
        except Exception as e:
            print(f"Error with {study_id}: {e}")
    return pd.DataFrame(comparison)


# Compare glucose levels across studies
results = compare_metabolite_across_studies(
    "Glucose", ["ST000001", "ST000002", "ST000003"]
)
print(results)
```
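Because absolute intensities differ between platforms, a scale-free dispersion measure is more comparable across studies than the raw standard deviation. This sketch adds a coefficient-of-variation column to the summary table above; the helper name and the synthetic demo values are illustrative.

```python
import pandas as pd

def add_cv(comparison):
    """Add a coefficient-of-variation column (std / mean) to the
    cross-study summary, so variability can be compared across
    studies with different absolute scales."""
    out = comparison.copy()
    out["cv"] = out["std"] / out["mean"]
    return out

# Example with a synthetic cross-study summary
demo = pd.DataFrame({
    "study_id": ["ST000001", "ST000002"],
    "n_samples": [24, 40],
    "mean": [100.0, 250.0],
    "std": [10.0, 50.0],
    "median": [98.0, 240.0],
})
print(add_cv(demo)[["study_id", "cv"]])
```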
Configuration
| Parameter | Description | Default |
|---|---|---|
| `base_url` | Metabolomics Workbench REST API URL | `"https://www.metabolomicsworkbench.org/rest"` |
| `output_format` | Response format | `"json"` |
| `timeout` | API request timeout (seconds) | `30` |
| `cache_responses` | Cache API responses locally | `true` |
| `refmet_mapping` | Map to standardized RefMet names | `true` |
| `significance_threshold` | p-value cutoff for differential analysis | `0.05` |
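The `cache_responses` behavior can be sketched as a thin wrapper that stores each JSON response on disk, keyed by a hash of the URL. The function name, cache layout, and the injectable `fetch` callable (any function with the shape of `requests.get`) are assumptions for illustration, not part of the Workbench client.

```python
import hashlib
import json
from pathlib import Path

def cached_get(url, fetch, cache_dir="mw_cache", timeout=30):
    """Fetch JSON from `url`, caching the parsed response on disk so
    repeated calls during iterative analysis skip the network."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    path = cache / (hashlib.sha256(url.encode()).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text())
    payload = fetch(url, timeout=timeout).json()
    path.write_text(json.dumps(payload))
    return payload
```

Passing the fetcher in (e.g. `cached_get(url, requests.get)`) keeps the cache logic testable without a live connection.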
Best Practices
- **Use RefMet standardized names** — Metabolite naming varies across studies (e.g., "Glucose" vs "D-Glucose" vs "Glc"). Use the RefMet API endpoint to map all metabolite names to standardized nomenclature before cross-study comparisons.
- **Check study design before analysis** — Review the study's factors, sample sizes, and experimental design through the summary endpoint before downloading data. Some studies have confounding variables or unbalanced groups that require special statistical handling.
- **Apply appropriate normalization** — Raw metabolomics data varies in scale and distribution across platforms (NMR vs MS). Apply log transformation and median normalization before comparing across samples. Never compare raw intensities between different analytical platforms.
- **Cache API responses for large analyses** — The Metabolomics Workbench API can be slow for bulk data retrieval. Cache responses locally in JSON or pickle files to avoid repeated downloads during iterative analysis.
- **Report metabolite identifiers consistently** — Include InChIKey, HMDB ID, or KEGG compound ID alongside metabolite names in your results. This enables unambiguous cross-referencing with other databases and makes your analysis reproducible.
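The RefMet recommendation above can be prototyped with a local synonym table that resolves common variants before falling back to the API. The table entries and function name here are illustrative, not an authoritative synonym list.

```python
# A minimal local synonym table used as a first pass; unresolved names
# would be sent to the /refmet/name/{name} endpoint in a real pipeline.
LOCAL_SYNONYMS = {
    "glc": "Glucose",
    "d-glucose": "Glucose",
    "glucose": "Glucose",
}

def standardize_name(name):
    """Map a metabolite name to a standardized form, falling back to
    the stripped original when no local synonym is known."""
    return LOCAL_SYNONYMS.get(name.strip().lower(), name.strip())
```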
Common Issues
- **API returns empty results for valid study IDs** — Some studies are registered but not yet publicly released. Check the study's status through the summary endpoint; only studies with status "public" have downloadable data. Private studies return empty arrays without error messages.
- **Metabolite names don't match across studies** — The same metabolite often has different names in different studies due to inconsistent annotation. Use the RefMet mapping endpoint (`/refmet/name/{name}`) to resolve synonyms. When exact matches fail, try partial name matching or search by InChIKey.
- **Numeric data contains mixed types and missing values** — Study data matrices may contain text annotations mixed with numeric values (e.g., "ND" for not detected, "<LOD" for below limit). Convert to numeric with `pd.to_numeric(column, errors='coerce')` and handle NaN values explicitly rather than treating them as zeros.
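The mixed-type cleanup described above can be wrapped in a small helper; the function name and the optional `lod_value` substitution parameter are illustrative.

```python
import pandas as pd

def clean_numeric(column, lod_value=None):
    """Coerce a mixed-type measurement column to numeric. Text flags
    like 'ND' or '<LOD' become NaN (or `lod_value` if given) rather
    than being silently treated as zeros."""
    numeric = pd.to_numeric(column, errors="coerce")
    return numeric if lod_value is None else numeric.fillna(lod_value)

raw = pd.Series(["12.5", "ND", "3.1", "<LOD", "8.0"])
print(clean_numeric(raw))
```

Keeping non-detects as NaN preserves the distinction between "not measured" and "measured as zero", which matters for downstream statistics.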