Ultimate Metabolomics Workbench Database

Access the Metabolomics Workbench REST API for metabolomics research. Includes structured workflows, validation checks, and reusable patterns for scientific analysis.

Skill · Cliptics · scientific · v1.0.0 · MIT

Access and analyze metabolomics research data from the Metabolomics Workbench, a comprehensive NIH-funded repository of metabolomics studies. This skill covers programmatic API access to studies, metabolite data, chemical structures, and analysis results for metabolomics research.

When to Use This Skill

Choose Ultimate Metabolomics Workbench Database when you need to:

  • Query publicly available metabolomics study data and results
  • Search for metabolite concentrations across diseases and tissues
  • Download and reanalyze raw or processed metabolomics datasets
  • Cross-reference metabolites with pathways and biological functions

Consider alternatives when:

  • You need metabolite identification from raw spectra (use matchms or SIRIUS)
  • You need metabolic pathway analysis (use KEGG or MetaboAnalyst)
  • You need protein-level data rather than metabolite data (use proteomics databases)

Quick Start

    # Install required packages
    pip install requests pandas matplotlib

    import requests
    import pandas as pd

    BASE_URL = "https://www.metabolomicsworkbench.org/rest"

    # Search for studies whose title mentions a disease
    response = requests.get(f"{BASE_URL}/study/study_title/cancer/summary", timeout=30)
    response.raise_for_status()
    studies = response.json()
    print(f"Found {len(studies)} cancer-related studies")

    # Get metabolites from a specific study
    study_id = "ST000001"
    response = requests.get(f"{BASE_URL}/study/study_id/{study_id}/metabolites", timeout=30)
    response.raise_for_status()
    metabolites = response.json()

    # Multi-row responses are assumed to arrive as a dict keyed by row number
    # ("1", "2", ...), so build one DataFrame row per entry.
    df = pd.DataFrame.from_dict(metabolites, orient="index")
    print(f"Study {study_id}: {len(df)} metabolites")
    print(df[["metabolite_name", "refmet_name"]].head(10))

Core Concepts

REST API Endpoints

| Endpoint | Description | Use case |
|---|---|---|
| /study/study_id/{id}/summary | Study metadata and design | Study details |
| /study/study_id/{id}/metabolites | Metabolites measured in study | Metabolite list |
| /study/study_id/{id}/data | Concentration/intensity data | Raw measurements |
| /compound/name/{name}/all | Compound information by name | Chemical properties |
| /compound/regno/{id}/all | Compound by registry number | Cross-references |
| /refmet/name/{name} | Standardized metabolite name | RefMet mapping |
| /study/disease/{disease}/summary | Studies by disease category | Disease search |
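
Every endpoint in the table has the same four-part path: a context, an input item, an input value, and an output item. The following is a minimal sketch of a URL builder under that reading. `build_url` is a hypothetical helper introduced here for illustration, not part of any official client, and percent-encoding the input value is an assumption about how names containing spaces should be sent.

```python
from urllib.parse import quote

BASE_URL = "https://www.metabolomicsworkbench.org/rest"


def build_url(context, input_item, input_value, output_item, base_url=BASE_URL):
    """Join the four path segments, percent-encoding the input value.

    Hypothetical helper: the segment names mirror the endpoint table above.
    """
    # Values such as metabolite names may contain spaces or slashes.
    encoded = quote(str(input_value), safe="")
    return f"{base_url}/{context}/{input_item}/{encoded}/{output_item}"


print(build_url("study", "study_id", "ST000001", "summary"))
print(build_url("compound", "name", "Citric acid", "all"))
```

Building paths in one place keeps the encoding rule consistent and makes it easier to spot a misspelled segment before a request is sent.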

Metabolite Data Retrieval and Analysis

    import requests
    import pandas as pd
    import numpy as np
    from scipy.stats import mannwhitneyu


    class MetabolomicsWorkbench:
        BASE_URL = "https://www.metabolomicsworkbench.org/rest"

        def _get(self, path):
            """Fetch a REST path and return the decoded JSON body."""
            response = requests.get(f"{self.BASE_URL}/{path}", timeout=30)
            response.raise_for_status()
            return response.json()

        def get_study_data(self, study_id):
            """Get full metabolomics data matrix for a study."""
            return self._get(f"study/study_id/{study_id}/data")

        def get_study_factors(self, study_id):
            """Get experimental factors (groups) for a study."""
            return self._get(f"study/study_id/{study_id}/factors")

        def search_metabolite(self, name):
            """Look up compound information by metabolite name."""
            return self._get(f"compound/name/{name}/all")

        def differential_analysis(self, study_id):
            """Perform basic differential analysis between the first two groups.

            Assumes the data and factors responses convert to tables that share
            a "sample_id" column and that the factors table has a "group" column.
            """
            df_data = pd.DataFrame(self.get_study_data(study_id))
            df_factors = pd.DataFrame(self.get_study_factors(study_id))

            # Merge data with group assignments
            merged = df_data.merge(df_factors, on="sample_id")

            # Get unique groups; only the first two are compared
            groups = merged["group"].unique()
            if len(groups) < 2:
                return None

            results = []
            metabolite_cols = [c for c in merged.columns
                               if c not in ("sample_id", "group")]
            for col in metabolite_cols:
                # Coerce text annotations such as "ND" to NaN, then drop them
                values = pd.to_numeric(merged[col], errors="coerce")
                g1 = values[merged["group"] == groups[0]].dropna()
                g2 = values[merged["group"] == groups[1]].dropna()
                if len(g1) > 1 and len(g2) > 1:
                    stat, pval = mannwhitneyu(g1, g2, alternative="two-sided")
                    fc = g2.mean() / g1.mean() if g1.mean() > 0 else np.inf
                    results.append({
                        "metabolite": col,
                        "fold_change": fc,
                        "log2_fc": np.log2(fc) if fc > 0 else np.nan,
                        "p_value": pval,
                        "group1_mean": g1.mean(),
                        "group2_mean": g2.mean()
                    })

            if not results:
                return pd.DataFrame()

            df_results = (pd.DataFrame(results)
                          .sort_values("p_value")
                          .reset_index(drop=True))

            # Benjamini-Hochberg FDR: p * n / rank, made monotone and capped at 1
            n = len(df_results)
            raw = df_results["p_value"] * n / np.arange(1, n + 1)
            df_results["fdr"] = raw.iloc[::-1].cummin().iloc[::-1].clip(upper=1.0)
            return df_results


    mw = MetabolomicsWorkbench()

Cross-Study Metabolite Comparison

    def compare_metabolite_across_studies(metabolite_name, study_ids):
        """Compare a metabolite's levels across multiple studies.

        Relies on the MetabolomicsWorkbench class defined above.
        """
        mw = MetabolomicsWorkbench()
        comparison = []
        for study_id in study_ids:
            try:
                data = mw.get_study_data(study_id)
                df = pd.DataFrame(data)
                if metabolite_name in df.columns:
                    # Coerce text annotations (e.g., "ND") to NaN before dropping them
                    values = pd.to_numeric(df[metabolite_name], errors="coerce").dropna()
                    comparison.append({
                        "study_id": study_id,
                        "n_samples": len(values),
                        "mean": values.mean(),
                        "std": values.std(),
                        "median": values.median()
                    })
            except Exception as e:
                print(f"Error with {study_id}: {e}")
        return pd.DataFrame(comparison)


    # Compare glucose levels across studies
    results = compare_metabolite_across_studies(
        "Glucose", ["ST000001", "ST000002", "ST000003"]
    )
    print(results)

Configuration

| Parameter | Description | Default |
|---|---|---|
| base_url | Metabolomics Workbench REST API URL | "https://www.metabolomicsworkbench.org/rest" |
| output_format | Response format | "json" |
| timeout | API request timeout (seconds) | 30 |
| cache_responses | Cache API responses locally | true |
| refmet_mapping | Map to standardized RefMet names | true |
| significance_threshold | p-value cutoff for differential analysis | 0.05 |
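
The source does not say how these parameters are consumed, so the following is only a sketch of one way to carry them in code. `WorkbenchConfig` and `cached_fetch` are hypothetical names, and the on-disk cache layout (one JSON file per URL, named by a SHA-256 hash) is an assumption made for illustration. The `fetch` argument is any callable that takes a URL and returns decoded JSON.

```python
import hashlib
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WorkbenchConfig:
    """Hypothetical container mirroring the defaults in the table above."""
    base_url: str = "https://www.metabolomicsworkbench.org/rest"
    output_format: str = "json"
    timeout: int = 30
    cache_responses: bool = True
    refmet_mapping: bool = True
    significance_threshold: float = 0.05


def cached_fetch(url, fetch, config, cache_dir=".mw_cache"):
    """Return fetch(url), reusing a JSON file on disk when caching is enabled."""
    if not config.cache_responses:
        return fetch(url)
    # Assumed layout: one file per URL, named by the URL's SHA-256 digest.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_path = Path(cache_dir) / f"{digest}.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    payload = fetch(url)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(payload), encoding="utf-8")
    return payload
```

In practice `fetch` could be something like `lambda u: requests.get(u, timeout=config.timeout).json()`. Passing the fetcher in keeps the cache logic testable without network access.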

Best Practices

  1. Use RefMet standardized names — Metabolite naming varies across studies (e.g., "Glucose" vs "D-Glucose" vs "Glc"). Use the RefMet API endpoint to map all metabolite names to standardized nomenclature before cross-study comparisons.

  2. Check study design before analysis — Review the study's factors, sample sizes, and experimental design through the summary endpoint before downloading data. Some studies have confounding variables or unbalanced groups that require special statistical handling.

  3. Apply appropriate normalization — Raw metabolomics data varies in scale and distribution across platforms (NMR vs MS). Apply log transformation and median normalization before comparing across samples. Never compare raw intensities between different analytical platforms.

  4. Cache API responses for large analyses — The Metabolomics Workbench API can be slow for bulk data retrieval. Cache responses locally in JSON or pickle files to avoid repeated downloads during iterative analysis.

  5. Report metabolite identifiers consistently — Include InChIKey, HMDB ID, or KEGG compound ID alongside metabolite names in your results. This enables unambiguous cross-referencing with other databases and reproducing your analysis.
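
The normalization advice in practice 3 can be sketched as follows. This is one common reading of "log transformation and median normalization" (log2-transform, then subtract each sample's median in log space), not a method the source prescribes. `log_median_normalize` is a hypothetical helper, and the samples-by-metabolites layout is an assumption.

```python
import numpy as np
import pandas as pd


def log_median_normalize(intensities):
    """Log2-transform a samples x metabolites table, then median-center each sample."""
    # Coerce text annotations such as "ND" to NaN.
    numeric = intensities.apply(pd.to_numeric, errors="coerce")
    # Non-positive values have no logarithm; treat them as missing.
    logged = np.log2(numeric.where(numeric > 0))
    # Subtract each sample's median so all samples share a common center.
    return logged.sub(logged.median(axis=1), axis=0)


demo = pd.DataFrame(
    {"glucose": [8.0, 16.0], "lactate": [2.0, 4.0], "alanine": [4.0, 8.0]},
    index=["sample_1", "sample_2"],
)
print(log_median_normalize(demo))
```

In the demo, `sample_2` is exactly twice `sample_1` in every metabolite, so both rows come out identical after centering. That is the intended effect: a global scale difference between samples is removed.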

Common Issues

API returns empty results for valid study IDs — Some studies are registered but not yet publicly released. Check the study's status through the summary endpoint — only studies with status "public" have downloadable data. Private studies return empty arrays without error messages.
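
Because an empty response raises no error, it helps to normalize payloads before analysis. The sketch below is a hypothetical helper, `to_records`. The two non-empty shapes it handles (a dict keyed by row number, and a single flat record) are assumptions about the JSON layout and should be checked against real responses.

```python
def to_records(payload):
    """Normalize a decoded JSON payload into a list of record dicts."""
    if not payload:
        # None, [] or {}: nothing was returned (e.g., a non-public study).
        return []
    if isinstance(payload, list):
        return [row for row in payload if isinstance(row, dict)]
    if isinstance(payload, dict):
        values = list(payload.values())
        # Assumed multi-row shape: {"1": {...}, "2": {...}}
        if all(isinstance(value, dict) for value in values):
            return values
        # Assumed single-record shape: {"study_id": "...", ...}
        return [payload]
    raise TypeError(f"Unexpected payload type: {type(payload).__name__}")


records = to_records({"1": {"study_id": "ST000001"}, "2": {"study_id": "ST000002"}})
if not records:
    print("No data returned; the study may not be publicly released yet.")
else:
    print(f"{len(records)} records")
```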

Metabolite names don't match across studies — The same metabolite often has different names in different studies due to inconsistent annotation. Use the RefMet mapping endpoint (/refmet/name/{name}) to resolve synonyms. When exact matches fail, try partial name matching or search by InChIKey.
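
The partial-matching fallback might look like the sketch below. It works purely on local column names and makes no API calls. `normalize_name` and `find_partial_matches` are hypothetical helpers, and stripping case and punctuation is just one possible normalization.

```python
import re


def normalize_name(name):
    """Lowercase and drop non-alphanumerics so 'D-Glucose' and 'd glucose' agree."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_partial_matches(query, candidates):
    """Return candidates whose normalized name contains the normalized query."""
    key = normalize_name(query)
    if not key:
        return []
    return [c for c in candidates if key in normalize_name(c)]


columns = ["D-Glucose", "Glucose 6-phosphate", "Glc", "L-Lactate"]
print(find_partial_matches("glucose", columns))
# prints ['D-Glucose', 'Glucose 6-phosphate']
```

The example shows both limits of substring matching. The abbreviation "Glc" is missed, and "Glucose 6-phosphate" is a different compound that still matches. Treat the results as candidates to confirm through RefMet or InChIKey, not as resolved synonyms.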

Numeric data contains mixed types and missing values — Study data matrices may contain text annotations mixed with numeric values (e.g., "ND" for not detected, "<LOD" for below limit). Convert to numeric with pd.to_numeric(column, errors='coerce') and handle NaN values explicitly rather than treating them as zeros.
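
A small illustration of that conversion, using made-up values for a single column (the sample IDs and numbers are illustrative only, not taken from any study):

```python
import pandas as pd

# Illustrative data only: two detected values and two text annotations.
raw = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "glucose": ["5.1", "ND", "4.8", "<LOD"],
})

# Text annotations become NaN instead of raising or turning into zeros.
raw["glucose_numeric"] = pd.to_numeric(raw["glucose"], errors="coerce")

n_missing = int(raw["glucose_numeric"].isna().sum())
mean_detected = raw["glucose_numeric"].mean()  # NaN values are skipped by default

print(f"{n_missing} of {len(raw)} values missing")
print(f"Mean of detected values: {mean_detected:.2f}")
```

Counting the missing values before summarizing makes it clear how many measurements the mean actually rests on. Filling them with zeros would pull the mean down artificially.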
