Pro PyOpenMS

Process mass spectrometry data for proteomics and metabolomics using PyOpenMS, the Python bindings for the OpenMS library. This skill covers raw data handling, peak picking, feature detection, peptide identification, and building automated LC-MS/MS analysis workflows.

When to Use This Skill

Choose Pro PyOpenMS when you need to:

Read and manipulate raw mass spectrometry data (mzML, mzXML)
Perform peak picking, centroiding, and feature detection on LC-MS data
Build custom proteomics or metabolomics analysis pipelines
Process MS/MS spectra for peptide or metabolite identification

Consider alternatives when:

You need spectral library matching for metabolomics (use matchms)
You need statistical analysis of quantified features (use PyDESeq2 or limma)
You need GUI-based proteomics analysis (use MaxQuant or FragPipe)

Quick Start


pip install pyopenms


import pyopenms as oms

# Load an mzML file
exp = oms.MSExperiment()
oms.MzMLFile().load("sample.mzML", exp)

print(f"Spectra: {exp.getNrSpectra()}")
print(f"Chromatograms: {exp.getNrChromatograms()}")

# Access individual spectra
for i, spec in enumerate(exp):
    if i >= 3:
        break
    print(f"Spectrum {i}: MS{spec.getMSLevel()}, "
          f"RT={spec.getRT():.1f}s, "
          f"Peaks={spec.size()}")

Core Concepts

Data Processing Pipeline

Step	Class	Description
Load data	`MzMLFile`	Read raw MS data
Smoothing	`GaussFilter`	Reduce noise in profile data
Centroiding	`PeakPickerHiRes`	Convert profile to centroid
Feature detection	`FeatureFinderAlgorithmPicked`	Find LC-MS features
Map alignment	`MapAlignmentAlgorithmPoseClustering`	Retention time correction
Feature linking	`FeatureGroupingAlgorithmQT`	Match features across runs
ID search	`SimpleSearchEngineAlgorithm`	Peptide identification

Feature Detection Workflow


import pyopenms as oms

def detect_features(mzml_path):
    """Detect features from LC-MS data."""
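    # Load MS1 spectra only; the feature finder rejects MS2 data
    options = oms.PeakFileOptions()
    options.setMSLevels([1])
    loader = oms.MzMLFile()
    loader.setOptions(options)
    exp = oms.MSExperiment()
    loader.load(mzml_path, exp)

    # Peak picking (profile → centroid). Recent pyOpenMS releases take a third
    # argument (check_spectrum_type); older 2.x releases accept only two.
    picker = oms.PeakPickerHiRes()
    picked_exp = oms.MSExperiment()
    picker.pickExperiment(exp, picked_exp, True)
    picked_exp.updateRanges()  # the feature finder needs current m/z and RT ranges

    # Feature detection. pyOpenMS 3.x calls the algorithm class directly; 2.x used
    # oms.FeatureFinder().run("centroided", picked_exp, features, params, seeds).
    ff = oms.FeatureFinderAlgorithmPicked()
    ff_params = ff.getParameters()

    features = oms.FeatureMap()
    seeds = oms.FeatureMap()  # left empty: no user-supplied seeds

    ff.run(picked_exp, features, ff_params, seeds)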

    print(f"Detected {features.size()} features")

    # Extract feature information
    for i in range(min(10, features.size())):
        f = features[i]
        print(f"  m/z={f.getMZ():.4f}, RT={f.getRT():.1f}s, "
              f"intensity={f.getIntensity():.0f}, "
              f"charge={f.getCharge()}")

    return features

features = detect_features("sample.mzML")

Multi-Sample Analysis


import pyopenms as oms
from pathlib import Path

def multi_sample_analysis(mzml_files):
    """Process multiple LC-MS runs with alignment and linking."""
    feature_maps = []

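    for mzml_path in mzml_files:
        # Load MS1 spectra only and pick peaks
        options = oms.PeakFileOptions()
        options.setMSLevels([1])
        loader = oms.MzMLFile()
        loader.setOptions(options)
        exp = oms.MSExperiment()
        loader.load(str(mzml_path), exp)

        # Recent releases take a third argument (check_spectrum_type)
        picker = oms.PeakPickerHiRes()
        picked = oms.MSExperiment()
        picker.pickExperiment(exp, picked, True)
        picked.updateRanges()

        # Detect features (pyOpenMS 3.x API; 2.x wrapped this in oms.FeatureFinder)
        ff = oms.FeatureFinderAlgorithmPicked()
        features = oms.FeatureMap()
        seeds = oms.FeatureMap()
        ff.run(picked, features, ff.getParameters(), seeds)
        features.setUniqueIds()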

        feature_maps.append(features)
        print(f"{Path(mzml_path).name}: {features.size()} features")

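    # Align retention times to the first run
    aligner = oms.MapAlignmentAlgorithmPoseClustering()
    aligner.setReference(feature_maps[0])
    for feature_map in feature_maps[1:]:
        trafo = oms.TransformationDescription()
        aligner.align(feature_map, trafo)
        # Apply the fitted transformation in place; True stores the original RT
        transformer = oms.MapAlignmentTransformer()
        transformer.transformRetentionTimes(feature_map, trafo, True)

    # Link features across samples
    linker = oms.FeatureGroupingAlgorithmQT()
    consensus = oms.ConsensusMap()

    # Register each input map as a column of the consensus map
    headers = consensus.getColumnHeaders()
    for i, (path, feature_map) in enumerate(zip(mzml_files, feature_maps)):
        header = headers.get(i, oms.ColumnHeader())
        header.filename = str(path)
        header.size = feature_map.size()
        header.unique_id = feature_map.getUniqueId()
        headers[i] = header
    consensus.setColumnHeaders(headers)

    linker.group(feature_maps, consensus)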

    print(f"\nConsensus features: {consensus.size()}")
    return consensus

files = sorted(Path("data/").glob("*.mzML"))  # sorted so the reference run is deterministic
consensus = multi_sample_analysis(files)
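
For downstream statistics it can help to flatten the consensus map into one row per consensus feature, with one intensity column per run. The sketch below is a minimal, duck-typed helper rather than part of pyOpenMS: the name `consensus_to_rows` and its column names are hypothetical, and it assumes only the accessors `getMZ()`, `getRT()`, `getCharge()`, `getFeatureList()`, `getMapIndex()` and `getIntensity()` found on consensus features and their feature handles.

```python
def consensus_to_rows(consensus, n_maps):
    """Flatten a consensus map into a list of row dictionaries.

    Hypothetical helper: one row per consensus feature and one intensity
    column per input map; maps without a linked feature get 0.0.
    """
    rows = []
    for cf in consensus:
        row = {"mz": cf.getMZ(), "rt": cf.getRT(), "charge": cf.getCharge()}
        intensities = [0.0] * n_maps
        # Each handle points back to the input map the feature came from
        for handle in cf.getFeatureList():
            intensities[handle.getMapIndex()] = handle.getIntensity()
        for i, value in enumerate(intensities):
            row[f"intensity_{i}"] = value
        rows.append(row)
    return rows
```

With the result of the function above, `rows = consensus_to_rows(consensus, len(files))` yields plain dictionaries that `pandas.DataFrame(rows)` accepts directly.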

Configuration

Parameter	Description	Default
`signal_to_noise`	S/N threshold for peak picking	`1.0`
`mass_tolerance`	m/z tolerance for feature detection (ppm)	`10`
`rt_tolerance`	Retention time tolerance (seconds)	`30`
`min_trace_length`	Minimum chromatographic trace length	`5`
`max_charge`	Maximum charge state to consider	`5`
`isotope_filtering`	Isotope pattern filtering stringency	`"relaxed"`
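
The names in the table are descriptive. The keys that pyOpenMS actually accepts depend on the algorithm class and the installed version (nested keys use colon-separated paths), so it is worth checking a key before setting it. A minimal sketch, assuming the algorithm object exposes `getParameters()` and `setParameters()` and that the returned parameter object supports `exists()`, `getValue()` and `setValue()`, as pyOpenMS algorithm classes such as `PeakPickerHiRes` do. The helper name `override_params` is hypothetical.

```python
def override_params(algorithm, overrides):
    """Apply overrides to an algorithm's parameters, rejecting unknown keys.

    Hypothetical helper. Returns a dict mapping each key to its
    (old_value, new_value) pair so the change can be logged.
    """
    params = algorithm.getParameters()
    changes = {}
    for key, new_value in overrides.items():
        if not params.exists(key):
            raise KeyError(f"unknown parameter: {key!r}")
        changes[key] = (params.getValue(key), new_value)
        params.setValue(key, new_value)
    # getParameters() hands back a copy, so write the result back explicitly
    algorithm.setParameters(params)
    return changes
```

With pyOpenMS installed, a call might look like `override_params(oms.PeakPickerHiRes(), {"signal_to_noise": 2.0})`. Failing loudly on an unknown key avoids the silent no-op that a misspelled parameter name would otherwise cause.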

Best Practices

Centroid data before feature detection: Most feature detection algorithms require centroided data. Apply PeakPickerHiRes to profile-mode data before running the feature finder; running feature detection on profile data produces poor results or fails entirely.
Check MS levels in your data: Use spec.getMSLevel() to verify whether your data contains MS1 only or both MS1 and MS2 spectra. Feature detection works on MS1 data; peptide identification requires MS2 spectra. Processing the wrong MS level produces empty results or errors.
Set mass tolerance based on your instrument: As a starting point, use 5-10 ppm for Orbitrap instruments and 20-50 ppm for TOF instruments. A tolerance that is too tight misses real features; one that is too wide creates false features from noise peaks or neighboring ions.
Align retention times before feature linking: Chromatographic drift between runs causes features to be missed during linking. When analyzing multiple samples, apply MapAlignmentAlgorithmPoseClustering before FeatureGroupingAlgorithmQT.
Export results to standard formats: Save feature maps as featureXML (FeatureXMLFile) and consensus maps as consensusXML (ConsensusXMLFile) for interoperability with other tools. For downstream statistics, export to CSV or convert the results to a pandas DataFrame.
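
To make the tolerance guidance concrete: a ppm tolerance is relative, so the absolute window grows with m/z. The following sketch converts a ppm tolerance into an absolute m/z window. The function names are illustrative and not part of pyOpenMS.

```python
def ppm_to_mz_delta(mz, ppm):
    """Absolute tolerance in m/z units for a relative tolerance in ppm."""
    return mz * ppm / 1e6


def mz_window(mz, ppm):
    """Lower and upper m/z bounds of the tolerance window around mz."""
    delta = ppm_to_mz_delta(mz, ppm)
    return mz - delta, mz + delta


def within_tolerance(observed_mz, theoretical_mz, ppm):
    """True if observed_mz lies within ppm of theoretical_mz."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_mz_delta(theoretical_mz, ppm)


# 10 ppm at m/z 500 corresponds to a window of +/- 0.005
low, high = mz_window(500.0, 10.0)
print(f"{low:.4f} to {high:.4f}")
```

The same 10 ppm at m/z 1500 is three times as wide in absolute terms, which is why a fixed absolute tolerance tends to be too loose at low m/z or too tight at high m/z.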

Common Issues

Feature detection finds zero features: Check that your data is centroided, not in profile mode, and that only MS1 spectra are passed to the feature finder. Also verify that signal_to_noise is not set too high for your data quality; for noisy data, try values around 0.5-1.0. Print spectrum statistics to confirm that peaks exist in the expected m/z and RT ranges.
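
One way to print those statistics is a small summary pass over the experiment. This is a sketch rather than a pyOpenMS utility: `summarize_spectra` is a hypothetical name, and it relies only on `getMSLevel()`, `getRT()` and `get_peaks()` (which returns the m/z and intensity arrays) on each spectrum.

```python
from collections import Counter


def summarize_spectra(spectra):
    """Summarize MS levels, RT range, m/z range and total peak count.

    `spectra` is any iterable of spectrum objects, for example a loaded
    pyOpenMS MSExperiment.
    """
    levels = Counter()
    total_peaks = 0
    rts, mz_lows, mz_highs = [], [], []
    for spec in spectra:
        levels[spec.getMSLevel()] += 1
        rts.append(spec.getRT())
        mz, _intensity = spec.get_peaks()
        if len(mz) > 0:  # empty spectra contribute no m/z range
            total_peaks += len(mz)
            mz_lows.append(float(min(mz)))
            mz_highs.append(float(max(mz)))
    return {
        "spectra_per_ms_level": dict(levels),
        "total_peaks": total_peaks,
        "rt_range": (min(rts), max(rts)) if rts else None,
        "mz_range": (min(mz_lows), max(mz_highs)) if mz_lows else None,
    }
```

Calling `print(summarize_spectra(exp))` on a loaded experiment shows at a glance whether MS1 spectra are present at all and whether their peaks fall in the ranges you expect.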

Retention time alignment fails: Alignment requires enough features in common between runs. If samples have very different compositions, alignment can fail or produce an unreliable transformation. Ensure at least some shared analytes exist across all runs, or manually specify landmark features for alignment.

Memory errors with large mzML files: High-resolution LC-MS files can be several gigabytes. Use oms.OnDiscMSExperiment, which reads spectra on demand from an indexed mzML file, instead of loading everything into RAM. Process spectra one at a time for operations that don't require random access.
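
The one-spectrum-at-a-time pattern can be sketched as a generator. The code below assumes an `OnDiscMSExperiment` that has already been opened with `openFile()` on an indexed mzML file and uses only its `getNrSpectra()` and `getSpectrum(index)` methods. The helper names `iter_spectra` and `total_ion_current` are hypothetical.

```python
def iter_spectra(on_disc):
    """Yield spectra one at a time from an opened on-disc experiment."""
    for index in range(on_disc.getNrSpectra()):
        yield on_disc.getSpectrum(index)


def total_ion_current(on_disc, ms_level=1):
    """Return (RT, summed intensity) pairs for one MS level.

    Only one spectrum is held in memory at any time.
    """
    tic = []
    for spec in iter_spectra(on_disc):
        if spec.getMSLevel() != ms_level:
            continue
        _mz, intensity = spec.get_peaks()
        tic.append((spec.getRT(), float(sum(intensity))))
    return tic
```

A typical call would create the object with `od = oms.OnDiscMSExperiment()`, open the file with `od.openFile("sample.mzML")`, and then pass `od` to `total_ion_current`. Memory use then scales with the largest single spectrum rather than with the file size.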
