Pro PyOpenMS

Production-ready skill for PyOpenMS, the Python interface to the OpenMS mass spectrometry library. Includes structured workflows, validation checks, and reusable patterns for scientific data analysis.

Skill | Cliptics | scientific | v1.0.0 | MIT

Process mass spectrometry data for proteomics and metabolomics using PyOpenMS, the Python bindings for the OpenMS library. This skill covers raw data handling, peak picking, feature detection, peptide identification, and building automated LC-MS/MS analysis workflows.

When to Use This Skill

Choose Pro PyOpenMS when you need to:

  • Read and manipulate raw mass spectrometry data (mzML, mzXML)
  • Perform peak picking, centroiding, and feature detection on LC-MS data
  • Build custom proteomics or metabolomics analysis pipelines
  • Process MS/MS spectra for peptide or metabolite identification

Consider alternatives when:

  • You need spectral library matching for metabolomics (use matchms)
  • You need statistical analysis of quantified features (use PyDESeq2 or limma)
  • You need GUI-based proteomics analysis (use MaxQuant or FragPipe)

Quick Start

    pip install pyopenms

    import pyopenms as oms

    # Load an mzML file
    exp = oms.MSExperiment()
    oms.MzMLFile().load("sample.mzML", exp)

    print(f"Spectra: {exp.getNrSpectra()}")
    print(f"Chromatograms: {exp.getNrChromatograms()}")

    # Access individual spectra
    for i, spec in enumerate(exp):
        if i >= 3:
            break
        print(f"Spectrum {i}: MS{spec.getMSLevel()}, "
              f"RT={spec.getRT():.1f}s, "
              f"Peaks={spec.size()}")

Core Concepts

Data Processing Pipeline

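| Step              | Class                               | Description                 |
|-------------------|-------------------------------------|-----------------------------|
| Load data         | MzMLFile                            | Read raw MS data            |
| Centroiding       | PeakPickerHiRes                     | Convert profile to centroid |
| Smoothing         | GaussFilter                         | Noise reduction             |
| Feature detection | FeatureFinder                       | Find LC-MS features         |
| Map alignment     | MapAlignmentAlgorithmPoseClustering | Retention time correction   |
| Feature linking   | FeatureGroupingAlgorithmQT          | Match features across runs  |
| ID search         | SimpleSearchEngine                  | Peptide identification      |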

Feature Detection Workflow

    import pyopenms as oms

    def detect_features(mzml_path):
        """Detect features from LC-MS data."""
        # Load data
        exp = oms.MSExperiment()
        oms.MzMLFile().load(mzml_path, exp)

        # Peak picking (profile → centroid)
        # Note: call signatures for pickExperiment() and FeatureFinder.run()
        # vary between pyOpenMS releases; check your installed version.
        picker = oms.PeakPickerHiRes()
        picked_exp = oms.MSExperiment()
        picker.pickExperiment(exp, picked_exp)
        # Refresh the m/z, RT, and intensity ranges before feature detection
        picked_exp.updateRanges()

        # Feature detection
        ff = oms.FeatureFinder()
        ff_params = oms.FeatureFinderAlgorithmPicked().getDefaults()
        features = oms.FeatureMap()
        seeds = oms.FeatureMap()  # empty map: no user-supplied seeds
        ff.run("centroided", picked_exp, features, ff_params, seeds)
        features.setUniqueIds()
        print(f"Detected {features.size()} features")

        # Extract feature information for the first ten features
        for i in range(min(10, features.size())):
            f = features[i]
            print(f"  m/z={f.getMZ():.4f}, RT={f.getRT():.1f}s, "
                  f"intensity={f.getIntensity():.0f}, "
                  f"charge={f.getCharge()}")

        return features

    features = detect_features("sample.mzML")

Multi-Sample Analysis

    import pyopenms as oms
    from pathlib import Path

    def multi_sample_analysis(mzml_files):
        """Process multiple LC-MS runs with alignment and linking."""
        feature_maps = []

        for mzml_path in mzml_files:
            # Load and pick peaks
            exp = oms.MSExperiment()
            oms.MzMLFile().load(str(mzml_path), exp)
            picker = oms.PeakPickerHiRes()
            picked = oms.MSExperiment()
            picker.pickExperiment(exp, picked)
            picked.updateRanges()

            # Detect features
            ff = oms.FeatureFinder()
            features = oms.FeatureMap()
            seeds = oms.FeatureMap()
            ff.run("centroided", picked, features,
                   oms.FeatureFinderAlgorithmPicked().getDefaults(), seeds)
            features.setUniqueIds()
            feature_maps.append(features)
            print(f"{Path(mzml_path).name}: {features.size()} features")

        # Alignment and linking need at least two runs
        if len(feature_maps) < 2:
            raise ValueError("Need at least two mzML files")

        # Align retention times, using the first run as the reference
        aligner = oms.MapAlignmentAlgorithmPoseClustering()
        aligner.setReference(feature_maps[0])
        for fmap in feature_maps[1:]:
            trafo = oms.TransformationDescription()
            aligner.align(fmap, trafo)
            # Apply the fitted transformation; True keeps the original RT as a meta value
            oms.MapAlignmentTransformer().transformRetentionTimes(fmap, trafo, True)

        # Link features across samples
        linker = oms.FeatureGroupingAlgorithmQT()
        consensus = oms.ConsensusMap()
        linker.group(feature_maps, consensus)
        print(f"\nConsensus features: {consensus.size()}")

        return consensus

    files = sorted(Path("data/").glob("*.mzML"))
    consensus = multi_sample_analysis(files)

Configuration

| Parameter         | Description                                 | Default   |
|-------------------|---------------------------------------------|-----------|
| signal_to_noise   | S/N threshold for peak picking              | 1.0       |
| mass_tolerance    | m/z tolerance for feature detection (ppm)   | 10        |
| rt_tolerance      | Retention time tolerance (seconds)          | 30        |
| min_trace_length  | Minimum chromatographic trace length        | 5         |
| max_charge        | Maximum charge state to consider            | 5         |
| isotope_filtering | Isotope pattern filtering stringency        | "relaxed" |
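
Because mass_tolerance is expressed in ppm, the absolute m/z window it allows grows with the mass. The following is a minimal sketch of that conversion in plain Python. The helper names ppm_to_da and within_ppm are illustrative only and are not part of pyOpenMS.

```python
def ppm_to_da(mz, ppm):
    """Convert a relative tolerance in ppm to an absolute m/z tolerance."""
    return mz * ppm / 1e6


def within_ppm(observed_mz, theoretical_mz, ppm):
    """Return True if observed_mz lies within +/- ppm of theoretical_mz."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_da(theoretical_mz, ppm)


# The same 10 ppm tolerance is a wider absolute window at higher m/z
for mz in (200.0, 500.0, 1500.0):
    print(f"m/z {mz:7.1f}: +/- {ppm_to_da(mz, 10):.4f}")
```

At m/z 500, a 10 ppm tolerance corresponds to a window of about 0.005 on either side, which is worth keeping in mind when comparing settings across instruments.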

Best Practices

  1. Centroid data before feature detection — Most feature detection algorithms require centroided data. Apply PeakPickerHiRes to profile-mode data before running FeatureFinder. Running feature detection on profile data produces poor results or fails entirely.

  2. Check MS levels in your data — Verify whether your data contains MS1 only or MS1+MS2 spectra with spec.getMSLevel(). Feature detection works on MS1 data; peptide identification requires MS2 spectra. Processing the wrong MS level produces empty results.

  3. Set mass tolerance based on your instrument — Use 5-10 ppm for Orbitrap instruments and 20-50 ppm for TOF instruments. Setting tolerance too tight misses real features; too wide creates false features from noise peaks or neighboring ions.

  4. Align retention times before feature linking — Chromatographic drift between runs causes features to be missed during linking. Always apply MapAlignmentAlgorithmPoseClustering before FeatureGroupingAlgorithmQT when analyzing multiple samples.

  5. Export results to standard formats — Save feature maps as featureXML and consensus maps as consensusXML for interoperability with other tools. For downstream statistics, export to CSV with TextFile writer or convert to pandas DataFrame.
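
The export step in item 5 could look roughly like the sketch below. It writes a CSV with the standard library and stores featureXML through FeatureXMLFile. The function names are hypothetical, and the CSV helper assumes the feature map can be iterated directly; if your pyOpenMS version does not support that, index with features[i] up to features.size() instead.

```python
import csv


def features_to_csv(features, csv_path):
    """Write one CSV row per feature and return the number of rows written.

    `features` is any iterable of objects exposing getMZ(), getRT(),
    getIntensity(), and getCharge() (assumed here to include a FeatureMap).
    """
    n_rows = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["mz", "rt", "intensity", "charge"])
        for f in features:
            writer.writerow([f.getMZ(), f.getRT(), f.getIntensity(), f.getCharge()])
            n_rows += 1
    return n_rows


def save_feature_xml(features, xml_path):
    """Store a FeatureMap as featureXML (requires pyopenms)."""
    import pyopenms as oms  # imported lazily so the CSV helper works without it

    oms.FeatureXMLFile().store(xml_path, features)
```

The resulting CSV loads directly into pandas or R for downstream statistics, while the featureXML file keeps the full feature metadata for other OpenMS-based tools.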

Common Issues

Feature detection finds zero features — Check that your data is centroided (not profile mode). Also verify that signal_to_noise isn't too high for your data quality. Lower the threshold to 0.5-1.0 for noisy data. Print spectrum statistics to confirm peaks exist in the expected m/z and RT ranges.
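
One way to print the spectrum statistics mentioned above is a small summary pass over the loaded experiment. This is a sketch only: summarize_spectra is a hypothetical helper, and it relies solely on the getMSLevel(), getRT(), and size() calls already used in the Quick Start.

```python
from collections import Counter


def summarize_spectra(spectra):
    """Summarize MS levels, empty spectra, and RT range for an iterable of spectra."""
    levels = Counter()
    empty = 0
    rts = []
    for spec in spectra:
        levels[spec.getMSLevel()] += 1
        if spec.size() == 0:
            empty += 1  # spectra without peaks cannot contribute features
        rts.append(spec.getRT())
    return {
        "ms_levels": dict(levels),
        "empty_spectra": empty,
        "rt_range": (min(rts), max(rts)) if rts else None,
    }


# Usage with a loaded experiment (requires pyopenms and a real file):
#   exp = oms.MSExperiment()
#   oms.MzMLFile().load("sample.mzML", exp)
#   print(summarize_spectra(exp))
```

If the summary shows no MS1 spectra, or mostly empty spectra, the problem lies in the input data rather than in the feature detection parameters.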

Retention time alignment fails — Alignment requires sufficient common features between runs. If samples have very different compositions, alignment will fail. Ensure at least some shared analytes exist across all runs, or manually specify landmark features for alignment.
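
Before aligning, it can help to estimate how many features two runs have in common. The sketch below is a rough pre-check in plain Python, not the matching that the OpenMS alignment algorithms perform. It assumes each run has been reduced to a list of (m/z, RT) tuples, and count_shared_features is a hypothetical name.

```python
def count_shared_features(run_a, run_b, ppm=10.0, rt_tol=30.0):
    """Count features in run_a that have at least one match in run_b.

    Each run is a list of (mz, rt) tuples. A match requires m/z within
    `ppm` and retention time within `rt_tol` seconds.
    """
    shared = 0
    for mz_a, rt_a in run_a:
        for mz_b, rt_b in run_b:
            mz_ok = abs(mz_a - mz_b) <= mz_a * ppm / 1e6
            rt_ok = abs(rt_a - rt_b) <= rt_tol
            if mz_ok and rt_ok:
                shared += 1
                break  # count each run_a feature at most once
    return shared


run_a = [(500.0, 100.0), (600.0, 200.0), (700.0, 300.0)]
run_b = [(500.002, 110.0), (600.0, 400.0)]
print(count_shared_features(run_a, run_b))
```

The nested loop is quadratic, so it suits a quick diagnostic on a subset of intense features rather than full feature maps. Because the runs are not yet aligned, a wider rt_tol than the linking tolerance may be appropriate here.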

Memory errors with large mzML files — High-resolution LC-MS files can be several gigabytes. Use oms.OnDiscMSExperiment for memory-mapped access instead of loading everything into RAM. Process spectra one at a time for operations that don't require random access.
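
A sketch of one-at-a-time processing with OnDiscMSExperiment follows. It assumes the mzML file is indexed, which on-disk access generally needs, and the helper names are hypothetical. The pyopenms import is deferred so that the iteration helpers can be exercised without the library installed.

```python
def open_on_disc(mzml_path):
    """Open an indexed mzML file without loading all spectra into RAM."""
    import pyopenms as oms  # imported lazily; requires pyopenms

    od_exp = oms.OnDiscMSExperiment()
    if not od_exp.openFile(str(mzml_path)):
        raise IOError(f"Could not open {mzml_path}; is it an indexed mzML file?")
    return od_exp


def iter_spectra(od_exp):
    """Yield spectra one at a time so only one is held in memory."""
    for i in range(od_exp.getNrSpectra()):
        yield od_exp.getSpectrum(i)


def count_ms1_peaks(od_exp):
    """Example streaming pass: total number of peaks across MS1 spectra."""
    return sum(s.size() for s in iter_spectra(od_exp) if s.getMSLevel() == 1)


# Usage (requires pyopenms and a real file):
#   od_exp = open_on_disc("large_run.mzML")
#   print(count_ms1_peaks(od_exp))
```

Any per-spectrum computation can replace count_ms1_peaks, as long as it does not need to keep earlier spectra around.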
