PathML Kit
A computational pathology toolkit for analyzing whole-slide images. Includes structured workflows, validation checks, and reusable patterns for scientific work.
Process and analyze whole-slide pathology images using PathML, a Python toolkit for computational pathology. This skill covers slide preprocessing, tissue detection, tile extraction, stain normalization, feature extraction, and machine learning workflows for digital pathology.
When to Use This Skill
Choose PathML Kit when you need to:
- Preprocess whole-slide images (WSI) for machine learning pipelines
- Extract and normalize tissue tiles from H&E or IHC-stained slides
- Build computational pathology workflows with consistent preprocessing
- Apply pre-trained pathology models or train custom classifiers on slide data
Consider alternatives when:
- You need radiology image analysis (use MONAI or TorchIO)
- You need basic image processing without pathology context (use scikit-image or OpenCV)
- You need manual slide annotation without automation (use QuPath)
Quick Start
```bash
# Install PathML
pip install pathml
```
```python
from pathml.core import SlideData
from pathml.preprocessing import Pipeline, BoxBlur, TissueDetectionHE

# Load a whole-slide image
slide = SlideData("tumor_sample.svs", name="tumor_001")

# Note: the backend attributes printed below may differ between
# PathML versions and slide backends
print(f"Slide dimensions: {slide.slide.dimensions}")
print(f"Magnification: {slide.slide.magnification}")
print(f"Num levels: {slide.slide.level_count}")

# Create preprocessing pipeline
pipeline = Pipeline([
    BoxBlur(kernel_size=15),
    TissueDetectionHE(
        mask_name="tissue",
        min_region_size=5000,
        threshold=30,
    ),
])

# Run pipeline on the slide
slide.run(pipeline, tile_size=256, level=0)
print(f"Extracted {len(slide.tiles)} tissue tiles")
```
Core Concepts
Pipeline Components
| Component | Purpose | Parameters |
|---|---|---|
| TissueDetectionHE | Detect tissue in H&E slides | threshold, min_region_size |
| StainNormalizationHE | Normalize H&E staining variations | target, stain_estimation_method |
| BoxBlur | Box (mean) filter smoothing | kernel_size |
| BinaryThreshold | Binary mask creation | threshold |
| MorphOpen/MorphClose | Morphological operations | kernel_size |
| TileExtractor | Extract tiles from regions | tile_size, stride |
| ForegroundDetection | General foreground segmentation | min_area |
Complete Pathology Workflow
```python
from pathml.core import SlideData
from pathml.preprocessing import Pipeline
from pathml.preprocessing.transforms import (
    TissueDetectionHE,
    StainNormalizationHE,
    MedianBlur,
)
import numpy as np


def process_slide(slide_path, tile_size=256, target_mpp=0.5):
    """Full pathology preprocessing pipeline."""
    slide = SlideData(slide_path)

    # Choose an appropriate level for the target resolution.
    # Assumes the backend exposes `mpp` and that each pyramid level halves
    # the resolution; otherwise level 0 (full resolution) is used.
    level = 0
    if hasattr(slide.slide, "mpp") and slide.slide.mpp:
        scale = target_mpp / slide.slide.mpp
        level = int(np.log2(scale)) if scale > 1 else 0

    # Build preprocessing pipeline
    pipeline = Pipeline([
        MedianBlur(kernel_size=5),
        TissueDetectionHE(
            mask_name="tissue",
            min_region_size=10000,
            threshold=25,
        ),
        # StainNormalizationHE expects target="normalize", "hematoxylin", or
        # "eosin" rather than a path to a reference slide.
        StainNormalizationHE(target="normalize", stain_estimation_method="macenko"),
    ])

    # Run pipeline (overwrite tiles from any previous run)
    slide.run(
        pipeline,
        tile_size=tile_size,
        level=level,
        overwrite_existing_tiles=True,
    )

    # Keep tiles with sufficient tissue content
    tissue_tiles = []
    for tile in slide.tiles:
        mask = tile.masks.get("tissue", None)
        if mask is not None:
            # count_nonzero works whether the mask stores 0/1 or 0/255
            tissue_fraction = np.count_nonzero(mask) / mask.size
            if tissue_fraction > 0.5:  # >50% tissue
                tissue_tiles.append(tile)

    print(f"Total tiles: {len(slide.tiles)}")
    print(f"Tissue tiles (>50%): {len(tissue_tiles)}")
    return slide, tissue_tiles


slide, tiles = process_slide("specimen.svs")
```
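The level choice in `process_slide` assumes that each pyramid level halves the resolution of the level below it. As a rough check of that arithmetic, the sketch below isolates the calculation. `level_for_target_mpp` is a hypothetical helper, not part of PathML, and real slides often use other downsample steps (4x is common in .svs files), so the slide's actual per-level downsample factors should be preferred when they are available.

```python
import math


def level_for_target_mpp(base_mpp, target_mpp):
    """Return the coarsest level whose pixel size does not exceed target_mpp.

    Hypothetical helper: assumes level k has a pixel size of base_mpp * 2**k.
    """
    if base_mpp <= 0 or target_mpp <= 0:
        raise ValueError("mpp values must be positive")
    scale = target_mpp / base_mpp
    # Target is at or below the base pixel size: stay at full resolution.
    if scale <= 1:
        return 0
    # floor(log2(scale)) never overshoots the requested pixel size.
    return int(math.floor(math.log2(scale)))


# A slide scanned at 0.25 microns per pixel (roughly 40x)
print(level_for_target_mpp(0.25, 0.5))   # one halving step
print(level_for_target_mpp(0.25, 1.0))   # two halving steps
print(level_for_target_mpp(0.25, 0.75))  # rounds down to the finer level
```

Rounding down keeps the selected level at least as fine as the requested resolution; tiles can then be resized to the exact target if needed.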
Feature Extraction for ML
```python
import torch
import torchvision.models as models
import torchvision.transforms as T
import numpy as np


def extract_tile_features(tiles, model_name="resnet50"):
    """Extract deep learning features from tissue tiles."""
    # Load pre-trained model (torchvision >= 0.13 uses `weights`; older
    # releases used the now-deprecated `pretrained=True`)
    model = getattr(models, model_name)(weights="DEFAULT")
    model = torch.nn.Sequential(*list(model.children())[:-1])  # Remove FC
    model.eval()

    # ImageNet preprocessing expected by the pre-trained weights
    transform = T.Compose([
        T.ToPILImage(),
        T.Resize(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])

    features = []
    with torch.no_grad():
        for tile in tiles:
            img = tile.image  # expected: H x W x 3 uint8 array
            tensor = transform(img).unsqueeze(0)
            feat = model(tensor).squeeze().numpy()
            features.append(feat)

    feature_matrix = np.stack(features)
    print(f"Feature matrix: {feature_matrix.shape}")
    return feature_matrix


# Extract features for downstream classification
features = extract_tile_features(tiles)
```
Configuration
| Parameter | Description | Default |
|---|---|---|
| tile_size | Tile dimensions in pixels | 256 |
| level | Pyramid level for processing | 0 (highest resolution) |
| stride | Tile extraction stride | Equal to tile_size |
| tissue_threshold | Minimum tissue fraction per tile | 0.5 |
| stain_method | Normalization method (Macenko, Vahadane) | "macenko" |
| target_mpp | Target microns per pixel | 0.5 |
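To illustrate how `tile_size`, `stride`, and `tissue_threshold` interact, the sketch below enumerates tile origins over a binary tissue mask and keeps the tiles whose tissue fraction exceeds the threshold. It uses plain NumPy rather than PathML, and the function name `select_tissue_tiles` and the toy mask are made up for illustration.

```python
import numpy as np


def select_tissue_tiles(mask, tile_size=256, stride=None, tissue_threshold=0.5):
    """Return (x, y, tissue_fraction) for tiles above the tissue threshold.

    mask: 2D boolean or 0/1 array (rows = y, columns = x) at the tiling resolution.
    """
    stride = tile_size if stride is None else stride
    height, width = mask.shape
    kept = []
    # Only full tiles are considered; partial tiles at the right and
    # bottom edges are skipped.
    for y in range(0, height - tile_size + 1, stride):
        for x in range(0, width - tile_size + 1, stride):
            window = mask[y:y + tile_size, x:x + tile_size]
            fraction = float(window.mean())
            if fraction > tissue_threshold:
                kept.append((x, y, fraction))
    return kept


# Toy 8x8 mask with tissue in the left half only
mask = np.zeros((8, 8), dtype=bool)
mask[:, :4] = True

non_overlapping = select_tissue_tiles(mask, tile_size=4)        # stride == tile_size
overlapping = select_tissue_tiles(mask, tile_size=4, stride=2)  # 50% overlap
print(non_overlapping)
print(len(overlapping))
```

The returned coordinates are in the mask's own resolution. If the mask was computed at a lower pyramid level, they would need to be multiplied by that level's downsample factor to address the full-resolution slide.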
Best Practices
- Start at a lower resolution for tissue detection: Run tissue detection at level 2 or 3 (lower resolution) to save memory and time, then apply the tissue mask to extract tiles at full resolution. Tissue boundaries don't need pixel-level precision.
- Normalize staining before feature extraction: H&E staining intensity varies significantly between labs and even between slides from the same lab. Apply Macenko or Vahadane stain normalization to a consistent reference before training ML models. Without normalization, models learn staining variation rather than morphology.
- Filter tiles by tissue content: Many tiles from slide edges contain mostly background (white space). Set a minimum tissue fraction threshold (50-70%) to exclude low-information tiles. This reduces dataset size and prevents the model from learning to classify background.
- Use multiple instance learning for slide-level labels: Most clinical labels apply to the entire slide, not individual tiles. Use MIL (multiple instance learning) approaches like attention-based pooling to aggregate tile-level features into slide-level predictions.
- Store tile coordinates for spatial analysis: When extracting tiles, record the (x, y) coordinates of each tile in the original slide. This enables spatial analysis, heatmap generation, and mapping predictions back to specific regions of the slide.
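The attention-based pooling mentioned in the list above can be sketched in a few lines of NumPy. This shows only the forward pass of the aggregation step (a tanh attention layer followed by a softmax over tiles). The weights here are random placeholders, whereas in practice they would be learned jointly with a classifier, and the names `attention_pool`, `V`, and `w` are illustrative rather than taken from any library.

```python
import numpy as np


def attention_pool(features, V, w):
    """Aggregate tile features (n_tiles, d) into one slide vector (d,).

    Attention score per tile: w^T tanh(V^T h_i); weights: softmax over tiles.
    """
    scores = np.tanh(features @ V) @ w       # shape (n_tiles,)
    scores = scores - scores.max()           # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    slide_vector = weights @ features        # weighted average of tile features
    return slide_vector, weights


rng = np.random.default_rng(0)
n_tiles, d, hidden = 50, 2048, 128
features = rng.normal(size=(n_tiles, d))       # stand-in for extracted tile features
V = rng.normal(scale=0.01, size=(d, hidden))   # placeholder attention parameters
w = rng.normal(scale=0.01, size=hidden)

slide_vector, weights = attention_pool(features, V, w)
print(slide_vector.shape, weights.shape)
```

Because there is one attention weight per tile, pairing the weights with stored tile coordinates is one way to draw a heatmap of the regions that contribute most to a slide-level prediction.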
Common Issues
Slide loading fails with "unsupported format": PathML relies on OpenSlide for reading WSI formats (.svs, .ndpi, .mrxs). Install the OpenSlide system library with brew install openslide (macOS) or apt-get install openslide-tools (Linux). If the format is truly unsupported, convert the slide to TIFF first using Bio-Formats.
Memory errors on large whole-slide images: WSIs at full resolution can be 100,000+ pixels wide. Never load the entire slide at once. Use PathML's tile-based processing pipeline, which loads and processes one tile at a time, or work at a lower pyramid level for initial analysis.
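A quick back-of-the-envelope calculation shows why loading a full-resolution slide is impractical. The sketch below assumes uncompressed 8-bit RGB pixels and a square 100,000-pixel slide; `uncompressed_size_gb` is a hypothetical helper used only for this estimate.

```python
def uncompressed_size_gb(width, height, channels=3, bytes_per_channel=1):
    """Size in gigabytes (10**9 bytes) of a raw image held fully in memory."""
    return width * height * channels * bytes_per_channel / 1e9


full_slide_gb = uncompressed_size_gb(100_000, 100_000)  # whole slide at level 0
tile_mb = uncompressed_size_gb(256, 256) * 1000         # one 256x256 tile, in MB

print(f"Full slide: {full_slide_gb:.1f} GB")
print(f"Single tile: {tile_mb:.3f} MB")
```

Under these assumptions the full slide needs about 30 GB of memory, while a single 256x256 tile needs about 0.2 MB, which is why tile-at-a-time processing scales to slides of this size.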
Stain normalization changes tissue appearance dramatically: If the normalized output looks wrong (inverted colors, purple tissue turning blue), verify that the reference slide has typical H&E staining. The normalization target must be a representative, high-quality slide. Also check that the input slide is actually H&E, not IHC or a special stain.