Pro PyHealth
Build healthcare AI models using PyHealth, a comprehensive Python library for clinical machine learning. This skill covers patient data processing, clinical prediction tasks, model training, evaluation, and deployment for electronic health record (EHR) analysis.
When to Use This Skill
Choose Pro PyHealth when you need to:
- Build predictive models from electronic health record (EHR) data
- Handle clinical coding systems (ICD, CPT, ATC, NDC, LOINC)
- Train models for clinical tasks (mortality, readmission, drug recommendation)
- Process temporal patient data with visit-level and event-level representations
Consider alternatives when:
- You need medical image analysis (use MONAI or TorchXRayVision)
- You need clinical NLP and text extraction (use MedSpaCy or SciSpaCy)
- You need real-time clinical decision support systems (use dedicated CDSS platforms)
Quick Start
```bash
pip install pyhealth
```

```python
from pyhealth.datasets import MIMIC3Dataset
from pyhealth.tasks import mortality_prediction_mimic3_fn
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer

# Load MIMIC-III dataset
dataset = MIMIC3Dataset(
    root="/path/to/mimic3",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    code_mapping={"ICD9CM": "CCSCM", "ICD9PROC": "CCSPROC", "NDC": "ATC"},
)

# Set up prediction task
mimic3_ds = dataset.set_task(mortality_prediction_mimic3_fn)

# Split data
train_ds, val_ds, test_ds = mimic3_ds.split(ratios=[0.8, 0.1, 0.1], seed=42)

# Build and train model
model = Transformer(
    dataset=train_ds,
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="binary",
)
trainer = Trainer(model=model, metrics=["pr_auc", "roc_auc", "f1"])
trainer.train(
    train_dataloader=train_ds.get_dataloader(batch_size=64),
    val_dataloader=val_ds.get_dataloader(batch_size=64),
    epochs=20,
)
```
Core Concepts
Supported Data Sources and Tasks
| Dataset | Source | Patients | Tasks Available |
|---|---|---|---|
| MIMIC-III | Beth Israel | 46K | Mortality, readmission, LOS, drug rec |
| MIMIC-IV | Beth Israel | 315K | Same as MIMIC-III + more |
| eICU | Philips | 200K | ICU mortality, LOS prediction |
| OMOP | Various | Custom | Configurable clinical tasks |

| Task | Type | Clinical Use |
|---|---|---|
| Mortality prediction | Binary classification | ICU risk stratification |
| Readmission prediction | Binary classification | Discharge planning |
| Length of stay | Regression / Multi-class | Resource planning |
| Drug recommendation | Multi-label classification | Medication management |
| Diagnosis prediction | Multi-label classification | Clinical decision support |
Custom Clinical Task Definition
```python
def custom_readmission_task(patient):
    """Define 30-day readmission prediction task."""
    samples = []
    visits = patient.visits
    for i in range(len(visits) - 1):
        current_visit = visits[i]
        next_visit = visits[i + 1]

        # Calculate days between visits
        days_gap = (next_visit.encounter_time - current_visit.encounter_time).days

        # Features from current visit
        conditions = current_visit.get_code_list("DIAGNOSES_ICD")
        procedures = current_visit.get_code_list("PROCEDURES_ICD")
        drugs = current_visit.get_code_list("PRESCRIPTIONS")

        # Label: readmitted within 30 days
        label = 1 if days_gap <= 30 else 0

        samples.append({
            "visit_id": current_visit.visit_id,
            "patient_id": patient.patient_id,
            "conditions": conditions,
            "procedures": procedures,
            "drugs": drugs,
            "label": label,
        })
    return samples

# Apply custom task
task_dataset = dataset.set_task(custom_readmission_task)
print(f"Samples: {len(task_dataset)}")
print(f"Positive rate: {sum(s['label'] for s in task_dataset) / len(task_dataset):.2%}")
```
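The windowing and labeling logic above can be checked independently of PyHealth. The sketch below uses a hypothetical `label_readmissions` helper over plain `datetime` values to show how the 30-day gap rule assigns labels to every visit except the last:

```python
from datetime import datetime

def label_readmissions(visit_times, window_days=30):
    """Label each visit (except the last) 1 if the next visit
    occurs within `window_days` days, else 0."""
    labels = []
    for current, nxt in zip(visit_times, visit_times[1:]):
        gap = (nxt - current).days
        labels.append(1 if gap <= window_days else 0)
    return labels

visits = [datetime(2023, 1, 1), datetime(2023, 1, 20), datetime(2023, 6, 1)]
print(label_readmissions(visits))  # [1, 0]
```

Note that a patient with N visits yields N-1 samples, so single-visit patients contribute nothing to this task.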
Configuration
| Parameter | Description | Default |
|---|---|---|
| `tables` | EHR tables to load | `["DIAGNOSES_ICD"]` |
| `code_mapping` | Code system transformations | `{}` |
| `feature_keys` | Input features for the model | Task-dependent |
| `label_key` | Target variable column | `"label"` |
| `mode` | Prediction type (binary, multi-label, multiclass) | `"binary"` |
| `batch_size` | Training batch size | 64 |
| `epochs` | Number of training epochs | 20 |
| `learning_rate` | Optimizer learning rate | 1e-3 |
Best Practices
- **Use code mapping to standardize clinical codes** — Map granular codes (10,000+ ICD-9 codes) to clinical groupings (CCS categories, ~300 groups) with `code_mapping`. This reduces vocabulary size, improves generalization, and makes results clinically interpretable.
- **Handle class imbalance in clinical tasks** — Most clinical prediction tasks are heavily imbalanced (e.g., 5% mortality rate). Use PR-AUC instead of ROC-AUC as the primary metric, apply class weighting in the loss function, or oversample the minority class.
- **Split by patient, not by visit** — Always split data so all visits from one patient appear in the same split. Random visit-level splitting leaks information (a patient's earlier visit in training helps predict their later visit in test), producing optimistically biased metrics.
- **Start with simple models before complex architectures** — Logistic regression or GRU models often perform within a few percentage points of Transformer models on tabular EHR data. Establish baselines with simpler models before investing in complex architectures.
- **Report calibration alongside discrimination** — A model with high AUC but poor calibration gives overconfident or underconfident predictions. Use calibration plots and Brier scores to assess whether predicted probabilities match observed event rates, which is critical for clinical decision-making.
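The class-weighting remedy mentioned above can be sketched as inverse-frequency weights. The `class_weights` helper below is hypothetical (not a PyHealth API); its output would typically be passed to a weighted loss function:

```python
def class_weights(labels):
    """Inverse-frequency weights for a binary label list: each class
    is weighted so that rare positives count more in the loss."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    # Each class weighted inversely to its frequency, normalized so
    # a perfectly balanced dataset yields weight 1.0 for both classes
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}

labels = [0] * 95 + [1] * 5  # ~5% positive rate, as in ICU mortality
print(class_weights(labels))  # positives weighted ~19x more than negatives
```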
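The patient-level splitting rule above can be illustrated in plain Python. This is a simplified stand-in for the library's own splitting (names like `split_by_patient` here are illustrative, not PyHealth's API):

```python
import random

def split_by_patient(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split visit-level samples so every visit from a given
    patient lands in exactly one split."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_train = int(len(patient_ids) * ratios[0])
    n_val = int(len(patient_ids) * ratios[1])
    train_ids = set(patient_ids[:n_train])
    val_ids = set(patient_ids[n_train:n_train + n_val])
    train = [s for s in samples if s["patient_id"] in train_ids]
    val = [s for s in samples if s["patient_id"] in val_ids]
    test = [s for s in samples if s["patient_id"] not in train_ids | val_ids]
    return train, val, test

# 10 patients with 3 visits each
samples = [{"patient_id": p, "visit_id": v} for p in range(10) for v in range(3)]
train, val, test = split_by_patient(samples)
# No patient's visits are scattered across splits
assert not ({s["patient_id"] for s in train} & {s["patient_id"] for s in test})
```

Shuffling patient IDs rather than samples is the key design choice: a visit-level shuffle would silently place the same patient on both sides of the split.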
Common Issues
**Dataset loading fails with memory errors** — MIMIC-III/IV datasets are large. Use the `tables` parameter to load only the tables you need rather than all tables. For MIMIC-IV with 300K+ patients, consider loading a subset first with patient ID filtering for development.
**Code mapping produces empty feature lists** — Not all clinical codes map successfully to target ontologies. Check the mapping coverage with `dataset.stat()` and handle unmapped codes by either keeping the original code or filtering out visits with no mapped codes.
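The two fallback strategies for unmapped codes can be sketched in plain Python. The mini-mapping here is a hypothetical stand-in for the real cross-walks shipped with PyHealth:

```python
# Hypothetical mini cross-walk from ICD-9 codes to CCS groups;
# real mappings are far larger and come from the library's code
# mapping resources, not from hand-written dictionaries.
icd9_to_ccs = {"4280": "108", "25000": "49"}

def map_codes(codes, mapping, keep_unmapped=False):
    """Map codes to the target ontology; optionally keep the
    original code when no mapping exists, instead of dropping it."""
    mapped = []
    for code in codes:
        if code in mapping:
            mapped.append(mapping[code])
        elif keep_unmapped:
            mapped.append(code)
    return mapped

visit_codes = ["4280", "9999"]  # "9999" has no CCS mapping
print(map_codes(visit_codes, icd9_to_ccs))                      # ['108']
print(map_codes(visit_codes, icd9_to_ccs, keep_unmapped=True))  # ['108', '9999']
```

Dropping unmapped codes keeps the vocabulary clean but can empty a visit's feature list entirely, which is exactly the failure mode described above.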
**Model performance is surprisingly high** — Suspiciously high AUC (>0.95) often indicates data leakage. Check that: (1) future data isn't used as features, (2) the label itself isn't included in features, (3) patient-level splitting is used, and (4) the task definition doesn't inadvertently include the outcome.