Ultimate Post-Training Framework
End-to-end post-training framework covering the full lifecycle from data curation through SFT, alignment (DPO/PPO/GRPO), safety tuning, evaluation, and deployment — with automated pipeline management and quality gates.
When to Use
Use this framework when:
- You need a production-grade, end-to-end post-training system
- You are managing multiple training stages with quality checkpoints
- You are building reproducible training pipelines across model families
- Your team is collaborating on alignment research with shared infrastructure
Use simpler tools when:
- Single-stage training (just SFT or just DPO) → use TRL
- Quick experiments without pipeline overhead → use TRL or torchforge
- Enterprise MoE-specific training → use Miles
Quick Start
Define Your Pipeline
```yaml
# pipeline.yaml
name: "llama3-production-alignment"
base_model: "meta-llama/Llama-3.1-8B"
output_dir: "./training-outputs"

stages:
  sft:
    type: supervised_fine_tuning
    dataset: "your-org/instruction-data"
    config:
      learning_rate: 2e-5
      epochs: 3
      max_seq_len: 4096
      lora_r: 16

  alignment:
    type: dpo
    dataset: "your-org/preference-data"
    depends_on: sft
    config:
      beta: 0.1
      learning_rate: 5e-7
      epochs: 1

  safety:
    type: constitutional_ai
    dataset: "your-org/safety-data"
    depends_on: alignment
    config:
      principles: ["helpful", "harmless", "honest"]

  evaluation:
    type: benchmark
    depends_on: safety
    benchmarks: [mt_bench, alpaca_eval, safety_bench, mmlu]
    gates:
      mt_bench: ">= 7.5"
      safety_pass_rate: ">= 0.95"
      mmlu: ">= 0.65"
```
Execute
```shell
# Run the full pipeline
post-train run pipeline.yaml

# Resume from the alignment stage
post-train run pipeline.yaml --resume alignment

# Dry run (validate config without training)
post-train validate pipeline.yaml
```
Core Concepts
Pipeline Architecture
```
Data Curation → SFT → Alignment → Safety → Evaluation → Deployment
      |          |        |          |          |
      v          v        v          v          v
   Filters   LoRA/Full  DPO/PPO   ConstitAI  Benchmarks
   Dedup     Checkpts   GRPO      Red-team   Quality gates
   Quality              REINFORCE Filtering  A/B testing
```
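The `depends_on` fields in the pipeline config form a dependency graph, so stage execution order can be derived with a standard topological sort. A minimal sketch (function and dict shapes are illustrative, not the framework's actual API), using the standard-library `graphlib`:

```python
from graphlib import TopologicalSorter

def stage_order(stages: dict) -> list:
    """Return stage names in an order that respects depends_on links."""
    ts = TopologicalSorter()
    for name, spec in stages.items():
        dep = spec.get("depends_on")
        # A stage with no dependency is a root; otherwise add its predecessor.
        ts.add(name, *([dep] if dep else []))
    return list(ts.static_order())
```

For the linear chain in the Quick Start config this yields `sft → alignment → safety → evaluation`; `graphlib` also raises `CycleError` on circular `depends_on` references, which is a useful validation side effect.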
Stage Types
| Stage | Input | Output | Purpose |
|---|---|---|---|
| `data_curation` | Raw data | Clean dataset | Filter, deduplicate, quality score |
| `sft` | Base model + instructions | SFT checkpoint | Instruction following |
| `dpo` | SFT checkpoint + preferences | Aligned checkpoint | Preference alignment |
| `ppo` | SFT checkpoint + reward model | Aligned checkpoint | RL-based alignment |
| `safety` | Aligned checkpoint + safety data | Safe checkpoint | Safety tuning |
| `evaluation` | Any checkpoint | Metrics report | Quality validation |
Quality Gates
Gates prevent bad models from advancing through the pipeline:
```yaml
gates:
  mt_bench: ">= 7.5"          # Conversational quality
  safety_pass_rate: ">= 0.95" # Safety compliance
  mmlu: ">= 0.65"             # Knowledge retention
  toxicity_rate: "<= 0.02"    # Toxicity limit
```
If any gate fails, the pipeline stops and reports which metrics didn't meet thresholds.
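The gate expressions above are simple `"<operator> <threshold>"` strings, so a checker only needs to parse the operator and compare. A minimal sketch of such a checker (the function name and return shape are assumptions, not the framework's actual API):

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt, "==": operator.eq}

def check_gates(metrics: dict, gates: dict) -> list:
    """Return a list of (metric, value, expression) for every failed gate."""
    failures = []
    for name, expr in gates.items():
        op_str, threshold = expr.split()
        if not OPS[op_str](metrics[name], float(threshold)):
            failures.append((name, metrics[name], expr))
    return failures
```

An empty result means all gates passed and the pipeline may advance; a non-empty result is exactly the "which metrics didn't meet thresholds" report described above.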
Configuration
| Parameter | Description |
|---|---|
| `name` | Pipeline identifier for tracking |
| `base_model` | Starting model or checkpoint path |
| `output_dir` | Root directory for all outputs |
| `stages` | Ordered map of training stages |
| `stages.*.type` | Stage algorithm type |
| `stages.*.depends_on` | Name of the stage this one depends on |
| `stages.*.config` | Algorithm-specific hyperparameters |
| `stages.*.gates` | Quality thresholds the stage must pass |
Best Practices
- Start with data quality — invest 60% of your time in data curation, not hyperparameter tuning
- Use quality gates aggressively — catch regressions at each stage, not just at the end
- Version everything — pipeline configs, datasets, and checkpoints should be tracked together
- Validate on smaller models first — run the full pipeline on a 1B model before committing to 70B
- Monitor knowledge retention — MMLU/knowledge benchmarks should not drop significantly after alignment
- Include safety at every stage — don't defer safety evaluation to the final stage
Common Issues
Pipeline stage fails to start:
Check depends_on references match actual stage names. Verify the previous stage's checkpoint exists at the expected path.
Quality gate failure after alignment:
Common cause is over-optimization. Increase DPO beta or reduce PPO learning rate. Ensure training data diversity — narrow datasets cause catastrophic forgetting.
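The role of `beta` is visible directly in the DPO loss: it scales the log-probability margin between chosen and rejected responses relative to the reference model, so a larger `beta` penalizes drifting far from the reference. A single-pair sketch (real implementations operate on batched sequence log-probs):

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from per-response log-probs.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    logits = beta * margin
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

With zero margin the loss is log 2 ≈ 0.693; as the policy widens the margin on the chosen response the loss falls toward zero, which is why a larger `beta` (steeper loss in the margin) acts as a brake on over-optimization.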
Reproducibility across runs:
Set random seeds in every stage config. Pin library versions. Use deterministic CUDA operations (`CUBLAS_WORKSPACE_CONFIG=:4096:8`).
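A minimal seeding helper covering the points above might look like this (a sketch: a real pipeline would also seed `numpy` and `torch`, and call `torch.use_deterministic_algorithms(True)`, which this stdlib-only version omits):

```python
import os
import random

def set_seed(seed: int) -> None:
    """Seed Python's RNG and set determinism-related environment variables."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Required for deterministic cuBLAS matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```

Calling this at the top of every stage, with the seed recorded in the stage config, is what makes a pipeline run re-derivable from its YAML alone.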