Master Post-Training Suite

Comprehensive post-training workflow orchestrator that combines SFT, preference alignment, reward modeling, and safety tuning into a unified pipeline for producing production-ready language models.

When to Use

Use this suite when:

  • Building a complete post-training pipeline from base model to deployment
  • Need to coordinate SFT → alignment → safety → evaluation stages
  • Managing multiple training runs with different configurations
  • Producing models that meet production quality, safety, and performance bars

Use individual tools instead when:

  • Only need one training stage (e.g., just SFT or just DPO)
  • Experimenting with a single algorithm
  • Quick fine-tuning without full pipeline requirements

Quick Start

Pipeline Definition

```yaml
# post-training-pipeline.yaml
pipeline:
  name: "llama-3-alignment"
  base_model: "meta-llama/Llama-3.1-8B"
  stages:
    - name: sft
      type: supervised_fine_tuning
      config:
        dataset: "your-org/instruction-dataset"
        learning_rate: 2e-5
        num_epochs: 3
        max_seq_length: 4096
    - name: dpo
      type: direct_preference_optimization
      config:
        dataset: "your-org/preference-dataset"
        beta: 0.1
        learning_rate: 5e-7
        num_epochs: 1
    - name: safety
      type: safety_tuning
      config:
        dataset: "your-org/safety-dataset"
        method: "constitutional_ai"
        principles: ["helpful", "harmless", "honest"]
    - name: eval
      type: evaluation
      benchmarks:
        - mt_bench
        - alpaca_eval
        - safety_bench
      thresholds:
        mt_bench_score: 7.5
        safety_pass_rate: 0.95
```
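Before launching a long run, it helps to sanity-check the parsed config. Below is a minimal sketch of a validator for the schema above; the `validate_pipeline` helper and `VALID_TYPES` set are illustrative, not part of any shipped tool, and the inline dict stands in for what a YAML parser would produce from the file.

```python
# Hypothetical validator for the pipeline schema sketched above.
# The nested dict mirrors what yaml.safe_load would return for the file.

VALID_TYPES = {"supervised_fine_tuning", "direct_preference_optimization",
               "safety_tuning", "evaluation"}

def validate_pipeline(cfg: dict) -> list[str]:
    """Return the ordered stage names, raising on schema problems."""
    names = []
    for stage in cfg["pipeline"]["stages"]:
        if stage["type"] not in VALID_TYPES:
            raise ValueError(f"unknown stage type: {stage['type']}")
        names.append(stage["name"])
    if len(set(names)) != len(names):
        raise ValueError("duplicate stage names")
    return names

cfg = {"pipeline": {"name": "llama-3-alignment",
                    "base_model": "meta-llama/Llama-3.1-8B",
                    "stages": [{"name": "sft", "type": "supervised_fine_tuning"},
                               {"name": "dpo", "type": "direct_preference_optimization"},
                               {"name": "safety", "type": "safety_tuning"},
                               {"name": "eval", "type": "evaluation"}]}}
print(validate_pipeline(cfg))  # ['sft', 'dpo', 'safety', 'eval']
```

Catching a typo in a stage `type` at parse time is far cheaper than discovering it hours into an SFT run.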

Run the Pipeline

```bash
# Execute full pipeline
post-train run --config post-training-pipeline.yaml

# Resume from a specific stage
post-train run --config pipeline.yaml --start-from dpo

# Run evaluation only
post-train eval --config pipeline.yaml --checkpoint ./checkpoints/dpo-final
```

Core Concepts

Pipeline Stages

```text
Base Model
    |
+-------------------+
|  Stage 1: SFT     |  Instruction following
+--------+----------+
         |
+-------------------+
|  Stage 2: DPO     |  Preference alignment
+--------+----------+
         |
+-------------------+
|  Stage 3: Safety  |  Safety tuning
+--------+----------+
         |
+-------------------+
|  Stage 4: Eval    |  Quality gates
+--------+----------+
         |
  Production Model
```

Stage Dependencies

Each stage takes the output checkpoint of the previous stage as its input. Quality gates at evaluation points ensure that only models meeting the configured thresholds proceed to the next stage.
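The chaining described above can be sketched in a few lines. Everything here is hypothetical scaffolding: `run_pipeline`, `passes_gates`, and the toy lambda stages stand in for real training jobs that would each return a checkpoint path and a metrics dict.

```python
# Sketch: chain stages so each consumes the previous checkpoint, with a
# quality gate applied to the metrics every stage reports.

THRESHOLDS = {"mt_bench_score": 7.5, "safety_pass_rate": 0.95}

def passes_gates(scores: dict, thresholds: dict) -> bool:
    return all(scores.get(k, 0.0) >= v for k, v in thresholds.items())

def run_pipeline(stages, base_checkpoint):
    checkpoint = base_checkpoint
    for stage in stages:
        # Each stage returns (new_checkpoint, metrics) -- an assumption
        # about the stage interface, not a real API.
        checkpoint, scores = stage(checkpoint)
        if not passes_gates(scores, THRESHOLDS):
            raise RuntimeError(f"quality gate failed at {checkpoint}: {scores}")
    return checkpoint

# Toy stages standing in for real training runs:
sft = lambda ckpt: (ckpt + "->sft", {"mt_bench_score": 7.8, "safety_pass_rate": 0.96})
dpo = lambda ckpt: (ckpt + "->dpo", {"mt_bench_score": 8.1, "safety_pass_rate": 0.97})

print(run_pipeline([sft, dpo], "base"))  # base->sft->dpo
```

Failing fast at a gate is the point: a model that regresses on safety after DPO should never reach the safety-tuning stage silently.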

Data Requirements

| Stage | Data Format | Minimum Size | Quality Priority |
|--------|------------------------------|----------|-----------|
| SFT | Instruction-response pairs | 10K+ | Very high |
| DPO | Chosen/rejected pairs | 5K+ | High |
| Safety | Safety scenarios + responses | 2K+ | Critical |
| Eval | Benchmark test sets | Standard | N/A |
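These requirements are easy to enforce mechanically before training starts. The sketch below assumes simple dict-shaped records (`prompt`/`chosen`/`rejected` for preference data); the helper names and record layout are illustrative, not a fixed format.

```python
# Hypothetical pre-flight checks matching the data-requirements table above.

MIN_SIZE = {"sft": 10_000, "dpo": 5_000, "safety": 2_000}

def check_dpo_record(rec: dict) -> bool:
    # A preference record needs a prompt plus distinct chosen/rejected responses.
    return {"prompt", "chosen", "rejected"} <= rec.keys() and rec["chosen"] != rec["rejected"]

def check_dataset(stage: str, records: list[dict]) -> list[str]:
    problems = []
    if len(records) < MIN_SIZE.get(stage, 0):
        problems.append(f"{stage}: only {len(records)} records, need {MIN_SIZE[stage]}+")
    if stage == "dpo":
        bad = sum(not check_dpo_record(r) for r in records)
        if bad:
            problems.append(f"dpo: {bad} malformed preference pairs")
    return problems

# A record where chosen == rejected carries no preference signal:
sample = [{"prompt": "Hi", "chosen": "Hello!", "rejected": "Hello!"}]
print(check_dataset("dpo", sample))
```

Identical chosen/rejected pairs are a common silent failure in preference datasets: the DPO gradient for such a pair is zero, so they waste compute at best.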

Configuration

| Parameter | Description |
|-----------|-------------|
| `pipeline.name` | Unique pipeline identifier |
| `pipeline.base_model` | Starting model checkpoint |
| `stages[].type` | Training stage type |
| `stages[].config` | Stage-specific hyperparameters |
| `stages[].depends_on` | Previous stage dependency |
| `eval.thresholds` | Quality gate criteria |
| `eval.benchmarks` | Evaluation benchmark list |

Best Practices

  1. Invest in SFT data quality — the foundation determines everything downstream
  2. Set quality gates at each evaluation stage to catch regressions early
  3. Version your pipeline configs alongside model checkpoints for reproducibility
  4. Run safety evaluation at every stage — not just at the end
  5. Use smaller models for pipeline validation before training large models
  6. Keep detailed logs of training metrics, data versions, and hyperparameters
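One cheap way to implement tip 3 is to stamp every checkpoint with a fingerprint of the exact config that produced it. The `config_fingerprint` helper below is a minimal sketch of this idea using only the standard library.

```python
# Sketch for versioning configs alongside checkpoints: hash a canonical
# serialization of the config and embed it in the checkpoint name.
import hashlib
import json

def config_fingerprint(cfg: dict) -> str:
    canonical = json.dumps(cfg, sort_keys=True)  # key order must not change the hash
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

cfg = {"base_model": "meta-llama/Llama-3.1-8B",
       "stages": [{"name": "sft", "learning_rate": 2e-5}]}
tag = config_fingerprint(cfg)
print(f"checkpoint-sft-{tag}")
```

Any checkpoint can then be traced back to the settings that produced it, and two runs with identical configs are immediately recognizable as such.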

Common Issues

**Stage 2 degrades Stage 1 quality:** Increase the DPO beta to constrain divergence from the SFT reference policy. Check that the preference data doesn't contradict the SFT training data, and consider adding high-quality SFT samples to the DPO dataset as chosen examples.
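To see why raising beta helps, it's worth looking at where beta sits in the DPO objective. The toy sketch below implements the standard per-example DPO loss, `-log σ(β·[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))])`; the log-probabilities are made-up numbers, not outputs of a real model.

```python
# Per-example DPO loss, showing where beta enters: it scales the log-ratio
# margin between policy and reference, so larger beta penalizes drift from
# the (SFT) reference policy more strongly.
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta):
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Same toy log-probs, two betas:
low_beta  = dpo_loss(-1.0, -2.0, -1.2, -1.8, beta=0.1)
high_beta = dpo_loss(-1.0, -2.0, -1.2, -1.8, beta=0.5)
print(low_beta, high_beta)
```

With a positive margin (the policy already prefers the chosen response relative to the reference), a larger beta drives the loss toward zero faster, so the model has less incentive to keep moving away from the reference.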

**Safety tuning reduces helpfulness:** Balance safety data with helpful examples. Use a helpfulness reward during safety tuning. Evaluate helpfulness metrics alongside safety at every checkpoint.

**Pipeline takes too long:** Run SFT with LoRA for faster iteration. Use smaller validation sets during development. Only run full evaluation at the final stage.
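The LoRA suggestion pays off because of simple parameter arithmetic: a rank-`r` adapter replaces a dense `d_out × d_in` weight update with two factors of shapes `r × d_in` and `d_out × r`. A back-of-envelope sketch (the hidden size is a typical value for an 8B-class model, not a measured number):

```python
# Back-of-envelope comparison of trainable parameters for one projection
# matrix: full fine-tuning vs. a rank-r LoRA adapter (A: r x d_in, B: d_out x r).

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)

d = 4096                 # illustrative hidden size
full = d * d             # dense update for one d x d projection
lora = lora_trainable_params(d, d, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full / lora:.0f}x")
```

At rank 16 the adapter trains roughly 1/128 of the parameters per projection, which is why LoRA runs make cheap smoke tests for pipeline wiring before committing to full fine-tuning.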
