Ultimate Post-Training Framework
End-to-end post-training framework covering the full lifecycle from data curation through SFT, alignment (DPO/PPO/GRPO), safety tuning, evaluation, and deployment — with automated pipeline management and quality gates.
When to Use
Use this framework when:
- You need a production-grade, end-to-end post-training system
- You are managing multiple training stages with quality checkpoints
- You are building reproducible training pipelines across model families
- Your team is collaborating on alignment research with shared infrastructure
Use simpler tools when:
- Single-stage training (just SFT or just DPO) → use TRL
- Quick experiments without pipeline overhead → use TRL or torchforge
- Enterprise MoE-specific training → use Miles
Quick Start
Define Your Pipeline
```yaml
# pipeline.yaml
name: "llama3-production-alignment"
base_model: "meta-llama/Llama-3.1-8B"
output_dir: "./training-outputs"

stages:
  sft:
    type: supervised_fine_tuning
    dataset: "your-org/instruction-data"
    config:
      learning_rate: 2e-5
      epochs: 3
      max_seq_len: 4096
      lora_r: 16

  alignment:
    type: dpo
    dataset: "your-org/preference-data"
    depends_on: sft
    config:
      beta: 0.1
      learning_rate: 5e-7
      epochs: 1

  safety:
    type: constitutional_ai
    dataset: "your-org/safety-data"
    depends_on: alignment
    config:
      principles: ["helpful", "harmless", "honest"]

  evaluation:
    type: benchmark
    depends_on: safety
    benchmarks: [mt_bench, alpaca_eval, safety_bench, mmlu]
    gates:
      mt_bench: ">= 7.5"
      safety_pass_rate: ">= 0.95"
      mmlu: ">= 0.65"
```
Execute
```shell
# Run the full pipeline
post-train run pipeline.yaml

# Resume from the alignment stage
post-train run pipeline.yaml --resume alignment

# Dry run (validate config without training)
post-train validate pipeline.yaml
```
Core Concepts
Pipeline Architecture
```
Data Curation → SFT → Alignment → Safety → Evaluation → Deployment
      |          |        |          |          |
      v          v        v          v          v
   Filters   LoRA/Full  DPO/PPO   ConstitAI  Benchmarks
   Dedup     Checkpts   GRPO      Red-team   Quality gates
   Quality              REINFORCE Filtering  A/B testing
```
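The `depends_on` fields in the pipeline config form a dependency graph, so stage execution order can be derived with a standard topological sort. A minimal sketch (function and dict shapes are illustrative, not the framework's actual API), using the standard-library `graphlib`:

```python
from graphlib import TopologicalSorter

def stage_order(stages: dict) -> list:
    """Return stage names in an order that respects depends_on links."""
    ts = TopologicalSorter()
    for name, spec in stages.items():
        dep = spec.get("depends_on")
        # A stage with no dependency is a root; otherwise add its predecessor.
        ts.add(name, *([dep] if dep else []))
    return list(ts.static_order())
```

For the linear chain in the Quick Start config this yields `sft → alignment → safety → evaluation`; `graphlib` also raises `CycleError` on circular `depends_on` references, which is a useful validation side effect.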
Stage Types
| Stage | Input | Output | Purpose |
|---|---|---|---|
| `data_curation` | Raw data | Clean dataset | Filter, deduplicate, quality score |
| `sft` | Base model + instructions | SFT checkpoint | Instruction following |
| `dpo` | SFT checkpoint + preferences | Aligned checkpoint | Preference alignment |
| `ppo` | SFT checkpoint + reward model | Aligned checkpoint | RL-based alignment |
| `safety` | Aligned checkpoint + safety data | Safe checkpoint | Safety tuning |
| `evaluation` | Any checkpoint | Metrics report | Quality validation |
Quality Gates
Gates prevent bad models from advancing through the pipeline:
```yaml
gates:
  mt_bench: ">= 7.5"          # Conversational quality
  safety_pass_rate: ">= 0.95" # Safety compliance
  mmlu: ">= 0.65"             # Knowledge retention
  toxicity_rate: "<= 0.02"    # Toxicity limit
```
If any gate fails, the pipeline stops and reports which metrics didn't meet thresholds.
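The gate expressions above are simple `"<operator> <threshold>"` strings, so a checker only needs to parse the operator and compare. A minimal sketch of such a checker (the function name and return shape are assumptions, not the framework's actual API):

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt, "==": operator.eq}

def check_gates(metrics: dict, gates: dict) -> list:
    """Return a list of (metric, value, expression) for every failed gate."""
    failures = []
    for name, expr in gates.items():
        op_str, threshold = expr.split()
        if not OPS[op_str](metrics[name], float(threshold)):
            failures.append((name, metrics[name], expr))
    return failures
```

An empty result means all gates passed and the pipeline may advance; a non-empty result is exactly the "which metrics didn't meet thresholds" report described above.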
Configuration
| Parameter | Description |
|---|---|
| `name` | Pipeline identifier for tracking |
| `base_model` | Starting model or checkpoint path |
| `output_dir` | Root directory for all outputs |
| `stages` | Ordered map of training stages |
| `stages.*.type` | Stage algorithm type |
| `stages.*.depends_on` | Name of the stage this one depends on |
| `stages.*.config` | Algorithm-specific hyperparameters |
| `stages.*.gates` | Quality thresholds the stage must pass |
Best Practices
- Start with data quality — invest 60% of your time in data curation, not hyperparameter tuning
- Use quality gates aggressively — catch regressions at each stage, not just at the end
- Version everything — pipeline configs, datasets, and checkpoints should be tracked together
- Validate on smaller models first — run the full pipeline on a 1B model before committing to 70B
- Monitor knowledge retention — MMLU/knowledge benchmarks should not drop significantly after alignment
- Include safety at every stage — don't defer safety evaluation to the final stage
Common Issues
Pipeline stage fails to start:
Check depends_on references match actual stage names. Verify the previous stage's checkpoint exists at the expected path.
Quality gate failure after alignment:
Common cause is over-optimization. Increase DPO beta or reduce PPO learning rate. Ensure training data diversity — narrow datasets cause catastrophic forgetting.
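The role of `beta` is visible directly in the DPO loss: it scales the log-probability margin between chosen and rejected responses relative to the reference model, so a larger `beta` penalizes drifting far from the reference. A single-pair sketch (real implementations operate on batched sequence log-probs):

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from per-response log-probs.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    logits = beta * margin
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

With zero margin the loss is log 2 ≈ 0.693; as the policy widens the margin on the chosen response the loss falls toward zero, which is why a larger `beta` (steeper loss in the margin) acts as a brake on over-optimization.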
Reproducibility across runs:
Set random seeds in every stage config. Pin library versions. Use deterministic CUDA operations (`CUBLAS_WORKSPACE_CONFIG=:4096:8`).
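A minimal seeding helper covering the points above might look like this (a sketch: a real pipeline would also seed `numpy` and `torch`, and call `torch.use_deterministic_algorithms(True)`, which this stdlib-only version omits):

```python
import os
import random

def set_seed(seed: int) -> None:
    """Seed Python's RNG and set determinism-related environment variables."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Required for deterministic cuBLAS matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```

Calling this at the top of every stage, with the seed recorded in the stage config, is what makes a pipeline run re-derivable from its YAML alone.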