Ultimate Post-Training Framework

End-to-end post-training framework covering the full lifecycle from data curation through SFT, alignment (DPO/PPO/GRPO), safety tuning, evaluation, and deployment — with automated pipeline management and quality gates.

When to Use

Use this framework when:

  • You need a production-grade, end-to-end post-training system
  • You are managing multiple training stages with quality checkpoints
  • You are building reproducible training pipelines across model families
  • Teams are collaborating on alignment research with shared infrastructure

Use simpler tools when:

  • Single-stage training (just SFT or just DPO) → use TRL
  • Quick experiments without pipeline overhead → use TRL or torchforge
  • Enterprise MoE-specific training → use Miles

Quick Start

Define Your Pipeline

```yaml
# pipeline.yaml
name: "llama3-production-alignment"
base_model: "meta-llama/Llama-3.1-8B"
output_dir: "./training-outputs"

stages:
  sft:
    type: supervised_fine_tuning
    dataset: "your-org/instruction-data"
    config:
      learning_rate: 2e-5
      epochs: 3
      max_seq_len: 4096
      lora_r: 16

  alignment:
    type: dpo
    dataset: "your-org/preference-data"
    depends_on: sft
    config:
      beta: 0.1
      learning_rate: 5e-7
      epochs: 1

  safety:
    type: constitutional_ai
    dataset: "your-org/safety-data"
    depends_on: alignment
    config:
      principles: ["helpful", "harmless", "honest"]

  evaluation:
    type: benchmark
    depends_on: safety
    benchmarks: [mt_bench, alpaca_eval, safety_bench, mmlu]
    gates:
      mt_bench: ">= 7.5"
      safety_pass_rate: ">= 0.95"
      mmlu: ">= 0.65"
```

Execute

```bash
# Run the full pipeline
post-train run pipeline.yaml

# Resume from the alignment stage
post-train run pipeline.yaml --resume alignment

# Dry run (validate config without training)
post-train validate pipeline.yaml
```

Core Concepts

Pipeline Architecture

```
Data Curation → SFT → Alignment → Safety → Evaluation → Deployment
      |           |         |         |          |
      v           v         v         v          v
  Filters     LoRA/Full  DPO/PPO   ConstitAI  Benchmarks
  Dedup       Checkpts   GRPO      Red-team   Quality gates
  Quality                REINFORCE Filtering  A/B testing
```
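
The stage graph above is a directed acyclic graph keyed by `depends_on`. A minimal sketch (stdlib Python only; the function and stage names are hypothetical, not the framework's actual internals) of resolving execution order from those declarations:

```python
# Resolve stage execution order from `depends_on` declarations.
# Hypothetical sketch using the stdlib topological sorter (Python 3.9+).
from graphlib import TopologicalSorter

def execution_order(stages: dict) -> list[str]:
    """Return stage names in dependency order."""
    graph = {
        name: {spec["depends_on"]} if "depends_on" in spec else set()
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())

stages = {
    "evaluation": {"type": "benchmark", "depends_on": "safety"},
    "sft": {"type": "supervised_fine_tuning"},
    "safety": {"type": "constitutional_ai", "depends_on": "alignment"},
    "alignment": {"type": "dpo", "depends_on": "sft"},
}
print(execution_order(stages))  # → ['sft', 'alignment', 'safety', 'evaluation']
```

Because each stage here depends on exactly one predecessor, the sort yields a single linear order regardless of how the stages are listed in the config.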

Stage Types

| Stage | Input | Output | Purpose |
|-------|-------|--------|---------|
| `data_curation` | Raw data | Clean dataset | Filter, deduplicate, quality score |
| `sft` | Base model + instructions | SFT checkpoint | Instruction following |
| `dpo` | SFT checkpoint + preferences | Aligned checkpoint | Preference alignment |
| `ppo` | SFT checkpoint + reward model | Aligned checkpoint | RL-based alignment |
| `safety` | Aligned checkpoint + safety data | Safe checkpoint | Safety tuning |
| `evaluation` | Any checkpoint | Metrics report | Quality validation |
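
One way stage types like these could be dispatched internally is a type-to-handler registry. This is a hypothetical sketch, not the framework's documented code; `STAGE_HANDLERS`, `stage_handler`, and `run_dpo` are illustrative names:

```python
# Hypothetical registry mapping a stage's `type` field to a handler function.
from typing import Callable

STAGE_HANDLERS: dict[str, Callable[[dict], str]] = {}

def stage_handler(stage_type: str):
    """Decorator that registers a handler for a stage type."""
    def register(fn: Callable[[dict], str]):
        STAGE_HANDLERS[stage_type] = fn
        return fn
    return register

@stage_handler("dpo")
def run_dpo(config: dict) -> str:
    # Placeholder: a real handler would launch DPO training and
    # return the path of the aligned checkpoint it produced.
    return f"aligned-checkpoint(beta={config['beta']})"

result = STAGE_HANDLERS["dpo"]({"beta": 0.1})
print(result)  # → aligned-checkpoint(beta=0.1)
```

A registry like this keeps the pipeline runner generic: adding a new stage type means registering one handler, not editing the scheduler.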

Quality Gates

Gates prevent bad models from advancing through the pipeline:

```yaml
gates:
  mt_bench: ">= 7.5"          # Conversational quality
  safety_pass_rate: ">= 0.95" # Safety compliance
  mmlu: ">= 0.65"             # Knowledge retention
  toxicity_rate: "<= 0.02"    # Toxicity limit
```

If any gate fails, the pipeline stops and reports which metrics didn't meet thresholds.
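
Gate strings of this shape can be checked against a metrics report with a few lines of code. A minimal sketch, assuming the `"<op> <threshold>"` format shown above (the function name is illustrative):

```python
import operator

# Comparison tokens used in gate strings, e.g. ">= 7.5" or "<= 0.02".
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

def failed_gates(gates: dict[str, str], metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that do not meet their thresholds."""
    failures = []
    for metric, rule in gates.items():
        op_token, threshold = rule.split()
        if not OPS[op_token](metrics[metric], float(threshold)):
            failures.append(metric)
    return failures

gates = {"mt_bench": ">= 7.5", "safety_pass_rate": ">= 0.95", "toxicity_rate": "<= 0.02"}
metrics = {"mt_bench": 7.8, "safety_pass_rate": 0.93, "toxicity_rate": 0.01}
print(failed_gates(gates, metrics))  # → ['safety_pass_rate']
```

Here `mt_bench` and `toxicity_rate` pass their thresholds, so only `safety_pass_rate` (0.93 < 0.95) is reported, and the pipeline would halt before the next stage.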

Configuration

| Parameter | Description |
|-----------|-------------|
| `name` | Pipeline identifier for tracking |
| `base_model` | Starting model or checkpoint path |
| `output_dir` | Root directory for all outputs |
| `stages` | Ordered dict of training stages |
| `stages.*.type` | Stage algorithm type |
| `stages.*.depends_on` | Previous stage dependency |
| `stages.*.config` | Algorithm-specific hyperparameters |
| `stages.*.gates` | Quality thresholds to pass |

Best Practices

  1. Start with data quality — invest 60% of your time in data curation, not hyperparameter tuning
  2. Use quality gates aggressively — catch regressions at each stage, not just at the end
  3. Version everything — pipeline configs, datasets, and checkpoints should be tracked together
  4. Validate on smaller models first — run the full pipeline on a 1B model before committing to 70B
  5. Monitor knowledge retention — MMLU/knowledge benchmarks should not drop significantly after alignment
  6. Include safety at every stage — don't defer safety evaluation to the final stage

Common Issues

Pipeline stage fails to start: Check depends_on references match actual stage names. Verify the previous stage's checkpoint exists at the expected path.
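
The `depends_on` check can be automated as a pre-flight validation pass. A sketch under the stage layout shown in Quick Start (`validate_dependencies` is a hypothetical helper, not part of the CLI):

```python
def validate_dependencies(stages: dict) -> list[str]:
    """Flag depends_on references that name no existing stage."""
    errors = []
    for name, spec in stages.items():
        dep = spec.get("depends_on")
        if dep is not None and dep not in stages:
            errors.append(f"stage '{name}': unknown dependency '{dep}'")
    return errors

stages = {
    "sft": {"type": "supervised_fine_tuning"},
    "alignment": {"type": "dpo", "depends_on": "stf"},  # typo: should be "sft"
}
print(validate_dependencies(stages))
# → ["stage 'alignment': unknown dependency 'stf'"]
```

Running a check like this during a dry run catches misspelled stage names before any GPU time is spent.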

Quality gate failure after alignment: Common cause is over-optimization. Increase DPO beta or reduce PPO learning rate. Ensure training data diversity — narrow datasets cause catastrophic forgetting.

Reproducibility across runs: Set random seeds in every stage config. Pin library versions. Use deterministic CUDA operations (CUBLAS_WORKSPACE_CONFIG=:4096:8).
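
The seeding advice can be wrapped in one helper called at the start of every stage. A dependency-free sketch; in a real pipeline you would also seed NumPy and PyTorch (`numpy.random.seed`, `torch.manual_seed`) and enable `torch.use_deterministic_algorithms(True)`, which are omitted here to keep the example stdlib-only:

```python
import os
import random

def set_reproducible(seed: int) -> None:
    """Seed the stdlib RNG and export determinism-related env vars."""
    # Required for deterministic cuBLAS kernels; must be set before CUDA init.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Exported so child processes (e.g. dataloader workers) hash deterministically.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

set_reproducible(42)
a = random.random()
set_reproducible(42)
b = random.random()
print(a == b)  # → True
```

Re-seeding with the same value reproduces the same random stream, which is the property every stage of the pipeline should preserve.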
