Master Post-Training Suite

Comprehensive post-training workflow orchestrator that combines SFT, preference alignment, reward modeling, and safety tuning into a unified pipeline for producing production-ready language models.

When to Use

Use this suite when:

  • Building a complete post-training pipeline from base model to deployment
  • Need to coordinate SFT → alignment → safety → evaluation stages
  • Managing multiple training runs with different configurations
  • Producing models that meet production quality, safety, and performance bars

Use individual tools instead when:

  • Only need one training stage (e.g., just SFT or just DPO)
  • Experimenting with a single algorithm
  • Quick fine-tuning without full pipeline requirements

Quick Start

Pipeline Definition

```yaml
# post-training-pipeline.yaml
pipeline:
  name: "llama-3-alignment"
  base_model: "meta-llama/Llama-3.1-8B"
  stages:
    - name: sft
      type: supervised_fine_tuning
      config:
        dataset: "your-org/instruction-dataset"
        learning_rate: 2e-5
        num_epochs: 3
        max_seq_length: 4096
    - name: dpo
      type: direct_preference_optimization
      config:
        dataset: "your-org/preference-dataset"
        beta: 0.1
        learning_rate: 5e-7
        num_epochs: 1
    - name: safety
      type: safety_tuning
      config:
        dataset: "your-org/safety-dataset"
        method: "constitutional_ai"
        principles: ["helpful", "harmless", "honest"]
    - name: eval
      type: evaluation
      benchmarks:
        - mt_bench
        - alpaca_eval
        - safety_bench
      thresholds:
        mt_bench_score: 7.5
        safety_pass_rate: 0.95
```
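Before launching a long run, it helps to sanity-check the parsed config. Below is a minimal sketch of a validator for the schema above; the `validate_pipeline` helper and `VALID_TYPES` set are illustrative, not part of any shipped tool, and the inline dict stands in for what a YAML parser would produce from the file.

```python
# Hypothetical validator for the pipeline schema sketched above.
# The nested dict mirrors what yaml.safe_load would return for the file.

VALID_TYPES = {"supervised_fine_tuning", "direct_preference_optimization",
               "safety_tuning", "evaluation"}

def validate_pipeline(cfg: dict) -> list[str]:
    """Return the ordered stage names, raising on schema problems."""
    names = []
    for stage in cfg["pipeline"]["stages"]:
        if stage["type"] not in VALID_TYPES:
            raise ValueError(f"unknown stage type: {stage['type']}")
        names.append(stage["name"])
    if len(set(names)) != len(names):
        raise ValueError("duplicate stage names")
    return names

cfg = {"pipeline": {"name": "llama-3-alignment",
                    "base_model": "meta-llama/Llama-3.1-8B",
                    "stages": [{"name": "sft", "type": "supervised_fine_tuning"},
                               {"name": "dpo", "type": "direct_preference_optimization"},
                               {"name": "safety", "type": "safety_tuning"},
                               {"name": "eval", "type": "evaluation"}]}}
print(validate_pipeline(cfg))  # ['sft', 'dpo', 'safety', 'eval']
```

Catching a typo in a stage `type` at parse time is far cheaper than discovering it hours into an SFT run.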

Run the Pipeline

```bash
# Execute full pipeline
post-train run --config post-training-pipeline.yaml

# Resume from a specific stage
post-train run --config pipeline.yaml --start-from dpo

# Run evaluation only
post-train eval --config pipeline.yaml --checkpoint ./checkpoints/dpo-final
```

Core Concepts

Pipeline Stages

```text
Base Model
    |
+-------------------+
|  Stage 1: SFT     |  Instruction following
+--------+----------+
         |
+-------------------+
|  Stage 2: DPO     |  Preference alignment
+--------+----------+
         |
+-------------------+
|  Stage 3: Safety  |  Safety tuning
+--------+----------+
         |
+-------------------+
|  Stage 4: Eval    |  Quality gates
+--------+----------+
         |
  Production Model
```

Stage Dependencies

Each stage takes the output checkpoint of the previous stage as its input. Quality gates at evaluation points ensure that only models meeting the configured thresholds proceed to the next stage.
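The chaining described above can be sketched in a few lines. Everything here is hypothetical scaffolding: `run_pipeline`, `passes_gates`, and the toy lambda stages stand in for real training jobs that would each return a checkpoint path and a metrics dict.

```python
# Sketch: chain stages so each consumes the previous checkpoint, with a
# quality gate applied to the metrics every stage reports.

THRESHOLDS = {"mt_bench_score": 7.5, "safety_pass_rate": 0.95}

def passes_gates(scores: dict, thresholds: dict) -> bool:
    return all(scores.get(k, 0.0) >= v for k, v in thresholds.items())

def run_pipeline(stages, base_checkpoint):
    checkpoint = base_checkpoint
    for stage in stages:
        # Each stage returns (new_checkpoint, metrics) -- an assumption
        # about the stage interface, not a real API.
        checkpoint, scores = stage(checkpoint)
        if not passes_gates(scores, THRESHOLDS):
            raise RuntimeError(f"quality gate failed at {checkpoint}: {scores}")
    return checkpoint

# Toy stages standing in for real training runs:
sft = lambda ckpt: (ckpt + "->sft", {"mt_bench_score": 7.8, "safety_pass_rate": 0.96})
dpo = lambda ckpt: (ckpt + "->dpo", {"mt_bench_score": 8.1, "safety_pass_rate": 0.97})

print(run_pipeline([sft, dpo], "base"))  # base->sft->dpo
```

Failing fast at a gate is the point: a model that regresses on safety after DPO should never reach the safety-tuning stage silently.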

Data Requirements

| Stage | Data Format | Minimum Size | Quality Priority |
|--------|------------------------------|----------|-----------|
| SFT | Instruction-response pairs | 10K+ | Very high |
| DPO | Chosen/rejected pairs | 5K+ | High |
| Safety | Safety scenarios + responses | 2K+ | Critical |
| Eval | Benchmark test sets | Standard | N/A |
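These requirements are easy to enforce mechanically before training starts. The sketch below assumes simple dict-shaped records (`prompt`/`chosen`/`rejected` for preference data); the helper names and record layout are illustrative, not a fixed format.

```python
# Hypothetical pre-flight checks matching the data-requirements table above.

MIN_SIZE = {"sft": 10_000, "dpo": 5_000, "safety": 2_000}

def check_dpo_record(rec: dict) -> bool:
    # A preference record needs a prompt plus distinct chosen/rejected responses.
    return {"prompt", "chosen", "rejected"} <= rec.keys() and rec["chosen"] != rec["rejected"]

def check_dataset(stage: str, records: list[dict]) -> list[str]:
    problems = []
    if len(records) < MIN_SIZE.get(stage, 0):
        problems.append(f"{stage}: only {len(records)} records, need {MIN_SIZE[stage]}+")
    if stage == "dpo":
        bad = sum(not check_dpo_record(r) for r in records)
        if bad:
            problems.append(f"dpo: {bad} malformed preference pairs")
    return problems

# A record where chosen == rejected carries no preference signal:
sample = [{"prompt": "Hi", "chosen": "Hello!", "rejected": "Hello!"}]
print(check_dataset("dpo", sample))
```

Identical chosen/rejected pairs are a common silent failure in preference datasets: the DPO gradient for such a pair is zero, so they waste compute at best.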

Configuration

| Parameter | Description |
|-----------|-------------|
| `pipeline.name` | Unique pipeline identifier |
| `pipeline.base_model` | Starting model checkpoint |
| `stages[].type` | Training stage type |
| `stages[].config` | Stage-specific hyperparameters |
| `stages[].depends_on` | Previous stage dependency |
| `eval.thresholds` | Quality gate criteria |
| `eval.benchmarks` | Evaluation benchmark list |

Best Practices

  1. Invest in SFT data quality — the foundation determines everything downstream
  2. Set quality gates at each evaluation stage to catch regressions early
  3. Version your pipeline configs alongside model checkpoints for reproducibility
  4. Run safety evaluation at every stage — not just at the end
  5. Use smaller models for pipeline validation before training large models
  6. Keep detailed logs of training metrics, data versions, and hyperparameters
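One cheap way to implement tip 3 is to stamp every checkpoint with a fingerprint of the exact config that produced it. The `config_fingerprint` helper below is a minimal sketch of this idea using only the standard library.

```python
# Sketch for versioning configs alongside checkpoints: hash a canonical
# serialization of the config and embed it in the checkpoint name.
import hashlib
import json

def config_fingerprint(cfg: dict) -> str:
    canonical = json.dumps(cfg, sort_keys=True)  # key order must not change the hash
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

cfg = {"base_model": "meta-llama/Llama-3.1-8B",
       "stages": [{"name": "sft", "learning_rate": 2e-5}]}
tag = config_fingerprint(cfg)
print(f"checkpoint-sft-{tag}")
```

Any checkpoint can then be traced back to the settings that produced it, and two runs with identical configs are immediately recognizable as such.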

Common Issues

**Stage 2 degrades Stage 1 quality:** Increase the DPO beta to constrain divergence from the SFT reference policy. Check that the preference data doesn't contradict the SFT training data, and consider adding high-quality SFT samples to the DPO dataset as chosen examples.
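To see why raising beta helps, it's worth looking at where beta sits in the DPO objective. The toy sketch below implements the standard per-example DPO loss, `-log σ(β·[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))])`; the log-probabilities are made-up numbers, not outputs of a real model.

```python
# Per-example DPO loss, showing where beta enters: it scales the log-ratio
# margin between policy and reference, so larger beta penalizes drift from
# the (SFT) reference policy more strongly.
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta):
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Same toy log-probs, two betas:
low_beta  = dpo_loss(-1.0, -2.0, -1.2, -1.8, beta=0.1)
high_beta = dpo_loss(-1.0, -2.0, -1.2, -1.8, beta=0.5)
print(low_beta, high_beta)
```

With a positive margin (the policy already prefers the chosen response relative to the reference), a larger beta drives the loss toward zero faster, so the model has less incentive to keep moving away from the reference.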

**Safety tuning reduces helpfulness:** Balance safety data with helpful examples. Use a helpfulness reward during safety tuning. Evaluate helpfulness metrics alongside safety at every checkpoint.

**Pipeline takes too long:** Run SFT with LoRA for faster iteration. Use smaller validation sets during development. Only run full evaluation at the final stage.
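The LoRA suggestion pays off because of simple parameter arithmetic: a rank-`r` adapter replaces a dense `d_out × d_in` weight update with two factors of shapes `r × d_in` and `d_out × r`. A back-of-envelope sketch (the hidden size is a typical value for an 8B-class model, not a measured number):

```python
# Back-of-envelope comparison of trainable parameters for one projection
# matrix: full fine-tuning vs. a rank-r LoRA adapter (A: r x d_in, B: d_out x r).

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)

d = 4096                 # illustrative hidden size
full = d * d             # dense update for one d x d projection
lora = lora_trainable_params(d, d, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full / lora:.0f}x")
```

At rank 16 the adapter trains roughly 1/128 of the parameters per projection, which is why LoRA runs make cheap smoke tests for pipeline wiring before committing to full fine-tuning.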
