Master Post-Training Suite
Comprehensive post-training workflow orchestrator that combines SFT, preference alignment, reward modeling, and safety tuning into a unified pipeline for producing production-ready language models.
When to Use
Use this suite when:
- Building a complete post-training pipeline from base model to deployment
- Need to coordinate SFT → alignment → safety → evaluation stages
- Managing multiple training runs with different configurations
- Producing models that meet production quality, safety, and performance bars
Use individual tools instead when:
- Only need one training stage (e.g., just SFT or just DPO)
- Experimenting with a single algorithm
- Quick fine-tuning without full pipeline requirements
Quick Start
Pipeline Definition
```yaml
# post-training-pipeline.yaml
pipeline:
  name: "llama-3-alignment"
  base_model: "meta-llama/Llama-3.1-8B"
  stages:
    - name: sft
      type: supervised_fine_tuning
      config:
        dataset: "your-org/instruction-dataset"
        learning_rate: 2e-5
        num_epochs: 3
        max_seq_length: 4096
    - name: dpo
      type: direct_preference_optimization
      config:
        dataset: "your-org/preference-dataset"
        beta: 0.1
        learning_rate: 5e-7
        num_epochs: 1
    - name: safety
      type: safety_tuning
      config:
        dataset: "your-org/safety-dataset"
        method: "constitutional_ai"
        principles: ["helpful", "harmless", "honest"]
    - name: eval
      type: evaluation
      benchmarks:
        - mt_bench
        - alpaca_eval
        - safety_bench
      thresholds:
        mt_bench_score: 7.5
        safety_pass_rate: 0.95
```
Run the Pipeline
```bash
# Execute the full pipeline
post-train run --config post-training-pipeline.yaml

# Resume from a specific stage
post-train run --config pipeline.yaml --start-from dpo

# Run evaluation only
post-train eval --config pipeline.yaml --checkpoint ./checkpoints/dpo-final
```
Core Concepts
Pipeline Stages
```
       Base Model
            |
  +-------------------+
  |   Stage 1: SFT    |  Instruction following
  +---------+---------+
            |
  +-------------------+
  |   Stage 2: DPO    |  Preference alignment
  +---------+---------+
            |
  +-------------------+
  |  Stage 3: Safety  |  Safety tuning
  +---------+---------+
            |
  +-------------------+
  |   Stage 4: Eval   |  Quality gates
  +---------+---------+
            |
    Production Model
```
Stage Dependencies
Each stage takes the output of the previous stage as input. Quality gates at evaluation stages ensure only models meeting thresholds proceed.
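For example, an explicit dependency can be declared with the `stages[].depends_on` field listed under Configuration. This is a hypothetical fragment (stage names are illustrative) showing how the DPO stage consumes the SFT checkpoint and how evaluation gates the result:

```yaml
stages:
  - name: sft
    type: supervised_fine_tuning
  - name: dpo
    type: direct_preference_optimization
    depends_on: sft      # trains from the SFT checkpoint
  - name: eval
    type: evaluation
    depends_on: dpo      # quality gate: only promote if thresholds pass
```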
Data Requirements
| Stage | Data Format | Minimum Size | Quality Priority |
|---|---|---|---|
| SFT | Instruction-response pairs | 10K+ | Very high |
| DPO | Chosen/rejected pairs | 5K+ | High |
| Safety | Safety scenarios + responses | 2K+ | Critical |
| Eval | Benchmark test sets | Standard | N/A |
Configuration
| Parameter | Description |
|---|---|
| `pipeline.name` | Unique pipeline identifier |
| `pipeline.base_model` | Starting model checkpoint |
| `stages[].type` | Training stage type |
| `stages[].config` | Stage-specific hyperparameters |
| `stages[].depends_on` | Previous stage dependency |
| `eval.thresholds` | Quality gate criteria |
| `eval.benchmarks` | Evaluation benchmark list |
Best Practices
- Invest in SFT data quality — the foundation determines everything downstream
- Set quality gates at each evaluation stage to catch regressions early
- Version your pipeline configs alongside model checkpoints for reproducibility
- Run safety evaluation at every stage — not just at the end
- Use smaller models for pipeline validation before training large models
- Keep detailed logs of training metrics, data versions, and hyperparameters
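The quality-gate practice above can be sketched as a simple threshold check. This is an illustrative helper, not part of the suite's API; the threshold names mirror the pipeline config example:

```python
def passes_gates(metrics: dict, thresholds: dict) -> list:
    """Return the names of failed gates; an empty list means the checkpoint may proceed."""
    return [name for name, floor in thresholds.items()
            if metrics.get(name, float("-inf")) < floor]

thresholds = {"mt_bench_score": 7.5, "safety_pass_rate": 0.95}
metrics = {"mt_bench_score": 7.8, "safety_pass_rate": 0.93}
print(passes_gates(metrics, thresholds))  # ['safety_pass_rate']
```

Treating a missing metric as a failure (the `float("-inf")` default) is deliberate: a gate you forgot to measure should block promotion, not silently pass.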
Common Issues
Stage 2 degrades Stage 1 quality: Increase the DPO beta to constrain divergence from the SFT policy. Ensure preference data doesn't contradict the SFT training data. Add SFT-quality samples to the DPO dataset as chosen (positive) examples.
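The role of beta is visible in the DPO objective itself. A minimal per-example sketch in plain Python (illustrative only — real trainers operate on batched token log-probabilities):

```python
import math

def dpo_loss(pi_logp_chosen: float, pi_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    Beta scales the implicit KL penalty against the reference (SFT) policy, so a
    larger beta keeps the policy closer to it and can curb Stage-1 regressions."""
    margin = ((pi_logp_chosen - ref_logp_chosen)
              - (pi_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With a zero margin (policy identical to reference) the loss is log 2; as the policy's preference margin over the reference grows, the loss falls faster for larger beta.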
Safety tuning reduces helpfulness: Balance safety data with helpful examples. Use a helpfulness reward during safety tuning. Evaluate helpfulness metrics alongside safety at every checkpoint.
Pipeline takes too long: Run SFT with LoRA for faster iteration. Use smaller validation sets during development. Only run full evaluation at the final stage.