# Miles Post-Training Engine
Enterprise-grade reinforcement learning framework for large-scale model post-training, optimized for MoE models (1TB+), FP8/INT4 quantization-aware training, and bit-wise identical train-inference alignment.
## When to Use
Choose Miles when:
- Training 1TB+ MoE models (DeepSeek V3, Qwen3-MoE)
- Need FP8 or INT4 quantization-aware training
- Require bit-wise identical results between training and inference
- Production RL at enterprise scale with fault tolerance
- Multi-node distributed training across 100+ GPUs
Consider alternatives when:
- Smaller models (< 70B parameters) → use TRL with GRPO
- Research experiments without production requirements → use OpenRLHF
- PyTorch-native approach without Ray → use torchforge
- Simple RLHF/DPO fine-tuning → use TRL directly
## Quick Start

### Installation

```bash
# Clone and install
git clone https://github.com/miles-ai/miles.git
cd miles
pip install -e .

# With distributed support
pip install -e ".[distributed]"
```
### Basic Training Run

```python
from miles import MilesTrainer, MilesConfig

config = MilesConfig(
    model_name="deepseek-ai/DeepSeek-V3",
    training_method="grpo",
    precision="fp8",
    num_nodes=4,
    gpus_per_node=8,
    batch_size=256,
    reward_model="deepseek-ai/DeepSeek-V3-RM",
)

trainer = MilesTrainer(config)
trainer.train(dataset="your-dataset")
```
### Multi-Node Launch

```bash
# Launch across 4 nodes with 8 GPUs each
miles launch \
    --config configs/deepseek-v3-grpo.yaml \
    --nodes 4 \
    --gpus-per-node 8 \
    --precision fp8
```
## Core Concepts

### MoE Training Stability
Miles addresses critical challenges in Mixture-of-Experts training:
- Expert load balancing during RL training
- Gradient scaling across experts
- Router stability under policy updates
- Memory-efficient expert parallelism
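To make the load-balancing point concrete, here is a minimal sketch of the Switch Transformer-style auxiliary load-balance loss commonly used to keep MoE routing uniform. The function name and argument layout are illustrative, not the Miles API.

```python
# Illustrative sketch of an auxiliary load-balance loss for MoE routing.
# It is minimized (value 1.0) when both token assignments and router
# probability mass are spread uniformly across experts.

def load_balance_loss(router_probs, expert_assignments, num_experts):
    """router_probs: per-token probability lists, shape [tokens][experts].
    expert_assignments: chosen expert index per token."""
    num_tokens = len(expert_assignments)
    # f_i: fraction of tokens dispatched to expert i
    f = [0.0] * num_experts
    for e in expert_assignments:
        f[e] += 1.0 / num_tokens
    # P_i: mean router probability assigned to expert i
    p = [0.0] * num_experts
    for probs in router_probs:
        for i in range(num_experts):
            p[i] += probs[i] / num_tokens
    return num_experts * sum(f_i * p_i for f_i, p_i in zip(f, p))

# Perfectly balanced routing yields the minimum value of 1.0;
# collapsed routing (all tokens to one expert) pushes the loss above 1.
balanced = load_balance_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], 2)
collapsed = load_balance_loss([[0.9, 0.1], [0.9, 0.1]], [0, 0], 2)
print(balanced, collapsed)  # 1.0 1.8
```

Adding a scaled version of this term to the policy loss penalizes routing collapse, which is why the "load balance loss coefficient" appears again under Common Issues below.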
### FP8 Training Pipeline

```python
config = MilesConfig(
    precision="fp8",
    fp8_config={
        "amax_history_len": 1024,
        "amax_compute_algo": "max",
        "fp8_format": "e4m3",
    },
)
```
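The fields above follow the common FP8 "delayed scaling" recipe: a rolling window of observed absolute maxima (`amax_history_len`) and a reduction over that window (`amax_compute_algo`) determine the quantization scale. The sketch below illustrates the mechanism in plain Python; the class and method names are hypothetical, not Miles internals.

```python
from collections import deque

# Illustrative FP8 delayed-scaling sketch: keep a bounded history of
# per-step amax observations and derive the scale from the window max,
# so the scale reacts smoothly instead of oscillating step to step.

E4M3_MAX = 448.0  # largest finite value representable in the e4m3 format

class DelayedScaler:
    def __init__(self, amax_history_len=1024):
        self.history = deque(maxlen=amax_history_len)

    def update(self, tensor_abs_max):
        # Record the most recent per-step |x|_max observation.
        self.history.append(tensor_abs_max)

    def scale(self):
        # "max" algo: take the largest amax in the window.
        amax = max(self.history) if self.history else 1.0
        return E4M3_MAX / amax

scaler = DelayedScaler(amax_history_len=4)
for amax in [2.0, 8.0, 4.0]:
    scaler.update(amax)
print(scaler.scale())  # 448 / 8 = 56.0
```

This also motivates the troubleshooting advice under Common Issues: a window that is too short lets a single small amax inflate the scale, then a large activation overflows it.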
### Train-Inference Alignment
Miles guarantees bit-wise identical outputs between training and inference by:
- Using identical numerical implementations across both pipelines
- Synchronizing RNG states across distributed workers
- Matching attention implementations exactly
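The guarantee above implies an exact-bits comparison rather than a tolerance check. The following minimal sketch shows what such a verification looks like; the two output lists stand in for real training and inference forward passes, and the helper names are illustrative.

```python
import struct

# Compare floating-point outputs by raw bit pattern, not tolerance:
# bit-wise alignment means the byte representations must match exactly.

def float_bits(x):
    # Reinterpret a float64 as its 8-byte little-endian pattern.
    return struct.pack("<d", x)

def bitwise_identical(train_out, infer_out):
    if len(train_out) != len(infer_out):
        return False
    return all(float_bits(a) == float_bits(b)
               for a, b in zip(train_out, infer_out))

# Identical implementations agree exactly...
assert bitwise_identical([0.1 + 0.2], [0.1 + 0.2])
# ...while a mathematically equal but differently ordered computation
# can differ in the last bit, which a tolerance-based check would hide.
assert not bitwise_identical([0.1 + 0.2], [0.3])
```

This is why reordering reductions or swapping attention kernels between the two pipelines breaks the guarantee even when outputs look numerically "close".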
## Configuration

| Parameter | Default | Description |
|---|---|---|
| `training_method` | `"grpo"` | RL algorithm (`grpo`, `ppo`, `dpo`, `reinforce`) |
| `precision` | `"bf16"` | Training precision (`fp8`, `bf16`, `fp32`) |
| `num_nodes` | `1` | Number of compute nodes |
| `gpus_per_node` | `8` | GPUs per node |
| `batch_size` | `128` | Global batch size |
| `max_seq_len` | `4096` | Maximum sequence length |
| `checkpoint_interval` | `1000` | Steps between checkpoints |
| `fault_tolerance` | `True` | Automatic recovery from failures |
## Best Practices
- Use FP8 for 1TB+ models — reduces memory by 2x vs BF16 with minimal accuracy loss
- Enable fault tolerance for long training runs — auto-recovery saves days of re-training
- Monitor expert load balance during MoE training to catch routing collapse early
- Use the built-in evaluation suite to track reward model agreement throughout training
- Start with smaller batch sizes and scale up — ensures stability before committing resources
- Validate train-inference alignment with the included verification scripts
## Common Issues
**Expert routing collapse:** Increase the load-balance loss coefficient. Monitor per-expert utilization, and restart with an adjusted auxiliary loss if any expert drops below 5% utilization.

**FP8 training divergence:** Check the amax history length: a window that is too short causes scale oscillation. Switch to the e4m3 format if e5m2 shows instability. As a fallback, train in BF16 for the first 1000 steps before switching to FP8.

**Out of memory on large MoE models:** Use expert parallelism across GPUs. Reduce max_seq_len or enable gradient checkpointing. Consider pipeline parallelism for models exceeding single-node memory.
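The expert-parallelism advice comes down to simple memory arithmetic: sharding experts across devices divides their parameter memory by the expert-parallel degree. The helper below is a back-of-envelope sketch with illustrative numbers, not Miles internals.

```python
# Back-of-envelope memory estimate for expert parallelism: expert
# weights are split evenly across expert_parallel_degree GPUs, so
# per-GPU memory scales inversely with the degree (and linearly with
# bytes per parameter, which is where FP8 helps).

def expert_param_gib(num_experts, params_per_expert, bytes_per_param,
                     expert_parallel_degree):
    """GiB of expert weights held per GPU when experts are sharded
    evenly across expert_parallel_degree devices."""
    total_bytes = num_experts * params_per_expert * bytes_per_param
    return total_bytes / expert_parallel_degree / (1024 ** 3)

# Illustrative: 256 experts x 2B params each in BF16 (2 bytes/param)
# is ~1 TB of expert weights; 16-way expert parallelism brings that to
# roughly 60 GiB per GPU, and FP8 (1 byte/param) halves it again.
per_gpu_bf16 = expert_param_gib(256, 2_000_000_000, 2, 16)
per_gpu_fp8 = expert_param_gib(256, 2_000_000_000, 1, 16)
print(per_gpu_bf16, per_gpu_fp8)
```

Activations, optimizer state, and KV caches add on top of this, which is why reducing max_seq_len and enabling gradient checkpointing are listed alongside sharding.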