Fine-Tuning with LLaMA Factory
Overview
A comprehensive skill for fine-tuning LLMs using LLaMA Factory — an all-in-one framework with a web UI, CLI, and API for fine-tuning 100+ LLM architectures. Supports SFT, RLHF, DPO, PPO, KTO, and more — with LoRA, QLoRA, full fine-tuning, and GaLore across single-GPU to multi-node setups.
When to Use
- Fine-tuning with a visual web interface (no code needed)
- Need support for 100+ model architectures
- Want one tool for SFT, DPO, RLHF, and evaluation
- Quick experimentation with different training methods
- Need built-in dataset management and visualization
- Training on consumer hardware (4-bit QLoRA)
Quick Start
```bash
# Install
git clone https://github.com/hiyouga/LLaMA-Factory
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

# Launch web UI
llamafactory-cli webui

# Or train via CLI
llamafactory-cli train config.yaml

# Or use the API
llamafactory-cli api config.yaml
```
Web UI Workflow
1. Open web UI: llamafactory-cli webui
2. Select model (Llama, Mistral, Gemma, Qwen, etc.)
3. Choose training method (SFT, DPO, PPO, etc.)
4. Select adapter (LoRA, QLoRA, Full)
5. Upload or select dataset
6. Configure hyperparameters
7. Click "Start Training"
8. Monitor loss curves in real-time
9. Export merged model
CLI Configuration
Supervised Fine-Tuning (SFT)
```yaml
# sft_config.yaml
model_name_or_path: meta-llama/Llama-3-8B-Instruct
stage: sft
finetuning_type: lora
dataset: my_dataset
template: llama3
cutoff_len: 4096
lora_rank: 16
lora_alpha: 32
lora_target: all
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5e-5
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
output_dir: ./saves/llama3-8b-sft
logging_steps: 10
save_steps: 500
```
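The `dataset: my_dataset` entry above assumes instruction data in LLaMA Factory's alpaca-style JSON format, with field names matching the column mapping shown under Custom Dataset Registration below. A minimal sketch of what `my_data.json` might contain (contents are illustrative):

```json
[
  {
    "instruction": "Summarize the following text.",
    "input": "LLaMA Factory is an all-in-one framework for fine-tuning LLMs.",
    "output": "LLaMA Factory unifies fine-tuning across many LLM architectures."
  },
  {
    "instruction": "Translate to French: Good morning.",
    "input": "",
    "output": "Bonjour."
  }
]
```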
DPO Alignment
```yaml
stage: dpo
finetuning_type: lora
model_name_or_path: ./saves/llama3-8b-sft  # Start from SFT checkpoint
dataset: dpo_preference_data
template: llama3
dpo_beta: 0.1
dpo_loss: sigmoid
per_device_train_batch_size: 2
learning_rate: 5e-6
num_train_epochs: 1
```
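DPO trains on pairwise preference data, where each prompt has a preferred and a dispreferred response. A sketch of what `dpo_preference_data` might look like (field names are an assumption; preference datasets also need a matching registration in dataset_info.json, including the chosen/rejected column mapping, so check your version's dataset documentation):

```json
[
  {
    "instruction": "Explain LoRA in one sentence.",
    "chosen": "LoRA fine-tunes a model by training small low-rank adapter matrices instead of all the base weights.",
    "rejected": "LoRA is a long-range radio protocol."
  }
]
```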
Custom Dataset Registration
data/dataset_info.json — register your dataset (note that JSON does not allow comments, so keep the file to pure JSON):

```json
{
  "my_dataset": {
    "file_name": "my_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```
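If you script experiments, you can update dataset_info.json programmatically instead of editing it by hand. A minimal sketch, with the `register_dataset` helper and file paths being illustrative, not part of LLaMA Factory's API:

```python
import json
from pathlib import Path

def register_dataset(info_path: str, name: str, file_name: str,
                     columns: dict) -> dict:
    """Add (or overwrite) a dataset entry in dataset_info.json and return the registry."""
    path = Path(info_path)
    info = json.loads(path.read_text()) if path.exists() else {}
    info[name] = {"file_name": file_name, "columns": columns}
    path.write_text(json.dumps(info, indent=2))
    return info

registry = register_dataset(
    "dataset_info.json",  # normally lives at data/dataset_info.json
    "my_dataset",
    "my_data.json",
    {"prompt": "instruction", "query": "input", "response": "output"},
)
print(sorted(registry))  # prints ['my_dataset']
```

Keeping this file under version control (see Best Practices) then gives you a reproducible record of every dataset configuration you trained on.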
Supported Training Methods
| Method | Stage | Description |
|---|---|---|
| SFT | sft | Supervised fine-tuning on instruction data |
| RM | rm | Reward model training |
| PPO | ppo | Proximal Policy Optimization (RLHF) |
| DPO | dpo | Direct Preference Optimization |
| KTO | kto | Kahneman-Tversky Optimization |
| ORPO | orpo | Odds Ratio Preference Optimization |
| SimPO | simpo | Simple Preference Optimization |
| Pre-Train | pt | Continued pre-training |
Best Practices
- Start with the web UI — Great for experimentation before committing to CLI configs
- Use LoRA with rank 16-32 — Best quality/memory tradeoff
- SFT first, then DPO — Alignment works best on an already instruction-tuned model
- Use the built-in templates — Don't write chat templates manually; select from 100+
- Monitor loss curves — Web UI shows real-time training progress
- Export merged models — Use llamafactory-cli export for deployment
- Version your dataset_info.json — Track dataset configurations in git
- Use QLoRA for large models — 4-bit training on consumer GPUs
- Evaluate before deploying — Use built-in evaluation or lm-eval integration
- Start with small epochs — 1-3 epochs usually sufficient; more risks overfitting
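The export step above is also driven by a YAML config that merges the LoRA adapter into the base model. A sketch of such a config, with key names as commonly used by LLaMA Factory (verify against your installed version's examples):

```yaml
# merge_config.yaml (run with: llamafactory-cli export merge_config.yaml)
model_name_or_path: meta-llama/Llama-3-8B-Instruct
adapter_name_or_path: ./saves/llama3-8b-sft
template: llama3
finetuning_type: lora
export_dir: ./models/llama3-8b-sft-merged
export_size: 2          # shard size in GB
export_legacy_format: false
```

The merged model in export_dir is a standalone Hugging Face checkpoint that no longer needs the adapter at inference time.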
Troubleshooting
Web UI not starting
```bash
# Check port availability
lsof -i :7860

# Try different port
GRADIO_SERVER_PORT=8080 llamafactory-cli webui
```
Template mismatch
```bash
# List available templates
llamafactory-cli template list

# Common templates: llama3, chatglm, qwen, mistral
```
Training too slow
```yaml
# Enable flash attention and gradient checkpointing
flash_attn: fa2
gradient_checkpointing: true

# Use packing
packing: true
```