Fine-Tuning with LLaMA Factory
Overview
A comprehensive skill for fine-tuning LLMs using LLaMA Factory — an all-in-one framework with a web UI, CLI, and API for fine-tuning 100+ LLM architectures. Supports SFT, RLHF, DPO, PPO, KTO, and more — with LoRA, QLoRA, full fine-tuning, and GaLore across single-GPU to multi-node setups.
When to Use
- Fine-tuning with a visual web interface (no code needed)
- Need support for 100+ model architectures
- Want one tool for SFT, DPO, RLHF, and evaluation
- Quick experimentation with different training methods
- Need built-in dataset management and visualization
- Training on consumer hardware (4-bit QLoRA)
Quick Start
```bash
# Install
git clone https://github.com/hiyouga/LLaMA-Factory
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

# Launch web UI
llamafactory-cli webui

# Or train via CLI
llamafactory-cli train config.yaml

# Or use the API
llamafactory-cli api config.yaml
```
Web UI Workflow
1. Open web UI: llamafactory-cli webui
2. Select model (Llama, Mistral, Gemma, Qwen, etc.)
3. Choose training method (SFT, DPO, PPO, etc.)
4. Select adapter (LoRA, QLoRA, Full)
5. Upload or select dataset
6. Configure hyperparameters
7. Click "Start Training"
8. Monitor loss curves in real-time
9. Export merged model
CLI Configuration
Supervised Fine-Tuning (SFT)
```yaml
# sft_config.yaml
model_name_or_path: meta-llama/Llama-3-8B-Instruct
stage: sft
finetuning_type: lora
dataset: my_dataset
template: llama3
cutoff_len: 4096
lora_rank: 16
lora_alpha: 32
lora_target: all
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5e-5
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
output_dir: ./saves/llama3-8b-sft
logging_steps: 10
save_steps: 500
```
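The `dataset: my_dataset` entry above assumes instruction data in LLaMA Factory's alpaca-style JSON format, with field names matching the column mapping shown under Custom Dataset Registration below. A minimal sketch of what `my_data.json` might contain (contents are illustrative):

```json
[
  {
    "instruction": "Summarize the following text.",
    "input": "LLaMA Factory is an all-in-one framework for fine-tuning LLMs.",
    "output": "LLaMA Factory unifies fine-tuning across many LLM architectures."
  },
  {
    "instruction": "Translate to French: Good morning.",
    "input": "",
    "output": "Bonjour."
  }
]
```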
DPO Alignment
```yaml
stage: dpo
finetuning_type: lora
model_name_or_path: ./saves/llama3-8b-sft  # Start from SFT checkpoint
dataset: dpo_preference_data
template: llama3
dpo_beta: 0.1
dpo_loss: sigmoid
per_device_train_batch_size: 2
learning_rate: 5e-6
num_train_epochs: 1
```
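DPO trains on pairwise preference data, where each prompt has a preferred and a dispreferred response. A sketch of what `dpo_preference_data` might look like (field names are an assumption; preference datasets also need a matching registration in dataset_info.json, including the chosen/rejected column mapping, so check your version's dataset documentation):

```json
[
  {
    "instruction": "Explain LoRA in one sentence.",
    "chosen": "LoRA fine-tunes a model by training small low-rank adapter matrices instead of all the base weights.",
    "rejected": "LoRA is a long-range radio protocol."
  }
]
```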
Custom Dataset Registration
data/dataset_info.json — register your dataset (note that JSON does not allow comments, so keep the file to pure JSON):

```json
{
  "my_dataset": {
    "file_name": "my_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```
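If you script experiments, you can update dataset_info.json programmatically instead of editing it by hand. A minimal sketch, with the `register_dataset` helper and file paths being illustrative, not part of LLaMA Factory's API:

```python
import json
from pathlib import Path

def register_dataset(info_path: str, name: str, file_name: str,
                     columns: dict) -> dict:
    """Add (or overwrite) a dataset entry in dataset_info.json and return the registry."""
    path = Path(info_path)
    info = json.loads(path.read_text()) if path.exists() else {}
    info[name] = {"file_name": file_name, "columns": columns}
    path.write_text(json.dumps(info, indent=2))
    return info

registry = register_dataset(
    "dataset_info.json",  # normally lives at data/dataset_info.json
    "my_dataset",
    "my_data.json",
    {"prompt": "instruction", "query": "input", "response": "output"},
)
print(sorted(registry))  # prints ['my_dataset']
```

Keeping this file under version control (see Best Practices) then gives you a reproducible record of every dataset configuration you trained on.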
Supported Training Methods
| Method | Stage | Description |
|---|---|---|
| SFT | sft | Supervised fine-tuning on instruction data |
| RM | rm | Reward model training |
| PPO | ppo | Proximal Policy Optimization (RLHF) |
| DPO | dpo | Direct Preference Optimization |
| KTO | kto | Kahneman-Tversky Optimization |
| ORPO | orpo | Odds Ratio Preference Optimization |
| SimPO | simpo | Simple Preference Optimization |
| Pre-Train | pt | Continued pre-training |
Best Practices
- Start with the web UI — Great for experimentation before committing to CLI configs
- Use LoRA with rank 16-32 — Best quality/memory tradeoff
- SFT first, then DPO — Alignment works best on an already instruction-tuned model
- Use the built-in templates — Don't write chat templates manually; select from 100+
- Monitor loss curves — Web UI shows real-time training progress
- Export merged models — Use llamafactory-cli export for deployment
- Version your dataset_info.json — Track dataset configurations in git
- Use QLoRA for large models — 4-bit training on consumer GPUs
- Evaluate before deploying — Use built-in evaluation or lm-eval integration
- Start with small epochs — 1-3 epochs usually sufficient; more risks overfitting
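The export step above is also driven by a YAML config that merges the LoRA adapter into the base model. A sketch of such a config, with key names as commonly used by LLaMA Factory (verify against your installed version's examples):

```yaml
# merge_config.yaml (run with: llamafactory-cli export merge_config.yaml)
model_name_or_path: meta-llama/Llama-3-8B-Instruct
adapter_name_or_path: ./saves/llama3-8b-sft
template: llama3
finetuning_type: lora
export_dir: ./models/llama3-8b-sft-merged
export_size: 2          # shard size in GB
export_legacy_format: false
```

The merged model in export_dir is a standalone Hugging Face checkpoint that no longer needs the adapter at inference time.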
Troubleshooting
Web UI not starting
```bash
# Check port availability
lsof -i :7860

# Try different port
GRADIO_SERVER_PORT=8080 llamafactory-cli webui
```
Template mismatch
```bash
# List available templates
llamafactory-cli template list

# Common templates: llama3, chatglm, qwen, mistral
```
Training too slow
```yaml
# Enable flash attention and gradient checkpointing
flash_attn: fa2
gradient_checkpointing: true

# Use packing
packing: true
```