Advanced Fine Tuning Llama

Boost productivity with this expert fine-tuning guidance. Includes structured workflows, validation checks, and reusable patterns for AI research.

Skill · Cliptics · AI research · v1.0.0 · MIT

Fine-Tuning with LLaMA Factory

Overview

A comprehensive skill for fine-tuning LLMs using LLaMA Factory — an all-in-one framework with a web UI, CLI, and API for fine-tuning 100+ LLM architectures. Supports SFT, RLHF, DPO, PPO, KTO, and more — with LoRA, QLoRA, full fine-tuning, and GaLore across single-GPU to multi-node setups.

When to Use

  • Fine-tuning with a visual web interface (no code needed)
  • Need support for 100+ model architectures
  • Want one tool for SFT, DPO, RLHF, and evaluation
  • Quick experimentation with different training methods
  • Need built-in dataset management and visualization
  • Training on consumer hardware (4-bit QLoRA)

Quick Start

```bash
# Install
git clone https://github.com/hiyouga/LLaMA-Factory
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

# Launch web UI
llamafactory-cli webui

# Or train via CLI
llamafactory-cli train config.yaml

# Or use the API
llamafactory-cli api config.yaml
```

Web UI Workflow

1. Open web UI: llamafactory-cli webui
2. Select model (Llama, Mistral, Gemma, Qwen, etc.)
3. Choose training method (SFT, DPO, PPO, etc.)
4. Select adapter (LoRA, QLoRA, Full)
5. Upload or select dataset
6. Configure hyperparameters
7. Click "Start Training"
8. Monitor loss curves in real-time
9. Export merged model
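
Step 9 maps to the export subcommand on the CLI side. A minimal sketch of a merge config (paths are placeholders; key names follow recent LLaMA Factory releases, so verify against your installed version):

```yaml
# export_config.yaml — merge a LoRA adapter into the base weights
model_name_or_path: meta-llama/Llama-3-8B-Instruct
adapter_name_or_path: ./saves/llama3-8b-sft
template: llama3
finetuning_type: lora
export_dir: ./models/llama3-8b-sft-merged
```

Run it with `llamafactory-cli export export_config.yaml`; the merged model in `export_dir` loads like any standalone Hugging Face checkpoint.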

CLI Configuration

Supervised Fine-Tuning (SFT)

```yaml
# sft_config.yaml
model_name_or_path: meta-llama/Llama-3-8B-Instruct
stage: sft
finetuning_type: lora
dataset: my_dataset
template: llama3
cutoff_len: 4096
lora_rank: 16
lora_alpha: 32
lora_target: all
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5e-5
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
output_dir: ./saves/llama3-8b-sft
logging_steps: 10
save_steps: 500
```

DPO Alignment

```yaml
stage: dpo
finetuning_type: lora
model_name_or_path: ./saves/llama3-8b-sft  # Start from SFT checkpoint
dataset: dpo_preference_data
template: llama3
dpo_beta: 0.1
dpo_loss: sigmoid
per_device_train_batch_size: 2
learning_rate: 5e-6
num_train_epochs: 1
```
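
DPO trains on preference pairs: each record carries one chosen and one rejected response for the same prompt. A minimal sketch of one record (field names follow LLaMA Factory's ranking-dataset convention; the registration in `dataset_info.json` must also set `"ranking": true` and map the `chosen`/`rejected` columns, so verify against your version's docs):

```json
[
  {
    "instruction": "Explain what LoRA is.",
    "input": "",
    "chosen": "LoRA freezes the base weights and trains small low-rank adapter matrices, cutting memory use.",
    "rejected": "LoRA is a type of GPU."
  }
]
```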

Custom Dataset Registration

data/dataset_info.json — register your dataset:

```json
{
  "my_dataset": {
    "file_name": "my_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```
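
For illustration, a small script (file names are placeholders) that writes a dataset in the alpaca-style layout the registration above expects — record keys matching the `columns` mapping:

```python
import json

def build_records():
    """Return alpaca-style records; the key names must match the
    "columns" mapping registered in data/dataset_info.json."""
    return [
        {
            "instruction": "Summarize the following text.",
            "input": "LLaMA Factory is an all-in-one fine-tuning framework.",
            "output": "A unified framework for fine-tuning LLMs.",
        },
    ]

if __name__ == "__main__":
    # Write the file next to dataset_info.json in your LLaMA-Factory checkout.
    with open("my_data.json", "w", encoding="utf-8") as f:
        json.dump(build_records(), f, ensure_ascii=False, indent=2)
```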

Supported Training Methods

| Method | Stage | Description |
| --- | --- | --- |
| SFT | sft | Supervised fine-tuning on instruction data |
| RM | rm | Reward model training |
| PPO | ppo | Proximal Policy Optimization (RLHF) |
| DPO | dpo | Direct Preference Optimization |
| KTO | kto | Kahneman-Tversky Optimization |
| ORPO | orpo | Odds Ratio Preference Optimization |
| SimPO | simpo | Simple Preference Optimization |
| Pre-Train | pt | Continued pre-training |

Best Practices

  1. Start with the web UI — Great for experimentation before committing to CLI configs
  2. Use LoRA with rank 16-32 — Best quality/memory tradeoff
  3. SFT first, then DPO — Alignment works best on an already instruction-tuned model
  4. Use the built-in templates — Don't write chat templates manually; select from 100+
  5. Monitor loss curves — Web UI shows real-time training progress
  6. Export merged models — Use llamafactory-cli export for deployment
  7. Version your dataset_info.json — Track dataset configurations in git
  8. Use QLoRA for large models — 4-bit training on consumer GPUs
  9. Evaluate before deploying — Use built-in evaluation or lm-eval integration
  10. Start with small epochs — 1-3 epochs usually sufficient; more risks overfitting
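
Practice 8 (QLoRA) needs only a couple of extra keys on top of the SFT config above — a sketch, assuming the bitsandbytes backend is installed and that these key names match your installed version:

```yaml
# Add to sft_config.yaml to train in 4-bit (QLoRA)
quantization_bit: 4
quantization_method: bitsandbytes
```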

Troubleshooting

Web UI not starting

```bash
# Check port availability
lsof -i :7860

# Try a different port
GRADIO_SERVER_PORT=8080 llamafactory-cli webui
```

Template mismatch

```bash
# List available templates
llamafactory-cli template list

# Common templates: llama3, chatglm, qwen, mistral
```

Training too slow

```yaml
# Enable flash attention and gradient checkpointing
flash_attn: fa2
gradient_checkpointing: true

# Pack short samples into full-length sequences
packing: true
```