# Fine-Tuning with Axolotl

## Overview
A comprehensive skill for fine-tuning LLMs using Axolotl — a streamlined tool that wraps HuggingFace Transformers, PEFT, and DeepSpeed into a YAML-driven configuration system. Axolotl makes it easy to fine-tune any model with LoRA, QLoRA, full fine-tuning, or DPO/RLHF — supporting all major architectures including Llama, Mistral, Gemma, and Phi.
## When to Use
- Fine-tuning LLMs with minimal code
- Need YAML-driven training configuration
- Want built-in support for LoRA, QLoRA, and full fine-tuning
- Training on custom datasets (chat, instruction, completion)
- Running DPO/RLHF alignment training
- Multi-GPU training without writing distributed code
## Quick Start

```bash
# Install
pip install axolotl

# Or from source for latest features
git clone https://github.com/axolotl-ai-cloud/axolotl
cd axolotl && pip install -e ".[flash-attn]"

# Fine-tune with a YAML config
accelerate launch -m axolotl.cli.train config.yaml

# Inference with the fine-tuned model
accelerate launch -m axolotl.cli.inference config.yaml --lora_model_dir ./output/lora
```
## Configuration

### QLoRA Fine-Tuning (Recommended Starting Point)

```yaml
# qlora_config.yaml
base_model: meta-llama/Llama-3-8B-Instruct
model_type: LlamaForCausalLM
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: ./my_data.jsonl
    type: sharegpt
    conversation: chatml

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
optimizer: adamw_torch
lr_scheduler: cosine
warmup_steps: 100

bf16: auto
flash_attention: true
gradient_checkpointing: true

output_dir: ./output
logging_steps: 10
save_strategy: steps
save_steps: 500
eval_steps: 500
```
### Full Fine-Tuning with DeepSpeed

```yaml
base_model: meta-llama/Llama-3-8B-Instruct

datasets:
  - path: ./data.jsonl
    type: sharegpt

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 16
num_epochs: 1
learning_rate: 2e-5

bf16: auto
flash_attention: true
gradient_checkpointing: true
deepspeed: deepspeed_configs/zero2.json

output_dir: ./output-full
```
## Dataset Formats

```jsonl
# ShareGPT format (recommended)
{"conversations": [{"from": "human", "value": "What is X?"}, {"from": "gpt", "value": "X is..."}]}

# Alpaca format
{"instruction": "Summarize this", "input": "Long text...", "output": "Summary..."}

# Completion format
{"text": "Full text for completion training"}
```
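If your data is in Alpaca format but you want to train on ShareGPT-style conversations, the conversion is mechanical. A minimal sketch (the helper name is mine, not part of Axolotl; note that Axolotl can also ingest Alpaca data directly with `type: alpaca`):

```python
import json

def alpaca_to_sharegpt(record):
    """Convert one Alpaca-style record into a ShareGPT conversation."""
    prompt = record["instruction"]
    if record.get("input"):  # fold the optional input into the human turn
        prompt += "\n\n" + record["input"]
    return {"conversations": [
        {"from": "human", "value": prompt},
        {"from": "gpt", "value": record["output"]},
    ]}

alpaca = {"instruction": "Summarize this", "input": "Long text...", "output": "Summary..."}
converted = alpaca_to_sharegpt(alpaca)
print(json.dumps(converted))
```

Run this over each line of your Alpaca JSONL file and write the results out as one JSON object per line to get a ShareGPT-format dataset.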
## Key Features

| Feature | Config Key | Description |
|---|---|---|
| QLoRA | `adapter: qlora` | 4-bit quantized LoRA |
| LoRA | `adapter: lora` | Standard LoRA |
| Full Fine-Tune | (no adapter) | Full-parameter training |
| Sample Packing | `sample_packing: true` | Pack short samples into full sequences |
| Flash Attention | `flash_attention: true` | Memory-efficient attention |
| DeepSpeed | `deepspeed: config.json` | Distributed training |
| DPO | `rl: dpo` | Direct Preference Optimization |
| NEFTune | `neftune_noise_alpha: 5` | Noisy-embedding fine-tuning |
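The table's `rl: dpo` entry switches training from supervised fine-tuning to preference optimization. A hedged sketch of what a DPO config might look like (the values are illustrative, and the preference-dataset `type` names vary across Axolotl versions, so verify against the docs for your release):

```yaml
# dpo_config.yaml — illustrative sketch, not a verified recipe
base_model: meta-llama/Llama-3-8B-Instruct
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 64

rl: dpo
datasets:
  - path: ./preference_pairs.jsonl   # records with prompt, chosen, rejected
    type: chatml.intel               # assumption: pick the DPO type matching your data

micro_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5e-7                  # DPO typically uses a much lower LR than SFT
output_dir: ./output-dpo
```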
## Best Practices

- Start with QLoRA — best balance of quality, speed, and memory
- Use sample packing — dramatically reduces training time for short conversations
- Set `lora_target_linear: true` — applies LoRA to all linear layers (better quality)
- Use `lora_r: 32-64` — lower ranks lose quality; higher ranks waste memory
- Enable flash attention — free speedup and memory savings
- Set `sequence_len` appropriately — match your data; don't waste memory on padding
- Use a cosine learning rate schedule — better convergence than constant or linear
- Evaluate during training — set `eval_steps` and a validation dataset to catch overfitting
- Merge LoRA after training — use `axolotl merge-lora` (the `axolotl.cli.merge_lora` entry point) for deployment-ready models
- Version your configs — track YAML configs alongside data versions in git
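Several of these knobs interact: the number of samples consumed per optimizer step is the product of the micro batch size, the gradient accumulation steps, and the number of data-parallel GPUs. A minimal Python sketch of that arithmetic (the function is mine, for illustration only):

```python
def effective_batch_size(micro_batch_size, grad_accum_steps, num_gpus=1):
    """Samples seen per optimizer step across all data-parallel ranks."""
    return micro_batch_size * grad_accum_steps * num_gpus

# QLoRA config above on one GPU: 2 * 4 * 1
print(effective_batch_size(2, 4))       # 8

# DeepSpeed config above on an 8-GPU node: 1 * 16 * 8
print(effective_batch_size(1, 16, 8))   # 128
```

Keeping this product constant while trading micro batch size against accumulation steps is the standard way to fit a run into less memory without changing the optimization dynamics.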
## Troubleshooting

### OOM during training

```yaml
# Reduce batch size and enable gradient checkpointing
micro_batch_size: 1
gradient_accumulation_steps: 16
gradient_checkpointing: true

# Or switch to QLoRA
load_in_4bit: true
adapter: qlora
```

### Loss doesn't decrease

```yaml
# Check the data format — ensure it matches the expected dataset type
# Increase the learning rate for LoRA
learning_rate: 5e-4  # LoRA needs a higher LR than full fine-tuning

# Verify data is loading correctly
debug: true  # Prints the first few processed samples
```

### Model outputs garbage after fine-tuning

```yaml
# Overfitting — reduce epochs or add regularization
num_epochs: 1
lora_dropout: 0.1
# Or increase the dataset size
```
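Malformed training data is a common cause of both flat loss curves and degenerate outputs, so it is worth linting the dataset before launching a run. A quick sanity check for ShareGPT-format JSONL can be sketched in Python (a hypothetical helper, not part of Axolotl):

```python
import json

def check_sharegpt_line(line):
    """Return a list of problems found in one ShareGPT-format JSONL line."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    convs = record.get("conversations")
    if not isinstance(convs, list) or not convs:
        return ["missing or empty 'conversations' list"]
    for i, turn in enumerate(convs):
        if turn.get("from") not in {"human", "gpt", "system"}:
            problems.append(f"turn {i}: unexpected 'from' value {turn.get('from')!r}")
        if not turn.get("value"):
            problems.append(f"turn {i}: empty 'value'")
    return problems

good = '{"conversations": [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello"}]}'
bad = '{"conversations": [{"from": "user", "value": ""}]}'
print(check_sharegpt_line(good))  # []
print(check_sharegpt_line(bad))
```

Running a check like this over every line of the file catches the silent failure mode where a role name like `user` (instead of `human`) makes the loader drop or mislabel turns.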