# Fine-Tuning with Axolotl

## Overview
A comprehensive skill for fine-tuning LLMs using Axolotl — a streamlined tool that wraps HuggingFace Transformers, PEFT, and DeepSpeed into a YAML-driven configuration system. Axolotl makes it easy to fine-tune any model with LoRA, QLoRA, full fine-tuning, or DPO/RLHF — supporting all major architectures including Llama, Mistral, Gemma, and Phi.
## When to Use
- Fine-tuning LLMs with minimal code
- Need YAML-driven training configuration
- Want built-in support for LoRA, QLoRA, and full fine-tuning
- Training on custom datasets (chat, instruction, completion)
- Running DPO/RLHF alignment training
- Multi-GPU training without writing distributed code
## Quick Start

```bash
# Install
pip install axolotl

# Or from source for latest features
git clone https://github.com/axolotl-ai-cloud/axolotl
cd axolotl && pip install -e ".[flash-attn]"

# Fine-tune with a YAML config
accelerate launch -m axolotl.cli.train config.yaml

# Inference with the fine-tuned model
accelerate launch -m axolotl.cli.inference config.yaml --lora_model_dir ./output/lora
```
## Configuration

### QLoRA Fine-Tuning (Recommended Starting Point)

```yaml
# qlora_config.yaml
base_model: meta-llama/Llama-3-8B-Instruct
model_type: LlamaForCausalLM
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: ./my_data.jsonl
    type: sharegpt
    conversation: chatml

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
optimizer: adamw_torch
lr_scheduler: cosine
warmup_steps: 100

bf16: auto
flash_attention: true
gradient_checkpointing: true

output_dir: ./output
logging_steps: 10
save_strategy: steps
save_steps: 500
eval_steps: 500
```
### Full Fine-Tuning with DeepSpeed

```yaml
base_model: meta-llama/Llama-3-8B-Instruct

datasets:
  - path: ./data.jsonl
    type: sharegpt

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 16
num_epochs: 1
learning_rate: 2e-5

bf16: auto
flash_attention: true
gradient_checkpointing: true
deepspeed: deepspeed_configs/zero2.json

output_dir: ./output-full
```
## Dataset Formats

```jsonl
# ShareGPT format (recommended)
{"conversations": [{"from": "human", "value": "What is X?"}, {"from": "gpt", "value": "X is..."}]}

# Alpaca format
{"instruction": "Summarize this", "input": "Long text...", "output": "Summary..."}

# Completion format
{"text": "Full text for completion training"}
```
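If your data is in Alpaca format but you want to train on ShareGPT-style conversations, the conversion is mechanical. A minimal sketch (the helper name is mine, not part of Axolotl; note that Axolotl can also ingest Alpaca data directly with `type: alpaca`):

```python
import json

def alpaca_to_sharegpt(record):
    """Convert one Alpaca-style record into a ShareGPT conversation."""
    prompt = record["instruction"]
    if record.get("input"):  # fold the optional input into the human turn
        prompt += "\n\n" + record["input"]
    return {"conversations": [
        {"from": "human", "value": prompt},
        {"from": "gpt", "value": record["output"]},
    ]}

alpaca = {"instruction": "Summarize this", "input": "Long text...", "output": "Summary..."}
converted = alpaca_to_sharegpt(alpaca)
print(json.dumps(converted))
```

Run this over each line of your Alpaca JSONL file and write the results out as one JSON object per line to get a ShareGPT-format dataset.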
## Key Features

| Feature | Config Key | Description |
|---|---|---|
| QLoRA | `adapter: qlora` | 4-bit quantized LoRA |
| LoRA | `adapter: lora` | Standard LoRA |
| Full Fine-Tune | (no adapter) | Full-parameter training |
| Sample Packing | `sample_packing: true` | Pack short samples into full sequences |
| Flash Attention | `flash_attention: true` | Memory-efficient attention |
| DeepSpeed | `deepspeed: config.json` | Distributed training |
| DPO | `rl: dpo` | Direct Preference Optimization |
| NEFTune | `neftune_noise_alpha: 5` | Noisy-embedding fine-tuning |
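The table's `rl: dpo` entry switches training from supervised fine-tuning to preference optimization. A hedged sketch of what a DPO config might look like (the values are illustrative, and the preference-dataset `type` names vary across Axolotl versions, so verify against the docs for your release):

```yaml
# dpo_config.yaml — illustrative sketch, not a verified recipe
base_model: meta-llama/Llama-3-8B-Instruct
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 64

rl: dpo
datasets:
  - path: ./preference_pairs.jsonl   # records with prompt, chosen, rejected
    type: chatml.intel               # assumption: pick the DPO type matching your data

micro_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5e-7                  # DPO typically uses a much lower LR than SFT
output_dir: ./output-dpo
```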
## Best Practices

- Start with QLoRA — best balance of quality, speed, and memory
- Use sample packing — dramatically reduces training time for short conversations
- Set `lora_target_linear: true` — applies LoRA to all linear layers (better quality)
- Use `lora_r: 32-64` — lower ranks lose quality; higher ranks waste memory
- Enable flash attention — free speedup and memory savings
- Set `sequence_len` appropriately — match your data; don't waste memory on padding
- Use a cosine learning rate schedule — better convergence than constant or linear
- Evaluate during training — set `eval_steps` and a validation dataset to catch overfitting
- Merge LoRA after training — use `axolotl merge-lora` (the `axolotl.cli.merge_lora` entry point) for deployment-ready models
- Version your configs — track YAML configs alongside data versions in git
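Several of these knobs interact: the number of samples consumed per optimizer step is the product of the micro batch size, the gradient accumulation steps, and the number of data-parallel GPUs. A minimal Python sketch of that arithmetic (the function is mine, for illustration only):

```python
def effective_batch_size(micro_batch_size, grad_accum_steps, num_gpus=1):
    """Samples seen per optimizer step across all data-parallel ranks."""
    return micro_batch_size * grad_accum_steps * num_gpus

# QLoRA config above on one GPU: 2 * 4 * 1
print(effective_batch_size(2, 4))       # 8

# DeepSpeed config above on an 8-GPU node: 1 * 16 * 8
print(effective_batch_size(1, 16, 8))   # 128
```

Keeping this product constant while trading micro batch size against accumulation steps is the standard way to fit a run into less memory without changing the optimization dynamics.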
## Troubleshooting

### OOM during training

```yaml
# Reduce batch size and enable gradient checkpointing
micro_batch_size: 1
gradient_accumulation_steps: 16
gradient_checkpointing: true

# Or switch to QLoRA
load_in_4bit: true
adapter: qlora
```

### Loss doesn't decrease

```yaml
# Check the data format — ensure it matches the expected dataset type
# Increase the learning rate for LoRA
learning_rate: 5e-4  # LoRA needs a higher LR than full fine-tuning

# Verify data is loading correctly
debug: true  # Prints the first few processed samples
```

### Model outputs garbage after fine-tuning

```yaml
# Overfitting — reduce epochs or add regularization
num_epochs: 1
lora_dropout: 0.1
# Or increase the dataset size
```
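Malformed training data is a common cause of both flat loss curves and degenerate outputs, so it is worth linting the dataset before launching a run. A quick sanity check for ShareGPT-format JSONL can be sketched in Python (a hypothetical helper, not part of Axolotl):

```python
import json

def check_sharegpt_line(line):
    """Return a list of problems found in one ShareGPT-format JSONL line."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    convs = record.get("conversations")
    if not isinstance(convs, list) or not convs:
        return ["missing or empty 'conversations' list"]
    for i, turn in enumerate(convs):
        if turn.get("from") not in {"human", "gpt", "system"}:
            problems.append(f"turn {i}: unexpected 'from' value {turn.get('from')!r}")
        if not turn.get("value"):
            problems.append(f"turn {i}: empty 'value'")
    return problems

good = '{"conversations": [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello"}]}'
bad = '{"conversations": [{"from": "user", "value": ""}]}'
print(check_sharegpt_line(good))  # []
print(check_sharegpt_line(bad))
```

Running a check like this over every line of the file catches the silent failure mode where a role name like `user` (instead of `human`) makes the loader drop or mislabel turns.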