
Fine Tuning Axolotl Kit

A skill providing expert guidance for fine-tuning. Includes structured workflows, validation checks, and reusable patterns for AI research.


Fine-Tuning with Axolotl

Overview

A comprehensive skill for fine-tuning LLMs using Axolotl — a streamlined tool that wraps HuggingFace Transformers, PEFT, and DeepSpeed into a YAML-driven configuration system. Axolotl makes it easy to fine-tune any model with LoRA, QLoRA, full fine-tuning, or DPO/RLHF — supporting all major architectures including Llama, Mistral, Gemma, and Phi.

When to Use

  • Fine-tuning LLMs with minimal code
  • Need YAML-driven training configuration
  • Want built-in support for LoRA, QLoRA, and full fine-tuning
  • Training on custom datasets (chat, instruction, completion)
  • Running DPO/RLHF alignment training
  • Multi-GPU training without writing distributed code

Quick Start

```bash
# Install
pip install axolotl

# Or from source for latest features
git clone https://github.com/axolotl-ai-cloud/axolotl
cd axolotl && pip install -e ".[flash-attn]"

# Fine-tune with a YAML config
accelerate launch -m axolotl.cli.train config.yaml

# Inference with fine-tuned model
accelerate launch -m axolotl.cli.inference config.yaml --lora_model_dir ./output/lora
```

Configuration

```yaml
# qlora_config.yaml
base_model: meta-llama/Llama-3-8B-Instruct
model_type: LlamaForCausalLM
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: ./my_data.jsonl
    type: sharegpt
    conversation: chatml

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
optimizer: adamw_torch
lr_scheduler: cosine
warmup_steps: 100

bf16: auto
flash_attention: true
gradient_checkpointing: true

output_dir: ./output
logging_steps: 10
save_strategy: steps
save_steps: 500
eval_steps: 500
```
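The effective global batch size of a config like this is `micro_batch_size` × `gradient_accumulation_steps` × number of GPUs. A quick sanity check (assuming a single GPU):

```python
# Effective global batch size for the config above (single-GPU assumption)
micro_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1  # adjust for multi-GPU runs

effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # → 8
```

If you change one knob (e.g. halving `micro_batch_size` to save memory), double the other to keep the effective batch size, and therefore training dynamics, roughly constant.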

Full Fine-Tuning with DeepSpeed

```yaml
base_model: meta-llama/Llama-3-8B-Instruct

datasets:
  - path: ./data.jsonl
    type: sharegpt

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 16
num_epochs: 1
learning_rate: 2e-5
bf16: auto
flash_attention: true
gradient_checkpointing: true
deepspeed: deepspeed_configs/zero2.json
output_dir: ./output-full
```

Dataset Formats

```jsonl
# ShareGPT format (recommended)
{"conversations": [{"from": "human", "value": "What is X?"}, {"from": "gpt", "value": "X is..."}]}

# Alpaca format
{"instruction": "Summarize this", "input": "Long text...", "output": "Summary..."}

# Completion format
{"text": "Full text for completion training"}
```
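If your data is in Alpaca format but you want the recommended ShareGPT format, the mapping is mechanical: instruction plus optional input becomes the human turn, output becomes the gpt turn. A minimal sketch (`convert_alpaca` is a hypothetical helper, not part of Axolotl):

```python
import json

def convert_alpaca(record):
    """Map one Alpaca record to a ShareGPT conversation.

    Hypothetical helper for illustration -- not an Axolotl API.
    """
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n\n" + record["input"]
    return {"conversations": [
        {"from": "human", "value": prompt},
        {"from": "gpt", "value": record["output"]},
    ]}

alpaca = {"instruction": "Summarize this", "input": "Long text...", "output": "Summary..."}
print(json.dumps(convert_alpaca(alpaca)))
```

Run this over each line of an Alpaca JSONL file and write the results out as `my_data.jsonl` with `type: sharegpt`.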

Key Features

| Feature | Config Key | Description |
|---|---|---|
| QLoRA | `adapter: qlora` | 4-bit quantized LoRA |
| LoRA | `adapter: lora` | Standard LoRA |
| Full Fine-Tune | (no adapter) | Full parameter training |
| Sample Packing | `sample_packing: true` | Pack short samples into sequences |
| Flash Attention | `flash_attention: true` | Memory-efficient attention |
| DeepSpeed | `deepspeed: config.json` | Distributed training |
| DPO | `rl: dpo` | Direct Preference Optimization |
| NEFTune | `neftune_noise_alpha: 5` | Noisy embedding fine-tuning |
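The `rl: dpo` entry deserves a sketch, since DPO needs a preference dataset (prompt/chosen/rejected records) rather than plain conversations. The following is a hedged outline, not a verified config: the `type` value and dataset field names are assumptions to check against Axolotl's RLHF documentation.

```yaml
# dpo_sketch.yaml — illustrative sketch only; verify keys against the Axolotl docs
base_model: meta-llama/Llama-3-8B-Instruct
load_in_4bit: true
adapter: qlora
rl: dpo
datasets:
  - path: ./preferences.jsonl  # records with prompt / chosen / rejected fields
    type: chatml.intel         # assumption — one of Axolotl's DPO chat formats
learning_rate: 5e-7            # DPO typically uses a much lower LR than SFT
num_epochs: 1
output_dir: ./output-dpo
```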

Best Practices

  1. Start with QLoRA — Best balance of quality, speed, and memory
  2. Use sample packing — Dramatically reduces training time for short conversations
  3. Set lora_target_linear: true — Applies LoRA to all linear layers (better quality)
  4. Use lora_r: 32-64 — Lower ranks lose quality; higher ranks waste memory
  5. Enable flash attention — Free speedup and memory savings
  6. Set sequence_len appropriately — Match your data; don't waste memory on padding
  7. Use cosine learning rate schedule — Better convergence than constant or linear
  8. Evaluate during training — Set eval_steps and validation dataset to catch overfitting
  9. Merge LoRA after training — Use axolotl merge for deployment-ready models
  10. Version your configs — Track YAML configs alongside data versions in git
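Practice 2 works because sample packing concatenates short examples until `sequence_len` is full, so each batch carries real tokens instead of padding. A toy greedy first-fit illustration of the idea (not Axolotl's actual implementation):

```python
def pack(sample_lens, seq_len=4096):
    """Greedy first-fit packing of sample token counts into sequences.

    Toy illustration of the idea behind sample_packing -- not Axolotl's code.
    """
    bins = []
    for n in sample_lens:
        for b in bins:
            if sum(b) + n <= seq_len:  # sample still fits in this sequence
                b.append(n)
                break
        else:
            bins.append([n])  # start a new packed sequence
    return bins

# Eight short conversations collapse into three packed sequences
lens = [900, 1200, 700, 2000, 500, 1500, 800, 600]
packed = pack(lens)
print(len(packed))  # → 3
```

Three forward passes instead of eight mostly-padding ones is where the training-time savings come from.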

Troubleshooting

OOM during training

```yaml
# Reduce batch size and enable gradient checkpointing
micro_batch_size: 1
gradient_accumulation_steps: 16
gradient_checkpointing: true

# Or switch to QLoRA
load_in_4bit: true
adapter: qlora
```
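To see why `load_in_4bit` helps so much, a back-of-envelope estimate of weight memory alone for an 8B-parameter model (rough numbers; activations, optimizer state, and KV cache come on top):

```python
# Approximate weight memory for an ~8B-parameter model
params = 8e9

bf16_gb = params * 2 / 1e9    # bf16: 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # 4-bit: 0.5 bytes per parameter

print(bf16_gb, int4_gb)  # → 16.0 4.0
```

Dropping the frozen base weights from ~16 GB to ~4 GB is what lets QLoRA fit on a single consumer GPU where bf16 LoRA would not.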

Loss doesn't decrease

```yaml
# Check data format — ensure it matches the expected type

# Increase learning rate for LoRA
learning_rate: 5e-4  # LoRA needs a higher LR than full fine-tuning

# Verify data is loading correctly
debug: true  # Prints the first few processed samples
```

Model outputs garbage after fine-tuning

```yaml
# Overfitting — reduce epochs or add regularization
num_epochs: 1
lora_dropout: 0.1

# Or increase the dataset size
```