LLM Fine-Tuning
Fine-tune large language models for domain-specific tasks. Covers full fine-tuning, LoRA, QLoRA, dataset preparation, evaluation, deployment, and the patterns that produce specialized models without the cost of training from scratch.
Fine-tuning adapts a pre-trained LLM to a specific task or domain by training on your data. Instead of building a model from scratch (millions of dollars, months of compute), you take an existing model and teach it your domain in hours with hundreds of examples. The result is a model that speaks your language, follows your conventions, and handles your edge cases.
When to Fine-Tune
Don't fine-tune (use prompting instead):
☐ Few-shot examples solve the problem
☐ Task is general-purpose (translation, summarization)
☐ You have < 100 training examples
☐ Requirements change frequently
Fine-tune when:
☐ Prompting consistently fails on your domain
☐ You need specific output format/style
☐ Latency matters (fine-tuned = shorter prompts)
☐ Cost matters (shorter prompts = cheaper inference)
☐ You have 500+ high-quality training examples
☐ Domain-specific terminology and patterns
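The latency and cost items above come down to prompt length: a fine-tuned model bakes the instructions and few-shot examples into its weights, so each request sends far fewer input tokens. A rough illustration of the arithmetic (the token counts and per-token price below are made-up assumptions, not real API rates):

```python
# Illustrative only: token counts and price are assumed, not real API rates.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # hypothetical rate

def monthly_input_cost(prompt_tokens, requests_per_month):
    """Input-token cost for a month of requests at the assumed rate."""
    return prompt_tokens * requests_per_month / 1000 * PRICE_PER_1K_INPUT_TOKENS

# Prompted baseline: system prompt + few-shot examples + the actual query
prompted = monthly_input_cost(prompt_tokens=2500, requests_per_month=100_000)
# Fine-tuned: instructions live in the weights; only the query is sent
fine_tuned = monthly_input_cost(prompt_tokens=300, requests_per_month=100_000)

print(f"prompted:   ${prompted:,.2f}/month")
print(f"fine-tuned: ${fine_tuned:,.2f}/month")
```

The same shorter prompt also cuts time-to-first-token, which is where the latency benefit comes from.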
Fine-Tuning Methods
Full Fine-Tuning:
Update all model parameters
Cost: Very high (weights, gradients, and optimizer states must all fit in GPU memory)
Quality: Best results
When: You have the compute budget and large dataset
LoRA (Low-Rank Adaptation):
Freeze base model, add small trainable adapters
Cost: Low (< 1% of parameters trained)
Quality: 95%+ of full fine-tuning quality
When: Limited GPU memory, quick iteration
QLoRA (Quantized LoRA):
Quantize model to 4-bit, apply LoRA adapters
Cost: Very low (fits 70B model on single GPU)
Quality: 90-95% of full fine-tuning quality
When: Consumer GPU, largest models
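The cost differences above follow from how many parameters each method trains. In LoRA, each adapted weight matrix of shape (d_out, d_in) gains two small matrices A (r × d_in) and B (d_out × r), so the adapter adds r·(d_in + d_out) trainable parameters per matrix. A back-of-envelope helper (the layer shapes below are illustrative, not any specific model's exact dimensions):

```python
def lora_params(r, layer_shapes):
    """Trainable parameters added by LoRA adapters.

    Each target matrix of shape (d_out, d_in) gets A (r x d_in) and
    B (d_out x r), i.e. r * (d_in + d_out) extra trainable params.
    """
    return sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)

# Toy transformer: 4 attention projections of 4096x4096 over 32 layers
shapes = [(4096, 4096)] * 4 * 32
full = sum(d_in * d_out for d_in, d_out in shapes)

added = lora_params(r=16, layer_shapes=shapes)
print(f"LoRA adds {added:,} params vs {full:,} in the targeted matrices "
      f"({added / full:.2%})")
```

Because the adapter size scales with r·(d_in + d_out) rather than d_in·d_out, the trainable fraction stays under 1% even for large models, which is what keeps LoRA's memory footprint low.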
Dataset Preparation
# Training data format (instruction fine-tuning)
training_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a medical coding assistant."},
            {"role": "user", "content": "Patient presents with acute bronchitis, prescribed azithromycin."},
            {"role": "assistant", "content": "ICD-10: J20.9 (Acute bronchitis, unspecified)\nCPT: 99213 (Office visit, established patient)\nNDC: 0069-3150-75 (Azithromycin 250mg)"}
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a medical coding assistant."},
            {"role": "user", "content": "Follow-up for type 2 diabetes, HbA1c 7.2%, adjusted metformin."},
            {"role": "assistant", "content": "ICD-10: E11.65 (Type 2 diabetes with hyperglycemia)\nCPT: 99214 (Office visit, moderate complexity)\nLab: 83036 (HbA1c)\nNDC: 0093-7214-01 (Metformin 500mg)"}
        ]
    },
    # ... 500+ examples
]
# Data quality checklist:
# ☐ Diverse examples covering edge cases
# ☐ Consistent format in all outputs
# ☐ Verified by domain experts
# ☐ No PII or sensitive data
# ☐ Balanced across categories
# ☐ 80/10/10 train/validation/test split
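The last checklist item can be implemented with a seeded shuffle so the split is reproducible across runs. A minimal sketch (the placeholder example list and JSONL output format are assumptions; adapt the serialization to whatever your trainer ingests):

```python
import json
import random

def split_dataset(examples, train=0.8, val=0.1, seed=42):
    """Shuffle once with a fixed seed, then slice into train/val/test."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train)
    n_val = int(len(data) * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

def write_jsonl(path, examples):
    """One JSON object per line -- a common format for SFT trainers."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

examples = [{"id": i} for i in range(500)]  # stand-in for training_data above
train, val, test = split_dataset(examples)  # 80/10/10 by default
write_jsonl("train.jsonl", train)
```

Seeding the shuffle matters: if the split changes between runs, validation metrics from different experiments are not comparable.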
LoRA Fine-Tuning
import torch
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
# LoRA configuration
lora_config = LoraConfig(
    r=16,              # Rank (higher = more capacity, more memory)
    lora_alpha=32,     # Scaling factor
    lora_dropout=0.05, # Regularization
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Trainable params are a small fraction (well under 1%) of the 8B total
# Training
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=TrainingArguments(
        output_dir="./fine-tuned-model",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
        warmup_steps=100,
        logging_steps=10,
        eval_strategy="steps",
        eval_steps=50,
        save_strategy="steps",
        save_steps=100,
    ),
)
trainer.train()
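After training, the adapters can be folded into the base weights for deployment, which removes the per-request overhead of routing through separate adapter layers. A sketch using PEFT's merge API (the checkpoint paths are placeholders; this assumes the adapter was saved to the training output directory above):

```python
from peft import AutoPeftModelForCausalLM

# Load the saved adapter checkpoint; merge_and_unload() folds the LoRA
# update (B @ A, scaled by alpha/r) into each target matrix and returns
# a plain transformers model with no PEFT wrapper.
model = AutoPeftModelForCausalLM.from_pretrained("./fine-tuned-model")  # placeholder path
merged = model.merge_and_unload()

merged.save_pretrained("./fine-tuned-model/merged")  # placeholder path
```

The merged checkpoint can then be served like any ordinary model, e.g. loaded with `AutoModelForCausalLM.from_pretrained`.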
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Fine-tune before trying prompting | Wasted time and compute | Start with few-shot prompting |
| Low-quality training data | Model learns bad patterns | Expert-verified, diverse examples |
| No evaluation dataset | Cannot measure improvement | Hold out 10% for evaluation |
| Overfitting on small dataset | Works on training data, fails on new data | LoRA dropout, early stopping |
| Not merging adapters for production | Inference overhead from adapter loading | Merge LoRA into base model |
Fine-tuning is powerful but not magic. It works best when you have high-quality training data, a clear task definition, and have already tried prompting first.