
LLM Fine-Tuning

Fine-tune large language models for domain-specific tasks. Covers full fine-tuning, LoRA, QLoRA, dataset preparation, evaluation, deployment, and the patterns that produce specialized models without the cost of training from scratch.

Fine-tuning adapts a pre-trained LLM to a specific task or domain by training it on your data. Instead of building a model from scratch (millions of dollars, months of compute), you take an existing model and teach it your domain in hours with a few hundred well-chosen examples. The result is a model that speaks your language, follows your conventions, and handles your edge cases.


When to Fine-Tune

Don't fine-tune (use prompting instead):
  ☐ Few-shot examples solve the problem
  ☐ Task is general-purpose (translation, summarization)
  ☐ You have < 100 training examples
  ☐ Requirements change frequently

Fine-tune when:
  ☐ Prompting consistently fails on your domain
  ☐ You need specific output format/style
  ☐ Latency matters (fine-tuned = shorter prompts)
  ☐ Cost matters (shorter prompts = cheaper inference)
  ☐ You have 500+ high-quality training examples
  ☐ Domain-specific terminology and patterns

Fine-Tuning Methods

Full Fine-Tuning:
  Update all model parameters
  Cost: Very high (full model in GPU memory)
  Quality: Best results
  When: You have the compute budget and large dataset
  
LoRA (Low-Rank Adaptation):
  Freeze base model, add small trainable adapters
  Cost: Low (< 1% of parameters trained)
  Quality: 95%+ of full fine-tuning quality
  When: Limited GPU memory, quick iteration
  
QLoRA (Quantized LoRA):
  Quantize model to 4-bit, apply LoRA adapters
  Cost: Very low (fits 70B model on single GPU)
  Quality: 90-95% of full fine-tuning quality
  When: Consumer GPU, largest models
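The QLoRA recipe above (quantize the base model to 4-bit, then train LoRA adapters on top) can be sketched with the Hugging Face transformers, bitsandbytes, and peft libraries. This is a sketch under stated assumptions, not a definitive recipe: the repo id and hyperparameters are illustrative, and a CUDA GPU with bitsandbytes installed is assumed.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType

# 4-bit quantization config (NF4 + double quantization, per the QLoRA paper)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the base model in 4-bit; repo id is illustrative
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)

# Stabilize the quantized model for training, then attach LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type=TaskType.CAUSAL_LM,
))
```

The base weights stay frozen in 4-bit; only the bfloat16 adapter matrices receive gradients, which is what lets a 70B model fit on a single GPU.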

Dataset Preparation

# Training data format (instruction fine-tuning)
training_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a medical coding assistant."},
            {"role": "user", "content": "Patient presents with acute bronchitis, prescribed azithromycin."},
            {"role": "assistant", "content": "ICD-10: J20.9 (Acute bronchitis, unspecified)\nCPT: 99213 (Office visit, established patient)\nNDC: 0069-3150-75 (Azithromycin 250mg)"}
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a medical coding assistant."},
            {"role": "user", "content": "Follow-up for type 2 diabetes, HbA1c 7.2%, adjusted metformin."},
            {"role": "assistant", "content": "ICD-10: E11.65 (Type 2 diabetes with hyperglycemia)\nCPT: 99214 (Office visit, moderate complexity)\nLab: 83036 (HbA1c)\nNDC: 0093-7214-01 (Metformin 500mg)"}
        ]
    },
    # ... 500+ examples
]

# Data quality checklist:
# ☐ Diverse examples covering edge cases
# ☐ Consistent format in all outputs
# ☐ Verified by domain experts
# ☐ No PII or sensitive data
# ☐ Balanced across categories
# ☐ 80/10/10 train/validation/test split
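The 80/10/10 split from the checklist can be scripted with the standard library alone. A minimal sketch; `split_dataset` and `write_jsonl` are illustrative helper names, not part of any library, and JSONL is the format most fine-tuning stacks expect.

```python
import json
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle and split into train/validation/test (80/10/10 by default)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed = reproducible split
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )

def write_jsonl(path, examples):
    """One JSON object per line."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

# Usage with placeholder examples
data = [{"messages": [{"role": "user", "content": f"case {i}"}]} for i in range(1000)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))  # 800 100 100
write_jsonl("train.jsonl", train)
```

Shuffling before splitting matters: if your examples are grouped by category, an unshuffled split leaves whole categories out of training.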

LoRA Fine-Tuning

import torch
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# LoRA configuration
lora_config = LoraConfig(
    r=16,                     # Rank (higher = more capacity, more memory)
    lora_alpha=32,            # Scaling factor
    lora_dropout=0.05,        # Regularization
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(model, lora_config)
print(f"Trainable params: {model.num_parameters(only_trainable=True):,}")
# Trainable: ~13.6M (vs 8B total = ~0.17%)

# Training
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=TrainingArguments(
        output_dir="./fine-tuned-model",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
        warmup_steps=100,
        logging_steps=10,
        eval_strategy="steps",
        eval_steps=50,
        save_strategy="steps",
        save_steps=100,
    ),
)

trainer.train()
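For production, the trained adapters are typically folded back into the base weights so inference carries no adapter overhead. A sketch using peft's merge_and_unload; the paths assume the training run above, and the repo id is illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the base model, then attach the trained adapter checkpoint
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "./fine-tuned-model")

# Fold adapters into the weights: W' = W + (alpha/r) * B @ A
merged = model.merge_and_unload()

# Save a plain transformers model, deployable without peft at inference time
merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B").save_pretrained("./merged-model")
```

The merged checkpoint loads like any base model, so serving stacks (vLLM, TGI, plain transformers) need no LoRA support at all.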

Anti-Patterns

Fine-tuning before trying prompting:
  Consequence: Wasted time and compute
  Fix: Start with few-shot prompting

Low-quality training data:
  Consequence: Model learns bad patterns
  Fix: Expert-verified, diverse examples

No evaluation dataset:
  Consequence: Cannot measure improvement
  Fix: Hold out 10% of data for evaluation

Overfitting on a small dataset:
  Consequence: Works on training data, fails on new data
  Fix: LoRA dropout and early stopping

Not merging adapters for production:
  Consequence: Inference overhead from adapter loading
  Fix: Merge LoRA weights into the base model

Fine-tuning is powerful but not magic. It works best when you have high-quality training data, a clear task definition, and have already tried prompting first.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
