
# LLM Fine-Tuning Strategies

Fine-tune large language models effectively. Covers when to fine-tune vs prompt engineer, LoRA/QLoRA, training data preparation, evaluation methodology, and cost optimization.

Fine-tuning is one of the most misunderstood techniques in applied AI. Teams rush to fine-tune when prompt engineering would work better, or they fine-tune on irrelevant data and end up with worse results than the base model. This guide covers when fine-tuning actually makes sense and how to do it well.


## When to Fine-Tune vs Prompt Engineer

| Scenario | Approach | Why |
|---|---|---|
| Follow specific output format | Prompt engineer | Faster, cheaper, adjustable |
| Use domain knowledge | RAG (Retrieval Augmented Generation) | Knowledge can be updated without retraining |
| Match specific writing style/tone | Fine-tune | Style is hard to specify in prompts |
| Reduce token usage (shorter prompts) | Fine-tune | Model learns the task, needs less instruction |
| Handle proprietary vocabulary | Fine-tune | Domain terms not in base model |
| Improve speed (fewer prompt tokens) | Fine-tune | Shorter prompts = faster inference |
| General question answering | Don't fine-tune | Base models are already excellent |

### Fine-Tuning Decision Tree

```text
Does the task require specific domain knowledge?
├── Yes → Can the knowledge be retrieved at inference time?
│   ├── Yes → Use RAG (retrieval augmented generation)
│   └── No  → Consider fine-tuning (or RAG + fine-tune)
└── No  → Is the issue output format or style?
    ├── Yes → Can it be solved with better prompts?
    │   ├── Yes → Prompt engineering (cheaper, faster)
    │   └── No  → Fine-tune for style/format
    └── No  → Is it a latency/cost issue?
        ├── Yes → Fine-tune smaller model to replace large one
        └── No  → Base model is probably sufficient
```
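For illustration, the tree above can be encoded as a small function. This is a sketch — `choose_approach` is a hypothetical helper, and the branch order simply mirrors the tree:

```python
def choose_approach(needs_domain_knowledge: bool,
                    knowledge_retrievable: bool = False,
                    style_or_format_issue: bool = False,
                    solvable_by_prompting: bool = False,
                    latency_or_cost_issue: bool = False) -> str:
    """Walk the fine-tuning decision tree and return the recommended approach."""
    if needs_domain_knowledge:
        # Retrievable knowledge can be updated without retraining
        return "RAG" if knowledge_retrievable else "fine-tune (or RAG + fine-tune)"
    if style_or_format_issue:
        return "prompt engineering" if solvable_by_prompting else "fine-tune"
    if latency_or_cost_issue:
        return "fine-tune a smaller model"
    return "base model"

print(choose_approach(needs_domain_knowledge=True, knowledge_retrievable=True))
# RAG
```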

## LoRA / QLoRA

| Method | GPU memory | Training time | Quality | Cost |
|---|---|---|---|---|
| Full fine-tuning | Very high (80 GB+) | Long | Highest | $$$$ |
| LoRA (Low-Rank Adaptation) | Medium (24 GB) | Medium | Near-full | $$ |
| QLoRA (Quantized LoRA) | Low (16 GB) | Medium | Good | $ |
| Prompt tuning | Very low | Fast | Moderate | $ |

```python
from peft import LoraConfig, get_peft_model

# LoRA configuration
lora_config = LoraConfig(
    r=16,                # rank: lower = fewer trainable params, faster
    lora_alpha=32,       # scaling factor (effective scale is alpha / r)
    target_modules=[     # which attention/MLP projections to adapt
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Apply LoRA adapters to the (already loaded) base model
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Typically ~0.1% of total parameters are trainable
```
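The QLoRA row in the table assumes the base model is loaded in 4-bit before the LoRA adapters are attached. A minimal sketch using Hugging Face `transformers` and `peft` — the model id is only an example; substitute your own, and note this requires a GPU and `bitsandbytes`:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# 4-bit NF4 quantization, per the QLoRA recipe
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,   # also quantize the quantization constants
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",       # example model id; substitute your own
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training (casts norms, enables checkpointing hooks)
base_model = prepare_model_for_kbit_training(base_model)
# Then attach the LoraConfig above with get_peft_model as usual.
```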

## Training Data Preparation

| Quality factor | Bad | Good |
|---|---|---|
| Volume | 10 examples | 500-5,000 examples |
| Diversity | Same type of example repeated | Varied examples covering edge cases |
| Quality | Unreviewed, noisy data | Human-reviewed, consistent quality |
| Format | Inconsistent formatting | Standardized instruction-response pairs |
| Balance | 90% one category | Proportional representation |
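Most of these criteria can be checked mechanically before training. A minimal sketch, assuming the dataset is JSONL with `instruction`/`response` (and optional `category`) keys — `validate_dataset` is an illustrative helper, not a library function:

```python
import json
from collections import Counter

def validate_dataset(path: str, min_examples: int = 500) -> Counter:
    """Check a JSONL instruction-response dataset for volume, format
    consistency, and balance; returns per-category counts."""
    records = []
    with open(path) as f:
        for i, line in enumerate(f, 1):
            rec = json.loads(line)
            # Format: every record needs both keys, neither empty
            assert {"instruction", "response"} <= rec.keys(), f"line {i}: missing keys"
            assert rec["instruction"].strip() and rec["response"].strip(), \
                f"line {i}: empty field"
            records.append(rec)
    # Volume: enough examples to avoid overfitting
    assert len(records) >= min_examples, \
        f"only {len(records)} examples; want {min_examples}+"
    # Balance: inspect the returned counts for skew toward one category
    return Counter(r.get("category", "uncategorized") for r in records)
```

Run it on the training file before every fine-tuning job; a heavily skewed `Counter` is the cue to rebalance before spending compute.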

## Anti-Patterns

| Anti-pattern | Problem | Fix |
|---|---|---|
| Fine-tune for general knowledge | Model still hallucinates; waste of compute | RAG for factual knowledge, fine-tune for behavior |
| Tiny training set (< 100 examples) | Overfitting, no improvement | Minimum 500 high-quality examples |
| No evaluation methodology | Can't tell if the fine-tuned model is better | Eval set, automated metrics, human eval |
| Fine-tune the largest model | Expensive, diminishing returns | Start with the smallest model that works |
| No comparison to base + prompt | Don't know if fine-tuning was even needed | Always benchmark: base + prompt vs fine-tuned |
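The last two anti-patterns suggest a simple harness: score the prompted base model and the fine-tuned model on the same held-out set. A minimal sketch using exact-match scoring — `exact_match_rate` and `model_fn` are illustrative names, and real evals usually need task-specific metrics beyond exact match:

```python
from typing import Callable

def exact_match_rate(model_fn: Callable[[str], str], eval_set: list[dict]) -> float:
    """Fraction of held-out examples where the model's output matches the label.

    model_fn: callable mapping a prompt string to a completion string
              (wrap your API client or local model here).
    eval_set: list of {"prompt": ..., "label": ...} records.
    """
    hits = sum(model_fn(ex["prompt"]).strip() == ex["label"].strip()
               for ex in eval_set)
    return hits / len(eval_set)

# Always compare both variants on the same held-out set:
#   base_score = exact_match_rate(base_model_with_good_prompt, eval_set)
#   ft_score   = exact_match_rate(fine_tuned_model, eval_set)
# Ship the fine-tune only if ft_score clearly beats base_score.
```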

## Checklist

- Decision made: fine-tune vs prompt engineering vs RAG
- Base model selected (smallest that meets the quality bar)
- Training data: 500+ examples, human-reviewed, diverse
- Evaluation set: held out, representative, labeled
- Method chosen: LoRA/QLoRA for most cases
- Benchmarked against base model + good prompts
- Hyperparameters tuned (learning rate, epochs, rank)
- Model versioned and registered
- Monitoring: output quality tracked post-deployment

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For AI/ML consulting, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
