Fine-tuning is the most misunderstood concept in applied AI. Teams rush to fine-tune when prompt engineering would work better, or they fine-tune on irrelevant data and get worse results than the base model. This guide covers when fine-tuning actually makes sense and how to do it well.
## When to Fine-Tune vs Prompt Engineer
| Scenario | Approach | Why |
|---|---|---|
| Follow specific output format | Prompt engineer | Faster, cheaper, adjustable |
| Use domain knowledge | RAG (Retrieval Augmented Generation) | Knowledge can be updated without retraining |
| Match specific writing style/tone | Fine-tune | Style is hard to specify in prompts |
| Reduce token usage (shorter prompts) | Fine-tune | Model learns task, needs less instruction |
| Handle proprietary vocabulary | Fine-tune | Domain terms not in base model |
| Improve speed (fewer prompt tokens) | Fine-tune | Shorter prompts = faster inference |
| General question answering | Don’t fine-tune | Base models are already excellent |
## Fine-Tuning Decision Tree

```text
Does the task require specific domain knowledge?
├── Yes → Can the knowledge be retrieved at inference time?
│   ├── Yes → Use RAG (retrieval augmented generation)
│   └── No  → Consider fine-tuning (or RAG + fine-tune)
│
└── No → Is the issue output format or style?
    ├── Yes → Can it be solved with better prompts?
    │   ├── Yes → Prompt engineering (cheaper, faster)
    │   └── No  → Fine-tune for style/format
    │
    └── No → Is it a latency/cost issue?
        ├── Yes → Fine-tune a smaller model to replace the large one
        └── No  → Base model is probably sufficient
```
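The decision tree above can be encoded as a small helper function for team checklists or tooling. This is an illustrative sketch; the function name and flags are not from any library:

```python
def choose_approach(needs_domain_knowledge: bool,
                    knowledge_retrievable: bool = False,
                    format_or_style_issue: bool = False,
                    solvable_with_prompts: bool = False,
                    latency_or_cost_issue: bool = False) -> str:
    """Mirror the decision tree: return a recommended approach."""
    if needs_domain_knowledge:
        # Retrievable knowledge belongs in RAG, not in model weights
        return "RAG" if knowledge_retrievable else "fine-tune (or RAG + fine-tune)"
    if format_or_style_issue:
        # Prompting is cheaper and adjustable; fine-tune only when prompts fail
        return "prompt engineering" if solvable_with_prompts else "fine-tune"
    if latency_or_cost_issue:
        return "fine-tune a smaller model"
    return "base model"

print(choose_approach(True, knowledge_retrievable=True))   # → RAG
print(choose_approach(False, latency_or_cost_issue=True))  # → fine-tune a smaller model
```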
## LoRA / QLoRA
| Method | GPU Memory | Training Time | Quality | Cost |
|---|---|---|---|---|
| Full fine-tuning | Very high (80GB+) | Long | Highest | $$$$ |
| LoRA (Low-Rank Adaptation) | Medium (24GB) | Medium | Near-full | $$ |
| QLoRA (Quantized LoRA) | Low (16GB) | Medium | Good | $ |
| Prompt tuning | Very low | Fast | Moderate | $ |
```python
from peft import LoraConfig, get_peft_model

# LoRA configuration
lora_config = LoraConfig(
    r=16,                     # Rank (lower = fewer params, faster)
    lora_alpha=32,            # Scaling factor
    target_modules=[          # Which layers to adapt
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Apply LoRA to the base model (base weights stay frozen)
model = get_peft_model(base_model, lora_config)
print(f"Trainable params: {model.num_parameters(only_trainable=True):,}")
# Typically well under 1% of total parameters
```
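The parameter savings come from LoRA's low-rank factorization: instead of updating a full d×d weight matrix, it trains two thin matrices A (r×d) and B (d×r) and adds their product to the frozen weights. A back-of-the-envelope check (the hidden size below is illustrative, roughly that of a 7B-class model):

```python
d = 4096   # hidden size of one projection layer (illustrative)
r = 16     # LoRA rank, matching the config above

full_params = d * d       # updating one full d x d projection
lora_params = 2 * d * r   # A (r x d) plus B (d x r)

print(f"full: {full_params:,}, LoRA: {lora_params:,}, "
      f"ratio: {lora_params / full_params:.2%}")
# → full: 16,777,216, LoRA: 131,072, ratio: 0.78%
```

This is why LoRA fits in ~24GB where full fine-tuning needs 80GB+: optimizer state and gradients are only kept for the small adapter matrices.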
## Training Data Preparation
| Quality Factor | Bad | Good |
|---|---|---|
| Volume | 10 examples | 500-5,000 examples |
| Diversity | Same type of example repeated | Varied examples covering edge cases |
| Quality | Unreviewed, noisy data | Human-reviewed, consistent quality |
| Format | Inconsistent formatting | Standardized instruction-response pairs |
| Balance | 90% one category | Proportional representation |
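Standardized instruction-response pairs are usually stored as JSONL, one example per line. A minimal validation pass catches the formatting problems above before training; the `instruction`/`response` field names follow a common convention and should be adjusted to your trainer's expected schema:

```python
import json

REQUIRED_KEYS = {"instruction", "response"}

def validate_examples(jsonl_text: str) -> list[dict]:
    """Parse JSONL training data, rejecting malformed or incomplete rows."""
    examples = []
    for i, line in enumerate(jsonl_text.strip().splitlines(), start=1):
        record = json.loads(line)  # raises on invalid JSON
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {i}: missing fields {missing}")
        if not all(record[k].strip() for k in REQUIRED_KEYS):
            raise ValueError(f"line {i}: empty field")
        examples.append(record)
    return examples

data = "\n".join([
    json.dumps({"instruction": "Summarize: ...", "response": "A short summary."}),
    json.dumps({"instruction": "Translate: ...", "response": "Une traduction."}),
])
print(len(validate_examples(data)))  # → 2
```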
## Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Fine-tune for general knowledge | Model still hallucinates, waste of compute | RAG for factual knowledge, fine-tune for behavior |
| Tiny training set (< 100 examples) | Overfitting, no improvement | Minimum 500 high-quality examples |
| No evaluation methodology | Can’t tell if fine-tuned model is better | Eval set, automated metrics, human eval |
| Fine-tune largest model | Expensive, diminishing returns | Start with smallest model that works |
| No comparison to base + prompt | Don’t know if fine-tuning was even needed | Always benchmark: base + prompt vs fine-tuned |
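The last anti-pattern is worth operationalizing: score both variants on the same held-out eval set before shipping anything. A sketch with stubbed model calls; the placeholder models and the exact-match metric stand in for your real inference calls and eval harness:

```python
def exact_match_score(model_fn, eval_set) -> float:
    """Fraction of eval examples the model answers exactly right."""
    hits = sum(model_fn(ex["input"]) == ex["expected"] for ex in eval_set)
    return hits / len(eval_set)

# Placeholder models: swap in real base-model-with-prompt and fine-tuned calls.
def base_plus_prompt(x): return x.upper()
def fine_tuned(x):       return x.upper() if x != "b" else "?"

eval_set = [{"input": s, "expected": s.upper()} for s in "abc"]

base_score = exact_match_score(base_plus_prompt, eval_set)
ft_score = exact_match_score(fine_tuned, eval_set)
print(f"base+prompt: {base_score:.2f}  fine-tuned: {ft_score:.2f}")
# Only ship the fine-tune if it beats base+prompt by a meaningful margin.
```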
## Checklist

- Benchmarked base model + prompt engineering before committing to fine-tuning
- Ruled out RAG for knowledge that can be retrieved at inference time
- Collected 500+ diverse, human-reviewed examples in a consistent format
- Defined an eval set and metrics before training
- Started with the smallest model that could plausibly work
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For AI/ML consulting, visit garnetgrid.com.
:::
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting
Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.