Fine-Tuning & Custom Models
Train AI models on your data. From LoRA adapters to full RLHF — domain-specific intelligence at any scale.
Why fine-tune?
Fine-tuning sits at the sweet spot of the customization spectrum — more powerful than prompting, more practical than training from scratch.
Prompt Engineering
You need quick results with off-the-shelf models and no training data.
Latency overhead: None
Data needed: Zero — just examples in the prompt
RAG
Your knowledge base changes frequently and you need grounded, citation-backed answers.
Latency overhead: +50–200ms for retrieval
Data needed: Document corpus (any size)
Fine-Tuning
You need the model to internalize domain vocabulary, tone, reasoning patterns, or structured output formats.
Latency overhead: None at inference
Data needed: 100–10,000 curated examples
Training from Scratch
No existing model covers your domain and you have massive proprietary corpora (rare).
Latency overhead: None at inference
Data needed: Billions of tokens + months of compute
Fine-tuning techniques
Six approaches to adapting a model — from full parameter updates to lightweight adapters and preference learning.
Full Fine-Tuning
Updates every parameter in the model. The original approach — maximum flexibility but maximum cost. All weights are unfrozen and trained on your dataset with a low learning rate.
How it works
Load the pre-trained model and unfreeze all layers
Prepare your dataset in instruction/completion format
Train with a low learning rate (1e-5 to 5e-5) to avoid catastrophic forgetting
Evaluate on a held-out validation set and checkpoint the best model
Best for: organizations with large GPU clusters and datasets of 50K+ examples that need deep behavioral changes.
Advantages
Trade-offs
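The steps above can be sketched as a generic training loop. This is a toy illustration on a linear model, not a real LLM fine-tune: `full_finetune` is a hypothetical helper, and the learning rate is larger than the 1e-5 range you would use on a transformer. What it does show is the shape of full fine-tuning — every weight starts from the pre-trained values and is updated, and the checkpoint with the best validation loss is kept.

```python
import numpy as np

def full_finetune(X, y, w_init, lr=1e-2, epochs=50, val_frac=0.2):
    """Toy full fine-tune of a linear model:
    - all weights are "unfrozen" (every entry of w is updated),
    - a deliberately small learning rate limits drift from w_init,
    - the weights with the best validation loss are checkpointed.
    """
    n_val = max(1, int(len(X) * val_frac))
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_val, y_val = X[-n_val:], y[-n_val:]

    w = w_init.copy()  # start from the "pre-trained" weights
    best_w, best_loss = w.copy(), float("inf")
    for _ in range(epochs):
        # MSE gradient over the full training set
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(X_tr)
        w -= lr * grad
        val_loss = float(np.mean((X_val @ w - y_val) ** 2))
        if val_loss < best_loss:  # checkpoint the best model
            best_loss, best_w = val_loss, w.copy()
    return best_w, best_loss
```

In a real setup the same structure appears with a transformer, an optimizer like AdamW, and per-epoch evaluation on the held-out split.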
Model landscape
Choosing the right base model determines your ceiling. Open-weight models offer full control; API models trade flexibility for convenience.
Llama
Meta. Best open-weight all-rounder. Strong reasoning, multilingual, 128K context. The default choice for most fine-tuning projects.
Full, LoRA, QLoRA — full ecosystem support
Mistral / Mixtral
Mistral AI. Exceptional efficiency. The Mixtral MoE architecture activates only 2 of 8 experts per token — 70B-class quality at 13B inference cost.
Full, LoRA, QLoRA — strong community support
Phi
Microsoft. Punches above its weight. Data-quality-focused training yields strong reasoning in a tiny model. Ideal for on-device and edge deployment.
Full, LoRA, QLoRA — excellent for constrained environments
Gemma
Google. Strong instruction following and safety alignment out of the box. Good multilingual performance with an efficient architecture.
Full, LoRA — well-supported in Keras and JAX
OpenAI API
OpenAI. Frontier capability across all modalities. Fine-tuning available via API — no infrastructure management required.
API fine-tuning only — no weight access
Anthropic API
Anthropic. Best-in-class instruction following, safety, and long-context performance. Prompt-based customization only — no fine-tuning API.
No fine-tuning — prompt engineering and RAG only
Data requirements
The quality of your fine-tuned model is bounded by the quality of your data. Relevance matters more than volume.
Data quality pyramid
Relevance: data from your actual domain and use cases
Diversity: covers edge cases, formats, and variations
Quality: accurate, well-formatted, consistent examples
Quantity: enough examples to learn the patterns
Data preparation pipeline
Collection
Gather domain-specific instruction/response pairs from existing workflows, documentation, and expert annotations.
Cleaning
Remove duplicates, fix encoding, strip PII, normalize formatting. Bad data in → bad model out.
Formatting
Convert to the model's expected format — Alpaca, ShareGPT, or ChatML. Add system prompts and metadata.
Validation
Expert review of a random sample. Check for hallucinations, bias, formatting errors, and label quality.
Splitting
Train (80%) / validation (10%) / test (10%). Stratify by category to ensure balanced evaluation.
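The formatting and splitting stages above can be sketched in a few lines. This is an illustrative sketch, not a library API: `to_chatml` and `stratified_split` are hypothetical helpers, and the exact message schema your trainer expects may differ.

```python
import random
from collections import defaultdict

def to_chatml(example, system="You are a helpful domain assistant."):
    """Convert an instruction/response pair into the chat-style
    message list that ChatML-like formats expect."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]

def stratified_split(examples, key="category", seed=0):
    """80/10/10 split, stratified by category so every category
    is represented in train, validation, and test."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for ex in examples:
        buckets[ex[key]].append(ex)
    train, val, test = [], [], []
    for group in buckets.values():
        rng.shuffle(group)
        n = len(group)
        n_val, n_test = max(1, n // 10), max(1, n // 10)
        test.extend(group[:n_test])
        val.extend(group[n_test:n_test + n_val])
        train.extend(group[n_test + n_val:])
    return train, val, test
```

Stratifying per category (rather than shuffling globally) is what guarantees the balanced evaluation the pipeline calls for.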
Quality thresholds
When to fine-tune vs. other approaches
The right approach depends on your data, latency requirements, and how deeply you need to change model behavior.
| Scenario | Prompt Eng. | RAG | Fine-Tune | From Scratch |
|---|---|---|---|---|
| Need domain-specific vocabulary and jargon | — | — | ✓ | ✓ |
| Need real-time access to changing data | — | ✓ | — | — |
| Need specific output format or tone | — | — | ✓ | — |
| Need to reduce hallucinations with citations | — | ✓ | — | — |
| Need specialized reasoning patterns | — | — | ✓ | ✓ |
| Need to work within budget constraints | ✓ | ✓ | — | — |
| Need model to run on-device / at the edge | — | — | ✓ | — |
| Need to combine external knowledge + custom behavior | — | ✓ | ✓ | — |
Combine RAG + Fine-Tuning
Use RAG for real-time knowledge and fine-tuning for domain behavior — the most powerful pattern for enterprise AI.
Start with LoRA
For 90% of enterprise use cases, LoRA delivers the best ROI: fast training, tiny adapters, and production-ready results.
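The idea behind LoRA can be shown in a few lines of NumPy. This is a sketch of the math, not a training recipe: `lora_forward` and `lora_params` are illustrative names, and in practice you would use a library such as Hugging Face PEFT. The base weight `W` stays frozen; only the low-rank factors `A` and `B` are trained, which is why the adapters are tiny.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA forward pass: frozen base weight W plus a low-rank
    update (alpha / r) * B @ A, where r is the adapter rank.
    B is initialized to zero, so training starts from the base model."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

def lora_params(d, r):
    """Trainable parameters for a d x d layer at rank r:
    A is (r, d) and B is (d, r) — versus d * d for full fine-tuning."""
    return 2 * d * r
```

For a 1024x1024 layer at rank 8, the adapter is 16,384 parameters — under 2% of the 1,048,576 a full fine-tune would update, which is where the "tiny adapters" advantage comes from.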
Evaluate continuously
Benchmark on held-out data before and after. Track perplexity, task accuracy, and human preference scores.
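Of the metrics above, perplexity is the one you can compute directly from the model's token log-probabilities. A minimal sketch (the `perplexity` helper is illustrative, not a library function):

```python
import numpy as np

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).
    Lower is better: a model that assigns each token probability
    1/k has perplexity exactly k."""
    return float(np.exp(-np.mean(token_logprobs)))
```

Compute it on the same held-out test split before and after fine-tuning; a drop on in-domain text with no regression on general text is the signal you want.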
Train a model on your data.
Describe your domain and data — we'll recommend the right fine-tuning approach, base model, and training strategy.