
Fine-Tuning & Custom Models

Train AI models on your data. From LoRA adapters to full RLHF — domain-specific intelligence at any scale.

Approach Spectrum

Why fine-tune?

Fine-tuning sits at the sweet spot of the customization spectrum — more powerful than prompting, more practical than training from scratch.

Prompt Engineering

Best when you need quick results with off-the-shelf models and no training data.

Cost: Low
Latency: none added
Data needed: zero — just examples in the prompt

RAG

Best when your knowledge base changes frequently and you need grounded, citation-backed answers.

Cost: Medium
Latency: +50–200ms retrieval
Data needed: document corpus (any size)
Fine-Tuning (sweet spot)

Best when you need the model to internalize domain vocabulary, tone, reasoning patterns, or structured output formats.

Cost: Medium–High
Latency: none at inference
Data needed: 100–10,000 curated examples

Training from Scratch

Best when no existing model covers your domain and you have massive proprietary corpora (rare).

Cost: Very High
Latency: none at inference
Data needed: billions of tokens + months of compute
Techniques

Fine-tuning techniques

Six approaches to adapting a model — from full parameter updates to lightweight adapters and preference learning.

Full Fine-Tuning

Updates every parameter in the model. The original approach — maximum flexibility but maximum cost. All weights are unfrozen and trained on your dataset with a low learning rate.

How it works

1. Load the pre-trained model and unfreeze all layers
2. Prepare your dataset in instruction/completion format
3. Train with a low learning rate (1e-5 to 5e-5) to avoid catastrophic forgetting
4. Evaluate on a held-out validation set and checkpoint the best model

Best for: organizations with large GPU clusters and datasets of 50K+ examples that need deep behavioral changes.

Parameters: all (~7B–405B)
Memory: 2–4× model size in VRAM
Time: days to weeks

Advantages

Maximum control over model behavior
Can fundamentally shift model capabilities
Well-understood training dynamics

Trade-offs

Requires full model in GPU memory (e.g. 140GB for 70B)
Risk of catastrophic forgetting without careful tuning
Slow: days to weeks on multiple GPUs
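The 140GB figure is just the bf16 weights; gradients and optimizer state multiply it. A back-of-envelope estimator (a sketch, assuming bf16 weights and gradients plus two fp32 AdamW moments per parameter — real usage also depends on activations, sequence length, and sharding):

```python
# Rough VRAM estimate for full fine-tuning with AdamW (illustrative only).
def full_ft_vram_gb(n_params_b, bytes_weights=2, bytes_grads=2, bytes_optim=8):
    # bf16 weights (2B) + bf16 grads (2B) + two fp32 AdamW moments (8B) per param
    return n_params_b * (bytes_weights + bytes_grads + bytes_optim)

print(full_ft_vram_gb(70, bytes_grads=0, bytes_optim=0))  # 140 GB: weights alone for 70B
print(full_ft_vram_gb(70))                                # 840 GB with grads + optimizer state
```

This is why full fine-tuning a 70B model typically requires a multi-GPU cluster with sharded optimizer state, while LoRA fits on far less hardware.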
Base Models

Model landscape

Choosing the right base model determines your ceiling. Open-weight models offer full control; API models trade flexibility for convenience.

Llama (Meta)
Sizes: 8B / 70B / 405B
License: Meta Community License

Best open-weight all-rounder. Strong reasoning, multilingual, 128K context. The default choice for most fine-tuning projects.

Fine-tuning: Full, LoRA, QLoRA — full ecosystem support

Mistral / Mixtral (Mistral AI)
Sizes: 7B / 8×7B / 8×22B
License: Apache 2.0

Exceptional efficiency. The Mixtral MoE architecture activates only 2 of 8 experts per token — 70B-class quality at 13B inference cost.

Fine-tuning: Full, LoRA, QLoRA — strong community support
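The "2 of 8 experts" routing above can be sketched in a few lines — a toy numpy illustration with made-up shapes, not Mixtral's actual implementation:

```python
import numpy as np

# Toy sketch of Mixtral-style top-2 routing: a router scores all 8 experts,
# but only the best 2 actually run for each token (all shapes illustrative).
rng = np.random.default_rng(0)
d, n_experts = 16, 8
router = rng.normal(size=(d, n_experts))      # router projection
experts = rng.normal(size=(n_experts, d, d))  # one toy FFN matrix per expert

def moe_forward(x):
    logits = x @ router
    top2 = np.argsort(logits)[-2:]            # indices of the 2 best experts
    w = np.exp(logits[top2] - logits[top2].max())
    w /= w.sum()                              # softmax over the chosen pair
    # Only 2 of 8 expert FFNs execute -- the source of the low active-param cost
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top2))

y = moe_forward(rng.normal(size=d))           # output vector, shape (16,)
```

All experts' weights must still sit in memory; only the compute per token is reduced.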

Phi (Microsoft)
Sizes: 3.8B / 7B / 14B
License: MIT

Punches above its weight. Data-quality-focused training yields strong reasoning in a tiny model. Ideal for on-device and edge deployment.

Fine-tuning: Full, LoRA, QLoRA — excellent for constrained environments

Gemma (Google)
Sizes: 2B / 9B / 27B
License: Gemma Terms of Use

Strong instruction following and safety alignment out of the box. Good multilingual performance with an efficient architecture.

Fine-tuning: Full, LoRA — well-supported in Keras and JAX

OpenAI API
Sizes: API only (undisclosed)
License: Proprietary API

Frontier capability across all modalities. Fine-tuning is available via API — no infrastructure management required.

Fine-tuning: API fine-tuning only — no weight access

Anthropic API
Sizes: API only (undisclosed)
License: Proprietary API

Best-in-class instruction following, safety, and long-context performance. Prompt-based customization only — no fine-tuning API.

Fine-tuning: none — prompt engineering and RAG only

Data Quality

Data requirements

The quality of your fine-tuned model is bounded by the quality of your data. Relevance matters more than volume.

Data quality pyramid

Domain relevance (95% importance): data from your actual domain and use cases
Diversity (80% importance): covers edge cases, formats, and variations
Quality (65% importance): accurate, well-formatted, consistent examples
Quantity (45% importance): enough examples to learn patterns

LoRA / QLoRA: 100–10,000 curated examples
Full fine-tune: 10,000–100,000+ diverse examples
RLHF / DPO: 5,000–50,000 preference pairs
Distillation: 50,000–500,000+ teacher outputs

Data preparation pipeline

1. Collection: gather domain-specific instruction/response pairs from existing workflows, documentation, and expert annotations.
2. Cleaning: remove duplicates, fix encoding, strip PII, normalize formatting. Bad data in → bad model out.
3. Formatting: convert to the model's expected format — Alpaca, ShareGPT, or ChatML. Add system prompts and metadata.
4. Validation: expert review of a random sample. Check for hallucinations, bias, formatting errors, and label quality.
5. Splitting: train (80%) / validation (10%) / test (10%). Stratify by category to ensure balanced evaluation.
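The formatting and splitting steps can be sketched with the standard library alone — field names follow the common Alpaca convention, and the categories here are made up for illustration:

```python
import random
from collections import defaultdict

# Alpaca-style record: instruction + optional input context + expected output.
def to_alpaca(instruction, output, input_text=""):
    return {"instruction": instruction, "input": input_text, "output": output}

# Stratified 80/10/10 split: each category is shuffled and split separately,
# so every category is represented in validation and test.
def stratified_split(examples, seed=42):
    by_cat = defaultdict(list)
    for ex in examples:
        by_cat[ex["category"]].append(ex)
    train, val, test = [], [], []
    rng = random.Random(seed)
    for group in by_cat.values():
        rng.shuffle(group)
        n_train, n_val = int(len(group) * 0.8), int(len(group) * 0.1)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

data = [dict(to_alpaca(f"question {i}", f"answer {i}"), category=cat)
        for cat in ("legal", "billing") for i in range(10)]
train, val, test = stratified_split(data)   # 16 / 2 / 2 examples
```

A fixed seed keeps the split reproducible across runs, which matters when you compare checkpoints against the same held-out set.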

Quality thresholds

Accuracy: 95%
Consistency: 90%
Coverage: 85%
Decision Guide

When to fine-tune vs. other approaches

The right approach depends on your data, latency requirements, and how deeply you need to change model behavior.

Scenario → best-fit approach

Need domain-specific vocabulary and jargon → Fine-Tune
Need real-time access to changing data → RAG
Need specific output format or tone → Fine-Tune
Need to reduce hallucinations with citations → RAG
Need specialized reasoning patterns → Fine-Tune
Need to work within budget constraints → Prompt Engineering
Need model to run on-device / at the edge → Fine-Tune (small base model)
Need to combine external knowledge + custom behavior → RAG + Fine-Tune

Combine RAG + Fine-Tuning

Use RAG for real-time knowledge and fine-tuning for domain behavior — the most powerful pattern for enterprise AI.

Start with LoRA

For 90% of enterprise use cases, LoRA delivers the best ROI: fast training, tiny adapters, and production-ready results.

Evaluate continuously

Benchmark on held-out data before and after. Track perplexity, task accuracy, and human preference scores.
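Why LoRA delivers that ROI is easy to see in numbers. A minimal numpy sketch (toy shapes, not a real model): the frozen pretrained weight W gets a trainable low-rank update B @ A, scaled by alpha / r as in the LoRA paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    # B starts at zero, so the adapter changes nothing until it is trained
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # identical to the base model at init
# Trainable params: r * (d_in + d_out) = 8,192 vs 262,144 in W (about 3%).
```

Only A and B receive gradients, so optimizer state shrinks proportionally — and the adapter ships as a tiny file that can be merged into W or swapped per customer.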

Train a model on your data.

Describe your domain and data — we'll recommend the right fine-tuning approach, base model, and training strategy.