Fine-tuning is an advanced technique that allows you to adapt a pre-trained AI model to your specific data and needs. Through this method, you can significantly improve model performance in a particular domain or task.
What is Fine-tuning and Why You Need It¶
Fine-tuning is the process of further training a pre-trained AI model on specific data, adapting a general-purpose model to your particular needs. Instead of training a model from scratch, you build on the knowledge it has already learned and extend it with your domain.
Main advantages of fine-tuning:
- Significantly lower computational requirements than training from scratch
- Better results on specific tasks than general models
- Ability to use smaller datasets (hundreds to thousands of examples)
- Preservation of general knowledge from the base model
When to Use Fine-tuning vs Prompt Engineering¶
Fine-tuning isn’t always the best solution. The decision process should look like this:
Use prompt engineering when:
- You need a quick solution without extra training costs
- You only have a few examples (fewer than 100)
- The task is general and easy to describe (see the few-shot sketch after these lists)
Use fine-tuning when:
- You have a specific domain with its own terminology
- You need consistent response format
- You have a quality dataset (100+ examples)
- You want to reduce latency and inference costs
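For a sense of what the prompt-engineering route looks like, here is a minimal few-shot prompt; the task and wording are purely illustrative:

# A few-shot prompt; for many general, easy-to-describe tasks
# this alone removes the need to fine-tune
prompt = """Classify the sentiment of each review as positive or negative.

Review: "Great battery life, fast shipping." -> positive
Review: "Broke after two days." -> negative
Review: "Exceeded my expectations." ->"""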
Types of Fine-tuning¶
Full Fine-tuning¶
Updates all model parameters. Most effective, but also most demanding on computational resources and memory.
Parameter Efficient Fine-Tuning (PEFT)¶
Modern approach that updates only a small portion of parameters. Main techniques:
- LoRA (Low-Rank Adaptation): Adds small adaptation layers
- QLoRA: LoRA with quantization for even lower memory requirements (see the loading sketch after this list)
- Adapter layers: Inserts small layers between existing layers
- Prefix tuning: Optimizes only special prefix tokens
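For illustration, a minimal QLoRA-style loading sketch; it assumes the bitsandbytes package is installed, and the LoRA adapter from the next section would then be applied on top of the quantized model:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantize the base model to 4-bit NF4 to cut memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/DialoGPT-medium",
    quantization_config=bnb_config,
    device_map="auto",
)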
Practical Implementation with LoRA¶
An example of fine-tuning using Hugging Face Transformers and the PEFT library:
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset
import torch

# Load base model
model_name = "microsoft/DialoGPT-medium"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,  # rank - higher = more trainable parameters
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"]  # which layers to modify
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
Training Data Preparation¶
def prepare_dataset(examples):
    """Prepare data for causal language modeling.

    With batched=True, `examples` is a dict of column lists,
    not a list of individual records.
    """
    # Format: "Question: {question} Answer: {answer}"
    texts = [
        f"Question: {q} Answer: {a}"
        for q, a in zip(examples["question"], examples["answer"])
    ]

    # Tokenization; dynamic padding is left to the data collator
    model_inputs = tokenizer(
        texts,
        truncation=True,
        max_length=512,
    )

    # For causal LM, labels are the same as input_ids
    # (the data collator below also builds them automatically)
    model_inputs["labels"] = [ids.copy() for ids in model_inputs["input_ids"]]
    return model_inputs

# Sample data
train_data = [
    {"question": "How does Docker work?", "answer": "Docker is a containerization platform..."},
    {"question": "What is Kubernetes?", "answer": "Kubernetes is an orchestration system..."},
    # ... more examples
]

dataset = Dataset.from_list(train_data)
tokenized_dataset = dataset.map(
    prepare_dataset,
    batched=True,
    remove_columns=["question", "answer"],  # keep only the tokenized columns
)
Running Training¶
from transformers import Trainer, DataCollatorForLanguageModeling

# Training configuration
training_args = TrainingArguments(
    output_dir="./fine-tuned-model",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size = 4 * 4 = 16
    warmup_steps=100,
    learning_rate=5e-5,
    logging_steps=10,
    save_steps=500,
    fp16=True,  # for memory savings
)
# Note: to evaluate during training, set evaluation_strategy="steps" and
# eval_steps here, and pass an eval_dataset to the Trainer below.

# Data collator for language modeling (pads batches and builds labels)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # causal LM, not masked LM
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

# Start training
trainer.train()
trainer.save_model()  # for a PEFT model this writes only the LoRA adapter weights
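If a long run is interrupted, training can resume from the most recent checkpoint written to output_dir:

# Resume from the last checkpoint instead of starting over
trainer.train(resume_from_checkpoint=True)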
Monitoring and Evaluation¶
Proper monitoring of fine-tuning progress is crucial:
# Track metrics during training
import wandb

# Integration with Weights & Biases
# (set report_to="wandb" in TrainingArguments to log Trainer metrics)
wandb.init(project="fine-tuning-experiment")

# Evaluation function
def compute_perplexity(eval_dataloader):
    """Calculate perplexity on an evaluation set.

    Expects a DataLoader yielding batches of tensors
    already placed on the model's device.
    """
    model.eval()
    total_loss = 0.0
    num_batches = 0
    with torch.no_grad():
        for batch in eval_dataloader:
            outputs = model(**batch)
            total_loss += outputs.loss.item()
            num_batches += 1
    avg_loss = total_loss / num_batches
    perplexity = torch.exp(torch.tensor(avg_loss))
    return perplexity.item()

# Test generated responses
def test_generation(prompt, max_length=100):
    """Test text generation"""
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=max_length,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
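A quick smoke test of the tuned model; the prompt follows the same "Question: ... Answer:" format used in the training data:

print(test_generation("Question: How does Docker work? Answer:"))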
Optimization and Best Practices¶
Hyperparameter Selection¶
- Learning rate: Start with 5e-5, try 1e-4 for smaller models
- Batch size: As large as fits in memory
- LoRA rank (r): 16-64 for most tasks, higher for more complex domains
- Epochs: 2-5, watch for overfitting (see the early-stopping sketch below)
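One practical way to watch for overfitting is early stopping on a held-out split. A minimal sketch, assuming train_split and eval_split are tokenized datasets you have set aside:

from transformers import EarlyStoppingCallback

args = TrainingArguments(
    output_dir="./fine-tuned-model",
    num_train_epochs=5,  # upper bound; early stopping may finish sooner
    evaluation_strategy="steps",
    eval_steps=200,
    save_steps=200,
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_split,  # assumed: tokenized training split
    eval_dataset=eval_split,    # assumed: tokenized held-out split
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)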
Data Quality¶
Data quality is critical for fine-tuning success:
# Dataset quality validation
def validate_dataset(dataset):
    """Check training data quality"""
    issues = []
    seen_questions = set()
    for i, example in enumerate(dataset):
        # Length check
        if len(example['question']) < 10:
            issues.append(f"Row {i}: Question too short")
        # Duplicate check
        if example['question'] in seen_questions:
            issues.append(f"Row {i}: Duplicate question")
        seen_questions.add(example['question'])
        # Format check
        if not example['answer'].strip():
            issues.append(f"Row {i}: Empty answer")
    return issues

# Data cleaning
def clean_dataset(examples):
    """Basic dataset cleaning"""
    cleaned = []
    for example in examples:
        # Remove extra whitespace
        question = example['question'].strip()
        answer = example['answer'].strip()
        # Filter by length
        if 10 <= len(question) <= 500 and 20 <= len(answer) <= 1000:
            cleaned.append({
                'question': question,
                'answer': answer
            })
    return cleaned
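Putting the two helpers together before tokenization:

# Clean first, then report anything that still looks wrong
train_data = clean_dataset(train_data)
for issue in validate_dataset(train_data):
    print(issue)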
Deployment and Inference¶
Once fine-tuning is complete, the model needs to be deployed efficiently:
from peft import PeftModel

# Load the fine-tuned model for inference
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/DialoGPT-medium",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "./fine-tuned-model")

# Inference API
class FineTunedAPI:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def generate_response(self, question: str) -> str:
        """Generate a response to a question"""
        prompt = f"Question: {question} Answer:"
        inputs = self.tokenizer.encode(prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            outputs = self.model.generate(
                inputs,
                max_new_tokens=200,
                temperature=0.7,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        # Keep only the newly generated part after the prompt
        return response[len(prompt):].strip()

# Usage
api = FineTunedAPI(model, tokenizer)
answer = api.generate_response("How to optimize PostgreSQL database?")
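If you no longer need to swap adapters at runtime, the LoRA weights can be merged into the base model to remove the adapter overhead at inference time (merge_and_unload is part of the peft library):

# Fold the adapter weights into the base model and save a standalone copy
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")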
Summary¶
Fine-tuning is a powerful tool for adapting AI models to specific needs. The key is to choose correctly between prompt engineering and fine-tuning, to use quality data, and to apply appropriate PEFT techniques such as LoRA. With tools like Hugging Face PEFT becoming more widely available, fine-tuning is increasingly accessible even to smaller teams. Test thoroughly, monitor your metrics, and keep improving your data quality to achieve the best results.