
Fine-tuning: Adapting an AI Model to Your Data

06. 02. 2026 · 5 min read · intermediate

Fine-tuning is an advanced technique that allows you to adapt a pre-trained AI model to your specific data and needs. Through this method, you can significantly improve model performance in a particular domain or task.

What is Fine-tuning and Why You Need It

Fine-tuning is the process of further training a pre-trained AI model on specific data, which allows you to adapt a general model to your specific needs. Instead of training a model from scratch, you utilize already learned knowledge and simply extend it with your domain.

Main advantages of fine-tuning:

  • Significantly lower computational requirements than training from scratch
  • Better results on specific tasks than general models
  • Ability to use smaller datasets (hundreds to thousands of examples)
  • Preservation of general knowledge from the base model

When to Use Fine-tuning vs Prompt Engineering

Fine-tuning isn’t always the best solution. The decision process should look like this:

Use prompt engineering when:

  • You need a quick solution without additional costs
  • You only have a few examples (less than 100)
  • The task is general and easy to describe in a prompt

Use fine-tuning when:

  • You have a specific domain with its own terminology
  • You need a consistent response format
  • You have a quality dataset (100+ examples)
  • You want to reduce latency and inference costs

Types of Fine-tuning

Full Fine-tuning

Updates all model parameters. Most effective, but also most demanding on computational resources and memory.

Parameter Efficient Fine-Tuning (PEFT)

A modern approach that updates only a small portion of the parameters. Main techniques:

  • LoRA (Low-Rank Adaptation): Adds small adaptation layers
  • QLoRA: LoRA with quantization for even lower memory requirements
  • Adapter layers: Inserts small layers between existing layers
  • Prefix tuning: Optimizes only special prefix tokens
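LoRA's savings follow directly from parameter counts: instead of updating a full d × k weight matrix, it trains two low-rank factors B (d × r) and A (r × k). A back-of-envelope sketch (the 1024 × 1024 shape is illustrative):

```python
def lora_trainable_params(d: int, k: int, r: int) -> tuple:
    """Compare a full d*k weight update with LoRA's two
    low-rank factors: B (d x r) and A (r x k)."""
    full = d * k          # parameters in a full update
    lora = r * (d + k)    # parameters in the LoRA factors
    return full, lora

# e.g. a 1024x1024 attention projection adapted with rank 16
full, lora = lora_trainable_params(1024, 1024, 16)
print(full, lora, lora / full)  # roughly 3% of the full update
```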

Practical Implementation with LoRA

Example of fine-tuning using Hugging Face Transformers and PEFT library:

from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset
import torch

# Load base model
model_name = "microsoft/DialoGPT-medium"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,  # rank - higher = more parameters
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"]  # which layers to modify
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

Training Data Preparation

def prepare_dataset(examples):
    """Prepare data for causal language modeling.

    With batched=True, `examples` is a dict of lists
    ({'question': [...], 'answer': [...]}), not a list of dicts.
    """
    # Format: "Question: {question} Answer: {answer}"
    texts = [
        f"Question: {q} Answer: {a}"
        for q, a in zip(examples["question"], examples["answer"])
    ]

    # Tokenization (return plain lists; `datasets` stores them itself)
    model_inputs = tokenizer(
        texts,
        truncation=True,
        padding=True,
        max_length=512,
    )

    # For causal LM, labels are a copy of input_ids
    model_inputs["labels"] = [ids.copy() for ids in model_inputs["input_ids"]]
    return model_inputs

# Sample data
train_data = [
    {"question": "How does Docker work?", "answer": "Docker is a containerization platform..."},
    {"question": "What is Kubernetes?", "answer": "Kubernetes is an orchestration system..."},
    # ... more examples
]

dataset = Dataset.from_list(train_data)
tokenized_dataset = dataset.map(
    prepare_dataset,
    batched=True,
    remove_columns=dataset.column_names,  # drop raw text columns the Trainer can't collate
)

Running Training

from transformers import Trainer, DataCollatorForLanguageModeling

# Training configuration
training_args = TrainingArguments(
    output_dir="./fine-tuned-model",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    warmup_steps=100,
    learning_rate=5e-5,
    logging_steps=10,
    save_steps=500,
    fp16=True,  # for memory savings
    # enable step-based evaluation only if you also pass
    # an eval_dataset to the Trainer below
)

# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # causal LM, not masked LM
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

# Start training
trainer.train()
trainer.save_model()

Monitoring and Evaluation

Proper monitoring of fine-tuning progress is crucial:

# Track metrics during training
import wandb

# Integration with Weights & Biases
wandb.init(project="fine-tuning-experiment")

# Evaluation function
def compute_perplexity(eval_dataset):
    """Calculate perplexity on evaluation dataset"""
    model.eval()
    total_loss = 0
    num_batches = 0

    with torch.no_grad():
        for batch in eval_dataset:
            # batch must be tokenized tensors on the model's device
            outputs = model(**batch)
            total_loss += outputs.loss.item()
            num_batches += 1

    avg_loss = total_loss / num_batches
    perplexity = torch.exp(torch.tensor(avg_loss))
    return perplexity.item()

# Test generated responses
def test_generation(prompt, max_length=100):
    """Test text generation"""
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=max_length,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

Optimization and Best Practices

Hyperparameter Selection

  • Learning rate: Start with 5e-5, try 1e-4 for smaller models
  • Batch size: As large as fits in memory
  • LoRA rank (r): 16-64 for most tasks, higher for more complex domains
  • Epochs: 2-5, watch for overfitting
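When memory caps the per-device batch size, gradient accumulation recovers a larger effective batch. A trivial helper makes the relationship explicit (the 4 × 4 setting matches the TrainingArguments above):

```python
def effective_batch_size(per_device: int, grad_accum: int,
                         num_devices: int = 1) -> int:
    """Effective batch size seen by the optimizer per step.

    When memory limits per_device, raise grad_accum instead:
    gradients are summed over grad_accum forward passes before
    each optimizer update, so the result is equivalent in expectation.
    """
    return per_device * grad_accum * num_devices

# per_device_train_batch_size=4, gradient_accumulation_steps=4
print(effective_batch_size(4, 4))  # 16 samples per optimizer step
```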

Data Quality

Data quality is critical for fine-tuning success:

# Dataset quality validation
def validate_dataset(dataset):
    """Check training data quality"""
    issues = []
    seen_questions = set()

    for i, example in enumerate(dataset):
        # Length check
        if len(example['question']) < 10:
            issues.append(f"Row {i}: Question too short")

        # Duplicate check
        if example['question'] in seen_questions:
            issues.append(f"Row {i}: Duplicate question")
        seen_questions.add(example['question'])

        # Format check
        if not example['answer'].strip():
            issues.append(f"Row {i}: Empty answer")

    return issues

# Data cleaning
def clean_dataset(examples):
    """Basic dataset cleaning"""
    cleaned = []

    for example in examples:
        # Remove extra whitespace
        question = example['question'].strip()
        answer = example['answer'].strip()

        # Filter by length
        if 10 <= len(question) <= 500 and 20 <= len(answer) <= 1000:
            cleaned.append({
                'question': question,
                'answer': answer
            })

    return cleaned

Deployment and Inference

Once fine-tuning completes, the model needs to be deployed efficiently:

from peft import PeftModel

# Load fine-tuned model for inference
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/DialoGPT-medium",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "./fine-tuned-model")

# Inference API
class FineTunedAPI:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def generate_response(self, question: str) -> str:
        """Generate response to question"""
        prompt = f"Question: {question} Answer:"

        inputs = self.tokenizer.encode(prompt, return_tensors="pt").to(self.model.device)

        with torch.no_grad():
            outputs = self.model.generate(
                inputs,
                max_new_tokens=200,
                temperature=0.7,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )

        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        # Extract only new part of response
        return response[len(prompt):].strip()

# Usage
api = FineTunedAPI(model, tokenizer)
answer = api.generate_response("How to optimize PostgreSQL database?")
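One subtlety in the API above: decoding can normalize whitespace, so slicing the output by len(prompt) may misalign. A more defensive extraction, sketched as a hypothetical helper for the same "Question: ... Answer:" format:

```python
def extract_answer(full_text: str, prompt: str) -> str:
    """Strip the prompt prefix from generated text.

    Falls back to splitting on 'Answer:' when the decoded text no
    longer starts with the exact prompt (tokenizer round-trips can
    normalize whitespace).
    """
    if full_text.startswith(prompt):
        return full_text[len(prompt):].strip()
    # fallback: keep everything after the last-resort marker
    _, _, tail = full_text.partition("Answer:")
    return tail.strip()

print(extract_answer("Question: What is LoRA? Answer: A PEFT method.",
                     "Question: What is LoRA? Answer:"))
```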

Summary

Fine-tuning is a powerful tool for adapting AI models to specific needs. The key is choosing correctly between prompt engineering and fine-tuning, using quality data, and applying suitable PEFT techniques such as LoRA. With the growing availability of tools like Hugging Face PEFT, fine-tuning is becoming accessible even to smaller teams. Remember thorough testing, metrics monitoring, and gradual improvement of data quality to achieve the best results.

fine-tuning · peft · llm

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience with enterprise IT.