Fine-tuning is an advanced technique that allows you to adapt a pre-trained AI model to your specific data and needs. Through this method, you can significantly improve model performance in a particular domain or task.
What is Fine-tuning and Why You Need It¶
Fine-tuning is the process of further training a pre-trained AI model on specific data, adapting a general-purpose model to your particular needs. Instead of training a model from scratch, you build on the knowledge it has already learned and extend it with your domain.
Main advantages of fine-tuning:
- Significantly lower computational requirements than training from scratch
- Better results on specific tasks than general models
- Ability to use smaller datasets (hundreds to thousands of examples)
- Preservation of general knowledge from the base model
When to Use Fine-tuning vs Prompt Engineering¶
Fine-tuning isn’t always the best solution. The decision process should look like this:
Use prompt engineering when:
- You need a quick solution without extra training costs
- You only have a few examples (fewer than 100)
- The task is general and easy to describe (see the few-shot sketch after these lists)
Use fine-tuning when:
- You have a specific domain with its own terminology
- You need consistent response format
- You have a quality dataset (100+ examples)
- You want to reduce latency and inference costs
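For a sense of what the prompt-engineering route looks like, here is a minimal few-shot prompt; the task and wording are purely illustrative:

# A few-shot prompt; for many general, easy-to-describe tasks
# this alone removes the need to fine-tune
prompt = """Classify the sentiment of each review as positive or negative.

Review: "Great battery life, fast shipping." -> positive
Review: "Broke after two days." -> negative
Review: "Exceeded my expectations." ->"""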
Types of Fine-tuning¶
Full Fine-tuning¶
Updates all model parameters. Most effective, but also most demanding on computational resources and memory.
Parameter Efficient Fine-Tuning (PEFT)¶
Modern approach that updates only a small portion of parameters. Main techniques:
- LoRA (Low-Rank Adaptation): Adds small adaptation layers
- QLoRA: LoRA with quantization for even lower memory requirements (see the loading sketch after this list)
- Adapter layers: Inserts small layers between existing layers
- Prefix tuning: Optimizes only special prefix tokens
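For illustration, a minimal QLoRA-style loading sketch; it assumes the bitsandbytes package is installed, and the LoRA adapter from the next section would then be applied on top of the quantized model:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantize the base model to 4-bit NF4 to cut memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/DialoGPT-medium",
    quantization_config=bnb_config,
    device_map="auto",
)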
Practical Implementation with LoRA¶
An example of fine-tuning using Hugging Face Transformers and the PEFT library:
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset
import torch

# Load base model
model_name = "microsoft/DialoGPT-medium"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,  # rank - higher = more trainable parameters
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"]  # which layers to modify
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
Training Data Preparation¶
def prepare_dataset(examples):
    """Prepare data for causal language modeling.

    With batched=True, `examples` is a dict of column lists,
    not a list of individual records.
    """
    # Format: "Question: {question} Answer: {answer}"
    texts = [
        f"Question: {q} Answer: {a}"
        for q, a in zip(examples["question"], examples["answer"])
    ]

    # Tokenization; dynamic padding is left to the data collator
    model_inputs = tokenizer(
        texts,
        truncation=True,
        max_length=512,
    )

    # For causal LM, labels are the same as input_ids
    # (the data collator below also builds them automatically)
    model_inputs["labels"] = [ids.copy() for ids in model_inputs["input_ids"]]
    return model_inputs

# Sample data
train_data = [
    {"question": "How does Docker work?", "answer": "Docker is a containerization platform..."},
    {"question": "What is Kubernetes?", "answer": "Kubernetes is an orchestration system..."},
    # ... more examples
]

dataset = Dataset.from_list(train_data)
tokenized_dataset = dataset.map(
    prepare_dataset,
    batched=True,
    remove_columns=["question", "answer"],  # keep only the tokenized columns
)
Running Training¶
from transformers import Trainer, DataCollatorForLanguageModeling

# Training configuration
training_args = TrainingArguments(
    output_dir="./fine-tuned-model",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size = 4 * 4 = 16
    warmup_steps=100,
    learning_rate=5e-5,
    logging_steps=10,
    save_steps=500,
    fp16=True,  # for memory savings
)
# Note: to evaluate during training, set evaluation_strategy="steps" and
# eval_steps here, and pass an eval_dataset to the Trainer below.

# Data collator for language modeling (pads batches and builds labels)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # causal LM, not masked LM
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

# Start training
trainer.train()
trainer.save_model()  # for a PEFT model this writes only the LoRA adapter weights
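If a long run is interrupted, training can resume from the most recent checkpoint written to output_dir:

# Resume from the last checkpoint instead of starting over
trainer.train(resume_from_checkpoint=True)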
Monitoring and Evaluation¶
Proper monitoring of fine-tuning progress is crucial:
# Track metrics during training
import wandb

# Integration with Weights & Biases
# (set report_to="wandb" in TrainingArguments to log Trainer metrics)
wandb.init(project="fine-tuning-experiment")

# Evaluation function
def compute_perplexity(eval_dataloader):
    """Calculate perplexity on an evaluation set.

    Expects a DataLoader yielding batches of tensors
    already placed on the model's device.
    """
    model.eval()
    total_loss = 0.0
    num_batches = 0
    with torch.no_grad():
        for batch in eval_dataloader:
            outputs = model(**batch)
            total_loss += outputs.loss.item()
            num_batches += 1
    avg_loss = total_loss / num_batches
    perplexity = torch.exp(torch.tensor(avg_loss))
    return perplexity.item()

# Test generated responses
def test_generation(prompt, max_length=100):
    """Test text generation"""
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=max_length,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
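A quick smoke test of the tuned model; the prompt follows the same "Question: ... Answer:" format used in the training data:

print(test_generation("Question: How does Docker work? Answer:"))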
Optimization and Best Practices¶
Hyperparameter Selection¶
- Learning rate: Start with 5e-5, try 1e-4 for smaller models
- Batch size: As large as fits in memory
- LoRA rank (r): 16-64 for most tasks, higher for more complex domains
- Epochs: 2-5, watch for overfitting (see the early-stopping sketch below)
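One practical way to watch for overfitting is early stopping on a held-out split. A minimal sketch, assuming train_split and eval_split are tokenized datasets you have set aside:

from transformers import EarlyStoppingCallback

args = TrainingArguments(
    output_dir="./fine-tuned-model",
    num_train_epochs=5,  # upper bound; early stopping may finish sooner
    evaluation_strategy="steps",
    eval_steps=200,
    save_steps=200,
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_split,  # assumed: tokenized training split
    eval_dataset=eval_split,    # assumed: tokenized held-out split
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)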
Data Quality¶
Data quality is critical for fine-tuning success:
# Dataset quality validation
def validate_dataset(dataset):
    """Check training data quality"""
    issues = []
    seen_questions = set()
    for i, example in enumerate(dataset):
        # Length check
        if len(example['question']) < 10:
            issues.append(f"Row {i}: Question too short")
        # Duplicate check
        if example['question'] in seen_questions:
            issues.append(f"Row {i}: Duplicate question")
        seen_questions.add(example['question'])
        # Format check
        if not example['answer'].strip():
            issues.append(f"Row {i}: Empty answer")
    return issues

# Data cleaning
def clean_dataset(examples):
    """Basic dataset cleaning"""
    cleaned = []
    for example in examples:
        # Remove extra whitespace
        question = example['question'].strip()
        answer = example['answer'].strip()
        # Filter by length
        if 10 <= len(question) <= 500 and 20 <= len(answer) <= 1000:
            cleaned.append({
                'question': question,
                'answer': answer
            })
    return cleaned
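Putting the two helpers together before tokenization:

# Clean first, then report anything that still looks wrong
train_data = clean_dataset(train_data)
for issue in validate_dataset(train_data):
    print(issue)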
Deployment and Inference¶
Once fine-tuning is complete, the model needs to be deployed efficiently:
from peft import PeftModel

# Load the fine-tuned model for inference
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/DialoGPT-medium",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "./fine-tuned-model")

# Inference API
class FineTunedAPI:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def generate_response(self, question: str) -> str:
        """Generate a response to a question"""
        prompt = f"Question: {question} Answer:"
        inputs = self.tokenizer.encode(prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            outputs = self.model.generate(
                inputs,
                max_new_tokens=200,
                temperature=0.7,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        # Keep only the newly generated part after the prompt
        return response[len(prompt):].strip()

# Usage
api = FineTunedAPI(model, tokenizer)
answer = api.generate_response("How to optimize PostgreSQL database?")
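If you no longer need to swap adapters at runtime, the LoRA weights can be merged into the base model to remove the adapter overhead at inference time (merge_and_unload is part of the peft library):

# Fold the adapter weights into the base model and save a standalone copy
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")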
Summary¶
Fine-tuning is a powerful tool for adapting AI models to specific needs. The key is to choose correctly between prompt engineering and fine-tuning, to use quality data, and to apply appropriate PEFT techniques such as LoRA. With tools like Hugging Face PEFT becoming more widely available, fine-tuning is increasingly accessible even to smaller teams. Test thoroughly, monitor your metrics, and keep improving your data quality to achieve the best results.