
Transfer Learning — Leveraging Pre-trained Models

10. 08. 2024 · 4 min read · intermediate

Transfer Learning is a technique that allows you to leverage knowledge learned on one task to solve another, similar problem. Instead of training a model from scratch, you can use pre-trained models and adapt them to your specific needs.

What is Transfer Learning

Transfer Learning is one of the most effective techniques in modern machine learning. Instead of training a model from scratch, we reuse knowledge already learned on large datasets and adapt it to our specific problem. This approach saves time and computational resources, and often achieves better results than a model trained only on the target data.

The basic idea is simple: a model that has learned to recognize general patterns in data (such as edges, textures, or linguistic structures) can apply this knowledge to related tasks. We then only need to “fine-tune” the last layers for our specific domain.

Types of Transfer Learning

We distinguish several main approaches:

  • Feature Extraction - freeze the weights of the pre-trained model and use it as a feature extractor
  • Fine-tuning - gradually unfreeze and retrain some layers on our data
  • Domain Adaptation - adapt the model to a new type of data (e.g., from photographs to drawings); a minimal sketch follows below
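
One simple Domain Adaptation variant can be sketched as follows: keep all learned weights frozen and only re-estimate the BatchNorm statistics on data from the new domain. This is a minimal, hedged sketch; target_loader is an assumed DataLoader over target-domain images, not something defined elsewhere in this article.

# Minimal domain-adaptation sketch: re-estimate BatchNorm statistics on the target domain
import torch
import torchvision.models as models

def adapt_batchnorm_stats(model, target_loader, device="cpu"):
    model.to(device)
    for param in model.parameters():
        param.requires_grad = False  # learned weights stay frozen

    model.train()  # train mode so BatchNorm updates its running mean/variance
    with torch.no_grad():
        for images, _ in target_loader:  # labels (if any) are not used
            model(images.to(device))

    model.eval()
    return model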

Feature Extraction in Practice

The simplest approach uses a pre-trained model as a black box for feature extraction:

import torch
import torchvision.models as models
from torch import nn

# Load pre-trained ResNet
base_model = models.resnet50(pretrained=True)

# Freeze all parameters
for param in base_model.parameters():
    param.requires_grad = False

# Replace classifier for our task (e.g., 10 classes)
base_model.fc = nn.Linear(base_model.fc.in_features, 10)

# Only the new layer will be trained
optimizer = torch.optim.Adam(base_model.fc.parameters(), lr=0.001)
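
With the backbone frozen, only the new classification layer receives gradient updates. A minimal training-loop sketch for this setup might look like the following; train_loader and the loss choice are assumptions for illustration:

# Minimal training-loop sketch (train_loader is an assumed DataLoader of images and labels)
criterion = nn.CrossEntropyLoss()

base_model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    outputs = base_model(images)
    loss = criterion(outputs, labels)
    loss.backward()   # gradients are computed only for the new fc layer
    optimizer.step()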

Gradual Fine-tuning

A more sophisticated approach gradually “unfreezes” layers for retraining:

class TransferModel(nn.Module):
    def __init__(self, num_classes, freeze_layers=True):
        super().__init__()
        self.backbone = models.resnet50(pretrained=True)

        if freeze_layers:
            # Freeze first layers
            for param in self.backbone.layer1.parameters():
                param.requires_grad = False
            for param in self.backbone.layer2.parameters():
                param.requires_grad = False

        # Modify classifier
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def unfreeze_layers(self, layer_names):
        """Gradual layer unfreezing"""
        for name in layer_names:
            layer = getattr(self.backbone, name)
            for param in layer.parameters():
                param.requires_grad = True

model = TransferModel(num_classes=10)

# After several epochs, we can unfreeze additional layers
model.unfreeze_layers(['layer2', 'layer3'])
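
One practical detail: an optimizer only updates the parameters it was given. A simple option (a sketch, not the only way) is to rebuild the optimizer after unfreezing so it covers everything that currently requires gradients:

# Rebuild the optimizer after unfreezing so the newly trainable layers are included
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-4  # typically lower once pre-trained layers are being updated
)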

Transfer Learning for NLP

In the field of natural language processing, transfer learning is even more important. Models like BERT, GPT, or RoBERTa are trained on massive text corpora and can capture complex linguistic patterns.

Fine-tuning BERT for Classification

from transformers import BertForSequenceClassification, BertTokenizer
from transformers import TrainingArguments, Trainer

# Load pre-trained BERT
model = BertForSequenceClassification.from_pretrained(
    'bert-base-multilingual-cased',
    num_labels=3  # E.g., sentiment: positive, negative, neutral
)

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')

# Data preparation
def tokenize_function(examples):
    return tokenizer(
        examples['text'], 
        truncation=True, 
        padding=True, 
        max_length=512
    )

# train_dataset and eval_dataset are assumed to be Hugging Face Dataset objects
# with a 'text' column and a 'label' column
train_dataset = train_dataset.map(tokenize_function, batched=True)
eval_dataset = eval_dataset.map(tokenize_function, batched=True)

# Training setup
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    warmup_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
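
After training, the fine-tuned model can be used for prediction on new text. A short inference sketch; the label names are illustrative and only assume the num_labels=3 setup above:

# Inference with the fine-tuned model (label names are illustrative)
import torch

text = "The service was excellent, I will definitely come back."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

model.eval()
with torch.no_grad():
    logits = model(**inputs).logits

labels = ["positive", "negative", "neutral"]
print(labels[logits.argmax(dim=-1).item()])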

Best Practices

Learning Rate Selection

When fine-tuning, it’s crucial to properly set the learning rate. Generally:

  • For new layers: higher learning rate (1e-3 to 1e-4)
  • For pre-trained layers: lower learning rate (1e-5 to 1e-6)
  • Gradually reduce the learning rate as training progresses

# Differentiated learning rates for different parts of the model
def get_optimizer_grouped_parameters(model, backbone_lr=1e-5, head_lr=1e-3):
    no_decay = ["bias", "LayerNorm.weight"]

    # Backbone parameters, excluding the newly added classification head ("fc")
    backbone_params = [(n, p) for n, p in model.backbone.named_parameters()
                       if not n.startswith("fc")]

    optimizer_grouped_parameters = [
        {
            "params": [p for n, p in backbone_params
                       if not any(nd in n for nd in no_decay)],
            "weight_decay": 0.01,
            "lr": backbone_lr
        },
        {
            "params": [p for n, p in backbone_params
                       if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
            "lr": backbone_lr
        },
        {
            # The new classification head gets a higher learning rate
            "params": model.backbone.fc.parameters(),
            "lr": head_lr
        }
    ]

    return torch.optim.AdamW(optimizer_grouped_parameters)

Data Augmentation and Regularization

With smaller datasets, it’s important to prevent overfitting:

import torchvision.transforms as transforms

# Augmentation for computer vision
transform = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),  # convert the PIL image to a tensor before normalization
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                        std=[0.229, 0.224, 0.225])
])

# Dropout in custom layers
class FineTunedModel(nn.Module):
    def __init__(self, base_model, num_classes):
        super().__init__()
        in_features = base_model.fc.in_features
        base_model.fc = nn.Identity()  # drop the original head, keep pooled features
        self.backbone = base_model
        self.dropout = nn.Dropout(0.3)
        self.classifier = nn.Linear(in_features, num_classes)

    def forward(self, x):
        features = self.backbone(x)  # ResNet already pools and flattens the features
        dropped = self.dropout(features)
        return self.classifier(dropped)

Practical Tips for Successful Transfer

Domain similarity: The more similar the source and target domains are, the better results we can expect. A model trained on general photographs will adapt better to medical images than to satellite data.

Dataset size: For small datasets (hundreds of samples), feature extraction is a safer choice. For larger datasets (thousands of samples), we can experiment with fine-tuning.
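
As a rough illustration of that rule of thumb (the 1000-sample threshold is an assumption, not a fixed rule), the decision could be expressed as:

# Rough heuristic for choosing a strategy; the threshold is an assumption
def choose_transfer_strategy(num_training_samples):
    if num_training_samples < 1000:
        return "feature_extraction"  # freeze the backbone, train only the head
    return "fine_tuning"             # also unfreeze (part of) the backbone

print(choose_transfer_strategy(500))    # feature_extraction
print(choose_transfer_strategy(5000))   # fine_tuning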

Gradual unfreezing: Instead of unfreezing all layers at once, we gradually unfreeze them from the top (last) layers toward the bottom (early) layers:

def gradual_unfreeze_schedule(model, optimizer, epoch):
    """Gradually unfreeze layers as training progresses"""
    if epoch >= 5:
        # From epoch 5, unfreeze the top layers
        for param in model.backbone.layer4.parameters():
            param.requires_grad = True

    if epoch >= 10:
        # From epoch 10, unfreeze additional layers
        for param in model.backbone.layer3.parameters():
            param.requires_grad = True

    # The learning rate is also gradually reduced
    if epoch >= 5:
        for param_group in optimizer.param_groups:
            param_group['lr'] *= 0.5
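
In practice the schedule is called once per epoch. Note that this only works if the optimizer was created over all model parameters up front; frozen parameters are simply skipped until their requires_grad flag is switched on. A brief usage sketch (num_epochs and the training step itself are assumed):

# Create the optimizer over all parameters so unfrozen layers start updating immediately
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(num_epochs):
    gradual_unfreeze_schedule(model, optimizer, epoch)
    # ... one epoch of training over the dataloader ...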

Summary

Transfer Learning represents a fundamental shift in how we approach machine learning. Instead of training models from scratch, we leverage the collective “knowledge” stored in pre-trained models. The key to success is choosing the right strategy (feature extraction vs. fine-tuning), carefully setting learning rates for different parts of the model, and taking a gradual approach to unfreezing layers. With the growing availability of large pre-trained foundation models, transfer learning is becoming an even more important tool for efficient AI application development.

transfer learning, pre-training, fine-tuning

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.