
Loss Functions: An Overview and How to Choose One for Your Model

22. 07. 2024 · 4 min read · intermediate

Loss functions are a key element of training machine learning models: they determine how a model measures its errors and learns to correct them. The right choice of loss function can dramatically affect your model's performance. We'll go through the most commonly used types and show when to use which one.

What Loss Functions Are and Why They Matter

Loss functions represent the heart of every machine learning model. They define how we measure the “distance” between predicted and actual values, thereby directly affecting how the model learns. The right choice of loss function can mean the difference between a model that works excellently and one that never converges.

Essentially, a loss function is a mathematical formulation of what we consider an "error". During training, the model minimizes this error using optimization algorithms such as SGD or Adam. Different types of problems require different loss functions: what works for classification may not be suitable for regression.
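To make the optimization step concrete, here is a minimal sketch of one training loop: a toy linear model fitted with MSE and SGD. The data, learning rate, and step count are made up purely for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible initialization

# Toy linear model and data (illustrative values only)
model = nn.Linear(1, 1)
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])  # target relationship: y = 2x

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(200):
    optimizer.zero_grad()          # reset gradients from the previous step
    loss = loss_fn(model(x), y)    # measure the current error
    loss.backward()                # backpropagate the error
    optimizer.step()               # update weights to reduce the loss
```

Every loss function discussed below plugs into this same loop; only the `loss_fn` line changes.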

Regression Loss Functions

Mean Squared Error (MSE)

MSE is the most popular loss function for regression tasks. It calculates the average of squared differences between predicted and actual values:

import torch
import torch.nn as nn

# PyTorch implementation
mse_loss = nn.MSELoss()
predictions = torch.tensor([2.5, 0.0, 2.1])
targets = torch.tensor([3.0, -0.5, 2.0])
loss = mse_loss(predictions, targets)
print(f"MSE Loss: {loss.item()}")

# Manual implementation
def mse_manual(y_pred, y_true):
    return torch.mean((y_pred - y_true) ** 2)

MSE Advantages: Simple to implement, strongly penalizes large errors due to squaring. Disadvantages: Sensitive to outliers, which can significantly distort training.

Mean Absolute Error (MAE)

MAE calculates the average of absolute values of differences. It’s more robust to outliers than MSE:

mae_loss = nn.L1Loss()  # L1Loss = MAE in PyTorch
loss = mae_loss(predictions, targets)

# Manual implementation
def mae_manual(y_pred, y_true):
    return torch.mean(torch.abs(y_pred - y_true))
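To see the robustness difference in practice, compare the two losses on data containing a single outlier. The numbers below are made up for illustration: one target is far off, and the squared term makes it dominate MSE while MAE grows only linearly.

```python
import torch
import torch.nn as nn

preds = torch.tensor([1.0, 2.0, 3.0, 4.0])
targets = torch.tensor([1.1, 2.1, 2.9, 14.0])  # last target is an outlier

mse = nn.MSELoss()(preds, targets)
mae = nn.L1Loss()(preds, targets)

# The outlier's error of 10 becomes 100 after squaring, dominating MSE,
# while it contributes only 10 to the MAE sum.
print(f"MSE: {mse.item():.2f}, MAE: {mae.item():.2f}")
```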

Huber Loss

Huber Loss combines the advantages of MSE and MAE. It behaves like MSE for small errors and like MAE for large errors:

huber_loss = nn.HuberLoss(delta=1.0)
loss = huber_loss(predictions, targets)

# Manual implementation
def huber_loss_manual(y_pred, y_true, delta=1.0):
    error = torch.abs(y_pred - y_true)
    is_small_error = error <= delta
    squared_loss = 0.5 * error ** 2
    linear_loss = delta * error - 0.5 * delta ** 2
    return torch.mean(torch.where(is_small_error, squared_loss, linear_loss))
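As a quick sanity check, the manual version should match PyTorch's built-in `nn.HuberLoss`. The extra data point below is chosen so that both the quadratic and the linear branch are exercised:

```python
import torch
import torch.nn as nn

def huber_loss_manual(y_pred, y_true, delta=1.0):
    error = torch.abs(y_pred - y_true)
    is_small_error = error <= delta
    squared_loss = 0.5 * error ** 2
    linear_loss = delta * error - 0.5 * delta ** 2
    return torch.mean(torch.where(is_small_error, squared_loss, linear_loss))

predictions = torch.tensor([2.5, 0.0, 2.1, 8.0])
targets = torch.tensor([3.0, -0.5, 2.0, 3.0])  # last pair (error 5.0) hits the linear branch

builtin = nn.HuberLoss(delta=1.0)(predictions, targets)
manual = huber_loss_manual(predictions, targets)
assert torch.allclose(builtin, manual)
```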

Classification Loss Functions

Cross-Entropy Loss

Cross-entropy is the standard choice for classification tasks. It measures the “distance” between probability distributions:

# Binary classification
binary_ce = nn.BCELoss()
sigmoid_output = torch.sigmoid(torch.tensor([0.8, -1.2, 2.1]))
binary_targets = torch.tensor([1.0, 0.0, 1.0])
loss = binary_ce(sigmoid_output, binary_targets)

# Multi-class classification
ce_loss = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.0, 0.2]])
targets = torch.tensor([0, 1])  # Class indices
loss = ce_loss(logits, targets)
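For binary classification in practice, `nn.BCEWithLogitsLoss` is usually preferred over a separate sigmoid plus `BCELoss`: it fuses the sigmoid into the loss using the log-sum-exp trick, which is numerically more stable for extreme logits. A small sketch of the equivalence:

```python
import torch
import torch.nn as nn

logits = torch.tensor([0.8, -1.2, 2.1])
binary_targets = torch.tensor([1.0, 0.0, 1.0])

# Sigmoid + BCE fused into one numerically stable operation
bce_logits = nn.BCEWithLogitsLoss()
loss = bce_logits(logits, binary_targets)

# Mathematically equivalent to applying sigmoid first, then BCELoss
reference = nn.BCELoss()(torch.sigmoid(logits), binary_targets)
assert torch.allclose(loss, reference, atol=1e-6)
```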

Cross-entropy has an important property: its gradients are large when the model's predictions are badly wrong and shrink as the predictions approach the targets. This yields fast initial progress and stable convergence near the optimum.

Focal Loss

Focal Loss addresses the problem of imbalanced datasets by reducing the weight of “easy” examples and focusing on difficult cases:

class FocalLoss(nn.Module):
    def __init__(self, alpha=1, gamma=2):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, inputs, targets):
        ce_loss = nn.functional.cross_entropy(inputs, targets, reduction='none')
        pt = torch.exp(-ce_loss)
        focal_loss = self.alpha * (1-pt)**self.gamma * ce_loss
        return focal_loss.mean()

# Usage
focal_loss = FocalLoss(alpha=1, gamma=2)
loss = focal_loss(logits, targets)
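The down-weighting effect is easy to verify with a functional variant of the same formula. On a confidently correct ("easy") example, the `(1 - pt)^gamma` factor shrinks the loss to a tiny fraction of plain cross-entropy, while on an uncertain ("hard") example the two losses stay comparable. The logits below are made up for illustration:

```python
import torch
import torch.nn.functional as F

def focal_loss(inputs, targets, alpha=1.0, gamma=2.0):
    ce = F.cross_entropy(inputs, targets, reduction='none')
    pt = torch.exp(-ce)                         # probability of the true class
    return (alpha * (1 - pt) ** gamma * ce).mean()

easy = torch.tensor([[5.0, 0.0, 0.0]])   # model is confident and correct
hard = torch.tensor([[0.5, 0.3, 0.2]])   # model is unsure
target = torch.tensor([0])

# Easy example: focal loss is a tiny fraction of cross-entropy.
# Hard example: focal loss remains a sizeable fraction of it.
print(focal_loss(easy, target).item(), F.cross_entropy(easy, target).item())
print(focal_loss(hard, target).item(), F.cross_entropy(hard, target).item())
```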

Advanced Loss Functions

Contrastive Loss

Used in metric learning, where we want to teach the model to recognize similarity between objects:

def contrastive_loss(output1, output2, label, margin=1.0):
    euclidean_distance = nn.functional.pairwise_distance(output1, output2)
    loss_contrastive = torch.mean(
        (1-label) * torch.pow(euclidean_distance, 2) +
        label * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2)
    )
    return loss_contrastive
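A hypothetical usage with a single pair of 2D embeddings, following the labeling convention in the code above (0 = similar pair, 1 = dissimilar pair): a close pair is cheap when labeled similar but expensive when labeled dissimilar, because it sits well inside the margin.

```python
import torch
import torch.nn as nn

def contrastive_loss(output1, output2, label, margin=1.0):
    euclidean_distance = nn.functional.pairwise_distance(output1, output2)
    return torch.mean(
        (1 - label) * torch.pow(euclidean_distance, 2) +
        label * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2)
    )

emb_a = torch.tensor([[1.0, 0.0]])
emb_b = torch.tensor([[1.0, 0.1]])   # embedding close to emb_a

similar = torch.tensor([0.0])        # similar pair: any distance is penalized
dissimilar = torch.tensor([1.0])     # dissimilar pair: distance below the margin is penalized

print(contrastive_loss(emb_a, emb_b, similar).item())      # small: ~0.1^2
print(contrastive_loss(emb_a, emb_b, dissimilar).item())   # larger: ~(1 - 0.1)^2
```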

Dice Loss

Popular in segmentation tasks, especially in medical imaging:

def dice_loss(inputs, targets, smooth=1):
    inputs = torch.sigmoid(inputs)
    inputs = inputs.view(-1)
    targets = targets.view(-1)

    intersection = (inputs * targets).sum()
    dice = (2. * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)

    return 1 - dice
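A quick check on a toy 2x2 mask (illustrative values): a prediction matching the mask gives a loss near 0, while an inverted prediction gives a much higher loss. Note that the smoothing term keeps the worst-case loss below 1.

```python
import torch

def dice_loss(inputs, targets, smooth=1):
    inputs = torch.sigmoid(inputs)
    inputs = inputs.view(-1)
    targets = targets.view(-1)
    intersection = (inputs * targets).sum()
    dice = (2. * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)
    return 1 - dice

mask = torch.tensor([[1.0, 1.0], [0.0, 0.0]])

good_logits = torch.tensor([[10.0, 10.0], [-10.0, -10.0]])  # matches the mask
bad_logits = -good_logits                                    # inverted prediction

print(dice_loss(good_logits, mask).item())  # close to 0
print(dice_loss(bad_logits, mask).item())   # much higher
```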

Practical Tips for Loss Function Selection

Choosing the right loss function depends on several factors:

  • Problem type: Regression vs. classification vs. ranking
  • Data distribution: Balanced vs. imbalanced classes
  • Outlier sensitivity: MSE vs. MAE for regression
  • Interpretability: Some loss functions have clear statistical meaning

For debugging and monitoring, I recommend tracking multiple metrics simultaneously:

# Combination of multiple loss functions for better insight
class CombinedLoss(nn.Module):
    def __init__(self, weights=None):
        super().__init__()
        # Avoid a mutable default argument; fall back to a 70/30 MSE/MAE mix
        self.weights = weights or {'mse': 0.7, 'mae': 0.3}
        self.mse = nn.MSELoss()
        self.mae = nn.L1Loss()

    def forward(self, predictions, targets):
        mse_loss = self.mse(predictions, targets)
        mae_loss = self.mae(predictions, targets)

        total_loss = (self.weights['mse'] * mse_loss + 
                     self.weights['mae'] * mae_loss)

        return total_loss, {'mse': mse_loss.item(), 'mae': mae_loss.item()}

Don’t forget to experiment! It often pays to start with standard functions (MSE for regression, Cross-Entropy for classification) and gradually optimize based on the specific needs of your problem.

Summary

Loss functions are a fundamental building block of machine learning models. For regression tasks, start with MSE or MAE; for classification, with Cross-Entropy. Advanced functions like Focal Loss or Dice Loss solve specific problems like imbalanced datasets or segmentation tasks. The key is experimentation and understanding how different functions affect your model’s behavior. Track multiple metrics simultaneously and don’t forget about validation data when evaluating performance.

Tags: loss functions, MSE, cross-entropy

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.