
Backpropagation: How Neural Networks Learn

19. 02. 2025 · 4 min read · intermediate

Backpropagation is the heart of every neural network: an elegant algorithm that lets machines learn from their own mistakes. You'll see how this one mathematical principle drives learning in everything from image recognition to language models.

How Backpropagation Works in Neural Networks

Backpropagation is an algorithm that allows neural networks to learn from errors by gradually propagating gradients back through the network. Without this mechanism, deep learning wouldn’t exist in its current form. Let’s look at exactly how it works and why it’s so important.

Basic Principle of Forward and Backward Pass

Neural network learning occurs in two phases. First, the forward pass — data flows through the network forward and creates predictions. Then follows the backward pass — the error propagates back and weights are updated.

# Forward pass - simple network with one hidden layer
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_pass(X, W1, b1, W2, b2):
    # Hidden layer
    z1 = X.dot(W1) + b1
    a1 = sigmoid(z1)

    # Output layer
    z2 = a1.dot(W2) + b2
    a2 = sigmoid(z2)

    return z1, a1, z2, a2

Computing Error and Gradients

The key is understanding how gradients are calculated. We use the chain rule from mathematical analysis — we decompose the derivative of a composite function into a product of individual derivatives.

For mean squared error and sigmoid activation, the gradient looks like this:

def compute_gradients(X, y, z1, a1, z2, a2, W1, W2):
    m = X.shape[0]  # number of samples

    # Gradient for output layer
    dz2 = (a2 - y) * a2 * (1 - a2)  # dL/dz2: MSE term times sigmoid derivative
    # (a2 - y alone would be correct for cross-entropy; the constant 2 is folded into the learning rate)
    dW2 = (1/m) * a1.T.dot(dz2)
    db2 = (1/m) * np.sum(dz2, axis=0)

    # Gradient for hidden layer (chain rule)
    da1 = dz2.dot(W2.T)
    dz1 = da1 * a1 * (1 - a1)  # derivative of sigmoid
    dW1 = (1/m) * X.T.dot(dz1)
    db1 = (1/m) * np.sum(dz1, axis=0)

    return dW1, db1, dW2, db2
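A standard way to validate hand-derived gradients is numerical gradient checking: nudge one weight, recompute the loss, and compare the finite-difference slope with the analytic value. A self-contained sketch for one entry of W2 (the helpers are restated so the snippet runs on its own; the ½ in the loss keeps the constants aligned with the analytic form):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.uniform(size=(8, 1))
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def loss_fn(W2):
    a1 = sigmoid(X.dot(W1) + b1)
    a2 = sigmoid(a1.dot(W2) + b2)
    return 0.5 * np.mean((a2 - y) ** 2)

# Analytic gradient for W2 (chain rule, as derived above)
a1 = sigmoid(X.dot(W1) + b1)
a2 = sigmoid(a1.dot(W2) + b2)
dz2 = (a2 - y) * a2 * (1 - a2)      # dL/da2 times sigma'(z2)
dW2 = a1.T.dot(dz2) / X.shape[0]

# Finite-difference slope for the same entry
eps = 1e-5
W2_plus, W2_minus = W2.copy(), W2.copy()
W2_plus[0, 0] += eps
W2_minus[0, 0] -= eps
numeric = (loss_fn(W2_plus) - loss_fn(W2_minus)) / (2 * eps)

diff = abs(numeric - dW2[0, 0])
print(diff)  # should be far below 1e-6
```

If the two values disagree by more than roughly 1e-6, the analytic derivation almost certainly has a bug; this is the standard sanity check before trusting a hand-written backward pass.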

Weight Updates Using Gradient Descent

Once we have the gradients, we can update the weights. Gradient descent adjusts each weight in the opposite direction to where the gradient points — this gradually gets us to the minimum of the loss function.

def update_weights(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

    return W1, b1, W2, b2

# Complete training cycle
def train_step(X, y, W1, b1, W2, b2, learning_rate=0.01):
    # Forward pass
    z1, a1, z2, a2 = forward_pass(X, W1, b1, W2, b2)

    # Compute loss
    loss = np.mean((a2 - y)**2)

    # Backward pass
    dW1, db1, dW2, db2 = compute_gradients(X, y, z1, a1, z2, a2, W1, W2)

    # Update weights
    W1, b1, W2, b2 = update_weights(W1, b1, W2, b2, 
                                    dW1, db1, dW2, db2, learning_rate)

    return W1, b1, W2, b2, loss
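Chaining these steps in a loop is all it takes to train. A self-contained toy run (helpers restated compactly so the snippet executes on its own; the dataset is invented for illustration) in which the loss should drop from its initial value:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_step(X, y, W1, b1, W2, b2, learning_rate=0.1):
    m = X.shape[0]
    # Forward pass
    a1 = sigmoid(X.dot(W1) + b1)
    a2 = sigmoid(a1.dot(W2) + b2)
    loss = np.mean((a2 - y) ** 2)
    # Backward pass (sigma' included at the output, since the loss is MSE)
    dz2 = (a2 - y) * a2 * (1 - a2)
    dW2 = a1.T.dot(dz2) / m
    db2 = dz2.sum(axis=0) / m
    dz1 = dz2.dot(W2.T) * a1 * (1 - a1)
    dW1 = X.T.dot(dz1) / m
    db1 = dz1.sum(axis=0) / m
    # Update
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2, loss

rng = np.random.default_rng(42)
X = rng.normal(size=(16, 3))                 # 16 samples, 3 features
y = rng.uniform(0.2, 0.8, size=(16, 1))      # targets inside the sigmoid's range
W1, b1 = rng.normal(scale=0.5, size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.5, size=(5, 1)), np.zeros(1)

losses = []
for _ in range(1000):
    W1, b1, W2, b2, loss = train_step(X, y, W1, b1, W2, b2, learning_rate=0.3)
    losses.append(loss)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Watching the loss curve like this is the simplest diagnostic: if it does not decrease at all, either the gradients or the learning rate are wrong.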

Problems with Vanishing and Exploding Gradients

In deep networks, gradients can vanish: as they are multiplied layer by layer during backpropagation, they shrink exponentially toward zero. The opposite problem is exploding gradients, which grow without bound until training diverges.

Solutions include:

  • Gradient clipping — limiting the maximum gradient magnitude
  • Better activation functions — ReLU instead of sigmoid
  • Batch normalization — normalizing inputs to each layer
  • Residual connections — direct connections between distant layers
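The vanishing problem follows directly from the chain rule: every sigmoid layer multiplies the gradient by sigma'(z), which never exceeds 0.25, so even in the best case ten layers shrink it by a factor of 4^10, roughly a million. A quick numeric illustration:

```python
import numpy as np

def sigmoid_derivative(z):
    s = 1 / (1 + np.exp(-z))
    return s * (1 - s)

# Best case: pre-activation 0, where sigma' peaks at exactly 0.25
per_layer = sigmoid_derivative(0.0)
after_ten_layers = per_layer ** 10

print(per_layer)          # 0.25
print(after_ten_layers)   # ~9.5e-07
```

With ReLU the derivative is exactly 1 on the active side, which is one reason it largely replaced sigmoid in hidden layers.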

# Gradient clipping in practice
def clip_gradients(gradients, max_norm=1.0):
    total_norm = 0
    for grad in gradients:
        total_norm += np.sum(grad**2)
    total_norm = np.sqrt(total_norm)

    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for i, grad in enumerate(gradients):
            gradients[i] = grad * clip_coef

    return gradients
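A quick sanity check of the clipping invariant: after scaling, the global norm of the gradients should not exceed max_norm. The norm computation is restated inline (with made-up gradient arrays) so the snippet runs on its own:

```python
import numpy as np

max_norm = 1.0
# Two fake gradient arrays; global norm = sqrt(4*9 + 2*16) = sqrt(68) ≈ 8.25
grads = [np.full((2, 2), 3.0), np.full((2,), 4.0)]

total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
clip_coef = max_norm / (total_norm + 1e-6)
if clip_coef < 1:
    grads = [g * clip_coef for g in grads]

new_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
print(round(new_norm, 6))  # 1.0
```

Note that all gradients are scaled by the same coefficient, so clipping preserves the direction of the update and only limits its magnitude.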

Backpropagation Optimization

Modern implementations use advanced optimizers that adapt the learning rate per parameter and add momentum:

class AdamOptimizer:
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999):
        self.lr = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.m = {}  # first moment
        self.v = {}  # second moment
        self.t = 0   # time step

    def update(self, params, gradients):
        self.t += 1

        for key in params:
            if key not in self.m:
                self.m[key] = np.zeros_like(params[key])
                self.v[key] = np.zeros_like(params[key])

            # Update biased first/second moment estimates
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * gradients[key]
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * gradients[key]**2

            # Bias correction
            m_hat = self.m[key] / (1 - self.beta1**self.t)
            v_hat = self.v[key] / (1 - self.beta2**self.t)

            # Update parameters
            params[key] -= self.lr * m_hat / (np.sqrt(v_hat) + 1e-8)
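The bias correction is easy to overlook but matters: at t = 1 the raw first moment is only (1 − beta1) = 0.1 of the gradient, and the correction restores it exactly. The same arithmetic the class performs, written out inline for a single made-up scalar gradient g:

```python
import numpy as np

beta1, beta2, t = 0.9, 0.999, 1
g = 0.5                                    # example gradient value

m = beta1 * 0.0 + (1 - beta1) * g          # raw first moment: 0.05
v = beta2 * 0.0 + (1 - beta2) * g ** 2     # raw second moment: 0.00025

m_hat = m / (1 - beta1 ** t)               # -> 0.5, the gradient itself
v_hat = v / (1 - beta2 ** t)               # -> 0.25, the squared gradient

step = m_hat / (np.sqrt(v_hat) + 1e-8)     # ~1.0
print(m_hat, v_hat, round(step, 4))
```

This also shows why Adam's first step has magnitude close to the learning rate regardless of the gradient's scale: m_hat / sqrt(v_hat) normalizes the gradient to roughly unit size.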

Backpropagation in Practice with PyTorch

In real projects, we use frameworks that implement backpropagation automatically:

import torch
import torch.nn as nn

# Network definition
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.output = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x

# Training with automatic backpropagation (toy data for illustration)
X = torch.randn(100, 10)   # 100 samples, 10 features
y = torch.rand(100, 1)     # targets in (0, 1), matching the sigmoid output

model = SimpleNet(10, 20, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(1000):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward pass - automatically!
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Summary

Backpropagation is the algorithm that lets neural networks learn by minimizing error through gradient descent. It proceeds in three steps: a forward pass that computes predictions, a backward pass that propagates gradients, and a weight update. In practice we rely on optimizers like Adam and frameworks like PyTorch that automate the whole process, but understanding the principles of backpropagation remains essential for debugging and optimizing deep learning models.

backpropagation · gradient descent · deep learning

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.