Neural networks are the foundation of modern artificial intelligence, but their principles are surprisingly simple. In this article, we’ll explain how they work and create our first functional model in Python with just a few lines of code.
What Are Neural Networks and How Do They Work?¶
Neural networks are mathematical models inspired by the functioning of the human brain. At their core lies an artificial neuron (perceptron), which receives input signals, processes them using weight coefficients and bias values, and produces output through an activation function.
Each neuron in the network performs a simple operation: output = activation_function(sum(inputs × weights) + bias). When we connect multiple neurons into layers and interconnect the layers, we get a neural network capable of solving complex problems.
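This single-neuron formula can be computed directly. The sketch below uses hand-picked toy values (the inputs, weights, and bias are illustrative, not from any trained model) with ReLU as the activation function:

```python
import torch

# Toy neuron: 3 inputs with hand-picked weights and bias (illustrative values)
inputs = torch.tensor([0.5, -1.0, 2.0])
weights = torch.tensor([0.4, 0.3, -0.2])
bias = 0.1

# output = activation_function(sum(inputs × weights) + bias)
weighted_sum = torch.dot(inputs, weights) + bias
output = torch.relu(weighted_sum)

print(f"weighted sum: {weighted_sum.item():.2f}")  # 0.2 - 0.3 - 0.4 + 0.1 = -0.40
print(f"neuron output: {output.item():.2f}")       # ReLU clips negatives: 0.00
```

Because the weighted sum is negative, ReLU outputs zero here, and the neuron stays "silent" for this input.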
Basic Components of a Neural Network¶
A neural network consists of three types of layers:
- Input Layer – receives data and passes it forward
- Hidden Layers – process data using weight transformations
- Output Layer – produces the final prediction
Each connection between neurons has its own weight, which determines how strongly one neuron influences another. During training, these weights are gradually adjusted using the backpropagation algorithm.
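In PyTorch, these per-connection weights live inside the layer objects themselves. A quick look at a fully connected layer (the same `nn.Linear` we use below) shows one weight per connection plus one bias per neuron:

```python
import torch.nn as nn

layer = nn.Linear(4, 10)  # 4 inputs fully connected to 10 neurons

# One weight per connection: a 10x4 matrix, plus one bias per neuron
print(layer.weight.shape)  # torch.Size([10, 4])
print(layer.bias.shape)    # torch.Size([10])

# requires_grad=True means backpropagation will compute gradients for these
print(layer.weight.requires_grad)  # True
```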
Implementing a Simple Neural Network in PyTorch¶
For practical demonstration, we’ll create a neural network that can classify data from the well-known Iris dataset. The network will have one hidden layer and use the ReLU activation function.
Data Preparation and Environment Setup¶
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
# Load Iris dataset
iris = load_iris()
X = iris.data # 4 features: sepal length/width, petal length/width
y = iris.target # 3 classes: setosa, versicolor, virginica
# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Data normalization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train_scaled)
X_test_tensor = torch.FloatTensor(X_test_scaled)
y_train_tensor = torch.LongTensor(y_train)
y_test_tensor = torch.LongTensor(y_test)
Neural Network Architecture Definition¶
class SimpleNeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNeuralNetwork, self).__init__()
        # Define layers
        self.hidden = nn.Linear(input_size, hidden_size)
        self.output = nn.Linear(hidden_size, output_size)
        # Activation function for the hidden layer
        self.relu = nn.ReLU()

    def forward(self, x):
        # Forward pass - data flowing through the network
        x = self.hidden(x)  # Linear transformation
        x = self.relu(x)    # Apply ReLU activation
        x = self.output(x)  # Output layer - raw logits
        # No softmax here: nn.CrossEntropyLoss applies log-softmax internally
        return x
# Create model instance
model = SimpleNeuralNetwork(
    input_size=4,    # 4 features from Iris dataset
    hidden_size=10,  # 10 neurons in hidden layer
    output_size=3    # 3 classes for classification
)
print(f"Model architecture:\n{model}")
Training the Neural Network¶
For training, we need to define a loss function and an optimizer. For multi-class classification, we'll use CrossEntropyLoss and the Adam optimizer.
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
# Training loop
epochs = 1000
losses = []
for epoch in range(epochs):
    # Forward pass
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()        # Backpropagation
    optimizer.step()       # Update weights

    losses.append(loss.item())

    # Print progress every 100 epochs
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
print("Training completed!")
Model Evaluation¶
# Testing on test data
model.eval()  # Switch to evaluation mode
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    _, predicted = torch.max(test_outputs, 1)

# Calculate accuracy
total = y_test_tensor.size(0)
correct = (predicted == y_test_tensor).sum().item()
accuracy = 100 * correct / total
print(f'Accuracy on test data: {accuracy:.2f}%')

# Detailed look at predictions
print("\nComparison of actual and predicted values:")
for i in range(min(10, len(y_test))):
    actual = iris.target_names[y_test[i]]
    predicted_class = iris.target_names[predicted[i]]
    print(f"Actual: {actual:12} | Predicted: {predicted_class:12}")
How Neural Networks “Learn”¶
The neural network learning process occurs in four steps:
- Forward Propagation – data flows through the network forward and creates a prediction
- Loss Calculation – comparing prediction with actual value
- Backpropagation – calculating gradients using the chain rule
- Weight Update – adjusting weights based on gradients
The optimizer plays a key role in determining how quickly and efficiently the network learns. The Adam optimizer combines the advantages of momentum and an adaptive learning rate, often leading to faster convergence.
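The four steps above can be traced on a single weight using autograd. This is a toy regression with made-up numbers (not the Iris model), using plain gradient descent instead of Adam so the weight update is easy to follow:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # one trainable weight
x, target = torch.tensor(3.0), torch.tensor(9.0)

# 1) Forward propagation
prediction = w * x  # 2.0 * 3.0 = 6.0

# 2) Loss calculation (squared error)
loss = (prediction - target) ** 2  # (6 - 9)^2 = 9.0

# 3) Backpropagation: d(loss)/dw = 2*(w*x - target)*x = 2*(-3)*3 = -18
loss.backward()
print(w.grad)  # tensor(-18.)

# 4) Weight update (plain gradient descent, learning rate 0.1)
with torch.no_grad():
    w -= 0.1 * w.grad
print(w)  # tensor(3.8000, requires_grad=True)
```

After one step the weight moved from 2.0 toward 3.0 (the value that would make the prediction exact); repeating the four steps is exactly what the training loop above does at scale.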
Activation Functions and Their Significance¶
ReLU (Rectified Linear Unit) is the most popular activation function for hidden layers. Its simplicity (max(0, x)) brings several advantages:
# Comparison of different activation functions
import matplotlib.pyplot as plt

x = torch.linspace(-5, 5, 100)

# ReLU: f(x) = max(0, x)
# Sigmoid: f(x) = 1 / (1 + e^(-x))
# Tanh: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
plt.plot(x, torch.relu(x), label='ReLU')
plt.plot(x, torch.sigmoid(x), label='Sigmoid')
plt.plot(x, torch.tanh(x), label='Tanh')
plt.legend()
plt.show()

print("ReLU advantages:")
print("- Fast computation")
print("- Mitigates the vanishing gradient problem")
print("- Sparsity - many neurons output zero")
Extensions and Practical Tips¶
To improve neural network performance, we can use several techniques:
class ImprovedNeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, dropout_rate=0.2):
        super(ImprovedNeuralNetwork, self).__init__()
        layers = []
        prev_size = input_size

        # Create multiple hidden layers
        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(prev_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))  # Regularization
            prev_size = hidden_size

        # Output layer
        layers.append(nn.Linear(prev_size, output_size))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)
# Using more advanced architecture
advanced_model = ImprovedNeuralNetwork(
    input_size=4,
    hidden_sizes=[16, 8],  # Two hidden layers
    output_size=3,
    dropout_rate=0.3
)
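Before training a deeper architecture like this, it is worth sanity-checking its shapes and size. The standalone sketch below rebuilds the same [16, 8] stack with `nn.Sequential` (so it runs on its own) and verifies the output shape and parameter count; the batch of 5 random samples is just dummy data:

```python
import torch
import torch.nn as nn

# Rebuild the same architecture inline: 4 -> 16 -> 8 -> 3 with ReLU + Dropout
layers = []
prev_size = 4
for hidden_size in [16, 8]:
    layers += [nn.Linear(prev_size, hidden_size), nn.ReLU(), nn.Dropout(0.3)]
    prev_size = hidden_size
layers.append(nn.Linear(prev_size, 3))
model = nn.Sequential(*layers)

# Forward pass with a fake batch: 5 samples, 4 features each
dummy = torch.randn(5, 4)
model.eval()  # disable dropout for a deterministic shape check
print(model(dummy).shape)  # torch.Size([5, 3]) - one raw score per class

# Trainable parameters: (4*16+16) + (16*8+8) + (8*3+3) = 243
print(sum(p.numel() for p in model.parameters()))  # 243
```

A check like this catches mismatched layer sizes immediately, before any time is spent on training.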
Summary¶
Neural networks are a powerful tool for solving complex machine learning problems. Understanding basic principles – from neuron structure through forward and backward propagation to optimization – is key to effective work with deep learning. PyTorch provides an intuitive interface for implementing and experimenting with various architectures. Starting with simple models like in our example is an ideal way to master the basics before moving on to more complex architectures like CNN or Transformer.