A year ago, we built infrastructure by hand. Clicking through the AWS console, notes in Confluence, and praying nobody would delete that security group. Today, everything is in Terraform — versioned, reviewed, automatically deployed. Here’s our story.
The Problem: Snowflake Servers¶
Every server was unique. The development environment differed from staging, staging from production. Nobody knew exactly which configuration was running on which server. Documentation was outdated the day after it was written. When a server went down, recovery took hours — because nobody remembered all the steps.
This is the classic anti-pattern known as “snowflake server.” Each one is special, each one is different, and none can be easily reproduced. When you manage three servers, it still works. At thirty, it’s a nightmare. At three hundred, it’s impossible.
Why Terraform and Not CloudFormation or Ansible¶
CloudFormation is AWS-only. We run infrastructure both on AWS and in on-premises environments (VMware), so we needed a tool that handles both. Terraform has providers for AWS, Azure, GCP, VMware, Consul, and dozens of other services.
Ansible is a configuration management tool, not a provisioning tool. Great for configuring existing servers, but for creating infrastructure (VPCs, subnets, load balancers, RDS instances), Terraform is the better choice. In practice, we use both — Terraform creates infrastructure, Ansible configures it.
Terraform uses a declarative approach: you describe what the infrastructure should look like, and Terraform figures out what needs to change. Compared to an imperative approach (do step 1, then step 2, then step 3), it’s fundamentally simpler to maintain.
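In practice that means you write down the target state and let Terraform compute the diff. A minimal sketch (the resource name and AMI ID are purely illustrative):
# You declare the desired end state; Terraform works out the create/update/delete steps.
resource "aws_instance" "api_server" {
  ami           = "ami-0123456789abcdef0"   # illustrative AMI ID
  instance_type = "t3.medium"

  tags = {
    Name = "api-server"
  }
}
Change instance_type in this file and Terraform figures out that the instance needs to be modified; you never script the individual API calls yourself.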
Project Structure¶
After several iterations, we settled on the following structure:
infrastructure/
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── ecs-cluster/
│   ├── rds/
│   └── monitoring/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── production/
└── README.md
Modules define reusable components. The VPC module creates the network topology, the ECS module configures the container cluster, the RDS module handles the database. Each module has clearly defined inputs (variables) and outputs.
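To give an idea of what such an interface looks like, here is a simplified sketch of the VPC module's variables and outputs (the internal resource names aws_vpc.this and aws_subnet.private are illustrative):
# modules/vpc/variables.tf
variable "environment" {
  description = "Deployment environment (dev, staging, production)"
  type        = string
}

variable "cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
}

# modules/vpc/outputs.tf
output "vpc_id" {
  description = "ID of the created VPC"
  value       = aws_vpc.this.id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets"
  value       = aws_subnet.private[*].id
}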
Environments are specific deployments of modules with different parameters. Dev has smaller instances, staging mirrors production at a smaller scale, production has a multi-AZ setup with higher redundancy.
State Management — A Key Concept¶
Terraform stores infrastructure state in a terraform.tfstate file. This file is critically important — lose it and Terraform doesn’t know what it manages. That’s why you should never store state locally.
# backend.tf
terraform {
  backend "s3" {
    bucket         = "core-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "eu-central-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
The S3 backend with DynamoDB locking ensures two people can't modify the infrastructure simultaneously. We encrypt the state file at rest, since it contains sensitive information (RDS passwords, API keys), and enable S3 bucket versioning as a safeguard against accidental overwrites.
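For completeness, a sketch of those supporting resources; we bootstrap them once, outside the state they protect, and the detail that matters for locking is the LockID hash key:
# Bootstrapped separately from the environments that use them.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "core-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}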
Code Review for Infrastructure¶
This is where Infrastructure as Code truly pays off. Every infrastructure change goes through a pull request. Before merge, we run terraform plan and attach the output to the PR. The reviewer sees exactly what will change:
$ terraform plan
  ~ aws_instance.api_server
      instance_type: "t2.medium" => "t2.large"

  + aws_cloudwatch_metric_alarm.cpu_high
      alarm_name:          "api-cpu-high"
      comparison_operator: "GreaterThanThreshold"
      threshold:           "80"

Plan: 1 to add, 1 to change, 0 to destroy.
No surprises. No “who changed that security group on Friday night.” Everything is traceable in git history — who, when, why. For audit and compliance, it’s gold.
Modules — The DRY Principle for Infrastructure¶
Our VPC module is used across all projects. It defines a standard network topology: public and private subnets across three availability zones, NAT gateway, route tables, flow logs. Parameterized CIDR block and tagging.
# environments/production/main.tf
module "vpc" {
source = "../../modules/vpc"
environment = "production"
cidr_block = "10.0.0.0/16"
azs = ["eu-central-1a", "eu-central-1b", "eu-central-1c"]
tags = {
Project = "client-x"
ManagedBy = "terraform"
}
}
When we find a bug or improvement in a module, we fix it once and propagate it to all environments. We version modules using git tags — production uses a proven version, dev can test the latest.
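Pinning looks like this in practice (the repository URL and tag are illustrative):
# environments/production/main.tf -- production stays on a proven module version
module "vpc" {
  source = "git::https://git.example.com/infra/terraform-modules.git//vpc?ref=v1.4.0"
  # ...
}
Dev environments can point at a newer tag or a branch; production only moves when we deliberately bump the ref.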
Pitfalls and Lessons Learned¶
Drift detection. Someone changes something manually in the console — and Terraform doesn’t know. On the next terraform apply, it overwrites the change. Solution: regular terraform plan in the CI pipeline that detects drift and alerts the team.
Destructive changes. Some Terraform changes require destroy + recreate. Changing the AMI on an EC2 instance means a new server. Changing the engine_version on RDS can mean downtime. Always read the plan carefully — red lines with a minus sign are a warning.
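Terraform's standard lifecycle settings can soften the blow; a sketch on hypothetical resources:
resource "aws_instance" "api_server" {
  # ...
  lifecycle {
    create_before_destroy = true   # bring the replacement up before the old server goes away
  }
}

resource "aws_db_instance" "main" {
  # ...
  lifecycle {
    prevent_destroy = true         # apply fails instead of silently destroying the database
  }
}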
Secrets management. Never put passwords in .tf files. We use AWS Secrets Manager with data source references. Alternatively, HashiCorp Vault — but that’s a topic for a separate article.
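What that looks like in practice; the secret name and resource names are illustrative, and only the reference ever lives in version control:
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/rds/master-password"
}

resource "aws_db_instance" "postgres" {
  # ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
Note that the resolved value still ends up in the state file, which is another reason the state bucket is encrypted.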
Importing existing infrastructure. terraform import exists but isn’t painless. For each resource, you have to manually write the configuration and then import it. For larger infrastructure, that’s weeks of work. Lesson: start with Terraform as early as possible.
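The workflow for a single resource, with illustrative names:
# 1. Hand-write the configuration to match the existing resource:
resource "aws_s3_bucket" "legacy_assets" {
  bucket = "legacy-assets-bucket"
}

# 2. Pull it into state:
#    terraform import aws_s3_bucket.legacy_assets legacy-assets-bucket
Repeat that for every subnet, security group, and IAM role you already have, and you see why it adds up to weeks.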
IaC Isn’t a Choice, It’s a Necessity¶
Infrastructure as Code isn’t just a buzzword. It’s a fundamental shift in how we approach infrastructure — from manual craft to an engineering process. Terraform gave us reproducibility, auditability, and speed. Today, we create a complete production environment in 20 minutes with a single command. A year ago, it took two days and three people.