A client in the financial sector needed cloud — but for regulatory reasons could not place data outside their own data centre. Public clouds such as AWS were out of the question. We decided to build a private cloud on OpenStack.
Why OpenStack?
In the private cloud market in 2014 there were several alternatives: VMware vCloud Suite (expensive, but proven), CloudStack (simpler deployment), and OpenStack (largest community, modular architecture). The client wanted a vendor-independent solution with no lock-in to a single supplier. OpenStack — backed by companies such as Rackspace, IBM, HP, Red Hat, and dozens of others — was the logical choice.
In April 2014 the Icehouse release arrived, bringing significant stability improvements and new features in the networking layer. That is what we built on.
OpenStack components
OpenStack is not a single product but an ecosystem of services, each addressing one aspect of cloud infrastructure; a short client-side sketch of how they cooperate follows the list:
- Keystone — identity and authentication, centralised management of users and tenants
- Nova — compute service, manages the lifecycle of virtual machines
- Neutron — networking service, virtual networks, subnets, routers, firewalls
- Cinder — block storage, persistent disks for VMs
- Glance — registry of operating system images
- Horizon — web dashboard for management
- Swift — object storage (analogous to S3)
- Heat — orchestration, Infrastructure as Code templates
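Each of these services exposes its own REST API behind a shared Keystone token, which is what makes the modular architecture workable in practice. Here is a minimal sketch of what that looks like from client code, using the modern openstacksdk (which post-dates our 2014 deployment, when the per-service python-*client libraries were the norm) and placeholder credentials:

```python
import openstack

# Keystone authenticates once; the resulting session is then reused
# for Nova, Neutron, and Glance. Endpoint and credentials are hypothetical.
conn = openstack.connect(
    auth_url="https://keystone.example.internal:5000/v3",
    project_name="demo",
    username="demo",
    password="secret",
    user_domain_name="Default",
    project_domain_name="Default",
)

for server in conn.compute.servers():    # Nova
    print("VM:", server.name, server.status)

for network in conn.network.networks():  # Neutron
    print("net:", network.name)

for image in conn.image.images():        # Glance
    print("image:", image.name)
```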
Hardware and topology
We deployed OpenStack across 12 physical servers in one rack: 3 controller nodes (HA), 8 compute nodes, and 1 storage node with a Ceph cluster. Each compute node had two Xeon E5-2650 v2 processors, 128 GB RAM, and 2× SSD for ephemeral storage. Networking was 10 GbE with dedicated VLANs for management, tenant traffic, and storage.
Heat let us describe environments on this infrastructure declaratively. For example, a template that boots a web server on a tenant network and attaches a floating IP (the `2014-10-16` HOT version shown is from Juno; on Icehouse itself the version string was `2013-05-23`):

```yaml
# Heat template: creating a VM with a floating IP
heat_template_version: 2014-10-16

resources:
  internal_net:
    type: OS::Neutron::Net  # the isolated tenant network
    properties:
      name: internal_net
  internal_subnet:
    type: OS::Neutron::Subnet
    properties:
      network_id: { get_resource: internal_net }
      cidr: 10.0.0.0/24
  router:
    type: OS::Neutron::Router  # uplink to the provider network "external"
    properties:
      external_gateway_info: { network: external }
  router_interface:
    type: OS::Neutron::RouterInterface
    properties:
      router_id: { get_resource: router }
      subnet_id: { get_resource: internal_subnet }
  web_server:
    type: OS::Nova::Server
    properties:
      image: ubuntu-14.04-lts
      flavor: m1.medium
      key_name: deploy-key
      networks:
        - network: { get_resource: internal_net }
  floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network: external
  association:
    type: OS::Neutron::FloatingIPAssociation
    depends_on: router_interface  # the FIP needs a route to the external net
    properties:
      floatingip_id: { get_resource: floating_ip }
      port_id: { get_attr: [web_server, addresses, internal_net, 0, port] }
```
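Such a template is launched with the Heat client of that era, e.g. `heat stack-create -f web.yaml web-stack`; in today's unified CLI the equivalent is `openstack stack create -t web.yaml web-stack`.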
Neutron: the networking layer as the biggest challenge
If Nova is the heart of OpenStack, Neutron is its most complex organ. Virtual networks, floating IP addresses, security groups, load balancers — all managed by Neutron. And in 2014 it was also the component with the most bugs.
We chose Open vSwitch as the networking backend with VXLAN tunnels between compute nodes. Each tenant has an isolated virtual network — it cannot see another tenant’s traffic. Security groups implement a stateful firewall at the port level.
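For illustration, this is roughly what defining such a stateful, port-level firewall looks like from client code, again via the modern openstacksdk with hypothetical names (in 2014 we would have used python-neutronclient):

```python
import openstack

conn = openstack.connect(cloud="private-cloud")  # assumes a clouds.yaml entry

# Security groups are stateful: allowing ingress TCP/22 implicitly allows
# the reply traffic, with connection state tracked per port.
sg = conn.network.create_security_group(
    name="web",
    description="allow ssh and http from anywhere",
)
for port in (22, 80):
    conn.network.create_security_group_rule(
        security_group_id=sg.id,
        direction="ingress",
        ethertype="IPv4",
        protocol="tcp",
        port_range_min=port,
        port_range_max=port,
        remote_ip_prefix="0.0.0.0/0",
    )
```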
The biggest problem: the Neutron L3 agent was a single point of failure. If the network node went down, all floating IPs stopped working. The designated fix, the Distributed Virtual Router (DVR), was still experimental in Icehouse, so instead we ran the L3 agent as an HA pair with VRRP failover.
Ceph as a storage backend
For Cinder (block storage) and Glance (image registry) we chose Ceph, a distributed storage system, as the backend. Three OSD nodes with a total of 24 disks provided redundant storage with a replication factor of 3.
The advantage of Ceph + OpenStack integration: copy-on-write cloning. Creating a new VM from an image takes seconds — Ceph simply creates a thin clone; actual data is copied only on the first write.
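A rough way to see the effect is to time how long a bootable volume cloned from a Glance image takes to become available. A sketch, assuming a hypothetical `clouds.yaml` entry and image name:

```python
import time
import openstack

conn = openstack.connect(cloud="private-cloud")  # hypothetical clouds.yaml entry

start = time.time()
# With the Ceph RBD backend behind both Glance and Cinder, this is a
# copy-on-write clone, so it completes in seconds regardless of image size.
volume = conn.create_volume(size=20, image="ubuntu-14.04-lts", wait=True)
print(f"volume {volume.id} available after {time.time() - start:.1f}s")
```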
Deployment automation
Manual installation of 12 servers would have taken weeks. We used Puppet with modules from stackforge (now openstack-puppet) for automated configuration of all components. The entire cluster could be “wiped and recreated” within two hours — critical for testing upgrades.
Operational experience
Monitoring
OpenStack generates enormous amounts of logs. We deployed an ELK stack and Nagios for monitoring API endpoints. The key metric: API response time. If the Nova API takes longer than 2 seconds to respond, something is wrong — typically an overloaded RabbitMQ or a full compute node.
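The probe itself can be as simple as timing an authenticated request to the service endpoint. A minimal sketch with `requests`; the endpoint and token are placeholders:

```python
import requests

NOVA_API = "https://nova.example.internal:8774/v2.1"  # placeholder endpoint
TOKEN = "gAAAA..."  # a Keystone token obtained out of band

resp = requests.get(
    f"{NOVA_API}/servers",
    headers={"X-Auth-Token": TOKEN},
    timeout=5,
)
latency = resp.elapsed.total_seconds()
# Alert threshold from our experience: 2 seconds
if resp.status_code != 200 or latency > 2.0:
    print(f"ALERT: Nova API unhealthy ({resp.status_code}, {latency:.2f}s)")
```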
Upgrades
Upgrading OpenStack between versions is notoriously difficult. Each component has its own database migrations, API versions change, and configuration options get renamed. Our strategy was a blue-green deployment of the controller nodes: deploy the new version to standby nodes, switch traffic over, and keep the old nodes as a fallback.
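Before switching traffic, the upgraded standby controllers need a smoke test. A simplified sketch with hypothetical standby addresses, checking only that each API answers its unauthenticated version-discovery document:

```python
import requests

# Hypothetical addresses of the freshly upgraded standby controllers
STANDBY_APIS = {
    "keystone": "https://standby.example.internal:5000/",
    "nova": "https://standby.example.internal:8774/",
    "neutron": "https://standby.example.internal:9696/",
    "cinder": "https://standby.example.internal:8776/",
}

def smoke_test() -> bool:
    """Check that every root URL returns an API version document."""
    ok = True
    for name, url in STANDBY_APIS.items():
        try:
            resp = requests.get(url, timeout=5)
            # Version endpoints answer 200 or 300 (multiple choices)
            if resp.status_code not in (200, 300):
                print(f"{name}: unexpected status {resp.status_code}")
                ok = False
        except requests.RequestException as exc:
            print(f"{name}: {exc}")
            ok = False
    return ok

if smoke_test():
    print("standby controllers healthy; safe to switch the VIP")
```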
Results after the first year
The cluster runs over 200 virtual machines across 8 tenants. Provisioning a new VM takes under 30 seconds. Developers create environments themselves via the Horizon dashboard or CLI — no ticket to the infrastructure team required. Platform availability in the first year: 99.95%.
OpenStack is not for everyone, but for the right use cases it is excellent
OpenStack requires a dedicated team — at minimum two people for full-time operations. It is not “install and forget”. But for organisations that need cloud and cannot or will not go to a public cloud, it is the best open-source alternative.
The key lesson: start with a minimal configuration (Nova, Neutron, Cinder, Keystone), prove it in production, and only then add additional services.