
Docker — containerization is changing the rules of the game

15. 03. 2013 · 9 min read

Just a few days ago at PyCon in Santa Clara, Solomon Hykes presented the Docker project, a tool for packaging applications into lightweight containers. What we saw has the potential to fundamentally change the way we develop, test and deploy software. Let's look at why Docker is so interesting and what it means for enterprise developers.

What Docker is and why it was created

Docker is an open-source platform that allows you to package an application together with all its dependencies into a standardized container. Unlike virtual machines, which emulate complete hardware and run their own operating system kernel, containers share the host system’s kernel and isolate only the userspace. The result is dramatically lower overhead — a container starts in seconds, not minutes, and takes up tens of megabytes instead of gigabytes.

The project was born at dotCloud, a company operating a Platform-as-a-Service. Solomon Hykes and his team found that the internal tool they had built to manage applications on their platform could be useful to the entire community. And so Docker was released as an open-source project under the Apache 2.0 license.

Technically, Docker stands on two key Linux kernel technologies — cgroups (control groups) for resource limiting and namespaces for process isolation. These mechanisms have existed in Linux for years, but Docker is the first tool to make them accessible to the broader developer community through a simple interface.
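
You can see these building blocks directly from the host. The following is a minimal sketch, assuming a container is already running the Python server from the example below; the PID is a placeholder you look up yourself:

# the application process inside the container is an ordinary process on the host
ps aux | grep "python /app/server.py"
# its namespaces and cgroup membership are visible in /proc
ls -l /proc/<pid>/ns     # one symlink per namespace (net, mnt, pid, uts, ipc)
cat /proc/<pid>/cgroup   # the control groups used to limit its resources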

How Docker works

The fundamental concept is the Docker image — a read-only template that contains an operating system, runtime environment, libraries and an application. Images are created using a file called a Dockerfile, which describes the steps for assembling the environment. Each step creates a new layer, enabling efficient sharing and caching.

FROM ubuntu:12.04
RUN apt-get update && apt-get install -y python python-pip
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt
COPY . /app/
CMD ["python", "/app/server.py"]

This simple Dockerfile shows the power of the concept — in six lines we define a complete environment for a Python application. Anyone with this file can build an identical image on any machine with Docker.
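
Building and running it is just as short. A sketch of the two commands, assuming the Dockerfile sits in the current directory; the image name and the port that server.py listens on are our own choices:

docker build -t my-python-app .            # assembles the image layer by layer, caching each step
docker run -d -p 8000:8000 my-python-app   # starts a container from the image in the background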

A container is created from an image — a running instance of the image. The container adds a writable layer on top of the read-only image layers. Changes in the container (new files, configuration edits) remain in this top layer and don’t modify the original image. This is absolutely critical for reproducibility.
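
The writable layer is easy to observe in practice. A sketch, assuming the image from the previous example; container IDs are placeholders:

docker run -i -t my-python-app /bin/bash   # open a shell in a new container
# inside the shell: create a file, edit a config, then exit
docker diff <container-id>                 # lists exactly what changed in the top, writable layer
docker run -i -t my-python-app /bin/bash   # a fresh container sees none of those changes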

Docker vs. virtualization — where the differences lie

Our team has been using VMware for production and VirtualBox for development for years. Naturally we ask — why would we want something else? The differences are fundamental and worth examining in detail.

Performance: A virtual machine runs on a hypervisor and emulates complete hardware. That means its own kernel, its own init system, its own memory management. A container shares the kernel with the host and isolates only processes. In practice this means you can run dozens of containers on a single server where you would otherwise have a handful of VMs. Container startup is on the order of seconds; VM startup is on the order of minutes.
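
You can measure the difference yourself. A rough illustration; the first command downloads the base image once, so time only the second:

docker pull ubuntu:12.04                 # one-time download of the base image
time docker run ubuntu:12.04 /bin/true   # create, start, run and exit a container, typically in a second or two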

Isolation: Here virtual machines still win. Full hardware virtualization provides stronger isolation — a compromised VM cannot easily affect the host. Containers share the kernel, and if a kernel vulnerability exists, an attacker could theoretically escape from a container. For security-critical applications in the banking sector this is a relevant consideration.

Portability: A Docker image runs identically on a developer’s laptop, on a staging server and in production. No “it works on my machine.” The image contains everything — from system libraries to application configuration. This is a huge advantage over VMs, where we have to maintain consistency between environments using tools like Puppet or Chef.

Union File System — the key innovation

Docker uses AUFS (Another Union File System) to manage image layers. Each instruction in the Dockerfile creates a new layer. Layers are read-only and shared between images — if ten of your applications use Ubuntu 12.04 as a base, that layer exists on disk only once. This dramatically reduces disk requirements and speeds up downloading new images.

The container’s writable layer uses a copy-on-write strategy. When a container modifies a file from a lower layer, the file is first copied to the top layer and then modified. This means reads are fast (directly from the lower layer), but the first write to an existing file is slower.
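
The layering is visible directly from the command line; a quick way to convince yourself, assuming at least one image has been built locally:

docker images                   # images built FROM ubuntu:12.04 share that base layer on disk
docker history my-python-app    # lists every layer and the Dockerfile step that created it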

Docker Registry — sharing images

Docker introduces the concept of a registry — a central repository for Docker images. The public Docker registry already contains hundreds of images, ranging from official distributions (Ubuntu, CentOS, Debian) to databases (MySQL, PostgreSQL, MongoDB) and web and application servers (Nginx, Apache, Tomcat). You can also run a private registry for internal use, which is a necessity in enterprise environments.

The workflow is simple: developers build images locally, push them to the registry, and the deployment system pulls them to production servers. Everyone works with the same artifact — no more “but I had a different version of the library.”
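
In command form, the round trip might look like this sketch; the registry hostname, port and image names are illustrative:

# on the developer machine
docker build -t our-invoices:1.0 .
docker tag our-invoices:1.0 registry.internal:5000/our-invoices:1.0
docker push registry.internal:5000/our-invoices:1.0
# on the production server
docker pull registry.internal:5000/our-invoices:1.0
docker run -d -p 8080:8080 registry.internal:5000/our-invoices:1.0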

Practical use — what we tried

In our team we tried Docker on an internal project — a microservice for processing invoices. The application is written in Java 7, runs on Tomcat 7 and connects to a PostgreSQL database. Traditionally we would deploy a WAR file to a shared Tomcat server and configure a JNDI datasource. With Docker we took a different approach.

We created a Dockerfile that starts from an official OpenJDK image, adds Tomcat, copies the WAR file and sets environment variables for the database connection. The whole build takes 45 seconds. The resulting image is 340 MB — that includes the operating system, JDK, Tomcat and our application.
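
For illustration, a sketch of roughly what that Dockerfile looks like. The exact package names, paths and Tomcat version are simplified and assume the Tomcat tarball sits next to the Dockerfile:

FROM ubuntu:12.04
RUN apt-get update && apt-get install -y openjdk-7-jdk
ENV JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64
ADD apache-tomcat-7.0.37.tar.gz /opt/                       # local tarball is unpacked into /opt
COPY target/invoices.war /opt/apache-tomcat-7.0.37/webapps/ROOT.war
ENV DB_HOST localhost
EXPOSE 8080
CMD ["/opt/apache-tomcat-7.0.37/bin/catalina.sh", "run"]    # Tomcat in the foreground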

Deploying to a test server? One command: docker run -d -p 8080:8080 -e DB_HOST=postgres.internal our-invoices:1.0. It is up and running in three seconds. Want a new version? Build a new image, stop the old container, start the new one. Rollback? Start the previous version of the image. The simplicity is almost suspicious.
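
The upgrade and rollback sequence then looks roughly like this; container IDs and tags are placeholders:

docker ps                                  # find the container running version 1.0
docker stop <old-container-id>             # stop it, freeing port 8080
docker run -d -p 8080:8080 -e DB_HOST=postgres.internal our-invoices:1.1   # start the new version
# rollback is the same two steps with the previous tag
docker run -d -p 8080:8080 -e DB_HOST=postgres.internal our-invoices:1.0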

Limitations and concerns

Docker is a young project and has its limits, which need to be named. First, it only runs on Linux. For developers on Mac OS X or Windows this means running Docker inside a VM (via Vagrant or boot2docker). That adds a layer of complexity and partially negates the simplicity advantage.

Second, Docker doesn’t yet support orchestration of multiple containers. If your application consists of a web server, application server, database and message broker, you have to manually coordinate starting and linking containers. For simple applications this isn’t a problem, but for enterprise systems with dozens of components it’s a significant limitation. We expect the community to soon deliver orchestration tools.
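
Today that coordination is a shell script. A sketch for a two-container setup; the image names and the database address handed to the application are placeholders:

# start the database first and publish its port on the host
docker run -d -p 5432:5432 our-postgres:9.2
# then start the application and tell it where to find the database
docker run -d -p 8080:8080 -e DB_HOST=10.0.0.5 -e DB_PORT=5432 our-invoices:1.0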

Third, data persistence. A container is ephemeral — all changes in the writable layer are lost when it stops. For stateless applications this isn't a problem, but for databases and other stateful services we must use Docker volumes, which map a directory from the host into the container. Volume management is still fairly primitive.
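
In practice that means mapping a host directory into the container with the -v flag. A sketch; the host path, image name and the data directory inside the container are placeholders that depend on how the image is built:

# the database files live on the host, so they survive the container
docker run -d -p 5432:5432 -v /srv/pgdata:/var/lib/postgresql/data our-postgres:9.2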

Fourth, monitoring and logging. Traditional tools like Nagios or Zabbix assume that the application runs directly on a server. Containers add a layer of abstraction that these tools can’t yet handle well. We’ll need new approaches to monitoring.

Impact on the CI/CD pipeline

Where we see Docker’s greatest potential is in continuous integration and continuous delivery. Today our CI server (Jenkins) builds the application, runs tests and creates a deployable artifact (WAR, JAR, RPM). With Docker the artifact becomes the image itself — it contains not just the application, but the entire environment.

This eliminates a whole class of problems. We'll no longer have tests that pass on the CI server but fail in production because of a different version of a system library. The image that passed tests is identical to the image that will run in production. No other tool gives us this guarantee.
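
In Jenkins terms this is a shell build step along these lines; the tag scheme, registry and test entry point are illustrative, not a prescribed setup:

docker build -t our-invoices:$BUILD_NUMBER .                  # BUILD_NUMBER is set by Jenkins
docker run our-invoices:$BUILD_NUMBER /app/run-tests.sh       # hypothetical test entry point baked into the image
docker tag our-invoices:$BUILD_NUMBER registry.internal:5000/our-invoices:$BUILD_NUMBER
docker push registry.internal:5000/our-invoices:$BUILD_NUMBER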

Furthermore, Docker enables parallel testing. Need to test the application against PostgreSQL 9.1 and 9.3? Start two containers with different database versions and the tests run in parallel. Without Docker we would need two separate servers or complex configuration on one.
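
Concretely, two database versions side by side are just two run commands with different host ports; the image names are illustrative and may have to be built in-house:

docker run -d -p 5491:5432 our-postgres:9.1   # test database A on host port 5491
docker run -d -p 5493:5432 our-postgres:9.3   # test database B on host port 5493
# each test suite points at its own port and the two runs proceed in parallel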

What this means for enterprise

Is Docker ready for production enterprise use? Today, in March 2013, honestly — not yet. The project is in a very early stage, the API is changing, documentation is sparse and the community is still forming. But the direction is clear and the potential is enormous.

For our clients in the banking and insurance sector we see Docker as the future standard for application deployment. Imagine a world where every application runs in its own isolated container with precisely defined dependencies. Where deployment means one command and rollback means another. Where the development environment is identical to production. That is the world Docker promises.

We recommend starting to experiment now. Try Docker on internal projects, learn the concepts, understand the limitations. When Docker is ready for production — and we’re confident that will happen within one to two years — you’ll be ready to deploy it.

Comparison with LXC

Linux Containers (LXC) have existed longer than Docker and provide similar functionality at a lower level. Docker originally used LXC as its backend for container management. The main difference is in the level of abstraction — LXC provides a low-level API for container management, while Docker adds concepts like images, Dockerfiles, registries and versioning. It’s the difference between assembly language and a high-level language.
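
The difference in abstraction shows up in the commands themselves. An illustrative comparison; the LXC template step in particular varies by distribution:

# LXC: create a container from a distribution template, then start it
lxc-create -t ubuntu -n web01
lxc-start -n web01
# Docker: one command pulls the image if needed and drops you into a shell
docker run -i -t ubuntu:12.04 /bin/bash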

For system administrators accustomed to working with LXC, Docker may seem unnecessarily abstract. For developers who simply want to package and run their application, Docker is significantly more accessible. And it’s accessibility that will determine the adoption of a technology.

The future of containerization

Docker is not the only player in the containerization space. Google has used an internal container technology called Borg for more than ten years and runs virtually all its services in it. The fact that Google runs two billion containers per week proves that containerization works at enormous scale.

We expect Docker to catalyze an entire ecosystem — tools for orchestration, monitoring, networking and container security. It’s possible that in five years, deploying an application without containers will seem as archaic as deploying without version control today.

Conclusion and recommendation

Docker is one of the most significant technologies we have seen in recent years. It promises an end to the “it works on my machine” problem, dramatically faster deployments and better utilization of server resources. The project is young, but the direction is clear.

Our recommendation: Start experimenting. Install Docker, create a Dockerfile for one of your applications, understand the concepts of images and containers. When Docker matures for production, you’ll be ready.

Tags: docker, containers, devops, linux