Containers are not lightweight VMs

How often have you read that Docker containers are like lightweight virtual machines? But are they really?

Sure, containers virtualise computing resources, but treating them like lightweight virtual machines trivialises their impact on system architectures and leads to anti-patterns.

And why are they called "Docker" containers anyway? Docker is the name of a company, not the name of the technology that underpins containers.

Why containers are not virtual machines

Virtual machines are servers running on servers. They each have their own full-blown operating system that runs on the host server. That is why you can run Windows servers on a Linux hypervisor.

By contrast, a container is a collection of ordinary processes running directly on the host's kernel.

It is a constrained slice of computing resources running in its own namespace on its own patch of the file system.

It is part and parcel of the O/S kernel that is hosting the container because it is carved out of the host's computing resources. It is not a little computer that runs on a server.

The container image defines the boundaries of that slice. When you fire up a container, the container runtime uses kernel namespaces and cgroups to dedicate a portion of the host's CPU time, RAM and file storage to the executables defined in the container image. The kernel does not create a virtual machine. It cordons off a piece of itself and gives it a name.
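
As a rough sketch of what that carving-out looks like in practice, the limits you pass to the runtime become cgroup settings on the host rather than the walls of a little machine. The image name below is a placeholder.

    # The flags map onto kernel cgroup limits for this slice of the host;
    # no guest operating system is booted. The image name is illustrative.
    docker run -d --name api \
        --cpus="1.5" \
        --memory="512m" \
        --pids-limit=100 \
        example/api:1.0

    # The container's processes show up as ordinary processes on the host
    docker top api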

Containers are not actors on a stage. They are sock puppets created by the kernel. Treating them as mini-VMs causes anti-patterns.

Anti-patterns

Installing software in containers

We have all done it. You run the container with a bash shell using interactive mode and then off you go: running sudo su -, installing security updates, fixing the locale setting, cursing because nothing is available, not even vi.

Don't. You have not ssh'd into a VM. You have transported yourself into an ephemeral, walled-off garden inside the kernel, and all your changes will be gone the next time the container is recreated from its image.

Container images must be immutable. Immutability is what gives containers one of their superpowers: scalability. The purpose of the image is to stamp out as many identical containers as you need to absorb the load. Think cattle, not pets.

Use interactive mode to troubleshoot and to scratch around inside the container out of curiosity, but if you have to change anything, update the Dockerfile, build a new image with a new tag, and start a new container.
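
A minimal sketch of that workflow, with a placeholder base image, package, paths and tag: bake the missing tool into the image at build time, then replace the running container with one built from the new image.

    # Dockerfile: make the change part of the image, not the running container
    FROM debian:bookworm-slim
    RUN apt-get update \
        && apt-get install -y --no-install-recommends vim-tiny \
        && rm -rf /var/lib/apt/lists/*
    COPY app/ /opt/app/
    CMD ["/opt/app/run.sh"]

    # Build a new image with a new tag and start a fresh container from it
    docker build -t example/app:1.1 .
    docker run -d --name app example/app:1.1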

Running multiple services in one container

A container is not a lightweight VM so don't run your database and application servers in the same container. Each container should serve a single purpose and provide a single service.

This will keep the size of the images small so they are cheaper to store and run, and you can scale the service replicas independently of each other.

It might be easier to throw everything into one container to avoid the headache of wiring up the services across containers later, but resist the temptation. Keep your Dockerfile simple. It should not be the equivalent of an Ansible script. If your build is ballooning into stage after stage of installation steps for different services, you should probably be separating those services into individual Dockerfiles.
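
As a sketch of the alternative, give each service its own container and wire them together over a user-defined network. The network, container names and application image below are illustrative.

    # One container per service, joined by a shared network
    docker network create shopnet

    # The database gets its own container...
    docker run -d --name shop-db --network shopnet \
        -e POSTGRES_PASSWORD="$DB_PASSWORD" \
        postgres:16

    # ...and the application gets its own, reaching the database by name
    docker run -d --name shop-app --network shopnet \
        -e DB_HOST=shop-db \
        example/shop-app:1.0

Because the application container is stateless, you can start more replicas of it without touching the database container.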

Hardcoding configuration

Don't include credentials or configuration settings in your Dockerfile. Just don't. Besides being a security risk, hardcoded settings defeat the purpose of deploying the same container image across multiple environments.

Pass them as environment variables when you start the container, or inject them as secrets. If you are in a Kubernetes environment, use ConfigMaps and Secrets, which the control plane stores in etcd, or pull values from an external key-value store like Consul.
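
A sketch of what that looks like at start-up time; the file, host and variable names are placeholders.

    # Inject settings when the container starts, not when the image is built
    docker run -d --name app --env-file ./prod.env example/app:1.1

    # Or pass individual values explicitly
    docker run -d -e DB_HOST=db.internal -e DB_PASSWORD="$DB_PASSWORD" example/app:1.1

    # In Kubernetes, the equivalent is a Secret or ConfigMap referenced by the pod
    kubectl create secret generic app-secrets \
        --from-literal=DB_PASSWORD="$DB_PASSWORD"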

Storing data in the container

You will enter a world of pain if you store user-specific data in your containers. By their nature, containers are ephemeral creatures that disappear as quickly as you conjure them into existence, and when they disappear, the data goes with them.

Besides that, the layered Union File System inside a container uses a “copy-on-write” strategy for modifying files, and that is a significant performance bottleneck if you keep updating files.

Container storage should only be used for instance-specific data. User-specific content and transactional data should be stored in a database outside the container, or on a volume or filestore mounted into the container.
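
If the data has to live on disk, mount the storage from outside so it outlives the container. The volume, path and image names below are illustrative.

    # A named volume managed by the engine outlives any container that mounts it
    docker volume create app-data
    docker run -d --name app -v app-data:/var/lib/app example/app:1.1

    # Or bind-mount a directory from the host or an attached filestore
    docker run -d --name app2 -v /mnt/filestore/app:/var/lib/app example/app:1.1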

Other operational data, such as logs, should be written to stdout and stderr and streamed to the underlying node. Don't dump logs in a directory inside the container, where they disappear when the container crashes or the pod gets evicted.
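
A sketch of the logging side, again with placeholder names: let the runtime capture stdout and stderr, or hand the stream straight to a log collector on the node.

    # The runtime captures stdout/stderr; nothing is written inside the container
    docker run -d --name app example/app:1.1
    docker logs -f app

    # Or forward the stream to a collector on the node (address is a placeholder)
    docker run -d --name app2 \
        --log-driver=fluentd \
        --log-opt fluentd-address=logs.internal:24224 \
        example/app:1.1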

Using a container as a sandbox

Running an application in a container will not protect the host server from malware. If an application runs as root in the container, it is running as root on the host. A privilege escalation in a container could spill into the host.

A container should be treated with the same caution as you would treat any other service. It runs in its own namespace but it still has access to file systems such as /sys and /proc, devices such as /dev/mem and /dev/sd*, and other major kernel subsystems including SELinux, none of which are namespaced.

By contrast, processes running in a virtual machine are constrained by the hypervisor and they do not have direct access to the host kernel's subsystems. A container does not have the same layer of insulation.
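
You can at least narrow the blast radius by dropping privileges explicitly when you start the container. A sketch, with an illustrative image name:

    # Run as an unprivileged user, drop capabilities, keep the root filesystem read-only
    docker run -d --name app \
        --user 10001:10001 \
        --cap-drop ALL \
        --read-only \
        --security-opt no-new-privileges \
        example/app:1.1

The same intent can be baked into the image with a USER instruction in the Dockerfile, so the application never runs as root in the first place.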

Using different containers for development and production

Don't define container images that are specifically tailored for a given environment. You defeat the point of continuous deployment if you build a container for development using a Dockerfile that differs from the one you use for testing or production.

The container image should describe a fixed bill-of-materials that can be locked down and secured across the software development lifecycle. Use environment variables and configuration files to manage the behaviour of the container as it moves through the deployment pipeline.
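
In other words, build the image once, promote that same image through the pipeline, and let only the injected configuration change. The tags and file names below are illustrative.

    # Build once...
    docker build -t example/app:1.4 .

    # ...then run the same image everywhere, with the environment supplied at start-up
    docker run -d --name app-dev  --env-file ./dev.env  example/app:1.4
    docker run -d --name app-prod --env-file ./prod.env example/app:1.4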

Implications

Containers are at the heart of a paradigm shift in infrastructure provisioning. Until now, the go-to level of abstraction for a discrete unit of computing power has been the virtual machine. Now that level of abstraction has ratcheted up a notch, to the container.

This shift to containers is accompanied by a shift away from the imperative model of managing compute resources. Tools like Chef, Puppet and Ansible take an imperative "if-this-install-that" approach to configuring and managing VMs, but Kubernetes is based on a declarative model. You define the desired configuration of the containers and their resource requirements, and the Kubernetes control plane orchestrates the containers in their pods until the actual state of the cluster converges on the state you declared.
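
As a rough sketch of the declarative model, a Kubernetes manifest states the desired number of replicas, the image and the resource envelope, and leaves the control plane to work out how to get there. The names and image below are placeholders.

    # deployment.yaml: a declared desired state, not a script of installation steps
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: shop-app
    spec:
      replicas: 3                  # desired state: three identical containers
      selector:
        matchLabels:
          app: shop-app
      template:
        metadata:
          labels:
            app: shop-app
        spec:
          containers:
          - name: shop-app
            image: example/shop-app:1.4
            envFrom:
            - secretRef:
                name: app-secrets  # configuration injected, not baked in
            resources:
              requests:
                cpu: 250m
                memory: 256Mi
              limits:
                cpu: "1"
                memory: 512Mi

You apply the manifest with kubectl apply and the control plane keeps reconciling the cluster towards the declared state; there is no step-by-step install script to run.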

This is only possible if containers are defined as discrete, immutable units of compute resources. Don't regard them as mini-VMs that must be hand-reared and lovingly maintained. Your Dockerfile must be a cookie-cutter that stamps out fixed, secure, repeatable computing units that can be created as fast as they can be thrown away.

This shift to a lighter, faster, asynchronous provisioning model provides flexibility and elasticity, but it also brings complexity to architecture governance. Archton is uniquely positioned to assist C-suite executives in managing this risk in cloud-first, serverless computing.