The Virtual World of Clouds
Virtualization is a cornerstone of cloud computing. When you get an "instance" from a cloud provider, what does that mean? An instance is typically described with the same terminology used for computer servers: memory, cores, and attached networking and storage devices. But does such a computer really exist? Maybe it is just simulated; perhaps it is just a virtual computer.
Computing with virtual facilities is an old idea. Everyone has used virtual memory, for example. The idea of virtual memory is to let pages (say, 4 KB blocks) of data that reside on persistent storage (like an SSD) stand in for real memory, while using real memory as the conduit to that data whenever it is needed for read/write operations. One interesting, lesser-known optimization in modern virtual memory is copy-on-write, which speeds up the creation of a new address space when a process forks a child process: instead of copying every page, parent and child initially share the same physical pages, and a page is copied only when one of them writes to it.
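The copy-on-write idea can be sketched as a toy model in Python. This is not how a kernel implements it (real systems use page-table permission bits and page faults); the `Page` and `AddressSpace` classes are hypothetical, and a 4 KiB page size is assumed. After `fork()`, parent and child refer to the same page objects, and the first write to a shared page makes a private copy.

```python
# Toy model of copy-on-write (COW) pages; an illustration only.
# Real kernels do this with read-only page-table entries and page faults.

PAGE_SIZE = 4096  # assumed 4 KiB pages


class Page:
    """A physical page frame with a reference count."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)  # bytearray(...) makes a private copy
        self.refcount = 1


class AddressSpace:
    """Maps virtual page numbers (vpn) to possibly shared physical pages."""

    def __init__(self):
        self.table: dict[int, Page] = {}

    def map_new(self, vpn: int, data: bytes) -> None:
        self.table[vpn] = Page(data)

    def fork(self) -> "AddressSpace":
        # COW fork: the child shares every page instead of copying it.
        child = AddressSpace()
        for vpn, page in self.table.items():
            page.refcount += 1
            child.table[vpn] = page
        return child

    def read(self, vpn: int) -> bytes:
        return bytes(self.table[vpn].data)

    def write(self, vpn: int, offset: int, value: int) -> None:
        page = self.table[vpn]
        if page.refcount > 1:
            # Shared page: detach from it and copy before the first write.
            page.refcount -= 1
            page = Page(page.data)
            self.table[vpn] = page
        page.data[offset] = value


if __name__ == "__main__":
    parent = AddressSpace()
    parent.map_new(0, b"\x00" * PAGE_SIZE)
    child = parent.fork()
    print(parent.table[0] is child.table[0])  # True: shared after fork
    child.write(0, 0, 1)
    print(parent.table[0] is child.table[0])  # False: child got its own copy
    print(parent.read(0)[0], child.read(0)[0])  # 0 1
```

The cost of `fork()` here is proportional to the number of page-table entries, not to the amount of data, which is the point of the optimization.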
An emulator is a program that simulates the instructions (op codes) of a machine language. Emulation is common in development environments for phone apps: instead of debugging on a phone, a developer can run the app on a desktop machine that simulates the instructions of the phone's system. QEMU is one popular emulator.
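To make "simulating op codes" concrete, here is a minimal sketch of an interpreter-style emulator for a made-up machine. The instruction set (`LOADI`, `ADD`, `JNZ`, `HALT`) and the `emulate` function are hypothetical; real emulators such as QEMU are far more elaborate. Every emulated instruction costs a fetch, a chain of decode comparisons, and several host operations, which is where emulation overhead comes from.

```python
# Toy fetch-decode-execute loop for a hypothetical 4-instruction machine.
from typing import List, Tuple

Instr = Tuple[str, int, int]  # (opcode, operand_a, operand_b)


def emulate(program: List[Instr], num_regs: int = 4, max_steps: int = 10_000) -> List[int]:
    """Run `program` and return the final register contents."""
    regs = [0] * num_regs
    pc = 0  # program counter
    steps = 0
    while pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        op, a, b = program[pc]  # fetch
        steps += 1
        if op == "LOADI":  # decode + execute
            regs[a] = b  # r[a] <- immediate b
            pc += 1
        elif op == "ADD":
            regs[a] += regs[b]  # r[a] <- r[a] + r[b]
            pc += 1
        elif op == "JNZ":
            pc = b if regs[a] != 0 else pc + 1  # jump to b if r[a] != 0
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


if __name__ == "__main__":
    # Sum 5 + 4 + 3 + 2 + 1 into r1.
    program = [
        ("LOADI", 0, 5),   # 0: r0 = 5 (counter)
        ("LOADI", 1, 0),   # 1: r1 = 0 (accumulator)
        ("LOADI", 2, -1),  # 2: r2 = -1
        ("ADD", 1, 0),     # 3: r1 += r0
        ("ADD", 0, 2),     # 4: r0 -= 1
        ("JNZ", 0, 3),     # 5: loop while r0 != 0
        ("HALT", 0, 0),    # 6: stop
    ]
    print(emulate(program))  # [0, 15, -1, 0]
```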
Emulators have many advantages: one machine can emulate many types of hardware, and the emulated environment can be made secure (incapable of changing real persistent storage). A major disadvantage is the overhead of simulating instructions, which can slow computing by a factor of 100 or more. In one special case, there is a clever optimization: if the target system to be emulated has the same machine code as the real hardware, then much of the simulation can be skipped, and most of the computing work is executed directly on the real processor. Doing this safely is tricky, but it is a widely used idea in cloud computing: the virtual machine (VM). In VM terminology, the software that plays the role of a "kernel" for virtual machines is the hypervisor, which hosts "guest" virtual machines. A related idea is the virtual network.
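The "execute directly, but intercept the dangerous parts" idea is often called trap-and-emulate. The sketch below models only its control flow: the instruction names, the `PRIVILEGED` set, and the `VirtualCPU` fields are hypothetical, and in a real hypervisor the ordinary instructions run on the hardware rather than in a Python loop. Each privileged instruction updates only that guest's virtual state, which is what keeps guests isolated from each other and from the host.

```python
# Conceptual model of trap-and-emulate; names and opcodes are hypothetical.
from dataclasses import dataclass, field

# Hypothetical privileged opcodes; on real hardware these would trap.
PRIVILEGED = {"OUT", "HLT", "SET_PAGE_TABLE"}


@dataclass
class VirtualCPU:
    """State the hypervisor keeps on behalf of one guest."""
    name: str
    page_table: int = 0  # the guest's idea of its page-table base
    halted: bool = False
    output: list = field(default_factory=list)  # a virtual I/O device


def run_guest(vcpu: VirtualCPU, instructions: list) -> dict:
    """Run a guest's instruction stream; count direct vs. trapped instructions."""
    counts = {"direct": 0, "trapped": 0}
    for op, arg in instructions:
        if op not in PRIVILEGED:
            # Ordinary instruction: a real VM runs this directly on the CPU,
            # with no hypervisor involvement (modeled here as a no-op).
            counts["direct"] += 1
            continue
        # Privileged instruction: the CPU traps to the hypervisor, which
        # emulates it against this guest's virtual state only.
        counts["trapped"] += 1
        if op == "OUT":
            vcpu.output.append(arg)  # goes to a virtual device, not real I/O
        elif op == "SET_PAGE_TABLE":
            vcpu.page_table = arg  # changes the guest's view, not the host's
        elif op == "HLT":
            vcpu.halted = True  # stop this guest; other guests keep running
            break
    return counts


if __name__ == "__main__":
    guest = VirtualCPU("guest-a")
    stream = [("ADD", 1), ("OUT", 42), ("SET_PAGE_TABLE", 7), ("ADD", 3), ("HLT", 0)]
    print(run_guest(guest, stream))  # {'direct': 2, 'trapped': 3}
    print(guest)
```

When most instructions fall in the "direct" branch, the guest runs at close to native speed, which is why this approach beats full emulation.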
The overhead of virtual machines is not only that the hypervisor uses CPU time and must intercept privileged instructions from guest machines (and carefully manage devices shared between guests, such as networking and input/output). Virtual machines also consume lots of memory. Suppose three virtual machines are running on one real machine, all using the same operating system: perhaps each runs Ubuntu. The same kernel code and much of the same system memory are then replicated three times! In general, the hypervisor sees each guest's memory as opaque, so it knows little about what the guests have in common and cannot easily share that memory. What can be done to improve on this situation?
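A back-of-the-envelope formula makes the duplication explicit. Assume, hypothetically, n guests that all run the same operating system, each with an operating-system footprint G (kernel plus common system software) and application memory A_i:

```latex
M_{\text{VM}} = n\,G + \sum_{i=1}^{n} A_i,
\qquad
M_{\text{duplicated}} = (n - 1)\,G
```

For example, with n = 3 and an assumed G = 1 GB, the guests hold 3 GB of operating-system memory, of which 2 GB are copies of the same content.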
One recent answer is containerization. It is made possible by Linux kernel features, chiefly namespaces (which give each group its own view of processes, file systems, networking, and user IDs) and cgroups (which limit and account for the CPU, memory, and I/O each group uses). Together they make it possible to partition one Linux system into independent groups (containers): programs in one group are unaware of the programs in another group. The file systems of groups are separate, the processes are separate, and they might be using different versions of software (database, Java version, etc.). What all groups share is the kernel, which carefully keeps them separated.
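On Linux you can see these mechanisms from inside any process. The sketch below reads `/proc/<pid>/ns/` (one symbolic link per namespace, with targets like `pid:[4026531836]`) and parses `/proc/<pid>/cgroup`, following the formats described in the namespaces(7) and cgroups(7) man pages. Per namespaces(7), processes in the same namespace see the same inode number; this sketch compares only the inode number shown in the link text, a simplification. A process inside a container will typically report different numbers for `pid`, `mnt`, and similar entries than a process on the host. The helper names are hypothetical.

```python
# Sketch: inspect a process's namespaces and cgroup membership (Linux only).
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Parse a namespace link target such as 'pid:[4026531836]'."""
    m = _NS_LINK.match(target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def parse_cgroup_line(line: str) -> tuple[int, list[str], str]:
    """Parse one /proc/<pid>/cgroup line: 'hierarchy-ID:controllers:path'."""
    hier, controllers, path = line.rstrip("\n").split(":", 2)
    return int(hier), [c for c in controllers.split(",") if c], path


def namespaces_of(pid: str = "self") -> dict[str, int]:
    """Map namespace entry name -> inode number for a process."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # e.g. permission denied for another user's process
        result[name] = inode
    return result


def shared_namespaces(pid_a: str, pid_b: str) -> set[str]:
    """Namespace entries whose inode numbers match for two processes."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return {t for t in a if t in b and a[t] == b[t]}


if __name__ == "__main__" and os.path.isdir("/proc/self/ns"):
    for name, inode in namespaces_of().items():
        print(f"{name:20} {inode}")
    with open("/proc/self/cgroup") as f:
        for line in f:
            print(parse_cgroup_line(line))
```

On a cgroup v2 system the cgroup file usually has a single line of the form `0::/some/path`, so the controller list parses as empty.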
The most popular implementation of containers is Docker. Docker provides software to manage running containers, along with repositories of container "images" and the "recipes" used to build them.
Dockerfile Syntax (think of this as the container recipe)
Docker Hub (a community repository)
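As a rough illustration of the recipe idea, here is a minimal Dockerfile sketch. The script `app.py` is hypothetical and assumed to sit next to the Dockerfile; `python:3.12-slim` is assumed to be an available base image on Docker Hub.

```dockerfile
# Minimal sketch of a container "recipe" for a hypothetical app.
# Start from an official Python base image pulled from Docker Hub.
FROM python:3.12-slim

# Work inside /app within the image's file system.
WORKDIR /app

# Copy the (hypothetical) script from the build context into the image.
COPY app.py .

# Default command when a container is started from this image.
CMD ["python", "app.py"]
```

Building and running it would look like `docker build -t hello-app .` followed by `docker run --rm hello-app`; the tag `hello-app` is an arbitrary example name. Each instruction adds a layer to the image, and the image, not the Dockerfile, is what gets pushed to a repository such as Docker Hub.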
Amazon's cloud can run Docker containers via ECS (Elastic Container Service). Many documents describe this AWS service; for instance, you can start with the Getting Started guide. But why use ECS at all? Let's look at another tutorial, which says more about the rationale.
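As a sketch of what ECS consumes, the code below builds the arguments for ECS's RegisterTaskDefinition call (a description of which image to run, with how much memory and which ports) and optionally registers it with boto3. The family name, image, memory size, and region are illustrative assumptions, and only a minimal subset of the task-definition fields is shown; calling `register` requires boto3 and AWS credentials.

```python
# Sketch: describe one container as an ECS task definition.
# Names, sizes, and region below are hypothetical examples.


def build_task_definition(family: str, image: str,
                          memory_mib: int = 512, container_port: int = 80) -> dict:
    """Return keyword arguments for ECS RegisterTaskDefinition (minimal subset)."""
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,  # e.g. an image from Docker Hub or ECR
                "memory": memory_mib,  # hard memory limit for the container, in MiB
                "essential": True,  # if this container stops, the task stops
                "portMappings": [
                    {"containerPort": container_port, "protocol": "tcp"}
                ],
            }
        ],
    }


def register(task_def: dict, region: str = "us-east-1") -> str:
    """Register the task definition with ECS and return its ARN (needs boto3)."""
    import boto3  # imported lazily so build_task_definition works without boto3

    ecs = boto3.client("ecs", region_name=region)
    response = ecs.register_task_definition(**task_def)
    return response["taskDefinition"]["taskDefinitionArn"]


if __name__ == "__main__":
    td = build_task_definition("hello-app", "nginx:latest")
    print(td)
    # print(register(td))  # uncomment with boto3 installed and AWS credentials set
```

Once registered, a task definition is typically launched with the ECS client's `run_task` method or kept running as a service with `create_service`; ECS then decides which machine in the cluster hosts each container.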
See Also: Google Container Engine