Using Docker Swarm clusters on Azure

One of the demos I prepared for the Microsoft Azure Conference in Pune, India in March 2015 was about running orchestration engines on Azure to manage clusters of hosts and containers using Docker Swarm. Docker Swarm, if you didn't know, is a clustering solution for Docker containers from the open source Docker project. You should probably read the documentation to understand what Swarm does, but in case you aren't in the mood or are just plain lazy, here's an extremely brief primer.

So what is Docker Swarm?

Swarm basically allows you to treat a cluster of nodes, i.e. a collection of physical/virtual machines, as one giant Docker host. It attempts to hide the fact that there is a cluster of individual Docker hosts actually running your containers for you. Simple enough, isn't it? The basic idea is that you stick a bunch of VMs (or physical machines) behind a swarm manager service, which then exposes the Docker host REST API for you. The swarm manager's implementation of the Docker host API simply delegates the actual work to one or more of the worker nodes that it has access to.

Since the swarm manager's API is essentially the same as the Docker host API, you can manage the cluster with pretty much the exact same tooling that you use today when you deal with a single Docker host. You can, for instance, use the Docker CLI to deal with a swarm manager host just as you would a regular Docker host. For those who prefer looking at a picture instead of reading a bunch of text (well, any more than you already have), here's a graphic that shows how it works. Image credits: Docker Inc. on SlideShare.

[Image: Docker Swarm]

The swarm software itself is available as a public Docker image on Docker Hub (apart from being open source that is).

Pluggable Schedulers

So when you ask a swarm manager to spin up a container, how exactly does it know which of the possibly hundreds of nodes you've configured it with to use? Swarm comes with a built-in scheduler that can figure things out by itself, but it is also designed to make the scheduling process pluggable - meaning, you can replace its built-in scheduler with another one of your choice if you feel like it. For example, you can use the Apache Mesos project's scheduler and hook it up with Docker Swarm so that Mesos takes care of picking out the best node to spin up a given container.

Pluggable node discovery

Swarm supports multiple mechanisms for associating nodes (which can be physical or virtual machines) with an instance of the swarm manager. As with scheduling, there's a built-in hosted discovery service provided by docker.com which you can choose to use, or you can set one up yourself using etcd, consul, zookeeper or just a plain text file containing host names and IP addresses.
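To make the plain text file option a little more concrete, here's a minimal sketch that writes such a node list. The addresses, the port 2375 and the write_node_list helper are all hypothetical placeholders of mine, and the swarm manage line in the trailing comment is how I recall file-based discovery being wired up, so treat it as an assumption and check the Swarm documentation for your version.

```shell
# Write a node list for file-based discovery: one host:port entry per line.
# The addresses and port 2375 below are hypothetical placeholders.
write_node_list() {
  local file=$1
  shift
  printf '%s\n' "$@" > "$file"
}

nodes_file=$(mktemp)
write_node_list "$nodes_file" "10.0.0.5:2375" "10.0.0.6:2375"
cat "$nodes_file"

# A manager would then be pointed at the file with something like
# (assumption, verify against the Swarm docs for your version):
#   swarm manage -H tcp://0.0.0.0:2377 file://$nodes_file
```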

My demo for my talk

So there, now you know what Docker Swarm is all about (kind of). For my talks though I needed a way of easily spinning up and tearing down Docker Swarm clusters on Azure. Remember, this was in March of 2015 and support for Docker Swarm was just beginning to show up in Docker Machine which is a tool that lets you easily create and manage Docker host VMs and Swarm clusters. So I quickly put together a few bash scripts to automate provisioning of the VMs for my Swarm cluster. This post is about those scripts and how you can use them for your own needs.

Firstly, the bash scripts are open source and hosted on GitHub here:

https://github.com/avranju/azure-swarm

The main two script files in question are the following:

  • swarm-up.sh - this brings up the cluster for you
  • swarm-down.sh - this tears down the cluster you created using swarm-up.sh

Setting up your PC

Before you can run the scripts you'll need to do the following:

  1. If you're on Windows, then install Git so that you get the Git Bash console. If you're on Mac/Linux, well, you already have bash.
  2. Install Node.js using your favorite method. I myself like Node Version Manager (NVM) to manage my node.js versions (there's a Windows version available too).
  3. If you're on Mac/Linux, install Git if you don't have it already.
  4. Install json from NPM from a terminal like so: npm install -g json
  5. Install the Azure CLI like so: npm install -g azure-cli. Configure the Azure CLI with a valid Azure Subscription. If you don't know how to do that then this handy guide should help.
  6. Clone the repo like so:

git clone https://github.com/avranju/azure-swarm.git
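If you want to double check the setup before running anything, a small sketch like the following reports which of the tools from the steps above are missing from your PATH. The missing_tools function is my own illustration and is not part of the repo; azure is the command that the azure-cli package installs.

```shell
#!/usr/bin/env bash
# Report which of the given commands are missing from PATH.
# Prints one missing command per line; returns non-zero if any are missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Tools named in the steps above.
if missing=$(missing_tools git node npm json azure); then
  echo "All prerequisites found."
else
  echo "Missing: $(echo "$missing" | tr '\n' ' ')"
fi
```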

Running the scripts

Running the script isn't very hard. To set up a cluster with default options (1 small master VM and 2 small worker node VMs located in "West US") just run this:

./swarm-up.sh

This will do the following:

  1. Generate new SSH keys
  2. Create a new storage account and container
  3. Create a new Azure virtual network
  4. Spin up a VM to run the Swarm Manager service in the virtual network created in step 3
  5. Spin up as many worker node VMs as needed (again, in the same virtual network)
  6. Create a bunch of files in a folder called output.

If everything goes well you should have a Docker Swarm cluster of your own with everything hooked up.

Output files

Each run of the script is identified by a randomly generated 8-character hex string. For example, you might get this: 35f8fa98. A file containing this ID is produced in the output folder. For instance, for the ID 35f8fa98, the file would be called swarm-35f8fa98.deployment. You'll see in a bit why this is important.
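For the curious, here's one way such an ID could be produced in bash. This is a sketch under my own assumptions (reading 4 random bytes from /dev/urandom with od) and not necessarily how swarm-up.sh does it; the function names are hypothetical.

```shell
# Generate an 8-character lowercase hex string from 4 random bytes.
new_deployment_id() {
  od -An -N4 -tx1 /dev/urandom | tr -d ' \n'
}

# Build the deployment file path for a given ID, following the naming above.
deployment_file() {
  echo "output/swarm-$1.deployment"
}

id=$(new_deployment_id)
echo "deployment id:   $id"
echo "deployment file: $(deployment_file "$id")"
```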

Another file that you'll be interested in is the one containing SSH configuration information. For the same deployment ID as before, this file will be called ssh-35f8fa98.config. You can use this file to SSH into any of the VMs. For example, to SSH into the swarm-master VM, you'd run the following command:

ssh -F output/ssh-35f8fa98.config swarm-master  

The same command will work for any of the worker node VMs (just change swarm-master to swarm-00 or swarm-01 and so forth).
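If you'd rather not type the host names out by hand, the sketch below lists them for a cluster of a given size and prints the matching ssh command for each. It assumes the worker names keep following the two-digit, zero-padded pattern shown above; cluster_hosts is a hypothetical helper, and the loop only echoes the commands instead of running them.

```shell
# Print the SSH host aliases for a cluster with the given number of workers,
# assuming the naming pattern swarm-master, swarm-00, swarm-01, ...
cluster_hosts() {
  local workers=$1 i=0
  echo "swarm-master"
  while [ "$i" -lt "$workers" ]; do
    printf 'swarm-%02d\n' "$i"
    i=$((i + 1))
  done
}

# Print (not run) the ssh command for every host in a 2-worker cluster,
# reusing the example deployment ID 35f8fa98.
for host in $(cluster_hosts 2); do
  echo ssh -F output/ssh-35f8fa98.config "$host" hostname
done
```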

Tearing down the cluster

The whole deployment ID shebang that I described above pays off when it comes to tearing down everything because having a deployment ID allows us to cleanly delete the deployment. Continuing with the same deployment ID as before, bringing a cluster down involves running the following script:

./swarm-down.sh output/swarm-35f8fa98.deployment

swarm-down.sh will attempt to delete everything that swarm-up.sh created: virtual network, cloud service, VMs and storage account. This works even with partially deployed clusters (for example, you started running the script and then stopped it midway because, well, let's just say you had your reasons) because in that case the script simply attempts to delete something that doesn't exist, which is harmless.
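The "deleting something that doesn't exist is harmless" idea boils down to not aborting when a delete command fails. Here's a minimal sketch of that pattern; try_delete is a hypothetical helper of mine rather than code from swarm-down.sh, and true/false stand in for real delete commands so the snippet runs anywhere.

```shell
# Run a delete command; if it fails (for example because the resource was
# never created), log a note and carry on instead of aborting.
try_delete() {
  local description=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "deleted: $description"
  else
    echo "skipped: $description (delete failed, probably not there)"
  fi
  return 0
}

# Stand-ins for real delete commands: `true` succeeds, `false` fails.
try_delete "virtual network" true
try_delete "storage account" false
echo "teardown finished"
```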

Customize your deployment

There are a few options that you can customize by editing the value of various variables in the options.sh file. Here're the ones you're likely to be interested in:

  • VNET_LOCATION - The Azure data center where your VMs will be provisioned. This is "West US" by default.
  • VM_SIZE - The size of the VMs. Accepts any valid size string that designates a VM size. This is "Small" by default.
  • VM_IMAGE - This is the name of the Linux VM image to use. By default this is Ubuntu 14.04 LTS. Ubuntu 15 doesn't work with this script at this point since Ubuntu has switched to systemd for running system services from v15 onwards while the script relies on it being upstart.
  • VM_USER_NAME - The SSH user name. "avranju" by default.
  • SWARM_WORKER_NODES - The number of worker VMs to spin up. This is 2 by default.
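As an illustration, an edited set of options for a 4-worker cluster might look something like this. I'm assuming here that the options are plain shell variable assignments; the values are the defaults listed above except for SWARM_WORKER_NODES, and VM_IMAGE is left out because the exact image name isn't something to guess at.

```shell
# Hypothetical excerpt in the style of options.sh (assumed to be plain
# shell variable assignments).
VNET_LOCATION="West US"   # default per the list above
VM_SIZE="Small"           # default per the list above
VM_USER_NAME="avranju"    # default per the list above
SWARM_WORKER_NODES=4      # changed from the default of 2

# VM_IMAGE is intentionally not set here; keep whatever the file ships with.
echo "1 master + $SWARM_WORKER_NODES workers ($VM_SIZE) in $VNET_LOCATION, user $VM_USER_NAME"
```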

Running your containers

To run your containers you'll want to SSH into the swarm-master VM and set your environment up so that it points to the Swarm Manager service which is itself running as a container listening on port 2377 on the VM. Using the output files generated by swarm-up, you'd do the following:

$ ssh -F output/ssh-35f8fa98.config swarm-master
avranju@swarm-master:~$ export DOCKER_HOST=0.0.0.0:2377
avranju@swarm-master:~$ docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 851c91a
OS/Arch (client): linux/amd64
Server version: swarm/0.4.0
Server API version: 1.16
Go version (server): go1.4.2
Git commit (server): d647d82
OS/Arch (server): linux/amd64

As you can tell from the Server version line (swarm/0.4.0), the CLI is talking to a Docker Swarm host. Now you can go ahead and start spinning up containers willy-nilly and Docker Swarm should dutifully schedule them on your worker node VMs.
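If you want to script that check, the Server version line in the docker version output is the thing to look at. The sketch below greps for it; is_swarm_server is a hypothetical helper, and it assumes the output format shown above (newer Docker clients may print the version information differently). It runs against sample text here so that it works without a cluster.

```shell
# Succeeds if the `docker version` output on stdin reports a Swarm server.
is_swarm_server() {
  grep -q '^Server version: swarm/'
}

# Sample lines copied from the output above.
sample='Client version: 1.7.0
Server version: swarm/0.4.0'

if printf '%s\n' "$sample" | is_swarm_server; then
  echo "talking to Swarm"
else
  echo "talking to a regular Docker host"
fi
```

Against a live cluster you'd pipe the real thing in instead: docker version | is_swarm_server.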

Finis

That's pretty much it. As always, please feel free to fork, modify, send pull requests etc. on these scripts and/or sound off in the comments below.
