HypriotOS: Get Swarm working on a Raspberry Pi Cluster
Introduction
In a previous post I described how I set up docker on a 4-node Raspberry Pi cluster. The reason for doing this was to be able to investigate Docker Swarm in more detail. I chose to set up the cluster from scratch so as to learn as much as I could. Well, I learned that sometimes discretion is the better part of valour. This post explains why I decided to install HypriotOS instead and how I finally got it to work.
Updating the Cluster
So, having installed Docker on my Pi cluster I wanted to create a Swarm. I also wanted to use Docker 1.12 release, and so far I had only managed to install release 1.10.3. Since I have Docker for Mac installed on my laptop I followed this article to try to run release 1.12 on my Pi nodes even though the local version is 1.10. This is where I started to run into problems:
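The gist of that approach is to drive a remote engine from the local client. A minimal sketch, assuming the Pi's daemon has been configured to listen on TCP (which is not the default) and using a placeholder hostname:

```shell
# Point the local Docker client at a remote engine on one of the Pis.
# node01.local is a placeholder; port 2375 assumes the daemon was started
# with -H tcp://0.0.0.0:2375 (unencrypted, so LAN use only).
export DOCKER_HOST=tcp://node01.local:2375

# The client and server versions are reported separately, so a 1.10
# client can interrogate a 1.12 engine.
docker version
```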
Right. OK. So my setup might be missing something after all! Let’s try doing the same from one of my Pi nodes and see what happens:
Ugh. I’m sure I could have worked my way around these issues. I’m also sure it would take a long time, and I don’t want to waste time on the details of how Docker works (or fails to work!) on a bare-bones Raspberry Pi. Time to back-track and make my life easier by installing the HypriotOS. I followed this fantastic article to get HypriotOS installed.
Install HypriotOS
The first task was to overwrite my SD cards with the new image using the Hypriot flash tool:
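For reference, a flash invocation looks something like this (the hostname and image filename are placeholders; substitute the HypriotOS release you actually want):

```shell
# Write the image to the SD card, setting this node's hostname at the
# same time. The flash tool prompts for which disk to overwrite.
flash --hostname node01 hypriotos-rpi-vX.Y.Z.img.zip
```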
Be aware that raspi-config is not part of HypriotOS, but HypriotOS automatically resizes the filesystem to fill the SD card on first boot, so this isn't much of an issue. It does mean, however, that to change your password you will have to use the passwd command-line tool (the default username/password credentials are pirate/hypriot).
There is no need to update the /etc/hosts file as HypriotOS starts the avahi-daemon to announce the hostname through mDNS, so each Pi is immediately reachable through {hostname}.local. That’s the theory anyway. I had enormous problems with the ‘configuration free’ avahi-daemon.
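A quick way to check whether mDNS resolution is behaving (node01 is a placeholder hostname):

```shell
# From the Mac (or another Pi), resolve and reach a node via mDNS.
ping -c 1 node01.local

# Log in with the default credentials.
ssh pirate@node01.local
```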
Network, what network?
The first disaster with the newly configured Pis was that whenever I stopped and re-started any of them I lost my internet connection within a minute or two. From that point onwards, no matter what I did, restarting any of the Pis completely hosed my internet connection. I couldn't even reach the GUI for my router (a bog-standard Virgin box), although I could still access my local network. Sort of.
There was no problem ssh-ing onto any individual Pi from my Mac using {hostname}.local, but once I was on any Pi I couldn't see the others using the same mechanism. A bit of searching led me to realise that an essential package was missing from my HypriotOS install: the Pis needed libnss-mdns before they would resolve each other's .local names. I thought it somewhat strange that this would be missing, but there you go. Perhaps this was why I was having problems with my router? Let's install it then:
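Installing it is straightforward. Note that libnss-mdns works by adding mdns4_minimal to the hosts line of /etc/nsswitch.conf, which is worth checking afterwards:

```shell
sudo apt-get update
sudo apt-get install -y libnss-mdns

# Confirm that mdns4_minimal now appears on the hosts line.
grep hosts /etc/nsswitch.conf
```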
Nope, that didn’t work. I could now see one Pi from another but I still couldn’t access my router or internet connection. I spent the next day stopping and starting Pis and stopping and starting my router to no avail. No matter what I did I kept losing my internet connection when I started the Pis. As a last resort I disconnected all but one of the boards and disabled the avahi-daemon. Finally, I could stop and start this Pi without losing my internet connection.
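For anyone wanting to reproduce the experiment, disabling the daemon on a single board is just:

```shell
# Stop the running daemon and prevent it starting again on boot.
sudo systemctl stop avahi-daemon
sudo systemctl disable avahi-daemon
```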
Reinstall avahi-daemon
I was about to give up and do this on each board when I thought I’d try one last thing: uninstall and reinstall avahi-daemon to see if it made a difference.
I reviewed the daemon status before uninstalling:
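Something along these lines; the journal is the easiest place to see which addresses the daemon is registering:

```shell
systemctl status avahi-daemon

# Show the daemon's recent log lines, including address registrations.
journalctl -u avahi-daemon --no-pager | tail -n 20
```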
OK, the fact that a new address of 192.168.200.1 is being registered goes some way toward explaining why I can reach my local network but not my router (or, therefore, the internet). The router is on 192.168.0.1, and given the 255.255.255.0 subnet mask the new address isn't on the router's subnet at all. Why the heck it's inserted the 200, and how it's persuaded my Mac to change its local record, is something of a mystery to me, but not one I care to investigate in any great detail. Perhaps I should mention at this point that I'm still on OS X Yosemite, which may be part of the problem (I've seen a lot of posts indicating that the Bonjour service on Yosemite is at least partly broken).
Whatever, let’s uninstall and re-install avahi-daemon and see what happens:
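Using purge rather than remove, so that any stale configuration files go too:

```shell
# purge deletes the old configuration as well as the package itself.
sudo apt-get purge -y avahi-daemon
sudo apt-get install -y avahi-daemon
```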
I can immediately see that things are getting better as libnss-mdns is installed as a dependency of avahi-daemon (which makes it all the more strange that it wasn’t there in the first place). Now let’s look at the service status:
Not only does that look better, it is better as now I can stop and re-start Pis at will and still connect to the internet (and from one Pi to another). Problem solved.
Configure docker.service
From here I continued exactly as I had before and generated SSH keys for each node which I copied to the other nodes so that I could ssh between them (using the {hostname}.local address). There’s no need to update /etc/hosts as avahi-daemon is now taking care of that correctly. I re-mounted the USB drive on the head node, installed and started nfs-kernel-server on the head node and nfs-common and autofs on the client nodes to share the drive between nodes. Finally, I updated docker engine on each node to release 1.12:
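For completeness, the key distribution and engine upgrade look roughly like this (the hostnames are placeholders, and the exact package name for the 1.12 engine on HypriotOS may differ):

```shell
# On each node: generate a key pair and push it to the other nodes.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id pirate@node02.local   # repeat for each of the other nodes

# Upgrade the engine and confirm the new version.
sudo apt-get update
sudo apt-get install -y docker-engine
docker version
```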
Now we’re ready to go, right? Let’s check:
Oh boy, it never rains but it pours. Now what’s wrong?
Hang on a minute, the IP address the docker service is being started with is the old incorrect one that avahi-daemon created before being reinstalled. How about if I update the service default values at /etc/systemd/system/docker.service and replace this IP address with the new one allocated by avahi-daemon (which in this case is 192.168.0.12)? Well, I did that and ran:
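The edit itself is a one-liner, sketched here with sed (the addresses are the ones from my setup):

```shell
# Replace the stale avahi-assigned address with the current one in the
# unit file's ExecStart line, then restart the service.
sudo sed -i 's/192\.168\.200\.1/192.168.0.12/' /etc/systemd/system/docker.service
sudo systemctl restart docker
```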
OK, so the default values have been cached. Let’s clear the cache and try again:
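systemd caches unit files, so the fix is to reload them before restarting the service:

```shell
# Re-read all unit files from disk, then restart docker with the
# freshly loaded defaults.
sudo systemctl daemon-reload
sudo systemctl restart docker
```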
Right, that actually worked. Now to check docker-engine:
Phew, we’re there! I tried using {hostname}.local instead of the IP address in /etc/systemd/system/docker.service but, as I expected, that didn’t work. So I’ve got everything working now, but what if avahi-daemon registers a different IP address for this node on startup? Will I have to update the docker.service defaults? I don’t know for now, let’s wait and see.