I made a Kubernetes game where you explore your cluster and destroy pods


I enjoy game development as a hobby on the side. I also enjoy working with container schedulers like Kubernetes. Over the weekend I decided to combine the two and create a Kubernetes game.

In the game you enter and explore nodes in your cluster, and can destroy your very own live, running pods. Hide prod away!

The game is built with my engine of choice, Unity, which uses C# for scripting.

3 x Nodes represented in-game from my Raspberry Pi Kubernetes cluster. Can you spot the naming convention from one of my favourite movies?

Game Logic

The game logic was simple to put together. I have a couple of modular systems I’ve already developed (and actually sell on the Unity Asset Store), so those made the movement and shooting logic, as well as background grid effects a breeze.

Movement is implemented in a simple ‘twin-stick’ controller Script (a Unity concept: a class deriving from MonoBehaviour).

Other game logic is mostly contained in the bullet pattern module. I have some more Scripts that arrange and control the Kubernetes entities as well as their labels.

The interaction with Kubernetes itself is fairly hacked together. I wanted to put the game together as quickly as possible as I only worked on it over a couple of weekend evenings.

Let the hacky code flow…

Unity is a bit behind in .NET Framework version support, and .NET Core was out of the question. This meant that using the Kubernetes C# client (directly in Unity, that is) was not going to happen easily. It would otherwise have been my first choice.

With that in mind, I fell back to a hacky solution: invoking the kubectl client directly from within the game.

The game code executes kubectl commands on threads separate from the main game loop and returns the results, formatted accordingly, back to the game’s main thread. I used System.Diagnostics.Process for this.

From there, game entities (e.g. the nodes and pods) are instantiated and populated with info and labels.
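The game does this in C# with System.Diagnostics.Process, and that code isn't shown here. As a rough illustration of the same pattern, here is a minimal Python sketch, assuming kubectl is on the PATH: a background thread runs kubectl and pushes its result onto a thread-safe queue, and the main loop drains that queue once per frame. The helper names and the (tag, exit code, output) tuple are hypothetical.

```python
import queue
import subprocess
import threading

# Finished commands, produced by worker threads and consumed by the main loop.
results = queue.Queue()


def run_kubectl_async(tag, args, kubectl="kubectl"):
    """Run `kubectl <args>` on a background thread (hypothetical helper).

    The worker never touches game state; it only pushes a
    (tag, exit_code, output) tuple onto the shared queue.
    """
    def worker():
        try:
            proc = subprocess.run([kubectl, *args], capture_output=True, text=True)
            results.put((tag, proc.returncode, proc.stdout or proc.stderr))
        except OSError as err:  # e.g. kubectl not found on PATH
            results.put((tag, -1, str(err)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def drain_results():
    """Call once per frame on the main thread to collect finished commands."""
    finished = []
    while True:
        try:
            finished.append(results.get_nowait())
        except queue.Empty:
            return finished
```

For example, `run_kubectl_async("pods", ["get", "pods", "--namespace", "demo", "-o", "json"])` starts a query, and whichever frame later sees the "pods" tag in `drain_results()` can spawn the corresponding entities. Keeping all entity creation on the main thread matters in Unity, since most of its API may only be called from the main thread.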

pods spawned in the game, bouncing around

Pods have health

Pods are given health (hit points) and simply bounce around after spawning in. You can chase after and shoot them, at which point a kubectl delete pod command is actually run against the Kube API!

The game world

You enter the world in a ‘node’ view, where you can see all of your cluster’s nodes. From there you can approach nodes to have them slide open a ‘door’. Entering the door transports you ‘into’ the node, where you can start destroying pods at will.

For obvious reasons I limit the pods that are destroyable to a special ‘demo’ namespace.
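The hit-point and namespace rules can be sketched in a few lines of Python. This is not the game's C# code: it assumes three hit points per pod and a hard-coded "demo" namespace, and `PodEntity` and `take_hit` are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: the only namespace whose pods the game is allowed to delete.
DESTROYABLE_NAMESPACE = "demo"


@dataclass
class PodEntity:
    """Hypothetical in-game stand-in for one live pod."""
    name: str
    namespace: str
    hit_points: int = 3

    def take_hit(self, damage=1):
        """Apply damage and return kubectl args once the pod is destroyed.

        Returns None while the pod survives, if it was already destroyed,
        or if it lives outside the destroyable namespace.
        """
        if self.namespace != DESTROYABLE_NAMESPACE or self.hit_points == 0:
            return None
        self.hit_points = max(0, self.hit_points - damage)
        if self.hit_points > 0:
            return None
        return ["delete", "pod", self.name, "--namespace", self.namespace]
```

The returned argument list would then be handed to whatever off-main-thread kubectl runner the game uses. Checking the namespace inside the entity, rather than only when spawning, means a pod from another namespace can never produce a delete command even if it ends up on screen.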

Putting together the demo pods

I use a great little tool called arkade in my Kubernetes Pi cluster.

arkade makes it really simple to install apps into your cluster. Great for quick POCs and demos.

Arkade offers a small library of useful and well thought out apps that are simple to install. The CLI provides strongly-typed flags to install these apps (or any helm charts) in short, one-line operations.

It also handles the logic around figuring out which platform you’re running on, and pulling down the correct images for that platform (if supported). Super useful when you’re on ARM as you are with the Raspberry Pi.

Straight from the GitHub page, this is how simple it is to set up:

# Note: you can also run without `sudo` and move the binary yourself
curl -sLS https://dl.get-arkade.dev | sudo sh

arkade --help
ark --help  # a handy alias

# Windows users with Git Bash
curl -sLS https://dl.get-arkade.dev | sh

I then went about installing a bunch of apps and charts with arkade. For example:

arkade install loki --namespace demo

Hooking the game up to my Kube Cluster

With the demo namespace complete, and the application pods running, I needed to get my Windows machine running the game talking to my Pi Cluster (on another local network).

I have a Pi ‘router’ set up that is perfectly positioned for this. All that is required is to run kubectl proxy on it, listening on 0.0.0.0 and accepting all hosts.

kubectl proxy --address='0.0.0.0' --port=8001 --accept-hosts='.*'

I set up a local kube config pointing to the router’s local IP address on the interface facing my Windows machine’s network, and switched context to that configuration.

From there, the game’s kubectl commands get sent to this context and traverse the proxy to hit the kube API.
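Because kubectl proxy exposes the Kubernetes REST API over plain HTTP, the same path can also be exercised without kubectl. A minimal Python sketch, assuming the proxy is reachable on port 8001 at the router's home-network address (192.168.2.30 in Part 1 of this series); the helper names are hypothetical.

```python
import json
import urllib.request


def pods_url(proxy_host, namespace, port=8001):
    """Build the core-API URL for listing pods in a namespace via kubectl proxy."""
    return f"http://{proxy_host}:{port}/api/v1/namespaces/{namespace}/pods"


def pod_names(pod_list):
    """Extract pod names from a decoded PodList object."""
    return [item["metadata"]["name"] for item in pod_list.get("items", [])]


def fetch_pod_names(proxy_host, namespace, port=8001, timeout=5):
    """GET the pod list through the proxy.

    kubectl proxy authenticates to the API server with its own kubeconfig,
    so no credentials are sent from this side.
    """
    with urllib.request.urlopen(pods_url(proxy_host, namespace, port), timeout=timeout) as resp:
        return pod_names(json.load(resp))
```

For example, `fetch_pod_names("192.168.2.30", "demo")` should return the same pod names the game spawns, which makes it a quick way to tell proxy problems apart from game problems.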

Destroying pods sure does exercise those ReplicaSets!

ReplicaSets spinning up new pods as quickly as they’re destroyed in-game!

Source

If there is any interest, I would be happy to publish the (hacky) source for the main game logic and basic logic that sends the kubectl processes off to other threads.

This is post #5 in my effort towards 100DaysToOffload.

Building a Pi Kubernetes Cluster – Part 3 – Worker Nodes and MetalLB


This is the third post in this series, and the focus will be on completing the Raspberry Pi Kubernetes cluster by adding a worker node. You’ll also set up MetalLB, a software-based load-balancer implementation designed for bare metal Kubernetes clusters.


By now you should have 1 x Pi running as the dedicated Pi network router, DHCP, DNS and jumpbox, as well as 1 x Pi running as the cluster Master Node.

Of course it’s always best to have more than 1 x Master node, but as this is just an experimental/fun setup, one is just fine. The same applies to the Worker nodes, although in my case I added two workers with each Pi 4 having 4GB RAM.

Joining a Worker Node to the Cluster

Start off by completing the setup steps as per the Common Setup section in Part 2 with your new Pi.

Once your new Worker Pi is ready and on the network with its own static DHCP lease, join it to the cluster (currently only the Master Node) using the kubeadm join command you noted down when you first initialised your cluster in Part 2.

E.g.

sudo kubeadm join 10.0.0.50:6443 --token kjx8lp.wfr7n4ie33r7dqx2 \
     --discovery-token-ca-cert-hash sha256:25a997a1b37fb34ed70ff4889ced6b91aefbee6fb18e1a32f8b4c8240db01ec3

After a few moments, SSH back to your master node and run kubectl get nodes. You should see the new worker node listed, and once it pulls down and starts the Weave Net CNI image, its status will change to Ready.

kubernetes worker node added to cluster
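The Ready/NotReady STATUS column is derived from each node's Ready condition, which is easier to check in a script via JSON output. A minimal Python sketch, assuming kubectl is configured on the machine running it; `node_readiness` and `current_node_readiness` are hypothetical helpers.

```python
import json
import subprocess


def node_readiness(node_list):
    """Map node name -> True if its Ready condition is "True"."""
    readiness = {}
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        readiness[node["metadata"]["name"]] = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
    return readiness


def current_node_readiness():
    """Run `kubectl get nodes -o json` and summarise readiness."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return node_readiness(json.loads(out))
```

Polling `current_node_readiness()` until every value is True is one way to wait for a freshly joined worker to finish pulling the Weave Net image.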

Setting up MetalLB

The problem with a ‘bare metal’ Kubernetes cluster (or any self-installed, manually configured k8s cluster for that matter) is that it doesn’t have any load-balancer implementation to handle LoadBalancer service types.

When you run Kubernetes on top of a cloud hosting platform like AWS or Azure, LoadBalancer services are backed natively by that platform’s own load-balancer offerings, e.g. Elastic Load Balancing on AWS.

However, with a Raspberry Pi cluster, you don’t have anything fancy like that to provide LoadBalancer services for the applications you run.

MetalLB provides a software-based implementation that works on a Pi cluster.

Install version 0.8.3 of MetalLB by applying the following manifest with kubectl:

kubectl apply -f https://gist.githubusercontent.com/Shogan/d418190a950a1d6788f9b168216f6fe1/raw/ca4418c7167a64c77511ba44b2c7736b56bdad48/metallb.yaml

Make sure the MetalLB pods are now up and running in the metallb-system namespace that was created.

metallb pods running

Now you will create a ConfigMap that will contain the settings your MetalLB setup will use for the cluster load-balancer services.

Create a file called metallb-config.yaml with the following content:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.23.220.88-10.23.220.98

Update the addresses section to use whichever range of IP addresses you would like to assign for use with MetalLB. I only used a small range for mine: 10.23.220.88-10.23.220.98 covers 11 addresses, as both ends are inclusive.

Apply the configuration:

kubectl apply -f ./metallb-config.yaml
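In layer2 mode MetalLB answers ARP for these addresses on the node network, so the pool should sit inside that network and must not overlap the addresses dnsmasq hands out dynamically. A small Python check using the standard ipaddress module; the dnsmasq dynamic range 10.0.0.100-10.0.0.200 is a hypothetical placeholder for whatever your own dnsmasq.conf uses, and the helper names are hypothetical too.

```python
import ipaddress


def expand_range(spec):
    """Expand an inclusive 'first-last' IPv4 range (MetalLB's range syntax)."""
    first, last = (ipaddress.IPv4Address(part.strip()) for part in spec.split("-"))
    if last < first:
        raise ValueError(f"range is reversed: {spec}")
    return [ipaddress.IPv4Address(n) for n in range(int(first), int(last) + 1)]


def pool_report(pool_spec, node_network, dhcp_spec):
    """Summarise a MetalLB pool against the node network and the DHCP range."""
    pool = expand_range(pool_spec)
    network = ipaddress.ip_network(node_network)
    dhcp = set(expand_range(dhcp_spec))
    return {
        "size": len(pool),
        "inside_node_network": all(addr in network for addr in pool),
        "clashes_with_dhcp": [str(addr) for addr in pool if addr in dhcp],
    }
```

With the values above, `pool_report("10.23.220.88-10.23.220.98", "10.0.0.0/8", "10.0.0.100-10.0.0.200")` reports 11 addresses, all inside 10.0.0.0/8 and none clashing. The same `expand_range` result can later be used to confirm that a service's EXTERNAL-IP really came from the pool.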

Setup Helm in the Pi Cluster

First of all you’ll need an ARM compatible version of Helm. Download it and move it to a directory that is in your system PATH. I’m using my Kubernetes master node as a convenient location to use kubectl and helm commands from, so I did this on my master node.

Install Helm Client

export HELM_VERSION=v2.9.1
wget https://kubernetes-helm.storage.googleapis.com/helm-$HELM_VERSION-linux-arm.tar.gz
tar xvzf helm-$HELM_VERSION-linux-arm.tar.gz
sudo mv linux-arm/helm /usr/bin/helm

Install Helm Tiller in the Cluster

Use the following command to initialise the tiller component in your Pi cluster.

helm init --tiller-image=jessestuart/tiller --service-account tiller --override spec.selector.matchLabels.'name'='tiller',spec.selector.matchLabels.'app'='helm' --output yaml | sed 's@apiVersion: extensions/v1beta1@apiVersion: apps/v1@' | kubectl apply -f -

Note: it uses a custom image from jessestuart/tiller (as this is ARM compatible). The command also rewrites the Deployment’s older extensions/v1beta1 API version to apps/v1, as Kubernetes 1.16 no longer serves Deployments under the beta version; the --override flags add the spec.selector that apps/v1 Deployments require.

Deploy an Ingress Controller with Helm

Now that you have something to fulfill LoadBalancer service types (MetalLB), and you have Helm configured, you can deploy an NGINX Ingress Controller with a LoadBalancer service type for your Pi cluster.

helm install --name nginx-ingress stable/nginx-ingress --set rbac.create=true --set controller.service.type=LoadBalancer

If you list out your new ingress controller pods, though, you might find that they fail to run. They’ll likely be trying to use x86 architecture images instead of ARM ones. I manually patched my NGINX Ingress Controller deployment to point it at an ARM compatible Docker image.

kubectl set image deployment/nginx-ingress-controller nginx-ingress-controller=quay.io/kubernetes-ingress-controller/nginx-ingress-controller-arm:0.26.1

After a few moments the new pods should now show as running:

new nginx ingress pods running with ARM image

Now to test everything, you can grab the external IP that should have been assigned to your NGINX ingress controller LoadBalancer service and test the default NGINX backend HTTP endpoint that returns a simple 404 message.

List the service and get the EXTERNAL-IP (this should sit in the range you configured MetalLB with):

kubectl get service --selector=app=nginx-ingress

Curl the NGINX Ingress Controller LoadBalancer service endpoint with a simple GET request:

curl -i http://10.23.220.88

You’ll see the default 404 not found response which indicates that the controller did indeed receive your request from the LoadBalancer service and directed it appropriately down to the default backend pod.

the nginx default backend 404 response
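If you want to script the same check, note that a 404 is the expected, healthy answer here. A minimal Python sketch; `http_status` is a hypothetical helper, and it deliberately bypasses any HTTP proxy environment variables because the target is on the local network.

```python
import urllib.error
import urllib.request

# Ignore any HTTP proxy environment variables: the target is on the local network.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_status(url, timeout=5):
    """Return the HTTP status code for a GET, including error codes like 404."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Against the cluster above, `http_status("http://10.23.220.88/") == 404` indicates that MetalLB, the LoadBalancer service and the default backend are all wired up; a timeout or connection error instead points at MetalLB or routing.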

Concluding

At this point you’ve configured:

  • A Raspberry Pi Kubernetes network Router / DHCP / DNS server / jumpbox
  • Kubernetes master node running the master components for the cluster
  • Kubernetes worker nodes
  • MetalLB load-balancer implementation for your cluster
  • Helm client and Tiller agent for ARM in your cluster
  • NGINX ingress controller

Recall that in part 1 you set up some iptables rules on the Router Pi as an optional step.

These PREROUTING and POSTROUTING rules forward packets destined for the Router Pi’s external IP address to a specific IP address in the Kubernetes network. In fact, the example I provided is exactly what I used to forward traffic from the Pi router all the way to my NGINX Ingress Controller LoadBalancer service.

Revisit this section if you’d like to achieve something similar (access services inside your cluster from outside the network), and replace the 10.23.220.88 IP address in the example I provided with the IP address of your own ingress controller service backed by MetalLB in your cluster.

Also remember that at this point you can add as many worker nodes to the cluster as you like using the kubeadm join command used earlier.

Building a Raspberry Pi Kubernetes Cluster – Part 2 – Master Node


The Kubernetes Master node is one that runs what are known as the master processes: The kube-apiserver, kube-controller-manager and kube-scheduler.

In this post we’ll go through some common setup that all nodes (masters and workers) in your cluster should get, and then on top of that, the specific setup that will finally configure a single node in the cluster to be the master.


By now you should have some sort of stack or collection of Raspberry Pis going. As mentioned in the previous post, I used a Raspberry Pi 3 for my router/DHCP server for the Kubernetes Pi Cluster network, and Raspberry Pi 4s with 4GB RAM each for the master and worker nodes. Here is how my stack looks now:

picture of raspberry pi devices in stack, forming the kubernetes cluster
The stack of Raspberry Pis in my cluster. Router Pi at the bottom, master and future worker nodes above. They’re sitting on top of the USB power hub and 8 port gigabit network switch.

Common Setup

This setup will be used for both masters and workers in the cluster.

Start by writing the official Raspbian Buster Lite image to your microSD card (I used the 26th September 2019 version). As you’ll see next, I also updated the Pi’s firmware and kernel using the rpi-update command.

After attaching your Pi (master) to the network switch, it should pick up an IP address from the DHCP server you setup in part 1.

SSH into the Pi and complete the basic setup such as setting a hostname and ensuring it gets a static IP address lease from DHCP by editing your dnsmasq configuration (as per part 1).

Note: As the new Pi is running on a different network behind your Pi Router, you can SSH into your Pi Router first (using it as a bastion host or jump box) and then SSH into the new Master Pi node from there.

Now update it:

sudo rpi-update

After the update completes, reboot the Pi.

sudo reboot now

SSH back into the Pi, then download and install Docker. I used version 19.03 here, though at the moment it is not ‘officially’ supported.

export VERSION=19.03
curl -sSL get.docker.com | sh && sudo usermod pi -aG docker && newgrp docker

Kubernetes nodes should have swap disabled, so do that next. Additionally, you’ll enable control groups (cgroups) for resource isolation.

sudo dphys-swapfile swapoff
sudo dphys-swapfile uninstall
sudo update-rc.d dphys-swapfile remove
sudo systemctl disable dphys-swapfile.service

sudo sed -i -e 's/$/ cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory/' /boot/cmdline.txt
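Because that sed expression appends to the end of the line every time it runs, running it twice duplicates the flags. A hedged alternative, sketched in Python: it only appends flags that are missing and keeps /boot/cmdline.txt on the single line the Pi expects. `add_kernel_flags` and `patch_cmdline` are hypothetical helpers, and `patch_cmdline` needs root on a real Pi.

```python
from pathlib import Path

# The same flags the sed command above appends.
CGROUP_FLAGS = ["cgroup_enable=cpuset", "cgroup_memory=1", "cgroup_enable=memory"]


def add_kernel_flags(cmdline, flags=CGROUP_FLAGS):
    """Append each flag that is not already present; return a single line."""
    tokens = cmdline.split()
    for flag in flags:
        if flag not in tokens:
            tokens.append(flag)
    return " ".join(tokens) + "\n"


def patch_cmdline(path="/boot/cmdline.txt"):
    """Rewrite the kernel command line file in place."""
    target = Path(path)
    target.write_text(add_kernel_flags(target.read_text()))
```

Since the function is idempotent, it is safe to keep in a provisioning script that may be re-run on every node.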

Installing kubeadm and other Kubernetes components

Next you’ll install the kubeadm tool (which helps us create our cluster quickly), as well as the other required components: the kubelet (the main node agent, which among other things registers nodes with the API server), kubectl, and the Kubernetes CNI plugins (to provision container networking).

Next up, switch to the legacy iptables and configure networking so that bridged traffic traverses the iptables rules Kubernetes will add later.

Note: when I built my cluster initially I discovered problems with iptables later on, where the kube-proxy and kubelet services had trouble populating all their required iptables rules using the pre-installed version of iptables. Switching to legacy iptables fixed this.

The error I ran into (hopefully those searching it will come across this post too) was:

proxier.go:1423] Failed to execute iptables-restore: exit status 2 (iptables-restore v1.6.0: Couldn't load target `KUBE-MARK-DROP':No such file or directory

Set up iptables and switch it to the legacy version:

sudo sysctl net.bridge.bridge-nf-call-iptables=1
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy

Lastly to finish off the common (master or worker) node setup, reboot.

sudo reboot now

Master Node Setup

Now you can configure this Pi as a master Kubernetes node. SSH back in after the reboot and pull down the various node component docker images, then initialise it.

Important: Make sure you change the 10.0.0.50 IP address in the below code snippet to match whatever IP address you reserved for this master node in your dnsmasq leases configuration. This is the IP address that the master API server will advertise out with.

Note: In my setup I am using 192.168.0.0/16 as the pod CIDR (overlay network). This is specifically to keep it separate from my internal Pi network of 10.0.0.0/8.

sudo kubeadm config images pull -v3
sudo kubeadm init --token-ttl=0 --apiserver-advertise-address=10.0.0.50 --pod-network-cidr=192.168.0.0/16

# capture text and run as normal user. e.g.:
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config

Once the kubeadm commands complete, the init command will output a bunch of commands to run. Copy and enter them afterwards to setup the kubectl configuration under $HOME/.kube/config.

You’ll also see a kubeadm join command/token. Take note of that and keep it safe. You’ll use this to join other workers to the cluster later on.

kubeadm join 10.0.0.50:6443 --token yi4hzn.glushkg39orzx0fk \
    --discovery-token-ca-cert-hash sha256:xyz0721e03e1585f86e46e477de0bdf32f59e0a6083f0e16871ababc123
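The --pod-network-cidr passed to kubeadm init must not overlap networks that nodes or pods need to reach. A quick Python check with the standard ipaddress module; `overlapping_networks` is a hypothetical helper, and the "home" entry 192.168.2.0/24 is an assumption based on the router's 192.168.2.30 home WiFi address from Part 1.

```python
import ipaddress
from itertools import combinations


def overlapping_networks(networks):
    """Return every pair of named networks whose address ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(parsed), 2)
        if parsed[a].overlaps(parsed[b])
    ]
```

The pod and node ranges used here do not overlap. If you include the assumed home network, though, the check reports ("home", "pods"), because 192.168.2.0/24 sits inside 192.168.0.0/16. That only matters if pods need to reach hosts on the home network, but it is worth knowing before settling on a pod CIDR.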

Installing the CNI (Weave)

You’ll set up Weave Net next. At a high level, Weave Net creates a virtual container network that connects your containers that are scheduled across (potentially) many different hosts and enables their automatic discovery across these hosts too.

Kubernetes has a pluggable architecture for container networking, and Weave Net is one implementation of this.

Note: the command below assumes you’re using an overlay/container network of 192.168.0.0/16. Change this if you’re not using this range.

On your Pi master node:

curl --location -o ./weave-cni.yaml "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.0.0/16"
kubectl apply -f ./weave-cni.yaml

After a few moments waiting for your node to pull down the weave net container images, check that the weave container(s) are running and that the master node is showing as ready. Here is how that should look…

kubectl -n kube-system get pods
kubectl get nodes
pi@korben:~ $ kubectl -n kube-system get pods | grep weave
weave-net-cfxhr                  2/2     Running   20         10d
weave-net-chlgh                  2/2     Running   17         23d
weave-net-rxlg8                  2/2     Running   13         23d

pi@korben:~ $ kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
korben   Ready    master   23d   v1.16.2
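To check the Weave Net pods from a script rather than by eye, the default table output shown above can be parsed. A minimal Python sketch, assuming the default `kubectl get pods` columns (NAME, READY, STATUS, RESTARTS, AGE); `unhealthy_pods` is a hypothetical helper.

```python
def unhealthy_pods(get_pods_output, name_prefix="weave-net-"):
    """Return matching pods that are not Running with all containers ready.

    Expects the default `kubectl get pods` table: NAME READY STATUS RESTARTS AGE.
    """
    problems = []
    for line in get_pods_output.strip().splitlines():
        cols = line.split()
        if len(cols) < 3 or not cols[0].startswith(name_prefix):
            continue  # header row, blank line or another pod
        name, ready, status = cols[0], cols[1], cols[2]
        ready_now, _, ready_total = ready.partition("/")
        if status != "Running" or ready_now != ready_total:
            problems.append(name)
    return problems
```

Fed the three weave-net lines above, it returns an empty list. For anything long-lived, `-o json` output is the sturdier thing to parse, since table columns can change between kubectl versions.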

That is pretty much it for the master node setup. You now have a single master node running the Kubernetes master components / API server, and have even used it to provision and configure container networking.

As a result of deploying Weave Net, you now have a DaemonSet that will ensure that any new node that joins the cluster will automatically get the Weave Net CNI. All other nodes in the cluster will automatically update to ‘know’ about the new node and subsequently containers in the cluster will be able to talk to each other over the overlay network.

Building a Raspberry Pi Kubernetes Cluster – Part 1 – Routing


I’ve recently built myself a Kubernetes (1.16.2) cluster running on a combination of Raspberry Pi 4 and 3 devices.

Raspberry Pi Cluster Stack

I’ll be running through the steps I took to build it out in this series, with part 1 focusing on the router and internal node network side of things.


First off, here is a list of parts I used to set everything up:

To make the setup as portable as possible, and also slightly segregated from my home network, I used the 1 x Raspberry Pi 3 device I had as a router between my home network and my Kubernetes Layer 2 network (effectively the devices on the 8 port Netgear switch).

Here is a network diagram that shows the setup.

Raspberry Pi Kubernetes Network Diagram

Building the Raspberry Pi Cluster Router

Of course you’ll need an OS on the microSD card for each Raspberry Pi you’re going to be using. I used the latest Raspbian Buster Lite image from the official Raspbian Downloads page (September 26).

This is a minimal image and is exactly what we need. You’ll need to write it to your microSD card. There are plenty of tutorials out there on doing this, so I won’t cover it here.

One piece of advice, though: create a file called “ssh” on the card’s boot partition after writing the image. This enables the SSH daemon on first boot, so you don’t need to connect up a screen and set it up yourself. Just log in to your home network’s DHCP server, look for the device once it boots, then SSH to its automatically assigned IP address.

Also, it would be wise to reserve an IP address on your home network’s DHCP service for your Pi Router. Grab the MAC address of your Pi and add it to your home network DHCP service’s reserved IP addresses. I set mine to 192.168.2.30 on my WiFi network.

List the wlan interface’s MAC address with:

ifconfig wlan0

Setting Hostname and Changing the Default Password

On the Router Raspberry Pi, run the following command to change the hostname to something other than the default “raspberrypi”, and change the default password too:

sudo raspi-config
Change the default password and hostname of the Raspberry Pi

Setting up the Pi Router

Now the rest of the guide deserves much credit to this blog post, however, I did change a few things on my setup, as the routing was not configured 100% correctly to allow external access to services on the internal Kubernetes network.

I needed to add a couple of iptables rules in order to be able to access my Ingress Controller from my home network. More on that later though.

Interface Setup

You need to configure the WiFi interface (wlan0) and the Ethernet Interface (eth0) for each “side” of the network.

Edit the dhcpcd.conf file, add an eth0 configuration right at the bottom, then save.

sudo nano /etc/dhcpcd.conf
interface eth0
static ip_address=10.0.0.1/8
static domain_name_servers=1.1.1.1,208.67.222.222
nolink

Of course replace the above DNS servers with whichever you prefer to use. I’ve used Cloudflare and OpenDNS ones here.

Next, setup your WiFi interface to connect to your home WiFi. WiFi connection details get saved to /etc/wpa_supplicant/wpa_supplicant.conf but it is best to use the built-in configuration tool (raspi-config) to do the WiFi setup.

sudo raspi-config

Go to Network Options and enter your WiFi details. Save/Finish afterwards.

Install and Configure dnsmasq

sudo apt update
sudo apt install dnsmasq
sudo mv /etc/dnsmasq.conf /etc/dnsmasq.conf.backup

Create a new /etc/dnsmasq.conf file.

This is the main dnsmasq configuration: it sets DHCP up over the eth0 interface (for the 10.0.0.0/8 network side) and configures some nameservers for DNS, as well as a few other bits.
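The exact configuration file isn't reproduced here, so the following is only a hypothetical sketch of the kind of file described: a small Python function that renders a minimal dnsmasq.conf serving DHCP on eth0, with static leases keyed by MAC address. The dynamic range, MAC addresses and hostnames are placeholders; the upstream DNS servers are the Cloudflare and OpenDNS ones from the dhcpcd.conf example above.

```python
import re

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def render_dnsmasq_conf(static_leases,
                        dhcp_range=("10.0.0.100", "10.0.0.200", "12h"),
                        upstream_dns=("1.1.1.1", "208.67.222.222")):
    """Render a minimal dnsmasq.conf (hypothetical values, see lead-in).

    static_leases maps a MAC address to a (hostname, ip) tuple.
    """
    lines = [
        "interface=eth0",           # only serve the cluster-facing interface
        "listen-address=10.0.0.1",  # the router's eth0 address
        "bind-interfaces",
        "domain-needed",            # don't forward plain hostnames upstream
        "bogus-priv",               # don't forward reverse lookups for private ranges
    ]
    lines += [f"server={dns}" for dns in upstream_dns]
    lines.append("dhcp-range=" + ",".join(dhcp_range))
    for mac, (hostname, ip) in sorted(static_leases.items()):
        mac = mac.lower()
        if not MAC_RE.match(mac):
            raise ValueError(f"not a MAC address: {mac}")
        lines.append(f"dhcp-host={mac},{hostname},{ip}")
    return "\n".join(lines) + "\n"
```

Adding a new Pi then means adding one entry to `static_leases`, writing the result to /etc/dnsmasq.conf as root and restarting dnsmasq.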

Edit the service file for dnsmasq (/etc/init.d/dnsmasq) to prevent issues with start-up order of dnsmasq and dhcpcd:

sudo nano /etc/init.d/dnsmasq

Change the top of the file to look like this:

#!/bin/sh

# Hack to wait until dhcpcd is ready
sleep 10

### BEGIN INIT INFO
# Provides:       dnsmasq
# Required-Start: $network $remote_fs $syslog $dhcpcd
# Required-Stop:  $network $remote_fs $syslog
# Default-Start:  2 3 4 5
# Default-Stop:   0 1 6
# Description:    DHCP and DNS server
### END INIT INFO

The lines changed above are the sleep 10 command and the Required-Start addition of $dhcpcd.

At this point it’s a good idea to reboot.

sudo reboot now

After the reboot, check that dnsmasq is running.

sudo systemctl status dnsmasq

Setup iptables

First of all, enable IP forwarding. Edit the /etc/sysctl.conf file and uncomment this line:

net.ipv4.ip_forward=1

This lets the Pi forward packets between its interfaces, which the NAT rules below rely on.

Now you’ll configure some POSTROUTING and FORWARD rules in iptables to allow your Raspberry Pi devices on the 10.0.0.0/8 network to access the internet via your Pi Router’s wlan0 interface.

sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT

Optional Step

This is optional, and you might only need to do this later on once you start running services in your Kubernetes Pi Cluster.

Forward Traffic from your home network to a Service or Node IP in your Cluster Network:

sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 80 -j DNAT --to-destination 10.23.220.88:80
sudo iptables -t nat -A POSTROUTING -p tcp -d 10.23.220.88 --dport 80 -j SNAT --to-source 10.0.0.1

The above assumes a couple of things that you should change accordingly (if you use this optional step):

  • You have a Service running in the Kubernetes network, listening on port 80 (http) on IP 10.23.220.88
  • You setup your Pi Router to use 10.0.0.1 as the eth0 device IP (as per above in this post), and your wlan0 interface is the connection that your Pi router is using to connect to your home network (WiFi).
  • You actually want to forward traffic hitting your Pi Router (from the WiFi wlan0 interface) through the 10.0.0.1 eth0 interface and into a service IP on the 10.0.0.0/8 network. (In my example above I have an nginx Ingress Controller running on 10.23.220.88).

Persisting your iptables rules across reboots

Persist all of your iptables rules by installing iptables-persistent:

sudo apt install iptables-persistent

The above will run a wizard after installation and you’ll get the option to save your IPv4 rules. Choose Yes, then reboot afterwards.

After reboot, run sudo iptables -L -n -v to check that the rules persisted after reboot.

Note: if you ever update your Pi Router’s iptables rules and want to re-save the new set of rules to persist across reboots, you’ll need to re-save them using the iptables-persistent package.

sudo dpkg-reconfigure iptables-persistent

Adding new Pi devices to your network in future

Whenever you add an additional Raspberry Pi device to the 8 port switch / Kubernetes network in the future, make sure you edit /etc/dnsmasq.conf to update the list of MAC addresses assigned to 10.0.0.x IP addresses.

You’ll want to set the new Pi’s eth0 MAC address up in the list of pre-defined DHCP leases.

You can also view the /var/lib/misc/dnsmasq.leases file to see the current dnsmasq DHCP leases.

This is handy when adding a new, un-configured Pi to the network – you can pick up the auto-assigned IP address here, and then SSH to that for initial configuration.
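Spotting the new Pi can be scripted too. A minimal Python sketch, assuming the usual dnsmasq.leases line layout of expiry timestamp, MAC, IP, hostname ("*" if none) and client ID; `parse_leases` and `unconfigured` are hypothetical helpers.

```python
from typing import NamedTuple


class Lease(NamedTuple):
    expires: int   # lease expiry as a Unix timestamp
    mac: str
    ip: str
    hostname: str  # "*" when the client sent no hostname


def parse_leases(text):
    """Parse dnsmasq.leases lines: '<expiry> <mac> <ip> <hostname> <client-id>'."""
    leases = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip blank lines and non-lease lines
        expires, mac, ip, hostname = fields[:4]
        leases.append(Lease(int(expires), mac.lower(), ip, hostname))
    return leases


def unconfigured(leases, known_macs):
    """Leases whose MAC is not yet listed as a static dhcp-host entry."""
    known = {mac.lower() for mac in known_macs}
    return [lease for lease in leases if lease.mac not in known]
```

Running `unconfigured(parse_leases(Path("/var/lib/misc/dnsmasq.leases").read_text()), known)` (with `pathlib.Path` imported and `known` holding the MACs already in dnsmasq.conf) lists exactly the devices that still need a static lease and an initial SSH login.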

Concluding

That is pretty much the setup and configuration for the Pi Router complete. As mentioned above, much credit for this configuration goes to this guide on downey.io.

I ended up modifying the iptables rules to forward service traffic from my home network into some Kubernetes LoadBalancer services I ran later on, as covered above in the Optional Step section.

At this point you should have your Pi Router connected to your home network via WiFi, and have the Ethernet port plugged into your network switch. Make sure the switch is not connected back to your home network via an Ethernet cable or you’ll run into some strange network loop issues.

You should now be able to plug new Pis into the network switch, and they should automatically be assigned DHCP addresses on the 10.0.0.0/8 network.

Updating your dnsmasq.conf file with the new Pis’ Ethernet MAC addresses means that they can get statically leased IP addresses too, which you’ll need for your Kubernetes nodes once you start adding them (see Part 2 coming next).

Troubleshooting Amazon EKS Worker Nodes not joining the cluster


I’ve recently been doing a fair bit of automation work on bringing up AWS managed Kubernetes clusters using Terraform (with Packer for building out the worker group nodes). Read on for some handy tips on troubleshooting EKS worker nodes.

Some of my colleagues have not worked with EKS (or Kubernetes) much before, so I’ve also been sharing knowledge and helping others get up to speed. A colleague who was having trouble with their newly provisioned personal test EKS cluster found that the kube-system / control plane related pods were not starting. I assisted with the troubleshooting process and found the following…

Upon diving into the logs of the kube-system related pods (DNS, AWS CNI, etc…) it was obvious that the pods were not being scheduled on the brand new cluster. The next obvious command to run was kubectl get nodes -o wide to take a look at the general state of the worker nodes.

Unsurprisingly there were no nodes in the cluster.

Troubleshooting worker nodes not joining the cluster

The first thing that comes to mind when you have worker nodes that are not joining the cluster on startup is to check the bootstrapping / startup scripts. In EKS’ case (and more specifically EC2) the worker nodes should be joining the cluster by running a couple of commands in the userdata script that the EC2 machines run on launch.

If you’re customising your worker nodes with your own custom AMI(s) then you’ll most likely be handling this userdata script logic yourself, and this is the first place to check.

The easiest way of checking userdata script failures on an EC2 instance is to get the cloud-init logs directly from the instance. Locate the EC2 machine in the console (or by its instance ID) and inspect the logs for failures in the section that logs execution of your userdata script.

  • In the EC2 console: Right-click your EC2 instance -> Instance Settings -> Get System Log.
  • On the instance itself:
    • cat /var/log/cloud-init.log | more
    • cat /var/log/cloud-init-output.log | more
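On a long cloud-init log it can help to jump straight to the suspicious lines. A rough Python sketch based on a simple keyword heuristic (an assumption, not an exhaustive list of cloud-init failure messages); `failure_lines` is a hypothetical helper that returns each match with a line of context on either side.

```python
import re

# Heuristic keywords only; real failures may be worded differently.
FAILURE_RE = re.compile(r"\b(error|failed|fatal|traceback)\b", re.IGNORECASE)


def failure_lines(log_text, context=1):
    """Return (line_number, line) pairs around each suspicious line."""
    lines = log_text.splitlines()
    wanted = set()
    for i, line in enumerate(lines):
        if FAILURE_RE.search(line):
            wanted.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [(i + 1, lines[i]) for i in sorted(wanted)]
```

On the instance, `failure_lines(Path("/var/log/cloud-init-output.log").read_text())` (with `pathlib.Path` imported) gives line numbers you can then open in `more` or `less` to read the full section around the userdata execution.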

Upon finding the error you can then check (using intuition around the specific error message you found):

  • Have any changes been introduced lately that might have caused the breakage?
  • Has the base AMI that you’re building on top of changed?
  • Have any resources that you might be pulling into the base image builds been modified in any way?

These are the questions to ask and investigate first. You should be storing base image build scripts (packer for example) in version control / git, so check the recent git commits and image build logs first.

How to restart a slave FortiGate firewall in an HA cluster

Here’s a quick how-to on restarting a specific member of a High Availability FortiGate hardware firewall cluster. I have only tested this on a cluster of FG60 units, but am quite sure the steps would be similar for a cluster of FG100s, FG310s etc…


First of all, you may want to set up some monitoring of the various WAN connections on the HA cluster. In theory, restarting the slave unit should have no effect on these connections, as the master unit is the one handling all the work; the slave is merely there to take over should things go pear-shaped on the master unit. While the slave restarts, you can watch your ping statistics or other connections to ensure everything stays up.

1. Start by logging in to the web interface of your firewall cluster. https://ipaddress

2. Specify a custom port number if you have the management GUI on a custom port, for example https://ipaddress:555

3. Login and look for “HA status” under the status area – this should be the default page that loads. It should show as “Active-passive” if this is the mode your HA cluster is in. Click the [Configure] link next to this.

4. This will give you an overview of your HA cluster – you can view which unit is the Master and which is the slave. This step is optional and just gives you a nice overview of how things are looking at the moment. Click “View HA statistics” near the top right if you would like to view each unit’s CPU/Memory usage and other statistics.

5. Return to the “Status” home page of your firewall GUI. Click in the “CLI Console” black window area to get to your console. (Optionally, you could also just SSH in if you have this enabled).

6. Type the following command to bring up your HA cluster details: get system ha status

7. This will show which firewall is master and slave in the cluster e.g.

Master:129 FG60-1 FWF60Bxxxxxxxx65 1
Slave :125 FG60-2 FWF60Bxxxxxxxx06 0

Look for the number right at the end of each line. In the above example the Slave unit has the number “0”. Note this down.

8. Next enter the following command: execute ha manage x

Where “x” is the number noted down in step number 7.

This will change your management console to this particular firewall unit. i.e. the slave unit in our case. You should notice your command line change to reflect the name of the newly selected HA member.

9. Enter the following command to reboot the slave: execute reboot

10. Press “Y” to confirm and reboot the slave.
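If you do this regularly over SSH, steps 7 and 8 can be scripted. A small Python sketch that parses the member lines in the format shown in step 7 and builds the execute ha manage command; the regex is tuned to that sample only (other FortiOS versions may print this differently), and the helper names are hypothetical.

```python
import re

# Assumes the member-line format shown in step 7, e.g.
# "Slave :125 FG60-2 FWF60Bxxxxxxxx06 0"
MEMBER_RE = re.compile(r"^(Master|Slave)\s*:\s*\d+\s+(\S+)\s+(\S+)\s+(\d+)$")


def ha_members(status_text):
    """Map role -> (hostname, serial, index) from `get system ha status` output."""
    members = {}
    for line in status_text.splitlines():
        match = MEMBER_RE.match(line.strip())
        if match:
            role, hostname, serial, index = match.groups()
            members[role] = (hostname, serial, int(index))
    return members


def manage_slave_command(status_text):
    """Build the step 8 command that switches the console to the slave unit."""
    index = ha_members(status_text)["Slave"][2]
    return f"execute ha manage {index}"
```

With the step 7 sample, `manage_slave_command` returns "execute ha manage 0", matching the number noted down by hand.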

Monitor your ping / connection statistics to ensure everything looks fine. Give it a minute or so to boot up again, then return to your HA statistics page to ensure everything looks good.

That is all there is to it.