Regain Cluster Access After Expired Kubernetes Certificates

Recently I had to renew expired Kubernetes certificates on my home lab cluster after getting locked out of managing it. I wasn’t tracking their age, and all of a sudden I found they had expired. Kubeadm has a feature to auto-renew certificates during control plane upgrades, but unfortunately I had not done an upgrade on my home cluster in the last year. I realised something was wrong when issuing a kubectl command to the cluster and receiving an error along the lines of x509: certificate has expired or is not yet valid.

Preparation

To regain access I needed to SSH onto a master node in the cluster and do the following:

Move / back up the old certificates and kubeconfig files:

sudo mv /etc/kubernetes/pki/apiserver.key /etc/kubernetes/pki/apiserver.key.old
sudo mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.crt.old
sudo mv /etc/kubernetes/pki/apiserver-kubelet-client.crt /etc/kubernetes/pki/apiserver-kubelet-client.crt.old
sudo mv /etc/kubernetes/pki/apiserver-kubelet-client.key /etc/kubernetes/pki/apiserver-kubelet-client.key.old
sudo mv /etc/kubernetes/pki/front-proxy-client.crt /etc/kubernetes/pki/front-proxy-client.crt.old
sudo mv /etc/kubernetes/pki/front-proxy-client.key /etc/kubernetes/pki/front-proxy-client.key.old

sudo mv /etc/kubernetes/admin.conf /etc/kubernetes/admin.conf.old
sudo mv /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.old
sudo mv /etc/kubernetes/controller-manager.conf /etc/kubernetes/controller-manager.conf.old
sudo mv /etc/kubernetes/scheduler.conf /etc/kubernetes/scheduler.conf.old

Renew Expired Kubernetes Certificates

Now use kubeadm to renew all certificates. (On the kubeadm version I was running this was still an alpha subcommand; newer kubeadm releases expose the same thing as kubeadm certs renew all.)

sudo kubeadm alpha certs renew all

Inspect the generated certificates under /etc/kubernetes/pki to make sure they have been generated correctly, e.g. with expiry dates a year in the future. For example, to check the new kube-apiserver certificate details:

sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text -certopt no_subject,no_version,no_serial,no_signame,no_issuer,no_pubkey,no_sigdump,no_header,no_aux
Renew Expired Kubernetes Certificates - checking new certificate details
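Depending on your kubeadm version, you can also ask kubeadm itself to report certificate expiry dates rather than inspecting each file with openssl. A minimal check, assuming the same alpha command set used above:

sudo kubeadm alpha certs check-expiration
# On newer kubeadm releases the equivalent is:
# sudo kubeadm certs check-expiration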

Regenerate Kubeconfig Files

Now you can regenerate the kubeconfig files for the kubelet, controller-manager, scheduler, etc. (On newer kubeadm releases the equivalent command is kubeadm init phase kubeconfig all.)

sudo kubeadm alpha phase kubeconfig all --apiserver-advertise-address 10.0.0.50

Note: you can leave the --apiserver-advertise-address option off, but the value will then default to the IP address of your system’s default network interface.

Check that all the .conf files under /etc/kubernetes have been re-created (a quick check follows this list), i.e.

  • admin.conf
  • kubelet.conf
  • controller-manager.conf
  • scheduler.conf
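For a quick confirmation, list the files with their timestamps and make sure the fresh copies are there alongside the .old backups:

ls -l /etc/kubernetes/*.conf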

You can also update your kubectl config with the admin.conf file that was generated if that is the config/context you were using from the master node.

mv $HOME/.kube/config $HOME/.kube/config.old
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
chmod 600 $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config

If the kube services on the master (kubelet etc.) have stopped as a result of the expired certificates, and you have recently tried to restart them (as was the case for me), reboot the master node at this point. All services that were failing to start because of expired certificates should now start correctly.

Reconnect and check everything is working as expected:

kubectl get nodes
kubeadm token list

Hashicorp Waypoint Server on Raspberry Pi

waypoint server running on raspberry pi

This evening I finally got a little time to play around with Waypoint. This wasn’t a straightforward install of Waypoint on my desktop though. I wanted to run and test HashiCorp Waypoint Server on Raspberry Pi. Specifically on my Pi Kubernetes cluster.

Out of the box, Waypoint is simple to set up locally, whether you’re on Windows, Linux, or Mac. The binary is written in the Go programming language, which is common across HashiCorp software.

There is even an ARM binary available which lets you run the CLI on Raspberry Pi straight out of the box.

Installing Hashicorp Waypoint Server on Raspberry Pi hosted Kubernetes

I initially ran into some issues though, having assumed that waypoint install --platform=kubernetes -accept-tos would ensure an ARM docker image was pulled down for my Pi based Kubernetes hosts.

My Kubernetes cluster also has the nfs-client-provisioner setup, which fulfills PersistentVolumeClaim resources with storage from my home FreeNAS Server Build. I noticed that PVCs were not being honored because they did not have the specific storage-class of nfs-storage that my nfs-client-provisioner required.

Fixing the PVC Issue

Looking at the waypoint CLI, it’s possible to generate the YAML for the Kubernetes resources it would deploy by adding the --show-yaml flag to the install command. So I fetched a base YAML resource definition:

waypoint install --platform=kubernetes -accept-tos --show-yaml

I modified the volumeClaimTemplates section to include my required PVC storageClassName of nfs-storage.

volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: nfs-storage
      resources:
        requests:
          storage: 1Gi

That sorted out the pending PVC issue in my cluster.
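To confirm the change took effect, check that the claim binds and that it picked up the nfs-storage class. A rough check (the default namespace here is an assumption about where Waypoint was installed):

kubectl get pvc -n default
kubectl get storageclass nfs-storage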

Fixing the ARM Docker Issue

Looking at the Docker image that the waypoint install command for Kubernetes gave me, I could see right away that it was not built for the ARM architecture.

To get a basic Waypoint server deployment for development and testing purposes on my Raspberry Pi Kubernetes Cluster, I created a simple Dockerfile for armhf builds.

To get things moving quickly, I based it on the hypriot/rpi-alpine image and did the following in my Dockerfile (a rough sketch follows the list).

  • Added a few tools, such as cURL.
  • Added a RUN command to download the waypoint ARM binary (currently 0.1.3) from Hashicorp releases and place in /usr/bin/waypoint.
  • Setup a /data volume mount point.
  • Created a waypoint user.
  • Created the entrypoint for /usr/bin/waypoint.
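Roughly, that translates to something like the sketch below. Treat it as an approximation of what I described above; the exact release URL, package list and user setup are assumptions, and the real file is in the GitHub repo linked below.

# Sketch only: release URL, packages and local image tag are assumptions
cat > Dockerfile <<'EOF'
FROM hypriot/rpi-alpine

# A few tools, including cURL, plus unzip to unpack the release archive
RUN apk add --no-cache curl unzip

# Download the waypoint ARM binary (0.1.3 at the time) from HashiCorp releases
RUN curl -fsSL -o /tmp/waypoint.zip \
      https://releases.hashicorp.com/waypoint/0.1.3/waypoint_0.1.3_linux_arm.zip \
    && unzip /tmp/waypoint.zip -d /usr/bin \
    && chmod +x /usr/bin/waypoint \
    && rm /tmp/waypoint.zip

# /data volume mount point and a dedicated waypoint user
RUN adduser -D waypoint && mkdir -p /data && chown waypoint:waypoint /data
VOLUME /data
USER waypoint

ENTRYPOINT ["/usr/bin/waypoint"]
EOF

docker build -t waypoint:0.1.3-armhf .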

You can get my ARM Waypoint Server Dockerfile on Github, and find the built armhf Docker image on Docker Hub.

Now it is just a simple case of updating the image in the generated YAML StatefulSet to use the ARM image with the ARM waypoint binary embedded.

containers:
- name: server
  image: shoganator/waypoint:0.1.3.20201026-armhf
  imagePullPolicy: Always

With the YAML updated, I simply ran kubectl apply to deploy it to my Kubernetes Cluster. i.e.

kubectl apply -f ./waypoint-armhf.yaml

Now Waypoint Server was up and running on my Raspberry Pi cluster. It just needed bootstrapping, which is expected for a new installation.

Hashicorp Waypoint Server on Raspberry Pi - pod started.

Configuring Waypoint CLI to Connect to the Server

Next I needed to configure my internal jumpbox to connect to Waypoint Server to verify everything worked.

Things may differ for you here slightly, depending on how your cluster is setup.

Waypoint on Kubernetes creates a LoadBalancer resource. I’m using MetalLB in my cluster, so I get a virtual LoadBalancer, and the EXTERNAL-IP MetalLB assigned to the waypoint service for me was 10.23.220.90.
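To find the EXTERNAL-IP that MetalLB handed out, something like this works (the service name and namespace are assumptions based on my install; kubectl get svc -A will list them all if yours differ):

kubectl get svc -n default waypoint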

My cluster is running on its own dedicated network in my house. I use another Pi as a router / jumpbox. It has two network interfaces, and the internal interface is on the Kubernetes network.

By getting an SSH session to this Pi, I could verify Waypoint Server connectivity via its LoadBalancer resource.

curl -i --insecure https://10.23.220.90:9702

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 3490
Content-Type: text/html; charset=utf-8
Last-Modified: Mon, 19 Oct 2020 21:11:45 GMT
Date: Mon, 26 Oct 2020 14:27:33 GMT

Bootstrapping Waypoint Server

On a first time run, you need to bootstrap Waypoint. This also sets up a new context for you on the machine you run the command from.

The Waypoint LoadBalancer exposes two ports: 9702 for HTTPS, and 9701 for the Waypoint CLI to communicate with over TCP.

With connectivity verified using curl, I could now bootstrap the server with the waypoint bootstrap command, pointing to the LoadBalancer EXTERNAL-IP and port 9701.

waypoint server bootstrap -server-addr=10.23.220.90:9701 -server-tls-skip-verify
waypoint context list
waypoint context verify

This command returns a token as a response and sets up a waypoint CLI context on the machine it was run from.

Waypoint context setup and verified from an internal kubernetes network connected machine.

Using Waypoint CLI from a machine external to the Cluster

I wanted to use Waypoint from a management or workstation machine outside of my Pi Cluster network. If you have a similar network setup, you could also do something similar.

As mentioned before, my Pi router device has two interfaces: a wireless interface, and a physical network interface. To get connectivity over ports 9701 and 9702 I used some iptables rules. Importantly, my Kubernetes-facing network interface is on 10.0.0.1 in the example below:

sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 9702 -j DNAT --to-destination 10.23.220.90:9702
sudo iptables -t nat -A POSTROUTING -p tcp -d 10.23.220.90 --dport 9702 -j SNAT --to-source 10.0.0.1
sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 9701 -j DNAT --to-destination 10.23.220.90:9701
sudo iptables -t nat -A POSTROUTING -p tcp -d 10.23.220.90 --dport 9701 -j SNAT --to-source 10.0.0.1

These rules forward traffic destined for ports 9701 and 9702 that arrives on the wlan0 interface on to the MetalLB IP 10.23.220.90.

The source and destination network address translation rewrites the ‘from’ address of the TCP reply packets so that they appear to come from 10.0.0.1 instead of 10.23.220.90.
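These rules don’t survive a reboot by default. If you want them to persist, one option (assuming a Debian-based OS such as Raspberry Pi OS) is the iptables-persistent package:

sudo apt install iptables-persistent
# Save the current rules so they are restored at boot
sudo netfilter-persistent save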

Now, I can simply set up a Waypoint CLI context on a machine on my ‘normal’ network. This network has visibility of my Raspberry Pi router’s wlan0 interface. I used my previously generated token in the command below:

waypoint context create -server-addr=192.168.7.31:9701 -server-tls-skip-verify -server-auth-token={generated-token-here} rpi-cluster
waypoint context verify rpi-cluster
Connectivity verified from my macOS machine, external to my Raspberry Pi Cluster with Waypoint Server running there.

Concluding

Waypoint Server is super easy to get running locally if you’re on macOS, Linux or Windows.

With a little bit of extra work you can get HashiCorp Waypoint Server on Raspberry Pi working, thanks to the versatility of the Waypoint CLI!

Kubernetes Backup on Raspberry Pi

storage server in data center rack

Once you start using your Raspberry Pi cluster for workloads with persistence, you’re probably going to want to implement a decent Kubernetes backup strategy.

I have been using my Raspberry Pi Cluster for a number of workloads with persistence requirements, from WordPress sites with MySQL databases, to Minecraft servers.

This post will detail how to get a basic install of Velero up and running on an ARM based Raspberry Pi Kubernetes cluster that uses NFS volume mounts for container persistence.

In other words, a solution that will allow you to back up both your Kubernetes resource state and the storage that those resources mount and read from and write to.

First though, I’ll revisit how I had been backing up my Pi cluster until recently.

A Simple and Naive Backup Approach

One way of dealing with backups is a fairly brute-force method: copy the data off and upload it to another location. I’ve been using this for a while now, but wanted a more Kubernetes-native method as an extra backup.

I have had a cronjob that does a mysql dump of all databases and then gzips and uploads this to BackBlaze B2 object storage.

Another cronjob handles compression and upload of a variety of different NFS volumes that pods mount. This cronjob runs in a BSD jail on my FreeNAS storage server, where the same Kubernetes NFS storage is mounted.

You can see the details of how I do this in my Cheap S3 Cloud Backup with BackBlaze B2 post.

Kubernetes Backup of State and Volumes with Velero

Velero is nice in that it is pluggable. You can use different plugins and adapters to talk to different types of storage services.

It works well for backing up your pod storage if you’re running Kubernetes on a platform that also provides that storage. For example, Kubernetes on AWS using EBS persistent volumes, or VMware with vSAN storage.

But in this post we’re dealing with Kubernetes on Raspberry Pi using NFS shared storage. If you haven’t yet setup shared storage, here is a guide to setting up NFS storage on Raspberry Pi Kubernetes.

We’ll assume you also want to back up your state and pod volume storage to AWS S3. You can quite easily modify some of the commands to use S3 API compatible storage instead though, e.g. MinIO (there is an example of this after the install command below).

Install Velero

On a management machine (one set up with kubectl and your cluster context), download Velero and make it executable. I’m using a Raspberry Pi here, so I’ve downloaded the ARM version.

curl -L -O https://github.com/vmware-tanzu/velero/releases/download/v1.5.1/velero-v1.5.1-linux-arm.tar.gz
tar -xvf velero-v1.5.1-linux-arm.tar.gz
sudo mv velero-v1.5.1-linux-arm/velero /usr/local/bin/velero
sudo chmod +x /usr/local/bin/velero

Create a dedicated IAM user for Velero to use. You’ll also set up your parameters for the S3 bucket and region, and add permissions to the IAM user for the target S3 bucket. Remember to use the AWS region of your preference.
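The exact AWS setup follows the Velero AWS plugin docs; a minimal sketch looks something like the following. The bucket name and region are placeholders, and the bucket policy you attach to the user is omitted here.

# Placeholders -- substitute your own bucket name and preferred region
BUCKET=my-velero-backups
REGION=eu-west-2

aws s3api create-bucket --bucket $BUCKET --region $REGION \
  --create-bucket-configuration LocationConstraint=$REGION

aws iam create-user --user-name velero
aws iam create-access-key --user-name velero

# Put the returned access key into ./credentials-velero, which the
# install command below reads:
#   [default]
#   aws_access_key_id=<key id>
#   aws_secret_access_key=<secret key>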

Now you’re ready to install Velero into your cluster. Apply the magic incantation:

velero install --provider aws --plugins velero/velero-plugin-for-aws-arm:main --bucket $BUCKET --backup-location-config region=$REGION --secret-file ./credentials-velero --use-restic --use-volume-snapshots=false

There are a few things going on here that are different to the standard / example install commands in the documentation…

  • The plugins parameter specifies the ARM version of the Velero AWS plugin. I found that the install command and the usual velero-plugin-for-aws selection tries to pull down the wrong docker image (for x86 architecture). Here we specify we want the ARM version from the main branch.
  • Use Restic Integration. This enables the Restic open source integration for persistent volume backup. This is the magic that allows us to backup our NFS mounted volumes. Also handy if you happen to be using other file storage types like EFS, AzureFile, or local mounted.
  • Disable volume snapshots. We’re on a Pi cluster with NFS storage, so we don’t want to use volume snapshots at all for backup.
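As mentioned earlier, MinIO or another S3 API compatible store also works; the main change is the backup location config. A hedged variant of the install command above (the s3Url value is a placeholder for your own MinIO endpoint):

velero install --provider aws --plugins velero/velero-plugin-for-aws-arm:main --bucket $BUCKET --secret-file ./credentials-velero --use-restic --use-volume-snapshots=false --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.example.local:9000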

After the Velero install completes, you should get some output to indicate success.

Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

You should also see your pods in the velero namespace up and running for Velero and Restic.

kubectl -n velero get pods
NAME                     READY   STATUS    RESTARTS   AGE
restic-589rm             1/1     Running   0          7m29s
restic-g6swc             1/1     Running   0          7m29s
velero-555695d95-dbzmd   1/1     Running   0          7m29s

If you see pod initialization errors, then double check you didn’t specify the normal velero plugin for AWS that would have caused the incorrect docker image architecture to be used.

With that complete, you’re ready to run your first backup.

Initiating a Manual Backup

Request a backup with the velero backup command. In the example below I’ll target my demo namespace. I’ll get volume backups for everything in this namespace that uses supported persistent volumes with the --default-volumes-to-restic flag.

If you don’t specify that flag, volumes are opted out of backups by default. In other words, this flag is how you opt in to restic volume backups.

velero backup create demo-backup --include-namespaces=demo --default-volumes-to-restic
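As an aside, if you’d rather not back up every volume in the namespace, the restic integration also supports per-pod opt-in: you annotate the pod with the names of the volumes to include instead of using the flag above. A hedged example (the pod and volume names are hypothetical):

kubectl -n demo annotate pod/my-app-pod backup.velero.io/backup-volumes=data,config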

You can request backup state and details using the velero backup describe command.

velero backup describe demo-backup --details

It’s worth running a backup of a test namespace with some test workloads, then simulating failure by deleting everything and performing a velero restore.

This should prove that the backup process works and that the DR strategy is sound.
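Restoring is a single command too. A minimal sketch for the demo backup above:

# Restore everything captured in demo-backup back into the cluster
velero restore create --from-backup demo-backup

# List restores and check their status (a restore name is generated
# automatically from the backup name)
velero restore get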

Don’t forget to run occasional test DR scenarios to exercise your deployed backup solution!

Conclusion

It’s a good idea to backup your Kubernetes cluster once you start running services with persistence. Even if it is just for personal use.

A quick and dirty approach is to script out the export and backup of Kubernetes resources and mounted container storage. In the long run however, you’ll save time and get a more fully featured solution by using tools that are specifically designed for this use case.

Kube Chaos game source and executable release

kube chaos screenshot destroying a pod

Source code here: https://github.com/Shogan/kube-chaos

I recently blogged about a small game I made that interfaces with any Kubernetes cluster and allows you to enter cluster nodes and destroy live, running pods.

I mentioned I would make the source available if there was any interest, so here it is.

After a little bit of code clean-up, and removing my 2D Shooter Bullet and Weapon System and 2D Neon Grid asset code from the game, the source is now available. These are game development modules/assets that I sell on the Unity Asset Store, so I could not include them in the source here. The fancy grid/background effect is therefore missing in this version.

I’ve also added an initial Windows build in the releases section, which is ready to go. Just unzip it, run it, and enter your kube context/namespace details to get going.

the kube chaos start screen

If you decide to compile from source yourself, just make sure you have Unity 2019.4 or later installed and you should be good to go. I’ve not yet tested on macOS or Linux.

This is post #7 in my effort towards 100DaysToOffload.

I made a Kubernetes game where you explore your cluster and destroy pods

a game where you can explore and destroy pods in your kubernetes cluster

I enjoy game development as a hobby on the side. I also enjoy working with container schedulers like Kubernetes. Over the weekend I decided to create a Kubernetes game, combining those two thoughts.

In the game you enter and explore nodes in your cluster, and can destroy your very own live, running pods. Hide prod away!

The game is put together using my engine of choice, Unity. With Unity you code using C#.

3 x Nodes represented in-game from my Raspberry Pi Kubernetes cluster. Can you spot the naming convention from one of my favourite movies?

Game Logic

The game logic was simple to put together. I have a couple of modular systems I’ve already developed (and actually sell on the Unity Asset Store), so those made the movement and shooting logic, as well as background grid effects a breeze.

Movement is implemented in a simple ‘twin-stick’ controller Script (a Unity concept: a class deriving from MonoBehaviour).

Other game logic is mostly contained in the bullet pattern module. I have some more Scripts that arrange and control the Kubernetes entities as well as their labels.

The interaction with Kubernetes itself is fairly hacked together. I wanted to put the game together as quickly as possible as I only worked on it over a couple of weekend evenings.

Let the hacky code flow…

Unity is a bit behind in .NET Framework version support and .NET Core was out of the question. This meant using the Kubernetes csharp client was not going to happen easily (directly in Unity that is). It would have been my first choice otherwise.

With that in mind, I skipped over to a hacky solution of invoking the kubectl client directly from within the game.

The game code executes kubectl commands on threads separate from the main game loop, and returns the results, formatted accordingly, back to the game’s main thread. I used System.Diagnostics.Process for this.

From there, game entities are instantiated and populated with info and labels. (E.g. the nodes and the pods).

pods spawned in the game, bouncing around

Pods have health

Pods are given health (hit points) and they simply bounce around after spawning in. You can chase after and shoot them, at which point a kubectl delete pod command is actually sent to the Kube API via kubectl!

The game world

You enter the world in a ‘node’ view, where you can see all of your cluster’s nodes. From there you can approach nodes to have them slide open a ‘door’. Entering the door transports you ‘into’ the node, where you can start destroying pods at will.

For obvious reasons I limit the pods that are destroyable to a special ‘demo’ namespace.

Putting together the demo pods

I use a great little tool called arkade in my Kubernetes Pi cluster.

arkade makes it really simple to install apps into your cluster. Great for quick POCs and demos.

Arkade offers a small library of useful and well thought out apps that are simple to install. The CLI provides strongly-typed flags to install these apps (or any helm charts) in short, one-line operations.

It also handles the logic around figuring out which platform you’re running on, and pulling down the correct images for that platform (if supported). Super useful when you’re on ARM as you are with the Raspberry Pi.

Straight from the GitHub page, this is how simple it is to setup:

# Note: you can also run without `sudo` and move the binary yourself
curl -sLS https://dl.get-arkade.dev | sudo sh

arkade --help
ark --help  # a handy alias

# Windows users with Git Bash
curl -sLS https://dl.get-arkade.dev | sh

I then went about installing a bunch of apps and charts with arkade. For example:

arkade install loki --namespace demo

Hooking the game up to my Kube Cluster

With the demo namespace complete, and the application pods running, I needed to get my Windows machine running the game talking to my Pi Cluster (on another local network).

I have a Pi ‘router’ setup that is perfectly positioned for this. All that is required is to run a kube proxy on this, listening on 0.0.0.0 and accepting all hosts.

kubectl proxy --address='0.0.0.0' --port=8001 --accept-hosts='.*'

I set up a local kube config pointing to the router’s local IP address on the interface facing my Windows machine’s network, and switched my context to that configuration.
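That config is just a few kubectl config commands. A rough sketch, assuming the router’s address on my Windows machine’s network is 192.168.7.31 (the same address used for Waypoint earlier) and the proxy is listening on port 8001:

kubectl config set-cluster pi-cluster --server=http://192.168.7.31:8001
kubectl config set-context pi-cluster --cluster=pi-cluster --namespace=demo
kubectl config use-context pi-cluster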

From there, the game’s kubectl commands get sent to this context and traverse the proxy to hit the kube API.

Destroying pods sure does exercise those ReplicaSets!

ReplicaSets spinning up new pods as quickly as they’re destroyed in-game!

Source

If there is any interest, I would be happy to publish the (hacky) source for the main game logic and basic logic that sends the kubectl processes off to other threads.

This is post #5 in my effort towards 100DaysToOffload.