Building a Raspberry Pi Kubernetes Cluster – Part 1 – Routing

Building a Raspberry Pi Kubernetes Cluster - part 1 - routing - title featured image

I’ve recently built myself a Kubernetes (1.16.2) cluster running on a combination of Raspberry Pi 4 and 3 devices.

Raspberry Pi Cluster Stack

I’ll be running through the steps I took to build it out in this series, with part 1 focusing on the router and internal node network side of things.

If you want to jump to the other parts in this series:

First off, here is a list of parts I used to set everything up:

To make the setup as portable as possible, and also slightly seggregated from my home network, I used the 1 x Raspberry Pi 3 device I had as a router between my home network and my Kubernetes Layer 2 Network (effectively the devices on the 8 port Netgear Switch).

Here is a network diagram that shows the setup.

Raspberry Pi Kubernetes Network Diagram

Building the Raspberry Pi Cluster Router

Of course you’ll need an OS on the microSD card for each Raspberry Pi you’re going to be using. I used the latest Raspbian Buster Lite image from the official Raspbian Downloads page (September 26).

This is a minimal image and is exactly what we need. You’ll need to write it to your microSD card. There are plenty tutorials out there on doing this, so I won’t cover it here.

One piece of advice though, would be to create a file called “ssh” on the imaged card filesystem after writing the image. This enables you to SSH on directly without the need to connect up a screen and setup the SSH daemon yourself. Basically just login to your home network DHCP server and look for the device once it boots then SSH to it’s automatically assigned IP address.

Also, it would be wise to reserve an IP address on your home network’s DHCP service for your Pi Router. Grab the MAC address of your Pi and add it to your home network DHCP service’s reserved IP addresses. I set mine to 192.168.2.30 on my WiFi network.

List the wlan interface’s MAC address with:

ifconfig wlan0

Setting Hostname and Changing the Default Password

On the Router Raspberry Pi, run the following command to change the hostname to something other than “raspberry” and change the default password too:

sudo raspi-config
Change the default password and hostname of the Raspberry Pi

Setting up the Pi Router

Now the rest of the guide deserves much credit to this blog post, however, I did change a few things on my setup, as the routing was not configured 100% correctly to allow external access to services on the internal Kubernetes network.

I needed to add a couple of iptables rules in order to be able to access my Ingress Controller from my home network. More on that later though.

Interface Setup

You need to configure the WiFi interface (wlan0) and the Ethernet Interface (eth0) for each “side” of the network.

Edit the dhcpd.conf file and add an eth0 configuration right at the bottom, then save.

sudo nano /etc/dhcpcd.conf
interface eth0
static ip_address=10.0.0.1/8
static domain_name_servers=1.1.1.1,208.67.222.222
nolink

Of course replace the above DNS servers with whichever you prefer to use. I’ve used Cloudflare and OpenDNS ones here.

Next, setup your WiFi interface to connect to your home WiFi. WiFi connection details get saved to /etc/wpa_supplicant/wpa_supplicant.conf but it is best to use the built-in configuration tool (raspi-config) to do the WiFi setup.

sudo raspi-config

Go to Network Options and enter your WiFi details. Save/Finish afterwards.

Install and Configure dnsmasq

sudo apt update
sudo apt install dnsmasq
sudo mv /etc/dnsmasq.conf /etc/dnsmasq.conf.backup

Create a new /etc/dnsmasq.conf file with the below command:

The script is the main dnsmasq configuration that sets DHCP up over the eth0 interface (for the 10.0.0.0/8 network side) and configures some nameservers for DNS as well as a few other bits.

Edit the service file for dnsmasq (/etc/init.d/dnsmasq) to prevent issues with start-up order of dnsmasq and dhcpcd:

sudo nano /etc/init.d/dnsmasq

Change the top of the file to look like this:

#!/bin/sh

# Hack to wait until dhcpcd is ready
sleep 10

### BEGIN INIT INFO
# Provides:       dnsmasq
# Required-Start: $network $remote_fs $syslog $dhcpcd
# Required-Stop:  $network $remote_fs $syslog
# Default-Start:  2 3 4 5
# Default-Stop:   0 1 6
# Description:    DHCP and DNS server
### END INIT INFO

The lines changed above are the sleep 10 command and the Required-Start addition of $dhcpcd.

At this point its a good idea to reboot.

sudo reboot now

After the reboot, check that dnsmasq is running.

sudo systemctl status dnsmasq

Setup iptables

First of all, enable IP forwarding. Edit the /etc/sysctl.conf file and uncomment this line:

net.ipv4.ip_forward=1

This enables us to use NAT rules with iptables.

Now you’ll configuring some POSTROUTING and FORWARD rules in iptables to allow your Raspberry Pi devices on the 10.0.0.0/8 network to access the internet via your Pi Router’s wlan0 interface.

sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT

Optional Step

This is optional, and you might only need to do this later on once you start running services in your Kubernetes Pi Cluster.

Forward Traffic from your home network to a Service or Node IP in your Cluster Network:

sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 80 -j DNAT --to-destination 10.23.220.88:80
sudo iptables -t nat -A POSTROUTING -p tcp -d 10.23.220.88 --dport 80 -j SNAT --to-source 10.0.0.1

The above assumes a couple of things that you should change accordingly (if you use this optional step):

  • You have a Service running in the Kubnernetes network, listening on port 80 (http) on IP 10.23.220.88
  • You setup your Pi Router to use 10.0.0.1 as the eth0 device IP (as per above in this post), and your wlan0 interface is the connection that your Pi router is using to connect to your home network (WiFi).
  • You actually want to forward traffic hitting your Pi Router (from the WiFi wlan0 interface) through the 10.0.0.1 eth0 interface and into a service IP on the 10.0.0.0/8 network. (In my example above I have an nginx Ingress Controller running on 10.23.220.88).

Persisting your iptables rules across reboots

Persist all of your iptables rules by installing iptables-persistent:

sudo apt install iptables-persistent

The above will run a wizard after installation and you’ll get the option to save your IPv4 rules. Choose Yes, then reboot afterwards.

After reboot, run sudo iptables -L -n -v to check that the rules persisted after reboot.

Note: if you ever update your Pi Router’s iptables rules and want to re-save the new set of rules to persist across reboots, you’ll need to re-save them using the iptables-persistent package.

sudo dpkg-reconfigure iptables-persistent

Adding new Pi devices to your network in future

Whenever you add an additional Raspberry Pi device to the 8 port switch / Kubernetes network in the future, make sure you edit /etc/dnsmasq.conf to update the list of MAC addresses assigned to 10.0.0.x IP addresses.

You’ll want to set the new Pi’s eth0 MAC address up in the list of pre-defined DHCP leases.

You can also view the /var/lib/misc/dnsmasq.leases file to see the current dnsmasq DHCP leases.

This is handy when adding a new, un-configured Pi to the network – you can pick up the auto-assigned IP address here, and then SSH to that for initial configuration.

Concluding

That is pretty much the setup and configuration for the Pi Router complete. As mentioned above, much credit for this configuration goes to this guide on downey.io.

I ended up modifying the iptables rules for service traffic forwarding from my home network side into some Kubernetes LoadBalancer services I ended up running later on which I covered above in the Optional Steps section.

At this point you should have your Pi Router connected to your home network via WiFi, and have the Ethernet port plugged into your network switch. Make sure the switch is not connected back to your home network via an Ethernet cable or you’ll run into some strange network loop issues.

You should now be able to plug in new Pi’s to the network switch, and they should get automatically assigned DHCP addresses on the 10.0.0.0/8 network.

Updating your dnsmasq.conf file with the new Pi’s ethernet MAC addresses means that they can get statically leases IP addresses too, which you’ll need for your Kubernetes nodes once you start adding them (see Part 2 coming next).

Enabling and Using Ephemeral Containers on Kubernetes 1.16

Ephemeral Containers on Kubernetes 1.16

Ephemeral Containers are an early-state alpha feature in Kubernetes 1.16 and offer some interesting new dynamics when it comes to tooling that we can use in day-to-day Kubernetes operations.

To see this feature live, in action, check out the demo shell session below:

Before we look at Ephemeral Containers, let’s go over what a Pod is in the Kubernetes world.

Remember that a Pod in Kubernetes is a group of one or more containers (e.g. Docker containers).

With that basic tidbit of information out of the way, we’ll look at some characteristics that Pods and their containers have always had in the past:

  • They’re meant to be disposable and easily replaced in a controlled manner with Deployments.
  • You could not add containers to pods at runtime.
  • Containers in pods can have ports assigned for network access and use things like liveness probes.
  • Troubleshooting containers in pods usually meant looking at logs or using kubectl exec to get into the running container and poke around. The latter of course being useless if your container had already crashed.

So here is where I see one of the best use cases for the new Ephemeral Containers feature – troubleshooting.

Ephemeral Containers can be inserted into a live, running pod at runtime.

This means they are great for live troubleshooting of your applications. How many times have you wished your base docker image you’ve built your application image on top of has curl, dig, or even ping in some cases…

If we’ve been following best practices, we’ve kept our docker images as slim as possible, and removed as much attack surface area as possible. This usually means all the useful diagnostic tools are missing.

Ephemeral Containers are great. We can now keep a diagnostic Docker image handy with all the tools we need and live insert a diagnostic container into a running pod to troubleshoot when the time arises.

When an Ephemeral Container runs, it executes within the namespace of the target pod. So you’ll be able to access, for example, the filesystems and other resources that containers in the the pods have.

Demonstration

In order to follow along with this demo, you’ll need Kubernetes 1.16 or higher, and you’ll need to use two pod related features:

  • EphemeralContainers (of course) – disabled by default in 1.16 as it’s alpha.
  • PodShareProcessNamespace – for sharing the process namespace in a pod (enabled by default in 1.16 as it’s a beta feature).

To enable the Ephemeral Containers feature, edit the following configuration files on your Kubernetes master nodes and restart each master:

Enable the EphemeralContainers alpha feature gate in the following places

  • /etc/kubernetes/manifests/kube-apiserver.yaml
  • /etc/kubernetes/manifests/kube-scheduler.yaml

by adding the following line inside the command section:

--feature-gates=EphemeralContainers=true

Create a new pod (the one I’m using is Rabbit MQ and specific to ARM architecture as I’m using a Raspberry Pi cluster here), but replace this image with anything you like as its just for testing:

Save this as pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: rabbit
  labels:
    role: myrole
spec:
  shareProcessNamespace: true
  containers:
    - name: rabbitmq
      image: arm32v7/rabbitmq
      ports:
        - name: rabbit
          containerPort: 5672
          protocol: TCP

Create it with kubectl apply -f ./pod.yaml

Next, create an EphemeralContainer resource saving it as ephemeral-diagnostic-container.json

(Note that I’m using a Docker image I created, shoganator/rpi-alpine-tools with a bunch of diagnostic tools added, and that this image is specific to ARM architecture only). Replace the image in this file with anything else you like, e.g. busybox.

{
    "apiVersion": "v1",
    "kind": "EphemeralContainers",
    "metadata": {
            "name": "rabbit"
    },
    "ephemeralContainers": [{
        "command": [
            "bash"
        ],
        "image": "shoganator/rpi-alpine-tools",
        "imagePullPolicy": "Always",
        "name": "diagtools",
        "stdin": true,
        "tty": true,
        "terminationMessagePolicy": "File"
    }]
}

Apply this to your existing pod:

kubectl -n default replace --raw /api/v1/namespaces/default/pods/rabbit/ephemeralcontainers -f ./ephemeral-diagnostic-container.json

Describe your rabbit pod with kubectl describe pod rabbit

Ephemeral Containers:
   diagtools:
     Container ID:  docker://eb55c71f102ce3d56221934f6ebcabfd2da76204df718bd8d2573da24aecc8e9
     Image:         shoganator/rpi-alpine-tools
     Image ID:      docker-pullable://shoganator/rpi-alpine-tools@sha256:bb00f943d511c29cc2367183630940e797f5e9552683b672613bf4cb602a1c4c
     Port:          
     Host Port:     
     Command:
       bash
     State:          Running
       Started:      Sat, 16 Nov 2019 14:49:58 +0000
     Ready:          False
     Restart Count:  0
     Environment:    
     Mounts:         
 Conditions:
   Type              Status
   Initialized       True
   Ready             True
   ContainersReady   True
   PodScheduled      True

You can see now that the Ephemeral Containers section is populated with the newly running Ephemeral Container that we added into the rabbit pod.

The next step is to actually use this container to do some diagnosis / probing inside the pod. Attach to the diagtools ephemeral container inside the pod:

kubectl attach -it rabbit -c diagtools

Now you can list processes, ping items in the network, or maybe test another web service in your cluster from the context of this pod. E.g.

ps auxww
ping 192.168.192.13
curl http://hello-node1:8081
htop

Remember that with Ephemeral Containers:

  • Cannot have ports, so fields such as ports, livenessProbe, readinessProbe are not able to be used.
  • Setting resources is disallowed as pod resources are immutable.
    They will disappear if a pod is deleted/re-scheduled.

Definitive guide to using Weave Net CNI on AWS EKS

Looking to install the Weave Net CNI on AWS EKS / Kubernetes and remove the AWS CNI? Look no further. This guide will detail and demonstrate the process.

What this guide will cover

  • Removing AWS CNI plugin
  • Installing the Weave Net CNI on AWS EKS
  • Making sure your EC2 instances will work with Weave
  • Customising Weave Net CNI including custom pod overlay network ranges
  • Removing max-pods limit on your EKS worker nodes
  • Reconfiguring pods that don’t work after switching to Weave. (E.g. those that need to talk back to the EKS master nodes that do not get the Weave overlay network)

Want the Terraform source and test scripts to jump right in?

GitHub Terraform and test environment source

Otherwise, read on for step-by-step and more information…

There are a few guides floating around that detail how to install the Weave Net CNI plugin for Amazon Kubernetes clusters (EKS), however I’ve not seen them go into much detail.

Most tend to skip over some important steps and details when it comes to configuring weave and getting the pod networking functioning correctly.

There are also some important caveats that you should be aware of when replacing the AWS CNI Plugin with a different CNI, whether it be Weave, Calico, or any other.

Replacing CNI functionality

You should be 100% happy with what you’ll lose if completely replace the AWS CNI with another CNI. The AWS CNI has some very useful functionality such as:

  • Assigning IP addresses (via ENIs) to place pods directly into your VPC network
  • VPC flow logs that make sense

However, depending on your architecture and design decisions, as well as potential VPC network limitations, you may wish to opt out of the CNI that Amazon provides and instead use a different CNI that provides an overlay network with other functionality.

AWS CNI Limitations

One of the problems I have seen in VPCs is limited CIDR ranges, and therefore subnets that are carved up into smaller numbers of IP addresses.

The Amazon AWS CNI plugin is very IP address hungry and attaches multiple Secondary Private IP addresses to EKS worker nodes (EC2 instances) to provide pods in your cluster with directly assigned IPs.

This means that you can easily exhaust subnet IP addresses with just a few EKS worker nodes running.

This limitation also means that those who want high densities of pods running on worker nodes are in for a surprise. The IP address limit becomes an issue for maximum number of pods in these scenarios way before compute capacity becomes a problem.

This page shows the maximum number of ENI’s and Secondary IP addresses that can be used per EC2 instance: https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt

Removing the AWS CNI plugin

Note: This process will involve you needing to replace your existing EKS worker nodes (if any) in the cluster after installing the Weave Net CNI.

Assuming you have a connection to your cluster already, the first thing to do is to remove the AWS CNI.

kubectl -n=kube-system delete daemonset aws-node

With that gone, your future EKS workers will no longer assign multiple Secondary IP addresses from your VPC subnets.

Installing CNI Genie

With the AWS CNI plugin removed, your pods won’t be able to get a network connection when starting up from this point onward.

Installing a basic deployment of CNI Genie is a quick way to get automatic CNI selection working for containers that start from this point on.

CNI genie has tons of other great features like allowing you to customise which CNI containers use when starting up and more.

For now, you’re just using it to allow containers to start-up and use the Weave Net overlay network by default.

Install CNI Genie. This manifest works with Kubernetes 1.12, 1.13, and 1.14 on EKS.

kubectl apply -f https://raw.githubusercontent.com/Shogan/terraform-eks-with-weave/master/src/weave/genie-plugin.yaml

Installing Weave

Before continuing, you should ensure your EC2 machines disable source/destination network checking.

Make this change in the userdata script that your instances run when starting from their autoscale groups.

REGION_ID=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | grep -Po "(us|ca|ap|eu|sa)-(north|south)?(east|west|central)-[0-9]+")
aws ec2 modify-instance-attribute --instance-id $INSTANCE_ID --no-source-dest-check --region $REGION_ID

On to installing Weave Net CNI on AWS EKS…

Next, get a Weave Net CNI yaml manifest file. Decide what overlay network IP Range you are going to be using and fill it in for the env.IPALLOC_RANGE query string parameter value in the code block below before making the curl request.

curl --location -o ./weave-cni.yaml "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.0.0/16"

Note: the env.IPALLOC_RANGE query string param added is to specify you want a config with a custom CIDR range. This should be chosen specifically not to overlap with any network ranges shared with the VPC you’ll be deploying into.

In the example above I had a VPC and VPC peers that shared the CIDR block 10.0.0.0/8). Therefore I chose to use 192.168.0.0/16 for the Weave overlay network.

You should be aware of the network ranges you’re using and plan this out appropriately.

The config you now have as weave-cni.yaml will contain the environment variable IPALLOC_RANGE with the correct value that the weave pods will use to setup networking on the EKS Worker nodes.

Apply the weave Net CNI resources:

Note: This manifest is pre-created to use an overlay network range of 192.168.0.0/16

kubectl apply -f https://raw.githubusercontent.com/Shogan/terraform-eks-with-weave/master/src/weave/weave-cni.yaml

Note: Don’t expect things to change suddenly. The current EKS worker nodes will need to be rotated out (e.g. drain, terminate, wait for new to appear) in order for the IP addresses that the AWS CNI has kept warm/allocated to be released.

If you have any existing EKS workers running, drain them now and terminate/replace them with new workers. This includes the source/destination check change made previously.

kubectl get nodes
kubectl drain nodename --ignore-daemonsets

Remove max pod limits on nodes:

Your worker nodes by default have a limit set on how many pods they can schedule. The EKS AMI sets this based on EC2 type (and the max pods due to the usual ENI limitations / IP address limitations with the AWS CNI).

Check your max pod limits with:

kubectl get nodes -o yaml | grep pods

If you’re using the standard EKS optimized AMI (or a derivative of it) then you can simply pass an option to the bootstrap.sh script located in the image that setup the kubelet and joins the cluster. Set –use-max-pods false as an argument to the script.

For example, your autoscale group launch configuration might get the EC2 worker nodes to join the cluster using the bootstrap.sh script. You can update it like so:

/etc/eks/bootstrap.sh --b64-cluster-ca 'YOUR_BASE64_CLUSTER_CA_DATA_HERE' --apiserver-endpoint 'https://YOUR_EKS_CLUSTER_ENDPOINT_HERE' --use-max-pods false --kubelet-extra-args '' 'YOUR_CLUSTER_NAME_HERE'

If you’re using the EKS Terraform module you can simply pass in bootstrap-extra-args – this will automatically setup your worker node userdata templates with extra bootstrap arguments for the kubelet. See example here

Checking max-pods limit again after applying this change, you should see the previous pod limit (based on prior AWS CNI max pods for your instance type) removed now.

You’re almost running Weave Net CNI on AWS EKS, but first you need to roll out new worker nodes.

With the Weave Net CNI installed, the kubelet service updated and your EC2 source/destination checks disabled, you can rotate out your old EKS worker nodes, replacing them with the new nodes.

kubectl drain node --ignore-daemonsets

Once the new nodes come up and start scheduling pods, if everything went to plan you should see that new pods are using the Weave overlay network. E.g. 192.168.0.0/16.

A quick run-down on weave IP addresses and routes

If you get a shell to a worker node running the weave overlay network and do a listing of routes, you might see something like the following:

# ip route show
default via 10.254.109.129 dev eth0
10.254.109.128/26 dev eth0 proto kernel scope link src 10.254.109.133
169.254.169.254 dev eth0
192.168.0.0/16 dev weave proto kernel scope link src 192.168.192.0 

This routing table shows two main interfaces in use. One from the host (EC2) instance network interfaces itself, eth0, and one from weave called weave.

When network packets are destined for the 10.254.109.128/26 address space, then traffic is routed down eth0.

If traffic on the host is destined for any address on 192.168.0.0/16, it will instead route via the weave interface ‘weave’ and the weave system will handle routing that traffic appropriately.

Otherwise if the traffic is destined for some public IP address out on the wider internet, it’ll go down the default route which is down the interface, eth0. This is a default gateway in the VPC subnet in this case – 10.254.109.129.

Finally, metadata URL traffic for 169.254.169.254 goes down the main host eth0 interface of course.

Caveats

For the most part everything should work great. Weave will route traffic between it’s overlay network and your worker node’s host network just fine.

However, some of your custom workloads or kubernetes tools might not like being on the new overlay network. For example they might need to talk to other Kubernetes nodes that do not run weave net.

This is now where the limitation of using a managed Kubernetes offering like EKS becomes a bit of a problem.

You can’t run weave on the Kubernetes master / API servers that are effectively the ‘managed’ control plane that AWS EKS hosts for you.

This means that your weave overlay network does not span the Kubernetes master nodes where the Kubernetes API runs.

If you have an application or container in the weave overlay network and the Kubernetes master node / API needs to talk to it, this won’t work.

One potential solution though is to use hostNetwork: true in your pod specification. However you should of course be aware of how this would affect your application and application security.

In my case, I was running metrics-server and it stopped working after it started using Weave. I found out that the Kubernetes API needs to talk to the metrics-server service and of course this won’t work in the overlay network.

Example EKS with Weave Net CNI cluster

You can use the source code I’ve uploaded here.

There are five simple steps to deploy this example EKS cluster in your own account.

  • Modify the example.tfvars file to fit your own parameters.
  • terraform plan -var-file="example.tfvars" -out="example.tfplan"
  • terraform apply "example.tfplan"
  • ./setup-weave.sh
  • ./test-weave.sh

Warning: This will create a new VPC, subnets, NAT Gateway instance, Internet Gateway, EKS Cluster, and set of worker node autoscale groups. So be sure Terraform Destroy this if you’re just testing things out.

– Your wallet

After terraform creates all the resources, you can run the two included shell scripts. setup-weave.sh will remove the AWS CNI, install CNI genie, Weave, and deploy two simple example pods and services.

At this point you should terminate your existing worker nodes (that still use the AWS CNI) and wait for your new worker nodes to join the cluster.

test-weave.sh will wait for the hello-node test pods to become ready, and then execute a curl command inside one, talking to the other via the the service and vice versa. If successful, you’ll see a HTTP 200 OK response from each service.

Fast Batch S3 Bucket object deletion from the shell

This is a quick post showing a nice and fast batch S3 bucket object deletion technique.

I recently had an S3 bucket that needed cleaning up. It had a few million objects in it. With path separating forward slashes this means there were around 5 million or so keys to iterate.

The goal was to delete every object that did not have a .zip file extension. Effectively I wanted to leave only the .zip file objects behind (of which there were only a few thousand), but get rid of all the other millions of objects.

My first attempt was straight forward and naive. Iterate every single key, check that it is not a .zip file, and delete it if not. However, every one of these iterations ended up being an HTTP request and this turned out to be a very slow process. Definitely not fast batch S3 bucket object deletion…

I fired up about 20 shells all iterating over objects and deleting like this but it still would have taken days.

I then stumbled upon a really cool technique on serverfault that you can use in two stages.

  1. Iterate the bucket objects and stash all the keys in a file.
  2. Iterate the lines in the file in batches of 1000 and call delete-objects on these – effectively deleting the objects in batches of 1000 (the maximum for 1 x delete request).

In-between stage 1 and stage 2 I just had to clean up the large text file of object keys to remove any of the lines that were .zip objects. For this process I used sublime text and a simple regex search and replace (replacing with an empty string to remove those lines).

So here is the process I used to delete everything in the bucket except the .zip objects. This took around 1-2 hours for the object key path collection and then the delete run.

Get all the object key paths

Note you will need to have Pipe Viewer installed first (pv). Pipe Viewer is a great little utility that you can place into any normal pipeline between two processes. It gives you a great little progress indicator to monitor progress in the shell.

aws s3api list-objects --output text --bucket the-bucket-name-here --query 'Contents[].[Key]' | pv -l > all-the-stuff.keys

 

Remove any object key paths you don’t want to delete

Open your all-the-stuff.keys file in Sublime or any other text editor with regex find and replace functionality.

The regex search for sublime text:

^.*.zip*\n

Find and replace all .zip object paths with the above regex string, replacing results with an empty string. Save the file when done. Make sure you use the correctly edited file for the following deletion phase!

Iterate all the object keys in batches and call delete

tail -n+0 all-the-stuff.keys | pv -l | grep -v -e "'" | tr '\n' '\0' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket the-bucket-name-here --delete "Objects=[$(printf "{Key=%q}," "$@")],Quiet=false"' _

This one-liner effectively:

  • tails the large text file (mine was around 250MB) of object keys
  • passes this into pipe viewer for progress indication
  • translates (tr) all newline characters into a null character ‘\0’ (effectively every line ending)
  • chops these up into groups of 1000 and passes the 1000 x key paths as an argument with xargs to the aws s3api delete-object command. This delete command can be passed an Objects array parameter, which is where the 1000 object key paths are fed into.
  • finally quiet mode is disabled to show the result of the delete requests in the shell, but you can also set this to true to remove that output.

Effectively you end up calling aws s3api delete-object passing in 1000 objects to delete at a time.

This is how it can get through the work so quickly.

Nice!

Troubleshooting Amazon EKS Worker Nodes not joining the cluster

mechanic underneath car fixing things

I’ve recently been doing a fair bit of automation work on bringing up AWS managed Kubernetes clusters using Terraform (with Packer for building out the worker group nodes). Read on for some handy tips on troubleshooting EKS worker nodes.

Some of my colleagues have not worked with EKS (or Kubernetes) much before and so I’ve also been sharing knowledge and helping others get up to speed. A colleague was having trouble with their newly provisioned personal test EKS cluster found that the kube-system / control plane related pods were not starting.  I assisted with the troubleshooting process and found the following…

Upon diving into the logs of the kube-system related pods (dns, aws CNI, etc…) it was obvious that the pods were not being scheduled on the brand new cluster. The next obvious command to run was kubectl get nodes -o wide to take a look at the general state of the worker nodes.

Unsurprisingly there were no nodes in the cluster.

Troubleshooting worker nodes not joining the cluster

The first thing that comes to mind when you have worker nodes that are not joining the cluster on startup is to check the bootstrapping / startup scripts. In EKS’ case (and more specifically EC2) the worker nodes should be joining the cluster by running a couple of commands in the userdata script that the EC2 machines run on launch.

If you’re customising your worker nodes with your own custom AMI(s) then you’ll most likely be handling this userdata script logic yourself, and this is the first place to check.

The easiest way of checking userdata script failures on an EC2 instance is to simply get the cloud-init logs direct from the instance. Locate the EC2 machine in the console (or the instance-id inspect the logs for failures on the section that logs execution of your userdata script.

  • In the EC2 console: Right-click your EC2 instance -> Instance Settings -> Get System Log.
  • On the instance itself:
    • cat /var/log/cloud-init.log | more
    • cat /var/log/cloud-init-output.log | more

Upon finding the error you can then check (using intuition around the specific error message you found):

  • Have any changes been introduced lately that might have caused the breakage?
  • Has the base AMI that you’re building on top of changed?
  • Have any resources that you might be pulling into the base image builds been modified in any way?

These are the questions to ask and investigate first. You should be storing base image build scripts (packer for example) in version control / git, so check the recent git commits and image build logs first.

How to setup a basic Kubernetes cluster and add an NGINX Ingress Controller on DigitalOcean

Most of the steps in this how to post can be applied to any Kubernetes cluster to get an NGINX Ingress Controller deployed, so you don’t necessarily have to be running Kubernetes in DigitalOcean. With that said, let’s go through the process of setting up a Kubernetes cluster with NGINX Ingress Controller on DigitalOcean.

DigitalOcean have just officially announced their own Kubernetes offering so this guide covers initial deployment of a basic worker node pool on DigitalOcean, and then moves on to deploying an Ingress Controller setup.

Note: If you already have a Kubernetes cluster setup and configured, then you can skip the initial cluster and node pool provisioning step below and move on to the Helm setup part.

Deploy a Kubernetes node pool on DigitalOcean

You could simply do this with the Web UI console (which makes things really simple), but here I’ll be providing the doctl commands to do this via the command line.

First of all, if you don’t have it already download and setup the latest doctl release. Make sure it’s available in your PATH.

Initialise / authenticate doctl. Provide your own API key when prompted.

doctl auth init

Right now, the help documentation in doctl version 1.12.2 does not display the kubernetes related commands arguments, but they’re available and do work.

Create a new Kubernetes cluster with just a single node of the smallest size (you can adjust this to your liking of course). I want a nice cheap cluster with a single node for now.

doctl k8s cluster create example-cluster --count=1 --size=s-1vcpu-2gb

The command above will provision a new cluster with a default node pool in the NYC region and wait for the process to finish before completing. It’ll also update your kubeconfig file if it detects one on your system.

output of the doctl k8s cluster create command

Once it completes, it’ll return and you’ll see the ID of your new cluster along with some other details output to the screen.

Viewing the Kubernetes console in your browser should also show it ready to go. You can download the config from the web console too if you wish.

Kubeconfig setup

If you’re new to configuring kubectl to manage Kubernetes, follow the guide here to use your kube config file that DigitalOcean provides you with.

the load balancer that will front your Ingress Controller on DigitalOcean Kubernetes

Handling different cluster contexts

With kubectl configured, test that it works. Make sure you’re in your new cluster’s context.

kubectl config use-context do-nyc1-example-cluster

If you’re on a Windows machine and use PowerShell and have multiple Kubernetes clusters, here is a simple set of functions I usually add to my PowerShell profile – one for each cluster context that allows easy switching of contexts without having to type out the full kubectl command each time:

Open your PowerShell profile with:

notepad $profile

Add the following (one for each context you want) – make sure you replace the context names with your own cluster names:

function kubecontext-minikube { kubectl config use-context minikube }
function kubecontext-seank8s { kubectl config use-context sean.k8s.local }
function kubecontext-digitalocean { kubectl config use-context do-nyc1-example-cluster }

Simply enter the function name and hit enter in your PS session to switch contexts.

If you didn’t have any prior clusters setup in your kubeconfig file, you should just have your new DigitalOcean cluster context selected already by default.

Deploy Helm to your cluster

Time to setup Helm. Follow this guide to install and configure helm using kubectl.

helm logo

Deploy the Helm nginx-ingress chart to enable an Ingress Controller on DigitalOcean in your Kubernetes cluster

Now that you have helm setup, you can easily deploy an Ingress Controller to your cluster using the nginx helm chart (package).

helm install --name nginx-ingress stable/nginx-ingress --set service.type=LoadBalancer --namespace default

When you specify the service.type of “LoadBalancer”, DigitalOcean will provision a LoadBalancer that fronts this Kubernetes service on your cluster. After a few moments the Helm deployment should complete (it’ll run async in the background).

You can monitor the progress of the service setup in your cluster with the following command:

kubectl --namespace default get services -o wide -w nginx-ingress-controller

Open the Web console, go to Networking, and then look for Load Balancers.

You should see your new NGINX load balancer. This will direct any traffic through to your worker pool node(s) and into the Kubernetes Service resource that fronts the pods running NGINX Ingress.

view of the digitalocean load balancer

At this point you should be able to hit the IP Address in your web browser and get the default nginx backend for ingress (with a 404 response). E.g.

Great! This means it’s all working so far.

Create a couple of basic web deployments inside your cluster

Next up you’ll create a couple of very simple web server Deployments running in single pods in your cluster’s node pool.

Issue the following kubectl command to create two simple web deployments using Google’s official GCR hello-app image. You’ll end up with two deployments and two pods running separately hosted “hello-app” web apps.

kubectl run web-example1 --image=gcr.io/google-samples/hello-app:2.0 --port=8080
kubectl run web-example2 --image=gcr.io/google-samples/hello-app:2.0 --port=8080

Confirm they’re up and running wth 1 pod each:

kubectl get deployments
NAME                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
web-example1                    1         1         1            1           12m
web-example2                    1         1         1            1           23m

Now you need a service to back the new deployment’s pods. Expose each deployment with a simple NodePort service on port 8080:

kubectl expose deployment/web-example1 --type="NodePort" --port 8080
kubectl expose deployment/web-example2 --type="NodePort" --port 8080

A NodePort service will effectively assign a port number from your cluster’s service node port range (default between 30000 and 32767) and each node in your cluster will proxy that specific port into your Service on the port you specify. Nodes are not available externally by default and so creating a NodePort service does not expose your service externally either.

Check the services are up and running and have node ports assigned:

kubectl get services
NAME                            TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
web-example1                    NodePort       10.245.125.151   <none>           8080:30697/TCP               13m
web-example2                    NodePort       10.245.198.91    <none>           8080:31812/TCP               24m

DNS pointing to your Load Balancer

Next you’ll want to set up a DNS record to point to your NGINX Ingress Controller Load Balancer IP address. Grab the IP address from the new Kubernetes provisioned Load Balancer for Ingress from the DigitalOcean web console.

Create an A record to point to this IP address.

Create your Ingress Rules

With DNS setup, create a new YAML file called fanout.yaml:

This specification will create an Kubernetes Ingress Resource which your Ingress Controller will use to determine how to route incoming HTTP requests to your Ingress Controller Load Balancer.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: simple-fanout-example
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: example-ingress.yourfancydomainnamehere.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: web-example1
          servicePort: 8080
      - path: /web2/*
        backend:
          serviceName: web-example2
          servicePort: 8080

Make sure you update the host value under the first rule to point to your new DNS record (that fronts your Ingress Controller Load Balancer). i.e. the “example-ingress.yourfancydomainnamehere.com” bit needs to change to your own host / A record you created that points to your own Load Balancer IP address.

The configuration above is a typical “fanout” ingress setup. It provides two rules for two different paths on the host DNS you setup and allows you to route HTTP traffic to different services based on the hostname/path.

This is super useful as you can front multiple different services with a single Load Balancer.

  • example-ingress.yourfancydomainnamehere.com/* -> points to your simple web deployment backed by the web-example1 service you exposed it on. Any request that does not match any other rule will be directed to this service (*).
  • example-ingress.yourfancydomainnamehere.com/web2/* -> points to your web-example2 service. If you hit your hostname with the path /web2/* the request will go to this service.

Testing

Try browse to the first hostname using your own DNS record and try different combinations that match the rules you defined in your ingress rule on HTTP. You should get the web-example1 “hello-app” being served from your web-example1 pod for any request that does not match /web2/*. E.g. /foo.

For /web2/* you should get the web-example2 “hello-app” default web page. It’ll also display the name of the pod it was served from (in my case web-example2-75fd68f658-f8xcd).

Conclusion

Congratulations! You now have a single Load Balancer fronting an NGINX Ingress Controller on DigitalOcean Kubernetes.

You can now expose multiple Kubernetes run services / deployments from a single Ingress and avoid the need to have multiple Load Balancers running (and costing you money!)