The saga pattern is useful when you have transactions that require a bunch of steps to complete successfully, with failure of steps requiring associated rollback processes to run. This post will cover the saga pattern with aws-cdk, leveraging AWS Step Functions and Lambda.
If you need an introduction to the saga pattern in an easy to understand format, I found this GOTO conference session by Caitie McCaffrey very informative.
I’ll be taking things one step further by automating the setup and deployment of a sample app which uses the saga pattern with aws-cdk.
I’ve started using aws-cdk fairly frequently, but realise it has the issue of vendor lock-in. I found it nice to work with in the case of step functions particularly in the way you construct step chains.
Above you see a successful transaction run, where all records are saved to a DynamoDB table entry.
If one of those steps were to fail, you need to manage the rollback process of your transaction from that step backwards.
Illustrating Failure Rollback
As mentioned above, with the saga pattern you’ll want to rollback any steps that have run from the point of failure backward.
The example app has three steps:
Each step is a simple lambda function that writes some data to a DynamoDB table with a primary partition key of TransactionId.
In the screenshot below, TransformRecords has a simulated failure, which causes the lambda function to throw an error.
A catch step is linked to each of the process steps to handle rollback for each of them. Above, TransformRecordsRollbackTask is run when TransformRecordsTask fails.
The rollback steps cascade backward to the first ‘business logic’ step ProcessRecordsTask. Any steps that have run up to that point will therefore have their associated rollback tasks run.
Here is what an entry looks like in DynamoDB if it failed:
You’ll notice this one does not have the ‘Sample’ data that you see in the previously shown successful transaction. In reality, for a brief moment it does have that sample data. As each rollback step is run, the associated data for that step is removed from the table entry, resulting in the above entry for TransactionId 18.
Deploying the Sample Saga Pattern App with aws-cdk
The lambda functions check the payload input for the simulateFail flag, and if found will do a Math.random() check to give chance of failure in one of the process steps.
Taking it Further
To take this example further, you’ll want to more carefully manage step outputs using Step Function ResultPath configuration. This will ensure that your steps don’t overwrite data in the state machine and that steps further down the line have access to the data that they need.
You’ll probably also want a step at the end of the line for the case of failure (which runs after all rollback steps have completed). This can handle notifications or other tasks that should run if a transaction fails.
AWS have a handy post up that shows you how to get CodeBuild local by running it with Docker here.
Having a local CodeBuild environment available can be extremely useful. You can very quickly test your buildspec.yml files and build pipelines without having to go as far as push changes up to a remote repository or incurring AWS charges by running pipelines in the cloud.
I found a few extra useful bits and pieces whilst running a local CodeBuild setup myself and thought I would document them here, along with a summarised list of steps to get CodeBuild running locally yourself.
Place the shell script in your local project directory (alongside your buildspec.yml file).
Now it’s as easy as running this shell script with a few parameters to get your build going locally. Just use the -i option to specify the local docker CodeBuild image you want to run.
./codebuild_build.sh -c -i aws/codebuild/standard:3.0 -a output
The following two options are the ones I found most useful:
-c – passes in AWS configuration and credentials from the local host. Super useful if your buildspec.yml needs access to your AWS resources (most likely it will).
-b – use a buildspec.yml file elsewhere. By default the script will look for buildspec.yml in the current directory. Override with this option.
-e – specify a file to use as environment variable mappings to pass in.
Testing it out
Here is a really simple buildspec.yml if you want to test this out quickly and don’t have your own handy. Save the below YAML as simple-buildspec.yml.
- echo This is a test.
- echo This is the pre_build step
- echo This is the build step
- bash -c "if [ /"$CODEBUILD_BUILD_SUCCEEDING/" == /"0/" ]; then exit 1; fi"
- echo This is the post_build step
Now just run:
./codebuild_build.sh -b simple-buildspec.yml -c -i aws/codebuild/standard:3.0 -a output /tmp
You should see the script start up the docker container from your local image and ‘CodeBuild’ will start executing your buildspec steps. If all goes well you’ll get an exit code of 0 at the end.
I recently came across a scenario requiring CloudWatch log ingestion to a private Splunk HEC (HTTP Event Collector).
The first and preferred method of ingesting CloudWatch Logs into Splunk is by using AWS Firehose. The problem here though is that Firehose only seems to support an endpoint that is open to the public.
This is a problem if you have a Splunk HEC that is only available inside of a VPC and there is no option to proxy public connections back to it.
The next thing I looked at was the Splunk AWS Lambda function template to ingest CloudWatch logs from Log Group events. I had a quick look and it seems pretty out of date, with synchronous functions and libraries in use.
So, I decided to put together a small AWS Lambda Serverless project to improve on what is currently out there.
async / await, and for promised that wrap the synchronous libraries like zlib.
A module that handles identification of Log Group names based on a custom regex pattern. If events come from log groups that don’t match the naming convention, then they get rejected. The idea is that you can write another small function that auto-subscribes Log Groups.
Secrets Manager integration for loading the Splunk HEC token from Secrets Manager. (Or fall back to a simple environment variable if you like).
Serverless framework wrapper. Pass in your Security Group ID, Subnet IDs and tags, and let serverless CLI deploy the function for you.
Lambda VPC support by default. You should deploy this Lambda function in a VPC. You could change that, but my idea here is that most enterprises would be running their own internal Splunk inside of their corporate / VPC network. Change it by removing the VPC section in serverless.yml if you do happen to have a public facing Splunk.
You deploy it using Serverless framework, passing in your VPC details and a few other options for customisation.
Once configured, it’ll pick up any log events coming in from Log Groups you’ve ‘subscribed’ it to (Lambda CloudWatch Logs Triggers).
These events get enriched with extra metadata defined in the function. The metadata is derived by default from the naming convention used in the CloudWatch Log Groups. Take a close look at the included Regex pattern to ensure you name your Log Groups appropriately. Finally, they’re sent to your Splunk HEC for ingestion.
For an automated Log Group ingestion story, write another small helper function that:
Looks for Log Groups that are not yet subscribed as CloudWatch Logs Triggers.
Adds them to your CloudWatch to Splunk HEC function as a trigger and enables it.
In the future I might add this ‘automatic trigger adding function’ to the Github repository, so stay tuned!
Configure the Lambda function in Account B to allow invocation from the SNS topic in Account A
Next, add a resource-based permission policy to your Lambda function in Account B. This policy will effectively allow the specific SNS topic in Account A to invoke the Lambda function.
It’s always good practice to follow the principle of least privilege (POLP). In this case you’re only allowing the specific SNS topic in one account to invoke the specific Lambda function you’re adding the policy to.
Subscribe the Lambda function in Account B to the SNS topic in Account A
Of course you’ll need to actually subscribe the Lambda function to the SNS topic. From Account B (where your Lambda function is setup), run the following command to subscribe it to the SNS topic in Account A.
In this post I’ll explain how I created an AWS EC2 Spot Instance Termination Simulator to run on any EC2 instance.
It can signal typical monitoring scripts, applications, interrupt handlers, and more, just like a real Spot Termination signal would do.
Looking around, there aren’t really any straight forward ways of testing this behaviour.
The obvious way is to change the price you bid for spot instances to the threshold of the current market price. This is so that if there is a slight increase in demand your spot instance(s) will be terminated.
The problem with that approach is of course that you can’t easily predict when the price will move, or by how much.
How EC2 Spot Instance Termination warnings work
When your spot instance bid price is surpassed by the current market price, a Termination Notice becomes available on one of the instance’s metadata endpoints. For example, http://169.254.169.254/latest/meta-data/spot/termination-time.
There is one other, newer endpoint at http://169.254.169.254/latest/meta-data/spot/instance-action that could also be used.
At this point the endpoint returns an HTTP 200 response. A timestamp of when a shutdown signal is going to be sent to your instance’s OS is also returned.
In the case of the newer endpoint, a JSON string is returned with an action and time. Usually the endpoint usually simply returns a 404 not found response.
The tools out there tend to monitor this endpoint to warn and action in case of a Spot Termination notice being received.
The two minute warning feature was added by Amazon back in 2015, announced in this blog post.
The service is a NodePort service, exposing the port on the host it runs on.
List the service and the pod (and find the kubernetes node the pod is running on):
kubectl get svc ec2-spot-termination-simulator
kubectl get pod spot-term-simulator-xxxxxx -o wide
Take note of the NodePort the service is listening on for the Kubernetes node. E.g. port 30626.
docker run -p 30626:80 -e PORT=80 shoganator/ec2-spot-termination-simulator
Proxying the EC2 metadata service
Perform some trickery to proxy traffic destined to 169.254.169.254 on port 80 to localhost (where the container runs). This is so that the fake service can take the place of the real one.
SSH onto the Kubernetes node and run:
sudo ifconfig lo:0 169.254.169.254 up
sudo socat TCP4-LISTEN:80,fork TCP4:127.0.0.1:30626
Create an alias for the localhost interface at 169.254.169.254, effectively taking over the EC2 metadata service and sending that to 127.0.0.1 instead.
Forward TCP port 80 traffic (usually destined to the EC2 metadata service) to 127.0.0.1 on the NodePort 30626. This is where the ec2-spot-termination-simulator pod is running on this host. Substitute this with the correct port in your case.
As a result, it should return a 200 OK response with a timestamp from the fake service. This is the same as the real one would.
Looking at how the kube-spot-termination-notice-handler service works specifically:
This service runs as a DaemonSet, meaning one instance per host. The instance on the node you set the simualtor up on should immediately drain the node. The instance won’t actually be terminated as this was of course just a simulated termination.
If you’re not running on Kubernetes and are using a different spot termination handler, don’t worry. The system you’re using to monitor the EC2 instance metadata endpoint should still take action at this point.
The proxied web service is now returning a legitimate looking termination time notice on http://169.254.169.254/latest/meta-data/spot/termination-time.
Otherwise, read on for step-by-step and more information…
There are a few guides floating around that detail how to install the Weave Net CNI plugin for Amazon Kubernetes clusters (EKS), however I’ve not seen them go into much detail.
Most tend to skip over some important steps and details when it comes to configuring weave and getting the pod networking functioning correctly.
There are also some important caveats that you should be aware of when replacing the AWS CNI Plugin with a different CNI, whether it be Weave, Calico, or any other.
Replacing CNI functionality
You should be 100% happy with what you’ll lose if completely replace the AWS CNI with another CNI. The AWS CNI has some very useful functionality such as:
Assigning IP addresses (via ENIs) to place pods directly into your VPC network
VPC flow logs that make sense
However, depending on your architecture and design decisions, as well as potential VPC network limitations, you may wish to opt out of the CNI that Amazon provides and instead use a different CNI that provides an overlay network with other functionality.
AWS CNI Limitations
One of the problems I have seen in VPCs is limited CIDR ranges, and therefore subnets that are carved up into smaller numbers of IP addresses.
The Amazon AWS CNI plugin is very IP address hungry and attaches multiple Secondary Private IP addresses to EKS worker nodes (EC2 instances) to provide pods in your cluster with directly assigned IPs.
This means that you can easily exhaust subnet IP addresses with just a few EKS worker nodes running.
This limitation also means that those who want high densities of pods running on worker nodes are in for a surprise. The IP address limit becomes an issue for maximum number of pods in these scenarios way before compute capacity becomes a problem.
This page shows the maximum number of ENI’s and Secondary IP addresses that can be used per EC2 instance: https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt
Removing the AWS CNI plugin
Note: This process will involve you needing to replace your existing EKS worker nodes (if any) in the cluster after installing the Weave Net CNI.
Assuming you have a connection to your cluster already, the first thing to do is to remove the AWS CNI.
kubectl -n=kube-system delete daemonset aws-node
With that gone, your future EKS workers will no longer assign multiple Secondary IP addresses from your VPC subnets.
Installing CNI Genie
With the AWS CNI plugin removed, your pods won’t be able to get a network connection when starting up from this point onward.
Installing a basic deployment of CNI Genie is a quick way to get automatic CNI selection working for containers that start from this point on.
CNI genie has tons of other great features like allowing you to customise which CNI containers use when starting up and more.
For now, you’re just using it to allow containers to start-up and use the Weave Net overlay network by default.
Install CNI Genie. This manifest works with Kubernetes 1.12, 1.13, and 1.14 on EKS.
Next, get a Weave Net CNI yaml manifest file. Decide what overlay network IP Range you are going to be using and fill it in for the env.IPALLOC_RANGE query string parameter value in the code block below before making the curl request.
Note: the env.IPALLOC_RANGE query string param added is to specify you want a config with a custom CIDR range. This should be chosen specifically not to overlap with any network ranges shared with the VPC you’ll be deploying into.
In the example above I had a VPC and VPC peers that shared the CIDR block 10.0.0.0/8). Therefore I chose to use 192.168.0.0/16 for the Weave overlay network.
You should be aware of the network ranges you’re using and plan this out appropriately.
The config you now have as weave-cni.yaml will contain the environment variable IPALLOC_RANGE with the correct value that the weave pods will use to setup networking on the EKS Worker nodes.
Apply the weave Net CNI resources:
Note: This manifest is pre-created to use an overlay network range of 192.168.0.0/16
Note: Don’t expect things to change suddenly. The current EKS worker nodes will need to be rotated out (e.g. drain, terminate, wait for new to appear) in order for the IP addresses that the AWS CNI has kept warm/allocated to be released.
If you have any existing EKS workers running, drain them now and terminate/replace them with new workers. This includes the source/destination check change made previously.
kubectl get nodes
kubectl drain nodename --ignore-daemonsets
Remove max pod limits on nodes:
Your worker nodes by default have a limit set on how many pods they can schedule. The EKS AMI sets this based on EC2 type (and the max pods due to the usual ENI limitations / IP address limitations with the AWS CNI).
Check your max pod limits with:
kubectl get nodes -o yaml | grep pods
If you’re using the standard EKS optimized AMI (or a derivative of it) then you can simply pass an option to the bootstrap.sh script located in the image that setup the kubelet and joins the cluster. Set –use-max-pods false as an argument to the script.
For example, your autoscale group launch configuration might get the EC2 worker nodes to join the cluster using the bootstrap.sh script. You can update it like so:
If you’re using the EKS Terraform module you can simply pass in bootstrap-extra-args – this will automatically setup your worker node userdata templates with extra bootstrap arguments for the kubelet. See example here
Checking max-pods limit again after applying this change, you should see the previous pod limit (based on prior AWS CNI max pods for your instance type) removed now.
You’re almost running Weave Net CNI on AWS EKS, but first you need to roll out new worker nodes.
With the Weave Net CNI installed, the kubelet service updated and your EC2 source/destination checks disabled, you can rotate out your old EKS worker nodes, replacing them with the new nodes.
kubectl drain node --ignore-daemonsets
Once the new nodes come up and start scheduling pods, if everything went to plan you should see that new pods are using the Weave overlay network. E.g. 192.168.0.0/16.
A quick run-down on weave IP addresses and routes
If you get a shell to a worker node running the weave overlay network and do a listing of routes, you might see something like the following:
# ip route show
default via 10.254.109.129 dev eth0
10.254.109.128/26 dev eth0 proto kernel scope link src 10.254.109.133
169.254.169.254 dev eth0
192.168.0.0/16 dev weave proto kernel scope link src 192.168.192.0
This routing table shows two main interfaces in use. One from the host (EC2) instance network interfaces itself, eth0, and one from weave called weave.
When network packets are destined for the 10.254.109.128/26 address space, then traffic is routed down eth0.
If traffic on the host is destined for any address on 192.168.0.0/16, it will instead route via the weave interface ‘weave’ and the weave system will handle routing that traffic appropriately.
Otherwise if the traffic is destined for some public IP address out on the wider internet, it’ll go down the default route which is down the interface, eth0. This is a default gateway in the VPC subnet in this case – 10.254.109.129.
Finally, metadata URL traffic for 169.254.169.254 goes down the main host eth0 interface of course.
For the most part everything should work great. Weave will route traffic between it’s overlay network and your worker node’s host network just fine.
However, some of your custom workloads or kubernetes tools might not like being on the new overlay network. For example they might need to talk to other Kubernetes nodes that do not run weave net.
This is now where the limitation of using a managed Kubernetes offering like EKS becomes a bit of a problem.
You can’t run weave on the Kubernetes master / API servers that are effectively the ‘managed’ control plane that AWS EKS hosts for you.
This means that your weave overlay network does not span the Kubernetes master nodes where the Kubernetes API runs.
If you have an application or container in the weave overlay network and the Kubernetes master node / API needs to talk to it, this won’t work.
One potential solution though is to use hostNetwork: true in your pod specification. However you should of course be aware of how this would affect your application and application security.
In my case, I was running metrics-server and it stopped working after it started using Weave. I found out that the Kubernetes API needs to talk to the metrics-server service and of course this won’t work in the overlay network.
Example EKS with Weave Net CNI cluster
You can use the source code I’ve uploaded here.
There are five simple steps to deploy this example EKS cluster in your own account.
Modify the example.tfvars file to fit your own parameters.
terraform plan -var-file="example.tfvars" -out="example.tfplan"
terraform apply "example.tfplan"
Warning: This will create a new VPC, subnets, NAT Gateway instance, Internet Gateway, EKS Cluster, and set of worker node autoscale groups. So be sure Terraform Destroy this if you’re just testing things out.
– Your wallet
After terraform creates all the resources, you can run the two included shell scripts. setup-weave.sh will remove the AWS CNI, install CNI genie, Weave, and deploy two simple example pods and services.
At this point you should terminate your existing worker nodes (that still use the AWS CNI) and wait for your new worker nodes to join the cluster.
test-weave.sh will wait for the hello-node test pods to become ready, and then execute a curl command inside one, talking to the other via the the service and vice versa. If successful, you’ll see a HTTP 200 OK response from each service.