Provision your own Kubernetes cluster with private network topology on AWS using kops and Terraform – Part 2

Getting Started

If you followed and completed the previous blog post, you now have a Kubernetes cluster up and running in your own private AWS VPC, provisioned with kops and Terraform.

In this blog post, you’ll cover the following items:

  • Set up upstream DNS for your cluster
  • Get a Kubernetes Dashboard service and deployment running
  • Deploy a basic metrics dashboard for Kubernetes using heapster, InfluxDB and Grafana

Upstream DNS

In order for services running in your Kubernetes cluster to be able to resolve services outside of your cluster, you’ll now configure upstream DNS.

Containers that are started in the cluster will have their local resolv.conf files automatically set up with what you define in your upstream DNS config map.

Create a ConfigMap with details about your own DNS server to use as upstream. You can also set some external ones like Google DNS for example (see example below):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"yourinternaldomain.local": ["10.254.1.1"]}
  upstreamNameservers: |
    ["10.254.1.1", "8.8.8.8", "8.8.4.4"]

Save your ConfigMap as kube-dns.yaml and apply it to enable it.

kubectl apply -f kube-dns.yaml

You should now see it listed in Config Maps under the kube-system namespace.
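
If you would rather generate the ConfigMap than hand-edit it, the same structure can be sketched in Python. This is only a minimal sketch and not part of the original setup: the function name build_kube_dns_configmap is hypothetical, and the limit of three upstream nameservers is an assumption taken from the kube-dns documentation, so check it against your cluster version. Because JSON is also valid YAML, the printed output can be saved to a file and passed to kubectl apply -f.

```python
import ipaddress
import json


def build_kube_dns_configmap(stub_domains, upstream_nameservers):
    """Return a kube-dns ConfigMap as a plain dict (hypothetical helper)."""
    # Assumption: kube-dns accepts at most three upstream nameservers.
    if len(upstream_nameservers) > 3:
        raise ValueError("at most three upstream nameservers are supported")
    # Fail early on anything that is not a valid IP address.
    for address in upstream_nameservers:
        ipaddress.ip_address(address)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "kube-dns", "namespace": "kube-system"},
        # Both values are stored as JSON-encoded strings, as in the YAML above.
        "data": {
            "stubDomains": json.dumps(stub_domains),
            "upstreamNameservers": json.dumps(upstream_nameservers),
        },
    }


if __name__ == "__main__":
    config_map = build_kube_dns_configmap(
        {"yourinternaldomain.local": ["10.254.1.1"]},
        ["10.254.1.1", "8.8.8.8", "8.8.4.4"],
    )
    print(json.dumps(config_map, indent=2))
```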

Kubernetes Dashboard

Deploying the Kubernetes dashboard is as simple as running one kubectl command.

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

You can then start a dashboard proxy using kubectl to access it right away:

kubectl proxy

Head on over to the following URL to access the dashboard via the proxy you ran:

http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

You can also access the Dashboard via the API server internal elastic load balancer that was set up in part 1 of this blog post series. E.g.

https://your-internal-elb-hostname/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/overview?namespace=default
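
Both URLs above follow the same pattern: the API server proxy path, then the namespace, then a scheme:service:port segment. As a rough illustration (the helper name dashboard_proxy_url is made up for this example), the pattern can be written out like this:

```python
def dashboard_proxy_url(base_url, namespace="kube-system",
                        service="kubernetes-dashboard", scheme="https"):
    """Build the API server proxy URL for a service (hypothetical helper)."""
    # The "<scheme>:<service>:<port name>" segment tells the API server how to
    # reach the service; the port name is left empty, as in the URLs above.
    return (
        f"{base_url.rstrip('/')}/api/v1/namespaces/{namespace}"
        f"/services/{scheme}:{service}:/proxy/"
    )


if __name__ == "__main__":
    # Local kubectl proxy, then the internal ELB from part 1.
    print(dashboard_proxy_url("http://localhost:8001"))
    print(dashboard_proxy_url("https://your-internal-elb-hostname"))
```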

Heapster, InfluxDB and Grafana (now deprecated)

Note: Heapster is now deprecated and there are alternative options you could instead look at, such as what the official Kubernetes git repo refers you to (metrics-server). Nevertheless, here are the instructions you can follow should you wish to enable Heapster and get a nice Grafana dashboard that showcases your cluster, nodes and pods metrics…

Clone the official Heapster git repo down to your local machine:

git clone https://github.com/kubernetes/heapster.git

Change directory to the heapster directory and run:

kubectl create -f deploy/kube-config/influxdb/
kubectl create -f deploy/kube-config/rbac/heapster-rbac.yaml

These commands will essentially launch deployments and services for grafana, heapster, and influxdb.

The Grafana service should attempt to get a LoadBalancer from AWS via your Kubernetes cluster, but if this doesn’t happen, edit the monitoring-grafana service YAML configuration and change the type to LoadBalancer. E.g.

"type": "LoadBalancer",

Save the monitoring-grafana service definition and your cluster should automatically provision a public facing ELB and set it up to point to the Grafana pod.

Note: if you want it available on an internal load balancer instead, you’ll need to create your grafana service with the service.beta.kubernetes.io/aws-load-balancer-internal annotation.
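
To make the internal load balancer option more concrete, here is a small sketch that adds the annotation and the LoadBalancer type to a Service manifest held as a Python dict before you create the service. The helper name with_internal_elb is hypothetical, and the annotation value "true" is an assumption: some older Kubernetes releases documented "0.0.0.0/0" instead, so check the documentation for your cluster version.

```python
import copy

INTERNAL_ELB_ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-internal"


def with_internal_elb(service_manifest, annotation_value="true"):
    """Return a copy of a Service manifest that requests an internal ELB."""
    manifest = copy.deepcopy(service_manifest)
    annotations = manifest.setdefault("metadata", {}).setdefault("annotations", {})
    # Assumption: "true" is accepted by your Kubernetes version.
    annotations[INTERNAL_ELB_ANNOTATION] = annotation_value
    manifest.setdefault("spec", {})["type"] = "LoadBalancer"
    return manifest


if __name__ == "__main__":
    # Trimmed-down stand-in for the real monitoring-grafana service definition.
    grafana = {"apiVersion": "v1", "kind": "Service",
               "metadata": {"name": "monitoring-grafana"}, "spec": {}}
    print(with_internal_elb(grafana))
```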

Grafana dashboard for Kubernetes with Heapster

Now that you have Heapster running, you can also get some metrics displayed directly in your Kubernetes dashboard too.

You may need to restart the dashboard pods to access the new performance stats in the dashboard though. If this doesn’t work, delete the dashboard deployment, service, pods, role, and then re-deploy the dashboard using the same process you followed earlier.

Once it’s up and running, use the DNS name of the new ELB to access Grafana’s dashboard, log in with admin/admin, then change the default admin password to something secure and save. You can now access cluster and performance stats in the Kubernetes dashboard, as well as in Grafana.

Closing off

This concludes part two of this series. To sum up, you managed to configure upstream DNS, deploy the Kubernetes dashboard and set up Heapster to allow you to see metrics in the dashboard, as well as deploying InfluxDB for storing the metric data with Grafana as a front end service for viewing dashboards.

Provision your own Kubernetes cluster with private network topology on AWS using kops and Terraform – Part 1

Goals

In this post series I’ll cover how to provision a brand new self-hosted Kubernetes environment in AWS (on top of EC2 instances) with a specific private networking topology as follows:

  • Deploy into an existing VPC
  • Use existing VPC Subnets
  • Use private networking topology (Calico), with a private/internal ELB to access the API servers/cluster
  • Don’t use Route 53 AWS DNS services or external DNS, instead use Kubernetes gossip DNS service for internal cluster name resolution, and allow for upstream DNS to be set up to your own private DNS servers for outside-of-cluster DNS lookups

This is a more secure setup than a standard kops provisioned Kubernetes cluster, as it places the API servers on a private subnet, yet it still gives you the flexibility of using Load Balanced services in your cluster to expose web services or APIs to the public internet if you wish.

Set up your workstation with the right tools

You need a Linux or macOS based machine to work from (management station/machine). This is because kops only runs on these platforms right now.

sudo apt install python-pip
  • Use pip to install the awscli
pip install awscli --upgrade --user
  • Create yourself an AWS credentials file (~/.aws/credentials) and set it up to use an access and secret key for your kops IAM user you created earlier.
  • Set up the following environment variables to reference later, making sure you fill in the values you require for this new cluster: change the VPC ID, S3 state store bucket name, and cluster NAME.
export ZONES=us-east-1b,us-east-1c,us-east-1d
export KOPS_STATE_STORE=s3://your-k8s-state-store-bucket
export NAME=yourclustername.k8s.local
export VPC_ID=vpc-yourvpcidgoeshere
  • Note that in the exports above, ZONES specifies the Availability Zones where the master nodes in the k8s cluster will be placed. You’ll definitely want these spread out for maximum availability.
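
Before running any kops commands, it can help to sanity-check these variables. The following is a minimal sketch rather than anything kops itself provides: validate_kops_env is a hypothetical helper, and its checks only reflect what this series relies on, namely an s3:// state store and a cluster name ending in .k8s.local so that kops uses gossip DNS.

```python
import os


def validate_kops_env(env):
    """Return a list of problems found in the kops-related variables."""
    problems = []
    for key in ("ZONES", "KOPS_STATE_STORE", "NAME", "VPC_ID"):
        if not env.get(key):
            problems.append(f"{key} is not set")
    name = env.get("NAME", "")
    # A cluster name ending in .k8s.local makes kops use gossip DNS.
    if name and not name.endswith(".k8s.local"):
        problems.append("NAME should end with .k8s.local for gossip DNS")
    store = env.get("KOPS_STATE_STORE", "")
    if store and not store.startswith("s3://"):
        problems.append("KOPS_STATE_STORE should be an s3:// URL")
    return problems


if __name__ == "__main__":
    # Reads the variables exported in the current shell.
    for problem in validate_kops_env(os.environ):
        print(problem)
```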

Set up your S3 state store bucket for the cluster

You can either create this manually, or create it with Terraform. Here is a simple Terraform script that you can throw into your working directory to create it. Just change the name of the bucket to your desired S3 bucket name for this cluster’s state storage.

Remember to use the name for this bucket that you specified in your KOPS_STATE_STORE export variable.

resource "aws_s3_bucket" "state_store" {
  bucket        = "${var.name}-${var.env}-state-store"
  acl           = "private"
  force_destroy = true

  versioning {
    enabled = true
  }

  tags {
    Name        = "${var.name}-${var.env}-state-store"
    Infra       = "${var.name}"
    Environment = "${var.env}"
    Terraformed = "true"
  }
}

Terraform plan and apply your S3 bucket if you’re using Terraform, passing in variables for name/env to name it appropriately…

terraform plan
terraform apply

Generate a new SSH private key for the cluster

  • Generate a new SSH key. By default it will be created in ~/.ssh/id_rsa
ssh-keygen -t rsa

Generate the initial Kubernetes cluster configuration and output it to Terraform script

Use the kops tool to create a cluster configuration, but instead of provisioning it directly, you’ll output it to terraform script. This is important, as you’ll be wanting to change values in this output file to provision the cluster into existing VPC and subnets. You also want to change the ELB from a public facing ELB to internal only.

kops create cluster --master-zones=$ZONES --zones=$ZONES --topology=private --networking=calico --vpc=$VPC_ID --target=terraform --out=. ${NAME}

Above you ran the kops create cluster command and specified to use a private topology with calico networking. You also designated an existing VPC Id, and told the tool to create terraform script as the output in the current directory instead of actually running the create cluster command against AWS right now.

Change your default editor for kops if you require a different one to vim. E.g. for nano:

export EDITOR=nano

Edit the cluster configuration:

kops edit cluster ${NAME}

Change the YAML that sets the loadBalancer type from Public to Internal.

While you are still in the editor for the cluster config, you also need to change the entire subnets section to reference your existing VPC subnets, with egress pointing to your NAT instances. Remove the current subnets section and add the following template, updating it to reference your own private subnet IDs for each availability zone and the correct NAT instance for each. (You might use one NAT instance for all subnets, or you may have multiple.) The Utility subnets should be your public subnets, and the Private subnets your private ones. Make sure that you reference subnets for the correct VPC you are deploying into.

subnets:
- egress: nat-2xcdc5421df76341
  id: subnet-b32d8afg
  name: us-east-1b
  type: Private
  zone: us-east-1b
- egress: nat-04g7fe3gc03db1chf
  id: subnet-da32gge3
  name: us-east-1c
  type: Private
  zone: us-east-1c
- egress: nat-0cd542gtf7832873c
  id: subnet-6dfb132g
  name: us-east-1d
  type: Private
  zone: us-east-1d
- id: subnet-234053gs
  name: utility-us-east-1b
  type: Utility
  zone: us-east-1b
- id: subnet-2h3gd457
  name: utility-us-east-1c
  type: Utility
  zone: us-east-1c
- id: subnet-1gvb234c
  name: utility-us-east-1d
  type: Utility
  zone: us-east-1d
  • Save and exit the file from your editor.
  • Output a new terraform config over the existing one to update the script based on the newly changed ELB type and subnets section.
kops update cluster --out=. --target=terraform ${NAME}
  • The updated file is now output to kubernetes.tf in your working directory
  • Run a terraform plan from your terminal, and make sure that the changes will not affect any existing infrastructure, and will not create or change any subnets or VPC related infrastructure in your existing VPC. It should only list out a number of new infrastructure items it is going to create.
  • Once happy, run terraform apply from your terminal
  • Once terraform has run with the new kubernetes.tf file, the certificate will only allow the standard named cluster endpoint connection (cert only valid for api.internal.clustername.k8s.local for example). You now need to re-run kops update and output to terraform again.
kops update cluster $NAME --target=terraform --out=.
  • This will update the cluster state in your S3 bucket with new certificate details, but not actually change anything in the local kubernetes.tf file (you shouldn’t see any changes here). However, you can now run a rolling update with the --cloudonly, --force and --yes options:
kops rolling-update cluster $NAME --cloudonly --force --yes

This will roll all the masters and nodes in the cluster (the created autoscaling groups will initialise new nodes from the launch configurations) and when the ASGs initiate new instances, they’ll get the new certs applied from the S3 state storage bucket. You can then access the ELB endpoint on HTTPS, and you should get an auth prompt popup.

Find the endpoint on the internal ELB that was created. The rolling update may take around 10 minutes to complete, and as mentioned before, will terminate old instances in the Autoscaling group and bring new instances up with the new certificate configuration.
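
The subnets section edited earlier is easy to get wrong, so a quick consistency check can be useful before generating the Terraform output. The sketch below is an illustration only: check_subnets is a hypothetical helper, and the subnets are written out by hand as Python dicts instead of being read from the kops cluster spec.

```python
def check_subnets(subnets, zones):
    """Return a list of problems in a kops subnets section (hypothetical helper)."""
    problems = []
    for zone in zones:
        private = [s for s in subnets if s["zone"] == zone and s["type"] == "Private"]
        utility = [s for s in subnets if s["zone"] == zone and s["type"] == "Utility"]
        if not private:
            problems.append(f"{zone}: no Private subnet")
        if not utility:
            problems.append(f"{zone}: no Utility subnet")
        for subnet in private:
            # Private subnets need an egress (NAT) entry, as in the template.
            if not subnet.get("egress"):
                problems.append(f"{subnet['name']}: Private subnet has no egress")
    return problems


if __name__ == "__main__":
    subnets = [
        {"name": "us-east-1b", "type": "Private", "zone": "us-east-1b",
         "id": "subnet-b32d8afg", "egress": "nat-2xcdc5421df76341"},
        {"name": "utility-us-east-1b", "type": "Utility", "zone": "us-east-1b",
         "id": "subnet-234053gs"},
    ]
    # Only us-east-1b is defined here, so the other two zones are reported.
    for problem in check_subnets(subnets, ["us-east-1b", "us-east-1c", "us-east-1d"]):
        print(problem)
```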

Tag your public subnets to allow auto provisioning of ELBs for Load Balanced Services

In order to allow Kubernetes to automatically create load balancers (ELBs) in AWS for services that use the LoadBalancer configuration, you need to tag your utility subnets with a special tag to allow the cluster to find these subnets automatically and provision ELBs for any services you create on-the-fly.

Tag the subnets that you are using as utility subnets (public) with the following tag:

Key: kubernetes.io/role/elb Value: (Don’t add a value, leave it blank)

Tag your private subnets for internal-only ELB provisioning for Load Balanced Services

In order to allow Kubernetes to automatically create load balancers (ELBs) in AWS for services that use the LoadBalancer configuration and a private facing configuration, you need to tag the private subnets that the cluster operates in with a special tag to allow k8s to find these subnets automatically.

Tag the subnets that you are using as private (where your nodes and master nodes should be running now) with the following two tags:

Key: kubernetes.io/cluster/{yourclusternamehere.k8s.local} Value: shared
Key: kubernetes.io/role/internal-elb Value: 1

As an example for the above, the key might end up as “kubernetes.io/cluster/yourclusternamehere.k8s.local” if your cluster is named “yourclusternamehere.k8s.local” (remember you named your cluster when you set the NAME export variable on your workstation).
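
The two tagging rules can be summarised in a few lines of Python. This is just a sketch of the key/value pairs described above, with a made-up helper name (subnet_tags); applying the tags to your subnets is still done through the AWS console, CLI or Terraform.

```python
def subnet_tags(cluster_name, subnet_type):
    """Return the tags described above for a 'utility' or 'private' subnet."""
    if subnet_type == "utility":
        # Public subnets: the value is left blank.
        return {"kubernetes.io/role/elb": ""}
    if subnet_type == "private":
        return {
            f"kubernetes.io/cluster/{cluster_name}": "shared",
            "kubernetes.io/role/internal-elb": "1",
        }
    raise ValueError(f"unknown subnet type: {subnet_type!r}")


if __name__ == "__main__":
    print(subnet_tags("yourclusternamehere.k8s.local", "utility"))
    print(subnet_tags("yourclusternamehere.k8s.local", "private"))
```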

Closing off

This concludes part one of this series for now.

As a summary, you should now have a kubernetes cluster up and running in your private subnets, spread across availability zones, and you’ve done it all using kops and Terraform.

Straighten things out by creating a git repository and committing your Terraform artifacts for the cluster to version control. Watch out for the artifacts that kops output along with the Terraform script, like the private certificate files – these should be kept safe.

Part two should be coming soon, where we’ll run through some more tasks to continue setting the cluster up like setting upstream DNS, provisioning the Kubernetes Dashboard service/pod and more…

Streamlining AWS AMI image creation and management with Packer

If you want to set up quick and efficient provisioning and automation pipelines and you rely on machine images as a part of this framework, you’ll definitely want to prepare and maintain preconfigured images.

With AWS you can of course leverage Amazon’s AMIs for EC2 machine images. If you’re configuring autoscaling for an application, you definitely don’t want to be setting up your launch configurations to launch new EC2 instances using base Amazon AMI images and then installing any prerequisites your application may need at runtime. This will be slow and tedious and will lead to sluggish and unresponsive auto scaling.

Packer comes in at this point as a great tool to script, automate and pre-bake custom AMI images. (Packer is a tool by Hashicorp, of Terraform fame). Packer also enables us to store our image configuration in source control and set up pipelines to test our images at creation time, so that when it comes time to launching them, we can be confident they’ll work.

Packer doesn’t only work with Amazon AMIs. It supports tons of other image formats via different Builders, so if you’re on Azure or some other cloud or even on-premise platform you can also use it there.

Below I’ll be listing out the high level steps to create your own custom AMI using Packer. It’ll be Windows Server 2016 based, enable WinRM connections at build time (to allow Packer to remote in and run various setup scripts), handle sysprep, EC2 configuration like setting up the administrator password, EC2 computer name, etc, and will even run some provisioning tests with Pester.

You can grab the files / policies required to set this up on your own from my GitHub repo here.

Setting up credentials to run Packer and an IAM role for your Packer build machine to assume

First things first, you need to be able to run Packer with the minimum set of permissions it needs. You can run packer on an EC2 instance that has an EC2 role attached that provides it the right permissions, or if you’re running from a workstation, you’ll probably want to use an IAM user access/secret key.

Here is an IAM policy that you can use for either of these. Note it also includes an iam:PassRole statement that references an AWS account number and specific role. You’ll need to update the account number to your own, and create the Role called Packer-S3-Access in your own account.

IAM Policy for user or instance running Packer:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachVolume",
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:CopyImage",
                "ec2:CreateImage",
                "ec2:CreateKeypair",
                "ec2:CreateSecurityGroup",
                "ec2:CreateSnapshot",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:DeleteKeypair",
                "ec2:DeleteSecurityGroup",
                "ec2:DeleteSnapshot",
                "ec2:DeleteVolume",
                "ec2:DeregisterImage",
                "ec2:DescribeImageAttribute",
                "ec2:DescribeImages",
                "ec2:DescribeInstances",
                "ec2:DescribeRegions",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSnapshots",
                "ec2:DescribeSubnets",
                "ec2:DescribeTags",
                "ec2:DescribeVolumes",
                "ec2:DetachVolume",
                "ec2:GetPasswordData",
                "ec2:ModifyImageAttribute",
                "ec2:ModifyInstanceAttribute",
                "ec2:ModifySnapshotAttribute",
                "ec2:RegisterImage",
                "ec2:RunInstances",
                "ec2:StopInstances",
                "ec2:TerminateInstances",
                "ec2:RequestSpotInstances",
                "ec2:CancelSpotInstanceRequests"
            ],
            "Resource": "*"
        },
        {
            "Effect":"Allow",
            "Action":"iam:PassRole",
            "Resource":"arn:aws:iam::YOUR_AWS_ACCOUNT_NUMBER_HERE:role/Packer-S3-Access"
        }
    ]
}

IAM Policy to attach to the new Role called Packer-S3-Access. (Note: replace the S3 bucket name referenced below with the name of your own bucket, which will hold the files to provision into your AMI images.) See a little further down for details on the bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3BucketListing",
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::YOUR-OWN-PROVISIONING-S3-BUCKET-HERE"
            ],
            "Condition": {
                "StringEquals": {
                    "s3:prefix": [
                        "",
                        "Packer/"
                    ],
                    "s3:delimiter": [
                        "/"
                    ]
                }
            }
        },
        {
            "Sid": "AllowListingOfdesiredFolder",
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::YOUR-OWN-PROVISIONING-S3-BUCKET-HERE"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "Packer/*"
                    ]
                }
            }
        },
        {
            "Sid": "AllowAllS3ActionsInFolder",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::YOUR-OWN-PROVISIONING-S3-BUCKET-HERE/Packer/*"
            ]
        }
    ]
}

This will allow Packer to use the iam_instance_profile configuration value to specify the Packer-S3-Access EC2 role in your image definition file. Essentially, this allows your temporary Packer EC2 instance to assume the Packer-S3-Access role, which grants the temporary instance enough privileges to download some bootstrapping files / artifacts you may wish to bake into your custom AMI. This is reasonably secure too: the policy only allows the Packer instance to assume this one role, and the instance itself is temporary.

Setting up your Packer image definition

Once the above policies and roles are in place, you can set up your main packer image definition file. This is a JSON file that will describe your image definition as well as the scripts and items to provision inside it.

Look at standardBaseImage.json in the GitHub repository to see how this is defined.

standardBaseImage.json

{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "instance_type": "t2.small",
    "ami_name": "Shogan-Server-2012-Build-{{isotime \"2006-01-02\"}}-{{uuid}}",
    "iam_instance_profile": "Packer-S3-Access",
    "user_data_file": "./ProvisionScripts/ConfigureWinRM.ps1",
    "communicator": "winrm",
    "winrm_username": "Administrator",
    "winrm_use_ssl": true,
    "winrm_insecure": true,
    "source_ami_filter": {
      "filters": {
        "name": "Windows_Server-2012-R2_RTM-English-64Bit-Base-*"
      },
      "most_recent": true
    }
  }],
  "provisioners": [
    {
        "type": "powershell",
        "scripts": [
            "./ProvisionScripts/EC2Config.ps1",
            "./ProvisionScripts/BundleConfig.ps1",
            "./ProvisionScripts/SetupBaseRequirementsAndTools.ps1",
            "./ProvisionScripts/DownloadAndInstallS3Artifacts.ps1"
        ]
    },
    {
        "type": "file",
        "source": "./Tests",
        "destination": "C:/Windows/Temp"
    },
    {
        "type": "powershell",
        "script": "./ProvisionScripts/RunPesterTests.ps1"
    },
    {
        "type": "file",
        "source": "PesterTestResults.xml",
        "destination": "PesterTestResults.xml",
        "direction": "download"
    }
  ],
  "post-processors": [
    {
        "type": "manifest"
    }
  ]
}

When Packer runs it will build out an EC2 machine as per the definition file, copy any contents specified to copy, and provision and execute any scripts defined in this file.

The packer image definition in the repository I’ve linked above will:

  • Create a Server 2012 R2 base instance.
  • Enable WinRM for Packer to be able to connect to the temporary instance.
  • Run sysprep to generalize it.
  • Set up EC2 configuration.
  • Download a bunch of tools (including Pester for running tests once the image build is done).
  • Download any S3 artifacts you’ve placed in a specific bucket in your account and store them on the image.
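
One detail in the definition file that is easy to misread is the ami_name value. Packer’s isotime function takes a Go time layout, in which "2006-01-02" is Go’s reference date and simply means year-month-day, and uuid adds a random identifier so that each build gets a unique name. The Python below is only an approximation of the resulting name for illustration (render_ami_name is a hypothetical helper and the use of UTC is an assumption); Packer does the real templating itself.

```python
import uuid
from datetime import datetime, timezone


def render_ami_name(prefix="Shogan-Server-2012-Build", now=None, unique_id=None):
    """Approximate the AMI name Packer would generate (illustrative only)."""
    now = now or datetime.now(timezone.utc)
    unique_id = unique_id or uuid.uuid4()
    # Go's "2006-01-02" layout corresponds to "%Y-%m-%d" in strftime terms.
    return f"{prefix}-{now.strftime('%Y-%m-%d')}-{unique_id}"


if __name__ == "__main__":
    print(render_ami_name())
```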

S3 Downloads into your AMI during build

Create a new S3 bucket and give it a unique name of your choice. Set it to private, and create a new virtual folder inside the bucket called Packer. This bucket should have the same name you specified in the Packer-S3-Access role policy defined above.

Place any software installers or artifacts you would like to be baked into your image in the /Packer virtual folder.

Update the DownloadAndInstallS3Artifacts.ps1 script to reference any software installers and execute the installers. (See the commented out section for an example). This PowerShell script will download anything under the /Packer virtual folder and store it in your image under C:\temp\S3Downloads.

Testing

Finally, you can add your own Pester tests to validate tasks carried out during the Packer image creation.

Define any custom tests under the /Tests folder.

Here is a simple test that checks that the S3 download for items from /Packer was successful (the Read-S3Object cmdlet will create the folder and download items into it from your bucket):

Describe 'S3 Artifacts Downloads' {
    It 'downloads artifacts from S3' {
        "C:\temp\S3Downloads" | Should -Exist
    }
}

The main image definition file ensures that these are all copied into the image at build time (to the temp directory) and from there Pester executes them.

Hook up your image build process to a build system like TeamCity and you can get it to output the results of the tests from PesterTestResults.xml.

Have fun automating and streamlining your image builds with Packer!

Changing DNS on Azure IaaS VM’s NIC forces RDP / network disconnect

I just noticed this happen to a VM I was connected to this evening.

All I did was change the primary DNS from automatically assigned to manual, gave it a DNS server IP, and provided a backup secondary IP, and my RDP session was instantly dropped. Other HTTPS traffic to the box stopped too.

I had to restart the VM in Azure to get connectivity back. This VM was deployed using the classic portal, but I’ve seen reports of it happening on newer ARM deployed VMs too. Here’s a thread with others that have found the same issue.

Hopefully Microsoft will resolve this soon.

Scaling Web API 2 and back-end SQL databases in Azure

I recently created a small Web API 2 project running with a back-end SQL database (Entity Framework code first), and had it deployed to an Azure web app, along with Azure SQL.

Naturally, I started it off using the free web app and one of the cheapest possible Azure SQL tiers (S0 – 10 DTUs).

After I finished working on the API, I wanted to see what sort of performance I could get out of it, by using Azure’s various scaling options.

To test I used Loader.io. This is a really nice and easy to use load testing service by SendGrid Labs. The free edition allows me to set up various API endpoint tests and run many concurrent connections for up to 1 minute at a time.

All my tests below were done using the same GET request test. The request always returned a collection of 5 x objects from the /Animals endpoint to keep things consistent.

My initial test was against the F1 free app tier for the Web app, with the SQL database running on S0 (10 DTUs). Here are the results of sending 500 requests per second for 1 minute.

S0-10DTU-result

The API struggled to complete the full 60k requests over 1 minute, and only completed about 8k requests, with an average response time of 4638ms. Terrible, but then again we are running on very low performance, cheap tiers. I had a look at the database performance stats and noticed that the DTUs were capped out at 100% during the 1 minute load test. At this point it definitely seems to be the database performance holding things back.

Scaling the database up to the S1 tier (20 DTUs) gives a definite improvement in response times and number of requests able to be sent within one minute. If we look at the database performance stats in the portal, we can now see that the DTUs are still maxing out at 100% though.

S1-20DTU-result

20-DTUs-maxed out

At this point I decided I would increase database performance again, but throw more requests per second at the API (from 500/second up to 1000/second).

After scaling the database up to S2 (50 DTUs) and throwing more requests a second at the API, the total number of requests completed is higher – up by about an extra 5k. Taking a look at the DTU performance stats, we can see they now maxed out at around 60%. At this point it is pretty clear that the database is no longer the bottleneck.

50-DTUs-maxed out at 60% - even with doubling the requests per second from 500 to 1000

50-DTUs-maxed out at 60%

Now I scaled the web app tier up from free, to the B1 (Basic) tier, which gives you 1 Core, 1.75GB RAM, and up to 3 x instances scaled manually. I started with just the default 1 instance and ran the 1000 req/second for 1 minute test again.

boo-test-failed-error-rate-higher-than-50% due to timeouts

The results were pretty dismal compared to the free tier now. In fact the test failed due to an error rate of greater than 50% (all caused by timeouts). It is important to remember that we have not yet scaled out from the default 1 instance though.

Scaling out to 2 x instances on the B1 tier helped quite a bit. The test now completes, and has a much smaller timeout error rate. Many more responses were served, but the response rate was quite slow. Taking a look at the distribution of CPU time over the two instances, we can also see that the traffic is indeed being split between the two instances we’ve scaled out with.

scale-B1-basic-from-1-to-2-instances

yay-test-finished-with much smaller error rate

processor time spread over two instances during load test

Taking this one step further to 3 x instances, and re-running the test nets us the best result so far. No timeout errors, and a response time averaging around 3000ms. Much better, but still quite a high response time, and not all 60k requests are being served.

I scaled up to the B2 tier for the following run. Each instance has 2 x cores and 3.5GB RAM this time. Starting at 1 x instance and running the test on these higher specification web instances seems to now handle things a lot better.

Little to no timeout errors, with about 5000ms avg response time, but using only 1 x instance this time!

Pushing things right up to 3 x instances (2 cores and 3.5GB RAM each) nets us the best result yet. The average response time is down to 1700ms and there are no timeout errors at all. The API was able to handle 49000 requests in the 1 minute test, which is the highest number of requests it has been able to handle so far.

B2-basic-test-with-3x-instances-good-result

I scaled up to the B3 tier from here, and tried another few runs using 3 x instances (at 4 x cores and 7GB RAM each). This didn’t help things much, netting around 200ms better response time, for a much pricier tier. It therefore looks like the sweet spot for this kind of work is to scale out with medium sized instances (2 x cores each), rather than scaling up too much.

I changed the tier to S2 (2 x cores 3.5GB RAM each, but allowing up to 10 x instances scaled out) and this time, running the test gave very similar results to 3 x instances. Clearly, the instances were now no longer the bottleneck. Looking back at the database performance, I saw that the DTUs were maxing out at around 90%. It was clear that there must have been some throttling happening there now.

I changed the database DTUs to 100 using the S3 tier, and re-ran the test once more.

bingo-60k-requests

Bingo! We’re now managing to serve the test’s 1000 requests a second, and over the 1 minute test, we get all 60k requests served successfully, and have a reasonable average response time of roughly 300-400ms.
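
The arithmetic behind these numbers is straightforward, and writing it down makes the runs easier to compare. As a quick sketch (the function names are my own): a 1 minute test at 1000 requests a second should produce 60k requests, so the earlier B2 run that completed 49000 requests served roughly 82% of them.

```python
def expected_requests(requests_per_second, duration_seconds):
    """Total requests a test should send at a constant rate."""
    return requests_per_second * duration_seconds


def completion_ratio(completed, requests_per_second, duration_seconds):
    """Fraction of the expected requests that were actually served."""
    return completed / expected_requests(requests_per_second, duration_seconds)


if __name__ == "__main__":
    print(expected_requests(1000, 60))                  # prints 60000
    print(f"{completion_ratio(49000, 1000, 60):.1%}")   # prints 81.7%
```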

I made a quick change to the GET method in the API for this endpoint to gather items from the database asynchronously, and running the same test again, now gets us all the way down to an average response time of just 100ms over the 60k requests in one minute. Excellent!

100ms-test-result

As you can see, by running load tests like this and trying out different scaling options for the front end and back end, scaling each one whenever test results or performance metrics reveal a bottleneck, you can work out the best specification for your database and web apps.

Deploying a simple linked container web app with Docker

This is a simple guide on how to deploy a multi-container ‘linked’ web app using Docker.

If you have not yet installed or set up a Docker host to run the containers on, here is my guide on setting up a basic Ubuntu 16.04 Docker host VM.

The ‘web app’ we’ll be looking at how to deploy will consist of two basic components – a MySQL database for the back-end, and a simple PHP script for the ‘web front-end’ which simply connects to the MySQL container and displays some info from a database table.

simple-web-app-linked-diagram

For the MySQL container we’ll be using the official ‘mysql-server’ image from the Docker repository, and for our web front-end, we’ll be creating our own Docker image using a custom Dockerfile we’ll craft ourselves, based on an Ubuntu 15.04 image.

This means we’ll be covering the following Docker basics:

  • Running docker containers
  • Linking docker containers (more secure than exposing ports directly)
  • Creating custom docker images using a Dockerfile
  • Building a custom image

Start off by creating a new directory in your home directory called ‘web01’ to create and store the Dockerfile we’ll be using to build our custom web front-end image. Then create an empty file called ‘Dockerfile’ in this directory and edit it using your favourite text editor. I’m using nano for this.
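
As a minimal sketch, the directory and the empty Dockerfile could be created from the shell like this. The ~/web01 path comes from the text above; opening the file in nano is left as a comment because it is interactive.

```shell
# Create the build directory and an empty Dockerfile inside it.
mkdir -p "$HOME/web01"
touch "$HOME/web01/Dockerfile"

# Then open the file in an editor of your choice, for example:
#   nano ~/web01/Dockerfile
```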

This is what your new Dockerfile should look like:

The commands do the following:

  • FROM – tells docker build to base this image build on the ubuntu:15.04 image
  • RUN – strings a few apt-get commands together to install apache, php5, and a few other tools like curl. Chaining them into a single RUN is important, as every RUN command in a Dockerfile creates a new image layer, and we don’t want our image to contain too many layers.
  • The last RUN command grabs the content from a gist I created which is a basic PHP script, and places it in the /var/www/html directory in the container, then deletes the default index.html file that apache places there. This is the script that will connect to our MySQL container and display some basic info (our basic ‘web app’).
  • EXPOSE – exposes port 80 so we can map this to our Docker host and access the website outside of the container.
  • CMD – runs the apache2 service with PID 1 when the container starts.
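
The steps described above could be sketched roughly as follows. This is a reconstruction under stated assumptions, not the original listing: the package names (php5-mysql in particular), the index.php file name, the apache2ctl foreground command and the INDEX_PHP_URL build argument are all my own guesses. INDEX_PHP_URL is a hypothetical placeholder for the raw URL of the PHP gist, which is not reproduced here. Note also that ubuntu:15.04 is end-of-life, so apt-get may no longer find its package mirrors.

```shell
# Write a Dockerfile that follows the FROM / RUN / EXPOSE / CMD outline above.
# The quoted 'EOF' keeps $INDEX_PHP_URL and the backslashes literal in the file.
mkdir -p "$HOME/web01"
cat > "$HOME/web01/Dockerfile" <<'EOF'
FROM ubuntu:15.04

# One chained RUN keeps the number of image layers down.
RUN apt-get update && \
    apt-get install -y apache2 php5 php5-mysql curl

# Hypothetical build argument: raw URL of the PHP script (assumption).
ARG INDEX_PHP_URL

# Fetch the PHP script and remove the default apache index page.
RUN curl -fsSL "$INDEX_PHP_URL" -o /var/www/html/index.php && \
    rm /var/www/html/index.html

EXPOSE 80

# Run apache in the foreground so it stays the container's main process.
CMD ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"]
EOF
```

With this sketch, the script URL would have to be supplied at build time by adding --build-arg INDEX_PHP_URL=... to the docker build command.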

Now you can build the Dockerfile and create your own custom image, which is what will be used to start the web container later.

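Use the following build command to build the new image from your custom Dockerfile. Note that docker build takes the directory containing the Dockerfile (the build context) as its argument, not the path to the Dockerfile itself:

docker build -t web01image ~/web01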

Run ‘docker images’ after the build completes and you should see the new image listed:

docker-images

Next, you’ll run a new container using the official mysql-server image from the Docker repository. You won’t yet have this image locally, but the command will automatically download the image for you.

docker run --name db01 -e MYSQL_ROOT_PASSWORD=MyRootPassword -d mysql/mysql-server:latest

Note that I’ve called my container ‘db01’ and given it a root password of ‘MyRootPassword’. The -e parameter specifies that an environment variable called MYSQL_ROOT_PASSWORD inside the container should be given the value of ‘MyRootPassword’. The MySQL container then uses this environment variable to set up the root user for MySQL when the container starts.

Now that the database container is up and running (run ‘docker ps’ to check it’s running), you can deploy the custom web container using the image you created above. In this docker run command, you’ll also link the web container to the db01 container you previously started, using the --link parameter.

The web container will be given environment variables with information telling it about the networking config of the DB container. These environment variables will then be accessed by the simple PHP script to tell it where to find the database server, and what credentials to use to connect.

docker run --name=web01 --link=db01:mysql -d -p 80:80 web01image

Important: notice that in the --link parameter, the name of the database/MySQL container is specified before the colon, and the alias the web container will know it by (mysql) after it. Make sure you use the exact name you gave your MySQL database container here – this ensures that the linking of the two containers is correct. The last ‘web01image’ bit specifies that the container you are running should be based on the newly built ‘web01image’.

The -p parameter maps the exposed port 80 in the container to port 80 on the docker host, so you’ll be able to access the website by using http://dockerhost:80

Check that the new web container and previously created MySQL container are running by using the ‘docker ps’ command.

docker-ps-output

Out of interest, this is what the PHP script looks like (this is what is downloaded and placed on the web container as a RUN build step in the Dockerfile you created above):

You can see the environment variables that the PHP script grabs (top of the script) to establish the database connection from the docker container. These environment variables are created and populated by linking the web container to the db container using the --link parameter.
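
To make the naming of those variables more concrete, here is a small sketch of the pattern Docker’s legacy links use: the link alias is upper-cased and used as a prefix. The helper name link_env_names is hypothetical, and port 3306 is assumed as the MySQL port exposed by the db container; only three of the generated variables are listed.

```shell
# Print some of the variable names a legacy --link is expected to create
# for a given alias and exposed TCP port (illustrative helper, not a Docker command).
link_env_names() {
  prefix=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  port=$2
  printf '%s\n' \
    "${prefix}_PORT_${port}_TCP_ADDR" \
    "${prefix}_PORT_${port}_TCP_PORT" \
    "${prefix}_ENV_MYSQL_ROOT_PASSWORD"
}

# Alias "mysql" comes from --link=db01:mysql; 3306 is an assumption.
link_env_names mysql 3306

# On the Docker host, the real values can be inspected with:
#   docker exec web01 env | grep '^MYSQL_'
```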

Lastly, you may want to create a sample database, table and some data for the simple ‘web app’ to display after it connects to the database container. Issue the following ‘docker exec’ command, which will add the sample database, create a sample table, and add some sample data.

Make sure you change the ‘MyRootPassword’ bit to whatever root MySQL password you chose when you ran the MySQL container above, and ensure you run exec against the name of the MySQL container you chose (I used db01). Keep the database name and the rest of the command intact, as the PHP script relies on these staying the same.

docker exec db01 mysql -u root -pMyRootPassword -e "create database testdb1; use testdb1; CREATE TABLE events (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20), signup_date DATE); INSERT INTO events (id,name,signup_date) VALUES (NULL, 'MySpecialEvent', '2016-06-11');"

Finally, browse to http://dockerhostnameorip and you should see the simple PHP script display some basic info, stating it was able to connect to the MySQL server and display the sample data in the database.

simple-php-web-app-display