Cloud Archives - Shogan.tech

Cheap S3 Cloud Backup with BackBlaze B2

July 6, 2020July 5, 2020 by Sean

white and blue fiber optic cables in a FC storage switch

I’ve been constantly evolving my cloud backup strategies to find the ultimate cheap S3 cloud backup solution.

The reason for sticking to “S3” is because there are tons of cloud provided storage service implementations of the S3 API. Sticking to this means that one can generally use the same backup/restore scripts for just about any service.

The S3 client tooling available can of course be leveraged everywhere too (s3cmd, aws s3, etc…).

BackBlaze B2 gives you 10GB of storage free for a start. If you don’t have too much to backup you could get creative with lifecycle policies and stick within the 10GB free limit.

a lifecycle policy to delete objects older than 7 days.

Current Backup Solution

This is the current solution I’ve setup.

I have a bunch of files on a FreeNAS storage server that I need to backup daily and send to the cloud.

I’ve setup a private BackBlaze B2 bucket and applied a lifecycle policy that removes any files older than 7 days. (See example screenshot above).

I leveraged a FreeBSD jail to install my S3 client (s3cmd) tooling, and mount my storage to that jail. You can follow the steps below if you would like to setup something similar:

Step-by-step setup guide

Create a new jail.

Enable VNET, DHCP, and Auto-start. Mount the FreeNAS storage path you’re interested in backing up as read-only to the jail.

The first step in a clean/base jail is to get s3cmd compiled and installed, as well as gpg for encryption support. You can use portsnap to get everything downloaded and ready for compilation.

portsnap fetch
portsnap extract # skip this if you've already run extract before
portsnap update

cd /usr/ports/net/py-s3cmd/
make -DBATCH install clean
# Note -DBATCH will take all the defaults for the compile process and prevent tons of pop-up dialogs asking to choose. If you don't want defaults then leave this bit off.

# make install gpg for encryption support
cd /usr/ports/security/gnupg/ && make -DBATCH install clean

The compile and install process takes a number of minutes. Once complete, you should be able to run s3cmd –configure to set up your defaults.

For BackBlaze you’ll need to configure s3cmd to use a specific endpoint for your region. Here is a page that describes the settings you’ll need in addition to your access / secret key.

After gpg was compiled and installed you should find it under the path /usr/local/bin/gpg, so you can use this for your s3cmd configuration too.

Double check s3cmd and gpg are installed with simple version checks.

gpg --version
s3cmd --version

A simple backup shell script

Here is a quick and easy shell script to demonstrate compressing a directory path and all of it’s contents, then uploading it to a bucket with s3cmd.

DATESTAMP=$(date "+%Y-%m-%d")
TIMESTAMP=$(date "+%Y-%m-%d-%H-%M-%S")

tar --exclude='./some-optional-stuff-to-exclude' -zcvf "/root/$TIMESTAMP-backup.tgz" .
s3cmd put "$TIMESTAMP-backup.tgz" "s3://your-bucket-name-goes-here/$DATESTAMP/$TIMESTAMP-backup.tgz"

Scheduling the backup script is an easy task with crontab. Run crontab -e and then set up your desired schedule. For example, daily at 25 minutes past 1 in the morning:

25 1 * * * /root/backup-script.sh

My home S3 backup evolution

I’ve gone from using Amazon S3, to Digital Ocean Spaces, to where I am now with BackBlaze B2. BackBlaze is definitely the cheapest option I’ve found so far.

Amazon S3 is overkill for simple home cloud backup solutions (in my opinion). You can change to use infrequent access or even glacier tiered storage to get the pricing down, but you’re still not going to beat BackBlaze on pure storage pricing.

Digital Ocean Spaces was nice for a short while, but they have an annoying minimum charge of $5 per month just to use Spaces. This rules it out for me as I was hunting for the absolute cheapest option.

BackBlaze currently has very cheap storage costs for B2. Just $0.005 per GB and only $0.01 per GB of download (only really needed if you want to restore some backup files of course).

Concluding

You can of course get more technical and coerce a willing friend/family member to host a private S3 compatible storage service for you like Minio, but I doubt many would want to go to that level of effort.

So, if you’re looking for a cheap S3 cloud backup solution with minimal maintenance overhead, definitely consider the above.

This is post #4 in my effort towards 100DaysToOffload.

AWS SNS to Lambda Cross Account Setup

February 23, 2020 by Sean

You’ve got a SNS topic in Account A and you wish to subscribe a Lambda function to this topic in Account B.

Setting this up requires configuration on both account sides with resource-based permission policies being applied to SNS in one account and Lambda in the other.

In other words, you’ll need to setup the permissions for SNS and Lambda to allow both subscription and invocation.

Getting Started

You should already have your SNS topic in Account A and a suitable Lambda function subscriber in Account B. For example:

Account A Id: 5556667778 (SNS topic lives here)
Account B Id: 12345678901 (Lambda function lives here)

Configure SNS topic in Account A to allow Subscriptions from Account B

Use the AWS CLI to add a resource-based permission policy to the SNS topic (using it’s ARN). This will allow the Receive and Subscribe actions from Account B.

aws sns add-permission \
    --topic-arn "arn:aws:sns:us-east-1:5556667778:cross-account-topic" \
    --label "AllowSubscriptionFromAccountB" \
    --aws-account-id "12345678901" \
    --action-name "Receive" "Subscribe"

Configure the Lambda function in Account B to allow invocation from the SNS topic in Account A

Next, add a resource-based permission policy to your Lambda function in Account B. This policy will effectively allow the specific SNS topic in Account A to invoke the Lambda function.

It’s always good practice to follow the principle of least privilege (POLP). In this case you’re only allowing the specific SNS topic in one account to invoke the specific Lambda function you’re adding the policy to.

aws lambda add-permission \
    --function-name "cross-account-lambda-subscriber" \
    --statement-id "AllowInvokeFromExampleSns" \
    --principal "sns.amazonaws.com" \
    --action "lambda:InvokeFunction" \
    --source-arn "arn:aws:sns:us-east-1:5556667778:cross-account-topic"

Subscribe the Lambda function in Account B to the SNS topic in Account A

Of course you’ll need to actually subscribe the Lambda function to the SNS topic. From Account B (where your Lambda function is setup), run the following command to subscribe it to the SNS topic in Account A.

aws sns subscribe \
    --topic-arn "arn:aws:sns:us-east-1:5556667778:cross-account-topic" \
    --protocol "lambda" \
    --notification-endpoint "arn:aws:lambda:us-east-1:12345678901:function:cross-account-lambda-subscriber"

Concluding

Send a test message to your SNS topic and you should see the Lambda function process the message in the other account.

If you need to diagnose anything, remember to check Lambda CloudWatch monitoring logs, or use the SNS Delivery Status feature.

That’s all there is to setting up SNS to Lambda cross account permissions.

AWS EC2 Spot Instance Termination Simulator

January 20, 2020January 18, 2020 by Sean

In this post I’ll explain how I created an AWS EC2 Spot Instance Termination Simulator to run on any EC2 instance.

It can signal typical monitoring scripts, applications, interrupt handlers, and more, just like a real Spot Termination signal would do.

Back story

Looking around, there aren’t really any straight forward ways of testing this behaviour.

The obvious way is to change the price you bid for spot instances to the threshold of the current market price. This is so that if there is a slight increase in demand your spot instance(s) will be terminated.

The problem with that approach is of course that you can’t easily predict when the price will move, or by how much.

How EC2 Spot Instance Termination warnings work

When your spot instance bid price is surpassed by the current market price, a Termination Notice becomes available on one of the instance’s metadata endpoints. For example, http://169.254.169.254/latest/meta-data/spot/termination-time.

There is one other, newer endpoint at http://169.254.169.254/latest/meta-data/spot/instance-action that could also be used.

At this point the endpoint returns an HTTP 200 response. A timestamp of when a shutdown signal is going to be sent to your instance’s OS is also returned.

In the case of the newer endpoint, a JSON string is returned with an action and time. Usually the endpoint usually simply returns a 404 not found response.

The tools out there tend to monitor this endpoint to warn and action in case of a Spot Termination notice being received.

The two minute warning feature was added by Amazon back in 2015, announced in this blog post.

Kubernetes Spot Termination / Interrupt handling

I’ve been looking at kube-aws/kube-spot-termination-notice-handler.

It runs as a DaemonSet (one container/pod on each node) in host networking mode. It then polls the EC2 metadata endpoint every ‘x’ number of seconds, watching for termination signals.

During polling, if one is found, it will send a kubectl drain command to it’s host node. Pods will move off to other nodes in the cluster.

Simulating Spot Instance Termination with a fake web service and some proxying

I created a simple web service/API that simply returns a 200 HTTP response on the same http://169.254.169.254/latest/meta-data/spot/termination-time endpoint.

This is the legacy endpoint that spot instance termination signals go to. However, there is another, newer one that AWS now recommend using instead.

The other endpoint is http://169.254.169.254/latest/meta-data/spot/instance-action. The web service will also send a response there as a JSON string with an action and time field.

All that needs to be done is to forward traffic from the metadata endpoint 169.254.169.254:80 to the custom web service.

Have a look at the code and Docker image at the following locations:

The code for the simple NodeJS web service / API is in GitHub here
You can find the Docker image over here.

Run the EC2 spot instance termination simulator endpoints

Identify a candidate EC2 instance that you don’t mind messing with.

Warning: the following steps will override the entire EC2 instance metadata service. No other metadata endpoints will work. This is because they’re not implemented in this web service.

Kubernetes

Deploy the docker container (a Kubernetes deployment / service manifest is in the git repository).

Ideally, create a nodeSelector / label up in the deployment manifest before you kubectl apply it. This is so you can get the pod to run on the specific node you want to test.

kubectl -n namespace apply -f https://raw.githubusercontent.com/Shogan/ec2-spot-termination-simulator/master/simple-k8s-deployment.yaml

The service is a NodePort service, exposing the port on the host it runs on.

List the service and the pod (and find the kubernetes node the pod is running on):

kubectl get svc ec2-spot-termination-simulator
kubectl get pod spot-term-simulator-xxxxxx -o wide

Take note of the NodePort the service is listening on for the Kubernetes node. E.g. port 30626.

Docker

docker run -p 30626:80 -e PORT=80 shoganator/ec2-spot-termination-simulator

Proxying the EC2 metadata service

Perform some trickery to proxy traffic destined to 169.254.169.254 on port 80 to localhost (where the container runs). This is so that the fake service can take the place of the real one.

SSH onto the Kubernetes node and run:

sudo ifconfig lo:0 169.254.169.254 up
sudo socat TCP4-LISTEN:80,fork TCP4:127.0.0.1:30626

Create an alias for the localhost interface at 169.254.169.254, effectively taking over the EC2 metadata service and sending that to 127.0.0.1 instead.
Forward TCP port 80 traffic (usually destined to the EC2 metadata service) to 127.0.0.1 on the NodePort 30626. This is where the ec2-spot-termination-simulator pod is running on this host. Substitute this with the correct port in your case.

Test the new, faked endpoint:

curl -i http://169.254.169.254/latest/meta-data/spot/termination-time

As a result, it should return a 200 OK response with a timestamp from the fake service. This is the same as the real one would.

Example

Looking at how the kube-spot-termination-notice-handler service works specifically:

This service runs as a DaemonSet, meaning one instance per host. The instance on the node you set the simualtor up on should immediately drain the node. The instance won’t actually be terminated as this was of course just a simulated termination.

Other scenarios

If you’re not running on Kubernetes and are using a different spot termination handler, don’t worry. The system you’re using to monitor the EC2 instance metadata endpoint should still take action at this point.

The proxied web service is now returning a legitimate looking termination time notice on http://169.254.169.254/latest/meta-data/spot/termination-time.

Running an S3 API compatible object storage server (Minio) on the Raspberry Pi

October 16, 2018 by Sean

I’ve recently become interested in hosting my own local S3 API compatible object storage server at home.

So tonight I set about setting up Minio.

Image result for minio

Minio is an object storage server that is S3 API compatible. This means I’ll be able to use my working knowledge of the Amazon S3 API and tools, but to interact with my own, locally hosted storage service running on a Raspberry Pi.

I had heard about Zenko before (an S3 API compatible object storage server) but was searching around for something really lightweight that I could easily run on ARM architecture – i.e. my Raspberry Pi model 3 I have sitting on my desk right now. In doing so, Minio was the first that I found that could easily be compiled to run on the Raspberry Pi.

The goal right now is to have a local object storage service that is compatible with S3 APIs that I can use for home use. This has a bunch of cool use cases, and the ones I am specifically interested in right now are:

Being able to write scripts that interact with S3, but test them locally with Minio before even having to think about deploying them to the cloud. A local object storage API is going to be free and fast. Plus it’s great knowing that you’re fully in control of your own data.
Setting up a publically exposable object storage service that I can target with serverless functions that I plan to be running on demand in the cloud to do processing and then output artifacts to my home object storage service.

The second use case above is what I intend on doing to send ffmpeg processed video to. Basically I want to be able to process video from online services using something like AWS Lambda (probably using ffmpeg bundled in with the function) and output the resulting files to my home storage system.

The object storage service will receive these output files from Lambda and I’ll have a cronjob or rsync setup to then sync the objects placed into my storage bucket(s) to my home Plex media share.

This means I’ll be able to remotely queue up stuff to watch via a simple interface I’ll expose (or a message queue of some sort) to be processed by Lambda, and by the time I’m home everything will be ready to watch in Plex.

Normally I would be more interesting in running the Docker image for Minio, but at home I want something that is really cheap to run, and so compiling Minio for Raspberry Pi makes total sense to me here, as this device is super cheap to level powered on 24/7 as opposed to running something beefier that would instead run as a Docker host or lightweight Kubernetes home cluster.

Here’s the quick start up guide to get it running on Raspberry Pi

You’ll basically download Go, extract it, set it up on your path, then use it to compile Minio’s source code into an ARM compatible binary that you can run on your pi.

wget https://dl.google.com/go/go1.10.3.linux-armv6l.tar.gz
sudo tar -C /usr/local -xzf go1.10.3.linux-armv6l.tar.gz
export PATH=$PATH:/usr/local/go/bin # put into ~/.profile
source .profile
go get -u github.com/minio/minio
mkdir ~/minio-data
cd go/bin
./minio server ~/minio-data/

And you’re up and running! It’s that simple to get going quickly.

Running interactively you’ll get a default access and secret key in the terminal, so head on over to the Web UI / interface to check things out: http://your-raspberry-pi-ip-or-hostname:9000/minio/

Enter your credentials to login.

Of course at this stage you can also start using your S3 API compatible command line tools to start working with your new object storage server too.

Nice!

Scaling Web API 2 and back-end SQL databases in Azure

August 20, 2016August 18, 2016 by Sean

I recently created a small Web API 2 project running with a back-end SQL database (Entity Framework code first), and had it deployed to an Azure web app, along with Azure SQL.

Naturally, I started it off using the free web app and one of the cheapest possible Azure SQL tiers (S0 – 10 DTUs).

After I finished working on the API, I wanted to see what sort of performance I could get out of it, by using Azure’s various scaling options.

To test I used Loader.io. This is a really nice and easy to use load testing service by SendGrid Labs. The free edition allows me to setup various API endpoint tests and run many concurrent connections for up to 1 minute at a time.

All my tests below were done using the same GET request test. The request always returned a collection of 5 x objects from the /Animals endpoint to keep things consistent.

My initial test was against the F1 free app tier for the Web app, with the SQL database running on S0 (10 DTUs). Here are the results of sending 500 requests per second for 1 minute.

S0-10DTU-result

The API struggled to complete the full 60k requests over 1 minute, and only completed about 8k requests, with an average response time of 4638ms. Terrible, but then again we are running on very low performance, cheap tiers. I had a look at the database performance stats and noticed that the DTUs were capped out at 100% during the 1 minute load test. At this point it definitely seems to be the database performance holding things back.

Scaling the database up to the S1 tier (20 DTUs) gives a definite improvement in response times and number of requests able to be sent within one minute. If we look at the database performance stats in the portal, we can now see that the DTUs are still maxing out at 100% though.

S1-20DTU-result

20-DTUs-maxed out

At this point I decided I would increase database performance again, but throw more requests per second at the API (from 500/second up to 1000/second).

Scaling the database up to S2 (50 DTUs) and throwing more requests a second at the API, and the number of requests completed in total higher now – up by about an extra 5k. Taking a look at the DTU performance status, we can see they now maxed out at around 60%. At this point it is pretty clear that the database is no longer the bottleneck.

50-DTUs-maxed out at 60% - even with doubling the requests per second from 500 to 1000

50-DTUs-maxed out at 60%

Now I scaled the web app tier up from free, to the B1 (Basic) tier, which gives you 1 Core, 1.75GB RAM, and up to 3 x instances scaled manually. I started with just the default 1 instance and ran the 1000 req/second for 1 minute test again.

boo-test-failed-error-rate-higher-than-50% due to timeouts

The results were pretty dismal compared to the free tier now. In fact the test failed due to an error rate of greater than 50% (all caused by timeouts). It is important to remember that we have not yet scaled out from the default 1 instance though.

Scaling up to 2 x instances on the B1 tier, helped quite a bit. The test now completes, and has a much smaller timeout error rate. Many more responses were served, but the response rate was quite slow. Taking a look at the distribution of CPU time over the two instances, we can also see that the traffic is indeed being split between the two instances we’ve scaled out with.

scale-B1-basic-from-1-to-2-instances

yay-test-finished-with much smaller error rate

processor time spread over two instances during load test

Taking this one step further to 3 x instances, and re-running the test nets us the best result so far. No timeout errors, and a response time averaging around 3000ms. Much better, but still quite a high response time, and not all 60k requests are being served.

I scaled up to the B2 tier for the following run. Each instance has 2 x cores and 3.5GB RAM this time. Starting at 1 x instance and running the test on these higher specification web instances seems to now handle things a lot better.

Little to no timeout errors, with about 5000ms avg response time, but using only 1 x instance this time!

Pushing things right up to 3 x instances (2 cores and 3.5GB RAM each) nets us the best result yet. The average response time is down to 1700ms and there are no timeout errors at all. The API was able to handle 49000 requests in the 1 minute test, which is the highest number of requests it has been able to handle so far.

B2-basic-test-with-3x-instances-good-result

I scaled up to the B3 tier from here, and tried another few runs using 3 x instances (at 4 x cores and 7GB RAM each). This didn’t help things much, netting around 200ms better response time, for a much pricier tier. It therefore looks like the sweet spot for this kind of work is to scale out with medium sized instances (2 x cores each), rather than scaling up too much.

I changed the tier to S2 (2 x cores 3.5GB RAM each, but allowing up to 10 x instances scaled out) and this time, running the test gave very similar results to 3 x instances. Clearly, the instances were now no longer the bottleneck. Looking back at the database performance, I saw that the DTUs were maxing out at around 90%. It was clear that there must have been some throttling happening there now.

I changed the database DTUs to 100 using the S3 tier, and re-ran the test once more.

bingo-60k-requests

Bingo! We’re now managing to serve the test’s 1000 requests a second, and over the 1 minute test, we get all 60k requests served successfully, and have a reasonable average response time of roughly 300-400ms.

I made a quick change to the GET method in the API for this endpoint to gather items from the database asynchronously, and running the same test again, now gets us all the way down to an average response time of just 100ms over the 60k requests in one minute. Excellent!

100ms-test-result

As you can see, by running load tests like this, and trying out different scaling options for the front end and back end, logically scaling each whenever you see bottlenecks in test results or performance metrics, you can after some time determine the best specification for your database and web apps.