Select, match and pipe output into another command with jq, grep, and xargs


This is a quick reference post if you’re looking to pipe output into another command on Linux using xargs.

The pipeline is immensely powerful, and you can leverage it at each stage of your full command to do specific selecting, matching, and manipulation.

Say you are running an executable that outputs a bunch of JSON and you want to select a certain subset of this data, pattern match it, and then send that matched data into another command.

This is the perfect use case for a mixture of the jq, grep and xargs commands in the pipeline.

Practical example with xargs

Here is a practical example where you might want to list all your AWS CodePipeline pipelines, match only certain ones, and then trigger a release (start a new pipeline execution) on each of them.

aws codepipeline list-pipelines | jq -r '.pipelines[].name' | grep project-xyz | xargs -n1 aws codepipeline start-pipeline-execution --name

The piped command above does the following:

  • Lists all AWS CodePipeline pipelines with the command aws codepipeline list-pipelines
  • Uses jq with the -r (raw output) flag to select the name from each pipeline object in the pipelines[] array that the above command outputs
  • Sends each pipeline name into grep to match only those containing the string “project-xyz”
  • Pipes the resulting pipeline names using xargs into the command aws codepipeline start-pipeline-execution --name. The -n1 argument tells xargs to pass at most one argument per command invocation, so each matched pipeline gets its own execution command (see the dry-run sketch below).
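
Before running something destructive through xargs it can be worth previewing exactly what will be executed. A minimal dry-run sketch of the same pipeline, which just prints each command rather than running it:

aws codepipeline list-pipelines | jq -r '.pipelines[].name' | grep project-xyz | xargs -n1 echo aws codepipeline start-pipeline-execution --name

Once the printed commands look right, drop the echo and run it for real.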

Cheap S3 Cloud Backup with BackBlaze B2


I’ve been constantly evolving my cloud backup strategies to find the ultimate cheap S3 cloud backup solution.

The reason for sticking with “S3” is that there are tons of cloud storage services that implement the S3 API. Sticking to it means you can generally use the same backup/restore scripts with just about any of them.

The S3 client tooling available can of course be leveraged everywhere too (s3cmd, aws s3, etc…).

BackBlaze B2 gives you 10GB of storage free to start. If you don’t have too much to back up, you could get creative with lifecycle policies and stick within the 10GB free limit.

a lifecycle policy to delete objects older than 7 days.

Current Backup Solution

This is the current solution I’ve set up.

I have a bunch of files on a FreeNAS storage server that I need to backup daily and send to the cloud.

I’ve setup a private BackBlaze B2 bucket and applied a lifecycle policy that removes any files older than 7 days. (See example screenshot above).
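
If you prefer to script the lifecycle rule rather than click through the BackBlaze web UI, the b2 command line tool can apply it too. The sketch below is from memory of the b2 CLI and B2 lifecycle rule fields at the time, so treat the flag and field names as assumptions and double-check them against b2 update-bucket --help:

# Placeholder credentials; hides files 7 days after upload, then deletes them a day after hiding
b2 authorize-account <applicationKeyId> <applicationKey>
b2 update-bucket --lifecycleRules '[{"daysFromUploadingToHiding": 7, "daysFromHidingToDeleting": 1, "fileNamePrefix": ""}]' your-bucket-name-goes-here allPrivate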

I used a FreeBSD jail to install my S3 client tooling (s3cmd), and mounted my storage into that jail. You can follow the steps below if you would like to set up something similar:

Step-by-step setup guide

Create a new jail.

Enable VNET, DHCP, and Auto-start. Mount the FreeNAS storage path you’re interested in backing up as read-only to the jail.
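
FreeNAS jails are iocage-based under the hood, so if you prefer the shell over the UI, the equivalent setup looks roughly like the following (the jail name, release, and dataset paths here are just examples, and you may need to create the destination directory inside the jail first):

# Create the jail with VNET, DHCP and auto-start enabled
iocage create -n s3backup -r 11.3-RELEASE vnet=on dhcp=on boot=on

# Mount the storage to back up into the jail as a read-only nullfs mount
iocage fstab -a s3backup /mnt/tank/data /mnt/data nullfs ro 0 0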

The first step in a clean/base jail is to get s3cmd compiled and installed, as well as gpg for encryption support. You can use portsnap to get everything downloaded and ready for compilation.

portsnap fetch
portsnap extract # skip this if you've already run extract before
portsnap update

cd /usr/ports/net/py-s3cmd/
make -DBATCH install clean
# Note -DBATCH will take all the defaults for the compile process and prevent tons of pop-up dialogs asking to choose. If you don't want defaults then leave this bit off.

# make install gpg for encryption support
cd /usr/ports/security/gnupg/ && make -DBATCH install clean

The compile and install process takes a number of minutes. Once complete, you should be able to run s3cmd --configure to set up your defaults.

For BackBlaze you’ll need to configure s3cmd to use the specific endpoint for your region. BackBlaze’s documentation describes the settings you’ll need in addition to your access key and secret key.

Once gpg is compiled and installed you should find it at /usr/local/bin/gpg, so you can point your s3cmd configuration at this too.
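
For reference, the parts of ~/.s3cfg that matter for BackBlaze end up looking something like this (the region endpoint below is only an example; use the S3 endpoint shown on your own bucket’s details page, and substitute your own keys):

access_key = <your-b2-application-key-id>
secret_key = <your-b2-application-key>
host_base = s3.us-west-002.backblazeb2.com
host_bucket = %(bucket)s.s3.us-west-002.backblazeb2.com
gpg_command = /usr/local/bin/gpg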

Double check s3cmd and gpg are installed with simple version checks.

gpg --version
s3cmd --version
quick version checks of gpg and s3cmd

A simple backup shell script

Here is a quick and easy shell script to demonstrate compressing a directory path and all of its contents, then uploading it to a bucket with s3cmd.

#!/bin/sh
DATESTAMP=$(date "+%Y-%m-%d")
TIMESTAMP=$(date "+%Y-%m-%d-%H-%M-%S")

# Compress the current directory (minus anything excluded) into a timestamped archive under /root
tar --exclude='./some-optional-stuff-to-exclude' -zcvf "/root/$TIMESTAMP-backup.tgz" .

# Upload the archive into a date-stamped prefix in the bucket
s3cmd put "/root/$TIMESTAMP-backup.tgz" "s3://your-bucket-name-goes-here/$DATESTAMP/$TIMESTAMP-backup.tgz"
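
Since gpg was installed earlier, you could optionally have s3cmd encrypt the archive client-side before upload by adding the -e flag (this relies on the gpg_command and passphrase you set during s3cmd --configure):

s3cmd put -e "/root/$TIMESTAMP-backup.tgz" "s3://your-bucket-name-goes-here/$DATESTAMP/$TIMESTAMP-backup.tgz"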

Scheduling the backup script is an easy task with crontab. Run crontab -e and then set up your desired schedule. For example, daily at 25 minutes past 1 in the morning:

25 1 * * * /root/backup-script.sh
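
If you want a record of each run, you could redirect the script’s output to a log file in the crontab entry (the log path here is just an example):

25 1 * * * /root/backup-script.sh >> /var/log/backup-script.log 2>&1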

My home S3 backup evolution

I’ve gone from using Amazon S3, to Digital Ocean Spaces, to where I am now with BackBlaze B2. BackBlaze is definitely the cheapest option I’ve found so far.

Amazon S3 is overkill for simple home cloud backup solutions (in my opinion). You can change to use infrequent access or even glacier tiered storage to get the pricing down, but you’re still not going to beat BackBlaze on pure storage pricing.

Digital Ocean Spaces was nice for a short while, but they have an annoying minimum charge of $5 per month just to use Spaces. This rules it out for me as I was hunting for the absolute cheapest option.

BackBlaze currently has very cheap storage costs for B2. Just $0.005 per GB per month for storage, and only $0.01 per GB of download (only really needed if you want to restore some backup files, of course).

Concluding

You can of course get more technical and convince a willing friend or family member to host a private S3-compatible storage service like Minio for you, but I doubt many would want to go to that level of effort.

So, if you’re looking for a cheap S3 cloud backup solution with minimal maintenance overhead, definitely consider the above.

This is post #4 in my effort towards 100DaysToOffload.

AWS CodeBuild local with Docker

AWS have a handy post up that shows you how to run CodeBuild locally with Docker.

Having a local CodeBuild environment available can be extremely useful. You can very quickly test your buildspec.yml files and build pipelines without having to push changes up to a remote repository or incur AWS charges by running pipelines in the cloud.

I found a few extra useful bits and pieces whilst running a local CodeBuild setup myself and thought I would document them here, along with a summarised list of steps to get CodeBuild running locally yourself.

Get CodeBuild running locally

Start by cloning the CodeBuild Docker git repository.

git clone https://github.com/aws/aws-codebuild-docker-images.git

Now, locate the Dockerfile for the CodeBuild image you are interested in using. I wanted to use the Ubuntu standard 3.0 image, i.e. ubuntu/standard/3.0/Dockerfile.

Edit the Dockerfile to remove the ENTRYPOINT directive at the end.

# Remove this -> ENTRYPOINT ["dockerd-entrypoint.sh"]
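
If you’d rather script that edit than open the file by hand, something like the following should do it (GNU sed syntax; on macOS/BSD sed you’d need -i '' instead):

sed -i '/^ENTRYPOINT/d' ubuntu/standard/3.0/Dockerfile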

Now run a docker build in the relevant directory.

docker build -t aws/codebuild/standard:3.0 .

The image will take a while to build and once done will of course be available to run locally.

Now grab a copy of this codebuild_build.sh script and make it executable.

curl -O https://gist.githubusercontent.com/Shogan/05b38bce21941fd3a4eaf48a691e42af/raw/da96f71dc717eea8ba0b2ad6f97600ee93cc84e9/codebuild_build.sh
chmod +x ./codebuild_build.sh

Place the shell script in your local project directory (alongside your buildspec.yml file).

Now it’s as easy as running this shell script with a few parameters to get your build going locally. Just use the -i option to specify the local docker CodeBuild image you want to run.

./codebuild_build.sh -c -i aws/codebuild/standard:3.0 -a output

The following options are the ones I found most useful:

  • -c – passes in AWS configuration and credentials from the local host. Super useful if your buildspec.yml needs access to your AWS resources (most likely it will).
  • -b – use a buildspec.yml file elsewhere. By default the script will look for buildspec.yml in the current directory. Override with this option.
  • -e – specify a file to use as environment variable mappings to pass in (a sample file is sketched after this list).
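
As far as I recall, the file passed with -e is just one variable per line in NAME=value form; the names below are purely hypothetical examples:

MY_API_ENDPOINT=https://example.internal/api
DEPLOY_ENVIRONMENT=test

Save that as something like my-build.env and pass it with -e my-build.env.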

Testing it out

Here is a really simple buildspec.yml if you want to test this out quickly and don’t have your own handy. Save the below YAML as simple-buildspec.yml.

version: 0.2

phases:
  install:
    runtime-versions:
      java: openjdk11
    commands:
      - echo This is a test.
  pre_build:
    commands:
      - echo This is the pre_build step
  build:
    commands:
      - echo This is the build step
  post_build:
    commands:
      - bash -c "if [ \"$CODEBUILD_BUILD_SUCCEEDING\" == \"0\" ]; then exit 1; fi"
      - echo This is the post_build step
artifacts:
  files:
    - '**/*'
  base-directory: './'

Now just run:

./codebuild_build.sh -b simple-buildspec.yml -c -i aws/codebuild/standard:3.0 -a /tmp/output

You should see the script start up the docker container from your local image and ‘CodeBuild’ will start executing your buildspec steps. If all goes well you’ll get an exit code of 0 at the end.
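
If you want to confirm the result by hand, the exit code is available in the same shell straight after the run:

echo $?   # prints 0 if the local build succeeded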

aws codebuild test run output from a local Docker container.

Good job!

This post contributes to my effort towards 100DaysToOffload.

Saving £500 on a new Apple Mac Mini with 32GB RAM

mac mini internals

I purchased a new Apple Mac Mini recently and didn’t want to fall victim to Apple’s “RAM Tax”.

I used Apple’s site to configure a Mac Mini with a quad core processor, 32GB RAM, and a 512GB SSD.

I was shocked to see they added £600.00 to the price of a base model with 8GB RAM. They’re effectively charging all of this money for 24GB of extra RAM. This memory is nothing special; it’s pretty standard 2666MHz DDR4 SODIMM memory, the same stuff that is used in generic laptops.

I decided to cut back my order to the base model with 8GB of RAM. I ordered a Crucial 32GB kit (2 x 16GB DDR4-2666 SODIMM modules running at 1.2 volts with a CAS latency of 19). This kit cost me just over £100.00 online.

The Crucial 2 x 16GB DDR4-2666 SODIMM kit

In total I saved around £500.00 for the trouble of about 30 minutes of work to open up the Mac Mini and replace the RAM modules myself.

The Teardown Process

Use the iFixit Guide

You can use my photos and brief explanations below if you would like to follow the steps I took to replace the RAM, but honestly, you’re better off following iFixit’s excellent guide here.

Follow along Here

If you want to compare or follow along in my format, then read on…

Get a good tool kit with hex screw drivers. I used iFixit’s basic kit.

iFixit basic tool kit

Flip the Mac Mini upside down.

Pry open the back cover carefully with a plastic prying tool.

Undo the 6 x hex screws on the metal plate under the black plastic cover. Be careful to remember the positions of these, as there are 2 x different types: 3 x short screws and 3 x longer ones.

opening the mac mini

Very carefully, move the cover to the side, revealing the WiFi antenna connector. Unscrew the small hex screw holding the metal tab on the cable. Use a plastic levering tool to carefully pop the antenna connector off.

Next, unscrew 4 x screws that hold the blower fan to the exhaust port. You can see one of the screws in the photo below. Two of the screws are angled at a 45 degree orientation, so carefully undo those, and use tweezers to catch them as they come out.

Carefully lift the blower fan up, and disconnect its cable using a plastic pick or prying tool. The trick is to lift from underneath the back of the cable’s connector and it’ll pop off.

mac mini blower fan removal

Next, disconnect the main power cable at the top right of the photo below. This requires a little bit of wiggling to loosen and lift it as evenly as possible.

Now disconnect the LED cable (two pin). It’s very delicate, so do this as carefully as possible.

There are two main hex screws to remove from the motherboard central area now. You can see them removed below near the middle (where the brass/gold coloured rings are).

With everything disconnected, carefully push the inner motherboard and its tray out, using your thumbs on the fan’s exhaust port. You should ideally position your thumbs on the screw hole areas of the fan exhaust port. It’ll pop out, then just very carefully push it all the way out.

The RAM area is protected by a metal ‘cage’. Unscrew its 4 x hex screws and slowly lift the cage off the RAM retainer clips.

Carefully push the RAM module retainer clips to the side (they have a rubber grommet type covering over them), and the existing SODIMM modules will pop loose.

mac mini SODIMM RAM modules and slots

Remove the old modules and replace with your new ones. Make sure you align the modules in the correct orientation. The slots are keyed, so pay attention to that. Push them down toward the board once aligned and the retainer clips will snap shut and lock them in place.

Replace the RAM ‘cage’ with its 4 x hex screws.

Reverse the steps you took above to insert the motherboard tray back into the chassis and re-attach all the cables and connectors in the correct order.

Make sure you didn’t miss any screws or cables when reconnecting everything.

Finally boot up and enjoy your cheap RAM upgrade.
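
To confirm macOS sees the full 32GB, check About This Mac, or run this from a terminal:

system_profiler SPMemoryDataType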

Ingest CloudWatch Logs to a Splunk HEC with Lambda and Serverless

cloudwatch logs to splunk HEC via Lambda

I recently came across a scenario requiring CloudWatch log ingestion to a private Splunk HEC (HTTP Event Collector).

The first and preferred method of ingesting CloudWatch Logs into Splunk is by using AWS Firehose. The problem here though is that Firehose only seems to support an endpoint that is open to the public.

This is a problem if you have a Splunk HEC that is only available inside of a VPC and there is no option to proxy public connections back to it.

The next thing I looked at was the Splunk AWS Lambda function template to ingest CloudWatch logs from Log Group events. I had a quick look and it seems pretty out of date, with synchronous functions and libraries in use.

So, I decided to put together a small AWS Lambda Serverless project to improve on what is currently out there.

You can find the code over on Github.

The new version has:

  • async / await, with promise wrappers around synchronous libraries like zlib.
  • A module that handles identification of Log Group names based on a custom regex pattern. If events come from log groups that don’t match the naming convention, then they get rejected. The idea is that you can write another small function that auto-subscribes Log Groups.
  • Secrets Manager integration for loading the Splunk HEC token from Secrets Manager. (Or fall back to a simple environment variable if you like).
  • Serverless framework wrapper. Pass in your Security Group ID, Subnet IDs and tags, and let serverless CLI deploy the function for you.
  • Lambda VPC support by default. You should deploy this Lambda function in a VPC, as my assumption is that most enterprises run their own internal Splunk inside their corporate / VPC network. If you do happen to have a public-facing Splunk, just remove the VPC section in serverless.yml.

You deploy it using Serverless framework, passing in your VPC details and a few other options for customisation.

serverless deploy --stage test \
  --iamRole arn:aws:iam::123456789012:role/lambda-vpc-execution-role \
  --securityGroupId sg-12345 \
  --privateSubnetA subnet-123 \
  --privateSubnetB subnet-456 \
  --privateSubnetC subnet-789 \
  --splunkHecUrl https://your-splunk-hec:8088/services/collector \
  --secretManagerItemName your/secretmanager/entry/here

Once configured, it’ll pick up any log events coming in from Log Groups you’ve ‘subscribed’ it to (Lambda CloudWatch Logs Triggers).

Add your Lambda CloudWatch Logs triggers and enable them for automatic ingestion into Splunk.

These events get enriched with extra metadata defined in the function. The metadata is derived by default from the naming convention used in the CloudWatch Log Groups. Take a close look at the included Regex pattern to ensure you name your Log Groups appropriately. Finally, they’re sent to your Splunk HEC for ingestion.

For an automated Log Group ingestion story, write another small helper function that does the following (a rough sketch of the underlying CLI calls follows this list):

  • Looks for Log Groups that are not yet subscribed as CloudWatch Logs Triggers.
  • Adds them to your CloudWatch to Splunk HEC function as a trigger and enables it.
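
As a rough sketch of what that helper would do for each unsubscribed Log Group, here are the underlying AWS CLI calls. The function name, ARNs, and log group below are placeholders, not the repository’s actual naming:

# Allow CloudWatch Logs to invoke the ingestion function (placeholder names/ARNs)
aws lambda add-permission \
  --function-name cloudwatch-to-splunk-hec \
  --statement-id allow-cwlogs-example-log-group \
  --action lambda:InvokeFunction \
  --principal logs.amazonaws.com \
  --source-arn "arn:aws:logs:eu-west-1:123456789012:log-group:/example/log-group:*"

# Subscribe the Log Group to the function (an empty filter pattern sends every event)
aws logs put-subscription-filter \
  --log-group-name /example/log-group \
  --filter-name splunk-hec-ingest \
  --filter-pattern "" \
  --destination-arn arn:aws:lambda:eu-west-1:123456789012:function:cloudwatch-to-splunk-hec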

In the future I might add this ‘automatic trigger adding function’ to the Github repository, so stay tuned!