I recently built a small ASP.NET Web API 2 project backed by a SQL database (Entity Framework Code First) and deployed it to an Azure web app, with the database hosted on Azure SQL.
Naturally, I started out on the free web app tier and one of the cheapest Azure SQL tiers (S0, 10 DTUs).
Once the API was finished, I wanted to see what sort of performance I could get out of it using Azure’s various scaling options.
To test, I used Loader.io, a really nice, easy-to-use load testing service from SendGrid Labs. The free edition lets you set up tests against various API endpoints and run many concurrent connections for up to 1 minute at a time.
To keep things consistent, every test below used the same GET request, which always returned a collection of 5 objects from the /Animals endpoint.
My initial test was against the free F1 tier for the web app, with the SQL database running on S0 (10 DTUs), sending 500 requests per second for 1 minute.
The API struggled to complete the full 30k requests (500 per second for 60 seconds) within the minute, completing only about 8k of them with an average response time of 4638ms. Terrible, but then again we are running on very cheap, low-performance tiers. The database performance stats showed the DTUs capped at 100% throughout the 1-minute load test, so at this point the database definitely seems to be what is holding things back.
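The numbers from this first run can be sanity-checked with a little arithmetic. The sketch below assumes the test sends a fixed number of requests per second for a fixed duration; the helper names are hypothetical, and "about 8k" is treated as exactly 8,000 for illustration.

```python
# Back-of-the-envelope numbers for a fixed-rate, fixed-duration load test.
# Inputs are taken from the first run (F1 web app, S0 database, 500 req/s).


def expected_requests(rate_per_s: int, duration_s: int) -> int:
    """Total requests a test of this rate and length tries to send."""
    return rate_per_s * duration_s


def completion_ratio(completed: int, rate_per_s: int, duration_s: int) -> float:
    """Fraction of the attempted requests that actually completed."""
    return completed / expected_requests(rate_per_s, duration_s)


def achieved_throughput(completed: int, duration_s: int) -> float:
    """Average completed requests per second over the test window."""
    return completed / duration_s


if __name__ == "__main__":
    rate, duration, completed = 500, 60, 8_000  # "about 8k" completed
    print(expected_requests(rate, duration))                        # 30000
    print(round(completion_ratio(completed, rate, duration), 2))    # 0.27
    print(round(achieved_throughput(completed, duration), 1))       # 133.3
```

In other words, with the S0 database pinned at 100% DTU, the API was only sustaining roughly a quarter of the offered load.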
Scaling the database up to the S1 tier (20 DTUs) gives a definite improvement in both response times and the number of requests served within the minute. However, the database performance stats in the portal show that the DTUs are still maxing out at 100%.
At this point I decided to increase database performance again, and also to throw more requests per second at the API (up from 500/second to 1000/second).
With the database scaled up to S2 (50 DTUs) and the API now receiving 1000 requests a second, the total number of completed requests is higher, up by about 5k. The DTU performance stats now peak at around 60%, so at this point it is pretty clear that the database is no longer the bottleneck.
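A DTU percentage is relative to the tier's cap, so converting it to an approximate absolute figure makes runs on different tiers easier to compare. This is a rough sketch: the tier caps are the ones quoted in this post, and because the portal's DTU figure is a blended measure of CPU, I/O and log activity, treating it as a linear quantity is an approximation.

```python
# Approximate absolute DTU usage from the portal's DTU percentage.
# Caps are the Standard-tier figures quoted in this post.
STANDARD_TIER_DTUS = {"S0": 10, "S1": 20, "S2": 50, "S3": 100}


def dtus_in_use(tier: str, utilisation_pct: float) -> float:
    """Approximate DTUs consumed, given the tier and its DTU percentage."""
    return STANDARD_TIER_DTUS[tier] * utilisation_pct / 100


def headroom(tier: str, utilisation_pct: float) -> float:
    """Approximate DTUs left before the tier's cap is reached."""
    return STANDARD_TIER_DTUS[tier] - dtus_in_use(tier, utilisation_pct)


if __name__ == "__main__":
    print(dtus_in_use("S2", 60))  # 30.0
    print(headroom("S2", 60))     # 20.0
```

On this rough reading, the 1000 req/s workload needed around 30 DTUs on S2, which is more than S1's entire 20-DTU cap, and it still left about 20 DTUs of headroom.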
Next, I scaled the web app up from the free tier to the B1 (Basic) tier, which gives you 1 core and 1.75GB RAM per instance, with up to 3 instances scaled out manually. I started with just the default single instance and ran the 1000 requests/second, 1-minute test again.
The results were now pretty dismal compared to the free tier. In fact, the test failed because the error rate exceeded 50% (all of the errors were timeouts). It is important to remember, though, that we had not yet scaled out beyond the default single instance.
Scaling out to 2 instances on the B1 tier helped quite a bit. The test now completes, with a much lower timeout error rate. Many more responses were served, but responses were still quite slow. The distribution of CPU time across the two instances also confirms that traffic is indeed being split between them.
Going one step further to 3 instances and re-running the test nets the best result so far: no timeout errors, and an average response time of around 3000ms. Much better, but the response time is still quite high, and not all 60k requests are being served.
For the next run I scaled up to the B2 tier, where each instance has 2 cores and 3.5GB RAM. Starting again with a single instance, these higher-specification instances seem to handle things a lot better.
Little to no timeout errors, with an average response time of about 5000ms, and using only 1 instance this time!
Scaling out all the way to 3 instances (2 cores and 3.5GB RAM each) nets the best result yet. The average response time is down to 1700ms, there are no timeout errors at all, and the API handled 49,000 requests in the 1-minute test, the most it has managed so far.
From here I scaled up to the B3 tier and tried a few more runs with 3 instances (4 cores and 7GB RAM each). This didn’t help much: the average response time improved by only around 200ms, on a much pricier tier. It therefore looks like the sweet spot for this kind of workload is to scale out with medium-sized instances (2 cores each) rather than scale up too far.
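The scale-out versus scale-up trade-off is easier to see by comparing total cores with the change in average response time. The figures below are the approximate averages reported in this post; the B3 value of 1500ms is an assumption derived from "around 200ms better" than 1700ms, and the `Run` type is purely illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    tier: str
    instances: int
    cores_per_instance: int
    avg_response_ms: float  # approximate averages from the runs above

    @property
    def total_cores(self) -> int:
        return self.instances * self.cores_per_instance


# 1000 req/s for 60 s against the S2 database; B3 value assumed (~200ms better).
RUNS = [
    Run("B1", 3, 1, 3000),
    Run("B2", 1, 2, 5000),
    Run("B2", 3, 2, 1700),
    Run("B3", 3, 4, 1500),
]


def marginal_gain(before: Run, after: Run) -> tuple[float, float]:
    """Return (core multiplier, fractional drop in average response time)."""
    core_ratio = after.total_cores / before.total_cores
    latency_drop = (before.avg_response_ms - after.avg_response_ms) / before.avg_response_ms
    return core_ratio, latency_drop


if __name__ == "__main__":
    for before, after in ((RUNS[1], RUNS[2]), (RUNS[2], RUNS[3])):
        ratio, drop = marginal_gain(before, after)
        print(f"{before.tier} x{before.instances} -> {after.tier} x{after.instances}: "
              f"{ratio:.1f}x cores, latency down {drop:.0%}")
    # B2 x1 -> B2 x3: 3.0x cores, latency down 66%
    # B2 x3 -> B3 x3: 2.0x cores, latency down 12%
```

Tripling cores by scaling out B2 cut the average response time by about two thirds, while doubling cores again by scaling up to B3 bought only about 12%, which is why the medium-sized instances look like the better value here.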
I then changed the web app to the Standard S2 tier (2 cores and 3.5GB RAM per instance, but allowing up to 10 instances to be scaled out). This time, running the test with more instances gave very similar results to the 3-instance runs, so the web instances were clearly no longer the bottleneck. Looking back at the database (still on the S2 database tier, 50 DTUs), I saw that the DTUs were maxing out at around 90%, which suggested some throttling was now happening there.
I scaled the database up to the S3 tier (100 DTUs) and re-ran the test once more.
Bingo! The API now keeps up with the test’s 1000 requests a second: all 60k requests over the 1-minute test are served successfully, with a reasonable average response time of roughly 300-400ms.
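Little's law (L = λW) gives a feel for what this result means: the average number of requests in flight equals the throughput multiplied by the average time each request spends in the system. It only holds for a system in steady state. This run qualifies because every request was served, unlike the earlier runs that fell behind. The helper names below are illustrative.

```python
def in_flight(throughput_per_s: float, avg_response_s: float) -> float:
    """Little's law: average requests in the system, L = lambda * W."""
    return throughput_per_s * avg_response_s


def max_throughput(concurrency_limit: float, avg_response_s: float) -> float:
    """Little's law rearranged: lambda = L / W for a fixed concurrency budget."""
    return concurrency_limit / avg_response_s


if __name__ == "__main__":
    for avg_ms in (300, 400):
        print(avg_ms, in_flight(1000, avg_ms / 1000))  # about 300 and 400 in flight
```

At 1000 req/s and 300-400ms, roughly 300-400 requests are in progress at any moment, each potentially holding a thread or a database connection. If the average response time fell to 100ms, the same 1000 req/s would need only about 100 requests in flight.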
I then made a quick change to this endpoint’s GET method so that it fetches items from the database asynchronously. Running the same test again brings the average response time all the way down to just 100ms across the 60k requests in one minute. Excellent!
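Why asynchronous database access helps can be illustrated with a small Python analogy. This is not the author's C# change. In ASP.NET, an awaited EF query (for example `ToListAsync`) releases the request thread while the database works. The sketch models that difference under assumed values: a hypothetical 50ms query, a pool of 4 threads, and 20 requests. The blocking version holds a thread for the whole wait. The async version overlaps every wait on a single thread.

```python
from __future__ import annotations

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

QUERY_LATENCY_S = 0.05  # hypothetical database round-trip time
WORKERS = 4             # hypothetical size of a small request-thread pool
REQUESTS = 20


def blocking_query(i: int) -> int:
    """Synchronous call: the worker thread is held for the whole wait."""
    time.sleep(QUERY_LATENCY_S)
    return i


async def async_query(i: int) -> int:
    """Asynchronous call: the thread can serve other work while waiting."""
    await asyncio.sleep(QUERY_LATENCY_S)
    return i


def run_blocking(n: int) -> tuple[list[int], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        results = list(pool.map(blocking_query, range(n)))
    return results, time.perf_counter() - start


def run_async(n: int) -> tuple[list[int], float]:
    async def main() -> list[int]:
        return list(await asyncio.gather(*(async_query(i) for i in range(n))))

    start = time.perf_counter()
    results = asyncio.run(main())
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # 20 blocking calls on 4 threads need about 5 rounds of 50ms each;
    # the async version waits on all 20 at once.
    _, t_blocking = run_blocking(REQUESTS)
    _, t_async = run_async(REQUESTS)
    print(f"blocking: {t_blocking:.2f}s  async: {t_async:.2f}s")
```

The exact timings depend on the machine, but the blocking version should take roughly five times the query latency while the async version takes roughly one.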
As you can see, by running load tests like this and trying out different scaling options for the front end and back end, each time scaling whichever one the test results or performance metrics show to be the bottleneck, you can eventually determine the best specification for your database and web apps.
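The iterative approach above can be sketched as a simple decision rule applied after each load test. This is a hypothetical helper, not part of Loader.io or the Azure portal. The 90% saturation threshold, the field names, and the web CPU metric are all assumptions for illustration.

```python
from dataclasses import dataclass

SATURATION_PCT = 90.0  # hypothetical threshold for "this resource is maxed out"


@dataclass
class RunMetrics:
    completed: int        # requests served during the test
    attempted: int        # requests the test tried to send
    web_cpu_pct: float    # hypothetical peak CPU across web instances
    db_dtu_pct: float     # peak DTU percentage from the database stats
    timeout_errors: int


def next_step(r: RunMetrics) -> str:
    """Suggest what to change before the next run, based on one test's metrics."""
    if r.db_dtu_pct >= SATURATION_PCT:
        return "scale up the database"
    if r.timeout_errors > 0 or r.web_cpu_pct >= SATURATION_PCT:
        return "scale out (or up) the web app"
    if r.completed < r.attempted:
        return "investigate the application code"
    return "target met; try a cheaper tier or a higher load"
```

Checking the database first mirrors this post's runs, where a saturated database (S0, S1, and later S2 at around 90%) masked any gains from a bigger web tier.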