Reliability Week: Static Stability & Sizing for n

Video soon - recovering from the flu

We've covered the principles of multi-AZ and multi-region. Now we can plan the structure of deployment.

Let's take our sample application - the timesheet.

On our legacy platform, this runs on 10 webservers, and each webserver maxes out at 10 requests per second. Our Non-Functional Requirements (NFRs) are 100 requests per second, each served in under 1s of execution time.

We can refer to this baseline capacity as 'n', and any additional overprovisioning as n+{1,2,3} to show that we build in redundancy in the event of a node failure. In the event of a facility failure, we have a secondary site to invoke DR - which incurs a downtime penalty of roughly half an hour.
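As a rough sketch of that arithmetic, the baseline 'n' is just the target load divided by per-node throughput, rounded up. The function name and parameters here are illustrative and not part of any tool; the figures are the timesheet numbers above.

```python
import math


def baseline_nodes(target_rps: float, per_node_rps: float) -> int:
    """Smallest node count whose combined throughput meets the target load."""
    return math.ceil(target_rps / per_node_rps)


# Timesheet example: 100 requests per second, 10 requests per second per webserver.
n = baseline_nodes(target_rps=100, per_node_rps=10)
print(f"n: {n} nodes")

# n+1, n+2, n+3: spare nodes on top of the baseline to absorb node failures.
for spare in (1, 2, 3):
    print(f"n+{spare}: {n + spare} nodes")
```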

Assume your cloud VMs map 1:1 for performance, so your target node count is still 10. Most teams will default to striping 10(+2) nodes across two AZs, with 6 in each. This gives you resiliency against a single node failure; in the event of the loss of a facility, system automation can spin up 6 replacement nodes in another availability zone.

In this design we still incur downtime while the replacement instances are spun up. During this window, the surviving original instances may become overwhelmed and fail entirely.
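To put a number on that risk, here is a minimal sketch of the capacity left when one AZ disappears from the 6 + 6 layout. It assumes the 10 requests per second per node from the example and that the largest AZ is the one lost; the function name is illustrative.

```python
from typing import Sequence


def capacity_after_az_loss(nodes_per_az: Sequence[int], per_node_rps: float) -> float:
    """Worst-case throughput (requests per second) left after losing the largest AZ."""
    surviving_nodes = sum(nodes_per_az) - max(nodes_per_az)
    return surviving_nodes * per_node_rps


target_rps = 100
remaining_rps = capacity_after_az_loss([6, 6], per_node_rps=10)
shortfall_rps = target_rps - remaining_rps

print(f"Capacity after losing one AZ: {remaining_rps:.0f} rps")
print(f"Shortfall against the NFR:    {shortfall_rps:.0f} rps")
```

Under these assumptions the six surviving nodes can serve 60 of the 100 requests per second until the replacements come up.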

To address this issue we can follow the pattern of static stability. This pattern overprovisions so that we still have the minimum units of scale to serve 100% of our load even if we lose one AZ. If we deploy across 2 AZs, we overprovision by 100% (10 extra nodes, 10 per AZ); across 3 AZs, we overprovision by 50% (5 extra nodes, 5 per AZ). Either way, losing one AZ leaves the base capacity of 10 nodes running.
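The sizing rule can be sketched as follows: the AZs that survive a single-AZ loss must carry all of n between them, so each AZ needs n divided by (AZ count - 1) nodes, rounded up. This is a simplified model that assumes equal-sized AZs and one AZ lost at a time; the function name is illustrative.

```python
import math


def static_stable_layout(n: int, az_count: int) -> tuple:
    """Return (nodes_per_az, total_nodes) so that any single AZ can be lost
    while the remaining AZs still hold at least n nodes."""
    if az_count < 2:
        raise ValueError("static stability needs at least 2 AZs")
    nodes_per_az = math.ceil(n / (az_count - 1))
    return nodes_per_az, nodes_per_az * az_count


n = 10  # baseline from the timesheet example
for az_count in (2, 3):
    per_az, total = static_stable_layout(n, az_count)
    extra = total - n
    print(f"{az_count} AZs: {per_az} per AZ, {total} total, "
          f"{extra} extra ({extra / n:.0%} overprovisioned)")
```

With n = 10 this gives 10 per AZ (20 total) for 2 AZs and 5 per AZ (15 total) for 3 AZs.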

In mature operating environments with instrumentation and load testing, we can measure the impact of fewer nodes on execution time, and we may find that we can still achieve our NFRs with a smaller environment.

This is a trade-off between execution time and cost. If you find that 10 nodes give an execution time of 0.6s, you can test 8 nodes or fewer to measure the relative impact on execution time until you find the sweet spot between resiliency and cost.
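One way to pick that sweet spot from load-test results is sketched below. Only the 10-node figure (0.6s) comes from the example above; every other latency is a made-up placeholder to show the shape of the data, and you would substitute your own measurements.

```python
from typing import Dict, Optional


def smallest_fleet_meeting_nfr(latency_by_nodes: Dict[int, float],
                               max_latency_s: float) -> Optional[int]:
    """Smallest tested node count whose measured execution time is under the limit.

    Returns None if no tested configuration meets the NFR.
    """
    passing = [nodes for nodes, latency in latency_by_nodes.items()
               if latency < max_latency_s]
    return min(passing) if passing else None


# node count -> measured execution time in seconds at full load.
# HYPOTHETICAL values except for the 10-node result (0.6s) quoted in the text.
measured = {10: 0.6, 9: 0.7, 8: 0.85, 7: 1.1, 6: 1.6}

print(smallest_fleet_meeting_nfr(measured, max_latency_s=1.0))
```

With these placeholder numbers, 8 nodes would be the smallest fleet under the 1s NFR. That figure could then stand in for n when sizing each AZ, at the cost of less headroom.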